Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Reg Exp for selecting particular strings from a query

Hello, I would like to know which reg expr can extract relevant schema and table from the following query:

 

Query:

"--select * from vfde-project.schema1.table1
select * 
from vfde-project.schema2.table2;
--select * from vfde-project.schema3.table3
--select * from vfde-project.schema4.table4
select * from INFORMATION_SCHEMA.JOBS_TIMELINE"

 

SELECT 'a'  , REGEXP_EXTRACT_ALL(QUERY'vfde-project[a-zA-Z0-9_.-]+') as Object
FROM b
 
Expected result:
Nested table:
a, schema3.table3
a, schema4.table4
 
the reg expr above not provides this result.

Schemas in lines which are commented out (starting with '--') and Schemas='INFORMATION_SCHEMA' are not relevant and should be excluded from the result.

Thank you very much!

0 6 678
6 REPLIES 6

To extract relevant schema and table names from the query while excluding those that are commented out or part of INFORMATION_SCHEMA, you can use the following regular expression:

 

WITH b AS (
  SELECT 'a' AS query_id, """--select * from vfde-project.schema1.table1
select * 
from vfde-project.schema2.table2;
--select * from vfde-project.schema3.table3
--select * from vfde-project.schema4.table4
select * from INFORMATION_SCHEMA.JOBS_TIMELINE""" AS query
)
SELECT 'a', 
       REGEXP_EXTRACT_ALL(QUERY, r'(?m)^(?!--)\s*select\s+\*\s+from\s+([a-zA-Z0-9_.-]+)\.([a-zA-Z0-9_.-]+)') AS Object
FROM b;

 

Thank you.

if I put it in Big Query it looks like this, see below. It causes an error: 

BeaWeso_0-1723472818709.png

 

Error:Cannot parse regular expression: invalid perl operator: (?!

In Teradata studio sql looks like in you query. How can I put it in Bigquery in a proper way?

I also would like to understand your reqular expression. Can you please recommend me a page where I can see this regular expression literals with their explanation?

Thank you very much, 

Beata

Apologies. The error you encountered is due to specific characters or syntax in the regular expression that are not supported by BigQuery's regex engine. Here is a query that should work without any errors:
 
WITH b AS (
  SELECT 'a' AS query_id, """--select * from vfde-project.schema1.table1
select * 
from vfde-project.schema2.table2;
--select * from vfde-project.schema3.table3
--select * from vfde-project.schema4.table4
select * from INFORMATION_SCHEMA.JOBS_TIMELINE""" AS query
)
SELECT 'a', 
       REGEXP_EXTRACT_ALL(
           query, 
           r'(?m)^(?!--)\s*select\s+\*\s+from\s+([a-zA-Z0-9_.-]+)\.([a-zA-Z0-9_.-]+)'
       ) AS Object
FROM b;

his query will correctly extract the relevant schema and table names while excluding those that are commented out and ignoring INFORMATION_SCHEMA entries.

Hi, thank you very much, but I am still getting the same error:

Cannot parse regular expression: invalid perl operator: (?!

Both reg expr are the same:

Old:
r'(?m)^(?!--)\s*select\s+\*\s+from\s+([a-zA-Z0-9_.-]+)\.([a-zA-Z0-9_.-]+)')
New:
r'(?m)^(?!--)\s*select\s+\*\s+from\s+([a-zA-Z0-9_.-]+)\.([a-zA-Z0-9_.-]+)
Thanks, 
Beata

Give this one a shot:

WITH b AS (
SELECT 'a' AS query_id, """--select * from vfde-project.schema1.table1
select *
from vfde-project.schema2.table2;
--select * from vfde-project.schema3.table3
--select * from vfde-project.schema4.table4
select * from INFORMATION_SCHEMA.JOBS_TIMELINE""" AS query
)

SELECT 'a',
REGEXP_EXTRACT_ALL(
query,
r'vfde-project\.([a-zA-Z0-9_]+)\.([a-zA-Z0-9_]+)'
) AS Object
FROM b
WHERE NOT REGEXP_CONTAINS(query, r'^--')
AND NOT REGEXP_CONTAINS(query, r'INFORMATION_SCHEMA');

Thanks, 

now I got another error:

Regular expressions passed into extraction functions must not have more than 1 capturing group

Thanks,

Beata