I'm successfully capturing lineage automatically from BigQuery, and I can see the SQL code in the lineage graph when I click on processing nodes.
Now I would like to get that code using the REST API, and also capture code myself from external systems so that it shows up in the code section of a processing step.
I've read the entire Lineage REST API documentation, including the process for tracking lineage from external systems, and I understand all three steps and the entities involved (create process, run, lineage event). I've used the API to create, list and get responses from every endpoint (except Operations, which requires additional permissions), but I haven't seen any examples of providing text so that it shows up in the code section. The best I was able to achieve was to set attributes in the process and run requests: the code as a process attribute and technical run parameters as run attributes. Those show up in the BigQuery UI, but not the way the SQL code shows up in the gray area with syntax highlighting. The documentation doesn't say whether this is possible at all, but I had hoped that one of the attributes would be this code property.
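For reference, the three-step flow described above, with the code stored as a plain process attribute, can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the attribute key `code`, the display names and the `COMPLETED` run state are choices made for illustration, and the bearer token is assumed to come from something like `gcloud auth print-access-token`. As described above, an attribute stored this way is listed with the other attributes rather than in the highlighted SQL code panel.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Data Lineage API v1 base URL.
BASE_URL = "https://datalineage.googleapis.com/v1"


def rfc3339(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def process_body(display_name: str, code: str) -> dict:
    # "code" is a hypothetical attribute key chosen for this sketch; it is
    # stored as an ordinary attribute, not as the UI's SQL code section.
    return {"displayName": display_name, "attributes": {"code": code}}


def run_body(display_name: str, start: datetime, end: datetime, params: dict) -> dict:
    return {
        "displayName": display_name,
        "startTime": rfc3339(start),
        "endTime": rfc3339(end),
        "state": "COMPLETED",
        # Technical run parameters stored as run attributes.
        "attributes": params,
    }


def event_body(source_fqn: str, target_fqn: str, start: datetime, end: datetime) -> dict:
    # One source -> target link, identified by fully qualified names.
    return {
        "links": [{
            "source": {"fullyQualifiedName": source_fqn},
            "target": {"fullyQualifiedName": target_fqn},
        }],
        "startTime": rfc3339(start),
        "endTime": rfc3339(end),
    }


def _post(url: str, body: dict, token: str) -> dict:
    """POST a JSON body with a bearer token and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def record_lineage(project, location, token, code, source_fqn, target_fqn, start, end, params):
    """Create a process, then a run under it, then a lineage event under the run."""
    parent = f"projects/{project}/locations/{location}"
    process = _post(f"{BASE_URL}/{parent}/processes",
                    process_body("external-etl-job", code), token)
    run = _post(f"{BASE_URL}/{process['name']}/runs",
                run_body("external-etl-run", start, end, params), token)
    event = _post(f"{BASE_URL}/{run['name']}/lineageEvents",
                  event_body(source_fqn, target_fqn, start, end), token)
    return process, run, event
```

A call would look like `record_lineage("my-project", "us", token, "SELECT * FROM src", "<source FQN>", "bigquery:my-project.my_dataset.my_table", start, end, {"batch_size": "500"})`, where the project, location, table and parameters are hypothetical placeholders; check the documented fully qualified name format for your source system. The request bodies are built by separate functions so they can be inspected before anything is sent.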
I've also checked the Dataplex labs on Lineage but the examples don't show how to capture the processing code.
I suspect that the mechanism that automatically captures lineage code for the lineage UI uses something that is not available in the REST API.
1) Is it possible at all to capture code using the REST API so that it shows up in the lineage graph's SQL code section?
2) Is it possible to extract the code from automatically captured BigQuery lineage using the REST API?
cc @ms4446
Currently, it is not directly possible to capture code in a way that it shows in the lineage graph's SQL code section using the REST API. While you can set code as a process attribute, this will not be displayed in the dedicated SQL code section with syntax highlighting. However, there are two potential workarounds:
Extracting the code from automatically captured BigQuery lineage using the REST API is currently not supported. The lineage information includes details like source and destination tables, timestamps and transformations, but not the actual SQL code used in the BigQuery job.
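To check for yourself which fields the API does return for automatically captured BigQuery processes, you could list the processes in a location and print their descriptive fields. The following is a minimal sketch using only the Python standard library: it pages through the processes list endpoint via `nextPageToken` and collects each process's display name, origin and attributes. The project, location and token handling are assumptions, and no particular attribute names are assumed to exist.

```python
import json
import urllib.parse
import urllib.request

# Data Lineage API v1 base URL.
BASE_URL = "https://datalineage.googleapis.com/v1"


def _get(url: str, token: str) -> dict:
    """GET a URL with a bearer token and return the decoded JSON response."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_processes(project: str, location: str, token: str):
    """Yield every process in a location, following nextPageToken."""
    url = f"{BASE_URL}/projects/{project}/locations/{location}/processes"
    page_token = None
    while True:
        query = f"?pageToken={urllib.parse.quote(page_token)}" if page_token else ""
        page = _get(url + query, token)
        yield from page.get("processes", [])
        page_token = page.get("nextPageToken")
        if not page_token:
            break


def summarize(processes) -> dict:
    """Map each process name to its display name, origin and attributes."""
    return {
        p["name"]: {
            "displayName": p.get("displayName", ""),
            "origin": p.get("origin", {}),
            "attributes": p.get("attributes", {}),
        }
        for p in processes
    }
```

For example, `for name, info in summarize(list_processes("my-project", "us", token)).items(): print(name, json.dumps(info, indent=2))` prints one entry per process (the project and location are placeholders). If the statement above holds, the SQL text of the BigQuery job will not appear in this output.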
Alternative methods to access the code include:
These methods require additional steps and configurations and may involve considerations regarding access permissions and data governance.
Please note: Google Cloud is continuously developing Dataplex and the Data Lineage API. While the current functionality does not support capturing and displaying code snippets directly, this feature might be added in a future update. Follow the Dataplex documentation and release notes to stay informed about new features.