I'm successfully capturing lineage from BigQuery automatically, and I can see the SQL code in the lineage graph when I click on processing nodes.
Now I would like to get that code using the REST API, and also capture code myself from external systems so that it shows up in the code section of a processing step.
I've read the entire Lineage REST API documentation, including the process for tracking lineage from external systems, and I understand all three steps and what the entities are (create process, run, lineage event). I've used the API to create, list and get responses from every endpoint (except Operations, which requires additional permissions), but I haven't seen any examples of providing text so that it shows up in the code section. The best I was able to achieve was to set attributes in the process and run requests: the code as a process attribute, and technical run parameters as run attributes. Those show up in the BigQuery UI, but not in the way the SQL code does, in the gray area with syntax highlighting. The documentation doesn't mention whether this is possible at all, but I had hoped that one of the attributes would map to this code property.
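For reference, the attribute approach described above can be sketched roughly as follows. This is a minimal sketch that only builds the request URLs and JSON bodies and sends nothing. The base URL and the field names (`displayName`, `attributes`, `startTime`, `state`) are written from memory of the v1 Data Lineage API and should be checked against the API reference; the attribute key `code`, the helper names, and the sample values are hypothetical.

```python
import json

# Assumed base URL of the Data Lineage REST API (v1); verify against the docs.
BASE_URL = "https://datalineage.googleapis.com/v1"


def process_request(project, location, display_name, code):
    """Build the URL and JSON body for a create-process call.

    The code text is stored under a hypothetical attribute key, "code".
    It is shown as a plain attribute, not in the SQL code section.
    """
    url = f"{BASE_URL}/projects/{project}/locations/{location}/processes"
    body = {"displayName": display_name, "attributes": {"code": code}}
    return url, body


def run_request(process_name, display_name, start_time, params):
    """Build the URL and JSON body for a create-run call under a process.

    process_name is the resource name returned when the process was created.
    params holds the technical run parameters, stored as run attributes.
    """
    url = f"{BASE_URL}/{process_name}/runs"
    body = {
        "displayName": display_name,
        "startTime": start_time,  # RFC 3339 timestamp string
        "state": "STARTED",
        "attributes": dict(params),
    }
    return url, body


if __name__ == "__main__":
    # Hypothetical project, location and values, for illustration only.
    url, body = process_request(
        "my-project", "us", "nightly load", "SELECT * FROM src"
    )
    print(url)
    print(json.dumps(body, indent=2))
```

The bodies would then be sent as authenticated POST requests with whatever HTTP client is already in use; authentication is left out of the sketch.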
I've also checked the Dataplex labs on Lineage but the examples don't show how to capture the processing code.
I suspect that the mechanism that displays automatically captured lineage code in the lineage UI relies on something that is not available in the REST API.
1) Is it possible at all to capture code in a way that it shows in the lineage graph's SQL code section using the REST API?
2) Is it possible to extract the code from automatically captured BigQuery lineage using the REST API?
cc @ms4446
Currently, it is not directly possible to capture code in a way that it shows in the lineage graph's SQL code section using the REST API. While you can set code as a process attribute, this will not be displayed in the dedicated SQL code section with syntax highlighting. However, there are two potential workarounds:
Extracting the code from automatically captured BigQuery lineage using the REST API is not supported currently. The lineage information includes details like source and destination tables, timestamps, and transformations, but not the actual SQL code used in the BigQuery job.
Alternative methods to access the code include:
These methods require additional steps and configurations and may involve considerations regarding access permissions and data governance.
Please note: Google Cloud is continuously developing Dataplex and the Data Lineage API. While the current functionality does not support capturing and displaying code snippets directly, this feature might be added in future updates. It is recommended to stay informed by following the Dataplex documentation and release notes for any new features added to the platform.
Many thanks!
For the moment I have written a little JS script (to be included via a browser extension) that formats and syntax highlights the code if it has been provided as an attribute. If anyone is interested I could share this here but of course this might break over time as the UI changes.
Do I understand you correctly, that with the OpenLineage solution there is no need for another UI but that code should show up in the BQ UI if it has been captured via OpenLineage? I haven't looked too much into OpenLineage and therefore don't know whether it has a concept for code (unlike the Airflow integration). I've looked at the OpenLineage specification and don't see which value to set in order for code to show up in BQ lineage (or Dataplex).
Lineage from external systems using custom JS:
While OpenLineage provides a robust framework for capturing and standardizing lineage data across different systems, its integration with BigQuery or Dataplex does not automatically guarantee that code snippets are displayed in the lineage UI. Achieving this might require additional custom development or extensions, both in the lineage capture mechanism and in the UI where this data is displayed.