I am migrating my Dataform pipelines from the web-based Dataform to BigQuery Dataform. In the web-based Dataform, the documentation for executing a pipeline through a Dataform API call was clear (see the attached screenshot). However, I cannot find the corresponding API service for BigQuery Dataform.
I have been looking at this documentation, but it is not as clear as the web-based Dataform documentation about which API call to use. Can anyone help?
Ayush
While the Dataform API on Google Cloud provides methods to manage and invoke Dataform workflows, if you're looking to execute a Dataform pipeline directly in BigQuery, you might consider the following approach:
To execute the SQL in BigQuery using the BigQuery API:
POST https://bigquery.googleapis.com/bigquery/v2/projects/<project_id>/jobs
Content-Type: application/json
{
  "configuration": {
    "query": {
      "query": "<actual_sql_query>"
    }
  }
}
Replace <project_id> with the ID of your Google Cloud project and <actual_sql_query> with the SQL generated by your Dataform pipeline.
Note: Ensure the BigQuery API is enabled in your Google Cloud project.
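For anyone who prefers a script to a raw HTTP request, here is a minimal Python sketch of the same BigQuery call. It only builds the request; the helper name build_query_job_request is made up for illustration, the /bigquery/v2/ path is the form I understand the BigQuery REST reference to use, and useLegacySql is set to false on the assumption that the SQL generated by Dataform is GoogleSQL.

```python
import json
import urllib.request


def build_query_job_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a BigQuery jobs.insert request for one query."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/jobs"
    body = {
        "configuration": {
            "query": {
                "query": sql,
                # Assumption: the SQL is GoogleSQL, so legacy SQL is switched off.
                "useLegacySql": False,
            }
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually submit the job you would pass the returned object to urllib.request.urlopen() with a real access token.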
Hi, thanks @ms4446 for the response. I am sorry if my question was not clear. I would like to use the Dataform API on Google Cloud to invoke my Dataform workflows.
Please note I am migrating from Legacy Dataform (see here: https://cloud.google.com/dataform/docs/migration), which was a web app, to Dataform in Google Cloud.
In Legacy Dataform I was using the API below (again attached). Now, my question is:
To invoke a Dataform workflow in the new Dataform on Google Cloud, which API should I be using? Thanks
To invoke a Dataform workflow in the new Dataform on Google Cloud, use the workflowInvocations resource in the Dataform API.
Specifically, use the create() method to create a new workflow invocation. The endpoint URL is:
POST https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{repository}/workflowConfigs/{workflowConfig}/workflowInvocations
Replace {project}, {location}, {repository}, and {workflowConfig} with your actual values.
The request should include an authorization header with a valid access token:
Authorization: Bearer YOUR_ACCESS_TOKEN
Refer to the official documentation for the exact structure of the request body and additional information.
After sending the request to create the workflow invocation, Dataform will start executing the workflow. Monitor the status of the workflow invocation using the get() method on the workflowInvocations resource.
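The create-then-monitor flow described above could be sketched in Python roughly as follows. Treat this as a sketch under assumptions rather than the official client: the helper names are hypothetical, the terminal state names are what I believe the API returns, and the path layout follows my reading of the published REST reference, where workflowInvocations sits directly under the repository and the compilation result is passed in the request body. That layout differs from the URL quoted above, so please check both against the current documentation, especially if you keep getting 404 responses.

```python
import json
import urllib.request

API_ROOT = "https://dataform.googleapis.com/v1beta1"
# Assumption: these are the states after which an invocation no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def repository_path(project: str, location: str, repository: str) -> str:
    """Resource path of a Dataform repository."""
    return f"projects/{project}/locations/{location}/repositories/{repository}"


def build_create_request(repo_path: str, compilation_result: str, access_token: str) -> urllib.request.Request:
    """Build the POST that creates a workflow invocation (assumed path layout)."""
    body = {"compilationResult": compilation_result}
    return urllib.request.Request(
        f"{API_ROOT}/{repo_path}/workflowInvocations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_get_request(invocation_name: str, access_token: str) -> urllib.request.Request:
    """Build the GET that reads one invocation by its full resource name."""
    return urllib.request.Request(
        f"{API_ROOT}/{invocation_name}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def is_finished(invocation: dict) -> bool:
    """True once the invocation reports one of the assumed terminal states."""
    return invocation.get("state") in TERMINAL_STATES


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical use would be to call send(build_create_request(...)), read the name field from the response, and then call send(build_get_request(name, token)) every few seconds until is_finished() returns True.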
Thank you very much @ms4446 , this is very useful. I am relatively new to GCP. Can you confirm the following:
1) {project} - does it mean the project id?
2) {location} - does it mean europe-west2 or europe-west4 as the location?
3) {repository} - does it mean my GitLab repo url?
4) I am not sure what is {workflowConfig}?
Thanks again
Yes, you are correct:
{project} refers to your Google Cloud project ID.
{location} refers to the Google Cloud region where your Dataform repository is located. For example, europe-west2.
{repository} refers to the ID of your repository in Google Cloud, not the GitLab URL. You can find this in the Source Repositories page in the Google Cloud Console.
{workflowConfig} refers to the ID of your Dataform workflow configuration in Google Cloud. You can find or create workflow configurations in the Dataform section of the Google Cloud Console.
Thanks, few questions:
1) Is this still in beta?
2) Also, does {project} denote the GCP project ID or the GCP project number?
3) Unfortunately I cannot find the {repository_id}. The Source Repositories page doesn't list my Dataform repository. I can see my Dataform repo in the Dataform page as shown below (I have removed my project name, GitLab URL and repo name), but the repo ID does not appear. Is the {repository_id} mandatory?
Yes, the Dataform API is in beta. Please verify this by checking the latest Google Cloud documentation.
{project} indeed denotes the GCP project ID, not the GCP project number.
The repository ID is mandatory for invoking a Dataform workflow through the API. If you cannot find the repository ID in the Dataform section of the Google Cloud Console, it's crucial to reach out to Google Cloud Support for assistance. The exact steps and URL structures may vary, and the support team can provide the most accurate and up-to-date information.
Note: Ensure the Dataform API is enabled in your GCP project to view your Dataform repository in the relevant sections.
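To make the placeholder discussion above concrete, here is a small Python sketch that assembles the repository resource name and rejects the two mistakes that came up in this thread: a region written without its hyphen, and a Git URL used in place of the repository ID. The function name and the region pattern are my own assumptions for illustration; the pattern is only a heuristic for names like europe-west2, not an official list of regions.

```python
import re

# Heuristic only: matches region names such as europe-west2 or us-central1.
REGION_PATTERN = re.compile(r"[a-z]+-[a-z]+\d+")


def build_repository_name(project: str, location: str, repository: str) -> str:
    """Return the repository resource name after basic sanity checks."""
    if not project:
        raise ValueError("project must be your Google Cloud project ID")
    if not REGION_PATTERN.fullmatch(location):
        raise ValueError(f"location {location!r} does not look like a region such as europe-west2")
    if "://" in repository or "/" in repository:
        raise ValueError("repository must be the Dataform repository ID, not a Git URL")
    return f"projects/{project}/locations/{location}/repositories/{repository}"
```

Running the values through a check like this before building the request URL gives a clearer message than a 404 from the API.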
The 404 error with the reason "does not exist" indicates that the Dataform API is unable to find the resource specified in the request URL. This could be due to a number of reasons. To troubleshoot the issue, please consider the following steps and checks:
Verify Repository ID:
Check Repository Status:
User Access:
API Token:
Request Format:
Endpoint URL:
Project and Location in URL:
API Version:
Google Cloud Console Verification:
Refer to API Documentation:
Retry the Request:
My repository ID is correct; I have checked it against the repository URL in the Dataform Console. The repository is not deleted. The project ID and location are indeed correct.
Can you advise what permissions I need to invoke workflows? I am not sure about this.
@ms4446 wrote: Verify that the API version in the URL (v1beta1) is the version you intend to use, as different versions might have different URL structures and parameters.
I am not sure what you mean by this, sorry. I am using this version because it is the one mentioned in the documentation we discussed earlier in the thread.
@ms4446 wrote: Use the Dataform API Playground to generate and test requests.
I am not familiar with the Dataform API Playground. Can you please advise how to go about testing the request? That would be very useful. Thanks again
To invoke workflows in Dataform:
Permissions:
To verify your permissions:
Testing request in Dataform API Playground:
Note: Verify the URL for the Dataform API Playground from the official sources as the provided URL is a placeholder.
https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{reposi...
{
  "name": "{workflow_invocation_name}",
  "workflow": {
    "name": "{workflow_name}",
    "version": 1
  },
  "compilationResult": "{compilation_result_id}"
}
Replace the following placeholders:
{project} with the ID of your Google Cloud project
{location} with the region where your Dataform repository is located
{repository} with the ID of your Dataform repository
{workflowConfig} with the ID of your Dataform workflow configuration
{workflow_invocation_name} with the name of your workflow invocation
{workflow_name} with the name of your Dataform workflow
{compilation_result_id} with the ID of your Dataform compilation result
If successful, a 200 OK status code will appear. Otherwise, analyze the error message in the response.
Additional notes:
Tips:
{project}, {location}, etc., in the Dataform Console.
Conclusion:
Verify each step, URL, and placeholder against the official documentation to avoid any discrepancies. This comprehensive guide should assist in effectively invoking workflows in Dataform. For further issues, do not hesitate to reach out to official support channels.
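Since a forgotten placeholder is an easy way to end up with a URL that does not exist, a quick check like the following Python sketch can flag any {placeholder} that was left in the URL or the request body before sending. The function name is hypothetical, and the pattern assumes placeholders are written as letters and underscores inside curly braces, as they are in this thread.

```python
import json
import re

# Matches tokens such as {project} or {compilation_result_id}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_]+\}")


def find_unfilled_placeholders(url: str, body: dict) -> list:
    """Return the distinct placeholders still present in the URL or JSON body."""
    text = url + " " + json.dumps(body)
    return sorted(set(PLACEHOLDER.findall(text)))
```

An empty list means every placeholder was replaced; anything else names the values that still need filling in.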
Thanks again, much appreciated. This link, https://console.cloud.google.com/dataform/playground, does not appear to be correct. It says URL not found, and I couldn't find the playground through a Google search.
Sorry for the confusion. Dataform API Playground is not yet publicly available.
For testing API requests, you can use other tools like Postman or cURL commands in the terminal. These tools allow you to send HTTP requests to the API endpoints and view the responses.
Here's how you can use Postman to send a request:
Download and Install Postman:
Create a New Request:
Enter Request Details:
Send the Request:
Analyze the Response:
Yes, I realised that. How can I get the {compilation_result_id}? I couldn't find it in the Dataform console.
To obtain the compilation_result_id using Postman:
compilation_result_id is the value of the id field in each compilation result.
Feedback and Suggestions:
Additional Considerations:
{project}, {location}, {repository}) are replaced with your actual project, location, and repository details.
Proceed with these additional checks and considerations to ensure a more seamless and secure experience while working with the Dataform API and other related tools. Your continuous effort for improvement and attention to detail is crucial in providing effective and reliable assistance.
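If you would rather script this than use Postman, the lookup could be sketched in Python as below. This rests on assumptions you should verify against the REST reference: that listing compilation results is a GET on the repository's compilationResults collection, that the response holds them under a compilationResults key, and that each one is identified by a name field whose last path segment is the ID. The function names are made up for illustration.

```python
API_ROOT = "https://dataform.googleapis.com/v1beta1"


def list_compilation_results_url(project: str, location: str, repository: str) -> str:
    """URL of the (assumed) list endpoint for a repository's compilation results."""
    return (
        f"{API_ROOT}/projects/{project}/locations/{location}"
        f"/repositories/{repository}/compilationResults"
    )


def compilation_result_ids(list_response: dict) -> list:
    """Extract IDs from a decoded list response, taking the last segment of each name."""
    ids = []
    for item in list_response.get("compilationResults", []):
        name = item.get("name", "")
        if name:
            ids.append(name.rsplit("/", 1)[-1])
    return ids
```

You would send a GET with your Authorization header to the URL from list_compilation_results_url() and pass the decoded JSON to compilation_result_ids().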
I'm trying to make a request to the Dataform API to execute workflows, but it's returning a 404 error. I checked the points mentioned above and they seem to be correct.
I'm using a URL:
https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{reposi...
Can you help me?
Hi @Vitória ,
Can you provide some additional details?
HI!
Thanks.
Here are some suggestions to further troubleshoot:
HI, @ms4446 !
I've been looking at this documentation, but it's not clear. Could you help me? I believe I am making the right request, but I still keep getting a 404, as if the URL doesn't exist. Here's my initial code:
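import requests

# Fill these in with your real values (no curly braces).
project = ""
location = ""
repository = ""
workflowConfig = ""
access_token = "My token generated by the Cloud SDK Shell, linked to my account"

# The line defining `url` was missing from the post. As an assumption, it is
# rebuilt here from the endpoint quoted earlier in this thread.
url = (
    f"https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}"
    f"/repositories/{repository}/workflowConfigs/{workflowConfig}/workflowInvocations"
)

headers = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json",
}

response = requests.post(url, headers=headers)

if response.status_code == 200:
    print("Workflow invoked successfully.")
    print("response:", response.json())
else:
    print("Failed to invoke workflow.")
    print("status code:", response.status_code)
    # The response body usually carries the reason for the failure.
    print("response body:", response.text)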
To effectively troubleshoot this issue, follow these comprehensive steps:
URL Structure and Placeholders:
Validate URL Structure: Ensure the URL is structured correctly, replacing placeholders like {project}, {location}, {repository}, and {workflowConfig} with the actual values specific to your Dataform and Google Cloud setup. Remember, curly braces {} are placeholders and should not be included in the actual URL.
Verify Placeholder Values: Double-check the accuracy of the values assigned to the placeholders. Confirm that they align with the corresponding resources and configurations within your Dataform and Google Cloud environments.
API Token and Permissions:
API Token Validity: Verify the validity of the API token by checking its expiration date and ensuring it has not been revoked.
Permission Scope: Confirm that the API token is generated with the appropriate scope for Dataform API access. This ensures it has the necessary permissions to execute the intended actions.
Authorization Header Format: Verify that the Authorization header is correctly formatted: "Authorization": "Bearer {access_token}". Replace {access_token} with your actual access token.
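The token and header checks above could look something like this in Python. It is a sketch that assumes the gcloud CLI is installed and logged in, since gcloud auth print-access-token is how a token like the one in the posted script is normally obtained; the two function names are hypothetical.

```python
import subprocess


def gcloud_access_token() -> str:
    """Ask the gcloud CLI for a fresh access token (requires gcloud on PATH)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def auth_headers(access_token: str) -> dict:
    """Build request headers, rejecting empty tokens and leftover placeholders."""
    token = access_token.strip()
    if not token or "{" in token or "}" in token:
        raise ValueError("access token is empty or still contains a placeholder")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Access tokens from gcloud are short-lived, so calling gcloud_access_token() right before each run avoids sending an expired one.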
Request Body and Content-Type:
POST Request Body Structure: Ensure the POST request body is structured as required by the Dataform API, including any necessary parameters. Follow the Dataform API documentation guidelines carefully.
Content-Type Header: Set the Content-Type header to application/json. This indicates that the request body contains JSON data.
Resource Existence and Permissions:
Repository and Workflow Configuration Existence: Confirm the existence of the specified repository and workflow configuration in the Google Cloud Console. Verify their correct naming and accessibility.
User Account Permissions: Check that the user account linked to the API token has the required permissions to access the specified repository and workflow configuration. Ensure it has the necessary privileges to perform the intended actions.
Debugging and Additional Checks:
Implement Logging: Implement additional logging in your script to capture the full request and response details, including URL, headers, body, and status code. This detailed information can aid in identifying the root cause of the issue.
Postman Testing: Test the API request manually using a tool like Postman with the actual values for your project, location, repository, workflow configuration, and API token. This can help determine whether the issue lies in the code or the API.
Dataform API Documentation: Regularly consult the latest Dataform API documentation for any updates or changes in endpoints, parameters, or authentication methods. Stay informed about any modifications that may affect your code.
Google Cloud Console Configurations:
Contacting Support:
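As a rough illustration of the logging suggestion above, the following Python sketch records the full request and response while keeping the token out of the logs, and turns a Google-style JSON error body into a one-line summary. The function names are hypothetical, and the error layout (an error object with code, message, and status) is the usual shape of Google API errors rather than something specific to Dataform.

```python
import json
import logging

logger = logging.getLogger("dataform_client")


def redact(headers: dict) -> dict:
    """Copy the headers with the bearer token masked so it is safe to log."""
    safe = dict(headers)
    if "Authorization" in safe:
        safe["Authorization"] = "Bearer ***"
    return safe


def summarize_error(status_code: int, response_text: str) -> str:
    """Condense an error response into one line, falling back to the raw text."""
    try:
        error = json.loads(response_text).get("error", {})
    except (ValueError, AttributeError):
        error = {}
    if not isinstance(error, dict) or not error:
        return f"{status_code}: {response_text[:200]}"
    return f"{status_code} {error.get('status', '')}: {error.get('message', '')}"


def log_exchange(method, url, headers, body, status_code, response_text) -> None:
    """Log the request and response details needed to debug a failed call."""
    logger.info("request: %s %s", method, url)
    logger.info("headers: %s", redact(headers))
    logger.info("body: %s", json.dumps(body))
    if status_code >= 400:
        logger.error("failed: %s", summarize_error(status_code, response_text))
    else:
        logger.info("status: %s", status_code)
```

Call logging.basicConfig(level=logging.INFO) once at startup so the messages are actually printed.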