
Dataform BQ API

I am migrating my Dataform pipelines from the web-based Dataform to BigQuery Dataform. In the web-based Dataform, the documentation was clear on how to execute a pipeline through a Dataform API call. See attached image/screenshot from here: However, I cannot find the corresponding API service for BigQuery Dataform.

I have been looking at this documentation, but it is not as clear as the one for the web-based Dataform about how to make the API call. Can anyone please help?

Ayush

Capture.GIF


While the Dataform API on Google Cloud provides methods to manage and invoke Dataform workflows, if you're looking to execute a Dataform pipeline directly in BigQuery, you might consider the following approach:

  1. Extract the SQL generated by your Dataform pipeline.
  2. Use the BigQuery API to execute this SQL.

To execute the SQL in BigQuery using the BigQuery API:

 

POST https://bigquery.googleapis.com/bigquery/v2/projects/<project_id>/jobs
Content-Type: application/json

{
  "configuration": {
    "query": {
      "query": "<actual_sql_query>",
      "useLegacySql": false
    }
  }
}

Replace <project_id> with the ID of your Google Cloud project and <actual_sql_query> with the SQL generated by your Dataform pipeline.

Note: Ensure the BigQuery API is enabled in your Google Cloud project.
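The request above can be sketched in Python roughly as follows. This is only an illustration using the standard library: the function names are made up for this example, the base path https://bigquery.googleapis.com/bigquery/v2 is assumed from the BigQuery REST reference, and the access token is assumed to come from somewhere else (for example gcloud).

```python
import json
import urllib.request


def build_query_job_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a jobs.insert request that runs one SQL statement."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/jobs"
    body = {
        "configuration": {
            "query": {
                "query": sql,
                # The REST API defaults to legacy SQL, so opt out explicitly.
                "useLegacySql": False,
            }
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query_job(request: urllib.request.Request) -> dict:
    """Send the prepared request and return the parsed job resource."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to print and inspect the URL and body before anything reaches BigQuery.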

Hi, thanks @ms4446 for the response. I am sorry if my question was not clear. I would like to use the Dataform API on Google Cloud to invoke my Dataform workflows.

Please note I am migrating from Legacy Dataform (see here: https://cloud.google.com/dataform/docs/migration), which was a web app, to Dataform in Google Cloud.

In Legacy Dataform I was using the API below (again attached). Now, my question is:

Which API should I use to invoke a Dataform workflow in the new Dataform on Google Cloud? Thanks

 

Capture.GIF

To invoke a Dataform workflow in the new Dataform on Google Cloud, use the workflowInvocations resource in the Dataform API.

Specifically, use the create() method to create a new workflow invocation. The endpoint URL is:


POST https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{repository}/workflowInvocations

Replace {project}, {location}, and {repository} with your actual values. The request body then names what to run: set its workflowConfig field to projects/{project}/locations/{location}/repositories/{repository}/workflowConfigs/{workflowConfig}, or set its compilationResult field to the resource name of a compilation result.

The request should include an authorization header with a valid access token:

 

Authorization: Bearer YOUR_ACCESS_TOKEN

Refer to the official documentation for the exact structure of the request body and additional information.

After sending the request to create the workflow invocation, Dataform will start executing the workflow. Monitor the status of the workflow invocation using the get() method on the workflowInvocations resource.
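A rough Python sketch of the create-then-monitor flow is below. It assumes the path form from the v1beta1 REST reference, where workflowInvocations sits directly under the repository and the workflow configuration is named in the request body. The helper names are invented for this example, and the terminal state names (SUCCEEDED, FAILED, CANCELLED) are an assumption to check against the reference.

```python
import json
import time
import urllib.request
from typing import Optional

API_ROOT = "https://dataform.googleapis.com/v1beta1"
# Assumed terminal values of the invocation's "state" field.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def repository_path(project: str, location: str, repository: str) -> str:
    """Resource name of the Dataform repository (IDs, not URLs)."""
    return f"projects/{project}/locations/{location}/repositories/{repository}"


def invocation_body(repo: str, workflow_config: str) -> dict:
    """Request body that points the invocation at a workflow configuration."""
    return {"workflowConfig": f"{repo}/workflowConfigs/{workflow_config}"}


def call_api(method: str, url: str, token: str, body: Optional[dict] = None) -> dict:
    """Send one authenticated JSON request and return the parsed response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def invoke_and_wait(token: str, repo: str, workflow_config: str, poll_seconds: int = 10) -> dict:
    """Create a workflow invocation, then poll get() until it reaches a terminal state."""
    invocation = call_api(
        "POST", f"{API_ROOT}/{repo}/workflowInvocations", token, invocation_body(repo, workflow_config)
    )
    while invocation.get("state") not in TERMINAL_STATES:
        time.sleep(poll_seconds)
        invocation = call_api("GET", f"{API_ROOT}/{invocation['name']}", token)
    return invocation
```

A call would look like invoke_and_wait(token, repository_path("my-project", "europe-west2", "my-repo"), "my-config"), with all three IDs being placeholders.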

Thank you very much @ms4446 , this is very useful. I am relatively new to GCP. Can you confirm the following:

1) {project} - does it mean the project id?

2) {location} - does it mean europe-west2 or europe-west4 as the location?

3) {repository} - does it mean my GitLab repo url? 

4) I am not sure what is {workflowConfig}? 

Thanks again

Yes, you are correct:

  • {project} refers to your Google Cloud project ID.
  • {location} refers to the Google Cloud region where your Dataform repository is located. For example, europe-west2.
  • {repository} refers to the ID of your repository in Google Cloud, not the GitLab URL. You can find this in the Source Repositories page in the Google Cloud Console.
  • {workflowConfig} refers to the ID of your Dataform workflow configuration in Google Cloud. You can find or create workflow configurations in the Dataform section of the Google Cloud Console.

Thanks, a few questions:

1) Is this still in beta?

2) Also, does {project} denote the GCP project ID or the GCP project number?

3) Unfortunately I cannot find the {repository_id}. The Source Repositories page doesn't list my Dataform repository. I can see my Dataform repo in the Dataform page, as shown below (I have removed my project name, GitLab URL, and repo name), but the repo ID does not appear. Is the {repository_id} mandatory?

dataform-repo.gif

Yes, the Dataform API is in beta. Please verify this by checking the latest Google Cloud documentation.

{project} indeed denotes the GCP project ID, not the GCP project number.

The repository ID is mandatory for invoking a Dataform workflow through the API. If you cannot find the repository ID in the Dataform section of the Google Cloud Console, it's crucial to reach out to Google Cloud Support for assistance. The exact steps and URL structures may vary, and the support team can provide the most accurate and up-to-date information.

Note: Ensure the Dataform API is enabled in your GCP project to view your Dataform repository in the relevant sections.

The 404 error with the reason "does not exist" indicates that the Dataform API is unable to find the resource specified in the request URL. This could be due to a number of reasons. To troubleshoot the issue, please consider the following steps and checks:

Verify Repository ID:

  • Ensure that the repository ID is correct. You can find the repository ID in the Dataform section of the Google Cloud Console.
  • Check the repository URL in the Dataform Console to confirm the repository ID.

Check Repository Status:

  • Confirm that the repository has not been deleted and is accessible to the user making the request.
  • Check the repository permissions to ensure the user has the necessary access to invoke workflows.

User Access:

  • Verify that the user making the request has the necessary access permissions to the repository.
  • Check the IAM & Admin section of the Google Cloud Console to verify user permissions.

API Token:

  • Ensure that the api_token is valid by trying to authenticate to the Dataform API using the token.
  • Generate a new API token if the existing one is not working.

Request Format:

  • Confirm that the run_create_request is formatted correctly, referring to the Dataform API documentation for the correct request format.
  • Use the Dataform API Playground to generate and test requests.

Endpoint URL:

  • Ensure that the dataform_project_url is correctly spelled and follows the exact structure expected by the Dataform API.
  • Verify the URL structure matches the example provided in the documentation.

Project and Location in URL:

  • Double-check that the project and location in the URL are correct and correspond to the actual project ID and location where the Dataform repository is hosted.
  • Select the correct project and location in the Dataform Console to generate the correct URL.

API Version:

  • Verify that the API version in the URL (v1beta1) is the version you intend to use, as different versions might have different URL structures and parameters.
  • Use the latest API version in the documentation to ensure compatibility.

Google Cloud Console Verification:

  • Directly verify the existence and accessibility of the repository in the Google Cloud Console to ensure it hasn’t been inadvertently deleted or moved.
  • Select the repository in the Dataform Console to confirm it exists.

Refer to API Documentation:

  • Continuously refer back to the official API documentation to ensure all parameters and the URL structure are correct.
  • Use the documentation to verify the request body, URL parameters, and headers.

Retry the Request:

  • Attempt making the request again at a later time in case of a temporary outage or issue with the Dataform API.
  • Try making the request from a different network or location.
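One way to narrow down which part of the URL triggers the 404 is to probe each level of the resource path in turn. The sketch below is only an illustration: the function names are invented, and it assumes the list and get endpoints for repositories and workflowConfigs follow the usual v1beta1 layout.

```python
import urllib.error
import urllib.request
from typing import Optional

API_ROOT = "https://dataform.googleapis.com/v1beta1"


def probe_urls(project: str, location: str, repository: str) -> list:
    """URLs to GET, ordered from the outermost resource to the innermost."""
    location_path = f"projects/{project}/locations/{location}"
    repo_path = f"{location_path}/repositories/{repository}"
    return [
        f"{API_ROOT}/{location_path}/repositories",  # project and location are valid?
        f"{API_ROOT}/{repo_path}",                   # repository ID is valid?
        f"{API_ROOT}/{repo_path}/workflowConfigs",   # which workflow configs exist?
    ]


def status_of(url: str, token: str) -> int:
    """Return the HTTP status of a GET, including error statuses such as 403 or 404."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def first_failing_url(project: str, location: str, repository: str, token: str) -> Optional[str]:
    """Return the first probe URL that does not answer 200, or None if all succeed."""
    for url in probe_urls(project, location, repository):
        if status_of(url, token) != 200:
            return url
    return None
```

If the first URL already fails, the project ID or location is the likely culprit. If only the second fails, the repository ID is wrong or not visible to the caller.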

My repository ID is correct, and I have checked it against the repository URL in the Dataform Console. The repository is not deleted. The project ID and location are indeed correct.

Can you advise what permissions I need to invoke workflows? I am not sure about this.

 

 


@ms4446 wrote:

Verify that the API version in the URL (v1beta1) is the version you intend to use, as different versions might have different URL structures and parameters.


 

I am not sure what you mean by this, sorry. I am using this version because it is the one mentioned in the documentation, as we discussed in the first thread.


@ms4446 wrote:

Use the Dataform API Playground to generate and test requests.


 

I am not familiar with the Dataform API Playground. Can you please advise how to go about testing the request? That would be very useful. Thanks again

To invoke workflows in Dataform:

Permissions:

  • Ensure you have the Dataform Editor role on the repository. Refer to the Dataform or Google Cloud documentation to verify this role and its permissions for invoking workflows.
  • Confirm you have the BigQuery Job User role on the project where the Dataform repository is hosted. Refer to the official documentation to verify this role and its permissions.

To verify your permissions:

  1. Go to the IAM & Admin section of the Google Cloud Console.
  2. Click the Roles tab.
  3. Ensure your user account is listed under the Members section for each role.
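Permissions can also be checked from code rather than the console. The sketch below assumes the repository resource exposes the standard testIamPermissions method, and the permission names in it (dataform.workflowInvocations.create and dataform.workflowInvocations.get) are assumptions to confirm against the Dataform IAM documentation. The function names are made up for this example.

```python
import json
import urllib.request

API_ROOT = "https://dataform.googleapis.com/v1beta1"
# Assumed permission names; confirm them in the Dataform IAM documentation.
INVOKE_PERMISSIONS = [
    "dataform.workflowInvocations.create",
    "dataform.workflowInvocations.get",
]


def build_permission_check(repo_path: str, token: str, permissions: list) -> urllib.request.Request:
    """Build (but do not send) a testIamPermissions request for one repository."""
    return urllib.request.Request(
        f"{API_ROOT}/{repo_path}:testIamPermissions",
        data=json.dumps({"permissions": list(permissions)}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def missing_permissions(requested: list, response: dict) -> list:
    """Compare the requested permissions with the ones the API reports as granted."""
    granted = set(response.get("permissions", []))
    return [name for name in requested if name not in granted]
```

An empty list from missing_permissions means the caller holds everything that was asked about. The response only echoes back the permissions the caller actually has.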

Testing request in Dataform API Playground:

Note: Verify the URL for the Dataform API Playground from the official sources as the provided URL is a placeholder.

  1. Go to the Dataform API Playground: https://console.cloud.google.com/dataform/playground
  2. Click the Workflow Invocations tab.
  3. Enter the following information:
  4. Click the Test button.

      Replace the following placeholders:

      • {project} with the ID of your Google Cloud project
      • {location} with the region where your Dataform repository is located
      • {repository} with the ID of your Dataform repository
      • {workflowConfig} with the ID of your Dataform workflow configuration
      • {workflow_invocation_name} with the name of your workflow invocation
      • {workflow_name} with the name of your Dataform workflow
      • {compilation_result_id} with the ID of your Dataform compilation result

If successful, a 200 OK status code will appear. Otherwise, analyze the error message in the response.

Additional notes:

  • Ensure the API token in the Authorization header has the necessary permissions.
  • For continuous issues, refer to the error message or contact Dataform or Google Cloud support.

Tips:

  • Find values like {project}, {location}, etc., in the Dataform Console.
  • Generate an API token in the Google Cloud Console.
  • Use tools like Postman for testing requests outside the Dataform API Playground.

Conclusion:

Verify each step, URL, and placeholder against the official documentation to avoid any discrepancies. This comprehensive guide should assist in effectively invoking workflows in Dataform. For further issues, do not hesitate to reach out to official support channels.

Thanks again, much appreciated. This link: https://console.cloud.google.com/dataform/playground does not appear to be correct. It says the URL was not found, and I couldn't find the playground through a Google search.

Sorry for the confusion. Dataform API Playground is not yet publicly available. 

For testing API requests, you can use other tools like Postman or cURL commands in the terminal. These tools allow you to send HTTP requests to the API endpoints and view the responses.

Here's how you can use Postman to send a request:

  1. Download and Install Postman:

    • If you don’t have Postman installed, you can download it from the official website: Postman
  2. Create a New Request:

    • Open Postman and create a new request by clicking the “New” button and selecting “Request.”
  3. Enter Request Details:

    • Enter the request details, such as the HTTP method (POST), the API URL, headers, and the request body.
  4. Send the Request:

    • Click the “Send” button to send the request. Postman will display the API response below.
  5. Analyze the Response:

    • Analyze the response to check if the request was successful or if there were any errors.

Yes, I realised that. How can I get the {compilation_result_id}? I couldn't find it in the Dataform console.

To obtain the compilation_result_id using Postman:

  1. Open Postman and create a new request.
  2. Set the method to GET.
  3. Set the URL to:
  4. Replace the placeholders with the appropriate values.
  5. In the Authorization header, add your API token.
  6. Click the Send button.
  7. The response will contain a list of all compilation results for the repository. The compilation_result_id is the value of the id field in each compilation result.
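The same lookup can be sketched in Python. This assumes the list endpoint is the compilationResults collection under the repository and that each entry identifies itself through a name resource path ending in the ID, as Google Cloud REST resources generally do, rather than through a bare id field. The function names are invented for this example.

```python
import json
import urllib.request

API_ROOT = "https://dataform.googleapis.com/v1beta1"


def list_compilation_results(repo_path: str, token: str) -> dict:
    """GET the first page of compilation results for a repository."""
    request = urllib.request.Request(
        f"{API_ROOT}/{repo_path}/compilationResults",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def compilation_result_ids(list_response: dict) -> list:
    """Extract the trailing ID segment from each compilation result's resource name."""
    return [
        entry["name"].rsplit("/", 1)[-1]
        for entry in list_response.get("compilationResults", [])
    ]
```

Only the first page is read here. A repository with many compilation results would need the page token from the response passed to follow-up requests.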

Feedback and Suggestions:

  • API Endpoint: Verify the API endpoint URL with the official Dataform documentation or Google Cloud documentation to ensure it is correct and accessible with the appropriate permissions and API token.
  • API Version: Ensure that the API version in the URL (v1beta1) is the version you intend to use, as different versions might have different URL structures and parameters.
  • API Token: Make sure the API token used in the Authorization header has the necessary permissions to access the compilation results.
  • Error Handling: If you encounter any errors or issues, carefully review the error messages as they often provide clues about the problem. Check the API endpoint, headers, and other request details to ensure they are correct.
  • Official Documentation and Support: For the most accurate and reliable information, always refer to the official documentation and consider reaching out to Google Cloud Support for assistance.

Additional Considerations:

  • Endpoint Availability: Ensure that the API endpoint is available for your subscription or plan. Some endpoints may be restricted to certain subscription levels.
  • Placeholder Replacement: Ensure that all placeholders in the URL ({project}, {location}, {repository}) are replaced with your actual project, location, and repository details.
  • Support Channels: Don’t hesitate to use other support channels like community forums or your dedicated support contact for more personalized assistance.
  • Security: Handle your API tokens and other sensitive credentials with utmost security. Ensure they are stored and transmitted securely, and are not exposed to unauthorized parties.

Proceed with these additional checks and considerations to ensure a more seamless and secure experience while working with the Dataform API and other related tools.

I'm trying to make a request to the Dataform API to execute workflows, but it's returning a 404 error. I checked the points mentioned above and they seem to be correct.

I'm using a URL:
https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{reposi... 
Can you help me?
 

Hi @Vitória ,

Can you provide some additional details?

  • Please provide the complete endpoint URL you are using for the request. Ensure that it is correctly formatted and includes the necessary parameters like project ID, location, repository ID, and workflow configuration ID.
  • What HTTP method are you using (e.g., POST, GET)?
  • Include the exact error message you are receiving. A 404 error typically indicates a "Not Found" response, but the exact wording can sometimes provide more clues.
  • Confirm that the API token you are using is valid and has the necessary permissions. (Do not share the token itself.)
  • What environment are you using to make the API request (e.g., a specific IDE, Postman, a script in a certain programming language)?
  • Have there been any recent changes in your Dataform project or Google Cloud setup that might affect this?
  • Are you following a specific part of the Dataform API documentation? If so, please specify which part.

HI!

Thanks.

Here are some suggestions to further troubleshoot:

  1. The URL you provided seems to be correctly formatted for invoking a workflow in Dataform. However, double-check for any typos or missing characters.
  2. The method you're using to generate the API token (gcloud auth print-access-token) is generally correct. Ensure that the account associated with this token has the necessary permissions to access the Dataform API and the specific resources (repository, workflowConfig) in your project.
  3. Since you're using a POST method, ensure that your request headers are set correctly. Typically, you would need to include Content-Type: application/json and Authorization: Bearer [YOUR_ACCESS_TOKEN].
  4. Check if the POST request requires a specific body. The body structure should match the requirements as per the Dataform API documentation.
  5. In VSCode using Python, ensure that your environment is correctly set up to make HTTP requests. If you're using the requests library, ensure it's correctly installed and imported in your script.
  6. Verify the existence and accessibility of the repository and workflow configuration in the Google Cloud Console. Ensure that the repository Novo_ERP and the workflow configuration exec_xrt_dm exist and are correctly named.
  7. Add additional logging to your script to print out the full request and response. This can sometimes provide more insights into what might be going wrong.
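Points 2 and 3 above can be sketched like this. It is only an illustration: it shells out to gcloud auth print-access-token and builds the two headers mentioned. The function names are made up, and the gcloud CLI is assumed to be installed and already logged in (on Windows the executable may need to be called as gcloud.cmd).

```python
import subprocess


def gcloud_access_token(executable: str = "gcloud") -> str:
    """Ask the gcloud CLI for a short-lived access token for the active account."""
    result = subprocess.run(
        [executable, "auth", "print-access-token"],
        check=True,            # raise if gcloud exits with an error
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def auth_headers(token: str) -> dict:
    """Headers for a JSON POST to the Dataform API."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Fetching the token at run time avoids pasting it into the script, and it also sidesteps stale tokens, since these access tokens are short-lived.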

HI, @ms4446 !

I've been looking at this documentation, but it's not clear. Could you help me? I believe I am making the right request, but I still keep getting a 404, as if the URL doesn't exist. Here's my initial code:

import requests
import os

project = "" 
location = ""
repository = ""
workflowConfig = ""

access_token = "My token generated by the Cloud SDK Shell, linked to my account"

headers = {
   "Authorization": f"Bearer {access_token}",
   "Content-Type": "application/json"
   }

url = f"https://dataform.googleapis.com/v1beta1/projects/{project}/locations/{location}/repositories/{reposi...

response = requests.post(url, headers=headers)

if response.status_code == 200:
   print("Workflow invoked successfully.")
   print("response body:", response.json())
else:
   print("Failed to invoke workflow.")
   print("status code:", response.status_code)
   # The error body usually says which resource could not be found.
   print("response body:", response.text)

 

To effectively troubleshoot this issue, follow these comprehensive steps:

  1. URL Structure and Placeholders:

    • Validate URL Structure: Ensure the URL is structured correctly, replacing placeholders like {project}, {location}, {repository}, and {workflowConfig} with the actual values specific to your Dataform and Google Cloud setup. Remember, curly braces {} are placeholders and should not be included in the actual URL.

    • Verify Placeholder Values: Double-check the accuracy of the values assigned to the placeholders. Confirm that they align with the corresponding resources and configurations within your Dataform and Google Cloud environments.

  2. API Token and Permissions:

    • API Token Validity: Verify the validity of the API token by checking its expiration date and ensuring it has not been revoked.

    • Permission Scope: Confirm that the API token is generated with the appropriate scope for Dataform API access. This ensures it has the necessary permissions to execute the intended actions.

    • Authorization Header Format: Verify that the Authorization header is correctly formatted: "Authorization": "Bearer {access_token}". Replace {} with your actual access token.

  3. Request Body and Content-Type:

    • POST Request Body Structure: Ensure the POST request body is structured as required by the Dataform API, including any necessary parameters. Follow the Dataform API documentation guidelines carefully.

    • Content-Type Header: Set the Content-Type header to application/json. This indicates that the request body contains JSON data.

  4. Resource Existence and Permissions:

    • Repository and Workflow Configuration Existence: Confirm the existence of the specified repository and workflow configuration in the Google Cloud Console. Verify their correct naming and accessibility.

    • User Account Permissions: Check that the user account linked to the API token has the required permissions to access the specified repository and workflow configuration. Ensure it has the necessary privileges to perform the intended actions.

  5. Debugging and Additional Checks:

    • Implement Logging: Implement additional logging in your script to capture the full request and response details, including URL, headers, body, and status code. This detailed information can aid in identifying the root cause of the issue.

    • Postman Testing: Test the API request manually using a tool like Postman with the actual values for your project, location, repository, workflow configuration, and API token. This can help determine whether the issue lies in the code or the API.

    • Dataform API Documentation: Regularly consult the latest Dataform API documentation for any updates or changes in endpoints, parameters, or authentication methods. Stay informed about any modifications that may affect your code.

  6. Google Cloud Console Configurations:

    • Double-check Configurations: Carefully review all configurations in the Google Cloud Console related to Dataform and API access. Ensure they are set up correctly and aligned with the requirements for your specific use case.
  7. Contacting Support:

    • Seek Assistance: If the issue persists after these checks, consider reaching out to Google Cloud Support for further assistance.
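For the logging step in point 5, a small helper along these lines could print the whole exchange without leaking the token. It is a sketch with invented function names, not part of any Dataform tooling.

```python
import json


def redact(headers: dict) -> dict:
    """Copy the headers, hiding the bearer token so logs can be shared safely."""
    return {
        key: ("Bearer ***" if key.lower() == "authorization" else value)
        for key, value in headers.items()
    }


def describe_exchange(method: str, url: str, headers: dict, body, status: int, response_text: str) -> str:
    """Format one request/response pair as a multi-line string for logging."""
    lines = [
        f"{method} {url}",
        f"request headers: {json.dumps(redact(headers))}",
        f"request body: {json.dumps(body)}",
        f"status: {status}",
        f"response body: {response_text}",
    ]
    return "\n".join(lines)
```

With the requests library used earlier in the thread, this could be called as describe_exchange("POST", url, headers, None, response.status_code, response.text) right after the request returns.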