Importing Google Sheets into a BigQuery Table - How to enable metadata caching?

Hey!

I'm trying to import data from Google Sheets into a BigQuery table. I used the URL method and was able to get the data, but it looks like I'm not able to store the metadata, and because of this I can't use the table with an analytical tool like ThoughtSpot. Is there a workaround for this? How can I get data from Google Sheets into BigQuery and then use that table elsewhere?
Basically, I think I need to enable metadata caching, how do I do that?
Thank you in advance 😄


You're right to focus on the importance of metadata when using BigQuery tables with analytical tools like ThoughtSpot. However, metadata caching in BigQuery is a feature of BigLake external tables over object stores such as Cloud Storage; it isn't available for tables backed by Google Sheets, so there is nothing to enable here. For tables stored in BigQuery itself, metadata such as the table schema is managed automatically as part of standard functionality. Here's how you can effectively use data from Google Sheets in BigQuery and ensure compatibility with tools like ThoughtSpot:

  1. Import Data from Google Sheets to BigQuery:

    • If you've already imported data from Google Sheets into BigQuery using the URL method, ensure that the data and its schema (metadata) are correctly defined in BigQuery.
  2. Schema and Metadata Management:

    • BigQuery automatically stores and manages metadata for each table. When you create a table in BigQuery (or import data), the schema is part of this metadata. Ensure that this schema correctly represents your data for effective use in analytical tools.
  3. Integration with Analytical Tools:

    • Connect ThoughtSpot (or any other analytical tool) to your BigQuery dataset. This usually involves configuring the data source settings within the tool to access your BigQuery tables.
    • The tool will read the data and metadata directly from BigQuery. Make sure that ThoughtSpot is set up to recognize and use the schema defined in BigQuery.
  4. Refreshing Data from Google Sheets:

    • If your Google Sheets data is updated regularly, set up a mechanism to refresh this data in BigQuery. This can be done through scheduled queries in BigQuery or using automation tools like Google Cloud Functions.
  5. Using External Tables (Optional):

    • An alternative approach is to create an external table in BigQuery that points to your Google Sheets data. This allows you to query the data directly without importing it (see the sketch after this list).
    • If you use this method, remember that the external table will reflect the current state of your Google Sheets data, including any schema changes.
  6. Testing and Validation:

    • After setting up, test the integration by running queries in ThoughtSpot to ensure it reflects the data and schema correctly from your BigQuery table.
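
If you go the external-table route, here is a minimal sketch using the BigQuery Python client; the project, dataset, table name, and sheet URL are placeholders, and the credentials must carry the Google Drive scope so BigQuery can read the sheet:

```python
import google.auth
from google.cloud import bigquery

# Placeholder identifiers -- replace with your own.
PROJECT = "my-project"
TABLE_ID = f"{PROJECT}.my_dataset.sheet_external"
SHEET_URL = "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID"

# Reading a Sheets-backed table needs the Drive scope in addition to
# the usual cloud-platform scope.
credentials, _ = google.auth.default(scopes=[
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/drive.readonly",
])
client = bigquery.Client(project=PROJECT, credentials=credentials)

# Define an external table that points at the live sheet.
config = bigquery.ExternalConfig("GOOGLE_SHEETS")
config.source_uris = [SHEET_URL]
config.autodetect = True               # infer column names and types
config.options.skip_leading_rows = 1   # row 1 holds the headers

table = bigquery.Table(TABLE_ID)
table.external_data_configuration = config
client.create_table(table, exists_ok=True)

# Every query against this table reads the sheet's current contents.
for row in client.query(f"SELECT * FROM `{TABLE_ID}` LIMIT 5").result():
    print(dict(row))
```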

In summary, focus on ensuring that your BigQuery table's schema correctly represents your data and that ThoughtSpot is properly configured to read this schema. BigQuery handles metadata management automatically, so there's no need for a separate metadata caching mechanism.

I am having the same issue and question. I also use ThoughtSpot: I linked the Google Sheet directly into BigQuery and connected ThoughtSpot to BigQuery. This does work, but the issue now is ThoughtSpot's loading time. I worked with the tech team at ThoughtSpot and found that I had to create a copy table in the dataset and then insert the data from the connected Google Sheet table into the new copy table I created. This did solve my speed and timeout issues. Yet now each week, when we get data from our clients, I need to delete the copy, recopy the table, and re-insert the data that came in that week. This was working, but now that I am trying the process again, BigQuery is not loading all of the data from the insert: the data in the copy table does not match the Google Sheet table. So I had to reconnect the original Sheets-connected table to ThoughtSpot, and I am having speed and timeout issues again. Is there a way to create a better cache on the table that the Google Sheet is connected to?

Performance issues commonly arise when ThoughtSpot queries a Google Sheet through BigQuery via a direct connection. The root cause is that an external table is not stored in BigQuery: every query re-reads the sheet at query time. External data sources such as Google Sheets are therefore not suited to high-performance querying with large datasets or frequent queries, and the result is resource strain and the timeouts you're seeing.

Current Approach and Its Limitations

The current approach of creating a copy table in BigQuery and manually reloading it from the Google Sheets-linked table effectively reduces the load at query time, thereby improving performance. However, the manual delete/recopy/re-insert cycle is time-consuming and error-prone, as the mismatched copy shows; it can be collapsed into a single atomic statement, sketched below.
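
A minimal sketch of that refresh, using hypothetical table names (`sheets_linked` for the Sheets-connected table, `reporting_copy` for the fast native copy): a single CREATE OR REPLACE TABLE ... AS SELECT swaps in the new data atomically, so the copy is never visible in a half-loaded state and ThoughtSpot keeps pointing at the same table name.

```python
import google.auth
from google.cloud import bigquery

# Querying a Sheets-backed table requires the Drive scope on the
# credentials in addition to the usual cloud-platform scope.
credentials, project = google.auth.default(scopes=[
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/drive.readonly",
])
client = bigquery.Client(credentials=credentials, project=project)

# Rebuild the native copy from the Sheets-linked table in one atomic
# statement: no delete/recopy/re-insert, and no window in which the
# copy is partially loaded.
client.query("""
    CREATE OR REPLACE TABLE `my-project.my_dataset.reporting_copy` AS
    SELECT * FROM `my-project.my_dataset.sheets_linked`
""").result()
```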

Proposed Solution: Automated Data Transfer with Google Cloud Functions (GCF)

To address the limitations of manual data transfer, a more efficient solution is to automate the process using a scheduled data pipeline, specifically through Google Cloud Functions (GCF). This automated approach eliminates manual intervention and ensures regular data updates.

Steps to Implement Automated Data Transfer

  1. Create a GCF Function:

    a. Develop a GCF function (a sketch follows this list) that performs the following tasks:

    i. Fetches the latest data from the Google Sheet using the Google Sheets API.

    ii. Transforms the data to match the BigQuery schema.

    iii. Inserts the transformed data into the BigQuery copy table.

  2. Schedule the GCF Function:

    a. Utilize Cloud Scheduler to trigger the GCF function at a regular interval (daily, weekly, etc.) that aligns with your data update schedule.

  3. Configure ThoughtSpot:

    a. Connect ThoughtSpot to the BigQuery copy table. This ensures ThoughtSpot utilizes the most up-to-date data, which is refreshed regularly by the GCF function.
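
As a concrete reference, here is a minimal sketch of such a function; the spreadsheet ID, range, destination table, and the three-column schema are all assumptions to be replaced with your own. It fetches rows through the Sheets API, maps them to the table schema, and loads them with WRITE_TRUNCATE so the copy table is replaced atomically:

```python
import google.auth
from googleapiclient.discovery import build
from google.cloud import bigquery

SPREADSHEET_ID = "SPREADSHEET_ID"                     # placeholder
DATA_RANGE = "Sheet1!A2:C"                            # data rows below the header row
DEST_TABLE = "my-project.my_dataset.reporting_copy"   # placeholder

def refresh_from_sheet(request):
    """HTTP entry point: reload the sheet into the BigQuery copy table."""
    # i. Fetch the latest rows from the sheet via the Sheets API.
    creds, _ = google.auth.default(scopes=[
        "https://www.googleapis.com/auth/spreadsheets.readonly",
        "https://www.googleapis.com/auth/cloud-platform",
    ])
    sheets = build("sheets", "v4", credentials=creds)
    values = sheets.spreadsheets().values().get(
        spreadsheetId=SPREADSHEET_ID, range=DATA_RANGE,
    ).execute().get("values", [])

    # ii. Transform rows to match the table schema (assumed here to be
    #     name STRING, region STRING, amount FLOAT64).
    rows = [
        {"name": r[0], "region": r[1], "amount": float(r[2])}
        for r in values
    ]

    # iii. Replace the copy table's contents atomically.
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE")
    bigquery.Client().load_table_from_json(
        rows, DEST_TABLE, job_config=job_config,
    ).result()
    return f"Loaded {len(rows)} rows", 200
```

Deployed as an HTTP function, the same endpoint covers step 2: point a Cloud Scheduler HTTP job at the function's URL with a cron expression that matches your weekly data drops (for example, 0 6 * * 1 for Monday mornings).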

Additional Points to Consider

  1. Verify that the data transformation in the GCF function accurately reflects the schema required by ThoughtSpot.

  2. Implement error handling and logging within the GCF function for effective management and troubleshooting.

  3. Conduct thorough testing of the entire pipeline to confirm data integrity and the expected performance improvements.

Benefits of Automated Data Transfer

  1. Eliminates manual intervention and ensures regular data updates.

  2. Reduces strain on BigQuery, improving performance for both ThoughtSpot and other BigQuery users.

  3. Maintains a synchronized and efficient data flow between Google Sheets and BigQuery.

By adopting this automated approach using Google Cloud Functions, you can effectively optimize ThoughtSpot's performance when utilizing data from Google Sheets via BigQuery. This solution eliminates manual data transfer, improves performance, and ensures data consistency.

Is there a video or step-by-step instructions I can view to build this on my own? I learn best by watching and then doing.

"a. Utilize Cloud Scheduler to trigger the GCF function at a regular interval (daily, weekly, etc.) that aligns with your data update schedule."

How can I trigger the GCF when the Google Sheet is edited?

To trigger a Google Cloud Function (GCF) when a Google Sheet is edited, you can use Google Apps Script to create a trigger within the sheet that calls the Cloud Function. Here are the steps to take:

Step 1: Create the Cloud Function

First, ensure your Cloud Function is set up to handle requests:

  • Develop the Function: Write the function in a supported language (like Node.js or Python). The function should process the data as needed and then update your BigQuery table.
  • Deploy the Function: Deploy the function on Google Cloud and note the URL provided for triggering it.
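
As a concrete reference, here is a minimal Python sketch of such a function. It assumes the sheet is already linked into BigQuery as an external table (hypothetical names `sheets_linked` and `reporting_copy`), so processing reduces to re-materializing the copy table; the shared-secret check is also an assumption, added because the endpoint would otherwise be callable by anyone who finds its URL:

```python
import os

import google.auth
from google.cloud import bigquery

# Hypothetical table names -- replace with your own.
REFRESH_SQL = """
CREATE OR REPLACE TABLE `my-project.my_dataset.reporting_copy` AS
SELECT * FROM `my-project.my_dataset.sheets_linked`
"""

def on_sheet_edit(request):
    """HTTP entry point: called by the Apps Script trigger on each edit."""
    # Reject requests that lack the shared secret, which is configured as
    # the WEBHOOK_TOKEN environment variable on the function.
    if request.headers.get("X-Webhook-Token") != os.environ.get("WEBHOOK_TOKEN"):
        return "Forbidden", 403

    # The Drive scope is needed to query the Sheets-backed table, and the
    # sheet must be shared with the function's service account.
    credentials, project = google.auth.default(scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/drive.readonly",
    ])
    bigquery.Client(credentials=credentials, project=project).query(
        REFRESH_SQL
    ).result()
    return "Refreshed", 200
```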

Step 2: Enable Google Sheets API

Make sure that the Google Sheets API is enabled in your Google Cloud project:

  • Go to the Google Cloud Console.
  • Select your project.
  • Navigate to "APIs & Services" > "Dashboard".
  • Click “+ ENABLE APIS AND SERVICES”.
  • Search for "Google Sheets API" and enable it.

Step 3: Create Google Apps Script

Use Google Apps Script to add a script to your Google Sheet that will trigger your Cloud Function when edits are made:

  1. Open your Google Sheet.
  2. Click on “Extensions” > “Apps Script”.
  3. Replace any code in the script editor with something similar to the following:
 
function notifySheetEdited(e) {
  // Use a name other than onEdit: a simple onEdit trigger runs without
  // authorization and cannot call UrlFetchApp, and keeping the onEdit
  // name alongside an installable trigger would make it fire twice.
  var url = 'YOUR_CLOUD_FUNCTION_URL'; // Replace with your Cloud Function URL
  // Build an explicitly serializable payload; the raw event object holds
  // Range/Spreadsheet objects that do not survive JSON.stringify.
  var payload = {
    sheet: e.range.getSheet().getName(),
    range: e.range.getA1Notation(),
    value: e.value
  };
  var options = {
    'method': 'post',
    'contentType': 'application/json',
    // Shared secret checked by the Cloud Function (if you use one).
    'headers': { 'X-Webhook-Token': 'YOUR_SHARED_SECRET' },
    'payload': JSON.stringify(payload),
    'muteHttpExceptions': true
  };
  try {
    var response = UrlFetchApp.fetch(url, options);
    Logger.log(response.getContentText());
  } catch (error) {
    Logger.log('Failed to trigger Cloud Function: ' + error.toString());
  }
}
  • Configure the Script: Replace 'YOUR_CLOUD_FUNCTION_URL' with the URL of your deployed Cloud Function, and 'YOUR_SHARED_SECRET' with the token your function checks (omit the header if your function doesn't validate one).

Step 4: Deploy the Script as a Trigger

  1. In the Apps Script editor, click on the clock icon on the left panel to open "Triggers".
  2. Click “+ Add Trigger” in the bottom right corner.
  3. Set the function to run to notifySheetEdited (the function defined above).
  4. Choose "From spreadsheet" for the event source.
  5. Set the event type to "On edit".
  6. Save the trigger.

Step 5: Testing and Permissions

  • Test the Integration: Make an edit in your Google Sheet and check if the Cloud Function is triggered as expected. Look at the logs in both Apps Script and Google Cloud Console.
  • Permissions: Ensure that your Apps Script is authorized to call external URLs (it will prompt for authorization the first time the trigger runs), and that your Cloud Function either allows unauthenticated invocations or validates the shared-secret header the script sends.