Hey community, I want to automatically push new data from Google Sheets to BigQuery and save it each day, instead of linking Google Sheets as an external table. How can I do this?
Here are the steps to automatically transfer new data from Google Sheets to BigQuery daily without using an external table link:
Google Cloud Project Setup
Cloud Function for Data Transfer
import gspread
import pandas as pd
from google.cloud import bigquery


def transfer_data(request):
    """HTTP-triggered Cloud Function: append the sheet's rows to a BigQuery table."""
    # Authenticate with Google Sheets. service_account() reads the key file from
    # gspread's default location; pass filename=... if the key lives elsewhere.
    # The sheet must be shared with the service account's email address.
    gc = gspread.service_account()

    # Extract data from Google Sheets (worksheet() expects the tab title, not an A1 range)
    sheet_id = 'YOUR_SHEET_ID'
    worksheet_name = 'YOUR_WORKSHEET_NAME'
    sheet = gc.open_by_key(sheet_id).worksheet(worksheet_name)
    data = sheet.get_all_values()

    # Convert data to DataFrame, using the first row as column names
    df = pd.DataFrame(data[1:], columns=data[0])

    # Load data to BigQuery
    client = bigquery.Client()
    project = 'YOUR_PROJECT_ID'
    dataset_id = 'YOUR_DATASET_ID'
    table_id = 'YOUR_TABLE_ID'
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    load_job = client.load_table_from_dataframe(df, f'{project}.{dataset_id}.{table_id}', job_config=job_config)

    # Wait for the load job to complete; result() raises an exception if the load fails
    load_job.result()
    return 'OK'
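Note that the function appends the whole sheet on every run, so rows loaded yesterday would be loaded again today. One way to push only new rows is sketched below. It assumes the sheet is append-only and that you know how many data rows are already in BigQuery (for example from a SELECT COUNT(*) on the table). The helper name new_rows_as_records and the load_date column are hypothetical, not part of gspread or BigQuery.

```python
from datetime import date


def new_rows_as_records(values, rows_already_loaded, load_date):
    """Turn sheet values (header row first) into dicts for rows not yet loaded."""
    if not values:
        return []
    header, rows = values[0], values[1:]
    records = []
    # Skip the rows that earlier runs already pushed to BigQuery
    for row in rows[rows_already_loaded:]:
        record = dict(zip(header, row))
        # Stamp each row with the day it was loaded (hypothetical column)
        record["load_date"] = load_date.isoformat()
        records.append(record)
    return records


# In-memory data standing in for sheet.get_all_values()
values = [
    ["name", "amount"],
    ["alice", "10"],
    ["bob", "20"],
    ["carol", "30"],
]
records = new_rows_as_records(values, rows_already_loaded=2, load_date=date(2024, 1, 15))
print(records)
# [{'name': 'carol', 'amount': '30', 'load_date': '2024-01-15'}]
```

Wrapping the result in pd.DataFrame(records) lets you keep using load_table_from_dataframe; skip the load when the list is empty.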
Schedule with Cloud Scheduler
The cron expression 0 0 * * * runs the job daily at 12:00 AM UTC.
{
  "name": "transfer-data-daily",
  "schedule": "0 0 * * *",
  "timeZone": "Etc/UTC",
  "httpTarget": {
    "httpMethod": "POST",
    "uri": "https://us-central1-<project-id>.cloudfunctions.net/transfer-data"
  }
}
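Because the schedule is in UTC, the run may not fall at midnight where you are. As a quick sanity check, this standard-library sketch computes when a daily 00:00 UTC schedule would next fire for a given local time. The function name next_daily_midnight_utc is hypothetical, and it only models this one cron expression, not cron in general.

```python
from datetime import datetime, timedelta, timezone


def next_daily_midnight_utc(now):
    """Return the next 00:00 UTC strictly after `now` (the next run of "0 0 * * *")."""
    now_utc = now.astimezone(timezone.utc)
    midnight_today = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)


# 18:30 in a UTC-5 zone is 23:30 UTC, so the next run is half an hour away
local_now = datetime(2024, 1, 15, 18, 30, tzinfo=timezone(timedelta(hours=-5)))
next_run = next_daily_midnight_utc(local_now)
print(next_run.isoformat())
# 2024-01-16T00:00:00+00:00
```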