Hi @,
I need to insert/update data received from another SaaS-based application into GCP BigQuery. Could you suggest the most cost-effective, loosely coupled, and synchronous way to develop this? I also need to send the status of the insertion/update back to the SaaS-based application. Could you suggest the best approach to achieve this?
Thanks
Sure, here are some options for you to consider:
Google Cloud Pub/Sub:
Google Cloud Dataflow:
If you're looking for a cost-effective, loosely coupled, and relatively synchronous solution, combining Google Cloud Pub/Sub with Dataflow is likely your best bet. This combination allows for real-time data ingestion into BigQuery and can provide timely feedback to the SaaS application.
Hi @ms4446,
Thanks for the input. In both approaches, returning the status to the SaaS-based application is not real-time (synchronous).
Basically, the SaaS-based application invokes REST endpoints, and in turn these endpoints have to insert/update into BigQuery; as soon as the data is inserted into the BigQuery table, the REST endpoint has to return a 200/201 HTTP status to the SaaS app. But if I go with the above two approaches, I am not sure when the data will be available in the BigQuery table (basically, there will be a small delay).
If I go with the GCP BigQuery insert/update APIs, when it comes to updates, let's assume the table has around a million records; to update, it has to select the record based on some unique key and then update it. In this situation, will there be a higher charge involved, since this is egress to BigQuery?
You are right, the Pub/Sub and Dataflow approaches do not guarantee a real-time response. There might be a delay before the data is available in BigQuery, even after the REST endpoint returns a 200/201 status code.
If you require a truly synchronous response, you can opt for the GCP BigQuery insert/update APIs. However, it's essential to clarify that using the BigQuery API for updates within GCP does not incur "egress" charges. Instead, you'll be charged based on the amount of data processed during the update. For instance, updating a few records might have minimal costs, but updating millions could be more significant due to BigQuery's approach to updates (old records are marked as deleted, and new records are written).
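As a concrete illustration of the synchronous upsert path, one option is to run a single MERGE DML statement instead of a separate select-then-update, so BigQuery matches on the unique key and inserts or updates in one job. The sketch below only builds the MERGE statement as a string; the table and column names are placeholders, and actually executing it through the Java client (e.g., submitting it as a query job) is left out.

```java
// Hypothetical helper: builds a BigQuery MERGE statement that upserts
// staged rows into a target table by a unique key. Table and column
// names are illustrative placeholders, not part of any real schema.
public class MergeSqlBuilder {
    public static String buildMergeSql(String targetTable, String stageTable,
                                       String keyColumn, String... valueColumns) {
        StringBuilder setClause = new StringBuilder();
        StringBuilder insertCols = new StringBuilder(keyColumn);
        StringBuilder insertVals = new StringBuilder("S." + keyColumn);
        for (String col : valueColumns) {
            if (setClause.length() > 0) setClause.append(", ");
            setClause.append("T.").append(col).append(" = S.").append(col);
            insertCols.append(", ").append(col);
            insertVals.append(", S.").append(col);
        }
        return "MERGE `" + targetTable + "` T"
             + " USING `" + stageTable + "` S"
             + " ON T." + keyColumn + " = S." + keyColumn
             + " WHEN MATCHED THEN UPDATE SET " + setClause
             + " WHEN NOT MATCHED THEN INSERT (" + insertCols + ")"
             + " VALUES (" + insertVals + ")";
    }
}
```

Because MERGE is a DML statement, its cost is driven by the bytes scanned in the target and staging tables, which is exactly the pricing behavior described above.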
The best approach will depend on your specific requirements and budget:
If real-time feedback is essential and you're prepared for the associated costs, the BigQuery insert/update APIs are suitable. However, if you can accommodate a slight delay, Pub/Sub combined with Dataflow might be more cost-effective in the long run.
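If you take the synchronous API route, the REST endpoint still needs a clear rule for translating the write outcome into the 200/201 status the SaaS app expects. The sketch below is an assumption about how that mapping could look; `InsertOutcome` is a made-up type standing in for whatever success/error information your BigQuery client call returns, not an actual BigQuery client class.

```java
// Hypothetical sketch: map the outcome of a synchronous BigQuery write
// to the HTTP status the REST endpoint returns to the SaaS application.
public class InsertStatusMapper {
    // Stand-in for the result of a BigQuery insert/update call.
    public static final class InsertOutcome {
        final boolean succeeded;
        final boolean retryable;
        public InsertOutcome(boolean succeeded, boolean retryable) {
            this.succeeded = succeeded;
            this.retryable = retryable;
        }
    }

    // 201 when the row was written, 503 when the caller may retry,
    // 500 for a permanent failure.
    public static int toHttpStatus(InsertOutcome outcome) {
        if (outcome.succeeded) return 201;
        return outcome.retryable ? 503 : 500;
    }
}
```

Keeping this mapping in one place makes it easy to agree on a contract with the SaaS vendor about which statuses mean "safe to retry."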
Hey ms4446, thanks for the detailed answer. While implementing, I ran into another doubt:
Is it better, with respect to performance and cost, to avoid duplicates in the BigQuery table, or to retrieve the latest record from among the duplicates? Here we insert into a staging table before the data goes to the target table.
Again, this question concerns the Java insert/update APIs with BigQuery.
Whether it is better to avoid duplicates in a BigQuery table or retrieve the latest record from duplicates depends on your specific requirements and use case.
Avoiding Duplicates:
Getting the Latest Record:
Additional Considerations:
In conclusion, if performance, cost-efficiency, and data cleanliness are priorities, avoiding duplicates might be the preferred approach. However, if simplicity and ensuring access to the most recent data are more important, then designing the system to easily retrieve the latest record could be beneficial. Your final decision should align with your specific requirements and use case.
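If you go with the "latest record" route, one common pattern is to window over the business key in the staging table and keep only the newest row per key. The sketch below builds such a query as a string under assumed placeholder column names (the key and timestamp columns are illustrative); running it via the Java client is omitted.

```java
// Hypothetical sketch: a BigQuery query that keeps only the latest row
// per business key when duplicates exist in a staging table.
// Table and column names are placeholders.
public class LatestRecordQuery {
    public static String buildLatestPerKeySql(String table, String keyColumn,
                                              String orderColumn) {
        return "SELECT * FROM `" + table + "`"
             + " QUALIFY ROW_NUMBER() OVER ("
             + "PARTITION BY " + keyColumn
             + " ORDER BY " + orderColumn + " DESC) = 1";
    }
}
```

Note that this query scans all duplicate rows on every read, which is the performance/cost trade-off described above: dedup-on-read keeps writes simple but makes each read pay for the duplicates.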
Hi ms4446,
Do the Java insert/update/select APIs have limitations/quotas on BigQuery query API execution, e.g., on the number of query executions such as insertions, updates, or select queries? If yes, what are those limitations? Thanks
Hi @dharma ,
Yes, there are some limitations and quota restrictions associated with the Java insert/update/select APIs and BigQuery query API execution. These limitations are designed to protect the stability and performance of the BigQuery service.
Insert/Update/Select API Limitations:
BigQuery Query API Execution Limitations:
Quota Restrictions:
Tips for optimizing your queries and using the BigQuery API efficiently:
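One simple way to stay within per-request limits when inserting many rows is to split the payload into fixed-size batches and submit each batch as its own request. The helper below is a generic sketch; the batch size you pass in is your own choice and should be tuned against the documented quotas for your insert method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split rows into fixed-size batches so each
// insert request stays comfortably below per-request limits.
public class RowBatcher {
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            // Copy each sublist so batches remain valid if the source changes.
            batches.add(new ArrayList<>(
                rows.subList(i, Math.min(i + batchSize, rows.size()))));
        }
        return batches;
    }
}
```

Each batch can then be sent in its own insert call, optionally with a short pause or retry/backoff between calls to avoid tripping rate quotas.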