
Duplicate Rows Across Partitions in Google Ads Transfer to BigQuery

Hi,

I'm using the Google Ads Data Transfer Service to load data into a partitioned BigQuery table (p_ads_Ad_customer_id). According to the documentation, the transfer overwrites the partition for the same date to avoid duplicate rows, even when using a refresh window. However, I'm observing duplicate rows for the same ad_group_ad_ad_id and ad_group_id across multiple partitions.

Observations:

Partition behavior:

  • Rows are appended across multiple _PARTITIONTIME values.
  • Each duplicate row is identical, with no hidden differences in fields or metadata.

Transfer configuration:

  • Single Google Ads transfer at the Manager Account (MCC) level.
  • Scheduled daily at 01:00 UTC.
  • Refresh window: 30 days.

The data appears to re-export identical rows into new partitions during the refresh window.

Example:

For a given ad_group_ad_ad_id and ad_group_id, I see identical rows in _PARTITIONTIME values ranging from 2024-11-22 to 2024-11-28.

Questions:

  • Is this behavior expected for the Google Ads Data Transfer Service?
  • Should the service be deduplicating rows during the refresh window to avoid appending identical data?
  • Is there a recommended way to address this in BigQuery, or is this something that should be handled within the transfer configuration?


Thanks in advance for any insights or suggestions!


Hi BamChocolate,

Welcome to the Google Cloud Community!

The duplicate row issue you're encountering is likely not expected behavior, as Google Ads Data Transfer Service is designed to overwrite partitions for the same date to avoid duplicates. Based on the information you’ve provided, it seems that the service might not be handling deduplication as expected during the refresh window. This could be caused by overlapping transfer runs or how the refresh window is configured.

Here are some recommendations that may help resolve your issue:

  • You may check the transfer run history in the Google Cloud console to ensure there are no overlapping transfer runs that might be causing duplicates.
  • You can handle deduplication within BigQuery using SQL. For example, you can use a `GROUP BY` clause, `ROW_NUMBER()`, or `DISTINCT` to filter out duplicates based on your key fields (`ad_group_ad_ad_id`, `ad_group_id`).
  • Double-check your transfer configuration to ensure the refresh window and schedule are set up correctly.
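As a rough sketch of the `ROW_NUMBER()` approach, the query below keeps one row per key across the affected partitions, preferring the most recently loaded one. The table name comes from your post; the project/dataset placeholders and the date range are assumptions you would adjust:

```sql
-- Deduplicate across partitions: keep the row from the latest
-- _PARTITIONTIME for each (ad_group_ad_ad_id, ad_group_id) pair.
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY ad_group_ad_ad_id, ad_group_id
      ORDER BY _PARTITIONTIME DESC
    ) AS rn
  FROM `your_project.your_dataset.p_ads_Ad_customer_id`
  -- Limit the scan to the partitions where you observed duplicates.
  WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2024-11-22')
                           AND TIMESTAMP('2024-11-28')
)
WHERE rn = 1;
```

You could run this ad hoc to verify the duplicates, or wrap it in a view or scheduled query that writes a deduplicated table for downstream use.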

If you continue to run into issues, consider reaching out to Google Cloud Support to further check underlying issues. When you contact them, be sure to provide as much detail as possible and include screenshots. This will help them understand your problem better and get it sorted out more quickly.

I hope the above information is helpful.