
Understanding the Root Cause of Unexpected System Lag in Dataflow Jobs on GCP

I am reaching out for your insights and advice on an issue we recently encountered with our Dataflow jobs running on Google Cloud Platform (GCP).

From March 30th to April 14th, we saw a noticeable increase in system lag across all our Dataflow jobs. This was surprising because the jobs were deployed in February and have not been modified since. We also observed no substantial change in the volume of incoming data during this period.

This unexpected system lag has raised concerns about its potential impact on our data going forward, so we are keen to investigate the issue and understand its root cause.

As a starting point, I would appreciate it if you could guide me on how best to approach this issue. In particular:

What strategies or tools should I consider using to debug this issue? Is there a way to view or analyze the system performance during the aforementioned dates?

Were there any known issues with GCP during this time frame that could have affected the performance of Dataflow jobs?

Any insights or pointers that can help us mitigate such issues in the future would be greatly appreciated.

 

Attachments: image-2023-04-05-15-59-02-946.png, image-2023-04-05-16-01-10-943.png, Screenshot 2023-04-14 at 16.36.48.png
