We are seeing execution times increase by 3x to 4x after upgrading from Apache Beam 2.44 to Apache Beam 2.50. This happens both in local development (Direct Runner) and on Dataflow on GCP. We are using Dataflow with the Java SDK.
The impact is larger with the Direct Runner, though (about 3x longer execution time after moving to Apache Beam 2.50).
I can only see the GKE label, not a Dataflow one (this seems to be a bug), even though this question is about Dataflow.
This is possibly due to latency in calls to your API. It also looks like there is a known issue, currently in progress, affecting versions 2.47 and above; see here.
Mitigation
Until Beam 2.51.0 is released, consider any of the following workarounds:
Use apache-beam==2.46.0 or below.
Please see the link above for the other workarounds listed, and the 2.51 release milestone here to stay updated. Alternatively, you can file a support case with Google Support for further investigation: https://cloud.google.com/contact
@nceniza FYI ...
1. The memory leak is related to the Python SDK. We are using the Java SDK.
2. No third-party API is involved in fetching data; the Dataflow job reads a BQ table (an input source with 19 records) in both cases, 2.44 and 2.50. In summary, the Dataflow job that runs on 2.44 and on 2.50 is exactly the same in all respects, including the input source and the output destination.