Hi, I'm just getting up and running with Dataflow and have been hitting some hiccups along the way. I have a cloud function that uses the Python `dataflow_v1beta3.FlexTemplatesServiceClient` to launch a flex template. I can launch this template successfully from the CLI using my own credentials; however, when my cloud function launches the template, the launch call succeeds but the job fails with this logged:
Failed to read the result file : gs://dataflow-staging-northamerica-northeast2-31664930760/staging/template_launches/2024-02-13_13_48_32-17616911554794094215/operation_result with error message: (a91b614d0c0827f5): Unable to open template file: gs://dataflow-staging-northamerica-northeast2-31664930760/staging/template_launches/2024-02-13_13_48_32-17616911554794094215/operation_result..
Indeed, there is no file at this path, but I'm not sure what the issue would be.
Some context on how things are set up:
python apache beam pipeline code:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    class MyPipelineOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            parser.add_argument(
                "--input-file",
                required=True,
                help="Input csv file to process.",
            )
            parser.add_argument(
                "--output-table",
                required=True,
                help="Output table to write results to.",
            )

    options = MyPipelineOptions()
    with beam.Pipeline(options=options) as p:
        ...  # transforms elided
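One thing worth sanity-checking about the options above: argparse (which Beam's `PipelineOptions` uses under the hood) converts hyphenated flag names to underscored attribute names, while the flex-template parameter names keep their hyphens. A minimal sketch using plain argparse (no Beam required; the `gs://` path and table name are placeholder values):

import argparse

# Mirror of the two Beam options above, to show how flag names map
# to attribute names: "--input-file" becomes args.input_file.
parser = argparse.ArgumentParser()
parser.add_argument("--input-file", required=True)
parser.add_argument("--output-table", required=True)
args = parser.parse_args(
    ["--input-file", "gs://bucket/data.parquet",
     "--output-table", "dataset.table"]
)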
template manifest.json:
{
  "name": "Airbyte GCS Raw Dump to BigQuery Template",
  "description": "Takes raw data extracted to parquet format from Airbyte, transforms and loads it into BigQuery",
  "parameters": [
    {
      "name": "input-file",
      "label": "Input file",
      "helpText": "The path to the parquet file in GCS",
      "regexes": ["^gs:\\/\\/[^\\n\\r]+$"]
    },
    {
      "name": "output-table",
      "label": "Output table",
      "helpText": "The name of the table to create in BigQuery",
      "regexes": ["^[A-Za-z0-9_:.]+$"]
    }
  ]
}
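Since the manifest's regexes reject non-matching parameter values, it can help to validate the values locally before launching. A minimal sketch (the patterns are copied from the manifest above; the sample values are hypothetical):

import re

# Regexes from manifest.json, decoded from their JSON escaping.
MANIFEST_PARAMS = {
    "input-file": r"^gs://[^\n\r]+$",
    "output-table": r"^[A-Za-z0-9_:.]+$",
}

def validate_parameters(parameters):
    """Return (name, value) pairs that fail their manifest regex."""
    failures = []
    for name, pattern in MANIFEST_PARAMS.items():
        value = parameters.get(name, "")
        if not re.fullmatch(pattern, value):
            failures.append((name, value))
    return failures

# Values that satisfy both patterns:
ok = validate_parameters({
    "input-file": "gs://my-bucket/raw/dump.parquet",
    "output-table": "dataset.raw_table",
})

One caveat worth noticing: `^[A-Za-z0-9_:.]+$` does not allow hyphens, so a fully qualified table like `my-project:dataset.table` (hyphenated project IDs are common) would fail validation — check whether that regex should include `-`.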
cloud function call to trigger the job:
from google.cloud import dataflow_v1beta3

client = dataflow_v1beta3.FlexTemplatesServiceClient()

parameters = {
    "input-file": f"gs://{bucket_name}/{file_name}",
    "output-table": output_table,
}
launch_parameter = dataflow_v1beta3.LaunchFlexTemplateParameter(
    job_name=job_name,
    container_spec_gcs_path=container_spec_gcs_path,  # where manifest.json is located in GCS
    parameters=parameters,
    environment={
        "service_account_email": dataflow_service_account,
        "temp_location": temp_location,
    },
)
request = dataflow_v1beta3.LaunchFlexTemplateRequest(
    project_id=project_name,
    location=region,
    launch_parameter=launch_parameter,
)
response = client.launch_flex_template(request=request)  # succeeds, but then the job fails
This error signifies a problem encountered by Google Cloud Dataflow when attempting to generate or access a crucial result file in Google Cloud Storage (GCS). That file records the execution outcome of a Flex Template launch, so its absence usually points to a problem earlier in the launch itself. The error can stem from various issues, including but not limited to:

GCS permissions: the service account running the job lacks read/write access to the staging bucket where the result file is written.
Template spec problems: `container_spec_gcs_path` does not accurately point to a valid, correctly formatted template file. Note that for a Flex Template this path normally points to the spec file produced by `gcloud dataflow flex-template build`, which references the container image in addition to the metadata; a metadata-only manifest.json at that path can produce exactly this kind of launch failure.
Parameter mismatches: ensuring the values you pass (here `input-file` and `output-table`) match the expected data types and regex constraints defined in your Flex Template is crucial.

Troubleshooting Steps

Verify GCS Permissions: confirm the service account passed as `service_account_email` has read and write access to the staging bucket (for example, `roles/storage.objectAdmin` on that bucket).
Check File Paths: double-check `container_spec_gcs_path` at the start, ensuring it points to the intended file in GCS and that the file exists.
Validate Template File: review `manifest.json` and ensure that the parameters and their data types, along with any regex constraints, align with the data you're providing.
Enable Robust Diagnostics: inspect Cloud Logging for the launcher and worker logs, where an earlier, more specific error often precedes this one.
Check Job Resources: verify the project has quota in the region and can provision the requested workers.
Retry with Monitoring: relaunch the job and watch it in the Dataflow console and Cloud Logging as it starts, so you can catch the first failure rather than this downstream symptom.
Contact Google Cloud Support: if the failure persists after the above, open a support case with the job ID.
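For the monitoring step, it helps to log a direct link to the job right after launching. The launch response's `job` field carries `id`, `project_id`, and `location`; a small helper to build the console URL from those values (the URL format below matches the Cloud console's job pages, but verify it for your environment):

def dataflow_console_url(project_id, region, job_id):
    """Build a Cloud console link to a Dataflow job for log messages."""
    return ("https://console.cloud.google.com/dataflow/jobs/"
            f"{region}/{job_id}?project={project_id}")

# In the cloud function, after launching:
#   job = response.job
#   print(dataflow_console_url(job.project_id, job.location, job.id))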