Hi - Was wondering if anyone could offer some help/guidance on an issue I'm hitting. I'm new to GCP and Dataflow. I'm trying to run a Dataflow job for the HL7v2-to-FHIR pipeline from the healthcare data harmonization repo (https://github.com/GoogleCloudPlatform/healthcare-data-harmonization-dataflow#how-to-run).
The required infra is set up, the JAR for the pipeline now builds fine, and running the pipeline as documented (with a subnet added) initially appears fine: the Dataflow jobs page shows a running "streaming" job. However, it never reads or processes data from my Pub/Sub subscription. When I interrogate Logs Explorer I see the error below (which doesn't show on the job page).
Any help/pointers appreciated.
Regards
{
  "insertId": "s=abfde3644eb5496f900e02c8eb92d71f;i=1b40;b=7cc8066fe1f14c9db038e06c130dccbb;m=6d30b1bc4;t=5eeb2c42b8840;x=9d0faff6d91c33e6",
  "jsonPayload": {
    "message": "\"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"java-streaming\\\" with ImagePullBackOff: \\\"Back-off pulling image \\\\\\\"gcr.io/cloud-dataflow/v1beta3/beam-java17-streaming:2.35.0\\\\\\\"\\\"\" pod=\"default/df-hl7v2tofhirstreamingrunne-11300031-q89m-harness-53dh\" podUID=6016f04e50d7b3cf9be7c8c30b439480",
    "line": "pod_workers.go:918",
    "thread": "775"
  },
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "project_id": "ndr-discovery",
      "region": "europe-west2",
      "job_id": "2022-11-30_00_31_32-12431633722710717950",
      "job_name": "hl7v2tofhirstreamingrunner-gigdhc0cl0eu125435-1130083122-2b0132c9",
      "step_id": ""
    }
  },
  "timestamp": "2022-11-30T16:40:46.329880Z",
  "severity": "ERROR",
  "labels": {
    "compute.googleapis.com/resource_name": "hl7v2tofhirstreamingrunne-11300031-q89m-harness-53dh",
    "dataflow.googleapis.com/log_type": "system",
    "dataflow.googleapis.com/region": "europe-west2",
    "compute.googleapis.com/resource_type": "instance",
    "compute.googleapis.com/resource_id": "7444062325900487659",
    "dataflow.googleapis.com/job_id": "2022-11-30_00_31_32-12431633722710717950",
    "dataflow.googleapis.com/job_name": "hl7v2tofhirstreamingrunner-gigdhc0cl0eu125435-1130083122-2b0132c9"
  },
  "logName": "projects/ndr-discovery/logs/dataflow.googleapis.com%2Fkubelet",
  "receiveTimestamp": "2022-11-30T16:40:50.934513594Z"
}
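For reference, worker-level errors like this one don't always surface on the job page; a rough sketch of pulling them with the gcloud CLI instead of the Logs Explorer UI (assumes gcloud is installed and authenticated; the project and job IDs are taken from the log entry above, and the filter fields are standard Cloud Logging fields):

```shell
# Read recent ERROR-level Dataflow worker logs for the stuck job.
# resource.type/labels match the "resource" block in the log entry above.
gcloud logging read \
  'resource.type="dataflow_step"
   AND resource.labels.job_id="2022-11-30_00_31_32-12431633722710717950"
   AND severity>=ERROR' \
  --project=ndr-discovery \
  --limit=10 \
  --format=json
```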
Sorry Raj, only just seen this. It was a JDK issue if I recall: I was building with too new a version of the JDK, and that caused an issue when starting the 2.35.0 Dataflow image. I had received a separate response, quoted below:
I understand that you have an issue where the pipeline looks like it's running but not writing to storage. Let me know if I misunderstood.
After investigating your issue, I found that the error you are getting is caused by building with version 17 of the Java Development Kit. A possible solution is to change the Java version.
To change the Java version, run `sudo update-alternatives --config java`, then select the JDK 11 entry by entering its number (in my case I entered '1') and pressing Enter.
To change the Java compiler version, run `sudo update-alternatives --config javac`. Again, select the JDK 11 entry by entering its number (in my case '1', as before) and press Enter.
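In case it helps anyone else, the steps above plus a quick verification look roughly like this (the selection number shown by `update-alternatives` is machine-specific, and this assumes a Debian/Ubuntu-style system where `update-alternatives` manages the JDK):

```shell
# Switch the default java and javac to JDK 11 (interactive menus;
# the number to enter depends on your installed JDKs).
sudo update-alternatives --config java   # select the JDK 11 entry
sudo update-alternatives --config javac  # select the JDK 11 entry

# Confirm both now report 11.x before rebuilding the pipeline JAR,
# otherwise the rebuilt JAR will still target the newer JDK.
java -version
javac -version
```

After switching, rebuild the pipeline JAR so it is compiled against JDK 11 before resubmitting the Dataflow job.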