
Issue running healthcare data harmonization pipeline


Hi - Was wondering if anyone could offer some help/guidance on an issue I'm hitting. I'm new to GCP and Dataflow. I'm trying to run a Dataflow job for the HL7v2-to-FHIR pipeline from the healthcare data harmonization repo (https://github.com/GoogleCloudPlatform/healthcare-data-harmonization-dataflow#how-to-run).

The required infra is set up, the jar for the pipeline now builds fine, and running the pipeline as documented (with a subnet added) initially works: the Dataflow jobs page shows a running streaming job. However, it never reads or processes any data from my Pub/Sub subscription. When I interrogate Logs Explorer I see the error below (which doesn't show on the job page).
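For reference, here's roughly how to pull the same error from the CLI instead of the Logs Explorer UI (the project and job ID are from my run; substitute your own):

    gcloud logging read \
      'resource.type="dataflow_step" AND resource.labels.job_id="2022-11-30_00_31_32-12431633722710717950" AND severity>=ERROR' \
      --project=ndr-discovery --limit=10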

Any help/pointers appreciated.

Regards 


{
  "insertId": "s=abfde3644eb5496f900e02c8eb92d71f;i=1b40;b=7cc8066fe1f14c9db038e06c130dccbb;m=6d30b1bc4;t=5eeb2c42b8840;x=9d0faff6d91c33e6",
  "jsonPayload": {
    "message": "\"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"java-streaming\\\" with ImagePullBackOff: \\\"Back-off pulling image \\\\\\\"gcr.io/cloud-dataflow/v1beta3/beam-java17-streaming:2.35.0\\\\\\\"\\\"\" pod=\"default/df-hl7v2tofhirstreamingrunne-11300031-q89m-harness-53dh\" podUID=6016f04e50d7b3cf9be7c8c30b439480",
    "line": "pod_workers.go:918",
    "thread": "775"
  },
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "project_id": "ndr-discovery",
      "region": "europe-west2",
      "job_id": "2022-11-30_00_31_32-12431633722710717950",
      "job_name": "hl7v2tofhirstreamingrunner-gigdhc0cl0eu125435-1130083122-2b0132c9",
      "step_id": ""
    }
  },
  "timestamp": "2022-11-30T16:40:46.329880Z",
  "severity": "ERROR",
  "labels": {
    "compute.googleapis.com/resource_name": "hl7v2tofhirstreamingrunne-11300031-q89m-harness-53dh",
    "dataflow.googleapis.com/log_type": "system",
    "dataflow.googleapis.com/region": "europe-west2",
    "compute.googleapis.com/resource_type": "instance",
    "compute.googleapis.com/resource_id": "7444062325900487659",
    "dataflow.googleapis.com/job_id": "2022-11-30_00_31_32-12431633722710717950",
    "dataflow.googleapis.com/job_name": "hl7v2tofhirstreamingrunner-gigdhc0cl0eu125435-1130083122-2b0132c9"
  },
  "logName": "projects/ndr-discovery/logs/dataflow.googleapis.com%2Fkubelet",
  "receiveTimestamp": "2022-11-30T16:40:50.934513594Z"
}


ACCEPTED SOLUTION

Sorry Raj, only just seen this. It was a JDK issue if I recall: I was building with too new a version of the JDK, which broke startup of the 2.35.0 Dataflow worker image. I had a separate response, copied below:

I understand that you have an issue where the pipeline looks like it's running but is not writing anything to storage. Let me know if I've misunderstood.

After investigating your issue, I found that the error you are getting is caused by building with the Java Development Kit on version 17. With a JDK 17 build, the job requests the beam-java17-streaming:2.35.0 harness image (visible in the ImagePullBackOff error above), which doesn't appear to exist for that SDK version. A possible solution is to switch to JDK 11.

To change the Java version, run "sudo update-alternatives --config java", then select the JDK 11 entry by typing its number (in my case '1') and pressing Enter.
To change the Java compiler version, run "sudo update-alternatives --config javac" and again select the JDK 11 entry in the same way. The full sequence is sketched below.
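Putting that together, the fix on the build machine looked roughly like this (the rebuild step assumes the repo's Gradle shadowJar build; use whatever command you originally built the jar with):

    sudo update-alternatives --config java    # pick the JDK 11 entry from the menu
    sudo update-alternatives --config javac   # likewise for the compiler
    java -version                             # should now report 11.x
    ./gradlew clean shadowJar                 # rebuild the pipeline jar under JDK 11 (assumed Gradle build)

After rebuilding with JDK 11 and resubmitting, the job should request a worker harness image that actually exists for SDK 2.35.0.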

