
Error in Dataflow MongoDB to BigQuery batch template

I got this error when running a Dataflow job using the MongoDB to BigQuery batch template:

"Error message from worker: java.lang.IllegalArgumentException: Unable to encode element 'Document{{_id=1234567890, created_by=org.bson.BsonUndefined@0, created_date=Tue Dec 10 10:20:30 UTC 2010, last_modified_date=Tue Dec 10 12:50:30 UTC 2010, is_deleted=false}}' with coder 'SerializableCoder(org.bson.Document)'.

org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:300)
org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:642)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:558)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:384)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:128)
org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:67)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:218)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:169)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:319)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:291)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:221)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:147)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:127)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
org.apache.beam.sdk.util.UnboundedScheduledExecutorService$ScheduledFutureTask.run(UnboundedScheduledExecutorService.java:163)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:834)

Caused by: java.io.NotSerializableException: org.bson.BsonUndefined

java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
java.base/java.util.LinkedHashMap.internalWriteEntries(LinkedHashMap.java:333)
java.base/java.util.HashMap.writeObject(HashMap.java:1411)
java.base/jdk.internal.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.base/java.lang.reflect.Method.invoke(Method.java:566)
java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1145)
java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497)
java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
org.apache.beam.sdk.coders.SerializableCoder.encode(SerializableCoder.java:192)
org.apache.beam.sdk.coders.SerializableCoder.encode(SerializableCoder.java:57)
org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:297)
... 21 more


What could be the cause?


The error is caused by the fact that org.bson.BsonUndefined does not implement java.io.Serializable. The template encodes every org.bson.Document it reads with Beam's SerializableCoder, which uses plain Java serialization, so any document containing a field stored as BSON undefined (here, created_by) fails as soon as Dataflow tries to encode it to move it between workers.
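
You can reproduce the failure outside Dataflow with nothing but the BSON library, which makes the cause easy to see. A minimal sketch, assuming a bson artifact on the classpath comparable to the one the template uses:

Code snippet

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

import org.bson.BsonUndefined;
import org.bson.Document;

public class ReproduceNotSerializable {
    public static void main(String[] args) throws Exception {
        // A document shaped like the one in the error message: created_by was
        // stored as BSON undefined, so the driver decodes it as BsonUndefined.
        Document doc = new Document("_id", 1234567890L)
                .append("created_by", new BsonUndefined())
                .append("is_deleted", false);

        // SerializableCoder uses plain Java serialization; this write throws
        // java.io.NotSerializableException: org.bson.BsonUndefined.
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(doc);
        }
    }
}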

To fix this error, you can either:

  • Clean up the source collection before running the job, for example by unsetting every field whose value is stored as BSON undefined (see the sketch after this list).
  • Remove or convert the undefined values inside the pipeline itself, which means building a customized copy of the template rather than running the stock one.
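
The cleanup can be done with a single updateMany against the collection. A sketch using the MongoDB Java driver; the connection string, database, collection, and field names are placeholders you would replace with your own:

Code snippet

import static com.mongodb.client.model.Filters.type;
import static com.mongodb.client.model.Updates.unset;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

import org.bson.BsonType;
import org.bson.Document;

public class UnsetUndefinedFields {
    public static void main(String[] args) {
        // Placeholder connection string and names; replace with your own.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> collection =
                    client.getDatabase("mydb").getCollection("mycollection");

            // $unset created_by wherever it is stored as BSON undefined (type 6),
            // so the template never reads a BsonUndefined value.
            collection.updateMany(
                    type("created_by", BsonType.UNDEFINED),
                    unset("created_by"));
        }
    }
}

The equivalent update can also be run once from mongosh before launching the job.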

Note that org.bson.BsonUndefined ships with the BSON library, so you cannot simply change the class to be serializable; the undefined values have to be removed or converted before the coder sees them. BSON undefined is a deprecated type anyway, so cleaning it out of the collection is usually the better long-term fix.

If you cannot modify the source collection, you can instead handle the undefined values inside the pipeline before they reach the coder. The stock template does not expose an option for this, so it means rebuilding the template (or writing your own pipeline around Beam's MongoDbIO connector).

For example, the following snippet removes every top-level field whose value was decoded as org.bson.BsonUndefined from a single document (removing one known field with document.remove("created_by") also works, but it will not catch other undefined fields, and for nested documents you would need to recurse into embedded Document values):

Code snippet
 
document.entrySet().removeIf(entry -> entry.getValue() instanceof org.bson.BsonUndefined);
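
One caveat: in your stack trace the exception is thrown while Dataflow measures the byte size of the read step's output, i.e. before any downstream cleanup step would run. If you are assembling the pipeline yourself, a more robust option is to give the Document PCollection a coder that does not rely on Java serialization at all. A sketch of such a coder, round-tripping each document through MongoDB extended JSON (which has a representation for undefined values):

Code snippet

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.beam.sdk.coders.CustomCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.bson.Document;

// Encodes a Document as extended JSON instead of Java-serializing it.
// Extended JSON represents undefined as {"$undefined": true}, so documents
// containing BsonUndefined values round-trip without error.
public class DocumentJsonCoder extends CustomCoder<Document> {
    @Override
    public void encode(Document value, OutputStream outStream) throws IOException {
        StringUtf8Coder.of().encode(value.toJson(), outStream);
    }

    @Override
    public Document decode(InputStream inStream) throws IOException {
        return Document.parse(StringUtf8Coder.of().decode(inStream));
    }
}

You would then call setCoder(new DocumentJsonCoder()) on the PCollection<Document> returned by the read step.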

Once the undefined values are removed (or handled by a custom coder), the job should run without this error.