I'm testing Dataflow with 1 MongoDB document. It moves through the different steps of the pipeline. In the UI I can see, for a specific step, how many elements were added at the input (1 in this case) and how many were at the output (again, 1). But that is not happening when I use beam.Flatten. I would expect it to return that 1 document. The step after the flatten (Export Results), which is a ParDo, shows 0 elements at the input.
I'm using Python 3.8 and SDK 2.37.0.
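The issue you're encountering might be due to how beam.Flatten is being applied. This transform takes multiple PCollections of the same type, passed as a tuple or list, and merges them into one PCollection. It does not drop elements: every element of every input should appear in the output, so a single MongoDB document that reaches one of the inputs should also come out of beam.Flatten. A count of 0 at the next step therefore points at how the inputs are wired into beam.Flatten (for example, piping a single PCollection into it instead of a tuple of PCollections), or at how the UI reports that step, not at beam.Flatten having nothing to do.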
This is a common misunderstanding of beam.Flatten. It is used to merge multiple PCollection objects into one, not to flatten the structure of a single object within a PCollection. If you're trying to transform or "flatten" the structure of an individual MongoDB document, you need a per-element transform instead, such as beam.Map or beam.FlatMap with your own function.
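If the goal is to flatten the document itself, one possible approach is a plain function applied to each element. The sketch below is an illustration only: `flatten_document`, its dotted-key naming scheme, and the sample document are assumptions, not something Beam or MongoDB prescribes.

```python
from typing import Any, Dict


def flatten_document(doc: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-documents.
            flat.update(flatten_document(value, full_key, sep))
        else:
            # Scalars, lists, and empty dicts are kept as they are.
            flat[full_key] = value
    return flat


# In a pipeline this could be applied per element, for example:
#   flat_docs = docs | "FlattenDoc" >> beam.Map(flatten_document)
# where `docs` is a hypothetical PCollection of document dicts.
example = {"_id": 1, "address": {"city": "Oslo", "geo": {"lat": 59.9}}}
flat_example = flatten_document(example)
```

beam.Map fits here because each document yields exactly one flattened document; beam.FlatMap would be the choice if each document should instead yield several output elements, such as one per item in a list field.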
That being said, without seeing the specifics of your code, it's hard to say definitively what the issue could be. If you could provide more details about your pipeline, especially the part involving beam.Flatten, it would be much easier to provide a more accurate answer.