It looks like Data Fusion requires files read from cloud storage to be newline-delimited, period.
But I have a customer that is sending a file that is a single JSON object with a nested array containing the actual records, which has many newline characters therein.
I have tried different things to get this to work. The docs for the cloud storage data source say that if 'blob' format is used with a 'bytes' output format, it will treat the entire file as a single record. I wish this were true, because then I could use the JavaScript transform to manually break apart the records.
Ive also tried using the 'delimited' format with a junk delimiter value, to try to force it to treat the whole file as 1 record.
The only workaround I have left is to create a 1-off cloud function that extracts the records and writes a file that is properly de-nested and newline-delimited, the way Data Fusion apparently demands it. I find this to be a less than elegant approach.
Please tell me I am missing something and there is an easy DF solution.
Thanks 🙏