Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Data Fusion JSON Files

It looks like Data Fusion requires files read from cloud storage to be newline-delimited, period. 

 

But I have a customer that is sending a file that is a single JSON object with a nested array containing the actual records, which has many newline characters therein. 

 

I have tried different things to get this to work. The docs for the cloud storage data source say that if 'blob' format is used with a 'bytes' output format, it will treat the entire file as a single record. I wish this were true, because then I could use the JavaScript transform to manually break apart the records.

 

Ive also tried using the 'delimited' format with a junk delimiter value, to try to force it to treat the whole file as 1 record.

 

The only workaround I have left is to create a 1-off cloud function that extracts the records and writes a file that is properly de-nested and newline-delimited, the way Data Fusion apparently demands it. I find this to be a less than elegant approach.

 

Please tell me I am missing something and there is an easy DF solution. 

 

Thanks 🙏

1 1 1,854
1 REPLY 1