Hello, I'm new to Dataflow. I've created a pipeline that transfers data from a MongoDB collection to a BigQuery table using the MongoDB to BigQuery template. However, I've run into an issue: fields holding a foreign key that references another collection in the MongoDB database (of type ObjectId) are missing from the BigQuery table. Can you please explain why this is happening and suggest a fix?
UPD: The Dataflow MongoDB to BigQuery template doesn't parse fields of type ObjectId. I fixed it by adding a UDF that extracts the ObjectId value from the serialized document. Here is the function:
```
function extractCompanyId(obj) {
  // ObjectId values arrive in extended JSON form: {"$oid": "<hex string>"}
  if (obj && obj.company && typeof obj.company.$oid !== 'undefined') {
    return obj.company.$oid;
  } else {
    return null;
  }
}

function processVehicles(inJson) {
  // Parse the input JSON string
  var obj = JSON.parse(inJson);
  // Extract company ID if present
  obj.company_id = extractCompanyId(obj);
  // Convert the modified JSON object back to string
  return JSON.stringify(obj);
}
```
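For completeness, this is roughly how such a UDF gets wired into the template. The `javascriptDocumentTransformGcsPath` and `javascriptDocumentTransformFunctionName` parameter names are as I recall them from the template documentation, and every UPPER_CASE value is a placeholder, so verify against the current docs before running:

```shell
# Sketch: stage the UDF in Cloud Storage, then launch the MongoDB to BigQuery
# flex template pointing at it. All UPPER_CASE values are placeholders.
gsutil cp transform.js gs://MY_BUCKET/udf/transform.js

gcloud dataflow flex-template run "mongodb-to-bigquery-vehicles" \
  --region=REGION \
  --template-file-gcs-location=gs://dataflow-templates-REGION/latest/flex/MongoDB_to_BigQuery \
  --parameters=mongoDbUri=MONGO_URI,database=MY_DB,collection=vehicles,outputTableSpec=PROJECT:DATASET.vehicles,userOption=NONE,javascriptDocumentTransformGcsPath=gs://MY_BUCKET/udf/transform.js,javascriptDocumentTransformFunctionName=processVehicles
```

The function named in `javascriptDocumentTransformFunctionName` must be the entry point (`processVehicles` here); helper functions like `extractCompanyId` just need to live in the same file.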
By the way, I don't create a schema for the BigQuery table, though I don't think that makes a difference here...
I have the data in MongoDB in the following format:
```
{
  "_id": "661405f6d08f928ee5c1923c",
  "company": "65e73e48019b0576d763c021",
  "color": "black",
  "deleted": false,
  "createdAt": "2024-04-08T14:57:58.965Z",
  "updatedAt": "2024-04-09T08:29:44.743Z"
}
```
Where company is a foreign key.
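Note that the plain strings above are how the shell displays the document; when the template serializes a document for the UDF, ObjectId fields come through in MongoDB extended JSON form, which is why the UDF checks `obj.company.$oid`. A small sketch (the raw string is a hand-written example of that form):

```javascript
// Hand-written example of a document in extended JSON: ObjectId fields are
// serialized as {"$oid": "<hex>"} rather than as plain strings.
var raw = '{"_id": {"$oid": "661405f6d08f928ee5c1923c"},' +
          ' "company": {"$oid": "65e73e48019b0576d763c021"}, "color": "black"}';

var doc = JSON.parse(raw);

// "company" is a nested object, not a string, so a writer expecting a scalar
// value drops it; a UDF can pull the hex string out of $oid instead.
console.log(typeof doc.company);   // "object"
console.log(doc.company.$oid);     // "65e73e48019b0576d763c021"
```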
In BigQuery I get it like this:
```
```