Hi,
I am loading data into a BigQuery table as follows:
async loadDataToTable(project_id, dataset_id, table_id, uris, source_format, schema) {
  const result = new CustomResponse().result;
  try {
    const options = {
      sourceUris: uris,
      sourceFormat: source_format,
      schema: schema,
      location: this.bigquery_location,
      ignoreUnknownValues: true,
      destinationTable: {
        datasetId: dataset_id,
        projectId: project_id,
        tableId: table_id,
      },
    };
    const [job] = await this.connection.createJob({ configuration: { load: options } });
    logInfo(
      `Job ${job.id} started ${job.metadata.status.state} for dataset ${dataset_id} for table ${table_id}`,
      this.pipe_id,
      Events.MONGODB_QUERY,
    );
    // Wait for job to complete
    const [metadata] = await job.promise();
    const status = metadata.status;
    if (status.state === "DONE") {
      logInfo(
        `Job ${job.id} done for dataset ${dataset_id} for table ${table_id}`,
        this.pipe_id,
        Events.MONGODB_QUERY,
      );
      result.data = {
        rows_processed:
          metadata.statistics && metadata.statistics.load && metadata.statistics.load.outputRows
            ? Number(metadata.statistics.load.outputRows)
            : 0,
      };
    } else {
      logInfo(
        `Job current status: ${status.state} for dataset ${dataset_id} for table ${table_id}`,
        this.pipe_id,
        Events.MONGODB_QUERY,
      );
      if (status.errors) {
        result.error_occurred = true;
        result.data = { rows_processed: 0 };
        result.error_info = {
          error: JSON.stringify(status.errors),
          error_source: "BigqueryService.loadDataToTable",
          error_description: `Error occurred while doing load job at BigQuery table ${table_id} in dataset ${dataset_id} for pipe ID: ${this.pipe_id}`,
        };
        return result;
      }
    }
  } catch (e) {
    result.error_occurred = true;
    result.data = { rows_processed: 0 };
    result.error_info = {
      error: e.errors ? JSON.stringify(e.errors) : `${e.name}: ${e.message}`,
      error_stack: e.stack,
      error_source: "BigqueryService.loadDataToTable",
      error_description: `Error occurred while loading data into BigQuery table ${table_id} in dataset ${dataset_id} for pipe ID: ${this.pipe_id}`,
    };
  }
  return result;
}
The problem I am facing: suppose I have 10 URIs to load, and 2 or 3 of them have problems that cause the whole load job to fail. I want to find out which URIs are causing the issue.
I have tried getting the metadata of the job, but it does not contain the URIs.
I would really appreciate it if someone could tell me how to do this.
To effectively pinpoint and resolve issues causing BigQuery load job failures, a detailed and strategic approach is crucial. Below is a strategy that integrates best practices and addresses potential challenges:

Utilize Detailed Audit Logs
Query Cloud Logging for the failed job with filters such as:
protoPayload.resourceName="projects/PROJECT_ID/jobs/JOB_ID"
protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob"

Enhance Error Handling in Your Code
Log the full list of source URIs alongside any error captured in your loadDataToTable function. This context is crucial, even if BigQuery error objects don't include the problematic URI directly.

Pre-Validation as Prevention
Validate your files (format, schema, encoding) before submitting the load job so that malformed files never reach BigQuery.

Address Common Errors Strategically
Ensure that the source_format and provided schema accurately represent your data.

Iterative Debugging and Resources
If the job still fails, load the URIs individually or in small batches; the jobs that fail identify the problematic files. Consult BigQuery's load-job documentation and its example error messages for guidance.
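One practical way to apply the iterative approach above: submit one load job per URI and record which ones fail. A minimal sketch, where `loadOne(uri)` is an assumed stand-in for your existing `createJob` call with a single-element `sourceUris` array:

```javascript
// Hypothetical helper: run one load per URI and collect the failures.
// `loadOne(uri)` is an assumed async function that wraps your existing
// createJob call with sourceUris: [uri] and rejects if that job fails.
async function findBadUris(uris, loadOne) {
  const failed = [];
  for (const uri of uris) {
    try {
      await loadOne(uri); // one BigQuery load job per URI
    } catch (e) {
      failed.push({ uri, error: e.message });
    }
  }
  return failed; // list of { uri, error } for the URIs that failed
}
```

Running the jobs sequentially keeps the sketch simple; in practice you could fire them concurrently with Promise.allSettled and map the results back to their URIs.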
Hi @ms4446, thank you for your quick response.
The error I am getting is this:
ApiError: Invalid timestamp value: -62164498757000000
at new ApiError (/Users/john/work/t1/load/node_modules/@google-cloud/common/build/src/util.js:75:15)
at /Users/john/work/t1/load/node_modules/@google-cloud/bigquery/build/src/job.js:417:23
at /Users/john/work/t1/load/node_modules/@google-cloud/common/build/src/service-object.js:204:13
at /Users/john/work/t1/load/node_modules/@google-cloud/common/build/src/util.js:412:25
at Util.handleResp (/Users/john/work/t1/load/node_modules/@google-cloud/common/build/src/util.js:161:9)
at /Users/john/work/t1/load/node_modules/@google-cloud/common/build/src/util.js:534:22
at onResponse (/Users/john/work/t1/load/node_modules/retry-request/index.js:248:7)
at /Users/john/work/t1/load/node_modules/teeny-request/build/src/index.js:226:13
at processTicksAndRejections (node:internal/process/task_queues:96:5) {
code: undefined,
errors: [
{
reason: 'invalidQuery',
location: 'query',
message: 'Invalid timestamp value: -62164498757000000'
}
],
response: undefined
}
Where should I look to find the cause from this error?
The error message "ApiError: Invalid timestamp value: -62164498757000000" indicates a problem with a timestamp value in your BigQuery job. This error sheds light on two key areas:
Timestamp Value: The specified value -62164498757000000 is significantly outside the acceptable range for timestamps in BigQuery, which supports years 0001 through 9999. A value this far in the past often signifies an underflow or an uninitialized date variable in computing.
Error Location: The "location: 'query'" field in the error suggests the issue lies in your BigQuery SQL query or in the way timestamps are generated or provided in your code.
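To see which date the offending value actually represents, you can decode it yourself. This quick diagnostic sketch assumes the value is in epoch microseconds (BigQuery's usual TIMESTAMP encoding), so it divides by 1000 for JavaScript's millisecond-based Date:

```javascript
// The failing value from the error message, in epoch microseconds.
const badMicros = -62164498757000000;

// JavaScript Dates use epoch milliseconds, so divide by 1000.
const asDate = new Date(badMicros / 1000);

// The result falls in year 0 (1 BC), i.e. before BigQuery's
// minimum supported timestamp of 0001-01-01 00:00:00 UTC.
console.log(asDate.getUTCFullYear()); // 0
```

A year-zero result like this is a classic sign of a default or uninitialized date in the source data rather than a genuinely intended value.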
Where to Investigate
Query Analysis: Examine your BigQuery SQL query for potential issues:
Calculations: Look for date or timestamp calculations that might produce out-of-range values.
Hardcoded Timestamps: Check for any hardcoded timestamp values that may be incorrect.
Function Usage: Verify the correct usage of BigQuery timestamp functions, referring to BigQuery documentation on timestamp functions for guidance.
Data Preparation: If loading data into BigQuery, review how timestamp columns are generated or formatted:
Timestamp Generation: Ensure any code-generated timestamps are within the valid range and logic for calculation is correct.
Data Source Integrity: For data imported from other systems, confirm the format and integrity of timestamp data in the source.
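As a sketch of such a pre-flight check before loading, assuming your timestamps are epoch microseconds: the constants below are the epoch-millisecond bounds of BigQuery's TIMESTAMP range (0001-01-01T00:00:00Z through 9999-12-31T23:59:59.999Z), and the function name is illustrative:

```javascript
// Epoch-millisecond bounds of BigQuery's TIMESTAMP range.
const BQ_MIN_MS = -62135596800000;  // 0001-01-01T00:00:00Z
const BQ_MAX_MS = 253402300799999;  // 9999-12-31T23:59:59.999Z

// Hypothetical validator: is an epoch-microsecond value loadable
// into a BigQuery TIMESTAMP column?
function isValidBigQueryTimestamp(micros) {
  const ms = micros / 1000; // microseconds -> milliseconds
  return ms >= BQ_MIN_MS && ms <= BQ_MAX_MS;
}

// The value from the error message fails the check:
isValidBigQueryTimestamp(-62164498757000000); // false
```

Running a check like this over your rows before building the load files would flag exactly which records (and therefore which source files) carry the out-of-range timestamps.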
Debugging Tips
Inspect Input Data: Directly inspect rows of data involved in the query or load process to identify any with erroneous timestamps. This can help narrow down the source of the issue.
Isolate the Issue: Break down complex queries into smaller parts to pinpoint where the timestamp manipulation might be going awry.
Review Logs: Utilize logs for additional context around the error. BigQuery's job information and query history can also provide insights into the issue.
Additional Considerations
Ensure that any conversions between time zones are correctly handled, as this can also lead to unexpected timestamp values.
If the data comes from external sources, verify the export or transformation processes to ensure timestamp integrity before import.
For a deeper understanding, consult BigQuery's documentation on handling dates and times, including limitations and best practices for timestamp data.