Hello,
I'm using Datastream to unload binlog files to Google Cloud Storage. Unfortunately, my source database is encoded in latin1:
> SHOW VARIABLES LIKE 'character_set_database';
> latin1
> SHOW VARIABLES LIKE 'collation_database';
> latin1_swedish_ci
I tried to start the stream using a "mysql-source-config" that looks like this:
{
  "includeObjects": {
    "mysqlDatabases": [
      {
        "database": "my_db",
        "mysqlTables": [
          {
            "table": "my_table",
            "mysqlColumns": [
              {
                "column": "id",
                "dataType": "int"
              },
              {
                "column": "some_text_col",
                "dataType": "varchar",
                "primaryKey": false,
                "collation": "latin1_swedish_ci"
              }
            ]
          }
        ]
      }
    ]
  },
  "excludeObjects": {}
}
The stream fails to read the content of the "some_text_col" column with the following error:
Discarded 2 unsupported events with reason code: MYSQL_DECODE_ERROR. Latest discarded event details: Discarded an event from my_db.my_table: Event Parsing Error: Failed to parse event: === UpdateRowsEvent === Date: 2024-07-03T13:26:08 Log position: 17343280 Event size: 839 Read bytes: 161. Successfully parsed rows: []., caused by: Row Parsing Error: Failed to parse row of table xxx ... [skipping because the full schema is written]
Hello,
Thank you for contacting Google Cloud Community!
Datastream currently doesn't fully support parsing binlog events with latin1 encoding, specifically when encountering characters outside the standard ASCII range. This leads to parsing errors and discarded events.
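To illustrate why characters outside the ASCII range are the ones that cause trouble, here is a minimal Python sketch. It is not Datastream's parser, and the assumption that the failure resembles a UTF-8 decode of latin1 bytes is mine, not something confirmed in this thread. The helper name `decodes_as_utf8` is hypothetical.

```python
# Illustration only: this is NOT Datastream's parser. It assumes the failure
# is similar to decoding latin1 bytes as if they were UTF-8.

text = "café"
latin1_bytes = text.encode("latin-1")  # 'é' is the single byte 0xE9
utf8_bytes = text.encode("utf-8")      # 'é' is the two bytes 0xC3 0xA9


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII is byte-identical in latin1 and UTF-8, so it always decodes.
print(decodes_as_utf8(b"cafe"))
# A lone 0xE9 byte is not a valid UTF-8 sequence, so decoding fails.
print(decodes_as_utf8(latin1_bytes))
# Decoding with the right codec recovers the original text.
print(latin1_bytes.decode("latin-1"))
```

This is consistent with rows containing only ASCII text replicating fine while rows with accented characters are discarded.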
Unfortunately, there's no perfect solution at this moment, but you can consider:
Upgrade the source database encoding: convert the affected database or tables to a character set that Datastream can parse.
Raise a feature request: consider raising a feature request with Google Cloud. While there's no guarantee of immediate implementation, expressing user demand can help prioritize future support for latin1 encoding in Datastream.
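For the encoding upgrade, a rough sketch of what the conversion could look like is below. It only builds the MySQL `ALTER TABLE ... CONVERT TO CHARACTER SET` statement as a string. The function name `build_convert_statement` and the choice of `utf8mb4` / `utf8mb4_unicode_ci` are assumptions for illustration, and this thread does not confirm that this particular target resolves the Datastream error.

```python
# Sketch only: builds the SQL text, it does not connect to any database.
# The target charset and collation are assumptions, not a confirmed fix.

def build_convert_statement(
    database: str,
    table: str,
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_unicode_ci",
) -> str:
    """Build a MySQL statement that converts a table's character set."""

    def quote(identifier: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    return (
        f"ALTER TABLE {quote(database)}.{quote(table)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )


print(build_convert_statement("my_db", "my_table"))
```

This statement rewrites the table, so it is worth trying on a copy first and checking that the text columns still read back correctly afterwards.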
Regards,
Jai Ade
Thanks for answering, we will consider using a supported database encoding.
Cheers