GCS connector cannot read CSV file with German characters

Hello,

I have an integration that reads CSV files containing inventory status.

The integration uses a GCS connector to read the file and then uses js code to process each line.

The file is encoded in UTF-8 and contains some "special" characters (ä, Ø, ß, ...). When I open the file in WordPad, the content is displayed correctly.

But when I use the connector to get the content in text format (HasBytes = False), the integration returns the following response from the connector:

[{ "Success": "False" }]

If I read the file as binary (HasBytes = True), there is no issue and the connector returns the content.

Is there a limitation on the character sets supported by the connector?

Regards

Philippe

Accepted solution

Hi @phertzog

It seems like the issue you're facing is related to character encoding when using the GCS connector. I tried to replicate your problem, taking the file and its encoding into account.

While the Google documentation does not explicitly mention any limitations on the character sets supported by the GCS connector, it appears that the connector can read UTF-8 encoded files but not ANSI-encoded files.

Here are a few workarounds that may help resolve the problem:

  • Use UTF-8 encoded files: If possible, ensure that your files are encoded in UTF-8 to avoid issues with special characters.
  • Process the file as binary: If you're handling files with non-UTF-8 encodings (like ANSI), you can read them as binary and handle the encoding manually in your code.
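
A minimal sketch of the second workaround, assuming the bytes come from a HasBytes = True download that has already been Base64-decoded, and that "ANSI" here means Windows-1252 (the usual ANSI code page on German and other Western European Windows systems). It tries strict UTF-8 first and falls back to Windows-1252. The function names are hypothetical, it assumes a runtime with the WHATWG `TextDecoder` (Node.js or a browser; whether the integration's JavaScript task provides it is not confirmed here), and wiring it into the integration is not shown.

```javascript
// Hypothetical helpers: decode file bytes as UTF-8 if they are valid UTF-8,
// otherwise as Windows-1252 (assumed to be the "ANSI" code page of the files).

// Windows-1252 differs from ISO-8859-1 only in the range 0x80-0x9F.
// Unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the C1 control
// code point with the same value, as the WHATWG Encoding Standard does.
const CP1252_80_9F = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

function decodeWindows1252(bytes) {
  const chars = [];
  for (const b of bytes) {
    // 0x00-0x7F and 0xA0-0xFF have the same value as their Unicode code point.
    const codePoint = b >= 0x80 && b <= 0x9f ? CP1252_80_9F[b - 0x80] : b;
    chars.push(String.fromCharCode(codePoint));
  }
  return chars.join("");
}

function decodeText(bytes) {
  try {
    // fatal: true makes TextDecoder throw on invalid UTF-8 instead of emitting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (err) {
    // Not valid UTF-8: assume the file was saved with the Windows-1252 code page.
    return decodeWindows1252(bytes);
  }
}

// "Grüße" saved as Windows-1252 (ü = 0xFC, ß = 0xDF) and as UTF-8.
const ansiBytes = Uint8Array.from([0x47, 0x72, 0xfc, 0xdf, 0x65]);
const utf8Bytes = new TextEncoder().encode("Gr\u00fc\u00dfe");

console.log(decodeText(ansiBytes)); // Grüße
console.log(decodeText(utf8Bytes)); // Grüße
```

The fallback is a heuristic: a Windows-1252 file that happens to form valid UTF-8 would be decoded as UTF-8, but for German text with single-byte umlauts and ß that is rare in practice.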

If the workarounds above don't work, you may consider filing a product feature request. Note that I can't give you a date for when this would be implemented.

I hope this helps.


I investigated further. The issue happens with ANSI-encoded files. If the file is UTF-8 encoded, the connector reads it without error.
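
One plausible explanation, assuming the connector's text mode decodes the content strictly as UTF-8 (the thread does not confirm this): in a Windows-1252 "ANSI" file, ä, Ø and ß are single bytes (0xE4, 0xD8, 0xDF) that do not form valid UTF-8, whereas in UTF-8 each of them takes two bytes. The sketch below checks validity with `TextDecoder` in fatal mode; `isValidUtf8` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns true if the bytes are well-formed UTF-8.
// Assumes a runtime with the WHATWG TextDecoder (Node.js or a browser).
function isValidUtf8(bytes) {
  try {
    // In fatal mode the decoder throws instead of substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// "äØß" in UTF-8: two bytes per character (C3 A4, C3 98, C3 9F).
const utf8Bytes = new TextEncoder().encode("\u00e4\u00d8\u00df");
// The same text in Windows-1252 ("ANSI"): one byte per character.
const ansiBytes = Uint8Array.from([0xe4, 0xd8, 0xdf]);

console.log(isValidUtf8(utf8Bytes)); // true
console.log(isValidUtf8(ansiBytes)); // false
```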

I also tested a workaround: if I read the file as bytes and then map the content to a text variable, I get the content, with the non-standard characters replaced by a placeholder character.
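
Mapping the bytes to text most likely goes through a non-strict UTF-8 decode (an assumption about what the integration does internally). A non-strict decoder does not fail on Windows-1252 bytes; it substitutes U+FFFD (�) for each invalid sequence, so the ASCII text survives while the special characters are lost. A short illustration using `TextDecoder` in its default, non-fatal mode; `decodeUtf8Lenient` is a hypothetical name.

```javascript
// Sketch: what a non-strict UTF-8 decode does to Windows-1252 ("ANSI") bytes.
// TextDecoder defaults to fatal: false, so invalid input is not an error:
// each invalid sequence becomes U+FFFD (the replacement character).
function decodeUtf8Lenient(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// "Grüße" saved as Windows-1252: ü = 0xFC, ß = 0xDF.
const ansiBytes = Uint8Array.from([0x47, 0x72, 0xfc, 0xdf, 0x65]);
const text = decodeUtf8Lenient(ansiBytes);
console.log(text); // Gr��e

// Counting U+FFFD is a cheap way to detect that decoding lost characters.
const replaced = [...text].filter((ch) => ch === "\ufffd").length;
console.log(replaced); // 2
```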

I don't know if there is a better solution. I cannot force my users to use only UTF-8 encoding.

When the file content is binary rather than plain text, the HasBytes parameter can be set to true in the Download action so that the file is downloaded in the correct format.

Thanks for the feedback. My file is a text file. It's just that when the file is ANSI encoded (from my deductions) and contains some non-standard characters (like Ø), the connector doesn't return the content when the option is set to HasBytes = False.

If I set the option to HasBytes = True, I get the content from the connector. I can then decode the data (which is Base64 encoded) and store it in a variable. The text looks fine, except that the special characters are replaced by �.
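
Decoding the Base64 payload to raw bytes, rather than straight to a string, shows how the special characters are actually stored. The sketch below lists each non-ASCII byte with its offset: one high byte per accented character (for example 0xFC for ü) points to a single-byte code page such as Windows-1252, whereas UTF-8 would store ü as the pair 0xC3 0xBC. It assumes Node.js (`Buffer` is a Node global); `nonAsciiBytes` is a hypothetical helper and the Base64 string is an illustrative value, not real connector output.

```javascript
// Sketch (Node.js): decode the Base64 payload returned with HasBytes = True
// and list each non-ASCII byte with its offset. nonAsciiBytes is a
// hypothetical helper; the sample string below is illustrative only.
function nonAsciiBytes(base64Content) {
  const bytes = Buffer.from(base64Content, "base64");
  const found = [];
  bytes.forEach((b, offset) => {
    if (b > 0x7f) {
      const hex = "0x" + b.toString(16).toUpperCase().padStart(2, "0");
      found.push({ offset, hex });
    }
  });
  return found;
}

// Base64 of "Grüße" saved as Windows-1252 (bytes 47 72 FC DF 65).
const result = nonAsciiBytes("R3L832U=");
console.log(result); // two entries: offset 2 (0xFC, ü) and offset 3 (0xDF, ß)
```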

Hello,

Thanks for the confirmation. I came to the same conclusion and the same workaround.

Regards
