Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

BigQuery compatibility with Netezza hash function for values with special character ¶

Hi Team,

I'm trying to convert NZ functions to BQ compatibility and Im facing some discrepancy when dealing with  hash function for input values with special character.

Below are the SQL statements and corresponding output,

 
Netezza:
select rawtohex(hash( 'CHR' || 'BN8193877¶' , 1 ))
313E6401C7D3A6A11A0AA54AC251345345D437BB
 
BigQuery:
select UPPER(TO_HEX(SHA1(CONCAT('CHR', CAST('BN8193877¶' AS STRING)))))
E58916ABC36F6E9D677A49C8C4D8FD523C6E5409
 
It was working (giving similar output) for input values without special characters but giving different results when handling values with special characters (In this case ¶ ). Both the queries are using SHA1 algorithms.
 
Any leads on this are much appreciated?
1 1 360
1 REPLY 1

 

The discrepancy in hash outputs between Netezza and BigQuery is likely due to differences in how they handle the encoding of special characters, such as "¶", within their hashing functions.

Character Encoding Discrepancies:

BigQuery: By default, BigQuery uses UTF-8 encoding. To achieve consistent hashing results, it's important to align the character encoding with that of Netezza.

Netezza: The encoding used by Netezza is crucial to identify. It may differ from BigQuery's UTF-8, affecting the hash output.

Potential Solutions:

1. Identify and Convert Encoding in Netezza:

Discover Netezza Encoding: Determine the exact character encoding used in Netezza. This information is key to aligning the encoding process.

Convert to UTF-8 in Netezza: If feasible, convert the input string in Netezza to UTF-8 before applying the hash function. The method for this conversion depends on Netezza's available functions.

2. Encoding Conversion Before BigQuery (If Necessary):

External Conversion: If Netezza's encoding cannot be modified, and you know its encoding type, consider converting the string's encoding to UTF-8 before loading it into BigQuery using external tools or processes.

Data Integrity Caution: Be cautious with encoding conversions, as they can lead to data loss, especially with special characters.

3. Alternative Hash Function (Last Resort):

Consistent Hashing Across Platforms: If aligning encodings is not feasible and identical hash values are critical, consider using a different hash function, like SHA-256, known for more consistent behavior across platforms.

System Impact Awareness: Changing the hash function might have broader implications on your systems, so evaluate this option carefully.

Please Note:

Specific Encodings: Understanding Netezza's exact encoding is crucial for effective conversion.

Data Integrity and Loss: Pay special attention to the potential for data loss when converting encodings.

Testing: Conduct extensive testing with various characters, including special characters, in both environments. This will help verify that the desired hash output behavior is achieved after any changes in encoding or hash function.

By carefully examining and aligning the character encodings between Netezza and BigQuery, and considering alternative solutions if necessary, you can resolve the hash output discrepancies effectively.