
Dataproc Metastore couldn't start

I have had an issue with Dataproc Metastore since 03:30 UTC.

According to Logs Explorer, it started after the Metastore ran the script hive-schema-3.1.0.cloudspanner.sql.

Error message:

Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.cloudspanner.sql
...
Error: FAILED_PRECONDITION: Operation with name "projects/xxx/instances/dpms-7bef6b94-a914-4ea8-b44/databases/hive/operations/rfea6af8e_6a40_422a_bd1a_8d98607d54ed" failed with status = GrpcStatusCode{transportCode=FAILED_PRECONDITION} and message = Duplicate name in schema: VERSION. (state=,code=9)

org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!

Underlying cause: java.io.IOException : Schema script failed, errorcode 2

Use --verbose for detailed stacktrace.

*** schemaTool failed ***
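For anyone triaging a similar log, the name of the conflicting schema object can be pulled out of the error line mechanically. This is only a minimal sketch: the function name `duplicate_schema_name` is hypothetical, and the sample line is an abbreviated copy of the message above, not additional log output.

```shell
#!/usr/bin/env bash
# Sketch: extract the conflicting object name from a schematool error line.
# duplicate_schema_name is a hypothetical helper, not part of Hive or gcloud.

duplicate_schema_name() {
  # Reads log lines on stdin and prints the identifier that follows
  # "Duplicate name in schema: ", dropping the trailing period.
  sed -n 's/.*Duplicate name in schema: \([A-Za-z0-9_]*\).*/\1/p'
}

# Abbreviated copy of the error line from this post.
log_line='status = GrpcStatusCode{transportCode=FAILED_PRECONDITION} and message = Duplicate name in schema: VERSION. (state=,code=9)'

printf '%s\n' "$log_line" | duplicate_schema_name
```

Lines without that message produce no output, so the same filter could be run over a whole exported log.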

Before that, I also got a failed schema validation log:

/opt/hive/bin/schematool -dbType cloudspanner -validate
...
Validating sequence number for SEQUENCE_TABLE
NEXT_VAL for MPartitionColumnStatistics in SEQUENCE_TABLE < max(CS_ID) in PART_COL_STATS
Failed in bit-reversal sequence number validation for SEQUENCE_TABLE.
...

My Dataproc Metastore has not been able to start or accept connections since then.

Has anyone else run into this issue? Please help me resolve it.

Thank you

 

1 ACCEPTED SOLUTION

 

The error message "Duplicate name in schema: VERSION" indicates that the VERSION table or column already exists in the Cloud Spanner database. This can happen if the Hive Metastore schema has already been initialized for the database.

To resolve this issue, you can try the following:

  1. Back up first: before making any changes, ensure you have a backup of your Cloud Spanner database.

  2. If you're sure that the VERSION table is the cause of the issue and you want to delete it, use the following Cloud Spanner SQL statement:

DROP TABLE hive.VERSION;

  3. After ensuring the database is in a clean state, you can initialize the Hive Metastore schema using the following command:

/opt/hive/bin/schematool -dbType cloudspanner -initSchema

⚠️ Ensure you're using the correct version of the schema initialization script that matches your Hive Metastore version.

If you continue to face issues, consider reaching out to Google Cloud Support for further assistance.
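Because Dataproc Metastore is a managed service, the Spanner database and schematool may not be directly reachable from your side. Before opening a support case, it can help to record the state the service reports. The sketch below wraps `gcloud metastore services describe`; the function name `metastore_state` and the example service name and region are assumptions for illustration, so substitute your own.

```shell
#!/usr/bin/env bash
# Sketch: print the state reported for a Dataproc Metastore service.
# metastore_state is a hypothetical wrapper; it assumes the gcloud CLI is
# installed and authenticated against the project that owns the service.

metastore_state() {
  local service="$1"   # service ID (assumed name, replace with yours)
  local location="$2"  # region of the service, e.g. us-central1
  gcloud metastore services describe "$service" \
    --location="$location" \
    --format='value(state)'
}

# Example call (service ID and region are placeholders):
# metastore_state my-metastore us-central1
```

The reported state, together with the log lines above, gives support a concrete starting point.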


Thank you for your recommendation. I think I should contact GCP Support: because this is a managed service, I don't know how to access Cloud Spanner or a command shell.

Yes, that would be a good idea.

Hi @hoadx, did you find a solution for this?

I would reach out to support for assistance with this issue!