I deployed my trained custom model to Vertex AI and used the GEE API to call the model for predicting images on Earth Engine. However, I encountered the following error message:
"EEException: Vertex AI prediction error: 2 root error(s) found. (0) UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in? [[{{function_node __inference__wrapped_model_13951}}{{node model_1/UModel_with_reshaping/UModel/down_blocks0_nafBlock/sequential_2/down_blocks0_nafBlock_conv_2/PartitionedCall}}]] [[StatefulPartitionedCall/StatefulPartitionedCall/model_1/re_serialize_output_2/map/while/body/_360/model_1/re_serialize_output_2/map/while/TensorArrayV2Read/TensorListGetItem/_356]] (1) UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in? [[{{function_node __inference__wrapped_model_13951}}{{node model_1/UModel_with_reshaping/UModel/down_blocks0_nafBlock/sequential_2/down_blocks0_nafBlock_conv_2/PartitionedCall}}]] 0 successful operations. 0 derived errors ignored."
I changed CONTAINER_IMAGE from "us-docker.pkg.dev/vertex-ai/prediction/tf2-gpu.2-11:latest" to "us-docker.pkg.dev/vertex-ai/prediction/tf2-gpu.2-13:latest", then I got a new error:
EEException: Vertex AI prediction error: XLA compilation disabled [[{{function_node __inference__wrapped_model_16067}}{{node model/my_model/UModel/down_blocks0_nafBlock/sequential_2/down_blocks0_nafBlock_conv_2/PartitionedCall}}]]
Can you please advise on how to resolve this issue?
It seems like you're encountering issues related to the setup of your custom model deployment on Vertex AI and its integration with Google Earth Engine (GEE).
CUDA Compiler Error:
The first error indicates that the serving runtime could not find a registered compiler for the CUDA platform. CUDA provides GPU acceleration for deep learning frameworks such as TensorFlow, and this failure usually points to an incompatible or misconfigured combination of your model's TensorFlow version, the CUDA libraries, and the serving container.
Ensure that the TensorFlow version you're using in your custom model is compatible with the CUDA version provided by the Vertex AI container image you're using. Sometimes, specific TensorFlow versions might require specific CUDA versions.
Verify that the CUDA and cuDNN libraries are properly installed and configured within your Docker container.
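As a quick reference for the version check above, here is a minimal sketch mapping the TensorFlow releases used by the `tf2-gpu` prediction containers to the CUDA versions TensorFlow was tested against (values taken from TensorFlow's published tested-build configurations; the helper function name is illustrative):

```python
# CUDA versions that each TensorFlow release was tested against, per
# TensorFlow's tested-build configuration tables. Extend as needed.
TESTED_CUDA = {
    "2.11": "11.2",
    "2.12": "11.8",
    "2.13": "11.8",
}

def cuda_for_tf(tf_version: str) -> str:
    """Return the tested CUDA version for a TensorFlow version string."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CUDA[major_minor]

print(cuda_for_tf("2.13.0"))  # -> 11.8
```

If the model was trained with TensorFlow 2.13, the `tf2-gpu.2-13` container is the matching choice; training with 2.11 and serving with a 2.13 container (or vice versa) is a common source of this class of error.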
XLA Compilation Disabled:
The second error indicates that XLA (Accelerated Linear Algebra) compilation is disabled in the serving runtime. XLA is a domain-specific compiler that can optimize TensorFlow computations; the error suggests that the exported model contains operations that request XLA compilation, while the prediction container has XLA disabled.
Check whether XLA compilation is explicitly enabled anywhere in your model code or configuration (for example via `jit_compile=True` or `tf.function(jit_compile=True)`), and consider re-exporting the model without it so the serving runtime is not asked to JIT-compile anything.
Verify that the TensorFlow version you're using supports XLA compilation and that it is properly configured in your model.
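The re-export suggested above can be sketched as follows. This is a minimal illustration, assuming the error stems from the model having been compiled with XLA JIT enabled; the toy model and output path are placeholders — in practice you would load your own trained model with `tf.keras.models.load_model`:

```python
import tensorflow as tf

# Disable XLA JIT compilation for this session before re-exporting,
# so the SavedModel does not require an XLA compiler at serving time.
tf.config.optimizer.set_jit(False)

# Stand-in for your trained model (replace with load_model on your SavedModel).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", jit_compile=False)  # no XLA

# Re-export the model without XLA requirements, then redeploy this
# SavedModel directory to Vertex AI.
tf.saved_model.save(model, "/tmp/my_model_no_xla")
```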
In short: align the TensorFlow version of your SavedModel with the CUDA stack of the prediction container you deploy to, and remove any XLA compilation requirements from the exported model before redeploying.