
Ray Operator on K8s in Autopilot mode not creating any pods

kiran8
New Member

RayCluster, RayJob, and RayService are not created


Hi @kiran8 ,

Welcome to Google Cloud Community!

Here are some basic troubleshooting steps you can follow:

1. Verify Operator Health:

  • Check operator pod status (kubectl get pods -n <operator-namespace>)
  • View operator logs (kubectl logs -n <operator-namespace> <operator-pod-name>)
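
The checks in step 1 could be wrapped in a small shell function like the sketch below. The `default` namespace and the `kuberay-operator` Deployment name are assumptions based on a typical Helm install, so substitute whatever your cluster actually uses. The CRD lookup is an extra sanity check: if the `ray.io` CRDs are missing, no RayCluster, RayJob, or RayService can be created at all.

```shell
# Assumed values; override them to match your cluster.
OPERATOR_NS="${OPERATOR_NS:-default}"
OPERATOR_DEPLOY="${OPERATOR_DEPLOY:-kuberay-operator}"

check_ray_operator() {
  # The KubeRay CRDs must exist before any Ray resource can be created.
  kubectl get crd rayclusters.ray.io rayjobs.ray.io rayservices.ray.io
  # Operator pods should be Running and Ready.
  kubectl get pods -n "$OPERATOR_NS"
  # Recent operator logs often show reconcile or permission errors.
  kubectl logs -n "$OPERATOR_NS" "deployment/$OPERATOR_DEPLOY" --tail=100
}
```

Run it with `OPERATOR_NS=<operator-namespace> check_ray_operator` after sourcing the snippet.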

2. Check RayCluster CR Status:

  • Use kubectl get raycluster <your-raycluster-name> -n <your-namespace> -o yaml to see the cluster's current state and any errors. kubectl describe raycluster <your-raycluster-name> -n <your-namespace> also lists the events recorded for it.
  • Pay close attention to the status field in the output. Look for any error messages.
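
As a rough sketch of step 2, the function below prints the full object, then only its status block, then the describe output. The cluster name `my-raycluster` and the `default` namespace are hypothetical placeholders.

```shell
# Hypothetical placeholders; override them to match your cluster.
RAY_NS="${RAY_NS:-default}"
RAY_CLUSTER="${RAY_CLUSTER:-my-raycluster}"

show_raycluster_status() {
  # Full object, including spec and status.
  kubectl get raycluster "$RAY_CLUSTER" -n "$RAY_NS" -o yaml
  # Only the status block, which is where errors usually surface.
  kubectl get raycluster "$RAY_CLUSTER" -n "$RAY_NS" -o jsonpath='{.status}'
  # describe also lists events recorded against the resource.
  kubectl describe raycluster "$RAY_CLUSTER" -n "$RAY_NS"
}
```

If the first command returns "not found", the custom resource was never accepted by the API server, which points back to the CRDs or the manifest rather than to pod scheduling.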

3. Inspect Events:

  • Use kubectl get events -n <your-namespace> to review events related to pod scheduling, creation, or any failures.
  • Look for events related to your RayCluster or the operator.
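
One possible way to narrow the event list in step 3 is shown below. The namespace default is an assumption. The field selector limits the output to events whose involved object is a RayCluster.

```shell
# Assumed namespace; override it to match your cluster.
RAY_NS="${RAY_NS:-default}"

show_ray_events() {
  # All events in the namespace, sorted so the most recent appear last.
  kubectl get events -n "$RAY_NS" --sort-by=.lastTimestamp
  # Only events whose involved object is a RayCluster.
  kubectl get events -n "$RAY_NS" --field-selector involvedObject.kind=RayCluster
}
```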

4. Describe Failing Pods:

  • If any pods have been created and are failing, use kubectl describe pod -n <your-namespace> <failing-pod-name> to get more detailed information on errors (like ImagePullBackOff, scheduling failures, etc.).
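
To find the pods that belong to one Ray cluster before describing them, a label selector can help. This sketch assumes the pods carry the `ray.io/cluster=<cluster name>` label that KubeRay normally sets on the pods it creates. The cluster name and namespace are hypothetical placeholders.

```shell
# Hypothetical placeholders; override them to match your cluster.
RAY_NS="${RAY_NS:-default}"
RAY_CLUSTER="${RAY_CLUSTER:-my-raycluster}"

describe_ray_pods() {
  # List the head and worker pods of this cluster with node placement.
  kubectl get pods -n "$RAY_NS" -l "ray.io/cluster=$RAY_CLUSTER" -o wide
  # Describe them all at once; check the Events section at the bottom.
  kubectl describe pods -n "$RAY_NS" -l "ray.io/cluster=$RAY_CLUSTER"
}
```

An empty list here means the operator never created any pods, so the operator logs from step 1 are the next place to look.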

5. Simplify: 

  • Try a minimal RayCluster CR with the bare minimum configuration to see if it works. Start from the basics and add complexity gradually.
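
A minimal RayCluster for this kind of test might look like the sketch below, which defines a head node only. Everything in it is an assumption to adapt: the name `raycluster-minimal`, the `ray.io/v1` API version (older KubeRay releases use `ray.io/v1alpha1`), the Ray image tag, and the resource sizes. Resource requests are set explicitly because Autopilot provisions nodes based on what the pods request.

```shell
# Prints a head-only RayCluster manifest; all values are illustrative.
minimal_raycluster_manifest() {
  cat <<'EOF'
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-minimal
spec:
  rayVersion: "2.9.0"
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "1"
              memory: 2Gi
EOF
}
```

Apply it with `minimal_raycluster_manifest | kubectl apply -n <your-namespace> -f -`, then watch for a head pod with `kubectl get pods -n <your-namespace> -w`.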

6. Check Resources: 

  • Use kubectl describe nodes to see each node's allocatable resources and any node taints.

7. Check the Ray Operator's RBAC: 

  • Ensure the operator service account has the necessary roles.
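
One way to spot-check the operator's permissions is `kubectl auth can-i` with impersonation, sketched below. The service account name `kuberay-operator` and both namespaces are assumptions, and your own user needs permission to impersonate service accounts for `--as` to work.

```shell
# Assumed values; override them to match your cluster.
OPERATOR_NS="${OPERATOR_NS:-default}"
OPERATOR_SA="${OPERATOR_SA:-kuberay-operator}"
RAY_NS="${RAY_NS:-default}"

check_operator_rbac() {
  sa="system:serviceaccount:${OPERATOR_NS}:${OPERATOR_SA}"
  # The operator has to create pods and services for each Ray cluster.
  for resource in pods services; do
    kubectl auth can-i create "$resource" --as="$sa" -n "$RAY_NS"
  done
  # It also has to update the RayCluster objects it reconciles.
  kubectl auth can-i update rayclusters.ray.io --as="$sa" -n "$RAY_NS"
}
```

Each command prints `yes` or `no`. A `no` suggests that a Role or ClusterRole binding for the operator is missing.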

8. Review Ray Operator and Cluster Configuration: 

  • Review the YAML files for any potential errors or typos.

To learn more about KubeRay, see “Getting Started with KubeRay”.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.