
Hibernate project

Hello,

Is there a way to hibernate a project temporarily? Our factory runs 4 days a week, so it would be great if we could hibernate our project when the factory is down and wake it up before they start running again.

TIA!


I am not familiar with the term "hibernate a project" ... but I notice you have tagged the question Compute Engine ... so I'm going to assume that the question is "Can I start a Compute Engine instance on a schedule and then also shut it down on a schedule, so that I am not charged for it when there is no usage?"

I think the answer is yes ... and is likely described by this article which talks about "Scheduling a VM instance to start and stop".
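To sketch what that article describes (the schedule name, instance name, region/zone, and cron times below are placeholders I'm making up ... adjust them to your factory's hours):

# Create an instance schedule: start Monday morning, stop Thursday evening
gcloud compute resource-policies create instance-schedule factory-schedule \
    --region=us-central1 \
    --vm-start-schedule="0 6 * * MON" \
    --vm-stop-schedule="0 20 * * THU" \
    --timezone="America/Chicago"

# Attach the schedule to an existing VM
gcloud compute instances add-resource-policies my-instance \
    --zone=us-central1-a \
    --resource-policies=factory-schedule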

Thank you Kolban. Sorry, there was no other appropriate channel for my question about the project itself. I will review the link you sent.


Kolban, would this work for a GKE-managed VM as well? Right now, if I stop the VM manually, it gets started again automatically.

I have disabled auto-repair for the node pool, and still, whether I stop the GKE node or the underlying VM, it keeps coming back up.

So scheduling will also not work. 

I found out I can stop the cluster itself with gcloud dataproc clusters stop <cluster name> --region=<cluster region>

But what if the cluster was created as zonal? How do I stop it then?

My assumption is that since zones are contained within regions, stopping the cluster at the region level will shut down all resources across all the zones within that region.

The way to think about it is that a Google Cloud region is a "collection" of "zones" where ... loosely ... we can think of a "zone" as its own isolated "data center", but the zones are all very close to each other. I think of a "zone" as a building on a campus with its own power supply, air conditioning, fire suppression, and so on. The thinking is that if there were a physical emergency (a fire, a flood, a power transformer burnout, etc.) then it would be contained to one zone and the other zones would remain fully operational. For many Google Cloud resources, data is stored in multiple zones in the region, so the loss of a single zone would not necessarily disrupt your continued operations.

See https://cloud.google.com/docs/geography-and-regions
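If it helps to see that relationship from the command line, you can list the zones that make up a region (us-central1 is just an example here):

gcloud compute zones list --filter="region:us-central1"

which should show us-central1-a, us-central1-b, and so on.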

That completely makes sense @kolban.

But since GCP allows one to create a zonal (non-regional) cluster, shouldn't it create the cluster under the region identified by the zone?

e.g. if I create a zonal cluster in us-central1-a, then the cluster should be created under the us-central1 region.

That way the dataproc commands like gcloud dataproc clusters stop <cluster name> --region=us-central1 would work.

Why is it not done that way?


Howdy Moonking.   I sat down and started reading ....

https://cloud.google.com/dataproc/docs/concepts/regional-endpoints

What that seems to tell me is that when I create a Dataproc cluster, I must specify a Google Cloud region where the cluster will be hosted. It then seems to say that I can optionally specify a zone within the region where the cluster will actually be created. So far, my thinking feels solid. When we create a cluster, we name the region where it is to be housed and we can optionally specify a zone within that region. If we don't specify a zone, then Google picks one for us. What this tells me is that the Dataproc cluster would be susceptible to a zonal failure (if the zone failed, the cluster would be down).
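To make that concrete, here is roughly what the two create variants look like (my-cluster is a placeholder):

# Region only ... Dataproc picks a zone within us-central1 for us
gcloud dataproc clusters create my-cluster --region=us-central1

# Region plus an explicit zone
gcloud dataproc clusters create my-cluster --region=us-central1 --zone=us-central1-a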

Now let's drill into what you are finding ... how did you create the Dataproc cluster?  What are you executing to stop it?  What results are you finding?  How does this differ from what you expect?


gcloud dataproc clusters stop <cluster name> --region=<cluster region>


Above is the command I use to stop the cluster. The response I get is:

ERROR: (gcloud.dataproc.clusters.stop) NOT_FOUND: Not found: Cluster projects/my-project/regions/us-central1/clusters/my-cluster.

So, then I use "gcloud dataproc clusters list --region=us-central1" to list the clusters and it shows 0.

Then I used 

gcloud compute regions list --format="value(name)" | \
    xargs -n 1 gcloud dataproc clusters list --region

I get 0 clusters listed.

Listed 0 items.
Listed 0 items.
Listed 0 items.
(the same "Listed 0 items." line repeated, once per region)

This shows that the cluster is zonal and in the us-central1-a zone, so I should be able to find it in the us-central1 region.

(screenshot: moonking_1-1679325238591.png)


Hmmm ... I think you are saying that your Google Cloud console shows the existence of a cluster, but when you run gcloud to list the clusters from the command line, you don't see any clusters. Is that the core issue? Let's check a couple of things. First, in the console, make a note of which exact Google Cloud project you are looking at. Next, on the command line for gcloud, add "--project <ProjectID>". If you don't specify an explicit project then the default will be used (see gcloud config list), and it may be that you are looking at the Dataproc configuration in a different Google Cloud project than you expect. Let's also check that you are using the same identity for your queries in both the console and the command line.
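Roughly, the checks I have in mind (my-project stands in for whatever project ID the console shows you):

# Which project and account is gcloud currently using?
gcloud config list

# Which identities are available and which one is active?
gcloud auth list

# List Dataproc clusters against the explicit project from the console
gcloud dataproc clusters list --region=us-central1 --project=my-project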

Yes, your understanding is correct.

  1. project is correct on the shell
  2. response for stop shows the correct project
  3. adding --project resulted in the same error (projects/my-project/regions/us-central1/clusters/my-cluster) --> both my-project and my-cluster are correct
  4. correct account and project shown by gcloud config list

Gosh ... I'm getting stumped on this one.  I hear you say that the console says you have a Dataproc cluster but the command line says you don't.  We have tried applying the explicit project name that hosts the cluster to the command line and still it comes back and says "no clusters".  You have looped through the list of all regions using "fancy" shell scripting and still don't see it.

The next thing I'd suggest we do is look for something subtle ... let's posit that the console is lying to us (I hope not). I'd do a hard browser refresh and see if the cluster listing goes away or somehow changes. What I'm fishing for is some possible caching of data in the browser that is listing something which isn't there.

I was stumped too when I posted this question 🙂 Thanks for staying with me on this one, hoping it has a simple explanation!

BTW, the "fancy" shell script credit goes to Dennis Huo on stackoverflow.com

So I opened Chrome in "incognito" and tried the same commands, with the same results.


I am creating a test zonal and regional cluster to see the difference:

BTW, I went to the Dataproc Overview and did not see the cluster there. So, GKE clusters are not connected to Dataproc?

(screenshot: moonking_1-1679330986376.png)


Oh ... this is important!!  So your console is now not showing any Dataproc clusters ... this matches what we see from the gcloud command. I'm wondering if we have a fundamental puzzle here ... it was your good self that mentioned Dataproc clusters (i.e. a collection of machines for running Spark jobs) ... but I'm now starting to wonder if what we should really have been chatting about is GKE clusters (a collection of machines for running Kubernetes jobs).

Maybe we should back up some ... to my understanding ... there is ZERO relationship between Dataproc (i.e. Google-managed Spark) and GKE (i.e. Google-managed Kubernetes). Have we perhaps got ourselves tied in non-existent knots 🙂 .... let's step back and see if we can't re-state what we are trying to do and what puzzles we are seeing ...
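One quick way to see the split is that the two products have entirely separate gcloud command groups ... listing each should tell us which kind of cluster you actually have (the region below is just an example):

# GKE (Kubernetes) clusters
gcloud container clusters list

# Dataproc (Spark) clusters
gcloud dataproc clusters list --region=us-central1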

Right, I had started this conversation with GKE clusters. I guess I was waylaid by the cloud console when I typed:

gcloud clusters list

It asked me if I meant:

gcloud dataproc clusters list

At that time I did not know what dataproc is and assumed it is a way to control start/stop of gke clusters.

So, to restate: my original task was to find a way to put the project or GKE clusters to sleep (or stop them) and then wake them up (or start them) again.

And Dataproc is not the answer.

So I am left with the following steps to achieve what I need (I have tested these steps and they work, albeit they leave me with no automated repair features while the cluster is running); a rough command sketch follows this list:

  1. disable GCP update and repair for the cluster
    1. move the cluster from a release channel to the static channel
    2. disable node auto-provisioning
    3. disable auto-scaling and auto-repair
  2. To turn the cluster off, change the node size to 0 (per zone)
  3. To turn the cluster back on, change the node size to 1 (per zone)
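Roughly, what steps 2 and 3 look like with gcloud for my setup, assuming a zonal cluster named my-cluster with a node pool named default-pool in us-central1-a (all placeholders; a regional cluster would take --region instead of --zone):

# One-time: stop GKE from repairing/upgrading nodes back into existence
gcloud container node-pools update default-pool \
    --cluster=my-cluster --zone=us-central1-a --no-enable-autorepair
gcloud container node-pools update default-pool \
    --cluster=my-cluster --zone=us-central1-a --no-enable-autoupgrade

# "Hibernate": scale the node pool to 0 nodes (per zone)
gcloud container clusters resize my-cluster --node-pool=default-pool \
    --zone=us-central1-a --num-nodes=0 --quiet

# "Wake up": scale the node pool back to 1 node (per zone)
gcloud container clusters resize my-cluster --node-pool=default-pool \
    --zone=us-central1-a --num-nodes=1 --quiet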

I think we are on the right track.  I found this StackOverflow article that seems to say if we change the node-pool size to 0, we will effectively have hibernated the environment.