Can we scale down the AlloyDB machine type (vCPU and memory) once the cluster has been created?
Thanks Mark Shay for your detailed answer.
Hi @RavikumarV,
I'm a product manager for AlloyDB and happy to help with your question. You can absolutely scale any of the instances in your cluster up or down after cluster creation. Scale-up and scale-down operations complete with near-zero (<1 s) downtime on primary instances with high availability, and with zero downtime on your read pools.
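For illustration, here's a minimal sketch of a resize using the google-cloud-alloydb Python client; the project, region, cluster, and instance names are placeholders, and the same change can be made from the console or the gcloud CLI.

```python
# A minimal sketch, assuming the google-cloud-alloydb Python client
# (pip install google-cloud-alloydb); resource names below are placeholders.
from google.cloud import alloydb_v1
from google.protobuf import field_mask_pb2

client = alloydb_v1.AlloyDBAdminClient()

instance = alloydb_v1.Instance(
    name="projects/my-project/locations/us-central1/clusters/my-cluster/instances/my-primary",
    machine_config=alloydb_v1.Instance.MachineConfig(cpu_count=4),  # target vCPU count
)

# Only machine_config.cpu_count is changed; all other instance settings stay as they are.
operation = client.update_instance(
    request=alloydb_v1.UpdateInstanceRequest(
        instance=instance,
        update_mask=field_mask_pb2.FieldMask(paths=["machine_config.cpu_count"]),
    )
)

# The resize is a long-running operation; block until it completes.
updated = operation.result()
print(f"Instance {updated.name} now has {updated.machine_config.cpu_count} vCPUs")
```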
Best,
Emir
I tested it on our product's instance; the primary instance took at least ~15 minutes to scale up or down, even though it had only about 400 MB of total storage.
Hi pshah,
Once you initiate a maintenance operation (a scale up/down, or a flag change that requires a restart), our non-disruptive maintenance workflow first launches a new database server with your desired settings, catches it up to your current server's progress, and partially warms its caches. This part of the operation is what takes up to 15 minutes, and during this time you can continue to use your database as usual (establishing new connections, writing, reading, etc.) -- this is the 15 minutes you're referring to. After cache prewarming completes, we swap the servers, which results in a momentary connection drop (milliseconds) -- that is what I was referring to.
Feel free to give it another test while running your workload or a benchmark, and you'll notice that the database is fully operational for the ~15 minutes the operation takes.
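If you want to observe this yourself, a simple probe like the sketch below can run SELECT 1 in a loop while the scale operation is in flight; any failure window it reports corresponds to the brief swap at the end. This assumes psycopg2 and placeholder connection settings.

```python
# A rough availability probe, assuming psycopg2 (pip install psycopg2-binary)
# and placeholder connection settings; run it while the scale operation runs.
import time
import psycopg2

DSN = "host=10.0.0.5 port=5432 dbname=postgres user=postgres password=secret"

while True:
    start = time.time()
    try:
        # Open a fresh connection each iteration so new-connection handling
        # is exercised too, not just an existing session.
        conn = psycopg2.connect(DSN, connect_timeout=5)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
                cur.fetchone()
        finally:
            conn.close()
        print(f"{time.strftime('%H:%M:%S')} ok ({time.time() - start:.3f}s)")
    except Exception as exc:
        # Failures here should be limited to the momentary server swap.
        print(f"{time.strftime('%H:%M:%S')} FAILED: {exc}")
    time.sleep(1)
```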
Let me know if you have any other questions,
Emir
Hi emirokan,
Can you share the relevant docs to scale up/down alloydb instances?
@emirokan any update?
Hi Ishaan,
Here's the doc on how to scale instances: https://cloud.google.com/alloydb/docs/instance-read-pool-scale
The non-disruptive maintenance behavior I mentioned above is covered in the AlloyDB maintenance documentation.
I've been playing with AlloyDB for almost a year now, and I've noticed that scaling up always works, but scaling down often fails. I can't discern a pattern, but it seems that scaling down by more than one step (for example, 16 vCPUs -> 2 vCPUs) is more likely to fail than scaling down by just one step (16 vCPUs -> 8 vCPUs).
That being said, the errors I'm getting say no more than "Operation failed: an internal error has occurred", usually after 30 minutes or so, with no information on why it failed or what to search the logs for.
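In case it helps anyone hitting the same thing: one workaround, based purely on the pattern described above (single-step scale-downs seeming more reliable), is to step down one vCPU tier at a time and wait for each operation to finish before the next. Below is a hedged sketch assuming the google-cloud-alloydb client; the resource name is a placeholder and the tier list is just the standard AlloyDB vCPU shapes.

```python
# A workaround sketch, assuming the google-cloud-alloydb client: step the
# instance down one vCPU tier at a time (e.g. 16 -> 8 -> 4 -> 2), waiting for
# each resize to finish before starting the next. Names are placeholders.
from google.cloud import alloydb_v1
from google.protobuf import field_mask_pb2

INSTANCE = "projects/my-project/locations/us-central1/clusters/my-cluster/instances/my-primary"
VCPU_TIERS = [2, 4, 8, 16, 32, 64, 96, 128]  # standard AlloyDB machine shapes


def scale_down(client: alloydb_v1.AlloyDBAdminClient, target_cpus: int) -> None:
    current = client.get_instance(name=INSTANCE).machine_config.cpu_count
    # Walk down the tier list one step at a time instead of jumping directly.
    steps = [t for t in reversed(VCPU_TIERS) if target_cpus <= t < current]
    for cpus in steps:
        op = client.update_instance(
            request=alloydb_v1.UpdateInstanceRequest(
                instance=alloydb_v1.Instance(
                    name=INSTANCE,
                    machine_config=alloydb_v1.Instance.MachineConfig(cpu_count=cpus),
                ),
                update_mask=field_mask_pb2.FieldMask(paths=["machine_config.cpu_count"]),
            )
        )
        op.result()  # block until this step completes before the next one
        print(f"scaled to {cpus} vCPUs")


scale_down(alloydb_v1.AlloyDBAdminClient(), target_cpus=2)
```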