GCS - Finding Files stored in wrong storage class

LC11 · 01-22-2023 06:42 PM

Using the monitoring system in GCS I can see that I have 1TB of data in 'regional' storage class while 7.5TB is in 'archive' class.

Using the CyberDuck 'info' panel on the folders, and checking all the top-level parent folders, every folder 'seems' to show as 'ARCHIVE' storage class. So how can I find out where that specific 1TB of files is (i.e. which files/folders are in the 'regional' class while their parent folder shows as 'ARCHIVE')?

I can confirm that in the attached image both the project IDs and the bucket names match for the two classes as shown.

Thanks for any advice.

kolban

My first thought is to use gsutil and run a recursive listing over your buckets and capture the metadata ... see:

https://cloud.google.com/storage/docs/gsutil/commands/ls

With that data captured, we could now search the results and see which objects are flagged as ARCHIVE.

I must admit to being confused by the notion of "ARCHIVE" vs "REGIONAL" storage classes ... when I look here:

https://cloud.google.com/storage/docs/storage-classes

I see storage classes of:

Standard
Nearline
Coldline
Archive

What I don't see is a "storage class" called "regional"

LC11

Thanks Kolban - I'm trying that now.

Would the command be:

gsutil ls -r gs://[my bucket name]/**

?

kolban

Looking at the docs, possibly with an additional "-L" flag to generate a long listing. From what I see, the results seem oriented towards human-readable output, so searching may be a bit more of a challenge. Let me know what you find. Depending on your keenness, another thought would be to write a GCS application/tool to do the equivalent in Java, Python or NodeJS. I'm thinking that an API call such as:

https://cloud.google.com/storage/docs/json_api/v1/objects/list

might also be useful for us. It seems to only read 1000 objects at a time, but each object returned contains the storageClass field. So a script that ran through all the objects and dumped their storageClass with the object name doesn't *feel* like it would be too hard.
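
To make that idea a little more concrete, here is a minimal Python sketch of such a script using only the standard library. It pages through objects.list by following nextPageToken and keeps the objects whose storage class (the storageClass field of each returned object) is not the expected one. Treat it as an untested outline: the function names are my own, and it assumes you pass in an OAuth access token yourself (for example one printed by "gcloud auth print-access-token") and that the token is allowed to list the bucket.

```python
import json
import sys
import urllib.parse
import urllib.request

API_ROOT = "https://storage.googleapis.com/storage/v1"


def fetch_page(bucket, token, page_token=None):
    """Fetch one page (up to 1000 objects) from the objects.list endpoint."""
    params = {
        "maxResults": 1000,
        # Ask only for the fields this script needs.
        "fields": "items(name,size,storageClass),nextPageToken",
    }
    if page_token:
        params["pageToken"] = page_token
    url = "{}/b/{}/o?{}".format(
        API_ROOT,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.urlencode(params),
    )
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_objects(get_page):
    """Yield every object, following nextPageToken until there is none."""
    page_token = None
    while True:
        page = get_page(page_token)
        yield from page.get("items", [])
        page_token = page.get("nextPageToken")
        if not page_token:
            break


def objects_not_in_class(objects, expected="ARCHIVE"):
    """Return (name, storage_class, size_bytes) for objects in another class."""
    return [
        (obj["name"], obj["storageClass"], int(obj["size"]))  # size is a string
        for obj in objects
        if obj["storageClass"] != expected
    ]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python list_classes.py BUCKET ACCESS_TOKEN
    bucket, token = sys.argv[1], sys.argv[2]
    pages = lambda page_token: fetch_page(bucket, token, page_token)
    for name, storage_class, size in objects_not_in_class(iter_objects(pages)):
        print(f"{storage_class}\t{size}\t{name}")
```

Keeping the paging loop separate from the HTTP call means the loop can be checked against a couple of hand-made pages before pointing it at a bucket with millions of objects.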