
Count the number of objects in a bucket/folder - custom metric

Hello,

I am trying to create a custom metric to "count the number of objects present in a bucket/folder/sub-folder".

Once I create the metric, I want to export the results to a BigQuery table via a sink, with the fields (project ID, bucket name, folder name, number of objects, and timestamp).

Could anyone tell me how to set this up?

For example: my project is X, the bucket name is Y, and the sub-folder is Z. The bucket may have many sub-folders, but I only need the count from one of them.

1 ACCEPTED SOLUTION

GCS object count metrics can be grouped by bucket, but not by object prefix (e.g. "folder/folder"). I wonder if you might have a better time using Storage Insights inventory reports (https://cloud.google.com/storage/docs/insights/inventory-reports) or BigQuery object tables (https://cloud.google.com/bigquery/docs/object-table-introduction) to get all object metadata into BigQuery, and then create your metrics as queries over that data in BigQuery.
 
Note that Inventory Reports are shipped as GCS objects. You can use them in BigQuery either by defining external tables in BigQuery or just running a regular BigQuery load job.
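Once the inventory report CSVs are in BigQuery, the per-prefix count becomes a simple query over the object names. A minimal sketch, where the dataset/table name (mydataset.gcs_inventory), the report bucket, and the prefix Z/ are all assumptions for illustration:

```shell
# Hypothetical one-time load of the inventory report CSVs into BigQuery:
# bq load --autodetect --source_format=CSV mydataset.gcs_inventory 'gs://my-report-bucket/*.csv'

# Compose a count-by-prefix query; STARTS_WITH filters on the inventory's
# object name column to restrict the count to one sub-folder.
PREFIX="Z/"
QUERY="SELECT COUNT(*) AS object_count FROM mydataset.gcs_inventory WHERE STARTS_WITH(name, '${PREFIX}')"
echo "${QUERY}"
# Run it with: bq query --use_legacy_sql=false "${QUERY}"
```

Scheduling that query (e.g. as a BigQuery scheduled query) then gives you the recurring count without a custom Cloud Monitoring metric.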
 




Thank you, KirTitievsky

I configured the daily inventory report and got the CSVs into another bucket, but we have a few pipelines that run hourly, and I wanted to run this inventory report hourly as well.


1. While configuring, there are only two options: a) Daily, b) Weekly (no configuration for the time of day), but I want to run the report on my own schedule (i.e. set the start time manually and run the report hourly).

2. Currently my daily schedule runs at 5:37 UTC (the time is assigned automatically, with no way to set it manually), but the report is only available after 6:30 UTC, so there is about a one-hour delay between the job run and report availability.

Could you please tell me if there is any way to custom-schedule the reports, choosing times and dates at my own convenience?

Regards,
Sandeep

Hi @sandeepguptha9,

Thanks for hopping in, @KirTitievsky. You can follow @KirTitievsky's suggestion, or you can also try this workaround:

1. List the total object count in a bucket/folder:

 

gsutil ls gs://bucket/foldername/** | wc -l

 

2. Append the stdout, along with the other fields you need, to a CSV file using awk or sed.
3. After that, load the CSV file into BigQuery.
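Putting the three steps together, a minimal sketch (the project X, bucket Y, folder Z, and the BigQuery dataset/table names are assumptions; a printf stands in for the gsutil listing so the pipeline can be illustrated end to end):

```shell
# Step 1: count the objects under the prefix. In practice this would be:
#   COUNT=$(gsutil ls gs://Y/Z/** | wc -l | tr -d ' ')
# Here a printf of three hypothetical object names stands in for the listing.
COUNT=$(printf 'gs://Y/Z/a.txt\ngs://Y/Z/b.txt\ngs://Y/Z/c.txt\n' | wc -l | tr -d ' ')

# Step 2: append one row (project, bucket, folder, count, timestamp) to a CSV.
echo "X,Y,Z,${COUNT},$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> object_counts.csv
cat object_counts.csv

# Step 3: load the CSV into BigQuery (dataset/table names are hypothetical):
# bq load --source_format=CSV mydataset.object_counts object_counts.csv \
#   project:STRING,bucket:STRING,folder:STRING,object_count:INTEGER,ts:TIMESTAMP
```

Run on a schedule (e.g. from cron or Cloud Scheduler), this gives you the timestamped per-folder counts the original question asked for.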

Hope this also helps!