
Troubleshooting Cloud Logging costs is so hard

We are incurring high Cloud Logging costs for one of our clients, whose infrastructure spans a wide range of GCP services: GKE, Cloud Run functions, Cloud Run, and Firebase. It is hard to understand which logs are contributing to the costs. No destination sinks are set up in our case.

I understand the Cloud Logging architecture (inclusion and exclusion filters, etc.) and the GCP audit log types, but I am facing the challenges below.

  1. How do I figure out which log category a log message belongs to, and whether I can disable it? For example, what category of log type do application logs fall under? I also see that some of the Cloud Run functions emit messages like "Function started ..." and "Function ran for 110 seconds". Which log type do these belong to (admin activity, data access, system event), and can I disable them?
  2. Another challenge is that I might have enabled logs for some services, but there is no way to know upfront which services those are, so the only option is to check the log settings for each service, which is cumbersome. A good example is VPC Flow Logs: they are disabled by default, but once you enable them they generate additional log messages that add to the costs.
  3. Are there any custom log-based metrics / charts that I can set up to understand usage, for example a graph showing the volume of logs in each category (system logs, application logs, etc.)?
Solved
1 ACCEPTED SOLUTION

Dear dheerajpanyam,

I am 100% sure the cause is the noise from too many logs being written to the default bucket, not the retention. Retention periods of 30 days or less are included in the ingestion price per the pricing documentation: https://cloud.google.com/stackdriver/pricing. Even if you set the retention period to 20 days, the cost will be the same. You only need to control how many GBs of logs are ingested into a particular bucket.
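For reference, retention is a property of the log bucket itself. The command below is only to show where that setting lives, since anything at or under 30 days costs the same:

    # Retention is set per log bucket; <= 30 days does not change the bill.
    gcloud logging buckets update _Default --location=global --retention-days=20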

As I previously stated, GKE (whether Autopilot or Standard) writes all stdout logs to Cloud Logging, but if your ingestion settings only include a specific/granular set of logs, you will not be charged for the non-ingested logs. You are only charged for the logs that match that specific/granular filter (except the networking telemetry I mentioned before, of course).
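As a minimal sketch of such a granular filter (the exact filter is just an example, adjust it to what you want to keep):

    # Example only: ingest everything except low-severity GKE container logs.
    # Entries that do not match the sink filter are never ingested, so never billed.
    gcloud logging sinks update _Default \
      --log-filter='NOT resource.type="k8s_container" OR severity>=ERROR'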

For the query, I'm not quite sure whether it is MQL; I think Google has its own language for this. Please refer to https://cloud.google.com/logging/docs/view/logging-query-language
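You can also test a filter written in that language from the CLI before putting it into a sink, for example (the resource type and severity here are just placeholders):

    # Preview which entries a Logging query language filter matches.
    gcloud logging read \
      'resource.type="cloud_run_revision" AND severity>=WARNING' \
      --limit=10 --format=json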

For the monitoring, is something like this sufficient for you?

azzi_4-1736489990650.png
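If you need it broken down further, one option is a log-based counter metric that you can then chart in Metrics Explorer. This is only a sketch; the metric name and filter are made up:

    # Hypothetical log-based metric: counts GKE container log entries.
    # It appears in Metrics Explorer as logging/user/gke-container-log-count.
    gcloud logging metrics create gke-container-log-count \
      --description="Volume of GKE container log entries" \
      --log-filter='resource.type="k8s_container"'

There is also a built-in logging.googleapis.com/billing/bytes_ingested metric which, if I remember correctly, you can group by resource type to see byte volume per service.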

 

To accurately track the ingestion, I suggest adding more filters to this monitoring, such as filtering by label. Unfortunately, I don't see any way to label a Logging bucket, so I recommend creating a new Cloud Storage bucket and labeling it with an appropriate key:value pair. After creating the bucket, create a new sink with that bucket as its destination.
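Roughly like this (the bucket, label, and sink names are made up for illustration):

    # Illustration only: a labeled Cloud Storage bucket as the sink destination.
    gcloud storage buckets create gs://my-logging-cost-bucket --location=us-central1
    gcloud storage buckets update gs://my-logging-cost-bucket \
      --update-labels=cost-center=client-a

    # Route matching logs to that bucket. Afterwards, grant the writer identity
    # printed by this command write access (objectCreator) on the bucket.
    gcloud logging sinks create client-a-logs \
      storage.googleapis.com/my-logging-cost-bucket \
      --log-filter='resource.type="k8s_container"'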

azzi_1-1736489685321.png

FYI, this is what I usually do: I disabled the default sink and created many custom sinks so that I can control my logging cost more flexibly. When I need something, I enable certain sinks, then disable them again once I have what I need.
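On the CLI that on/off toggle is just (the sink name is an example):

    # Pause a sink while you don't need its logs, re-enable it when you do.
    gcloud logging sinks update my-debug-sink --disabled
    gcloud logging sinks update my-debug-sink --no-disabled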

azzi_3-1736489829647.png

Hope this gives you some inspiration for your own GCP projects.

Regards,
Iza


3 REPLIES

Dear dheerajpanyam,

Set up your log sink here:

https://console.cloud.google.com/logs/router

By default, Cloud Logging ingests everything; try tuning this parameter.

azzi_0-1736404153768.png

You can also disable the default sink and create a new custom sink there.

To know what parameters to insert (the include/exclude part), explore them in the Logs Explorer. For example, if I want my sink to ingest GKE logs only:

azzi_1-1736404303740.png
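If you prefer the CLI over the console, the same idea looks roughly like this (the project ID and sink name are placeholders):

    # Illustration: disable the default sink, then add a custom sink that only
    # ingests GKE logs into the _Default log bucket.
    gcloud logging sinks update _Default --disabled
    gcloud logging sinks create gke-only \
      logging.googleapis.com/projects/MY_PROJECT/locations/global/buckets/_Default \
      --log-filter='resource.type=("k8s_container" OR "k8s_cluster" OR "k8s_node")'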

Most of the time, GKE and Cloud Run are the most log-hungry services, as they write your applications' stdout into Cloud Logging.

To answer no. 2: based on my experience, the only thing you need to do is control what the sink includes and excludes. Even if VPC Flow Logs are enabled, as long as the logs are not ingested you will not be charged for log storage, though you will still be charged for the networking telemetry. To know which services you have already enabled, I suggest checking the SKUs in your billing report.
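For example, an exclusion on the _Default sink drops VPC Flow Logs before they are ever ingested (the exclusion name here is arbitrary):

    # Dropped at the Log Router: no log storage charge, though the separate
    # networking telemetry charge for generating flow logs still applies.
    gcloud logging sinks update _Default \
      --add-exclusion='name=drop-vpc-flows,filter=log_id("compute.googleapis.com/vpc_flows")'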

From my POV, when someone asks me how to optimize logging cost, I tell them to disable all logs except audit logs and any other logs they want to keep. The rest can be turned on only when there is an investigation or troubleshooting to be done.
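As a sketch of that approach (these are the standard audit log IDs):

    # Keep only audit logs; widen the filter temporarily when troubleshooting.
    gcloud logging sinks update _Default \
      --log-filter='log_id("cloudaudit.googleapis.com/activity") OR log_id("cloudaudit.googleapis.com/data_access") OR log_id("cloudaudit.googleapis.com/system_event")'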

Regards,
Iza

Thanks @azzi. We have 3 GKE clusters: 2 Autopilot and 1 Standard. Apparently I cannot disable system and event logs for Autopilot. For the 1 Standard cluster I disabled logging altogether. The other challenge is writing the log query; does it use MQL? Is it possible to set up a metric chart showing the volume of logs that get ingested and written to the bucket after applying the inclusion and exclusion filters? At present we use the _Default bucket with a log retention of 20 days. So the question really is whether the costs are coming from storage/retention or from the noise of too many logs written to the default bucket.

 
