
OpenTelemetry and GKE

Hi @ms4446 

Yesterday you told me we could do this, so I wanted to know how, because I'm using GKE. It may be simpler than going through the Ops Agent.

2. Cloud Operations for GKE:

  • If you are using GKE, you can leverage Cloud Operations for GKE, which integrates OpenTelemetry natively.
  • It automatically collects traces from your applications deployed on GKE and sends them to Google Cloud Trace without needing to configure the OpenTelemetry Collector manually.

Thank you 




Hi @Navirash ,

To use Cloud Operations for GKE for OpenTelemetry tracing on Google Kubernetes Engine (GKE), follow these steps:

  1. Default Observability Features: By default, GKE clusters (both Standard and Autopilot) are configured to send system logs, audit logs, and application logs to Cloud Logging, and system metrics to Cloud Monitoring. They also use Google Cloud Managed Service for Prometheus to collect configured third-party and user-defined metrics and send them to Cloud Monitoring.

  2. Customize and Enhance Data Collection: You have control over which logs and metrics are sent from your GKE cluster to Cloud Logging and Cloud Monitoring. You can also decide whether to enable Google Cloud Managed Service for Prometheus. For GKE Autopilot clusters, the integration with Cloud Monitoring and Cloud Logging cannot be disabled.

  3. Additional Observability Metrics: You can enable additional observability metrics packages for more detailed monitoring. This includes control plane metrics for monitoring the health of Kubernetes components and kube state metrics for monitoring Kubernetes objects like deployments, nodes, and pods.

  4. Third-Party and User-Defined Metrics: To monitor third-party applications running on your clusters (like Postgres, MongoDB, Redis), use Prometheus exporters with Google Cloud Managed Service for Prometheus. You can also write custom exporters to monitor other signals of health and performance (see the example at the end of this reply).

  5. Use Collected Data: Utilize the data collected for analyzing application health, debugging, troubleshooting, and testing. GKE provides built-in observability features like customizable dashboards, key cluster metrics, and the ability to create your own dashboards or import Grafana dashboards.

  6. Other Features: GKE integrates with other Google Cloud services for additional monitoring and management capabilities, such as security posture dashboards, insights and recommendations for cluster optimization, and network policy logging.

For detailed configuration instructions and more information, you can refer to the Google Cloud documentation on Observability for GKE.
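
As a concrete illustration of point 4, here is a minimal sketch of a PodMonitoring resource for Google Cloud Managed Service for Prometheus. The resource name, label selector, and port are hypothetical and need to match your own workload and exporter:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: redis-monitoring          # hypothetical name
  namespace: default
spec:
  selector:
    matchLabels:
      app: redis                  # must match the labels on your exporter pods
  endpoints:
    - port: metrics               # port (name or number) exposed by the Prometheus exporter
      interval: 30s               # scrape interval

Managed Service for Prometheus then scrapes the matching pods and sends the metrics to Cloud Monitoring.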

Ok thank you @ms4446. Is it possible to do all these steps with a config file like YAML?

Yes, it is possible to configure many aspects of observability in GKE using YAML configuration files. YAML files are commonly used in Kubernetes and GKE for defining, configuring, and managing resources.

For more detailed and specific configurations, you can visit the All GKE code samples page.

Thank you, but I don't understand how to activate OpenTelemetry tracing and export the traces to Google Cloud Trace. I didn't find a specific example.

Do you have a specific example please ?

To activate tracing with OpenTelemetry and export traces to Google Cloud Trace in a GKE environment, you can use the following configuration:

Enable OpenTelemetry in Strimzi:

  1. Add the following configuration to your Strimzi deployment:

     
    tracing:
      type: opentelemetry
    
  2. Configure the OpenTelemetry Collector: a. Deploy an OpenTelemetry Collector in your GKE cluster. b. Apply the following configuration:

     
    # The Collector receives OTLP trace data over gRPC and forwards it to Cloud Trace.
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:55680"   # legacy OTLP gRPC port; 4317 is the current default
    exporters:
      googlecloud:
        project: "YOUR_PROJECT_ID"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [googlecloud]
    

    This configuration sets up the Collector to receive OTLP trace data and export it to Google Cloud Trace.

  3. Apply and Restart: a. Apply the OpenTelemetry Collector configuration. b. Restart the OpenTelemetry Collector to apply the changes.

  4. Permissions and Network Configuration: a. Ensure that the OpenTelemetry Collector has the necessary permissions to send data to Google Cloud Trace. b. Verify that your network configuration allows communication between Strimzi, the Collector, and Google Cloud Trace.

After completing these steps, your Strimzi deployment will emit OpenTelemetry traces, which the OpenTelemetry Collector will collect and export to Google Cloud Trace. You can then view and analyze these traces in the Google Cloud Console.

For this approach, you do not need the Ops Agent; the OpenTelemetry Collector alone is sufficient for collecting and exporting traces from Strimzi to Google Cloud Trace.

For step 2 of the OpenTelemetry Collector configuration, you can start with the provided YAML configuration. However, be aware that additional adjustments may be necessary depending on your specific setup. For example, if your Strimzi deployment sends traces over a different protocol or port, you will need to modify the receivers section accordingly.

The connection between your Strimzi configuration and the OpenTelemetry Collector is established through the OTLP protocol. Ensure that Strimzi is configured to send OTLP trace data to the correct endpoint where the OpenTelemetry Collector is listening. This means matching the IP and port in the Strimzi configuration with the endpoint specified in the OpenTelemetry Collector's receivers section.
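
For illustration, here is a minimal sketch of what the Strimzi side of that connection could look like, assuming a KafkaConnect resource (Strimzi configures tracing on components such as Kafka Connect, MirrorMaker 2, or the Bridge) and an in-cluster Collector Service named otel-collector. The names, namespace, and port are hypothetical and must match your actual Collector endpoint:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect                                   # hypothetical name
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092  # hypothetical bootstrap address
  tracing:
    type: opentelemetry                              # enables OpenTelemetry tracing in Strimzi
  template:
    connectContainer:
      env:
        - name: OTEL_SERVICE_NAME
          value: my-connect
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector.default.svc.cluster.local:55680"  # must match the Collector's OTLP gRPC endpoint

The OTEL_EXPORTER_OTLP_ENDPOINT value is what ties the two pieces together; if your Collector listens on a different Service name or port, change it accordingly.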

Here's a summary of the steps:

  1. Enable OpenTelemetry in your Strimzi deployment: Configure Strimzi to emit OpenTelemetry traces.

  2. Deploy and configure the OpenTelemetry Collector: Use the provided YAML as a base, but be prepared to make adjustments based on your environment's specifics, such as network settings and trace volume (a deployment sketch follows this summary).

  3. Ensure Proper Network Configuration and Permissions: Make sure the OpenTelemetry Collector has the necessary permissions to access Google Cloud Trace. Also, configure network policies and firewall rules within your GKE cluster to allow communication between Strimzi and the OpenTelemetry Collector.

  4. Monitor and Scale as Needed: Keep an eye on the resource usage and performance of the OpenTelemetry Collector, especially if dealing with high volumes of traces. Scale the Collector if necessary to handle the load.

After completing these steps, Strimzi will emit OpenTelemetry traces, which the OpenTelemetry Collector will then collect and export to Google Cloud Trace. You can view and analyze these traces in the Google Cloud Console.
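
As a rough sketch of step 2, the Collector can run as a Deployment with its configuration mounted from a ConfigMap and its OTLP port exposed through a Service. Everything here (names, namespace, image tag, port) is illustrative, not a definitive manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector            # service account with access to Cloud Trace
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest   # the contrib image includes the googlecloud exporter
          args: ["--config=/etc/otelcol/config.yaml"]
          ports:
            - containerPort: 55680                  # OTLP gRPC receiver
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol
      volumes:
        - name: config
          configMap:
            name: otel-collector-config             # ConfigMap holding the config.yaml shown earlier
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 55680
      targetPort: 55680

Strimzi (or any other workload) can then send OTLP data to the otel-collector Service on the port it exposes.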

Thank you very much for your explanation @ms4446 😄

Hi @ms4446 
I implemented the OpenTelemetry Collector solution. The Collector receives the traces from my Strimzi deployment, but unfortunately these traces don't appear in the Google Trace explorer.

When I look at the logs of my OpenTelemetry Collector, I see a permission problem, even though my service account has the cloudtrace.agent role.

Do you have any suggestion ?

Hi @Navirash ,

If you're encountering permission issues with your OpenTelemetry Collector despite the service account having the cloudtrace.agent role, here are some suggestions to troubleshoot and resolve the issue:

  1. Verify Service Account Permissions:

    • Double-check that the service account used by the OpenTelemetry Collector indeed has the cloudtrace.agent role. This role should allow the account to write trace data to Google Cloud Trace.
    • Ensure that the service account is correctly associated with the OpenTelemetry Collector. If the Collector is running in a Kubernetes environment, this typically involves setting up a Kubernetes secret with the service account key and mounting it in the Collector's pod (see the note after this list).
  2. Check for IAM Policy Propagation Delay:

    • Sometimes, there can be a delay in IAM policy changes taking effect. If you've recently added the cloudtrace.agent role to the service account, wait a few minutes and then retry.
  3. Review OpenTelemetry Collector Logs:

    • Examine the logs of the OpenTelemetry Collector more closely to identify any specific error messages related to the permission issue. This can provide clues about what might be going wrong.
  4. Validate Service Account Key:

    • Ensure that the service account key file used by the OpenTelemetry Collector is valid and has not expired. If necessary, create a new key file in the Google Cloud Console and update the Kubernetes secret accordingly.
  5. Network Configuration:

    • Although this seems like a permission issue, it's also worth checking that there are no network configuration issues preventing the OpenTelemetry Collector from reaching Google Cloud Trace.
  6. Google Cloud Trace API Enabled:

    • Make sure that the Google Cloud Trace API is enabled in your Google Cloud project.
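
On GKE, an alternative to mounting a key file is Workload Identity: the Collector's Kubernetes service account is linked to a Google service account that holds the cloudtrace.agent role, so no key file is needed. A minimal sketch with hypothetical account names:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector                  # referenced by the Collector Deployment via serviceAccountName
  namespace: default
  annotations:
    # Hypothetical Google service account; it needs roles/cloudtrace.agent on the project
    # and a roles/iam.workloadIdentityUser binding for this Kubernetes service account.
    iam.gke.io/gcp-service-account: otel-collector@YOUR_PROJECT_ID.iam.gserviceaccount.com

Workload Identity must also be enabled on the cluster and node pool for this annotation to take effect.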

Thanks for your help @ms4446. I just restarted the deployment and it works.
But now I have this error: failed to export to Google Cloud Trace: context deadline exceeded.

Do you know this error ?

The error "failed to export to Google Cloud Trace: context deadline exceeded" typically indicates a timeout issue. This error occurs when the OpenTelemetry Collector is unable to send trace data to Google Cloud Trace within a specified time frame. Here are some steps to troubleshoot and resolve this issue:

  1. Network Latency or Connectivity Issues:

    • Check for any network latency or connectivity issues between the OpenTelemetry Collector and Google Cloud Trace. This could be due to network congestion, firewall rules, or other network-related configurations that might be blocking or slowing down the connection.
  2. Increase Timeout Settings:

    • If network latency is an issue, consider increasing the timeout settings in the OpenTelemetry Collector's configuration. This gives more time for the Collector to send data to Google Cloud Trace before timing out.
  3. Review Collector Configuration:

    • Ensure that the OpenTelemetry Collector is correctly configured to communicate with Google Cloud Trace. This includes verifying endpoint URLs, authentication credentials, and other relevant settings.
  4. Check for High Volume of Traces:

    • If your system is generating a high volume of trace data, the Collector might be getting overwhelmed, leading to timeouts. In this case, consider scaling up the Collector (e.g., increasing resources like CPU and memory) or optimizing how traces are batched and sent to Google Cloud Trace.
  5. Monitor Collector Performance:

    • Monitor the performance metrics of the OpenTelemetry Collector to see if it's experiencing resource constraints (like CPU or memory pressure) that could be causing the timeouts.
  6. Examine Logs for Additional Clues:

    • Check the OpenTelemetry Collector logs for any additional error messages or warnings that might provide more context about the timeout issue.
  7. Update Collector to Latest Version:

    • Ensure that you are using the latest version of the OpenTelemetry Collector, as updates often include performance improvements and bug fixes.

Thanks @ms4446. Do you know how to increase the timeout? My system generates a high volume of traces.

Thanks

To address timeout issues when exporting traces to Google Cloud Trace, you'll need to modify the configuration of the googlecloud exporter in your OpenTelemetry Collector configuration. Follow these steps:

1. Locate the Exporter Configuration:

  • Identify the section in your OpenTelemetry Collector configuration file where the googlecloud exporter is defined.

2. Adjust the Timeout Setting:

  • Add or modify the timeout setting within the googlecloud exporter configuration. The timeout is specified as a duration, for example 30s.

Example:

 
exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    timeout: 30s

3. Apply the Configuration Changes:

  • Save the updated configuration file.

4. Restart the OpenTelemetry Collector:

  • Restart the OpenTelemetry Collector to apply the new configuration.

5. Monitor the Results:

  • Observe the OpenTelemetry Collector logs to check if the "context deadline exceeded" errors are resolved.

6. Consider Batch Processing:

  • Configure the batch processor in your OpenTelemetry Collector to handle high volumes of traces efficiently (a combined example appears at the end of this reply).

Example:

 
processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

7. Review Network Performance:

  • Verify that network latency and bandwidth are not contributing to the timeouts.

Remember:

  • Monitor the performance and resource usage of the Collector, especially with high trace volumes.
  • Carefully balance timeout settings with Collector performance and resource utilization.
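
Putting the pieces together, a trace pipeline with both the increased exporter timeout and the batch processor could look roughly like this; the values are illustrative starting points rather than tuned recommendations:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:55680"
processors:
  batch:
    timeout: 10s              # flush a batch at least every 10 seconds
    send_batch_size: 1024     # or as soon as 1024 spans are queued
exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    timeout: 30s              # allow more time for each export call
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]     # the batch processor only takes effect when listed in the pipeline
      exporters: [googlecloud]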

Hi @ms4446 

Thank you, it works. My traces are now exported to Google Trace.

I have a quick question: I want to export the logs to Google Cloud Logging, but this requires a JSON format.
Are the OpenTelemetry logs in JSON format by default?
If they are not in JSON format, is there a way to convert them to JSON?

Hi @Navirash ,

The OpenTelemetry Collector dictates the format, and for Google Cloud Logging, we need JSON. Let's fix that!

Configure a Logging Exporter

  1. Add a logging exporter to your OpenTelemetry Collector config, specifying json as the output format. This exporter will wrap your logs in JSON before sending them to Google Cloud Logging.

Example Configuration:

 
exporters:
  logging:
    loglevel: debug
    encoding: json

Here, encoding: json is the key! 🪄

Include the Exporter in Your Pipeline 

  1. Tell your pipelines to use this new JSON-loving exporter. Here's an example:
 
service:
  pipelines:
    logs:
      receivers: [your_log_receiver]
      processors: [your_processors]
      exporters: [logging]

Replace your_log_receiver and your_processors with your actual log collection and processing components.

Apply and Verify

  1. Apply the updated config to your OpenTelemetry Collector and restart it. Then, check the logs! They should now be formatted as JSON and happily chilling in Google Cloud Logging.

Hi @ms4446 
Thanks for your answer. Can I put encoding: json in the googlecloud exporter?
If I understood correctly, adding encoding: json to the googlecloud exporter will export the logs from my OpenTelemetry Collector (see attachment) to Google Cloud Logging.
[screenshot attachment]

When I try with logging, I have this error:

[screenshot attachment]

To resolve this error, I added this:
service:
  telemetry:
    logs:
      encoding: json

That works. So can I export the telemetry logs to Google Cloud?

As of the latest OpenTelemetry Collector versions, this setting is unnecessary. The exporter automatically handles formatting for Google Cloud Logging, which typically involves JSON. My previous information about specifying encoding: json was outdated and potentially misleading. I apologize for the confusion.

Here's a revised overview of how to export your telemetry logs to Google Cloud:

1. Configure the googlecloud exporter:

 
exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    # other relevant configuration options...

This configuration focuses on the project ID and other essential settings, not explicit encoding.

2. (Optional) Use a dedicated logging exporter:

If you need more control over the JSON format or require advanced processing, consider a separate logging exporter like logging or fluentd. Configure it with your desired format and Google Cloud Logging details (project ID, log name, etc.).

3. Restart the OpenTelemetry Collector:

After any configuration changes, restarting the Collector ensures the new settings take effect.

4. Verify your logs in Google Cloud Logging:

Once everything is set up and restarted, your telemetry logs should be flowing to Google Cloud in JSON format. You can access and analyze them using the Google Cloud Console or other tools.
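
One related option: the googlecloud exporter also lets you set a default log name, which makes the exported entries easier to find in the Logs Explorer. A small sketch (the log name itself is just an example):

exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    log:
      default_log_name: otel-collector-exported-logs   # example name; entries appear under this log in Cloud Logging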

Hi @ms4446 

Thanks for your answer. Where did you find the information that "the exporter automatically handles formatting for Google Cloud Logging"?

Can I add this to be sure that the logs are in JSON format?
service:
  telemetry:
    logs:
      encoding: json
And then I export like this:
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [googlecloud]

While the OpenTelemetry Collector documentation doesn't explicitly mention "JSON encoding" for the googlecloud exporter, it does imply automatic format handling. This is evident in statements about the exporter "formatting and sending log entries to the Google Cloud Logging API." This suggests adherence to the expected format, typically JSON, for Google Cloud Logging.

Standard Configuration Structure:

You're absolutely right; the standard Collector configuration doesn't include service: telemetry: logs: encoding: json. The Collector primarily focuses on receivers, processors, exporters, and pipelines. The encoding setting typically resides within a dedicated logging exporter, not under the service section.

Exporting Logs to Google Cloud Logging:

The correct approach is to configure the googlecloud exporter within your pipeline. This exporter handles both formatting and exporting of logs to Google Cloud Logging. Here's a recommended configuration:

 
exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    # other configuration options...
service:
  pipelines:
    logs:
      receivers: [your_log_receiver]
      processors: [your_processors]
      exporters: [googlecloud]

This configuration assigns responsibility for log handling and exporting to the googlecloud exporter, eliminating the redundant and potentially misleading encoding: json setting.

Ok thanks for your help @ms4446. I have one last question: what is the receiver when I want to directly export the OpenTelemetry Collector logs?

For the traces I used otlp.

When directly exporting OpenTelemetry collector logs, the specific receiver you'll need depends on your desired export destination and log source:

1. Exporting to Google Cloud Logging:

  • You don't need a separate receiver like filelog for the Collector's logs. They are managed internally and can be directly exported using an appropriate exporter, like googlecloud.
  • Example Configuration:
 
exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    # other configuration options...
service:
  pipelines:
    logs:
      exporters: [googlecloud]

2. Exporting to Another OpenTelemetry Collector:

  • The otlp receiver remains correct for receiving data from another Collector in a tiered architecture.

3. Exporting to a Third-Party Logging System:

  • Specific receivers like fluentforward or loki are used to collect logs from those systems, not for exporting the Collector's own logs.

4. Directly Exporting to a Backend:

  • Exporters like the elasticsearch or kafka exporters are used for direct exports, not receivers. Receivers are for collecting external logs.

Key takeaway:

  • To export the Collector's own logs, configure an appropriate exporter in the logs pipeline without specifying a receiver.
  • The Collector handles internal logs differently than external sources.

Hi @ms4446 

I can't put only an exporter; I have to specify at least one receiver.
Do you have an idea which receiver I should use to export the logs from the OpenTelemetry Collector, please?
Thanks

You can try using the filelog receiver to tail a log file that's not expected to receive any data, essentially acting as a placeholder. This allows you to fulfill the requirement of having a receiver in the pipeline without actually processing external log data.

Here's how you can set it up:

Configure the Filelog Receiver:

  1. Set up the filelog receiver to read from a log file that either doesn't exist or is not expected to receive any log entries. This way, the receiver is active but not processing any meaningful data.

Example Configuration:

 
receivers:
  filelog:
    include: ["/path/to/nonexistent/logfile.log"]

exporters:
  googlecloud:
    project: "YOUR_PROJECT_ID"
    # other configuration options...

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [googlecloud]

Collector's Own Logs:

It's important to note that this setup is a workaround and the primary purpose is to export the Collector's own logs. The filelog receiver in this context is just to satisfy the configuration requirement.

Ok thank you @ms4446. Is it possible to write the OpenTelemetry Collector logs to a log file (which we set up in our Kubernetes deployment) and then give this file to a receiver?

Yes, it is possible to configure the OpenTelemetry Collector to write its own logs to a file and then use the filelog receiver to read from that file. This approach involves two main steps:

1. Configure the OpenTelemetry Collector to Write Logs to a File:

  • In your Kubernetes deployment configuration for the OpenTelemetry Collector, you can set up logging to direct the Collector's logs to a specific file.
  • This is typically done through the Collector's own configuration (its telemetry settings), depending on how logging is set up in the Collector; see the sketch at the end of this reply.

2. Use the filelog Receiver to Read the Log File:

  • Once the Collector's logs are being written to a file, you can use the filelog receiver in the Collector's configuration to read from this log file.
  • The filelog receiver can be configured to tail the log file, allowing it to process and export the logs as they are written.

Here's an example of how this might look in the OpenTelemetry Collector configuration:

 
receivers:
  filelog:
    include: ["/path/to/collector/logs.log"]
exporters:
  # Your exporter configuration (e.g., googlecloud, otlp, etc.)
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [your_exporter]

In this configuration:

  • The filelog receiver is set to read from the specified log file (/path/to/collector/logs.log).
  • The logs pipeline is configured to use the filelog receiver and your chosen exporter.

Important Considerations:

  • Ensure that the log file path in the Kubernetes deployment and the filelog receiver configuration match.
  • Make sure that the OpenTelemetry Collector has the necessary permissions to write to the log file and that the filelog receiver has permissions to read from it.
  • Be aware of the potential for increased resource usage, as the Collector will be both writing to and reading from the log file.
  • This approach is somewhat unconventional, as it involves the Collector processing its own logs. Typically, the Collector's logs are either managed separately or exported directly without being written to a file first.

This setup allows you to use the OpenTelemetry Collector's own logs as a source for the filelog receiver, which can then process and export these logs according to your pipeline configuration.
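
For the first step, the Collector's own telemetry settings can point its logs at a file. This is a hedged sketch based on the service.telemetry.logs options; the path is illustrative and must match the filelog receiver's include path, and the pod needs a writable volume (for example an emptyDir) mounted there:

service:
  telemetry:
    logs:
      level: info
      encoding: json                                    # structured output, convenient for parsing downstream
      output_paths: ["/path/to/collector/logs.log"]     # where the Collector writes its own logs
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [your_exporter]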