Ops Agent: google-cloud-ops-agent-opentelemetry-collector.service fails to start

Hey all,

After installing the Ops Agent, google-cloud-ops-agent-opentelemetry-collector.service fails to start. Here is the output of `sudo journalctl -u google-cloud-ops-agent-opentelemetry-collector.service`:

...
Jun 02 14:33:45 pbx1 systemd[1]: Started Google Cloud Ops Agent - Metrics Agent.
Jun 02 14:33:45 pbx1 otelopscol[4714]: 2023-06-02T14:33:45.625Z        info        service/telemetry.go:90        Setting up own telemetry...
Jun 02 14:33:45 pbx1 otelopscol[4714]: 2023-06-02T14:33:45.627Z        info        service/telemetry.go:116        Serving Prometheus metrics        {"address": "0.0.0.0:20201", "level": "Basic"}
Jun 02 14:33:45 pbx1 otelopscol[4714]: Error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials
Jun 02 14:33:45 pbx1 otelopscol[4714]: 2023/06/02 14:33:45 application run finished with error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Main process exited, code=exited, status=1/FAILURE
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Failed with result 'exit-code'.
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Scheduled restart job, restart counter is at 5.
Jun 02 14:33:45 pbx1 systemd[1]: Stopped Google Cloud Ops Agent - Metrics Agent.
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Start request repeated too quickly.
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Failed with result 'exit-code'.
Jun 02 14:33:45 pbx1 systemd[1]: Failed to start Google Cloud Ops Agent - Metrics Agent.

I tried everything in the documentation without luck.

I think it stopped working after I ran `gcloud auth application-default login`, but I'm not sure.
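A diagnostic sketch for that hunch, not something posted in this thread: the exporter says there was "no project set in config, or found with application default credentials". One plausible cause, which is an assumption and not confirmed here, is that an Application Default Credentials file created by `gcloud auth application-default login` is picked up before the VM's metadata server; such user credentials (`"type": "authorized_user"`) normally carry no `project_id`. The script prints the `project_id` (if any) from the file named by `GOOGLE_APPLICATION_CREDENTIALS` and from root's default ADC path `/root/.config/gcloud/application_default_credentials.json` (assumed relevant because the agent runs as root), then asks the Compute Engine metadata server for the project ID. `adc_project_id` and `check_project_sources` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: which sources could give otelopscol a project ID?
# Assumptions: the agent runs as root (so root's gcloud ADC path matters)
# and curl is installed. Run as root so /root is readable.

# Hypothetical helper: print the "project_id" value from an ADC JSON file,
# or "none" if the path is empty, unreadable, or has no such key.
adc_project_id() {
  local file="$1" pid
  if [ -z "$file" ] || [ ! -r "$file" ]; then
    echo "none"
    return 0
  fi
  # The leading quote in "project_id" keeps "quota_project_id" from matching.
  pid=$(sed -n 's/.*"project_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)
  echo "${pid:-none}"
}

# Hypothetical helper: report each candidate source of a project ID.
check_project_sources() {
  local f meta
  for f in "${GOOGLE_APPLICATION_CREDENTIALS:-}" \
           /root/.config/gcloud/application_default_credentials.json; do
    [ -n "$f" ] || continue
    if [ -e "$f" ]; then
      echo "ADC file $f: project_id=$(adc_project_id "$f")"
    else
      echo "ADC file $f: not present"
    fi
  done
  # The metadata server is only reachable from a Compute Engine VM.
  meta=$(curl -s --max-time 2 -H 'Metadata-Flavor: Google' \
    'http://metadata.google.internal/computeMetadata/v1/project/project-id') \
    || meta="unreachable"
  echo "metadata server project-id: ${meta:-empty}"
}

check_project_sources
```

If a root ADC file exists and reports `project_id=none` while the metadata server returns your project, that would be consistent with the hypothesis above, though it does not prove it.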

Any idea how to fix this?

Solved (1 accepted solution)

5 replies

Hello @tal952,

Welcome to Google Cloud Community!

There might be an invalid configuration in your Ops Agent. Can you please run the following command and post its output, so we can see the exact error message?

sudo journalctl -xe | grep "google_cloud_ops_agent_engine"

Thanks!

Hey, sorry, I missed your message.
I semi-solved the issue, but not fully.

Here is the status that I'm getting:

sudo systemctl status google-cloud-ops-agent-opentelemetry-collector.service

● google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
     Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static)
     Active: failed (Result: exit-code) since Fri 2023-06-30 19:19:47 UTC; 5s ago
    Process: 136480 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
    Process: 136488 ExecStart=/opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=${RUNTIME_DIRECTORY}/otel.yaml (code=exited, status=1/FAILURE)
   Main PID: 136488 (code=exited, status=1/FAILURE)
        CPU: 319ms

As you can see, ExecStartPre succeeds but ExecStart fails.
When I run the ExecStart command on its own, I get:

/opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml

2023-06-30T19:24:37.111Z        info    service/telemetry.go:90 Setting up own telemetry...
2023-06-30T19:24:37.111Z        info    service/telemetry.go:116        Serving Prometheus metrics      {"address": "0.0.0.0:20201", "level": "Basic"}
Error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials
2023/06/30 19:24:37 application run finished with error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials

For some reason, the collector can't determine the project.
When I add the project to the exporter config, it starts working:

# Modifying /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml

---------------- From --------------------

exporters:
  googlecloud:
    metric:
      instrumentation_library_labels: false
      prefix: ""
      resource_filters: []
      service_resource_labels: false
      skip_create_descriptor: true
    retry_on_failure:
      enabled: false
    user_agent: Google-Cloud-Ops-Agent-Metrics/2.33.0 (BuildDistro=bullseye;Platform=linux;ShortName=debian;ShortVersion=11.7)

----------------- To ---------------------

exporters:
  googlecloud:
    project: my-project
    metric:
      instrumentation_library_labels: false
      prefix: ""
      resource_filters: []
      service_resource_labels: false
      skip_create_descriptor: true
    retry_on_failure:
      enabled: false
    user_agent: Google-Cloud-Ops-Agent-Metrics/2.33.0 (BuildDistro=bullseye;Platform=linux;ShortName=debian;ShortVersion=11.7)

As you can see, I set the project explicitly.
The problem now is that ExecStartPre regenerates this file, so I have to comment ExecStartPre out for it to work. Any idea why the project isn't recognized?
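The manual edit in the post above can be scripted. Below is a sketch, not taken from this thread: the hypothetical function `add_otel_project` inserts a `project:` line directly under the `googlecloud:` exporter key (indented two spaces deeper than that key) and skips the insert when the line right after the key already starts with `project:`. It assumes the generated otel.yaml is laid out like the snippet above, which may differ between Ops Agent versions. Because ExecStartPre regenerates otel.yaml on every start, a manual run like this only lasts until the next restart, which is the same limitation described in the post.

```shell
#!/usr/bin/env bash
# Sketch: insert "project: <id>" under the googlecloud exporter in otel.yaml.
# Assumption: the exporter key sits on its own line as (indent)googlecloud:
# and its child keys are indented two more spaces, as in the snippet above.
add_otel_project() {
  local file="$1" project="$2" tmp
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  awk -v project="$project" '
    # Print the exporter key, remember its indentation, queue a project line.
    /^[ \t]*googlecloud:[ \t]*$/ {
      print
      indent = $0
      sub(/googlecloud:.*/, "", indent)
      pending = indent "  project: " project
      next
    }
    # On the next line, emit the queued line unless a project is already set there.
    pending != "" {
      if ($0 !~ /^[ \t]*project:/) print pending
      pending = ""
    }
    { print }
    # Exporter key on the last line: still emit the queued project line.
    END { if (pending != "") print pending }
  ' "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}
```

For example, as root: `add_otel_project /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml my-project`, where `my-project` stands for your project ID.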

Hello, have you fixed it yet? I'm running into the same problem; could you share your fix if you have one?

Thank you

Hey, yes, I fixed it with a patch.

Replace the contents of the file

/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service

with:

# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=Google Cloud Ops Agent - Metrics Agent
PartOf=google-cloud-ops-agent.service
Requires=network-online.target
After=network-online.target google-cloud-ops-agent.service

[Service]
RuntimeDirectory=google-cloud-ops-agent-opentelemetry-collector
StateDirectory=google-cloud-ops-agent/opentelemetry-collector
LogsDirectory=google-cloud-ops-agent/subagents
Type=simple
ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY}
# Patched ExecStart: start the collector. If it exits with an error (for example
# because no project was found), insert "project: YOUR_PROJECT_ID" after each line
# containing "googlecloud:" in the generated otel.yaml, then start the collector again.
ExecStart=/bin/bash -c '/opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml || awk '\''/googlecloud:/ {print; print "    project: YOUR_PROJECT_ID"; next }1'\'' /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml > /run/google-cloud-ops-agent-opentelemetry-collector/temp && mv /run/google-cloud-ops-agent-opentelemetry-collector/temp /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml && /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml'
Restart=always
# For debugging:
RuntimeDirectoryPreserve=yes

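Only replace YOUR_PROJECT_ID in the ExecStart line with your project ID. After editing the unit file, run `sudo systemctl daemon-reload` and then `sudo systemctl restart google-cloud-ops-agent` so systemd picks up the change.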
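A variant of the same workaround, sketched here as an assumption rather than taken from this thread or tested against the Ops Agent: instead of editing the packaged unit under /lib/systemd/system (which a package upgrade may overwrite), add a systemd drop-in. systemd appends ExecStartPre= lines from drop-ins to the unit's own list, so the extra step runs after the config generator has written otel.yaml and before otelopscol starts; the original ExecStart stays untouched and the collector no longer has to fail once before being patched. The helper path `/usr/local/sbin/ops-agent-add-project`, the drop-in name `10-project.conf`, and the function `install_project_patch` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: install a helper script plus a systemd drop-in that patches the
# generated otel.yaml after the unit's own ExecStartPre has written it.
# Hypothetical names: /usr/local/sbin/ops-agent-add-project and 10-project.conf.
# ROOT is "/" for a real install; any other directory is useful for a dry run
# (the drop-in always points at the real /usr/local/sbin path).
install_project_patch() {
  local root="$1" project="$2"
  local helper="$root/usr/local/sbin/ops-agent-add-project"
  local dropin_dir="$root/etc/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service.d"
  mkdir -p "$(dirname "$helper")" "$dropin_dir" || return 1

  # Helper: insert "project: <id>" under every googlecloud: key in otel.yaml.
  cat > "$helper" <<'EOF'
#!/bin/sh
set -e
f="${RUNTIME_DIRECTORY:-/run/google-cloud-ops-agent-opentelemetry-collector}/otel.yaml"
awk -v project="$1" '
  { print }
  /^[ \t]*googlecloud:[ \t]*$/ {
    indent = $0
    sub(/googlecloud:.*/, "", indent)
    print indent "  project: " project
  }
' "$f" > "$f.tmp"
mv "$f.tmp" "$f"
EOF
  chmod 755 "$helper" || return 1

  # Drop-in: ExecStartPre= entries from drop-ins are appended, so this runs
  # after the config generator and before otelopscol starts.
  printf '[Service]\nExecStartPre=/usr/local/sbin/ops-agent-add-project %s\n' \
    "$project" > "$dropin_dir/10-project.conf"
}
```

For a real install, call the function as root with `/` as the first argument and your project ID as the second, then run `sudo systemctl daemon-reload` and `sudo systemctl restart google-cloud-ops-agent`. `systemctl cat google-cloud-ops-agent-opentelemetry-collector.service` shows the unit together with its drop-ins.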

Hey, could you explain why Type=simple is needed for the collector?
