Hey all,
After installing the Ops Agent, google-cloud-ops-agent-opentelemetry-collector.service fails to start. Here are the logs from `sudo journalctl -u google-cloud-ops-agent-opentelemetry-collector.service`:
...
Jun 02 14:33:45 pbx1 systemd[1]: Started Google Cloud Ops Agent - Metrics Agent.
Jun 02 14:33:45 pbx1 otelopscol[4714]: 2023-06-02T14:33:45.625Z info service/telemetry.go:90 Setting up own telemetry...
Jun 02 14:33:45 pbx1 otelopscol[4714]: 2023-06-02T14:33:45.627Z info service/telemetry.go:116 Serving Prometheus metrics {"address": "0.0.0.0:20201", "level": "Basic"}
Jun 02 14:33:45 pbx1 otelopscol[4714]: Error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials
Jun 02 14:33:45 pbx1 otelopscol[4714]: 2023/06/02 14:33:45 application run finished with error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Main process exited, code=exited, status=1/FAILURE
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Failed with result 'exit-code'.
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Scheduled restart job, restart counter is at 5.
Jun 02 14:33:45 pbx1 systemd[1]: Stopped Google Cloud Ops Agent - Metrics Agent.
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Start request repeated too quickly.
Jun 02 14:33:45 pbx1 systemd[1]: google-cloud-ops-agent-opentelemetry-collector.service: Failed with result 'exit-code'.
Jun 02 14:33:45 pbx1 systemd[1]: Failed to start Google Cloud Ops Agent - Metrics Agent.
I tried everything in the documentation, without luck.
I think it stopped working after I ran `gcloud auth application-default login`, but I'm not sure.
Any idea how to fix this?
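The error says no project was found in the config or through Application Default Credentials (ADC), so one thing worth checking is what the credentials on the machine actually contain. The sketch below is only a diagnostic aid, and the function names `adc_field` and `metadata_project_id` are made up for illustration. It assumes the usual ADC file layout: service-account keys carry a `project_id` field, while user credentials written by `gcloud auth application-default login` have type `authorized_user` and normally carry no `project_id`. Whether that is the cause in this thread is not confirmed.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: inspect ADC files and the metadata server for a project ID.
# Function names are illustrative, not part of any Google tool.

# Print the value of a top-level string field (e.g. type, project_id)
# from a credentials JSON file. Prints nothing if the field is absent.
adc_field() {
  local field="$1" file="$2"
  [ -f "$file" ] || return 1
  sed -n 's/.*"'"$field"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1
}

# Ask the Compute Engine metadata server for the project ID.
# Only answers on a Google Cloud VM; fails quietly elsewhere.
metadata_project_id() {
  curl -s -f -m 2 -H 'Metadata-Flavor: Google' \
    'http://metadata.google.internal/computeMetadata/v1/project/project-id'
}

# Example (run as the user the service runs as, typically root):
#   adc_field type "$HOME/.config/gcloud/application_default_credentials.json"
#   adc_field project_id "${GOOGLE_APPLICATION_CREDENTIALS:-/dev/null}"
#   metadata_project_id
```

If the file reports `authorized_user` and no `project_id`, while `metadata_project_id` does print a project, that would at least be consistent with the login having changed which credentials the collector picks up.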
Hello @tal952,
Welcome to Google Cloud Community!
Your Ops Agent configuration might be invalid. Could you post the exact output of the following command, so we can see the exact error message?
sudo journalctl -xe | grep "google_cloud_ops_agent_engine"
Thanks!
Hey, sorry, I missed your message.
I've partly solved the issue, but not completely.
Here is the status I'm getting:
sudo systemctl status google-cloud-ops-agent-opentelemetry-collector.service
● google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static)
Active: failed (Result: exit-code) since Fri 2023-06-30 19:19:47 UTC; 5s ago
Process: 136480 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
Process: 136488 ExecStart=/opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=${RUNTIME_DIRECTORY}/otel.yaml (code=exited, status=1/FAILURE)
Main PID: 136488 (code=exited, status=1/FAILURE)
CPU: 319ms
As we can see, ExecStartPre runs fine, but ExecStart fails.
When I run the ExecStart command on its own, I get:
/opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml
2023-06-30T19:24:37.111Z info service/telemetry.go:90 Setting up own telemetry...
2023-06-30T19:24:37.111Z info service/telemetry.go:116 Serving Prometheus metrics {"address": "0.0.0.0:20201", "level": "Basic"}
Error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials
2023/06/30 19:24:37 application run finished with error: failed to build pipelines: failed to create "googlecloud" exporter for data type "metrics": no project set in config, or found with application default credentials
For some unknown reason, the collector can't determine the project.
When I add the project to the exporter, it starts working:
# Modifying /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml
---------------- From --------------------
exporters:
  googlecloud:
    metric:
      instrumentation_library_labels: false
      prefix: ""
      resource_filters: []
      service_resource_labels: false
      skip_create_descriptor: true
    retry_on_failure:
      enabled: false
    user_agent: Google-Cloud-Ops-Agent-Metrics/2.33.0 (BuildDistro=bullseye;Platform=linux;ShortName=debian;ShortVersion=11.7)
----------------- To ---------------------
exporters:
  googlecloud:
    project: my-project
    metric:
      instrumentation_library_labels: false
      prefix: ""
      resource_filters: []
      service_resource_labels: false
      skip_create_descriptor: true
    retry_on_failure:
      enabled: false
    user_agent: Google-Cloud-Ops-Agent-Metrics/2.33.0 (BuildDistro=bullseye;Platform=linux;ShortName=debian;ShortVersion=11.7)
As you can see, I defined the project explicitly.
The problem now is that ExecStartPre regenerates this file on every start, so the change only sticks if I comment ExecStartPre out. Any idea why the project isn't detected automatically?
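Because the file is regenerated on each start, the manual edit has to be reapplied after ExecStartPre has run. A minimal sketch of that step, assuming the generated file uses the layout shown above (a `googlecloud:` key on its own line, children indented two spaces deeper); `insert_project` is a hypothetical helper name, not something shipped with the Ops Agent:

```shell
#!/usr/bin/env bash
# Sketch: add "project: <id>" under the "googlecloud:" exporter key.
# Reads YAML on stdin, writes the result to stdout.
# If the line right below the key already sets project, nothing is added.
insert_project() {
  awk -v project="$1" '
    # The previous line was the googlecloud key: insert unless already set.
    pending {
      if ($0 !~ /^[ \t]*project:/) print indent "  project: " project
      pending = 0
    }
    # Remember the indentation of the key so the new line nests under it.
    /^[ \t]*googlecloud:[ \t]*$/ {
      match($0, /^[ \t]*/)
      indent = substr($0, 1, RLENGTH)
      pending = 1
    }
    { print }
    # The key was the last line of the file.
    END { if (pending) print indent "  project: " project }
  '
}

# Example:
#   f=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml
#   insert_project my-project < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
```

Running it twice gives the same result as running it once, which matters when the service restarts repeatedly.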
Hello, have you fixed this yet? I'm running into the same problem, so if you have a fix, could you share it?
Thank you
Hey, yes, I fixed it with a patch.
Replace the contents of this file:
/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service
with:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[Unit]
Description=Google Cloud Ops Agent - Metrics Agent
PartOf=google-cloud-ops-agent.service
Requires=network-online.target
After=network-online.target google-cloud-ops-agent.service
[Service]
RuntimeDirectory=google-cloud-ops-agent-opentelemetry-collector
StateDirectory=google-cloud-ops-agent/opentelemetry-collector
LogsDirectory=google-cloud-ops-agent/subagents
Type=simple
ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY}
ExecStart=/bin/bash -c '/opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml || awk '\''/googlecloud:/ {print; print "    project: YOUR_PROJECT_ID"; next }1'\'' /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml > /run/google-cloud-ops-agent-opentelemetry-collector/temp && mv /run/google-cloud-ops-agent-opentelemetry-collector/temp /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml && /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml'
Restart=always
# For debugging:
RuntimeDirectoryPreserve=yes
The only thing you need to change is YOUR_PROJECT_ID in ExecStart.
Hey, could you explain why Type=simple is needed for the collector?