Apigee Hybrid 1.6 Diagnostic collector not working

rad

Hi,

I am new to this community, but I've been working with Apigee Hybrid for some time. I have a question about Apigee Hybrid 1.6 and its Diagnostic collector feature. I wanted to capture diagnostics for runtime pods, but the feature doesn't seem to be working.

Here's what I've done (following https://cloud.google.com/apigee/docs/hybrid/v1.6/diagnostic-collector)

1. Created a Google Cloud Storage bucket "rad-apigee" in the EU region.

2. Created a service account with the Storage Admin role in our project and downloaded its JSON key file.

3. Configured overrides.yaml for the Diagnostic collector as follows:

diagnostic:
  # required properties:
  serviceAccountPath: "./service-accounts/apigee-poc-XYZ-apigee-diagnostic.json"
  operation: "LOGGING"
  bucket: "rad-apigee"
  container: "apigee-runtime"
  namespace: "apigee"
  podNames:
  - apigee-runtime-apigee-poc-XYZ-mvp-dev-0c034c1-160-uhaof-94kg9
  - apigee-runtime-apigee-poc-XYZ-mvp-dev-0c034c1-160-uhaof-gq2kz
  # optional properties:
  tcpDumpDetails:
    maxMsgs: 10
    timeoutInSeconds: 100

  threadDumpDetails:
    iterations: 5
    delayInSeconds: 2

  loggingDetails:
    loggerNames:
    - ALL
    logLevel: FINE
    logDuration: 60000

4. Ran Diagnostic collector.

$APIGEECTL_HOME/apigeectl diagnostic -f overrides/overrides.yaml

Parsing file: config/values.yaml
Parsing file: overrides/overrides.yaml

Invoking "kubectl apply" with diagnostic YAML config...

namespace/apigee-diagnostic created
peerauthentication.security.istio.io/apigee-diagnostic created
clusterrole.rbac.authorization.k8s.io/apigee-diagnostic created
serviceaccount/apigee-diagnostic created
clusterrolebinding.rbac.authorization.k8s.io/apigee-diagnostic created
secret/apigee-diagnostic-config created
secret/apigee-diagnostic-svc-account created
job.batch/apigee-diagnostic created

5. Get the pods in the apigee-diagnostic namespace.

kubectl get pods -n apigee-diagnostic

NAME                      READY   STATUS      RESTARTS   AGE
apigee-diagnostic-x7585   0/1     Completed   0          77s

6. Make note of the pod with the name containing diagnostic-collector.

There is no such pod! (The only pod in the namespace is apigee-diagnostic-x7585.)

7. There are no logs in the Google Cloud Storage Bucket.

No rows to display

8. Delete the Diagnostic collector.

$APIGEECTL_HOME/apigeectl diagnostic delete -f overrides/overrides.yaml

Parsing file: config/values.yaml
Parsing file: overrides/overrides.yaml

Invoking "kubectl delete" with diagnostic YAML config...

namespace "apigee-diagnostic" deleted
peerauthentication.security.istio.io "apigee-diagnostic" deleted
clusterrole.rbac.authorization.k8s.io "apigee-diagnostic" deleted
serviceaccount "apigee-diagnostic" deleted
clusterrolebinding.rbac.authorization.k8s.io "apigee-diagnostic" deleted
secret "apigee-diagnostic-config" deleted
secret "apigee-diagnostic-svc-account" deleted
job.batch "apigee-diagnostic" deleted

The Google Cloud Storage bucket is still empty!

I am including output from

kubectl logs -n apigee-diagnostic apigee-diagnostic-x7585

There are some warnings present.

https://drive.google.com/file/d/164n0nnAPbUtVks4RhzLLtFEzJ8FeArGv/view?usp=sharing

I am also including output from

kubectl describe pod -n apigee-diagnostic apigee-diagnostic-x7585

https://drive.google.com/file/d/1xvk7Xx0XcnBV43Z5w6ozpNNBOxL9k8-W/view?usp=sharing

Could you please advise on the possible root cause of why this is not working for me? Thank you very much in advance for any help!


Let me see if I can find someone to help....

Hi rad, 

I see that the operation is defined like below. 

operation: "LOGGING"

When the operation is LOGGING, nothing will be uploaded to GCS. Only the log levels are changed, and the logs are sent directly to the Logs Viewer in the GCP control plane, or you can check them locally if the logger is disabled.

If you want to see what data gets uploaded, please change the operation to ALL. This will collect all the required data and upload it to GCS.
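The change suggested above can be sketched as a small shell helper. This is only an illustration: `set_diagnostic_operation` is a hypothetical function, not part of apigeectl, and it assumes the overrides file has a single `operation:` key (the one in the `diagnostic:` stanza shown in the original post), because it rewrites every line that starts with that key.

```shell
# Hypothetical helper (not part of apigeectl): rewrite the value of the
# "operation:" key in an overrides file, keeping its indentation.
# Assumption: "operation:" appears only inside the diagnostic stanza.
set_diagnostic_operation() {
  file="$1"
  op="$2"
  tmp="$(mktemp)"
  # Replace everything after "operation:" with the quoted new value.
  sed "s/^\([[:space:]]*operation:\).*/\1 \"$op\"/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"   # write back in place, preserving file permissions
  rm -f "$tmp"
}

# Example usage (paths taken from the original post):
#   set_diagnostic_operation overrides/overrides.yaml ALL
#   "$APIGEECTL_HOME"/apigeectl diagnostic -f overrides/overrides.yaml
```

Editing the file by hand works just as well; the helper only makes it easier to switch between LOGGING and ALL when repeating the test.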

rad

Hi pbhagwat,

Thank you for your advice!
I've tried it exactly as you recommended, but there is still no data in the Google Cloud Storage bucket. After activating the Diagnostic collector, I checked all the pods and noticed the following:

NAMESPACE           NAME                                                              READY   STATUS         RESTARTS   AGE
apigee-diagnostic   apigee-diagnostic-6sgrk                                           1/1     Running        0          11s
apigee              diagnose-aks-apigeertime-21752489-vmss000001                      0/1     ErrImagePull   0          5s
apigee              diagnose-aks-apigeertime-21752489-vmss000002                      0/1     ErrImagePull   0          5s

The status of the diagnose-aks-apigeertime pods is "ErrImagePull". I then ran kubectl describe pod -n apigee diagnose-aks-apigeertime-21752489-vmss000001; this is the Events section:

Events:
  Type     Reason   Age                From     Message
  ----     ------   ----               ----     -------
  Normal   BackOff  26s                kubelet  Back-off pulling image "us.gcr.io/google.com/edge-ci/base/edge-hybrid:diagnostics-datacollector_871f8e4"
  Warning  Failed   26s                kubelet  Error: ImagePullBackOff
  Normal   Pulling  14s (x2 over 29s)  kubelet  Pulling image "us.gcr.io/google.com/edge-ci/base/edge-hybrid:diagnostics-datacollector_871f8e4"
  Warning  Failed   12s (x2 over 26s)  kubelet  Failed to pull image "us.gcr.io/google.com/edge-ci/base/edge-hybrid:diagnostics-datacollector_871f8e4": rpc error: code = Unknown desc = failed to pull and unpack image "us.gcr.io/google.com/edge-ci/base/edge-hybrid:diagnostics-datacollector_871f8e4": failed to resolve reference "us.gcr.io/google.com/edge-ci/base/edge-hybrid:diagnostics-datacollector_871f8e4": unexpected status code [manifests diagnostics-datacollector_871f8e4]: 401 Unauthorized
  Warning  Failed   12s (x2 over 26s)  kubelet  Error: ErrImagePull

Any idea about this?  Thank you!
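For anyone reproducing this, the check described above (listing all pods and spotting the ones that cannot pull their image) can be sketched as a small filter. `image_pull_failures` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...). It only lists the affected pods; it does not explain why the registry answered 401 Unauthorized.

```shell
# Hypothetical filter: read `kubectl get pods -A` output on stdin and print
# "namespace/name" for every pod whose STATUS column reports an image pull
# failure. Assumes the default column order of `kubectl get pods -A`.
image_pull_failures() {
  awk '$4 == "ErrImagePull" || $4 == "ImagePullBackOff" { print $1 "/" $2 }'
}

# Example usage (requires access to the cluster):
#   kubectl get pods -A | image_pull_failures
# Then print the image each failing pod references, for example:
#   kubectl get pod -n apigee <pod-name> -o jsonpath='{.spec.containers[*].image}'
```

Comparing the printed image reference with the registries the cluster nodes are allowed to pull from is one way to narrow down where the 401 comes from.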

rad

Hi Folks,

Any update on this?

Thank you!