Hello there,
I'm facing issues with deploying new revisions of proxies to the runtime. The deployment gets stuck at 0% and can stay like this forever. I can undeploy the current revision, but I cannot deploy either new or old ones.
It's an on-prem multi-region (two datacenters) setup with region-dedicated environments and environment groups - they are specific to one region only. Cassandra sync is enabled and it does not report any problems. No ApigeeIssues reported. App version 1.11.
Synchronizer access to management plane checked. Service account used by it also verified against required roles.
I have checked the environment's synchronizer pod, and some files are gradually saved to /application/var/tmp after I restart the pod. There is plenty of filesystem space available on all nodes. The synchronizer logs repeatedly show a block of three SEVERE entries for the following classes:
apigee-synchronizer {"level":"SEVERE","thread":"Apigee-Timer-7","mdc":{"action":"SYNC","contextid":"2728","env":"@@REDACTED@@","org":"@@REDACTED@@"},"className":"com.apigee.hybrid.runtime.contract.sync.context.MasterArtifactDownloader","method":"download","severity":"SEVERE","message":"failed to download gs://apigee-@@REDACTED@@ from GCS to /application/var/tmp/artifact_8939489889449048897art. Failing the replication","formattedDate":"2024-04-10T14:39:22.619Z","logger":"MasterArtifactDownloader","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.replication.DownloadException{ code = runtime.contract.sync.DownloadError, message = Error downloading gs://apigee-@@REDACTED@@ cause : Connection reset, associated contexts = []}\n"}
apigee-synchronizer {"level":"SEVERE","thread":"Apigee-Timer-7","mdc":{"action":"SYNC","contextid":"2728","env":"@@REDACTED@@","org":"@@REDACTED@@"},"className":"com.apigee.hybrid.runtime.contract.sync.context.ControlPlaneReplicationContext","method":"download","severity":"SEVERE","message":"Error in downloading uri gs://apigee-@@REDACTED@@ to file /application/var/tmp/artifact_8939489889449048897art","formattedDate":"2024-04-10T14:39:22.619Z","logger":"MasterArtifactDownloader","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.replication.DownloadException{ code = runtime.contract.sync.DownloadError, message = Error downloading gs://apigee-@@REDACTED@@ cause : Connection reset, associated contexts = []}\n"}
apigee-synchronizer {"level":"SEVERE","thread":"Apigee-Timer-7","mdc":{"action":"SYNC","contextid":"2728","env":"@@REDACTED@@","org":"@@REDACTED@@"},"className":"com.apigee.hybrid.runtime.contract.sync.replicators.ControlPlaneToCassandraContractReplicatorImpl","method":"lambda$replicateContract$0","severity":"SEVERE","message":"error in downloading artifact gs://apigee-@@REDACTED@@","formattedDate":"2024-04-10T14:39:22.620Z","logger":"CONTRACT-REPLICATION","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.replication.DownloadException{ code = runtime.contract.sync.DownloadError, message = Error downloading gs://apigee-@@REDACTED@@ cause : Error downloading gs://apigee-@@REDACTED@@ cause : Connection reset, associated contexts = []}\n"}
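In case it helps anyone comparing their own logs: below is a minimal Python sketch that groups SEVERE entries by class and root cause, to confirm that every entry in the block traces back to the same "Connection reset". It assumes each log line carries a JSON object after the container-name prefix, as in the three lines above; the function names `parse_line` and `summarize` are made up for illustration.

```python
import json
import re
from collections import Counter

# Matches the innermost "cause : <text>, associated contexts" fragment
# found in the exceptionStackTrace field of the entries above.
CAUSE_RE = re.compile(r"cause : ([^,:]+), associated contexts")


def parse_line(line):
    """Return the JSON payload of one log line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def summarize(lines):
    """Count SEVERE entries per (short class name, root cause)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if not entry or entry.get("level") != "SEVERE":
            continue
        short_class = entry.get("className", "?").rsplit(".", 1)[-1]
        match = CAUSE_RE.search(entry.get("exceptionStackTrace", ""))
        cause = match.group(1).strip() if match else "unknown"
        counts[(short_class, cause)] += 1
    return counts
```

With the pod logs saved to a file, something like `summarize(open("synchronizer.log"))` (the file name is only an example) returns a counter keyed by class and cause, which makes it easier to spot any entry whose cause is something other than a reset.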
I checked HTTP connectivity to storage.googleapis.com from the synchronizer pod using curl and I get a proper response. Telnet to storage.googleapis.com 443 ends with "Connection closed by foreign host". I don't know how to check gs:// connectivity directly from the pod, tbh.
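As far as I understand, gs:// objects are fetched over HTTPS from storage.googleapis.com rather than over a separate protocol, so one way to narrow this down is to see at which stage a connection breaks: TCP connect, TLS handshake, or the HTTP exchange itself. Here is a minimal Python sketch of such a probe. It assumes a Python interpreter is available somewhere on the same network path (the synchronizer image may not ship one), the function name `probe` is made up for illustration, and it does not reproduce any proxy settings or client behaviour of the synchronizer itself.

```python
import socket
import ssl


def probe(host="storage.googleapis.com", port=443, timeout=10.0):
    """Return (stage, detail): the last stage reached and what happened there."""
    # Stage 1: plain TCP connect.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", f"failed: {exc!r}")

    # Stage 2: TLS handshake with certificate and hostname verification.
    context = ssl.create_default_context()
    try:
        tls = context.wrap_socket(sock, server_hostname=host)
    except OSError as exc:  # ssl.SSLError and ConnectionResetError are OSErrors
        sock.close()
        return ("tls", f"failed: {exc!r}")

    # Stage 3: a tiny HTTP request; any status line means the path works.
    with tls:
        request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        try:
            tls.sendall(request.encode("ascii"))
            status_line = tls.recv(4096).split(b"\r\n", 1)[0]
        except OSError as exc:
            return ("http", f"failed: {exc!r}")
    return ("http", status_line.decode("ascii", "replace"))


if __name__ == "__main__":
    print(probe())
```

A failure at the "tls" or "http" stage with a reset would point towards something on the path interfering after the TCP connection is established. A clean result only shows that a small request gets through, so it does not rule out resets during larger artifact downloads.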
Might this be a firewall case with TCP being blocked? While I'm waiting for my network team to check that on their end I'm asking for any hints here in parallel. Much obliged.
I came across this same issue with Apigee Hybrid 1.9.4. Have you found the reason for this?
@mi-sie Hello,
I am facing the same issue. How can I resolve this problem?
Thanks.
Hi,
"Connection reset" in the Synchronizer logs is almost always a firewall or networking issue between your Apigee install and Google Cloud Storage.
The other most common cause of Synchronizer failures is that the correct permissions have not been given to the Synchronizer service account. You can check that the correct permissions have been set per the instructions at: