Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Java program hangs by logger write to STDOUT inside GKE pod

Hi all,

I'm facing a bug using the logback on GCP.
we have 30 number java programs and which are same version code running inside the GKE pod, it is been observed the case around 12 number JVM hangs becomes unresponsive at the same time , but doesn't meet this case every day. 😭

Thread dump analysis confirm them hangs with console logging.

Environment details below

Kernel version: 5.10.162+
OS image: Container-Optimized OS from Google
Container runtime version: docker://20.10.6
kubelet version: v1.22.17-gke.3100
kube-proxy version: v1.22.17-gke.3100

Please let me know if u have idea , thanks!! 😋

Full thread dump :

2023-05-24 12:28:36
Full thread dump OpenJDK 64-Bit Server VM (11.0.16+8-LTS mixed mode, sharing):

Threads class SMR info:
_java_thread_list=0x00007f67585df830, length=21, elements={
0x00007f67b026a000, 0x00007f67b026c000, 0x00007f67b0271800, 0x00007f67b0273800,
0x00007f67b0275800, 0x00007f67b0277800, 0x00007f67b02ae000, 0x00007f67b0771800,
0x00007f67b098c800, 0x00007f67b0990000, 0x00007f67b0a7c800, 0x00007f67b0a95000,
0x00007f67b0a99800, 0x00007f67b0b02800, 0x00007f67b0b04800, 0x00007f67b08c4000,
0x00007f67b08c5800, 0x00007f67b0604000, 0x00007f67b0029000, 0x00007f671c006800,
0x00007f6720006800
}

"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=1.64ms elapsed=19580.63s tid=0x00007f67b026a000 nid=0x8d4 waiting on condition [0x00007f6794e14000]
 java.lang.Thread.State: RUNNABLE
 at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.16/Native Method)
 at java.lang.ref.Reference.processPendingReferences(java.base@11.0.16/Unknown Source)
 at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.16/Unknown Source)

"Finalizer" #3 daemon prio=8 os_prio=0 cpu=1.14ms elapsed=19580.63s tid=0x00007f67b026c000 nid=0x8d5 in Object.wait() [0x00007f6794d13000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(java.base@11.0.16/Native Method)
 - waiting on <0x0000000680039b08> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(java.base@11.0.16/Unknown Source)
 - waiting to re-lock in wait() <0x0000000680039b08> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(java.base@11.0.16/Unknown Source)
 at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.16/Unknown Source)

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.36ms elapsed=19580.63s tid=0x00007f67b0271800 nid=0x8d9 waiting on condition [0x0000000000000000]
 java.lang.Thread.State: RUNNABLE

"Service Thread" #5 daemon prio=9 os_prio=0 cpu=0.11ms elapsed=19580.63s tid=0x00007f67b0273800 nid=0x8db runnable [0x0000000000000000]
 java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #6 daemon prio=9 os_prio=0 cpu=13231.58ms elapsed=19580.63s tid=0x00007f67b0275800 nid=0x8dc waiting on condition [0x0000000000000000]
 java.lang.Thread.State: RUNNABLE
 No compile task

"Sweeper thread" #8 daemon prio=9 os_prio=0 cpu=1.89ms elapsed=19580.63s tid=0x00007f67b0277800 nid=0x8dd runnable [0x0000000000000000]
 java.lang.Thread.State: RUNNABLE

"Common-Cleaner" #9 daemon prio=8 os_prio=0 cpu=18.79ms elapsed=19580.60s tid=0x00007f67b02ae000 nid=0x8e0 in Object.wait() [0x00007f679470c000]
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(java.base@11.0.16/Native Method)
 - waiting on <no object reference available>
 at java.lang.ref.ReferenceQueue.remove(java.base@11.0.16/Unknown Source)
 - waiting to re-lock in wait() <0x0000000680039cc0> (a java.lang.ref.ReferenceQueue$Lock)
 at jdk.internal.ref.CleanerImpl.run(java.base@11.0.16/Unknown Source)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)
 at jdk.internal.misc.InnocuousThread.run(java.base@11.0.16/Unknown Source)

"TCPTransmitterWatchDog-127.0.0.1:10204" #11 daemon prio=5 os_prio=0 cpu=1378.51ms elapsed=19579.41s tid=0x00007f67b0771800 nid=0x8eb in Object.wait() [0x00007
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(java.base@11.0.16/Native Method)
 - waiting on <no object reference available>
 at java.util.TimerThread.mainLoop(java.base@11.0.16/Unknown Source)
 - waiting to re-lock in wait() <0x000000068003ac10> (a java.util.TaskQueue)
 at java.util.TimerThread.run(java.base@11.0.16/Unknown Source)

"FileDataLogAppenderLogFileRotate" #12 daemon prio=5 os_prio=0 cpu=103.55ms elapsed=19578.97s tid=0x00007f67b098c800 nid=0x8ee in Object.wait() [0x00007f679410
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(java.base@11.0.16/Native Method)
 - waiting on <no object reference available>
 at java.util.TimerThread.mainLoop(java.base@11.0.16/Unknown Source)
 - waiting to re-lock in wait() <0x0000000680039e88> (a java.util.TaskQueue)
 at java.util.TimerThread.run(java.base@11.0.16/Unknown Source)

"FixedRateTimerSchedulerThread" #13 daemon prio=5 os_prio=0 cpu=2315.76ms elapsed=19578.97s tid=0x00007f67b0990000 nid=0x8ef in Object.wait() [0x00007f674fb1e0
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(java.base@11.0.16/Native Method)
 - waiting on <no object reference available>
 at java.util.TimerThread.mainLoop(java.base@11.0.16/Unknown Source)
 - waiting to re-lock in wait() <0x000000068003acd8> (a java.util.TaskQueue)
 at java.util.TimerThread.run(java.base@11.0.16/Unknown Source)

"kafka-producer-network-thread | producer-1" #14 daemon prio=5 os_prio=0 cpu=570.32ms elapsed=19578.66s tid=0x00007f67b0a7c800 nid=0x8f2 runnable [0x00007f674f
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPoll.wait(java.base@11.0.16/Native Method)
 at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.16/Unknown Source)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.16/Unknown Source)
 - locked <0x000000068003b5d8> (a sun.nio.ch.Util$2)
 - locked <0x000000068003b580> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(java.base@11.0.16/Unknown Source)
 at org.apache.kafka.common.network.Selector.select(Selector.java:874)
 at org.apache.kafka.common.network.Selector.poll(Selector.java:465)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:560)
 at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)

"AsyncKafkaTransmitter-CMCClientAppEvtMsgTransmitter" #15 prio=5 os_prio=0 cpu=25.78ms elapsed=19578.65s tid=0x00007f67b0a95000 nid=0x8f4 waiting on condition 
 java.lang.Thread.State: WAITING (parking)
 at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
 - parking to wait for <0x000000068003a060> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(java.base@11.0.16/Unknown Source)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.16/Unknown Source)
 at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.16/Unknown Source)
 at com.xxxx.nxgen.cmc.comm.AsyncKafkaTransmitter.run(AsyncKafkaTransmitter.java:77)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)

"kafka-producer-network-thread | producer-2" #16 daemon prio=5 os_prio=0 cpu=3820.23ms elapsed=19578.63s tid=0x00007f67b0a99800 nid=0x8f5 runnable [0x00007f674
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPoll.wait(java.base@11.0.16/Native Method)
 at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.16/Unknown Source)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.16/Unknown Source)
 - locked <0x000000068003a2f0> (a sun.nio.ch.Util$2)
 - locked <0x000000068003a298> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(java.base@11.0.16/Unknown Source)
 at org.apache.kafka.common.network.Selector.select(Selector.java:874)
 at org.apache.kafka.common.network.Selector.poll(Selector.java:465)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:560)
 at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)

" TCPRecvLst-127.0.0.1:12704" #18 prio=5 os_prio=0 cpu=0.30ms elapsed=19578.51s tid=0x00007f67b0b02800 nid=0x8f8 runnable [0x00007f674f31a000]
 java.lang.Thread.State: RUNNABLE
 at java.net.PlainSocketImpl.socketAccept(java.base@11.0.16/Native Method)
 at java.net.AbstractPlainSocketImpl.accept(java.base@11.0.16/Unknown Source)
 at java.net.ServerSocket.implAccept(java.base@11.0.16/Unknown Source)
 at java.net.ServerSocket.accept(java.base@11.0.16/Unknown Source)
 at com.xxxx.nxgen.fbidx.common.comm.TCPReceiver.run(TCPReceiver.java:359)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)

"FixedRateTimerSchedulerThread" #19 daemon prio=5 os_prio=0 cpu=1620.39ms elapsed=19578.51s tid=0x00007f67b0b04800 nid=0x8f9 in Object.wait() [0x00007f674f2190
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(java.base@11.0.16/Native Method)
 - waiting on <no object reference available>
 at java.util.TimerThread.mainLoop(java.base@11.0.16/Unknown Source)
 - waiting to re-lock in wait() <0x000000068003b970> (a java.util.TaskQueue)
 at java.util.TimerThread.run(java.base@11.0.16/Unknown Source)

"KafkaReceiverListen-kafka.confluent.svc.cluster.local:9071" #17 prio=5 os_prio=0 cpu=234902.05ms elapsed=19572.89s tid=0x00007f67b08c4000 nid=0x90e runnable [
 java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPoll.wait(java.base@11.0.16/Native Method)
 at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.16/Unknown Source)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.16/Unknown Source)
 - locked <0x00000006800910c0> (a sun.nio.ch.Util$2)
 - locked <0x0000000680090f60> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(java.base@11.0.16/Unknown Source)
 at org.apache.kafka.common.network.Selector.select(Selector.java:874)
 at org.apache.kafka.common.network.Selector.poll(Selector.java:465)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:560)
 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:280)
 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
 at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1307)
 at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1243)
 at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
 at com.xxxx.nxgen.fbidx.common.comm.KafkaReceiver.run(KafkaReceiver.java:107)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)

"Calc-10068-BB-ZNA" #21 daemon prio=5 os_prio=0 cpu=8201.35ms elapsed=19572.87s tid=0x00007f67b08c5800 nid=0x90f runnable [0x00007f674e614000]
 java.lang.Thread.State: RUNNABLE
 at java.io.FileOutputStream.writeBytes(java.base@11.0.16/Native Method)
 at java.io.FileOutputStream.write(java.base@11.0.16/Unknown Source)
 at java.io.BufferedOutputStream.flushBuffer(java.base@11.0.16/Unknown Source)
 at java.io.BufferedOutputStream.flush(java.base@11.0.16/Unknown Source)
 - locked <0x000000068002e4f8> (a java.io.BufferedOutputStream)
 at java.io.PrintStream.write(java.base@11.0.16/Unknown Source)
 - locked <0x000000068002e4d0> (a java.io.PrintStream)
 at java.io.FilterOutputStream.write(java.base@11.0.16/Unknown Source)
 at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37)
 at ch.qos.logback.core.OutputStreamAppender.writeBytes(OutputStreamAppender.java:199)
 at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:231)
 at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
 at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
 at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
 at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
 at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
 at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
 at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
 at ch.qos.logback.classic.Logger.info(Logger.java:579)
 at com.xxxx.nxgen.fbidx.formula.module.mvBasedPriceIndex.BISMVBasedPricePerformCalculation.execute(BISMVBasedPricePerformCalculation.java:70)
 at jdk.internal.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
 at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.16/Unknown Source)
 at java.lang.reflect.Method.invoke(java.base@11.0.16/Unknown Source)
 at com.xxxx.nxgen.fbidx.formula.module.impl.proxy.PerformCalculationProxy.invoke(PerformCalculationProxy.java:32)
 at com.sun.proxy.$Proxy15.execute(Unknown Source)
 at com.xxxx.nxgen.fbidx.formula.formulaGroup.BISBaseFormulaGroup.executeSequence(BISBaseFormulaGroup.java:44)
 at com.xxxx.nxgen.fbidx.calcengine.FbidxCalcEngineApps$FbidxCalcTask.run(FbidxCalcEngineApps.java:238)
 at com.xxxx.nxgen.fbidx.common.scheduler.FixedRateTimerScheduler$TimerSchedulerTimerTask.run(FixedRateTimerScheduler.java:48)
 - locked <0x000000068042f208> (a java.util.ArrayList)
 at java.util.TimerThread.mainLoop(java.base@11.0.16/Unknown Source)
 at java.util.TimerThread.run(java.base@11.0.16/Unknown Source)

"Cmd-10068-BB TCPRecvLst-127.0.0.1:12404" #22 prio=5 os_prio=0 cpu=0.19ms elapsed=19572.86s tid=0x00007f67b0604000 nid=0x910 runnable [0x00007f674e513000]
 java.lang.Thread.State: RUNNABLE
 at java.net.PlainSocketImpl.socketAccept(java.base@11.0.16/Native Method)
 at java.net.AbstractPlainSocketImpl.accept(java.base@11.0.16/Unknown Source)
 at java.net.ServerSocket.implAccept(java.base@11.0.16/Unknown Source)
 at java.net.ServerSocket.accept(java.base@11.0.16/Unknown Source)
 at com.xxxx.nxgen.fbidx.common.comm.TCPReceiver.run(TCPReceiver.java:359)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)

"DestroyJavaVM" #23 prio=5 os_prio=0 cpu=6460.59ms elapsed=19572.86s tid=0x00007f67b0029000 nid=0x8c5 waiting on condition [0x0000000000000000]
 java.lang.Thread.State: RUNNABLE

"kafka-coordinator-heartbeat-thread | gke-bis-calceng-85bb794676-lld62_10068_BB" #24 daemon prio=5 os_prio=0 cpu=18591.02ms elapsed=19572.84s tid=0x00007f671c0
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(java.base@11.0.16/Native Method)
 - waiting on <no object reference available>
 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1481)
 - waiting to re-lock in wait() <0x000000068007a8c0> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

"ConnectionMonitorThread" #25 daemon prio=5 os_prio=0 cpu=0.24ms elapsed=19516.67s tid=0x00007f6720006800 nid=0xa35 runnable [0x00007f674f118000]
 java.lang.Thread.State: RUNNABLE
 at java.net.SocketInputStream.socketRead0(java.base@11.0.16/Native Method)
 at java.net.SocketInputStream.socketRead(java.base@11.0.16/Unknown Source)
 at java.net.SocketInputStream.read(java.base@11.0.16/Unknown Source)
 at java.net.SocketInputStream.read(java.base@11.0.16/Unknown Source)
 at java.io.BufferedInputStream.fill(java.base@11.0.16/Unknown Source)
 at java.io.BufferedInputStream.read1(java.base@11.0.16/Unknown Source)
 at java.io.BufferedInputStream.read(java.base@11.0.16/Unknown Source)
 - locked <0x000000068a1827d0> (a java.io.BufferedInputStream)
 at java.io.FilterInputStream.read(java.base@11.0.16/Unknown Source)
 at com.xxxx.nxgen.fbidx.common.comm.TCPTransmitter$ConnectionMonitorTask.run(TCPTransmitter.java:70)
 at java.lang.Thread.run(java.base@11.0.16/Unknown Source)

"VM Thread" os_prio=0 cpu=930.86ms elapsed=19580.63s tid=0x00007f67b0267800 nid=0x8d1 runnable 

"GC Thread#0" os_prio=0 cpu=677.21ms elapsed=19580.65s tid=0x00007f67b0040000 nid=0x8c9 runnable 

"GC Thread#1" os_prio=0 cpu=677.88ms elapsed=19579.78s tid=0x00007f6784001000 nid=0x8e7 runnable 

"GC Thread#2" os_prio=0 cpu=672.08ms elapsed=19579.78s tid=0x00007f6784002800 nid=0x8e8 runnable 

"GC Thread#3" os_prio=0 cpu=681.99ms elapsed=19579.78s tid=0x00007f6784004000 nid=0x8e9 runnable 

"G1 Main Marker" os_prio=0 cpu=0.69ms elapsed=19580.65s tid=0x00007f67b009e000 nid=0x8ca runnable 

"G1 Conc#0" os_prio=0 cpu=10.10ms elapsed=19580.65s tid=0x00007f67b00a0000 nid=0x8cb runnable 

"G1 Refine#0" os_prio=0 cpu=1.88ms elapsed=19580.64s tid=0x00007f67b021f000 nid=0x8ce runnable 

"G1 Refine#1" os_prio=0 cpu=0.90ms elapsed=19577.28s tid=0x00007f678c001000 nid=0x900 runnable 

"G1 Young RemSet Sampling" os_prio=0 cpu=9701.45ms elapsed=19580.64s tid=0x00007f67b0220800 nid=0x8cf runnable 
"VM Periodic Task Thread" os_prio=0 cpu=10235.17ms elapsed=19580.61s tid=0x00007f67b02a3000 nid=0x8de waiting on condition 

JNI global refs: 16, weak refs: 0

Heap
 garbage-first heap total 672768K, used 220221K [0x0000000680000000, 0x0000000800000000)
 region size 1024K, 51 young (52224K), 1 survivors (1024K)
 Metaspace used 28605K, capacity 29747K, committed 29952K, reserved 1075200K
 class space used 3331K, capacity 3742K, committed 3840K, reserved 1048576K

 

0 2 985
2 REPLIES 2

I am experiencing pretty much the same on my local machine (OSX) in Docker. Were you able to get past this?

I don't see any hung thread that is also writing. The writer thread

Calc-10068-BB-ZNA

is in a Runnable state, which means it's about to run or is already running.

You might get better insights if you take multiple thread dumps at 5 second intervals and analyze using something like https://fastthread.io/

Top Labels in this Space