生产环境5个节点的Zookeeper集群进程异常退出,重启节点后恢复,现在需要找出异常原因。
由于是公司标装电脑,不便贴出原日志,现在用个人电脑大致贴下异常日志。
zk版本:3.5.5
kafka版本:2.2.1
leader节点:
2020-06-15 23:57:00,049 [myid:3] - ERROR [LearnerHandler-/<fqdn zoo6>:35922:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2020-06-15 23:57:00,071 [myid:3] - WARN [LearnerHandler-/<fqdn zoo6>:35922:LearnerHandler@661] - ******* GOODBYE /<fqdn zoo6>:35922 ********
follower节点:
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:47,139 [myid:6] - WARN [RecvWorker:9:QuorumCnxManager$RecvWorker@1025] - Connection broken for id 9, my id = 6, error =
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: java.net.SocketException: Connection timed out (Read failed)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.net.SocketInputStream.socketRead0(Native Method)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.net.SocketInputStream.read(SocketInputStream.java:171)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.net.SocketInputStream.read(SocketInputStream.java:141)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.io.DataInputStream.readInt(DataInputStream.java:387)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1010)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:47,139 [myid:6] - WARN [RecvWorker:9:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:47,140 [myid:6] - WARN [SendWorker:9:QuorumCnxManager$SendWorker@941] - Interrupted while waiting for message on queue
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: java.lang.InterruptedException
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: #011at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
Apr 12 16:33:47 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:47,140 [myid:6] - WARN [SendWorker:9:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
Apr 12 16:33:58 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:58,373 [myid:6] - INFO [/10.3.50.250:3888:QuorumCnxManager$Listener@743] - Received connection request /10.3.50.66:35984
Apr 12 16:33:58 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:58,380 [myid:6] - INFO [LearnerHandler-/10.3.50.66:40180:LearnerHandler@346] - Follower sid: 9 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@24db1c2b
Apr 12 16:33:58 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:58,381 [myid:6] - INFO [LearnerHandler-/10.3.50.66:40180:LearnerHandler@401] - Synchronizing with Follower sid: 9 maxCommittedLog=0x9000134b2 minCommittedLog=0x9000132be peerLastZxid=0x9000126d5
Apr 12 16:33:58 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:58,381 [myid:6] - WARN [LearnerHandler-/10.3.50.66:40180:LearnerHandler@468] - Unhandled proposal scenario
Apr 12 16:33:58 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:58,381 [myid:6] - INFO [LearnerHandler-/10.3.50.66:40180:LearnerHandler@475] - Sending SNAP
Apr 12 16:33:58 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:58,381 [myid:6] - INFO [LearnerHandler-/10.3.50.66:40180:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x9000126d5 zxid of leader is 0x9000134b2sent zxid of db as 0x9000134b2
Apr 12 16:33:58 prod-zk-voter-aza2 zookeeper[1129]: 2019-04-12 16:33:58,519 [myid:6] - INFO [LearnerHandler-/10.3.50.66:40180:LearnerHandler@535] - Received NEWLEADER-ACK message from 9
目前初步诊断为同步异常问题,现在需要找出根因
查看报错源码为
if (sock != null && !sock.isClosed()) {LOG.error("Unexpected exception causing shutdown while sock "+ "still open", e);
连接超时,是否设置了ip防火墙或者端口错误
很明显几个个zookeeper服务器中的一个服务器上的数据损坏造成,需要检查下 / var / lib / zookeeper
需要检查下 / var / lib / zookeeper
这个不好判断!!!
依据仅有的日志,只能判断是网路异常了!
两个日志时间戳对不上
交换机或路由器工作不稳定或工作环境有大的电磁干扰。
有点难