Zookeeper 4 Letter Word Netcat No Response
To report Zookeeper service health, we use the 4 letter word (4LW) command ruok
and check that the response is imok. Netcat (nc)
is the channel via which the message is sent.
After upgrading the operating system from Ubuntu 16.04 to 18.04, we see that
intermittently there is no response. Hmm…
Is Zookeeper getting those commands?
A simple bash script can be used to verify this:
#!/bin/bash
for i in {1..10}; do
  status=$(echo "ruok" | nc localhost 2181)
  echo -e "try ${i}: ${status}"
done
Output looks suspect:
root@zk-node:~# ./check_zk.sh
try 1: imok
try 2: imok
try 3: imok
try 4: imok
try 5: imok
try 6:
try 7: imok
try 8:
try 9: imok
try 10: imok
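Ten tries are enough to spot the problem, but a failure count over a longer run makes it easier to compare before and after a change. The following is a minimal sketch, not the original script: `probe_zk` and `count_empty_replies` are hypothetical names, and the `ZK_HOST` and `ZK_PORT` variables are assumptions that default to the host and port used above.

```shell
#!/bin/bash
# Sketch: count how many of N probes come back with an empty reply.
# probe_zk and count_empty_replies are hypothetical helper names.

# Send one ruok probe, exactly as the loop above does.
probe_zk() {
  echo "ruok" | nc "${ZK_HOST:-localhost}" "${ZK_PORT:-2181}"
}

# Usage: count_empty_replies TRIES COMMAND [ARGS...]
# Runs COMMAND TRIES times and prints the number of empty replies.
count_empty_replies() {
  local tries="$1"
  shift
  local empty=0 i=1 reply
  while [ "$i" -le "$tries" ]; do
    reply=$("$@")
    if [ -z "$reply" ]; then
      empty=$((empty + 1))
    fi
    i=$((i + 1))
  done
  echo "$empty"
}

# Example (needs a reachable Zookeeper):
#   count_empty_replies 100 probe_zk
```

Passing the probe as a command keeps the counting logic independent of how the 4LW command is sent, so the same loop can be reused once the netcat invocation changes.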
But the Zookeeper service has no issues processing the requests:
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,054 [myid:] - INFO [NIOWorkerThread-31:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,091 [myid:] - INFO [NIOWorkerThread-22:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,110 [myid:] - INFO [NIOWorkerThread-20:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,125 [myid:] - INFO [NIOWorkerThread-4:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,139 [myid:] - INFO [NIOWorkerThread-3:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,153 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,166 [myid:] - INFO [NIOWorkerThread-23:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,179 [myid:] - INFO [NIOWorkerThread-11:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,190 [myid:] - INFO [NIOWorkerThread-9:NIOServerCnxn@507] - Processing ruok command from /
Jul 24 22:21:07 zk-node zookeeper[23515]: 2020-07-24 22:21:07,201 [myid:] - INFO [NIOWorkerThread-5:NIOServerCnxn@507] - Processing ruok command from /
No errors or warnings or exceptions. Looks like it is responding just fine.
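Ten probes were sent and ten "Processing ruok command" lines were logged, which suggests the replies are lost on the client side rather than in the server. Counting the log lines makes that comparison less error-prone. This is a rough sketch: `count_ruok_processed` is a hypothetical helper, and the `journalctl` unit name in the usage comment is an assumption about how the service is installed.

```shell
#!/bin/bash
# Sketch: count server-side log lines that acknowledge a ruok command.
# count_ruok_processed is a hypothetical helper; it reads log lines on stdin.
count_ruok_processed() {
  # grep -c prints 0 but exits non-zero when nothing matches, so mask that.
  grep -c 'Processing ruok command' || true
}

# Example (assumes a systemd unit named "zookeeper"):
#   journalctl -u zookeeper --since "1min ago" | count_ruok_processed
```

If the number printed matches the number of probes sent while some probes still came back empty, the server processed every request.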
4LW commands do not work consistently
Next, we looked for causes of 4LW commands failing silently and bumped into a known Zookeeper issue, ZOOKEEPER-737: some 4 letter words may fail with netcat (nc) [1].
Now we did not observe any exceptions in the ZK logs, probably because we’re
using a later version of ZK (v3.6.1). But it does seem like a plausible
root cause.
A StackOverflow post [2] points to the same cause, with a fix: add a delay in netcat before it closes the channel.
Add delay to netcat before closing the channel
Based on the StackOverflow post [2], we added the -q 1 option, which makes nc
wait 1 second after sending the message before it closes the channel.
# status=$(echo "ruok" | nc localhost 2181 -q 1)
root@zk-node:~# ./check_zk.sh
try 1: imok
try 2: imok
try 3: imok
try 4: imok
try 5: imok
try 6: imok
try 7: imok
try 8: imok
try 9: imok
try 10: imok
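For a monitoring check, the same command can be wrapped so that it reports health through its exit status. This is a minimal sketch, assuming the installed nc supports the -q option used above: `zk_is_ok` is a hypothetical name, and the host and port arguments default to the values from the script.

```shell
#!/bin/bash
# Sketch: exit status 0 only when Zookeeper answers ruok with imok.
# zk_is_ok is a hypothetical helper name, not part of Zookeeper or netcat.
zk_is_ok() {
  local host="${1:-localhost}" port="${2:-2181}" reply
  # -q 1: wait 1 second after the message is sent before closing the channel.
  reply=$(echo "ruok" | nc -q 1 "$host" "$port")
  [ "$reply" = "imok" ]
}

# Example (needs a reachable Zookeeper):
#   if zk_is_ok; then echo "healthy"; else echo "no imok response"; fi
```

An empty reply and a wrong reply are both treated as unhealthy here, which is a design choice for the sketch rather than something the original check prescribes.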
Consistency at last!
The goal of any platform - it works or it doesn’t; either is fine, as long as it does that consistently.