cv BADASSOC failures
Eric S. Raymond
esr at thyrsus.com
Thu Dec 15 10:23:43 UTC 2016
Gary E. Miller <gem at rellim.com>:
> The fact that some ntpq associations were broken for
> long just demonstrates the problems with ntpq that ntpmon can
> solve.
They're not broken, and weren't. You've tripped over a problem deeper
than the client code.
I've been meaning to brief you and the rest of the team on what I
found out about this. That apparent endianness issue turned out to be
a red herring - I was looking at the wrong variable.
The real problem is more fundamental than that, and stems from a
design error in the Mode 6 control protocol.
Before ntpq or ntpmon makes a peer listing, it first has to get a list
of peer association IDs from the server. This is what it summarizes in the
'assoc' listing, if you ask for one.
ntpq> assoc
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 32103  911a   yes   yes  none falsetick    sys_peer  1
  2 32104  801b   yes    no  none    reject clock_alarm  1
  3 32105  8811   yes  none  none    reject    mobilize  1
  4 32106  16fa   no    yes  none  sys.peer    sys_peer 15
  5 32107  1324   no    yes  none   outlier   reachable  2
  6 32113  14c4   no    yes  none candidate   reachable 12
  7 32115  13c4   no    yes  none   outlier   reachable 12
  8 32119  13b4   no    yes  none   outlier   reachable 11
  9 32123  14fa   no    yes  none candidate    sys_peer 15
 10 32124  13e4   no    yes  none   outlier   reachable 14
The hex in the third column (status) is the peer status word. The
remaining fields in each row unpack that word.
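
To make the unpacking concrete, here is a minimal Python sketch of how
the status column decodes into the other fields. The bit layout (five
status bits, three select bits, a 4-bit event count and a 4-bit event
code, most significant first) is my reading of the decode page; the
function and table names are hypothetical and this is not the code ntpq
or ntpmon actually uses.

```python
# Sketch only: unpack a 16-bit peer status word as shown by "assoc".
# Layout assumed: [5 status bits][3 select bits][4-bit count][4-bit code].

SELECT = ["reject", "falsetick", "excess", "outlier",
          "candidate", "backup", "sys.peer", "pps.peer"]

# Event names as they appear in the assoc listing; code 0xb shows up
# there as clock_alarm (rv prints it as clock_event).
EVENT = {0x1: "mobilize", 0x2: "demobilize", 0x3: "unreachable",
         0x4: "reachable", 0x5: "restart", 0x6: "no_reply",
         0x7: "rate_exceeded", 0x8: "access_denied", 0x9: "leap_armed",
         0xa: "sys_peer", 0xb: "clock_alarm", 0xc: "bad_auth",
         0xd: "popcorn", 0xe: "interleave_mode", 0xf: "interleave_error"}


def decode_peer_status(word):
    """Return the assoc-listing fields encoded in a peer status word."""
    high = (word >> 8) & 0xff           # status bits plus select bits
    return {
        "conf": bool(high & 0x80),      # persistent (configured) association
        "authenable": bool(high & 0x40),
        "auth": bool(high & 0x20),
        "reach": bool(high & 0x10),
        "bcast": bool(high & 0x08),
        "condition": SELECT[high & 0x07],
        "cnt": (word >> 4) & 0x0f,
        "last_event": EVENT.get(word & 0x0f, "unknown"),
    }


if __name__ == "__main__":
    for status in (0x911a, 0x801b, 0x16fa):
        print("%04x" % status, decode_peer_status(status))
```

Note that none of the decoded fields says whether the association is a
clock or a network peer.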
There are two kinds of peers, servers and clocks. Either kind can be
queried for peer status variables.
ntpq> peers
 remote             refid           st t when poll reach   delay   offset  jitter
=================================================================================
xSHM(0)             .GPS.            0 l   42   64   377   0.000 -497.705  43.046
 SHM(1)             .PPS.            0 l    -   64     0   0.000    0.000   0.000
 us.pool.ntp.org    .POOL.          16 p    -   64     0   0.000    0.000   0.004
*ntp.your.org       .CDMA.           1 u  266 1024   317  29.357    0.036   0.593
-pbx.cytranet.net   204.9.54.119     2 u  527 1024   377  71.206   -2.568   0.327
+x.ns.gin.ntt.net   249.224.99.213   2 u  891 1024   377   8.150   -1.910   1.605
-209-133-217-165.st 128.227.205.3    2 u 1056 1024   347  39.207   -4.078   0.486
-lithium.constant.c 18.26.4.105      2 u  583 1024   377  13.387   -2.687   0.480
+level1f.cs.unc.edu .PPS.            1 u  646 1024   377  30.786   -2.512   0.626
-clockb.ntpjs.org   132.163.4.101    2 u  528 1024   277  31.241   -1.530   4.763
In the above list, &2 (32104) is a clock. It can be queried in two ways:
ntpq> rv &2
status=801b conf, sel_reject, 1 event, clock_event,
srcadr=127.127.28.1, srcport=123, srchost="SHM(1)", dstadr=127.0.0.1,
dstport=123, leap=11, stratum=0, precision=-30, rootdelay=0.0, rootdisp=0.0,
refid=PPS, reftime=00000000.00000000 2036-02-07T01:28:16.000,
rec=00000000.00000000 2036-02-07T01:28:16.000, reach=000, unreach=0, hmode=3,
pmode=4, hpoll=6, ppoll=6, headway=0, flash=1200peer_stratum peer_unreach,
keyid=0, ttl=0, offset=0.0, delay=0.0, dispersion=15937.5, jitter=0.0,
xmt=dbfc8208.63d587b4 2016-12-14T21:44:24.389,
filtdelay=0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtoffset=0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtdisp=16000.00 16000.00 16000.00 16000.00 16000.00 16000.00 16000.00 16000.00
ntpq> cv &2
associd=32104 status=00f1 15 events, clk_no_reply,
device="SHM/Shared memory interface", name="SHM", timecode="", poll=5127,
noreply=5127, badformat=0, baddata=0, stratum=0, refid=PPS, flags=0
On the other hand, &4 (32106) is a remote peer. It can only be queried one way:
ntpq> rv &4
status=16fa reach, sel_sys.peer, 15 events, sys_peer,
srcadr=204.9.54.119, srcport=123, dstadr=192.168.1.248, dstport=123, leap=00,
stratum=1, precision=-19, rootdelay=0.0, rootdisp=1.083, refid=CDMA,
reftime=dbfc809f.9556cd4f 2016-12-14T21:38:23.583,
rec=dbfc80a6.6b6defcf 2016-12-14T21:38:30.419, reach=317, unreach=0, hmode=3,
pmode=4, hpoll=10, ppoll=10, headway=0, flash=00 ok, keyid=0, offset=0.036,
delay=29.357, dispersion=17.45, jitter=0.593,
xmt=dbfc80a6.67ae4d0d 2016-12-14T21:38:30.405,
filtdelay=29.36 30.31 27.87 29.36 29.90 30.03 28.49 31.59,
filtoffset=0.04 0.28 -0.23 -0.12 -0.89 0.11 -1.11 0.38,
filtdisp=0.01 16.30 32.36 48.11 96.38 112.03 127.63 143.71
ntpq> cv &4
***Server error code BADASSOC
The error is thrown because you sent a request to dump clock variables
to a peer that isn't a clock.
Here's the design flaw: There doesn't seem to be any reliable way to
tell whether or not a peer is a clock from the status word, before
you've done either an rv or failed at a cv (these generate different
request types).
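
For what it's worth, once an rv has been done the answer is recoverable
from srcadr, since reference clocks use pseudo-addresses of the form
127.127.t.u (the SHM clock above is 127.127.28.1). A minimal Python
sketch of that after-the-fact check follows; the helper name is
hypothetical, it works on the raw rv text rather than any real ntpq
data structure, and it does nothing to solve the problem of knowing
before the rv.

```python
import ipaddress
import re

# Reference clocks are addressed as 127.127.t.u (driver type t, unit u).
REFCLOCK_NET = ipaddress.ip_network("127.127.0.0/16")


def looks_like_refclock(rv_text):
    """Guess from rv output whether an association is a reference clock.

    Hypothetical helper: pulls srcadr=... out of the variable list and
    tests it against the refclock pseudo-address range.
    """
    match = re.search(r"\bsrcadr=([^,\s]+)", rv_text)
    if match is None:
        return False                    # no srcadr reported, cannot tell
    try:
        addr = ipaddress.ip_address(match.group(1))
    except ValueError:
        return False                    # srcadr was not a literal address
    return addr in REFCLOCK_NET


if __name__ == "__main__":
    clock = "status=801b conf, sel_reject,\nsrcadr=127.127.28.1, srcport=123,"
    peer = "status=16fa reach, sel_sys.peer,\nsrcadr=204.9.54.119, srcport=123,"
    print(looks_like_refclock(clock), looks_like_refclock(peer))
```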
The status word is described here:
https://docs.ntpsec.org/latest/decode.html
If anyone can tell me how to get "this is a clock" from that I will be
profoundly grateful. No amount of client-side logic in ntpmon can
compensate for not being able to tell before you do an rv.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
The spirit of resistance to government is so valuable on certain occasions,
that I wish it always to be kept alive. It will often be exercised when
wrong, but better so than not to be exercised at all. I like a little
rebellion now and then. -- Thomas Jefferson, letter to Abigail Adams, 1787