cv BADASSOC failures

Eric S. Raymond esr at thyrsus.com
Thu Dec 15 10:23:43 UTC 2016


Gary E. Miller <gem at rellim.com>:
> The fact that some ntpq associations were broken for
> long just demonstrates the problems with ntpq that ntpmon can
> solve.

They're not broken, and weren't.  You've tripped over a problem deeper
than the client code.

I've been meaning to brief you and the rest of the team on what I
found out about this.  That apparent endianness issue turned out to be
a red herring - I was looking at the wrong variable.

The real problem is more fundamental than that, and stems from a
design error in the Mode 6 control protocol.

Before ntpq or ntpmon makes a peer listing, it first has to get a list
of peer association IDs from the server.  This is what it summarizes in the
'assoc' listing, if you ask for one.

ntpq> assoc

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 32103  911a   yes   yes  none falsetick    sys_peer  1
  2 32104  801b   yes    no  none    reject clock_alarm  1
  3 32105  8811   yes  none  none    reject    mobilize  1
  4 32106  16fa    no   yes  none  sys.peer    sys_peer 15
  5 32107  1324    no   yes  none   outlier   reachable  2
  6 32113  14c4    no   yes  none candidate   reachable 12
  7 32115  13c4    no   yes  none   outlier   reachable 12
  8 32119  13b4    no   yes  none   outlier   reachable 11
  9 32123  14fa    no   yes  none candidate    sys_peer 15
 10 32124  13e4    no   yes  none   outlier   reachable 14
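
That listing comes from a single Mode 6 "read status" (READSTAT) request sent with association ID 0. The reply data is a flat run of 16-bit big-endian (association ID, status word) pairs. Here is a minimal Python sketch of unpacking it. It assumes the packet header has already been stripped and any fragments reassembled. parse_assoc_list is a hypothetical name, not an ntpq or ntpmon function.

```python
import struct


def parse_assoc_list(payload):
    """Split READSTAT reply data (sent to association 0) into
    (association ID, peer status word) pairs.

    Assumes the 12-byte Mode 6 header is already stripped and that
    fragmented replies have been reassembled by the caller.
    """
    if len(payload) % 4:
        raise ValueError("READSTAT data must be whole 4-byte entries")
    # Each entry: 16-bit association ID, then 16-bit status word,
    # both in network (big-endian) byte order.
    return [struct.unpack_from("!HH", payload, off)
            for off in range(0, len(payload), 4)]


# First two rows of the assoc listing, packed the way the server sends them.
data = struct.pack("!HHHH", 32103, 0x911a, 32104, 0x801b)
print(parse_assoc_list(data))
```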

The hex in column three ("status") is the peer status word.  The
fields to its right unpack that word.
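
Here is a Python sketch of that unpacking. It follows the peer status word layout in NTPsec's decode documentation. The high byte holds five flag bits plus a 3-bit select code. The low byte holds a 4-bit event counter and a 4-bit event code. decode_peer_status is a hypothetical helper. Only the event codes that appear in the listing above are named, and the others are left as raw numbers. None of the fields it extracts marks an association as a clock.

```python
# Flag bits in the high byte of the peer status word.
FLAG_CONF = 0x80        # configured ("conf" column)
FLAG_AUTHENABLE = 0x40  # authentication enabled
FLAG_AUTHENTIC = 0x20   # authentication okay
FLAG_REACH = 0x10       # reachable ("reach" column)
FLAG_BCAST = 0x08       # broadcast association

# Select codes (low 3 bits of the high byte), named as in the listing.
SELECT = ["reject", "falsetick", "excess", "outlier",
          "candidate", "backup", "sys.peer", "pps.peer"]

# Only the event codes seen above; code 11 is shown as clock_alarm
# in the assoc listing and as clock_event in rv output.
EVENTS = {1: "mobilize", 4: "reachable", 10: "sys_peer", 11: "clock_event"}


def decode_peer_status(word):
    """Unpack a 16-bit peer status word into its named fields."""
    hi, lo = (word >> 8) & 0xFF, word & 0xFF
    code = lo & 0x0F
    return {
        "conf": bool(hi & FLAG_CONF),
        "authenable": bool(hi & FLAG_AUTHENABLE),
        "authentic": bool(hi & FLAG_AUTHENTIC),
        "reach": bool(hi & FLAG_REACH),
        "bcast": bool(hi & FLAG_BCAST),
        "condition": SELECT[hi & 0x07],
        "count": (lo >> 4) & 0x0F,
        "last_event": EVENTS.get(code, "event %d" % code),
    }


print(decode_peer_status(0x16fa))
```

ntpq prints "none" in the reach column when the broadcast bit is set, as in row 3. It also prints "none" in the auth column when authentication is not enabled. The sketch just returns the raw booleans.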

There are two kinds of peers: servers and clocks.  Either kind can be
queried for peer status variables:

ntpq> peers
     remote              refid      st t when poll reach   delay   offset  jitter
=================================================================================
xSHM(0)             .GPS.            0 l   42   64  377    0.000 -497.705  43.046
 SHM(1)             .PPS.            0 l    -   64    0    0.000    0.000   0.000
 us.pool.ntp.org    .POOL.          16 p    -   64    0    0.000    0.000   0.004
*ntp.your.org       .CDMA.           1 u  266 1024  317   29.357    0.036   0.593
-pbx.cytranet.net   204.9.54.119     2 u  527 1024  377   71.206   -2.568   0.327
+x.ns.gin.ntt.net   249.224.99.213   2 u  891 1024  377    8.150   -1.910   1.605
-209-133-217-165.st 128.227.205.3    2 u 1056 1024  347   39.207   -4.078   0.486
-lithium.constant.c 18.26.4.105      2 u  583 1024  377   13.387   -2.687   0.480
+level1f.cs.unc.edu .PPS.            1 u  646 1024  377   30.786   -2.512   0.626
-clockb.ntpjs.org   132.163.4.101    2 u  528 1024  277   31.241   -1.530   4.763
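
The tally character in the first column of the peers listing comes from the same 3-bit select code. The sketch below reproduces that column from the status words in the assoc listing. tally is a hypothetical helper. The sketch assumes both listings are in the same association order, as they appear to be here.

```python
# ntpq tally codes, indexed by the select code (status word bits 8-10):
# reject, falsetick, excess, outlier, candidate, backup, sys.peer, pps.peer
TALLY = " x.-+#*o"


def tally(status_word):
    """Return the peers-listing tally character for a peer status word."""
    return TALLY[(status_word >> 8) & 0x07]


# Status words copied from the assoc listing, in row order.
STATUSES = [0x911a, 0x801b, 0x8811, 0x16fa, 0x1324,
            0x14c4, 0x13c4, 0x13b4, 0x14fa, 0x13e4]

print(repr("".join(tally(s) for s in STATUSES)))  # 'x  *-+--+-'
```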

In the assoc listing above, &2 (association 32104) is a clock, SHM(1) in the peers listing.  It can be queried in two ways:

ntpq> rv &2
status=801b conf, sel_reject, 1 event, clock_event,
srcadr=127.127.28.1, srcport=123, srchost="SHM(1)", dstadr=127.0.0.1,
dstport=123, leap=11, stratum=0, precision=-30, rootdelay=0.0, rootdisp=0.0,
refid=PPS, reftime=00000000.00000000 2036-02-07T01:28:16.000,
rec=00000000.00000000 2036-02-07T01:28:16.000, reach=000, unreach=0, hmode=3,
pmode=4, hpoll=6, ppoll=6, headway=0, flash=1200peer_stratum peer_unreach,
keyid=0, ttl=0, offset=0.0, delay=0.0, dispersion=15937.5, jitter=0.0,
xmt=dbfc8208.63d587b4 2016-12-14T21:44:24.389,
filtdelay=0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00,
filtoffset=0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00,
filtdisp=16000.00	16000.00	16000.00	16000.00	16000.00	16000.00	16000.00	16000.00
ntpq> cv &2
associd=32104 status=00f1 15 events, clk_no_reply,
device="SHM/Shared memory interface", name="SHM", timecode="", poll=5127,
noreply=5127, badformat=0, baddata=0, stratum=0, refid=PPS, flags=0
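
On the wire, rv and cv differ only in the Mode 6 opcode. rv sends READVAR (opcode 2) and cv sends READCLOCK (opcode 4), and both are addressed to the association ID. Below is a sketch of the two request headers, with no variable-name payload. It assumes the 12-byte header layout from RFC 1305, Appendix B, and NTP version 2 in the first byte. variable_request is a hypothetical name. A real client also handles authentication and fragmented replies.

```python
import struct

# Mode 6 header: li_vn_mode, r_e_m_op, sequence, status, associd,
# offset, count -- 12 bytes in network byte order.
HEADER = struct.Struct("!BBHHHHH")
MODE_CONTROL = 6
OP_READVAR = 2    # sent by "rv"
OP_READCLOCK = 4  # sent by "cv"


def variable_request(opcode, associd, sequence, version=2):
    """Build a request header asking one association for its variables.

    The version number is an assumption; the response, error and more
    bits are left clear, as they are in any request.
    """
    li_vn_mode = (version << 3) | MODE_CONTROL  # leap indicator 0
    return HEADER.pack(li_vn_mode, opcode & 0x1F, sequence, 0, associd, 0, 0)


rv_2 = variable_request(OP_READVAR, 32104, sequence=1)
cv_2 = variable_request(OP_READCLOCK, 32104, sequence=2)
print(rv_2.hex(), cv_2.hex())
```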

On the other hand, &4 (32106) is a remote peer.  It can only be queried one way:

ntpq> rv &4
status=16fa reach, sel_sys.peer, 15 events, sys_peer,
srcadr=204.9.54.119, srcport=123, dstadr=192.168.1.248, dstport=123, leap=00,
stratum=1, precision=-19, rootdelay=0.0, rootdisp=1.083, refid=CDMA,
reftime=dbfc809f.9556cd4f 2016-12-14T21:38:23.583,
rec=dbfc80a6.6b6defcf 2016-12-14T21:38:30.419, reach=317, unreach=0, hmode=3,
pmode=4, hpoll=10, ppoll=10, headway=0, flash=00 ok, keyid=0, offset=0.036,
delay=29.357, dispersion=17.45, jitter=0.593,
xmt=dbfc80a6.67ae4d0d 2016-12-14T21:38:30.405,
filtdelay=29.36	30.31	27.87	29.36	29.90	30.03	28.49	31.59,
filtoffset=0.04	0.28	-0.23	-0.12	-0.89	0.11	-1.11	0.38,
filtdisp=0.01	16.30	32.36	48.11	96.38	112.03	127.63	143.71
ntpq> cv &4
***Server error code BADASSOC

The server returns this error because cv asked for clock variables
from an association that isn't a clock.
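
When the server rejects a request, it sets the error bit in the reply's second header byte. It also puts the error code in the high byte of the status field, and BADASSOC is code 4. Below is a sketch of checking for that in a client, using the header layout from RFC 1305, Appendix B. check_response and ServerError are hypothetical names. Only BADASSOC is taken from the ntpq output above. The other labels are illustrative shorthand for the remaining RFC 1305 error codes.

```python
import struct

HEADER = struct.Struct("!BBHHHHH")

# Bits in the second header byte (r_e_m_op) of a Mode 6 packet.
RESPONSE_BIT = 0x80
ERROR_BIT = 0x40

# Error codes carried in the high byte of the status field.
ERRORS = {0: "UNSPEC", 1: "PERMISSION", 2: "BADFMT", 3: "BADOP",
          4: "BADASSOC", 5: "UNKNOWNVAR", 6: "BADVALUE", 7: "RESTRICT"}


class ServerError(Exception):
    """Raised when a Mode 6 reply has its error bit set."""


def check_response(packet):
    """Return (sequence, status, associd, offset, count) from a reply
    header, raising ServerError if the server flagged an error."""
    (_li_vn_mode, r_e_m_op, sequence, status,
     associd, offset, count) = HEADER.unpack_from(packet)
    if not r_e_m_op & RESPONSE_BIT:
        raise ValueError("not a response packet")
    if r_e_m_op & ERROR_BIT:
        raise ServerError(ERRORS.get(status >> 8, "UNKNOWN"))
    return sequence, status, associd, offset, count
```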

Here's the design flaw: there doesn't seem to be any reliable way to
tell from the status word whether a peer is a clock.  You only find
out after you've either done an rv or failed at a cv (these generate
different request types).

The status word is described here:

https://docs.ntpsec.org/latest/decode.html

If anyone can tell me how to get "this is a clock" from that I will be
profoundly grateful.  No amount of client-side logic in ntpmon can
compensate for not being able to tell before you do an rv.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

The spirit of resistance to government is so valuable on certain occasions, 
that I wish it always to be kept alive.  It will often be exercised when 
wrong, but better so than not to be exercised at all. I like a little 
rebellion now and then.	-- Thomas Jefferson, letter to Abigail Adams, 1787


More information about the devel mailing list