Eric S. Raymond
esr at thyrsus.com
Thu Dec 1 00:27:44 UTC 2016
Hal Murray <hmurray at megapathdsl.net>:
> The mru list is getting printed out oldest first. I thought you fixed that.
I thought I did too. Testing...
esr at snark:~/software/ntp-rescue/ntpsec$ ntpq -c mru
Ctrl-C will stop MRU retrieval and display partial results.
Retrieved 12 unique MRU entries and 0 updates.
lstint avgint rstr r m v count rport remote address
229 896 1d0 . 4 4 383 123 mail.coldnorthadmin.com
242 887 1d0 . 4 4 387 123 level1e.cs.unc.edu
257 1004 1d0 . 4 4 342 123 22.214.171.124
646 963 1d0 . 4 4 356 123 126.96.36.199 (mirror)
679 1068 1d0 . 4 4 321 123 enigma.wiredgoats.com
707 895 1d0 . 4 4 383 123 188.8.131.52
776 991 1d0 . 4 4 346 123 184.108.40.206
929 892 1d0 . 4 4 384 123 helium.constant.com
1031 960 1d0 . 4 4 357 123 mis.wci.com
87595 10973 0 . 3 4 20 37200 localhost
342915 55 1d0 . 4 4 11 123 cheezum.mattnordhoff.net
342932 59 1d0 . 4 4 10 123 209-133-217-165.static.hvvc.us
That looks newest first. Either I'm deeply confused or you're running
a stale version somehow.
> The print-what-you-have after ^C doesn't work if the output is a pipe.
> (Classic doesn't work either.)
> [murray at second ~]$ ntpq -nc mru | tee foo.log
> Ctrl-C will stop MRU retrieval and display partial resultcds.
> ^Cclose failed in file object destructor:
> sys.excepthook is missing
> lost sys.stderr
This, alas, is unfixable - Python loses its cookies in a way I can do
nothing about. I'm not sure whether the Python interpeter's behavior
qualifies as a bug or if there's nothing better it can do. C ntpq's
similar failure to cope hints at the latter.
I'll ask around on #python about this. It's possible there's some workaround;
I doubt it, but it's possible.
> The date in the python version string is close to useless.
> [murray at hgm ~]$ ntpd --version
> ntpd 0.9.6-b9cadf5-hgm Nov 30 2016 03:44:12
> [murray at hgm ~]$ ntpq --version
> ntpq 0.9.6-119-play 2016-11-06T00:05:45Z
> [murray at hgm ~]$
> I don't know what the right answer is.
> The date is ages ago. The "play" is the directory I built it in. The "hgm"
> in the ntpd string is what I put there with waf --build-version-tag=
Here's what generates that:
return "%s-%s-%s %s" % (ntp.version.VERSION, ntp.version.VCS_TICK,
You're looking at the value of VCS_DATE, which is "The date of the
most recent commit in true ISO-8601/RFC3339 format, including
What do you think should be there instead?
> I'm getting hangs with mrulist. I haven't figured out what's going on. Yes,
> it works OK in small local cases. I'm getting hangs in big local cases (pool
> server). If I wait a while then ^C, it prints out a bunch of stuff (in
> backwards order), then stops. It doesn't print out any of the
> 346 (0 updates)
> messages, but I don't know if the new version does that normally.
> 1231 0 90 . 3 4 1 123 220.127.116.11
> 1230 0 90 . 3 4 1 30704 18.104.22.168
> 1230 0 90 . 3 3 1 60456 22.214.171.124
> 1230 0.00 90 . 3 3 3 3688 126.96.36.199
> 1230 0 90 . 3 3 1 44231 188.8.131.52
> [murray at second ~]$
> It's a pool server getting lots of traffic. The 1200 is ballpark of how long
> since ntpd was restarted. So either local packets are getting dropped and
> the retransmission logic is broken or something else is broken.
> Any suggestions for how to debug this?
My normal procedure is just to add transaction debugging messages until
something jumps out at me. It usually doesn't take long.
> How about adding a few counters to ntpq so we can at least get some hints?
You can try setting -D 4 to see the packet traffic.
The good news is this is probably not a frag-reassembly problem. If that
logic were busted we'd almost see much more obvious problems in peer requests,
which usually get two-frag responses.
It is much more likely that the error is in the mrulist() method of the
ControlSession class in pylib/util.py. If I weren't about to go out
to bimontly Steak Night, I would scrutinize lines 1325-1364 carefully;
that's where it sets up to ship the second through nth requests for
mrulist spans. Comparing it to the mrulist code in Classic's C ntpq
would probably be fruitful.
I wouldn't be at all surprised if I fluffed something in translating the C
code and didn't catch it because tickling this case is difficult here. It's
not like the protocol was ever really documented.
Should I be able to reproduce this on tom?
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
More information about the devel