ntpq mru hang

Mon Dec 19 11:19:01 UTC 2016

Hal Murray <hmurray at megapathdsl.net>:
> 
> I added + and - as keyboard commands to ntpmon to bump the debug level.
> The first + opens ntpmon.log in append mode
> 
> ntpq's debug commands now open ntpq.log

Thank you for not only implementing this but documenting it - that
was a pleasant surprise.  Nice design, too.

> I haven't sorted out what gets printed at each level.
> I bumped the hex packet printout to level 5.
> Level 4 just prints out the text which works fine for mru debugging and is 
> much easier to read and saves screen space.

That is completely reasonable.

> >> A successful batch returns a new nonce.  ntpq asks for more using the old 
> >> one.  Nothing comes back.
> 
> > Don't do that, then! 
> 
> I assume that's sarcastic.  All I'm doing is running ntpq/mru pointing at a 
> server that has enough mru slots to need more than one batch.  Testing on a 
> local server never triggers that case.

Wait, then I have failed to understand your bug report.  This can happen in
a different, less odd way than the nonce update getting lost by packet drop?

> Perhaps we should add a debugging hack to ntpd that would preload the mru 
> list with a batch of crap so we can test this case with a local server that 
> doesn't get enough traffic to do it naturally.

Or we could just write a script using the Python Mode 6 library to flood a
running ntpd with bogus Mode 6 packets.  That way we wouldn't have to add
cruft in C.

> > That is relatively easily arranged.  I could add a command to flip it to an
> > all-MRU display. 
> 
> I think that would be great.

Done. There is now an 'm' command that does what you want. That took
me a *whole nine minutes* to implement, test, and document.  (See
also: Why I Moved Stuff To Python.)

> >> I think we need an option to ntpq/mru to tell it how many slots to ask for,
> > We have that. I added it a few days ago.  ntpmon uses it to avoid hanging on
> > servers with long MRU lists. See "recent" on the Mode 6 page. 
> 
> I think that ntpd is ready.  We don't have a way to use it from ntpq

Actually, we do.  You can pass name-value pairs as arguments to the mrulist
command of ntpq.  I first tested the "recent" extension by typing
"mrulist recent=4" and observing that I got back exactly four records 
rather than the dozen or so I saw from a bare mrulist command.

> > Trouble is, "all that fits in a batch" can differ depending on the length of
> > the data literals in records... 
> 
> I'm willing to round down.  (I assume we can come up with a worst case
> length.)  I'm willing to specify how-many manually.  I'd like to be able to
> specify the number of packets in a batch to experiment.  (I'm happy to use
> the editor for that until we learn more.)
> 
> The big picture is that I'm trying to understand more about what's going on
> with a busy pool server.  Tangled up with that is that the new/python ntpq
> isn't working yet and/or I don't know how to use what does work.

What is failing to work exactly?

This is a pertinent question because once you get past the request/response
logic there really isn't much *there* there.  If basic transmission and frag
reassembly work, the rest is...not quite trivial, but pretty close to it.

(Except, as previously noted, for MRU span reassembly.)

> But ntpq-classic doesn't work either.  I think the problem is that slots
> are getting recycled faster than it can read them so it never converges.

Ah, now that is key information.  It tells us we have a protocol problem, not
an implementation problem.

> I think being able to read N newest slots would be a very important tool.

Well, you have it. "ntpq -c 'mrulist recent=23'" should work just fine.
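
For anyone who wants to drive that from a script, here is a minimal sketch
in Python.  It only shells out to the ntpq binary with the same command
string shown above; the function names, the default timeout, and the
assumption that "ntpq" is on your PATH are mine, not anything in the tree.

```python
import subprocess


def mrulist_command(host, recent, ntpq="ntpq"):
    """Build an argv list asking ntpq for the `recent` newest MRU slots.

    `ntpq` is the program name or path; it is an assumption that the
    binary is installed under that name.
    """
    if recent < 1:
        raise ValueError("recent must be a positive slot count")
    # -c hands ntpq one interactive command; the name=value pair rides
    # along as an argument to mrulist.
    return [ntpq, "-c", "mrulist recent=%d" % recent, host]


def fetch_recent(host, recent, timeout=30):
    """Run ntpq once and return its text output.

    Raises subprocess.TimeoutExpired if ntpq hangs past `timeout`
    seconds, and subprocess.CalledProcessError on a nonzero exit.
    """
    result = subprocess.run(
        mrulist_command(host, recent),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Calling fetch_recent("localhost", 23) should hand back the same text you
would see at the terminal.  The timeout is there so a run that never
converges gets cut off instead of wedging the script.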

> A script around ntpq could collect samples of data to be manually
> inspected.  It might be better if the lstint column were turned into
> seconds-this-day.
> 
> How does the MRU list stuff know when it is finished?  We may need to teach that area to give up sooner and print what it has.  Mumble.

The daemon ships an end sentinel.  It's documented on the Mode 6 page.
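
On the lstint suggestion above: the conversion is small enough to do in
whatever script wraps ntpq.  A rough sketch, assuming lstint is the number
of seconds between the last packet from an address and the moment the
sample was taken, and that the wrapper records that moment as UTC epoch
seconds.  Both function names are made up for illustration.

```python
SECONDS_PER_DAY = 86400


def seconds_this_day(sample_time, lstint):
    """Turn an lstint interval into seconds since UTC midnight.

    sample_time: UTC epoch seconds when the MRU sample was taken
                 (an assumption: the wrapper script records this).
    lstint:      seconds since the last packet from that address.
    """
    last_seen = int(sample_time) - int(lstint)
    # The modulo wraps correctly when the packet arrived before midnight.
    return last_seen % SECONDS_PER_DAY


def format_hms(seconds):
    """Render seconds-this-day as HH:MM:SS for eyeballing."""
    return "%02d:%02d:%02d" % (
        seconds // 3600, (seconds % 3600) // 60, seconds % 60)
```

With absolute times in the column, samples taken at different moments can
be lined up by eye, which the relative lstint values make awkward.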
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>