Monitoring busy/pool servers

Eric S. Raymond esr at thyrsus.com
Wed Dec 14 22:17:15 UTC 2016


Hal Murray <hmurray at megapathdsl.net>:
> ghane0 at gmail.com said:
> > Unfortunately, I am unable to play with this new toy, as all my servers that
> > run NTPsec are in the pool, and ntpmon just goes to sleep on them (see issue
> > #206). 
> 
> There are two problems in this area.  (at least that I know about.  Maybe 
> more)
> 
> The first is that the interesting data doesn't change often enough for a 
> display that updates frequently to be helpful.  For my eye/brain, updating 
> the when or lstint column is a distraction since my peripheral vision grabs 
> my attention when I'm trying to look at something that hasn't changed.

That's a UI problem for which I can imagine a couple of different
possible solutions.  It's not what Sanjeev is complaining about, and
I don't think it's the thing to tackle first.

> The other problem is that a busy server has way more clients than will fit on 
> a screen so a mrulist printout is useless.  A busy pool server will collect 
> ballpark of a million clients over 24 hours.  That takes ballpark of 10 
> minutes to collect the data on localhost.  Round up if you have network 
> delays.

There's an obvious fix for this, which is to stop requesting MRU report frags
once we've gathered enough records to fill the window.  The only reason I haven't
done this is that it's going to be useless if the reporting order is oldest
first rather than newest first.  I guess I'd better go nail that down...

Nope.  We're screwed.  Entries are retrieved oldest first.  I'm going to have
to document this as a known bug.

    == Known Bugs ==

    +ntpmon+ will appear to hang when monitoring hosts with extremely long
    MRU lists - in particular, public pool hosts.  There is no easy fix
    for this, as the records are returned oldest first and the portion
    of interest is usually the newest.
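
To make the failure concrete, here is a toy model of the early-stop idea, not ntpmon's actual code. The helper names (`fragments`, `fetch_until_full`), the fragment size of 10 and the 40-row window are all assumptions for illustration. Record values stand in for last-seen times, so larger means newer.

```python
def fragments(records, per_frag):
    """Yield successive fragments of the MRU list in server order."""
    for i in range(0, len(records), per_frag):
        yield records[i:i + per_frag]


def fetch_until_full(frag_iter, rows):
    """Stop requesting fragments once `rows` records are in hand."""
    got = []
    requests = 0
    for frag in frag_iter:
        requests += 1
        got.extend(frag)
        if len(got) >= rows:
            break
    return got[:rows], requests


# Toy MRU list delivered oldest first: record i was last seen at time i.
mru_oldest_first = list(range(1000))

# Hypothetical 40-row window, 10 records per fragment.
shown, requests = fetch_until_full(fragments(mru_oldest_first, 10), 40)
```

The early stop works mechanically, since only four fragments are requested. But the window ends up holding records 0 through 39, the stalest entries in the list, while the ones worth watching sit at the far end.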

The only way to fix this would be to extend the protocol to give mode 6
clients a way to request newest-first ordering.  If we're going to do
that, we might as well also support capping the number of records returned.
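
As a rough sketch of what such an extension would have to do on the server side: the function name `select_mru` and its `newest_first` and `limit` parameters are hypothetical, invented here for illustration. They are not existing mode 6 request variables, and nothing like them is implemented.

```python
def select_mru(entries, newest_first=False, limit=None):
    """Pick the MRU records to return for one request.

    entries: list of (last_seen, address) tuples, in any order.
    newest_first: hypothetical flag asking for most recent records first.
    limit: hypothetical cap on the number of records returned.
    """
    ordered = sorted(entries, key=lambda e: e[0], reverse=newest_first)
    return ordered if limit is None else ordered[:limit]


# Toy data: 1000 clients, last seen at times 0..999 (addresses are made up).
entries = [(t, "192.0.2.%d" % (t % 254 + 1)) for t in range(1000)]

# What a 40-row display would want to ask for.
window = select_mru(entries, newest_first=True, limit=40)

# Leaving both options off keeps the current oldest-first behavior.
legacy = select_mru(entries)
```

With both options a client could fill its window from a handful of records instead of walking the whole list, and existing clients that send neither would see no change.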

> On top of that, the mru code for ntpq (and I assume ntpmon) is broken.  It 
> seems to hang when the server has lots of clients.

I think you have misdiagnosed this. My testing on your flaky wifi link
suggested strongly that lots of clients isn't the issue - lots of
packet dropouts is the issue.

> The mru sort options are also reversed in some options.

That's trivial to fix.  I already told you how, in fact.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

