How much do we care about high-load scenarios?

Achim Gratz Stromeko at nexgo.de
Wed Sep 14 20:34:59 UTC 2016


Eric S. Raymond writes:
> Hal:
>
> But maybe we should have a  separate thread per refclock to get the
> time stamp.  ...

…or move the refclocks out of ntpd altogether and use some shared-memory
or mailbox mechanism to let ntpd look at the timestamp stream from each
refclock.  You'd then need either one process per refclock or one per
refclock type that interfaces to the actual time source.  I could also
see good reasons for having the client support run separately from the
actual clock keeping.  I don't think there's much of a difference
between processes and threads w.r.t. performance, so it all boils down
to how tightly coupled you need to be and which API is more convenient.
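
Just to make the handoff concrete, here's a minimal sketch of an
external refclock process feeding ntpd through the existing SHM
refclock driver's segment (key "NTP0" = 0x4e545030 for unit 0).  The
struct layout below follows the driver's documented interface, but
check it against the shipping source before relying on it, and a real
writer would also want proper memory barriers around the count/valid
protocol:

/* Rough sketch of an out-of-process refclock handing timestamps to
 * ntpd through the SHM refclock driver's segment.  The struct layout
 * mirrors the documented driver interface; verify against the actual
 * driver source before using it. */
#include <sys/ipc.h>
#include <sys/shm.h>
#include <time.h>
#include <stdio.h>

struct shmTime {
    int  mode;                  /* 0 or 1, see driver docs */
    volatile int count;         /* bumped around each update (mode 1) */
    time_t clockTimeStampSec;   /* time reported by the refclock */
    int  clockTimeStampUSec;
    time_t receiveTimeStampSec; /* local time the sample was taken */
    int  receiveTimeStampUSec;
    int  leap;
    int  precision;
    int  nsamples;
    volatile int valid;         /* set last, cleared by ntpd */
    unsigned clockTimeStampNSec;
    unsigned receiveTimeStampNSec;
    int  dummy[8];
};

int main(void)
{
    /* Unit 0 -> key "NTP0"; ntpd picks it up via its SHM refclock
     * configuration (classically server 127.127.28.0). */
    int id = shmget(0x4e545030, sizeof(struct shmTime), IPC_CREAT | 0600);
    if (id < 0) { perror("shmget"); return 1; }
    struct shmTime *shm = shmat(id, NULL, 0);
    if (shm == (void *)-1) { perror("shmat"); return 1; }

    struct timespec clk, rcv;
    /* In a real refclock these would come from the hardware; here we
     * just fake both sides with the system clock. */
    clock_gettime(CLOCK_REALTIME, &clk);
    clock_gettime(CLOCK_REALTIME, &rcv);

    shm->mode = 1;
    shm->count++;                       /* open the update window */
    shm->clockTimeStampSec    = clk.tv_sec;
    shm->clockTimeStampNSec   = clk.tv_nsec;
    shm->clockTimeStampUSec   = clk.tv_nsec / 1000;
    shm->receiveTimeStampSec  = rcv.tv_sec;
    shm->receiveTimeStampNSec = rcv.tv_nsec;
    shm->receiveTimeStampUSec = rcv.tv_nsec / 1000;
    shm->leap = 0;
    shm->precision = -20;
    shm->count++;                       /* close the update window */
    shm->valid = 1;                     /* tell ntpd a sample is ready */
    return 0;
}

The nice property of such a scheme is that the refclock process can run
at whatever priority and with whatever hardware access it needs, while
ntpd only ever touches the segment.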

> Me:
[…]
> The crude metric that occurs to me first is just interval between
> select calls.  If you think a different timing figure would be a
> better predictor, I'm very open to argument.

I think the appeal would be to use multiple cores on a multiprocessor,
not so much to cut down on the median response time, but rather to
shorten the long tail of the distribution.  Whether multiple threads or
processes actually achieve that objective would of course need
investigation.

[…]
> What I've just realized, somewhat to my own startlement, is that I no
> longer care enough about high-load scenarios to spend a lot of effort
> hedging the design against them.  It's because of the Pis on my
> windowsill - I've grown used to thinking of NTP as something you throw
> on a cheap single-use server so it's not contending with a
> conventional job load.

I don't think you're going to find a single-use server just for NTP in
most commercial compute farms.  If they want a dedicated stratum-1
clock they'll probably go and buy one from the usual suspects, with
planning, setup, support and warranty.  If they don't, then ntpd is
surely running on some server in addition to its other duties, or they
even run it in a VM.

For each stratum-1 they'll likely have a number of stratum-2 servers to
serve the actual network clients.  These will also most likely sit on
servers with shared functions (DNS, DHCP, monitoring, logging…), if
only for convenience.  It is also quite typical in this situation for
the NTP server to have multiple network interfaces (physical or
virtual), so it might be useful to serve each of those with a separate
thread or process.
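
Purely as an illustration of that last point (the addresses are made up
and a real server would of course have to parse and answer the packets
properly), the per-interface split could look something like this: one
UDP socket bound per interface address, one receive thread per socket:

/* Hedged sketch: one receive thread per bound interface address. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static void *serve_iface(void *arg)
{
    int fd = *(int *)arg;
    unsigned char pkt[512];

    for (;;) {
        struct sockaddr_in peer;
        socklen_t len = sizeof(peer);
        ssize_t n = recvfrom(fd, pkt, sizeof(pkt), 0,
                             (struct sockaddr *)&peer, &len);
        if (n < 48)
            continue;           /* not a full NTP header */
        /* ... timestamp, fill in server fields, sendto() reply ... */
    }
    return NULL;
}

int main(void)
{
    /* Hypothetical list of per-interface addresses to serve. */
    const char *addrs[] = { "192.0.2.1", "198.51.100.1" };
    pthread_t tid[2];
    static int fds[2];

    for (int i = 0; i < 2; i++) {
        struct sockaddr_in sin = { .sin_family = AF_INET,
                                   .sin_port   = htons(123) };
        inet_pton(AF_INET, addrs[i], &sin.sin_addr);
        fds[i] = socket(AF_INET, SOCK_DGRAM, 0);
        if (bind(fds[i], (struct sockaddr *)&sin, sizeof(sin)) < 0) {
            perror("bind");
            return 1;
        }
        pthread_create(&tid[i], NULL, serve_iface, &fds[i]);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}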

Your experience with the rasPi may just be the hammer that makes every
problem look like a nail, when in reality you'll also need to deal with
welds, screws, press-fits, snap-ins and folded cardboard.

Even the rasPi has quite a bit of load variability.  The two of mine
run at well under 1µs offset when left alone at constant temperature,
but both temperature swings and simply being logged in via SSH to look
at the log data pull them off by some tens of µs.  None of that counts
as high load in any usual sense.  If your goal is to keep within one ms
on a local network, then I guess that variability doesn't make a
difference at the client side, since it gets swamped by the delay and
delay variability of the rasPi's network interface, which is actually
attached via USB.

> I think over-fixating on the Pi's limitations is a mistake.  And the EMI
> issue is orthogonal to whether you expect your time service to be running
> on a lightly or heavily-loaded machine - either way you're going to need
> to pipe your GPS signal down from a roof mount.

That argument only holds for stratum-1 anyway, and only if you
disregard LF clocks.  There are also quite a few more considerations
than just those already mentioned.  The most prevalent NTP servers in
any larger installation will be stratum-2 (which is also what I get
from the NTP pool most of the time), serving at least a few hundred
clients.

> The underlying point is that blade and rack servers are cheap.  Cycles
> are cheap.  This gives the option of implicitly saying to operators
> "high-load conditions are *your* problem - fix it by rehosting your
> NTP" rather than doing what I think would be premature optimization
> for the high-load case.  If we jump right in and implement threading
> we *are* going to pay for it in increased future defect rates.

They are not yet cheap enough to dedicate a single server per function,
at least not on bare iron.  Virtualization and containerization haven't
yet diffused everywhere, so you can't even assume that NTP is isolated
to one VM or container.

> My preference is to engineer on the assumption that local cycles are cheap
> and we can therefore stick with the simple dataflow that we
> have now - synchronous I/O with one worker thread to avoid stalls
> on DNS lookups.
>
> I'd prefer to bias towards architectural simplicity unless and until
> field reports force us to optimize for the high-load case.

You seem to be arguing from the standpoint of whether or not ntpd keeps
the local clock in sync (or at least that's my impression).  Let's
assume you've already ascertained that; the question then becomes how
many clients you can serve that time with some bounded degradation
under various load conditions.  For NTP the response time is somewhat
critical, but it is even more important that the variability of the
response time be minimized.  This is of course where it gets tricky to
even measure…
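
For what it's worth, the measurement doesn't have to be fancy to show
the tail.  A throwaway client along these lines (fire a few hundred
mode-3 queries at the server, sort the round-trip times, compare the
median to the 99th percentile) already tells you something; the target
address below is a placeholder and lost replies are simply skipped:

/* Throwaway probe: time NPROBES mode-3 NTP queries against one server
 * and report the spread of the round-trip times. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define NPROBES 500

static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

int main(void)
{
    unsigned char pkt[48];
    double rtt[NPROBES];
    int ngot = 0;

    struct sockaddr_in srv = { .sin_family = AF_INET,
                               .sin_port   = htons(123) };
    inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr);   /* placeholder */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct timeval to = { .tv_sec = 1 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &to, sizeof(to));

    for (int i = 0; i < NPROBES; i++) {
        struct timespec t0, t1;
        memset(pkt, 0, sizeof(pkt));
        pkt[0] = 0x23;                  /* LI=0, VN=4, mode=3 (client) */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        sendto(fd, pkt, sizeof(pkt), 0,
               (struct sockaddr *)&srv, sizeof(srv));
        if (recv(fd, pkt, sizeof(pkt), 0) < 48)
            continue;                   /* lost or truncated reply */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        rtt[ngot++] = (t1.tv_sec - t0.tv_sec) * 1e6
                    + (t1.tv_nsec - t0.tv_nsec) / 1e3;  /* microseconds */
        usleep(10000);                  /* pace the probes a bit */
    }

    if (ngot == 0) { fprintf(stderr, "no replies\n"); return 1; }
    qsort(rtt, ngot, sizeof(double), cmp_double);
    printf("%d replies: median %.1f us  p99 %.1f us  max %.1f us\n",
           ngot, rtt[ngot / 2], rtt[(int)(ngot * 0.99)], rtt[ngot - 1]);
    return 0;
}

Run that once against an idle server and once while the box is doing
its other duties, and you'd expect the difference to show up in the p99
figure long before it shows up in the median.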


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds


