How much do we care about high-load scenarios?
Eric S. Raymond
esr at thyrsus.com
Wed Sep 14 21:15:09 UTC 2016
Achim Gratz <Stromeko at nexgo.de>:
> …or move the refclocks out of ntpd altogether and use some shared memory
> or mailbox system to have ntpd have a look at the timestamp stream from
> each refclock.
Yeah, this is one of my longer-term plans. It was in the original technical
proposal I wrote 18 months ago, labeled REFCLOCKD.
> I think the appeal would be to use multiple cores on a multiprocessor,
> ostensibly not to cut down on the response time, but rather to shorten
> the long tail of the distribution. Whether or not multiple threads or
> processes achieve that objective needs investigation of course.
As I just wrote, I want to see measurements before I invest in complexity.
Especially since we know two things: (a) the servers deployed out there
are *not* using concurrency, and (b) nobody is identifying poor
performance under load as a pain point.
Therefore I'm going to need persuading that high load is even a real
problem, let alone that concurrency is the right solution.
> Your experience with the rasPi is maybe just the hammer that makes all
> problems look like nails when in reality you'll also need to deal with
> welds, screws, press-fit, snap-in and folded cardboard.
Fair point. You may be right that I'm being too optimistic here.
> They are not yet cheap enough to have a single server per function, at
> least not on iron. Virtualization and containerization haven't yet
> diffused into all places so you can't even assume that NTP is isolated
> to one VM or container.
Not that that would matter: running in a VM doesn't magically make more
cycles available; in fact, quite the reverse.
> You seem to be arguing from the standpoint of whether or not ntpd keeps
> the local clock in sync (or at least that's my impression). Let's
> assume you already ascertain that, then the question becomes how many
> clients you can serve that time with some bounded degradation under
> various load conditions. For NTP the response time is somewhat
> critical, but even more important is that the variability of the
> response time should be minimized. This is of course where it gets
> tricky to even measure…
Yes. On the other hand, I repeat: we have the real-world information
that "Help! My time-service performance is degrading under load!" is
*not* a theme being constantly sounded on bug-trackers or in time-nuts
or elsewhere. In fact I've never seen this complaint even once.
The simplest explanation for this dog not barking is that there's no
burglar - that is, you have to load your timeserver to a *ridiculous*
extent before performance will degrade enough to be visible at the
scale of WAN or even LAN time service.
I think this explanation is very likely to be the correct one. If
you want to persuade me otherwise, show me data.
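
For anyone who wants to gather that kind of data, here is a minimal sketch of
one possible measurement, not something taken from this thread. It times
client-mode NTP queries and summarizes the median and the tail of the
round-trip distribution. The names query_rtt and summarize are hypothetical,
the nearest-rank percentile is just one reasonable choice, and round-trip time
seen from a client includes network jitter, so it is only a rough proxy for
the server's own response-time variability.

```python
import socket
import time


def query_rtt(host, port=123, timeout=1.0):
    """Send one NTP client-mode request; return the round-trip time in seconds.

    Hypothetical helper for illustration. Raises socket.timeout if no reply.
    """
    # 48-byte request: first byte 0x23 = LI 0, version 4, mode 3 (client).
    packet = b"\x23" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (host, port))
        sock.recvfrom(512)
        return time.perf_counter() - start


def percentile(sorted_samples, p):
    """Nearest-rank percentile; p is an integer from 1 to 100."""
    n = len(sorted_samples)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p*n/100
    return sorted_samples[rank - 1]


def summarize(samples):
    """Summarize latency samples: median, 99th percentile, max, tail spread."""
    s = sorted(samples)
    p50 = percentile(s, 50)
    p99 = percentile(s, 99)
    return {"n": len(s), "p50": p50, "p99": p99, "max": s[-1],
            "spread": p99 - p50}
```

One would collect samples with something like
[query_rtt("ntp.example.org") for _ in range(1000)] (hostname is a
placeholder) once against an idle server and once against a loaded one, then
compare the "spread" figures. If the long tail does not move, load is probably
not the problem.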
Eric S. Raymond <http://www.catb.org/~esr/>