How much do we care about high-load scenarios?

Dan Drown dan-ntp at drown.org
Wed Sep 14 21:40:57 UTC 2016


Quoting "Eric S. Raymond" <esr at thyrsus.com>:
> Achim Gratz <Stromeko at nexgo.de>:
>> …or move the refclocks out of ntpd altogether and use some shared memory
>> or mailbox system to have ntpd have a look at the timestamp stream from
>> each refclock.
>
> Yeah, this is one of my longer-term plans.  It was in the original technical
> proposal I wrote 18 months ago, labeled REFCLOCKD.

I'll add my +1 to this; setting the local time is a process logically
separate from serving time to clients.
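
As a rough sketch of what that handoff could look like, here's a toy
producer that publishes refclock samples through POSIX shared memory
for ntpd (or anything else) to poll.  The segment name, struct layout,
and field names are made up for illustration; the real interface would
have to be nailed down in the refclockd design (ntpd's existing SHM
driver already does something in this spirit, but this is not its
layout):

/* refclock_feed.c - illustrative only: a refclockd-style process
 * publishing timestamp samples through POSIX shared memory.
 * Build on Linux with: cc refclock_feed.c -lrt
 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

struct refclock_sample {          /* hypothetical layout */
    volatile uint32_t seq;        /* odd while the writer is updating */
    struct timespec clock_time;   /* time reported by the refclock */
    struct timespec recv_time;    /* local time the sample was taken */
    int leap;                     /* leap indicator */
};

int main(void)
{
    int fd = shm_open("/refclock0", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, sizeof(struct refclock_sample)) < 0)
        return 1;
    struct refclock_sample *s = mmap(NULL, sizeof(*s),
                                     PROT_READ | PROT_WRITE,
                                     MAP_SHARED, fd, 0);
    if (s == MAP_FAILED)
        return 1;

    for (;;) {
        struct timespec clk, now;
        /* stand-in for reading the actual refclock hardware */
        clock_gettime(CLOCK_REALTIME, &clk);
        clock_gettime(CLOCK_REALTIME, &now);

        s->seq++;                 /* odd: sample update in progress */
        __sync_synchronize();
        s->clock_time = clk;
        s->recv_time  = now;
        s->leap       = 0;
        __sync_synchronize();
        s->seq++;                 /* even: sample complete */

        sleep(1);
    }
}

The consumer re-reads a sample whenever seq is even and has changed
(essentially a seqlock), so neither side ever blocks the other.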

>> You seem to be arguing from the standpoint of whether or not ntpd keeps
>> the local clock in sync (or at least that's my impression).  Let's
>> assume you already ascertain that, then the question becomes how many
>> clients you can serve that time with some bounded degradation under
>> various load conditions.  For NTP the response time is somewhat
>> critical, but even more important is that the variability of the
>> response time should be minimized.  This is of course where it gets
>> tricky to even measure…
>
> Yes. On the other hand, I repeat: we have the real-world information
> that "Help!  My time-service performance is degrading under load!" is
> *not* a theme being constantly sounded on bug-trackers or in time-nuts
> or elsewhere.  In fact I've never seen this complaint even once.
>
> The simplest explanation for this dog not barking is that there's no
> burglar - that is, you have to load your timeserver to a *ridiculous*
> extent before performance will degrade enough to be visible at the
> scale of WAN or even LAN time service.
>
> I think this explanation is very likely to be the correct one. If
> you want to persuade me otherwise, show me data.

This page from Cloudflare does a nice job of describing the
limitations of a single-process UDP receive model:
https://blog.cloudflare.com/how-to-receive-a-million-packets/

The limit they hit with their hardware was around 370 kpps with a
single receiving process, which is a lot of NTP.
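
If someone did need to push a single box further, the usual answer is
to spread the receive work out, along the lines the article explores.
Here's a minimal, Linux-only sketch that pulls datagrams in batches
with recvmmsg(); several copies of it can bind the same port with
SO_REUSEPORT so the kernel fans incoming packets out across them.  The
port number and buffer sizes are arbitrary stand-ins, not NTP's:

/* udp_batch_rx.c - illustrative only: batched UDP receive.
 * Run several instances to share one port via SO_REUSEPORT.
 */
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BATCH 64

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(12345),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return 1;

    struct mmsghdr msgs[BATCH];
    struct iovec iovs[BATCH];
    char bufs[BATCH][64];            /* NTP-sized payloads */
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len  = sizeof(bufs[i]);
        msgs[i].msg_hdr.msg_iov    = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    for (;;) {
        /* pull up to BATCH datagrams per syscall instead of one */
        int n = recvmmsg(fd, msgs, BATCH, 0, NULL);
        for (int i = 0; i < n; i++) {
            /* a real server would parse and answer the request here;
             * msgs[i].msg_len is the size of the i-th datagram */
            (void)msgs[i].msg_len;
        }
    }
}

Batching helps because at these rates the per-syscall overhead, not
the per-packet work, is what dominates.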

From my own testing with iperf, sending high-rate 64-byte UDP packets,
here is the maximum rate each machine handled before hitting 1%
receive packet loss:

i3-540 / Intel 82574L NIC: ~469 kpps
Athlon 64 X2 4400+ / RTL8168 gigabit NIC: ~64 kpps
Odroid C2: ~62 kpps
Raspberry Pi 2: ~19 kpps
Beaglebone Black: ~9 kpps
Raspberry Pi B+: ~4 kpps

Even these low-end machines could serve thousands of NTP clients each,
or even millions if the clients are mostly well-behaved.
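
To put rough numbers on that: with clients polling on NTP's usual 64 s
to 1024 s schedule, even the slowest box above at ~4 kpps works out to
about 4,000 x 64 = 256,000 clients at the fast end of the poll range,
and around 4 million at the slow end; the faster machines scale up
from there.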

In their blog post, Cloudflare describes choices for vertical scaling
(making one server do more), but horizontal scaling (adding more
servers) also works very well with NTP.

So this doesn't seem like a burning issue to the average user.

Even NIST's published numbers don't look that hard to reach:

http://nvlpubs.nist.gov/nistpubs/jres/121/jres.121.003.pdf
"Maximum request rate  [server A] 67,340/s  [server B] 96,277/s"

