sys_fuzz

Eric S. Raymond esr at thyrsus.com
Wed Jan 25 01:35:23 UTC 2017


Hal Murray <hmurray at megapathdsl.net>:
> 
> gem at rellim.com said:
> > Makes no sense to me.  Adding randomness helps when you have hysteresis,
> > stiction, friction, lash and some other things, but none of those apply to
> > NTP.
> 
> The NTP case is roughly stiction.  Remember the age of this code.  It was 
> working long before CPUs had instructions to read a cycle counter.  Back 
> then, the system clock was updated on the scheduler interrupt.  There was no 
> interpolation between ticks.

*blink*   I think I just achieved enlightenment.  Gary, Hal, please
review the following carefully to ensure that I haven't updated my
beliefs wrongly.

Stiction in this context = "adjacent clock reads could get back the
same value", is that right?  Suddenly a whole bunch of things, like
the implications of only updating the clock on a scheduler interrupt,
make sense.

And now I think I get (a) why Mills fuzzed the clock, and (b) why the
code is so careful about checking for clock stepback.  If your working
assumption is that the clock will only update on a scheduler tick, and
your PLL correction requires you to have a monotonically increasing
clock, stiction is *bad*.  You have no choice but to fuzz the clock,
and the probabilistically least risky amount is around half the tick
interval.  But because random is random, when you have to read the
clock twice between ticks you cannot guarantee that your second
pseudosample will be greater than your first.  You need what the code
calls a "Lamport violation" check to throw out bad pseudosamples.

Therefore I *deduce* that the PLL correction (the one NTP does, not
the in-kernel one Hal tells us is associated with PPS) requires a
monotonically increasing clock.  It's the simplest explanation for the
way libntp/systime.c works, and it explains *everything* that has puzzled
me about that code.

I love this project - it makes me learn new things.

> Mark/Eric: Can you guarantee that we will never run on a system with a crappy 
> clock?  In this context, crappy means one that takes big steps.

OK, now that I think I understand this issue I'm going to say "Yes, we
can assume this".

All x86 machines back to the Pentium (1993) have a hardware cycle
counter; it's called the TSC. As an interesting detail, this was a
64-bit register even when the primary word size was 32 bits.

All ARM processors back to the ARM6 (1992) have one as well. A little
web searching finds clear indications of cycle counters on the
UltraSparc (SPARC V9), Alpha, MIPS, PowerPC, IA64 and PA-RISC.

I also hunted for information on dedicated smartphone processors.
I found clear indication of a cycle counter on the Qualcomm Snapdragon
and clouded ones for Apple A-series processors.  The Nvidia Tegra, MediaTek,
HiSilicon and Samsung Exynos chips are all recent ARM variants and can
therefore be assumed to have an ARM cycle counter.

Reading between the lines, it looks to me like this hardware feature
became ubiquitous in the early 1990s and that one of the drivers was
hardware-assisted crypto.  It is therefore *highly* unlikely to be
omitted from any new design, even in low-power embedded.  And if you
have a TSC, sampling it is a trivial handful of assembler
instructions.

> I think that all Gary's test proved is that his system doesn't have a crappy 
> clock.

Yes. Agreed.

> If we are serious about getting rid of that code, I'll put investigating that 
> area higher on my list.  I think we have more important things to do.

I think I can take it from here.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
