Removing interleaved mode
Eric S. Raymond
esr at thyrsus.com
Tue Aug 30 10:12:05 UTC 2016
Hal Murray <hmurray at megapathdsl.net>:
> esr at thyrsus.com said:
> > I'm not clear how we would tell if anything were broken. Or are you just
> > suggesting looking for a timing difference?
> Mostly I was suggesting that we needed to test it to make sure that it seemed
> to work rather than worrying about how well it worked.
> After that, it would be nice to see how well it works. I think that would
> take 2 systems with PPS on both. That's 2 per test. I think we could test
> old and new with 4 systems: 2 new, 2 old, and exchange packets in all 12
> combinations.
> My suggestion of using Pis is probably bogus. Their Ethernet-over-USB will
> add enough noise to confuse things.
Given the nature of the bugs Daniel found, I don't think this effort can really
be justified. Here is his comment in full.
Remove interleaved mode
Interleaved mode was an invention intended to improve timekeeping
precision in symmetric and broadcast modes. The problem it was meant to
solve is that transmit timestamps have to be written before the packet
is sent, but right *after* the packet is sent, better information
becomes available because you know exactly when the packet made it
through the kernel and out onto the wire. So, the basic idea of
interleaved mode was to dump that better value into the *next* packet,
and have the peer follow along with that, always one packet behind.
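To make the one-packet-behind idea concrete, here is a toy sketch (not NTPsec code; the field names are illustrative, not the actual NTP packet fields): packet n carries its own rough transmit timestamp, plus the refined timestamp for packet n-1, which only became known after n-1 was sent.

```python
def make_packets(rough_and_refined):
    """Simulate the interleaved-mode sender.

    rough_and_refined: list of (rough_ts, refined_ts) pairs, one per
    packet. The rough timestamp is written before sending; the refined
    one only becomes available after the send completes.
    """
    packets = []
    prev_refined = None  # nothing better to report in the first packet
    for rough, refined in rough_and_refined:
        # Each packet carries its own rough transmit timestamp plus the
        # refined timestamp of the *previous* packet.
        packets.append({"xmt": rough, "prev_refined_xmt": prev_refined})
        prev_refined = refined
    return packets

pkts = make_packets([(1.000, 1.002), (2.000, 2.001), (3.000, 3.003)])
# pkts[1]["prev_refined_xmt"] == 1.002, the refined value for packet 0
```

The peer, to use the refined value, must hold each measurement open until the following packet arrives, which is exactly the extra state-machine complexity the commit message complains about.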
This is a problem that PTP is clearly better suited to solving, but
interleaved mode still seems at least reasonable in theory. However,
there are two big problems.
First, interleaved mode adds a great deal of complexity to NTP's state
machine. This led to at least one terrible vulnerability (CVE-2016-1548)
which took two tries to fix (CVE-2016-4956), and probably indirectly
led to a few others.
Second, the implementation was flawed. "Drivestamps" were collected
simply by calling get_systime() immediately after sendpkt() returned.
However, on modern kernels, send() returns immediately unless the
network buffer is full. So the timestamp that NTP was collecting had
nothing to do with the time the packet actually went out, and was not
any more accurate than the transmit timestamp obtained in basic mode.
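The flaw is easy to see in a small experiment (a demonstration of the general point, not NTPsec code): a timestamp taken immediately after a UDP send measures only the copy into the kernel's socket buffer, typically microseconds, and says nothing about when the packet reached the wire.

```python
import socket
import time

# Send a 48-byte datagram (the size of a basic NTP packet) to the local
# discard port and timestamp on either side of the send() call.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
t_before = time.monotonic()
sock.sendto(b"\x00" * 48, ("127.0.0.1", 9))
t_after = time.monotonic()
sock.close()

queue_delay = t_after - t_before
# On a modern kernel this delta reflects only the buffer copy, so a
# "drivestamp" collected this way is no better than the timestamp
# written before the send.
print(f"apparent 'drivestamp' delay: {queue_delay * 1e6:.1f} us")
```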
If interleaved mode ever actually improved timekeeping, there are two
possible explanations. One possibility is that the Solaris
boxen that Dave Mills tested it on had a simpler kernel networking
stack, so the timestamp he was collecting was something closer to a
true drivestamp. Another possibility is the presence of a simple bug:
before the recent refactor of receive(), in every mode except
interleaved mode, NTP was storing a transmit timestamp where a receive
timestamp belonged. Interleaved mode may have been improving
performance just by dodging this buggy code.
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>