NTPsec's longer-term objectives

Sat Sep 10 13:52:42 UTC 2016

With Zero Perl Day just achieved, this seems like a good time to
review what I see as NTPsec's longer-term objectives.  This note
has two purposes. First, to explain why I've been prioritizing
our technical efforts as I have.  Second, to invite comment and
direction from CII if there's any mismatch between their objectives
and our execution.

Here are what I consider our major objectives, in priority order.
I'll discuss our progress and future plans relative to each in turn.

1. Improved security.
2. Improved long-term maintainability.
3. Improved time-service performance.

Improved security, of course, is the most urgent problem NTP has and the
reason this team was convened. We have made some really dramatic gains
in this area, as measured by the high percentage of CVEs issued
against NTP Classic in 2016 to which we proved to be invulnerable.
Last I checked we were dodging 75% of these, and the percentage
has seemed to be rising with each batch issued.

We know how to make even more gains.  I won't be more specific about
code changes on a public mailing list, and don't need to be; the
relevant team members know what they need to do.  I will mention
Daniel Franke's work with the IETF on NTS.  It looks very likely that
we will be the first implementation of that, replacing Autokey with
public-key crypto that actually works.

Though security remains job #1, I worry less about it than I used to
because the field evidence we have suggests that our
security-enhancement strategy, centered on severe reductions in attack
surface, is working very well; we've not just matched but exceeded
what could reasonably have been expected from us.  I judge we don't
need particular luck or heroic effort to win this game, just continued
diligence and good practice.

I give improved long-term maintainability priority over improved
performance because I think of the NTP codebase as critical
infrastructure that needs to persist on a timescale of decades,
perhaps even centuries.  We will be (rightly) judged by the quality of
the codebase we leave to our successors.

There is a happy coincidence (which is not really a coincidence at
all) between the things we need to do to improve long-term
maintainability and the things we have done to improve security.
Both are best served by large reductions in total code volume and
complexity.

That, too, is something we have achieved.  At fork time the C code was
227KLOC, with Perl and shell adding a handful of KLOC (about 2%). As
of yesterday C was down to 85KLOC, Perl is gone, and Python is now
4%.

That is a huge reduction in C complexity - nearly 2/3rds.  And it
actually understates the gains, as the worst snarls in the code were
heavily concentrated in the legacy cruft we were able to drop.  So
were the security problems (this, too, is not a coincidence).
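
For anyone who wants to check the arithmetic behind "nearly 2/3rds",
here is a minimal sketch in Python.  The function and constant names
are made up for illustration, and the inputs are just the rounded KLOC
figures quoted above, not a fresh count of the tree.

```python
# Rounded C line counts (in KLOC) as quoted in the text above.
C_KLOC_AT_FORK = 227
C_KLOC_NOW = 85


def fraction_removed(before, after):
    """Return the fraction of the original code volume that is gone."""
    return (before - after) / before


removed = fraction_removed(C_KLOC_AT_FORK, C_KLOC_NOW)
# Prints: C removed: 142 KLOC (62.6%)
print(f"C removed: {C_KLOC_AT_FORK - C_KLOC_NOW} KLOC ({removed:.1%})")
```

That comes out a little under the 66.7% that an exact two-thirds cut
would be, hence "nearly".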

Going forward, one of the improvements I intend to pursue is moving
everything in the suite other than ntpd itself into Python.  I'd move
ntpd, too, if it were practical, but Python's virtues do not include
being any good for realtime work.

There are two sufficient reasons to do this.  One is that moving code
from C to Python gives you a compression ratio somewhere between 2.5x
and 5x, and I'm all about reducing LOC for improved maintainability.
The other is that Python is *exceptionally* easy to read six months
later, more so than C and light-years better than Perl or shell; I have
never seen another language that is better on that metric.  Thus,
every line of code we can move to Python is a gift to our successors.

I will also continue to look for cuts in the C code, but I think we've
reached a point of diminishing returns there.  The largest future one
I expect to make is from moving ntpq to Python, about 6KLOC.  I don't
think there will be any more that big.

Improved time-service performance comes third because, frankly, nobody
is really complaining about it.  There are things we can do in this
area; Gary Miller wants to improve the speed of convergence after
startup, and this is probably achievable.  Daniel Franke has some more
blue-sky ideas for improving the core algorithms, but I think we're
still a good nine months from being ready to tackle that.

One thing that will help here is that, thanks to a recent push by Gary
and me, we now have *much* better visualization tools.  We are getting
insights into performance and the error budget that simply weren't
possible before, and exploding some long-held myths.

I can't yet predict how that will direct our future work...but we're
definitely doing research along with the development now, breaking
new ground.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Sometimes it is said that man cannot be trusted with the government
of himself.  Can he, then, be trusted with the government of others?
	-- Thomas Jefferson, in his 1801 inaugural address