<div dir="ltr">Thank you Eric. Have read, am pondering, and welcome other people to weigh in.<div><br></div><div>..m</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Jun 28, 2016 at 8:30 PM Eric S. Raymond <<a href="mailto:esr@thyrsus.com">esr@thyrsus.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">In recent discussion of the removal of memlock, Hal Murray said<br>
"Consider ntpd running on an old system that is mostly lightly loaded<br>
and doesn't have a lot of memory."<br>
<br>
By doing this, he caused me to realize that I have not been explicit<br>
about some of the assumptions behind my technical strategy. I'm now<br>
going to try to remedy that. This should have one of three results:<br>
<br>
(a) We all develop a meeting of the minds.<br>
<br>
(b) Somebody gives me technical reasons to change those assumptions.<br>
<br>
(c) Mark tells me there are political/marketing reasons to change them.<br>
<br>
So here goes...<br>
<br>
One of the very first decisions we made early last year was to code to a<br>
modern API - full POSIX and C99. This was only partly a move for ensuring<br>
portability; mainly I wanted a principled reason (one we could give potential<br>
users and allies) for ditching all the cruft in the codebase from the big-iron<br>
era.<br>
<br>
Even then I had clearly in mind the idea that the most effective<br>
attack we could make on the security and assurance problem was to<br>
ditch as much weight as possible. Hence the project motto: "Perfection<br>
is achieved, not when there is nothing more to add, but when there is<br>
nothing left to take away."<br>
<br>
There is certainly a sense in which my ignorance of the codebase and<br>
application domain forced this approach on me. What else *could* I<br>
have done but prune and refactor, using software-engineering skills<br>
relatively independent of the problem domain, until I understood enough<br>
to do something else?<br>
<br>
And note that we really only reached "understood enough" last week<br>
when I did magic-number elimination and the new refclock directive.<br>
It took a year because *it took a year!* (My failure to deliver<br>
TESTFRAME so far has to be understood as trying for too much in the<br>
absence of sufficient acquired knowledge.)<br>
<br>
But I also had from the beginning reasons for believing, or at least<br>
betting, that the most drastic possible reduction in attack surface<br>
would have been the right path to better security even if the state of<br>
my knowledge had allowed alternatives. C. A. R. Hoare: "There are two<br>
ways of constructing a software design: One way is to make it so<br>
simple that there are obviously no deficiencies, and the other way is<br>
to make it so complicated that there are no obvious deficiencies."<br>
<br>
So, simplify simplify simplify and cut cut cut...<br>
<br>
I went all-in on this strategy. Thus the constant code excisions over<br>
the last year and the relative lack of attention to NTP Classic bug<br>
reports. I did so knowing that there were these associated risks: (1)<br>
I'd cut something I shouldn't, actual function that a lot of potential<br>
customers really needed, or (2) the code had intrinsic flaws that would<br>
make it impossible to secure even with as much reduction in attack surface<br>
and internal complexity as I could engineer, or (3) my skills and intuition<br>
simply weren't up to the job of cutting everything that needed to be cut<br>
without causing horrible, subtle breakage in the process.<br>
<br>
(OK, I didn't actually worry that much about 3 compared to 1 and 2 - I<br>
know how good I am. But any prudent person would have to give it a<br>
nonzero probability. I figured Case 1 was probably manageable with good<br>
version-control practice. Case 2 was the one that made me lose some<br>
sleep.)<br>
<br>
This bet could have failed. It could have been the a priori *right*<br>
bet on the odds and still failed because the Dread God Finagle<br>
pissed in our soup. The success of the project at its declared<br>
objectives was riding on it. And for most of the last year that was a<br>
constant worry in the back of my mind. *What if I was wrong?* What if I<br>
was like the drunk in that old joke, looking for his keys under the<br>
streetlamp when he's dropped them two darkened streets over because<br>
"Offisher, this is where I can see".<br>
<br>
It didn't really help with that worry that I didn't know *anyone* I<br>
was sure I'd give better odds at succeeding at this strategy than<br>
me. Keith Packard, maybe. Poul-Henning Kamp, maybe, if he'd give up<br>
Ntimed for the effort, which he wouldn't. Recently I learned that Steve<br>
Summit might have been a good bet. But some problems are just too<br>
hard, and this codebase was *gnarly*. Might be any of us would have<br>
failed.<br>
<br>
And then...and then, earlier this year, CVEs started issuing that we<br>
dodged because I had cut out their freaking attack surface before we<br>
knew there was a bug! This actually became a regular thing, with the<br>
percentage of dodged bullets increasing over time.<br>
<br>
Personally, this came as a vast and unutterable relief. But,<br>
entertaining narrative hooks aside, this was reality rewarding my<br>
primary strategy for the project.<br>
<br>
So, when I make technical decisions about how to fix problems, one of<br>
the main biases I bring in is favoring whatever path will allow me<br>
to cut the most code.<br>
<br>
On small excisions (like removing memory locking, or yet another<br>
ancient refclock driver) I'm willing to trade a nonzero risk that<br>
removing code will break some marginal use cases, in part because I am<br>
reasonably confident of my ability to revert said small excisions. We<br>
remove it, someone yells, I revert it, no problem.<br>
<br>
So don't think I'm being casual when I do this. What I'm really doing<br>
is exploiting how good modern version control is. The kind of tools<br>
we now have for spelunking code histories give us options we didn't<br>
have in elder days. Though of course there's a limit to this sort of<br>
thing. It would be impractical to restore mode 7 at this point.<br>
<br>
Now let's talk about hardware spread and why, pace Hal, I don't really<br>
care about old, low-memory systems and am willing to accept a fairly high<br>
risk of breaking on them in order to cut out complexity.<br>
<br>
The key word here is "old". I do care a lot about *new* low-memory<br>
systems, like the RasPis in the test farm. GPSD taught me to always<br>
keep an eye towards the embedded space, and I have found that the<br>
resulting pressure to do things in lean and simple ways is valuable<br>
even when designing and implementing for larger systems.<br>
<br>
So what's the difference? There are a couple of relevant ones. One<br>
is that new "low-memory" systems are actually pretty unconstrained<br>
compared to the old ones, memory-wise. The difference between (say) a<br>
386 and the ARM 7 in my Pis or the Snapdragon in my smartphone is<br>
vast, and the worst-case working set of ntpd is pretty piddling stuff<br>
by modern standards. Looking at the output of size(1) and thinking<br>
about the size of struct peer my guess was that it would be running<br>
with about 0.8MB of RAM, and top(1) on one of my Pis seems to confirm<br>
this.<br>
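<br>
To make that kind of estimate concrete, here is a minimal sketch of the<br>
arithmetic: static segment sizes as size(1) would report them, plus one<br>
struct peer per association. Every figure in it is a hypothetical<br>
placeholder, not a measurement of ntpd, and the function names are made<br>
up for illustration.<br>

```python
# Back-of-envelope memory footprint estimate.
# ASSUMPTION: all numbers below are hypothetical placeholders chosen for
# illustration; they are NOT measurements of ntpd or of struct peer.


def estimate_footprint_bytes(text, data, bss, peer_struct_size, peer_count,
                             heap_overhead=0):
    """Sum the static segments (as size(1) reports them) and per-peer state.

    text, data, bss   -- segment sizes in bytes
    peer_struct_size  -- bytes allocated per peer association
    peer_count        -- number of configured or mobilized peers
    heap_overhead     -- any other heap usage you want to budget for
    """
    static_bytes = text + data + bss
    peer_bytes = peer_struct_size * peer_count
    return static_bytes + peer_bytes + heap_overhead


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical inputs; substitute real size(1) output and sizeof values.
    estimate = estimate_footprint_bytes(
        text=600_000,
        data=20_000,
        bss=100_000,
        peer_struct_size=1_000,
        peer_count=50,
    )
    print(f"estimated footprint: {to_mib(estimate):.2f} MiB")
```

Plug in real size(1) output and a real sizeof(struct peer) to get a usable<br>
number. The per-peer term grows only linearly with the peer count, so it<br>
matters only if something lets that count grow without bound.<br>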
<br>
Another is that disk access is orders of magnitude faster than it<br>
used to be, and ubiquitous SSDs are making it faster yet. Many<br>
of the new embedded systems (see: smartphones) don't have spinning<br>
rust at all.<br>
<br>
What this means in design terms is that, with a single exception,<br>
old-school hacks to constrain memory usage, stack size, volume<br>
of filesystem usage, and so forth made sense on those old<br>
systems but are almost dead weight even on something<br>
as low-end as a Pi. The one exception is that if you have an<br>
algorithmic flaw that causes your data set to grow without bound<br>
you're screwed either way.<br>
<br>
But aside from that, the second that resource management becomes a<br>
complexity and defect source, it should be dumped. This extends from<br>
dropping mlockall() all the way up to using a GC-enabled language like<br>
Python rather than C whenever possible. Not for nothing am I planning<br>
to at some point scrap ntpq in C to redo it in Python.<br>
<br>
Now, as to *why* I don't care about old low-power systems - it's<br>
because the only people who are going to run time service on them are<br>
a minority of hobbyists. A minority, I say, because going forward<br>
most of the hobbyists interested in that end of things are going to be<br>
on Pis or Beaglebones or ODroids so they can have modern toolchains<br>
thank you.<br>
<br>
Let's get real, here. The users we're really chasing are large data<br>
centers and cloud services, because that's where the money (and<br>
potential funding) is. As long as we don't make algorithmic mistakes<br>
that blow up our big-O, memory and I/O are not going to be performance<br>
problems for their class of hardware in about any conceivable<br>
scenario.<br>
<br>
Here's what this means to me: if I can buy a complexity reduction (and<br>
thus a security gain) by worrying less about how the resulting code<br>
will perform on machines from before the 64-bit transition of 2007-2008,<br>
you damn betcha I will do it and sleep the sleep of the just that<br>
night.<br>
<br>
When all is said and done, we could outright *break* on hardware that<br>
old and I wouldn't care much. Unless somebody is paying us to care and<br>
I get a cut, in which case I will cheerfully haul out my shovels and<br>
rakes and implements of destruction and fix it, and odds are high<br>
we'll end up with better code than we inherited.<br>
<br>
Yeah, it's nice to squeeze performance out of old hardware, and it's<br>
functional to be sparing of resources. But when everything in both<br>
our security objectives and our experience says "cut more code" I'm<br>
going to put that first.<br>
<br>
This is how I will proceed until someone persuades me otherwise or<br>
our PM directs me otherwise.<br>
--<br>
<a href="http://www.catb.org/~esr/" rel="noreferrer" target="_blank">Eric S. Raymond</a><br>
<br>
</blockquote></div>