Technical strategy and performance
fallenpegasus at gmail.com
Wed Jun 29 18:12:04 UTC 2016
Thank you Eric. Have read, am pondering, and welcome other people to weigh in.
On Tue, Jun 28, 2016 at 8:30 PM Eric S. Raymond <esr at thyrsus.com> wrote:
> In recent discussion of the removal of memlock, Hal Murray said
> "Consider ntpd running on an old system that is mostly lightly loaded
> and doesn't have a lot of memory."
> By doing this, he caused me to realize that I have not been explicit
> about some of the assumptions behind my technical strategy. I'm now
> going to try to remedy that. This should have one of three results:
> (a) We all develop a meeting of the minds.
> (b) Somebody gives me technical reasons to change those assumptions.
> (c) Mark tells me there are political/marketing reasons to change them.
> So here goes...
> One of the very first decisions we made early last year was to code to a
> modern API - full POSIX and C99. This was only partly a move to ensure
> portability; mainly I wanted a principled reason (one we could give
> users and allies) for ditching all the cruft in the codebase.
> Even then I had clearly in mind the idea that the most effective
> attack we could make on the security and assurance problem was to
> ditch as much weight as possible. Hence the project motto: "Perfection
> is achieved, not when there is nothing more to add, but when there is
> nothing left to take away."
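As a hedged illustration of what that baseline buys (this is a sketch, not NTPsec code): coding to C99 and POSIX means facilities like fixed-width integer types and clock_gettime() can simply be assumed, with no autoconf-era fallback ladder around them:

```c
/* Sketch only, not NTPsec code: C99 (stdint.h) and POSIX
 * (clock_gettime) can be assumed outright on the target platforms,
 * so no feature-test cruft is needed around them. */
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Return the realtime clock as nanoseconds since the epoch,
 * or 0 on failure. POSIX guarantees CLOCK_REALTIME exists. */
uint64_t now_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000)
         + (uint64_t)ts.tv_nsec;
}
```

The point is not the function itself but the absence of any `#ifdef HAVE_CLOCK_GETTIME` scaffolding around it.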
> There is certainly a sense in which my ignorance of the codebase and
> application domain forced this approach on me. What else *could* I
> have done but prune and refactor, using software-engineering skills
> relatively independent of the problem domain, until I understood enough
> to do something else?
> And note that we really only reached "understood enough" last week
> when I did magic-number elimination and the new refclock directive.
> It took a year because *it took a year!*. (My failure to deliver
> TESTFRAME so far has to be understood as trying for too much in the
> absence of sufficient acquired knowledge.)
> But I also had from the beginning reasons for believing, or at least
> betting, that the most drastic possible reduction in attack surface
> would have been the right path to better security even if the state of
> my knowledge had allowed alternatives. C. A. R. Hoare: "There are two
> ways of constructing a software design: One way is to make it so
> simple that there are obviously no deficiencies, and the other way is
> to make it so complicated that there are no obvious deficiencies."
> So, simplify simplify simplify and cut cut cut...
> I went all-in on this strategy. Thus the constant code excisions over
> the last year and the relative lack of attention to NTP Classic bug
> reports. I did so knowing that there were these associated risks: (1)
> I'd cut something I shouldn't, actual function that a lot of potential
> customers really needed, or (2) the code had intrinsic flaws that would
> make it impossible to secure even with as much reduction in attack surface
> and internal complexity as I could engineer, or (3) my skills and intuition
> simply weren't up to the job of cutting everything that needed to be cut
> without causing horrible, subtle breakage in the process.
> (OK, I didn't actually worry that much about 3 compared to 1 and 2 - I
> know how good I am. But any prudent person would have to give it a
> nonzero probability. I figured Case 1 was probably manageable with good
> version-control practice. Case 2 was the one that made me lose some sleep.)
> This bet could have failed. It could have been the a priori *right*
> bet on the odds and still failed because the Dread God Finagle
> pissed in our soup. The success of the project at its declared
> objectives was riding on it. And for most of the last year that was a
> constant worry in the back of my mind. *What if I was wrong?* What if I
> was like the drunk in that old joke, looking for his keys under the
> streetlamp when he'd dropped them two darkened streets over because
> "Offisher, this is where I can see"?
> It didn't really help with that worry that I didn't know *anyone* I
> was sure I'd give better odds of succeeding at this strategy than
> me. Keith Packard, maybe. Poul-Henning Kamp, maybe, if he'd give up
> timed for the effort, which he wouldn't. Recently I learned that Steve
> Summit might have been a good bet. But some problems are just too
> hard, and this codebase was *gnarly*. Might be any of us would have failed.
> And then...and then, earlier this year, CVEs started issuing that we
> dodged because I had cut out their freaking attack surface before we
> knew there was a bug! This actually became a regular thing, with the
> percentage of dodged bullets increasing over time.
> Personally, this came as a vast and unutterable relief. But,
> entertaining narrative hooks aside, this was reality rewarding my
> primary strategy for the project.
> So, when I make technical decisions about how to fix problems, one of
> the main biases I bring in is favoring whatever path will allow me
> to cut the most code.
> On small excisions (like removing memory locking, or yet another
> ancient refclock driver) I'm willing to trade a nonzero risk that
> removing code will break some marginal use cases, in part because I am
> reasonably confident of my ability to revert said small excisions. We
> remove it, someone yells, I revert it, no problem.
> So don't think I'm being casual when I do this. What I'm really doing
> is exploiting how good modern version control is. The kind of tools
> we now have for spelunking code histories give us options we didn't
> have in elder days. Though of course there's a limit to this sort of
> thing. It would be impractical to restore mode 7 at this point.
> Now let's talk about hardware spread and why, pace Hal, I don't really
> care about old, low-memory systems and am willing to accept a fairly high
> risk of breaking on them in order to cut out complexity.
> The key word here is "old". I do care a lot about *new* low-memory
> systems, like the RasPis in the test farm. GPSD taught me to always
> keep an eye towards the embedded space, and I have found that the
> resulting pressure to do things in lean and simple ways is valuable
> even when designing and implementing for larger systems.
> So what's the difference? There are a couple of relevant ones. One
> is that new "low-memory" systems are actually pretty unconstrained
> compared to the old ones, memory-wise. The difference between (say) a
> 386 and the ARM 7 in my Pis or the Snapdragon in my smartphone is
> vast, and the worst-case working set of ntpd is pretty piddling stuff
> by modern standards. Looking at the output of size(1) and thinking
> about the size of struct peer, my guess was that it would be running
> with about 0.8MB of RAM, and top(1) on one of my Pis seems to confirm this.
> Another is that disk access is orders of magnitude faster than it
> used to be, and ubiquitous SSDs are making it faster yet. Many
> of the new embedded systems (see: smartphones) don't have spinning
> rust at all.
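The memory back-of-envelope above can be sketched as follows. Note that the struct layout and the association cap here are invented stand-ins for illustration; the real ntpd definitions differ:

```c
/* Hedged sketch of the sizing argument: worst-case peer-table memory
 * is roughly sizeof(struct peer) times the association limit. The
 * struct fields and MAX_ASSOC below are hypothetical, not ntpd's. */
#include <stddef.h>

struct peer {                     /* hypothetical stand-in */
    double offset, delay, jitter; /* clock statistics */
    unsigned char stratum, flags;
    char refid[4];
    struct peer *next;
};

#define MAX_ASSOC 1024            /* hypothetical association cap */

/* Worst-case bytes consumed by the peer table. */
size_t worst_case_peer_bytes(void)
{
    return sizeof(struct peer) * MAX_ASSOC;
}
```

Even with generous padding this lands in the tens of kilobytes - piddling next to the hundreds of megabytes on the smallest current Pi.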
> What this means in design terms is that, with one exception, the
> old-school hacks to constrain memory usage, stack size, volume
> of filesystem usage, and so forth - all of which made sense on
> those old systems - are almost pure dead weight even on something
> as low-end as a Pi. The one exception is that if you have an
> algorithmic flaw that causes your data set to grow without bound,
> you're screwed either way.
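That unbounded-growth exception is worth a sketch. A fixed-size ring buffer (hypothetical code, not ntpd's) caps the data set by construction, so the resource question never arises no matter how much traffic arrives:

```c
/* Hedged sketch: the one resource problem no amount of hardware
 * fixes is a data set that grows without bound. Capping it by
 * construction - a fixed ring that overwrites the oldest entry -
 * makes the question moot. Invented for illustration. */
#include <stddef.h>

#define RING_SLOTS 512

struct sample { double offset; long when; };

static struct sample ring[RING_SLOTS];
static size_t ring_next = 0, ring_count = 0;

/* Record a sample; memory use is constant however many arrive. */
void ring_add(double offset, long when)
{
    ring[ring_next] = (struct sample){ offset, when };
    ring_next = (ring_next + 1) % RING_SLOTS;
    if (ring_count < RING_SLOTS)
        ring_count++;
}

size_t ring_size(void) { return ring_count; }
```

Feed it two thousand samples or two billion; it never holds more than 512.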
> But aside from that, the second that resource management becomes a
> complexity and defect source, it should be dumped. This extends from
> dropping mlockall() all the way up to using a GC-enabled language like
> Python rather than C whenever possible. Not for nothing am I planning
> to at some point scrap ntpq in C to redo it in Python.
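For reference, the memory locking being dropped is the POSIX mlockall() call. A minimal sketch of the pattern, including the failure path that is exactly the kind of complexity being paid for:

```c
/* Sketch of the old-school hack in question: mlockall() pins the
 * process's pages in RAM so the daemon never stalls on a page-in.
 * It needs privilege (or a generous RLIMIT_MEMLOCK), adds a failure
 * path to handle, and on modern fast-storage systems buys little. */
#include <sys/mman.h>
#include <stdio.h>

/* Returns 0 on success, -1 on failure (treated as non-fatal here). */
int try_lock_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```

Deleting the call deletes the privilege requirement, the error handling, and the platform variation with it.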
> Now, as to *why* I don't care about old low-power systems - it's
> because the only people who are going to run time service on them are
> a minority of hobbyists. A minority, I say, because going forward
> most of the hobbyists interested in that end of things are going to be
> on Pis or Beaglebones or ODroids so they can have modern toolchains,
> thank you.
> Let's get real, here. The users we're really chasing are large data
> centers and cloud services, because that's where the money (and
> potential funding) is. As long as we don't make algorithmic mistakes
> that blow up our big-O, memory and I/O are not going to be performance
> problems for their class of hardware under any conceivable load.
> Here's what this means to me: if I can buy a complexity reduction (and
> thus a security gain) by worrying less about how the resulting code
> will perform on machines from before the 64-bit transition of 2007-2008,
> you damn betcha I will do it and sleep the sleep of the just that night.
> When all is said and done, we could outright *break* on hardware that
> old and I wouldn't care much. Unless somebody is paying us to care and
> I get a cut, in which case I will cheerfully haul out my shovels and
> rakes and implements of destruction and fix it, and odds are high
> we'll end up with better code than we inherited.
> Yeah, it's nice to squeeze performance out of old hardware, and it's
> functional to be sparing of resources. But when everything in both
> our security objectives and our experience says "cut more code" I'm
> going to put that first.
> This is how I will proceed until someone persuades me otherwise or
> our PM directs me otherwise.
> Eric S. Raymond <http://www.catb.org/~esr/>
> devel mailing list
> devel at ntpsec.org