Technical strategy and performance
fallenpegasus at gmail.com
Wed Jun 29 18:12:04 UTC 2016
Thank you Eric. Have read, am pondering, and welcome other people to weigh in.
On Tue, Jun 28, 2016 at 8:30 PM Eric S. Raymond <esr at thyrsus.com> wrote:
> In recent discussion of the removal of memlock, Hal Murray said
> "Consider ntpd running on an old system that is mostly lightly loaded
> and doesn't have a lot of memory."
> By doing this, he caused me to realize that I have not been explicit
> about some of the assumptions behind my technical strategy. I'm now
> going to try to remedy that. This should have one of three results:
> (a) We all develop a meeting of the minds.
> (b) Somebody gives me technical reasons to change those assumptions.
> (c) Mark tells me there are political/marketing reasons to change them.
> So here goes...
> One of the very first decisions we made early last year was to code to a
> modern API - full POSIX and C99. This was only partly a move to ensure
> portability; mainly I wanted a principled reason (one we could give
> users and allies) for ditching all the cruft in the codebase.
> Even then I had clearly in mind the idea that the most effective
> attack we could make on the security and assurance problem was to
> ditch as much weight as possible. Hence the project motto: "Perfection
> is achieved, not when there is nothing more to add, but when there is
> nothing left to take away."
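As a hedged illustration of what that baseline buys (this is a sketch, not NTPsec code): coding to C99 and POSIX means facilities like fixed-width integer types and clock_gettime() can simply be assumed, with no autoconf-era fallback ladder around them:

```c
/* Sketch only, not NTPsec code: C99 (stdint.h) and POSIX
 * (clock_gettime) can be assumed outright on the target platforms,
 * so no feature-test cruft is needed around them. */
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Return the realtime clock as nanoseconds since the epoch,
 * or 0 on failure. POSIX guarantees CLOCK_REALTIME exists. */
uint64_t now_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000)
         + (uint64_t)ts.tv_nsec;
}
```

The point is not the function itself but the absence of any `#ifdef HAVE_CLOCK_GETTIME` scaffolding around it.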
> There is certainly a sense in which my ignorance of the codebase and
> application domain forced this approach on me. What else *could* I
> have done but prune and refactor, using software-engineering skills
> relatively independent of the problem domain, until I understood enough
> to do something else?
> And note that we really only reached "understood enough" last week
> when I did magic-number elimination and the new refclock directive.
> It took a year because *it took a year!*. (My failure to deliver
> TESTFRAME so far has to be understood as trying for too much in the
> absence of sufficient acquired knowledge.)
> But I also had from the beginning reasons for believing, or at least
> betting, that the most drastic possible reduction in attack surface
> would have been the right path to better security even if the state of
> my knowledge had allowed alternatives. C. A. R. Hoare: "There are two
> ways of constructing a software design: One way is to make it so
> simple that there are obviously no deficiencies, and the other way is
> to make it so complicated that there are no obvious deficiencies."
> So, simplify simplify simplify and cut cut cut...
> I went all-in on this strategy. Thus the constant code excisions over
> the last year and the relative lack of attention to NTP Classic bug
> reports. I did so knowing that there were these associated risks: (1)
> I'd cut something I shouldn't, actual function that a lot of potential
> customers really needed, or (2) the code had intrinsic flaws that would
> make it impossible to secure even with as much reduction in attack surface
> and internal complexity as I could engineer, or (3) my skills and intuition
> simply weren't up to the job of cutting everything that needed to be cut
> without causing horrible, subtle breakage in the process.
> (OK, I didn't actually worry that much about 3 compared to 1 and 2 - I
> know how good I am. But any prudent person would have to give it a
> nonzero probability. I figured Case 1 was probably manageable with good
> version-control practice. Case 2 was the one that made me lose some sleep.)
> This bet could have failed. It could have been the a priori *right*
> bet on the odds and still failed because the Dread God Finagle
> pissed in our soup. The success of the project at its declared
> objectives was riding on it. And for most of the last year that was a
> constant worry in the back of my mind. *What if I was wrong?* What if I
> was like the drunk in that old joke, looking for his keys under the
> streetlamp when he'd dropped them two darkened streets over because
> "Offisher, this is where I can see"?
> It didn't really help with that worry that I didn't know *anyone* I
> was sure I'd give better odds of succeeding at this strategy than
> me. Keith Packard, maybe. Poul-Henning Kamp, maybe, if he'd give up
> timed for the effort, which he wouldn't. Recently I learned that Steve
> Summit might have been a good bet. But some problems are just too
> hard, and this codebase was *gnarly*. Might be any of us would have failed.
> And then...and then, earlier this year, CVEs started issuing that we
> dodged because I had cut out their freaking attack surface before we
> knew there was a bug! This actually became a regular thing, with the
> percentage of dodged bullets increasing over time.
> Personally, this came as a vast and unutterable relief. But,
> entertaining narrative hooks aside, this was reality rewarding my
> primary strategy for the project.
> So, when I make technical decisions about how to fix problems, one of
> the main biases I bring in is favoring whatever path will allow me
> to cut the most code.
> On small excisions (like removing memory locking, or yet another
> ancient refclock driver) I'm willing to trade a nonzero risk that
> removing code will break some marginal use cases, in part because I am
> reasonably confident of my ability to revert said small excisions. We
> remove it, someone yells, I revert it, no problem.
> So don't think I'm being casual when I do this. What I'm really doing
> is exploiting how good modern version control is. The kind of tools
> we now have for spelunking code histories give us options we didn't
> have in elder days. Though of course there's a limit to this sort of
> thing. It would be impractical to restore mode 7 at this point.
> Now let's talk about hardware spread and why, pace Hal, I don't really
> care about old, low-memory systems and am willing to accept a fairly high
> risk of breaking on them in order to cut out complexity.
> The key word here is "old". I do care a lot about *new* low-memory
> systems, like the RasPis in the test farm. GPSD taught me to always
> keep an eye towards the embedded space, and I have found that the
> resulting pressure to do things in lean and simple ways is valuable
> even when designing and implementing for larger systems.
> So what's the difference? There are a couple of relevant ones. One
> is that new "low-memory" systems are actually pretty unconstrained
> compared to the old ones, memory-wise. The difference between (say) a
> 386 and the ARM 7 in my Pis or the Snapdragon in my smartphone is
> vast, and the worst-case working set of ntpd is pretty piddling stuff
> by modern standards. Looking at the output of size(1) and thinking
> about the size of struct peer, my guess was that it would be running
> with about 0.8MB of RAM, and top(1) on one of my Pis seems to confirm this.
> Another is that disk access is orders of magnitude faster than it
> used to be, and ubiquitous SSDs are making it faster yet. Many
> of the new embedded systems (see: smartphones) don't have spinning
> rust at all.
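The memory back-of-envelope above can be sketched as follows. Note that the struct layout and the association cap here are invented stand-ins for illustration; the real ntpd definitions differ:

```c
/* Hedged sketch of the sizing argument: worst-case peer-table memory
 * is roughly sizeof(struct peer) times the association limit. The
 * struct fields and MAX_ASSOC below are hypothetical, not ntpd's. */
#include <stddef.h>

struct peer {                     /* hypothetical stand-in */
    double offset, delay, jitter; /* clock statistics */
    unsigned char stratum, flags;
    char refid[4];
    struct peer *next;
};

#define MAX_ASSOC 1024            /* hypothetical association cap */

/* Worst-case bytes consumed by the peer table. */
size_t worst_case_peer_bytes(void)
{
    return sizeof(struct peer) * MAX_ASSOC;
}
```

Even with generous padding this lands in the tens of kilobytes - piddling next to the hundreds of megabytes on the smallest current Pi.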
> What this means in design terms is that, with one exception, the
> old-school hacks to constrain memory usage, stack size, volume
> of filesystem usage, and so forth - all of which made sense on
> those old systems - are almost pure dead weight even on something
> as low-end as a Pi. The one exception is that if you have an
> algorithmic flaw that causes your data set to grow without bound,
> you're screwed either way.
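That unbounded-growth exception is worth a sketch. A fixed-size ring buffer (hypothetical code, not ntpd's) caps the data set by construction, so the resource question never arises no matter how much traffic arrives:

```c
/* Hedged sketch: the one resource problem no amount of hardware
 * fixes is a data set that grows without bound. Capping it by
 * construction - a fixed ring that overwrites the oldest entry -
 * makes the question moot. Invented for illustration. */
#include <stddef.h>

#define RING_SLOTS 512

struct sample { double offset; long when; };

static struct sample ring[RING_SLOTS];
static size_t ring_next = 0, ring_count = 0;

/* Record a sample; memory use is constant however many arrive. */
void ring_add(double offset, long when)
{
    ring[ring_next] = (struct sample){ offset, when };
    ring_next = (ring_next + 1) % RING_SLOTS;
    if (ring_count < RING_SLOTS)
        ring_count++;
}

size_t ring_size(void) { return ring_count; }
```

Feed it two thousand samples or two billion; it never holds more than 512.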
> But aside from that, the second that resource management becomes a
> complexity and defect source, it should be dumped. This extends from
> dropping mlockall() all the way up to using a GC-enabled language like
> Python rather than C whenever possible. Not for nothing am I planning
> to at some point scrap ntpq in C to redo it in Python.
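For reference, the memory locking being dropped is the POSIX mlockall() call. A minimal sketch of the pattern, including the failure path that is exactly the kind of complexity being paid for:

```c
/* Sketch of the old-school hack in question: mlockall() pins the
 * process's pages in RAM so the daemon never stalls on a page-in.
 * It needs privilege (or a generous RLIMIT_MEMLOCK), adds a failure
 * path to handle, and on modern fast-storage systems buys little. */
#include <sys/mman.h>
#include <stdio.h>

/* Returns 0 on success, -1 on failure (treated as non-fatal here). */
int try_lock_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```

Deleting the call deletes the privilege requirement, the error handling, and the platform variation with it.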
> Now, as to *why* I don't care about old low-power systems - it's
> because the only people who are going to run time service on them are
> a minority of hobbyists. A minority, I say, because going forward
> most of the hobbyists interested in that end of things are going to be
> on Pis or Beaglebones or ODroids so they can have modern toolchains,
> thank you.
> Let's get real, here. The users we're really chasing are large data
> centers and cloud services, because that's where the money (and
> potential funding) is. As long as we don't make algorithmic mistakes
> that blow up our big-O, memory and I/O are not going to be performance
> problems for their class of hardware under any conceivable load.
> Here's what this means to me: if I can buy a complexity reduction (and
> thus a security gain) by worrying less about how the resulting code
> will perform on machines from before the 64-bit transition of 2007-2008,
> you damn betcha I will do it and sleep the sleep of the just that night.
> When all is said and done, we could outright *break* on hardware that
> old and I wouldn't care much. Unless somebody is paying us to care and
> I get a cut, in which case I will cheerfully haul out my shovels and
> rakes and implements of destruction and fix it, and odds are high
> we'll end up with better code than we inherited.
> Yeah, it's nice to squeeze performance out of old hardware, and it's
> functional to be sparing of resources. But when everything in both
> our security objectives and our experience says "cut more code" I'm
> going to put that first.
> This is how I will proceed until someone persuades me otherwise or
> our PM directs me otherwise.
> Eric S. Raymond <http://www.catb.org/~esr/>
> devel mailing list
> devel at ntpsec.org