Proposed technical roadmap

Tue Nov 17 18:02:51 UTC 2015

Here is how I see the road towards 1.0.

In laying out this plan, I'm going to make the optimistic assumption
that nobody comes back at us with a report that we've screwed up
something complicated, leaving us to scramble to recover.

On that assumption, I think we're looking at six to eight weeks to 1.0,
with two or maybe three 0.9.x betas in between.  For funding-politics
reasons, six would be better than eight - CII wants to see velocity,
and we ought to be able to deliver it.

The core C codebase (that is, excluding tests) appears to be in good
shape. Putting it on a serious reduction diet seems to have done it no
real harm, and quite a lot of good in the maintainability department.
I have no real worries about it other than that one unreproduced
report of an ntpdig coredump from Hal.

Other than currently embargoed vuln fixes, the only core C change that
I think really needs to land before 1.0 is Chris's rewrite of the
worker code for handling getaddrinfo() requests.  That will fix an
annoying bug that we inherited from Classic and make us look good.
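
For anyone who hasn't looked at that area yet: getaddrinfo() blocks
until the resolver answers, which is why lookups get handed to a
worker at all.  Below is a minimal sketch of the general pattern using
POSIX threads.  It is not Chris's design, and the names lookup_req,
lookup_start, and lookup_finish are made up for illustration; nothing
by those names exists in the tree.

```c
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical request record; not a structure from the ntpd sources. */
struct lookup_req {
    const char *host;        /* name or numeric address to resolve */
    const char *service;     /* service name or port string */
    struct addrinfo *result; /* filled in by the worker on success */
    int status;              /* return value of getaddrinfo() */
    pthread_t tid;           /* worker thread handling this request */
};

/* Worker body: does the blocking lookup off the main thread. */
static void *lookup_worker(void *arg)
{
    struct lookup_req *req = arg;
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;    /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_DGRAM; /* NTP runs over UDP */
    req->status = getaddrinfo(req->host, req->service, &hints, &req->result);
    return NULL;
}

/* Start a lookup; returns 0 if the worker thread was created. */
int lookup_start(struct lookup_req *req, const char *host, const char *service)
{
    req->host = host;
    req->service = service;
    req->result = NULL;
    req->status = 0;
    return pthread_create(&req->tid, NULL, lookup_worker, req);
}

/*
 * Wait for the worker and return getaddrinfo()'s status (0 on success).
 * On success the caller owns req->result and must freeaddrinfo() it.
 */
int lookup_finish(struct lookup_req *req)
{
    if (pthread_join(req->tid, NULL) != 0)
        return EAI_FAIL;
    return req->status;
}
```

A real daemon would not sit in lookup_finish(); it would have the
worker signal completion (over a pipe or similar) so the main loop can
keep servicing packets while the resolver is slow.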

The unit tests are a different matter.  It's *bad* that we shipped
with only 20% of them working.  That needs to be fixed ASAP and
should in my opinion be top priority for 0.9.1.

Myself, I think I badly need to get working on TESTFRAME and
concentrate on it.  It's absolutely central to our strategy and
positioning that the code be able to demonstrate replicability and
end-to-end correctness.

That means I need to be able to offload everything but five-alarm
vuln responses onto other people. It's time for that anyway; it's
not really healthy long-term for just one person to be doing 85%
of the coding, because it means others are not developing enough
implicit knowledge to sustain the project if that one person gets
hit by a truck.

So start spelunking, people!  And thank your fortunate stars that
I sandblasted most of the accreted historical crud off the code
before you had to look at much of it. Later in this note I'll
list some get-to-know-the-C-code tasks.

Here's how I think the pre-1.0 tasks naturally break down.  Mark
may correct my priority assessments...

Daniel Franke:
   1. Vuln response, embargo tracking.  Your #1 job is to make sure
      that we look *golden* to InfoSec people, merging solid vuln fixes
      to the main repo the day they come out of embargo.
   2. Explore. You are at or near the top of my list of people most
      qualified to get intimate with the time-sync algorithms.  A
      prerequisite is that you become comfortable with the codebase
      as it is.

Chris Johns:
   1. Asynch worker code for getaddrinfo bug.  Should land in a beta.
   2. Windows port work.  Port not required for 1.0 - take your time,
      get it right, no kludges.
   3. Explore. You are one of the people I am counting on to get
      fluent with the codebase and do things that surprise me.

Hal Murray:
   1. Live testing. You are going to be our most important reality
      check until TESTFRAME lands and probably after it as well.
   2. Bring the ports to older BSDs up to snuff.  Not absolutely
      necessary for 1.0 but would be nice.
   3. Watch what Classic is doing.  Alert us to the need to cross-port
      new bug and security fixes.
   4. Feature vs. test status matrix.

Me:
    1. TESTFRAME.  I should be able to finish at least capture mode before 1.0.
    2. Replay branch construction (gated on one of Amar's tasks).

Joel Sherrill:
    You don't have a record yet, so I have no idea what you ought to
    be doing. Maybe you can come up with something.

Amar Takhar:
   1. Make all the unit tests work.
   2. Backport the waf port so it works at the fork point (needed for
      replay branch construction).

If I've missed anything, or any of you has any objection to these
assignments, speak up so we can allocate more efficiently.  I encourage
everyone to reply with time and difficulty assessments.

Explanation of replay branch construction: Right now, due to
occasional build breakage and general autoconf horribleness,
running bisections all the way back to the fork point is hard.

I want to fix that - making sure we can easily isolate bugs on
our backtrail is the best anti-Murphy medicine against actually
*having* bugs on our backtrail.

Thus, I want Amar to start a new branch from the fork point that
builds the codebase as it then existed with waf.  I will then replay
the post-fork commit history onto the branch; I expect this to be
about 4 or 5 days of hard slogging.

At the end of it we'll have a new master branch with no build breaks
that can be bisected fast all the way back to its zero point. Among
other good things, this will give us the ability to say of inherited
bugs "We didn't do it!" and *prove* that.

Now, for intrepid codebase explorers out there, some C tasks that I am not
planning to do because TESTFRAME:

* seccomp sandboxing fails to build under Ubuntu due to some confusion
  in the Linux headers.  Investigate.

* systime.c needs patching to put ntpdsim's hook back in place. Deferred
  until the ntpdsim build is fixed.

* There is a mess around the symbols NO_MAIN_ALLOWED, BUILD_AS_LIB, and
  LIBNTP_C that needs to be refactored.  ntpd should *always* be built as
  a library linked to a main module; these guard symbols should go away.

* Take the snprintb in util/ntptime and use it systematically for flag
  words like flash codes, to make reports more readable.
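
To make the last item concrete, here is a minimal sketch of what
decoding a flag word into names looks like.  It is a simplified
stand-in, not the snprintb in util/ntptime: the function format_flags,
the struct flag_name, and the table flash_names are hypothetical, and
the bit names shown are an illustrative subset rather than the
authoritative flash-code table.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical name-table entry: one bit mask and its label. */
struct flag_name {
    unsigned int mask;
    const char *name;
};

/* Illustrative subset only; not the authoritative flash-code table. */
static const struct flag_name flash_names[] = {
    { 0x0001, "pkt_dup" },
    { 0x0002, "pkt_bogus" },
    { 0x0004, "pkt_unsync" },
    { 0x0008, "pkt_denied" },
};

/*
 * Render val as "0x<hex><name,name,...>" into buf, truncating safely.
 * Returns buf so the call can sit inside a printf() argument list.
 */
char *format_flags(char *buf, size_t buflen, unsigned int val,
                   const struct flag_name *names, size_t count)
{
    size_t used, room, i;
    int n, first = 1;

    if (buflen == 0)
        return buf;
    n = snprintf(buf, buflen, "0x%x", val);
    if (n < 0) {
        buf[0] = '\0';
        return buf;
    }
    used = (size_t)n < buflen ? (size_t)n : buflen - 1;

    for (i = 0; i < count && used < buflen - 1; i++) {
        if ((val & names[i].mask) == 0)
            continue;           /* bit not set, nothing to print */
        room = buflen - used;
        n = snprintf(buf + used, room, "%c%s",
                     first ? '<' : ',', names[i].name);
        if (n < 0)
            break;
        /* On truncation, snprintf wrote room - 1 characters. */
        used += (size_t)n < room ? (size_t)n : room - 1;
        first = 0;
    }
    /* Close the name list only if one was opened and space remains. */
    if (!first && used < buflen - 1) {
        buf[used++] = '>';
        buf[used] = '\0';
    }
    return buf;
}
```

With a 64-byte buffer and the table above, a value of 0x5 comes out as
"0x5<pkt_dup,pkt_unsync>", which is a lot easier to read in a report
than the bare hex.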
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>
