Resuming the great cleanup

Sun May 27 21:37:24 UTC 2018

Hal Murray <hmurray at megapathdsl.net>:
> 
> Eric said:
> > SINGLESOCK:  While messy and somewhat difficult, this is mostly a SMOP
> > (Simple Matter of Programming). There is one potential technical risk,
> > relatively minor I think.
> 
> > The reason for iterating over interfaces is that ntpd has the capability to
> > block incoming packets by interface of origin. In order to go to a single
> > epoll we either need to (a) abandon this feature, or (b) find a way to query
> > the device a packet came through from the packet. 
> 
> Could that feature be moved to a packet filter?  I think most OSes support 
> some form of kernel level packet filtering.  I'm not familiar with any 
> details.

It could be.  That would move control of it out of the ntp.conf file, though,
which I think would count as dropping the feature.

> Does anybody use interface filtering?

We don't know.

But: part of our pitch to Classic users is not requiring them to
change the way they do things unless what they're doing is
intrinsically insecure.  Thus, so far we haven't dropped any
user-visible features without a rather strong security reason. I was
worried that doing so would be bad optics back when dropping
interface-name-based filtering first came up.  I still am.

> > EVENTS: The code currently has a once-per-second tick that we want to
> > eliminate in favor of alarms that only fire as needed.  Unfortunately, this
> > is going to be quite difficult.  And we won't collect the major benefit
> > (lower power consumption) until every piece of it is done. 
> 
> Is there a better term than alarm?  The normal case will be to wait for a 
> packet to arrive with a N second timeout.  That's just a timeout on a poll.  
> I don't see that as anything alarming.

I use 'alarm' because I think of it as what alarm(2) does.  I'm not
wedded to that term.

> We can migrate the code in the right direction without major changes by 
> collecting future work events and putting them on a sorted queue.

Yes, we can do that.  I meant to mention in my note that at least we wouldn't
have to migrate all the event types at once - we'd lift them out of timer()
one by one until timer() is empty.  Still a big messy job, though.
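
To make the sorted-queue idea concrete, here is a minimal sketch in Go
rather than ntpd's C, purely for illustration.  Everything in it is an
assumption: the event type, the event names, and the helpers nextTimeout()
and popDue() are hypothetical and do not exist in the ntpd tree.  The
point is only that the earliest queued event gives the timeout for the
poll, and that event types can be added to the queue one at a time.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// event is a hypothetical unit of deferred work; ntpd has no such type.
type event struct {
	due  time.Time
	name string
}

// eventQueue is a min-heap of events ordered by due time.
type eventQueue []event

func (q eventQueue) Len() int            { return len(q) }
func (q eventQueue) Less(i, j int) bool  { return q[i].due.Before(q[j].due) }
func (q eventQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *eventQueue) Push(x interface{}) { *q = append(*q, x.(event)) }
func (q *eventQueue) Pop() interface{} {
	old := *q
	n := len(old)
	e := old[n-1]
	*q = old[:n-1]
	return e
}

// nextTimeout reports how long the main loop may block before the
// earliest event is due. ok is false when nothing is queued.
func nextTimeout(q *eventQueue, now time.Time) (d time.Duration, ok bool) {
	if q.Len() == 0 {
		return 0, false
	}
	d = (*q)[0].due.Sub(now)
	if d < 0 {
		d = 0
	}
	return d, true
}

// popDue removes every event due at or before now and returns the
// names in due-time order.
func popDue(q *eventQueue, now time.Time) []string {
	var fired []string
	for q.Len() > 0 && !(*q)[0].due.After(now) {
		fired = append(fired, heap.Pop(q).(event).name)
	}
	return fired
}

// runDemo queues three made-up events, then reports the first timeout
// and which events have fired 64 seconds after the start.
func runDemo() (time.Duration, []string) {
	start := time.Unix(0, 0)
	q := &eventQueue{}
	heap.Push(q, event{start.Add(64 * time.Second), "poll-peer"})
	heap.Push(q, event{start.Add(3600 * time.Second), "write-drift-file"})
	heap.Push(q, event{start.Add(16 * time.Second), "poll-refclock"})

	d, _ := nextTimeout(q, start)
	return d, popDue(q, start.Add(64*time.Second))
}

func main() {
	d, fired := runDemo()
	fmt.Println("first timeout:", d)
	fmt.Println("fired by t+64s:", fired)
}
```

In a real loop the value from nextTimeout() would be handed to the poll
call as its timeout, which is the "timeout on a poll" Hal describes; an
empty queue would mean blocking until a packet arrives.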

> > In our deployment scenarios, how often do we think a low-power device is
> > *not* going to be watching a GPS/1PPS refclock?  Smartphones and tablets are
> > right out - anything mobile with a browser wants to know location, therefore
> > will have a GPS. 
> 
> Just because a platform has GPS doesn't mean that ntpd should get tangled up 
> with using it.  On the scale of cell phones, GPS eats a lot of power.  I'll 
> bet they play all sorts of turn-it-off games to save power.

They do. On my Android phone, for example, you have to enable "Precise
location" for GPS to be used rather than a cruder approximation based
on timing distances from nearby cell towers. It normally fires up only
when you use the Maps app; you get a little icon in your status bar to
remind you that it's on and eating power.

Cellphones aren't a very interesting deployment case for us, though,
because they already get GPS-steered high-precision time from their
network.  While it's possible to put NTP on one (and I actually did
this two phones ago), doing so is a stunt with no practical benefit.

> Also, consider laptops instead of cell phones.  How many of them have GPS?

Few or none have it built in.  You have a better argument here than on
smartphones.  

The question for our product manager (Mark) is then whether laptop users
are important enough to our strategy to motivate this change.

> You should probably add cleaning up SHM to your list.  I think we want to 
> make the read side read-only.  The current approach is polled.  Maybe we 
> should move to a socket.   ???

If we move to a socket it's not SHM any more.

I'm not clear what you mean by making it read-only.  Can you explain?

> PPS processing is also polled.  I think the API has an option to wakeup on 
> new data.  I don't know if anybody has tried it.

Not much point as long as clock reports are waking us up anyway.

> There is a potential tangle in the low power area.  To really save power, you 
> want to turn off the CPU clock that is used for timekeeping.  That means 
> switching to the RTC/TOY clock.  It may need a separate drift correction.  
> Maybe we need a hook to catch return-from-superlow-power so we can restart 
> the internal state, similar to what happens after the clock is stepped.

*blink* Where's the portable API to use for switching between these clocks?

> > 2. There's a subtle issue here with frequency of clock adjustment. Currently
> > if we're slewing the clock it gets adjusted once per second. If we go to a
> > fully event-driven architecture (and there are no refclocks) the frequency
> > of adjustments will drop to the frequency of network traffic. This may not
> > be a practical problem - I'm inclined to think it won't be - but we won't
> > know until we measure. 
> 
> Does the no-refclock case really adjust anything each second?  There is no 
> new data.  Why would it change the clock?  The slewing is handled in the 
> kernel - there is no reason to keep poking it.

Look at ntp_timer.c:189 where it calls adj_host_clock().

> The refclock case is batched and merged into the normal packet flow.

Maybe.  Now look at ntp_timer.c:194.

> How thread friendly is GO?

*Extremely.*

Go has a really lovely concurrency toolkit based on threads (which it
calls "goroutines") communicating via typed thread-locked queues
(which it calls "channels").  It's Hoare's CSP done with style.
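
For anyone who hasn't seen it, a minimal sketch of the pattern follows.
The sample type and the "net"/"pps" source names are made up for
illustration and are not taken from any NTPsec code; the only real API
used is the Go standard library.  Each source runs in its own goroutine
and hands typed messages to a single consumer over one channel.

```go
package main

import (
	"fmt"
	"sync"
)

// sample is a hypothetical typed message, not an NTPsec structure.
type sample struct {
	source string
	offset float64 // seconds
}

// produce sends one sample per offset on out, then marks itself done.
func produce(source string, offsets []float64, out chan<- sample, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, o := range offsets {
		out <- sample{source, o}
	}
}

// collect starts one goroutine per source and counts the samples each
// source delivered. The consumer needs no locks: the channel is the
// only thing the goroutines share.
func collect(inputs map[string][]float64) map[string]int {
	out := make(chan sample)
	var wg sync.WaitGroup
	for src, offs := range inputs {
		wg.Add(1)
		go produce(src, offs, out, &wg)
	}
	// Close the channel once every producer has finished, so the
	// range loop below terminates.
	go func() {
		wg.Wait()
		close(out)
	}()

	counts := make(map[string]int)
	for s := range out {
		counts[s.source]++
	}
	return counts
}

func main() {
	counts := collect(map[string][]float64{
		"net": {0.0012, -0.0004},
		"pps": {0.000002},
	})
	fmt.Println("net samples:", counts["net"], "pps samples:", counts["pps"])
}
```

The order in which samples arrive is not deterministic, which is why
the sketch only counts them per source.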

> There is another potential cleanup area.  There are 2 modes of PPS.  The 
> normal mode mostly treats the PPS as another refclock.  The other mode is to 
> let the kernel do all the work.  This is not included in most kernels, but if 
> you are willing to build your own kernel you can get much better results.  I 
> don't see any reason that we can't do the equivalent logic in userland.
> 
> This could potentially be included in the great refclock cleanup, but it 
> requires feedback from the sanity check level of ntpd to tell the PPS 
> processing that it should/shouldn't actually feed corrections to the kernel.

I don't understand this area very well.  Could you write up a more
detailed work plan?  You might have to be the one to do it.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.