PPS undersampling

Tue Aug 23 23:13:04 UTC 2016

> Yeah, been looking at that.  Since ntpd is undersampling the PPS it can be
> either good, or real bad.  I'm tempted to get Eric to fix the bug first, but
> maybe I'll need the data so he sees the bug first.

I don't know of any ntpd bug in this area.  Do you have a test case?  Or data 
from a buggy run?

There are/were glitches in some of the gpsd utilities.  I fixed one a long 
time ago and I think you fixed one recently.  But they are using a different 
mechanism.

Your mention of Nyquist on IRC a few days ago makes me think that you don't 
understand this area or, more likely, are just borrowing a word from a 
related area for a similar problem.

Nyquist refers to analog signals.  If your signal has a bandwidth of X Hz, 
you need more than 2*X samples per second in order to be able to reconstruct 
the signal.

The PPS is not analog.  We are not trying to reconstruct a signal.  We are 
trying to pass information across clock domains.  The equivalent of the 
Nyquist rule is that you have to grab the data before it gets updated with 
new data.  There is no factor of 2 in there.  Just "fast enough".

The gpsd code I fixed was doing something like:
  while (1) {
      if (sampleready())
          dosomething();
      sleep(1);    /* period is 1 second plus the time spent above */
  }
That is polling slightly slower than once per second.  If the PPS goes off 
every second, that approach will occasionally miss a sample.  But the misses 
will be rare (unless your scheduler is very busy or your CPU is slow or your 
dosomething does a lot of work).  Unless you look carefully it will be hard 
to notice.  (But geeks do look carefully.)
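
To put a number on "occasionally", here is a minimal simulation sketch in 
Python.  It is not the gpsd code.  It assumes a one-slot sample buffer that 
each new pulse overwrites, a constant per-loop overhead, and an arbitrary 
starting phase of 500 ms; the function and parameter names are made up for 
illustration.

```python
def count_missed(pulses, overhead_ms, start_ms=500):
    """Count PPS samples lost by a poll-then-sleep(1) loop.

    Pulses fire at t = 1000, 2000, ... ms and overwrite a single pending
    sample.  The loop polls every 1000 + overhead_ms milliseconds
    (assumption: the overhead is constant).
    """
    period_ms = 1000 + overhead_ms
    missed = 0
    last_seen = 0              # index of the newest pulse already consumed
    t_ms = start_ms            # time of the current poll
    while t_ms <= pulses * 1000:
        newest = t_ms // 1000  # newest pulse that has fired by now
        if newest > last_seen:
            # Any pulse between last_seen and newest was overwritten.
            missed += newest - last_seen - 1
            last_seen = newest
        t_ms += period_ms
    return missed


if __name__ == "__main__":
    print(count_missed(3600, 1))   # one hour, 1 ms of overhead per loop
    print(count_missed(3600, 0))   # idealized loop with no overhead
```

With 1 ms of overhead the loop slips one full second every 1000 polls, so 
the first call prints 4 and the second prints 0.  Four misses in an hour is 
easy to overlook.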

The code in ntpd doesn't work that way.  It runs off a timer that signals and 
resets itself.  That signal goes off every second, the same rate as the PPS.

If you are polling at a fixed rate from an unsynchronized clock, the "fast 
enough" has to include the jitter on the data source and the jitter on the 
scheduling clock.  If the PPS signal has 1 ms peak-peak of jitter, you would 
have to poll every 999 ms.  If your scheduler adds 1 ms of jitter, you have 
to ask it to run you every 998 ms.
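
The same arithmetic as a small Python helper.  The name and parameters are 
hypothetical, and jitter is taken as peak-to-peak, as in the paragraph above.

```python
def max_poll_interval_ms(source_period_ms, source_jitter_ms, sched_jitter_ms):
    """Longest fixed poll interval that cannot skip an update.

    Assumption: both jitter figures are worst-case peak-to-peak values,
    so they are simply subtracted from the nominal period.
    """
    return source_period_ms - source_jitter_ms - sched_jitter_ms


if __name__ == "__main__":
    print(max_poll_interval_ms(1000, 1, 0))   # PPS jitter only
    print(max_poll_interval_ms(1000, 1, 1))   # PPS jitter plus scheduler jitter
```

This prints 999 and 998, matching the numbers in the text.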

But the ntpd clock is not unsynchronized.  It's running at exactly 1 second 
per second in the long term.

ntpd doesn't do anything special about when it starts the timer.  So if you 
are unlucky and you start ntpd so that it is near a PPS pulse when it sets 
the timer, you might miss occasional samples.  It will be tough to reproduce 
but there should be evidence in clockstats.

If we decide to fix this, the fix is not to sample twice as fast, but to make 
sure the timer doesn't go off too close to the top of a second.  In the 
meantime, we should collect data on the PPS jitter and/or scheduler jitter so 
we can make an informed estimate of what "too close" will be.
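
A rough sketch of what a "too close" check could look like once the jitter 
numbers are known.  This is not ntpd code.  The function name is made up, 
and the guard band (sum of the two peak-to-peak jitter figures) is a 
deliberately conservative assumption.

```python
def too_close_to_top_of_second(timer_phase_ms, pps_jitter_ms, sched_jitter_ms):
    """Return True if a once-per-second timer could race the PPS pulse.

    timer_phase_ms is where in the second the timer fires (0 = on the
    pulse).  Assumption: the guard band is the sum of both peak-to-peak
    jitter values, which over-estimates the real requirement.
    """
    guard_ms = pps_jitter_ms + sched_jitter_ms
    phase = timer_phase_ms % 1000
    distance_ms = min(phase, 1000 - phase)   # distance to the nearest pulse
    return distance_ms < guard_ms


if __name__ == "__main__":
    print(too_close_to_top_of_second(1, 1, 1))     # 1 ms after the pulse
    print(too_close_to_top_of_second(999, 1, 1))   # 1 ms before the pulse
    print(too_close_to_top_of_second(500, 1, 1))   # mid-second
```

The first two calls print True and the third prints False.  Firing the 
timer near mid-second leaves the most margin on both sides.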

Actually, any fix should wait for a bigger cleanup in this area.  We should 
get rid of the current every-second timer unless some refclock needs it.  
(Waking up every second costs battery power.)  The PPS API has an optional 
wait/wakeup mode.  We should use that if available.  The timer code also 
decides when to transmit a packet, so instead of waking every second we 
should peek ahead and set the timeout on the select for as long as possible.  
But all that should wait until after TESTFRAME so we can test it.
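
The "peek ahead" idea, as a minimal Python sketch.  It is not how ntpd is 
structured today; the function name and the list of pending event times are 
hypothetical.

```python
def select_timeout(now, next_events):
    """Seconds to block in select() before the earliest pending event.

    next_events holds absolute times (same clock as now) of things like
    the next packet transmission.  Returns None when nothing is pending,
    which select() treats as "block until I/O arrives".
    """
    if not next_events:
        return None
    return max(0.0, min(next_events) - now)


if __name__ == "__main__":
    print(select_timeout(100.0, [164.0, 1124.0]))   # next poll due in 64 s
    print(select_timeout(100.0, [90.0]))            # overdue: do not block
    print(select_timeout(100.0, []))                # nothing scheduled
```

This prints 64.0, 0.0 and None.  The process then sleeps until the next 
real deadline instead of waking every second to check.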

-- 
These are my opinions.  I hate spam.