NTP Performance

Richard Laager rlaager at wiktel.com
Sat Nov 23 07:53:33 UTC 2019


On 11/23/19 12:57 AM, ASSI via devel wrote:
> Richard Laager via devel writes:
>> On 11/21/19 12:55 AM, ASSI via devel wrote:
>>>> I could try fiddling around with the polling interval. Next steps might
>>>> be to try raising the polling interval to 4 and/or lowering it to 1.
>>
>>> It is generally inadvisable to use too short polling intervals unless
>>> the clock you are disciplining is exceptionally unstable.
>>
>> Do you have a recommended polling interval for the GPS PPS source? I've
>> looked at some ADEV charts (in general, nothing generated here)
>> comparing quartz and GPS, and it seems like maybe I want ~30-60 seconds?
> 
> I use 4 at the moment, but mainly to have the statistics for all
> machines on the same basis.  The best measurements I've seen for
> contemporary hardware are from the RADclock papers and they all suggest
> that the Allan intercept is between 300…3000s.  So once you've removed
> all the bumps from other influences, a polling interval of 10 (1024s)
> should be OK.
> 
>> I'm currently running with the values removed (i.e. using the defaults
>> of minpoll 6 maxpoll 10). It's still at 64. I'm avoiding touching the
>> server until it's been at least 24 hours of stability.
> 
> In general, a short polling interval will keep the clock close to
> the PPS pulse, but chases the frequency around a lot to do so.  A
> longer polling interval will keep the frequency more stable while
> allowing the PPS to be less well aligned over short periods of time.

That trade-off makes sense to me. But, of the two, it seems like
following the PPS closely is what I'd want. The PPS is supposed to be
more accurate than my computer. Now, if the computer is more accurate
for short durations, but the PPS (because it's from GPS) is more
accurate over the longer-term, then I think I understand why I'd want a
bit higher polling interval: to hit the Allan intercept.
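
As a rough check on the numbers above: ntpd poll values are base-2
exponents of the interval in seconds, so the 300-3000 s Allan intercept
range can be mapped onto poll values directly. This is only a sketch of
that arithmetic; the helper names are made up for illustration and are
not part of ntpd.

```python
import math


def poll_seconds(exponent):
    """Convert an NTP poll exponent to an interval in seconds (2**exponent)."""
    return 2 ** exponent


def exponents_in_range(low_s, high_s):
    """Return the poll exponents whose interval lies within [low_s, high_s]."""
    lowest = math.ceil(math.log2(low_s))
    highest = math.floor(math.log2(high_s))
    return list(range(lowest, highest + 1))


if __name__ == "__main__":
    # Poll values mentioned in this thread.
    for p in (1, 4, 6, 10):
        print(f"poll {p:2d} -> {poll_seconds(p):5d} s")

    # Poll exponents that fall inside the quoted Allan intercept range.
    for p in exponents_in_range(300, 3000):
        print(f"candidate poll {p} ({poll_seconds(p)} s)")
```

So the default maxpoll of 10 (1024 s) sits inside that range, while the
current 64 s poll is well below it.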

>> I tried changing from gpsd SHM to the PPS driver, but then my offset
>> (SHM showed the same offset when I added it back) in ntpq -p was -100, which
>> I believe is milliseconds. That made me wonder if it was catching the
>> wrong edge of the PPS signal, as the pulse width on this GPS is 100 ms.
> 
> Install pps-tools and look at the PPS sequence with ppstest or ppswatch
> and it should become pretty clear what the correct edge is.  You said
> you use a "motherboard serial port".  If that's RS232 it probably needs
> clear edge, otherwise if it's TTL most likely assert edge.  Depending on
> what the serial controller actually is it may need to poll one of the
> two edges (which can actually be more precise than an interrupt on some
> hardware).  Don't forget to put the serial into low-latency mode.

Here is ntp1 (again, the TimeSource 3000 telecom GPSDO OCXO), which has
not been touched:

rlaager@ntp1:~$ sudo ppstest /dev/pps0
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1574493752.000003514, sequence: 4871490 - clear  1574493752.999958500, sequence: 338
source 0 - assert 1574493753.000003963, sequence: 4871491 - clear  1574493752.999958500, sequence: 338
source 0 - assert 1574493753.000003963, sequence: 4871491 - clear  1574493753.999956393, sequence: 339
source 0 - assert 1574493754.000001838, sequence: 4871492 - clear  1574493754.999956193, sequence: 340
source 0 - assert 1574493755.000003358, sequence: 4871493 - clear  1574493755.999957459, sequence: 341
source 0 - assert 1574493756.000002340, sequence: 4871494 - clear  1574493756.999956557, sequence: 342
source 0 - assert 1574493757.000001942, sequence: 4871495 - clear  1574493757.999955758, sequence: 343
source 0 - assert 1574493758.000003849, sequence: 4871496 - clear  1574493758.999976961, sequence: 344
source 0 - assert 1574493759.000003110, sequence: 4871497 - clear  1574493759.999956815, sequence: 345
source 0 - assert 1574493760.000002831, sequence: 4871498 - clear  1574493760.999961530, sequence: 346
source 0 - assert 1574493761.000002058, sequence: 4871499 - clear  1574493761.999955544, sequence: 347
source 0 - assert 1574493762.000004889, sequence: 4871500 - clear  1574493762.999956634, sequence: 348
source 0 - assert 1574493763.000002245, sequence: 4871501 - clear  1574493763.999955686, sequence: 349
source 0 - assert 1574493764.000004485, sequence: 4871502 - clear  1574493764.999955259, sequence: 350
source 0 - assert 1574493765.000005882, sequence: 4871503 - clear  1574493765.999956504, sequence: 351
source 0 - assert 1574493766.000006812, sequence: 4871504 - clear  1574493766.999961250, sequence: 352
source 0 - assert 1574493767.000002534, sequence: 4871505 - clear  1574493767.999955604, sequence: 353
source 0 - assert 1574493768.000001302, sequence: 4871506 - clear  1574493768.999954838, sequence: 354
source 0 - assert 1574493769.000005182, sequence: 4871507 - clear  1574493769.999956700, sequence: 355
source 0 - assert 1574493770.000002313, sequence: 4871508 - clear  1574493770.999956065, sequence: 356
source 0 - assert 1574493771.000003913, sequence: 4871509 - clear  1574493771.999963208, sequence: 357
source 0 - assert 1574493772.000002626, sequence: 4871510 - clear  1574493772.999956626, sequence: 358

On ntp2 (again, this is the u-blox 6 eval kit), I'm getting duplicated
sequence numbers, which doesn't seem quite right:

rlaager@ntp2:~$ sudo ppstest /dev/pps0
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1574493759.100134288, sequence: 93299 - clear  1574493760.000155012, sequence: 93299
source 0 - assert 1574493760.100134349, sequence: 93300 - clear  1574493760.000155012, sequence: 93299
source 0 - assert 1574493760.100134349, sequence: 93300 - clear  1574493761.000138190, sequence: 93300
source 0 - assert 1574493761.100134464, sequence: 93301 - clear  1574493761.000138190, sequence: 93300
source 0 - assert 1574493761.100134464, sequence: 93301 - clear  1574493762.000135892, sequence: 93301
source 0 - assert 1574493762.100135026, sequence: 93302 - clear  1574493762.000135892, sequence: 93301
source 0 - assert 1574493762.100135026, sequence: 93302 - clear  1574493763.000135960, sequence: 93302
source 0 - assert 1574493763.100134992, sequence: 93303 - clear  1574493763.000135960, sequence: 93302
source 0 - assert 1574493763.100134992, sequence: 93303 - clear  1574493764.000136569, sequence: 93303
source 0 - assert 1574493764.100134950, sequence: 93304 - clear  1574493764.000136569, sequence: 93303
source 0 - assert 1574493764.100134950, sequence: 93304 - clear  1574493765.000137680, sequence: 93304
source 0 - assert 1574493765.100135744, sequence: 93305 - clear  1574493765.000137680, sequence: 93304
source 0 - assert 1574493765.100135744, sequence: 93305 - clear  1574493766.000138950, sequence: 93305
source 0 - assert 1574493766.100135396, sequence: 93306 - clear  1574493766.000138950, sequence: 93305
source 0 - assert 1574493766.100135396, sequence: 93306 - clear  1574493767.000140203, sequence: 93306
source 0 - assert 1574493767.100135947, sequence: 93307 - clear  1574493767.000140203, sequence: 93306
source 0 - assert 1574493767.100135947, sequence: 93307 - clear  1574493768.000134559, sequence: 93307
source 0 - assert 1574493768.100135676, sequence: 93308 - clear  1574493768.000134559, sequence: 93307
source 0 - assert 1574493768.100135676, sequence: 93308 - clear  1574493769.000136480, sequence: 93308
source 0 - assert 1574493769.100139799, sequence: 93309 - clear  1574493769.000136480, sequence: 93308
source 0 - assert 1574493769.100139799, sequence: 93309 - clear  1574493770.000151973, sequence: 93309
source 0 - assert 1574493770.100133999, sequence: 93310 - clear  1574493770.000151973, sequence: 93309
source 0 - assert 1574493770.100133999, sequence: 93310 - clear  1574493771.000135982, sequence: 93310
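
To compare the two edges in output like the above, a small script can
measure how far each edge lands from the nearest whole second. This is a
minimal sketch: the regular expression is written against the line
format pasted here, and the function names are made up for illustration.
It only shows which edge sits near the system clock's second boundary,
so it assumes the clock was already roughly right from other sources.
Keying on the sequence number collapses the repeated lines, which look
consistent with ppstest printing one line per captured edge (an
assumption, not something verified here).

```python
import re

# Matches one ppstest line: both edges, each with seconds, nanoseconds, sequence.
LINE = re.compile(
    r"assert (\d+)\.(\d{9}), sequence: (\d+) - "
    r"clear\s+(\d+)\.(\d{9}), sequence: (\d+)"
)


def edge_offsets_ns(lines):
    """Map each edge to {sequence: signed ns offset from the nearest second}."""
    offsets = {"assert": {}, "clear": {}}
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # skip banner lines such as 'found PPS source ...'
        for edge, nsec, seq in (("assert", m[2], m[3]), ("clear", m[5], m[6])):
            ns = int(nsec)
            if ns >= 500_000_000:
                ns -= 1_000_000_000  # just before the next second: negative offset
            offsets[edge][int(seq)] = ns  # repeated sequences overwrite themselves
    return offsets


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# First three ntp2 lines from the output above.
NTP2_SAMPLE = [
    "source 0 - assert 1574493759.100134288, sequence: 93299 - clear  1574493760.000155012, sequence: 93299",
    "source 0 - assert 1574493760.100134349, sequence: 93300 - clear  1574493760.000155012, sequence: 93299",
    "source 0 - assert 1574493760.100134349, sequence: 93300 - clear  1574493761.000138190, sequence: 93300",
]

if __name__ == "__main__":
    result = edge_offsets_ns(NTP2_SAMPLE)
    for edge in ("assert", "clear"):
        avg_ms = mean(result[edge].values()) / 1e6
        print(f"{edge:6s}: {len(result[edge])} pulses, mean offset {avg_ms:+.3f} ms")
```

On the ntp2 sample the clear edge lands within about 0.15 ms of the
second and the assert edge about 100 ms later, which lines up with the
100 ms pulse width and the -100 offset described in this thread.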

>> At this point, I'm focusing on proving ntp2.wiktel.com is back to normal
>> GPS operation and seeing how the polling interval change behaves. Then I
>> might try to switch from SHM to PPS again.
> 
> You might want to say what you changed and when…

ntp1 remains untouched.

ntp2:

There were various reboots and even full power-cycles not listed here.

1) I changed from the SHM driver to the PPS driver. This involved
   getting ldattach running on /dev/ttyS0 (like I have on ntp1 where
   there is no gpsd).

2) The PPS was then -100 offset in ntpq -p and was not being used.

3) I tried adding the SHM back with noselect, to compare the two. It was
   also -100.

4) I eventually found, via u-blox u-center on Windows, via network
   forwarding of the serial port, that the pulse width was 100ms.

5) I reverted my changes and went back to SHM. I also stopped running
   ldattach, as gpsd handles this itself. I also changed my
   /etc/default/gpsd DEVICES from DEVICES="/dev/ttyS0 /dev/pps0" to just
   DEVICES="/dev/ttyS0", again as gpsd seems to Do The Right Thing by
   itself. I rebooted. This was 2019-11-22 05:27 UTC.

6) I removed the minpoll & maxpoll. This was the last restart of ntpd
   and was at 2019-11-22 08:32 UTC.

The scale of the offset swings seems to have settled down around 20:00
UTC, so I'd like to wait until at least 24 hours from that point before
doing anything else. This is for several reasons. It gives ntpd enough
time to hopefully stabilize. It gives ntpd a reasonable opportunity to
raise the polling interval automatically (right now we're still at a
poll of 64s). And the ntpviz graph will rescale once the old samples
are gone.

I might even want to wait another 12-24 hours for the samples to scroll
off the ntppool.org graph so it rescales. It's nice to have an external
view of how things are performing over the network.

If the polling interval _does_ go up, then I'll want to wait some period
of time to see how that is performing before making any further changes.

-- 
Richard

