Clocks broken on Mac mini

Hal Murray hmurray at megapathdsl.net
Tue Jan 31 23:16:42 UTC 2017


gem at rellim.com said:
> I think we are heading the wrong way on this.  It is not an Open Firmware
> problem and it is not a Mac problem.  I have seen this before with NTP
> Classic.  I have verified before that Chronyd does not have the same issue.

> There was a time I used to buy and provision 10 identical servers at a
> time. Of the 10, one often had this problem.  The fix was chrony.

If you ever get another system where ntpd doesn't work but chrony does, I'd 
like to investigate.

> Sometimes the clock needs to be pulled more than ntpd expects and ntpd gives
> up just before it succeeds.

> I think this is clearly an ntpd bug. 

I think it's more complicated than that.

There are long-standing occasional reports of NTP Classic getting stuck with 
a drift of 500 and not being able to recover by itself.  It does recover if 
you delete the drift file and restart ntpd.  I assume our code will do the 
same thing.

I think your "gives up just before" is off by quite a bit.

ntpd has a limit of 500 ppm.  The kernel may have a similar limit.  I haven't 
dived into that corner of the kernel source yet.  A limit may be a good idea 
to prevent the system getting into strange modes where it jumps around rather 
than converging smoothly.  I'm not enough of a PLL geek to explain that area 
but I know that it's easy to get into oscillations.

I hacked our code to have a limit of 2500.  It didn't work.  Somebody claimed 
it was running at 500 ppm.  I don't know if that's the kernel or some corner 
of our code that I didn't catch.  (I think there are also a few lines of code 
in libc.)

I think it's reasonable for ntpd to expect the kernel clock to be reasonably 
close.  We could have an interesting discussion about how close is 
reasonable, but most systems get well within 500.  If not, it's usually a bug 
in the setup/calibration chain someplace.  There are lots of opportunities to 
get something wrong.  It's easy to get close-enough so that you won't notice 
unless you are interested in time and go looking for troubles.

In this case, I was half looking, but didn't notice the problem for quite a 
while.  Sure, it's easy to spot after you know what to look for, but it does 
work well enough to hide any gross errors.  I didn't get around to making 
graphs where it would have been obvious.

> So what is your 'adjtimex -p' output?
         mode: 0
       offset: -6106235
    frequency: 565962
     maxerror: 52530
     esterror: 1820
       status: 8193
time_constant: 6
    precision: 1
    tolerance: 32768000
         tick: 10029
     raw time:  1485903664s 513432740us = 1485903664.513432740
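
The tick and frequency fields above can be turned into a rough ppm figure.  
This is a minimal sketch, not anything from ntpd or adjtimex itself.  It 
assumes the nominal tick is 10000 us (USER_HZ = 100) and that frequency is 
ppm scaled by 2**16, as described in adjtimex(2).  The function names are 
made up for illustration, and adding the two terms is an approximation.

```python
# Values copied from the 'adjtimex -p' output above.
TICK_US = 10029
FREQUENCY = 565962

# Assumption: USER_HZ = 100, so an uncorrected tick is 10000 microseconds.
NOMINAL_TICK_US = 10000
# The kernel reports frequency as ppm with a 16-bit fractional part.
FREQ_SCALE = 65536


def tick_ppm(tick_us, nominal_us=NOMINAL_TICK_US):
    """Coarse correction implied by the tick length, in ppm."""
    return (tick_us - nominal_us) / nominal_us * 1e6


def freq_ppm(frequency):
    """Fine correction from the scaled frequency field, in ppm."""
    return frequency / FREQ_SCALE


def total_ppm(tick_us, frequency):
    """Approximate total correction: coarse tick term plus fine term."""
    return tick_ppm(tick_us) + freq_ppm(frequency)


if __name__ == "__main__":
    print(f"tick:      {tick_ppm(TICK_US):9.3f} ppm")
    print(f"frequency: {freq_ppm(FREQUENCY):9.3f} ppm")
    print(f"total:     {total_ppm(TICK_US, FREQUENCY):9.3f} ppm")
```

Under those assumptions the tick accounts for about 2900 ppm and the 
frequency field for under 10 ppm, so nearly all of the correction is being 
carried by the tick value rather than by the part ntpd steers.
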

I'm not familiar with adjtimex.  I installed it a while ago.  The 
installation ran it.

Comparing clocks (this will take 70 sec)...done.
Adjusting system time by 251.512 sec/day to agree with CMOS clock...done.

That's 2911 ppm.
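
The arithmetic behind that figure, as a minimal sketch (the helper names are 
made up for illustration).  It also shows how little drift per day fits 
inside ntpd's 500 ppm limit.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def sec_per_day_to_ppm(sec_per_day):
    """Convert a clock error in seconds per day to parts per million."""
    return sec_per_day / SECONDS_PER_DAY * 1e6


def ppm_to_sec_per_day(ppm):
    """Convert a frequency error in ppm to seconds gained or lost per day."""
    return ppm * 1e-6 * SECONDS_PER_DAY


if __name__ == "__main__":
    # Adjustment reported by the adjtimex installation run.
    print(f"{sec_per_day_to_ppm(251.512):.0f} ppm")
    # Drift per day that corresponds to the 500 ppm limit.
    print(f"{ppm_to_sec_per_day(500):.1f} sec/day")
```

So 251.512 sec/day is roughly six times what a 500 ppm limit can absorb, 
which is about 43 sec/day.
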

Looks like ntpd is happy now.


-- 
These are my opinions.  I hate spam.




