nmea refclock not locking to the pps

Sun Mar 5 01:29:56 UTC 2017

Gary E. Miller wrote:
> Yo Tony!
> 
> On Fri, 3 Mar 2017 17:20:43 -0800
> "Tony Hain" <tony at tndh.net> wrote:
> 
> > Gary E. Miller wrote:
> > > Yo Tony!
> > >
> > > On Fri, 3 Mar 2017 01:00:23 -0800
> > > "Tony Hain" <tony at tndh.net> wrote:
> > >
> > > > Not clear if this is an ntpsec issue, or comes from upstream, but
> > > > the pps lock on the nmea stream never converges to a reasonable
> > > > offset or apparently itself.
> > >
> > > Fair warning: I always recommend people not to use that refclock.
> >
> > Well I tried gpsd with no luck, so I switched back, and the nmea
> > driver was working up through 4.2.4p5, (which I was using because I
> > couldn't get gpsd to work with ntpd back then either). In both cases I
> > can get gpsd to show it is getting the correct feed from the gps, I
> > just can't make it play well with ntpd.
> 
> Which sounds similar to the problem you now report on the nmea driver.
> Just with fewer tools to use.  And by gpsd, I hope you mean the SHM driver,
> not the flakey gpsd-ng driver.

Flakey doesn't even begin to describe it:
# gpsmon
gpsmon:ERROR: SER: device open of tcp://localhost:2947 failed: No such file or directory - retrying read-only
# netstat -an|grep LIST
tcp4       0      0 127.0.0.1.2947         *.*                    LISTEN
tcp4       0      0 *.21                   *.*                    LISTEN
tcp6       0      0 *.21                   *.*                    LISTEN
tcp4       0      0 *.22                   *.*                    LISTEN
tcp6       0      0 *.22                   *.*                    LISTEN
# ntpmon
     remote           refid      st t when poll reach   delay   offset   jitter
oPPS(0)          .PPS.            0 l    4    8  377   0.0000   0.0513   0.0008
xGPSD(0)         .GPSD.           0 l   43   64  377   0.0000 16385488   0.0003
*2603:3023:102:1 .PPS.            1 u 1050    8  376   1.4244   0.1479   0.4357

My first reaction to the above is that gpsmon is broken: it likely resolved localhost to ::1 and then didn't try IPv4. I say gpsmon rather than gpsd because ntpd is showing reachability, even if it doesn't like the values it is getting. My second reaction is that gpsd is broken in that it doesn't bind to ::1.  ;-)
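
The failure mode suspected above (resolve localhost, try only the first address, give up) contrasts with the usual client pattern of walking every address the resolver returns. A minimal Python sketch of that pattern follows. It is not gpsmon's code, and `connect_first_working` is a hypothetical helper name; port 2947 is taken from the netstat output above.

```python
import socket

def connect_first_working(host, port, timeout=2.0):
    """Try every address getaddrinfo() returns for host until one connects.

    Returns the connected socket, or raises the last OSError seen.
    """
    last_error = OSError(f"no addresses found for {host!r}")
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)      # may be ::1 first, then 127.0.0.1
            return sock
        except OSError as err:
            # Remember the failure and fall through to the next address.
            last_error = err
            if sock is not None:
                sock.close()
    raise last_error
```

Calling `connect_first_working("localhost", 2947)` on a host where localhost resolves to both ::1 and 127.0.0.1 would fall back to the IPv4 listener if the IPv6 attempt is refused.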

So, yes the SHM driver is the one I have been testing and it is not playing well with ntpd. 

> 
> > I don't know if the offset shown in 4.2.8p9 is due to changes in that
> > driver, or something in the BBB dmtpps driver implementation.
> 
> As I said, looks to me like the PPS got out voted.
> 
> > I will
> > be taking that up on the FreeBSD arm list, but I was less concerned
> > about that than the fact that the ntp-sec nmea driver appears to
> > behave very differently from 4.2.8p9 nmea driver on the same hardware.
> > I assumed that driver would have come down without changes other than
> > device identifier, but clearly something is different.
> 
> I doubt it is a driver issue, my guess is that it is a dice toss thing.
> With your setup, NTP of either flavor will lock onto the wrong source now
> and again.

I would buy that if the behavior weren't consistent: the ntp-sec nmea driver is off by > 10x the offset shown by the 4.2.8p9 nmea driver, and other than the refclock xxx / server 127.127.x.x syntax differences, the config is identical.

> 
> > > That should be a good version.
> > >
> > > > oPPS(0)          .PPS.            0 l    7    8  377   0.0000   0.0001   0.0001
> > > > xNMEA(0)         .GPS.            0 l    6    8  377   0.0000 -52.1062   1.4493
> > > > *2001:470:e930:7 .GPS.            1 u   58   64  377   0.8173  -0.9772   1.9614
> > >
> > > Clearly the PPS got outvoted.
> >
> > The behavior in that configuration is what I expected because the nmea
> > start time jitter is high. It is the configurations where the pps flag
> > is turned on for the nmea interface that are not working as
> > expected.
> 
> Your expectations do not match mine.  That looks bad to me and is fixable.

That is the flag1 0 configuration, where the nmea stream has no reference to the top-of-second mark, so any variance in the start of transmission will show as an offset from ntp time. The sirf manual does not give a fixed time for start of transmission, and from the way it behaves, the delay probably varies with position on the ellipsoid and the number of satellites in view. Experimentation shows that for this location, adding 370ms to the sentence time results in an nmea offset within +/- 50ms of the pps0 mark.

In the flag1 1 configuration, the driver has the gpspps0 marker to compensate. My original point was that the ntp-sec version of that driver is an order of magnitude sloppier than the 4.2.8p9 driver on the same hardware with effectively the same configuration.

> 
> > > You neglected the most important part of a bug report: your ntp.conf.
> >
> > Well it appears from the copy I got back that the message format was
> > garbled.
> 
> Yeah, email does that.  Still very hard to read.
> 
> > # trying different values to see how it shifts refclock pps stratum 0
> > refid PPS minpoll 3 maxpoll 6 prefer time1
> > 0.000000337160
> 
> Looks fairly good.  My experiments on RasPi 3 show that
> minpoll=maxpoll=2 will give best results.

I can't find any place in the code that actually specifies what the range is. For 15+ years the documentation has said that the minimum is 4, but my experimentation has always shown that the minimum is 3. You can set less than that, but the result is always 8 sec polls, which equates to 3. I tried it again today, and minpoll = maxpoll = 2 still locks in at poll = 8.
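
For reference, minpoll and maxpoll are exponents of two, so the poll interval in seconds is 2**poll. The sketch below works through the numbers in this thread. The floor of 3 is an assumption taken from the behavior described above, not a value read from the ntpd source, and both function names are hypothetical.

```python
def poll_seconds(exponent):
    """Poll interval in seconds for a poll exponent (interval = 2**exponent)."""
    return 2 ** exponent

def effective_poll_seconds(requested, observed_floor=3):
    """Interval actually seen if the daemon clamps to observed_floor.

    observed_floor=3 is an ASSUMPTION based on the observation that
    minpoll = maxpoll = 2 still polls every 8 s.
    """
    return poll_seconds(max(requested, observed_floor))

# Columns: exponent, nominal interval, interval under the assumed floor.
for exp in (2, 3, 4, 6):
    print(exp, poll_seconds(exp), effective_poll_seconds(exp))
```

Under that assumption a requested poll of 2 (nominally 4 s) and a requested poll of 3 both come out as 8 s, which matches what ntpmon shows in the poll column.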

> 
> > #refclock nmea refid GPS baud 4800 mode 8 minpoll 3 maxpoll 6 prefer
> > flag1 0 flag4 1 time2 0.442916667 refclock nmea refid GPS baud 4800
> > mode 8 minpoll 3 maxpoll 6 prefer flag1 1 flag4 1 time1
> > 0.000000337160 time2 0.072916667
> 
> hard to tell, I think that is the line being used?  If so, that is part of your
> problem.  You either need noselect, or minpoll much greater than the
> minpoll of the PPS.  Otherwise, as you see, ntpd flips a coin and locks onto
> the wrong refclock.

That was garbled. That should be 2 lines where I can flip the first char to switch between flag1 settings. Flag1 0 doesn't need a time1, and needs the estimated 370ms additional offset, where flag1 1 is otherwise identical, and the offsets track measured/calculated corrections. 
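
The two time2 values in the quoted nmea lines differ by exactly the estimated 370 ms, which the arithmetic below checks. As an aside, 0.072916667 s also equals 350 bit times at 4800 baud (for example 35 characters at 10 bits each with 8N1 framing). Reading the value that way is an assumption here, not something stated in the config.

```python
BAUD = 4800                      # from the refclock nmea line
TIME2_FLAG1_ON = 0.072916667     # time2 used with flag1 1 (PPS-disciplined)
EXTRA_DELAY = 0.370              # estimated start-of-sentence delay for flag1 0

# time2 for the flag1 0 case is the same base correction plus the 370 ms.
time2_flag1_off = round(TIME2_FLAG1_ON + EXTRA_DELAY, 9)
print(time2_flag1_off)

# ASSUMPTION: the base value corresponds to 35 chars * 10 bits on the wire.
bits_on_wire = 35 * 10
print(round(bits_on_wire / BAUD, 9))
```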

> 
> > # The following three servers will give you a random set of three #
> > NTP servers geographically close to you.
> > # See http://www.pool.ntp.org/ for details. Note, the pool encourages
> > # users with a static IP and good upstream NTP servers to add a server
> > # to the pool. See http://www.pool.ntp.org/join.html if you are
> > interested. # # The option `iburst' is used for faster initial
> > synchronisation.
> > # The option `maxpoll 9' is used to prevent PLL/FLL flipping on
> > FreeBSD. # server 0.freebsd.pool.ntp.org iburst maxpoll 8
> 
> Best not to mix pool servers and specific servers.

I generally turn the pool servers off when calibrating because you never get the same one, and path symmetry can be anything. At least with specific servers path symmetry is generally consistent, even if it creates an offset. I turned one pool server on just to get another vote because the NIST servers were showing a persistent ~50ms offset lately. 

> 
> > > I'm guessing you do not have prefer set on your PPS?  You'll also
> > > want to set the min- and max-poll on the PPS to much less than those
> > > for the nmea driver.
> >
> > Prefer is set on the pps & nmea, as well as the reference i386 system
> > which is what it keeps locking to.
> 
> Well, that is part of the problem.  When ntpd starts it is as likely to
> choose the nmea as the pps.   You prefer your best source, not a
> flakey source.

I understand that. The pps driver says it is disabled unless there is a preferred server in the surviving set, or when another driver is tracking pps. When I make the nmea driver the only preferred option, the pps driver drops out because when the nmea driver is flag1 0, its offset is so large that it becomes a false ticker so there are no preferred survivors, and with flag1 1 the nmea driver takes over the pps tracking. 

> 
> > So right now the relevant lines are:
> > refclock pps stratum 0 refid PPS minpoll 3 maxpoll 6 prefer time1
> > 0.000000337160 refclock nmea refid GPS baud 4800 mode 8 minpoll 3
> > maxpoll 6 prefer flag1 1 flag4 1 time1 0.000000337160 time2
> > 0.072916667 server ntp2.tndh.net  minpoll 4 prefer
> 
> Pretty mushed together, did you actually prefer an internet source?

That should be 3 lines, parsed at refclock/server. The last line is the i386/FreeBSD-8 system the BBB is sitting on top of. 

> 
> Don't do that!

It was there for the nmea flag1 0 case so the pps would not drop out, and I have tried without it altogether, as well as without the prefer flag.

> 
> > Is it possible that something in the interpretation of refclock nmea
> > vs. peer 127.127.20.0 would account for the difference in handling the
> > pps event?
> 
> Nope.  The problem is you did not help ntpd select the best refclock.

I have cut the config down to 3 lines in the ntp-sec case, and the 5 functionally equivalent lines in the 4.2.8p9 case, and it still acts the same. If the nmea driver is set to track the pps, it locks and the 4.2.8p9 offset is ~20us while the ntp-sec offset is ~500us. The only thing I do between those is change the sym links to point at the different daemons and config files and restart the service. 

> 
> 
> > If it would be useful I can try switching back to gpsd to show what
> > that is doing. It has been awhile so I don't recall exactly off the
> > top of my head.
> 
> If you convert to SHM, but keep your 'prefer's the same way you will get
> similar results.  Not every time, as ntpd has to coin flip on start about which
> of the 3 prefers to believe.

I have changed the prefer to single on pps, single on nmea, flags on and off, and just about every other configuration I can think of. I am not convinced it is a predictability issue, because it does pretty much what I expect every time, except the ntp-sec nmea driver is an order of magnitude offset from the 4.2.8p9 version of the same configuration. The only other thing that was initially surprising was that the i386 system kept being preferred even when the nmea flag 1 was set to track the pps, but the persistent ~20us offset would explain that. Like I said earlier, that could be the gpio dmtpps implementation, so I will take that up on the FreeBSD side. 

Right now the configuration is SHM :::
refclock shm unit 0 refid GPS time1 0.4429166667
refclock shm unit 1 refid PPS prefer minpoll 3 maxpoll 3 time1 0.000000337160

# ppsapitest  /dev/pps0
1488676128 .949531317 451379 0 .000000000 0
1488676129 .949460094 451380 0 .000000000 0
1488676130 .949391909 451381 0 .000000000 0
1488676131 .949324748 451382 0 .000000000 0
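
One quick check on output like this is the spacing between consecutive assert timestamps, which should sit close to one second. The sketch below assumes the first two columns are the assert timestamp (seconds, then fractional nanoseconds) and the third is the assert sequence number; `pps_interval_errors` is a hypothetical helper.

```python
SAMPLE = """\
1488676128 .949531317 451379 0 .000000000 0
1488676129 .949460094 451380 0 .000000000 0
1488676130 .949391909 451381 0 .000000000 0
1488676131 .949324748 451382 0 .000000000 0
"""

NS_PER_SEC = 1_000_000_000

def pps_interval_errors(text):
    """Return (sequence, error_ns) per pulse: interval since the previous pulse minus 1 s."""
    stamps = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        sec, frac, seq = int(fields[0]), fields[1], int(fields[2])
        nsec = int(frac.lstrip("."))          # ".949531317" -> 949531317
        stamps.append((seq, sec * NS_PER_SEC + nsec))
    return [(seq, t - prev_t - NS_PER_SEC)
            for (_, prev_t), (seq, t) in zip(stamps, stamps[1:])]

for seq, err in pps_interval_errors(SAMPLE):
    print(f"seq {seq}: interval error {err / 1000:+.3f} us")
```

On these four samples each interval comes out roughly 67 to 71 us short of a second as measured by the local clock.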

Starting gpsd -D 5 shows :::
# grep pps /var/log/messages
Mar  4 16:52:22 tic gpsd[31863]: gpsd:INFO: KPPS:/dev/gps0 pps_caps 0x1133
Mar  4 16:52:22 tic gpsd[31863]: gpsd:INFO: stashing device /dev/pps0 at slot 1
Mar  4 16:52:22 tic gpsd[31863]: gpsd:PROG: PPS:/dev/pps0 chrony socket /var/run/chrony.pps0.sock doesn't exist
Mar  4 16:52:22 tic gpsd[31863]: gpsd:INFO: KPPS:/dev/pps0 RFC2783 path:/dev/pps0, fd is -2
Mar  4 16:52:22 tic gpsd[31863]: gpsd:INFO: KPPS:/dev/pps0 time_pps_create(-2) failed: Bad file descriptor
Mar  4 16:52:22 tic gpsd[31863]: gpsd:WARN: KPPS:/dev/pps0 kernel PPS unavailable, PPS accuracy will suffer
Mar  4 16:52:22 tic gpsd[31863]: gpsd:PROG: PPS:/dev/pps0 thread launched
Mar  4 16:52:22 tic gpsd[31863]: gpsd:PROG: KPPS:/dev/pps0 gps_fd:-2 not a tty, can not use TIOMCIWAIT
Mar  4 16:52:22 tic gpsd[31863]: gpsd:WARN: PPS:/dev/pps0 die: no TIOMCIWAIT, nor RFC2783 CANWAIT
Mar  4 16:52:22 tic gpsd[31863]: gpsd:PROG: PPS:/dev/pps0 gpsd_ppsmonitor exited.
Mar  4 16:52:22 tic gpsd[31863]: gpsd:INFO: PPS:/dev/pps0 ntpshm_link_activate: 0
Mar  4 16:52:22 tic gpsd[31863]: gpsd:INFO: device /dev/pps0 activated
Mar  5 00:52:49 tic gpsd[31863]: gpsd:WARN: PPS:/dev/gps0 unchanged state, ppsmonitor sleeps 10
Mar  5 00:53:30 tic gpsd[31863]: gpsd:WARN: PPS:/dev/gps0 unchanged state, ppsmonitor sleeps 10

# ntpshmmon
ntpshmmon version 1
#      Name Seen@                Clock                Real                 L Prec
sample NTP0 1488676083.453855318 1488676083.452996370 1488676083.000000000 0 -20
sample NTP0 1488676084.453409041 1488676084.452360690 1488676084.000000000 0 -20
sample NTP0 1488676085.359069375 1488676085.358358653 1488676085.000000000 0 -20
sample NTP0 1488676086.453295445 1488676086.452264486 1488676086.000000000 0 -20
sample NTP0 1488676087.450587130 1488676087.449534086 1488676087.000000000 0 -20
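
The same kind of check works for the ntpshmmon samples: subtracting Real from Clock gives the delay of each sample relative to the second it describes, which is roughly the quantity time1 on SHM(0) is meant to cancel. This sketch assumes Clock is the local receive time and Real is the time the source reported; `shm_offsets` is a hypothetical helper.

```python
from decimal import Decimal

SAMPLE = """\
sample NTP0 1488676083.453855318 1488676083.452996370 1488676083.000000000 0 -20
sample NTP0 1488676084.453409041 1488676084.452360690 1488676084.000000000 0 -20
sample NTP0 1488676085.359069375 1488676085.358358653 1488676085.000000000 0 -20
sample NTP0 1488676086.453295445 1488676086.452264486 1488676086.000000000 0 -20
sample NTP0 1488676087.450587130 1488676087.449534086 1488676087.000000000 0 -20
"""

def shm_offsets(text):
    """Return Clock - Real in seconds (as Decimal) for each 'sample' line."""
    offsets = []
    for line in text.splitlines():
        fields = line.split()
        # Fields: sample, name, Seen@, Clock, Real, leap, precision
        if len(fields) >= 5 and fields[0] == "sample":
            clock, real = Decimal(fields[3]), Decimal(fields[4])
            offsets.append(clock - real)
    return offsets

offsets = shm_offsets(SAMPLE)
for off in offsets:
    print(off)
print("mean:", sum(offsets) / len(offsets))
```

Four of these five samples cluster near 0.45 s and one sits at about 0.36 s.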

# ntpmon
     remote           refid      st t when poll reach   delay   offset   jitter
xSHM(0)          .GPS.            0 l   39   64  377   0.0000  31.7789   9.3181
 SHM(1)          .PPS.            0 l    -    8    0   0.0000   0.0000   0.0000
*2603:3023:102:1 .PPS.            1 u    5    8  377   1.4972  -0.1347   1.2510
+2001:470:e930:7 .GPS.            1 u    -   16  377   1.2670  -0.1301   1.0264
ntpd ntpsec-0.9.6+536 2017-02-22T20:26:50Z  Last update: 2017-03-04T17:28:31

So part of the reason I have been having a problem with gpsd is that ntpd can't get a response from the PPS unit. The pps0 comments in the log are conflicting in that it claims to have problems, 
gpsd:PROG: PPS:/dev/pps0 gpsd_ppsmonitor exited
then shows it works,
gpsd:INFO: device /dev/pps0 activated

So which is it? The last message in the log says activated, but ntpd's response suggests otherwise. Ntpshmmon doesn't tell me anything other than the precision is -20 instead of the -30 that it should be if the pps is really active. Gpsmon crashes, so how do I have more tools this way rather than using the discrete nmea and pps device drivers?

> 
> > PS:   a web page for gpsd with ntpsec {can't find it right now} says
> > to ensure 3.17+, but the gpsd download page only offers 3.16-.
> 
> git head.  We have been remiss getting 3.17 released.  Nothing related to
> what you see.

I eventually found it, but it took a little while. Search kept leading to the downloads page which didn't have it.

> 
> 
> > PPS:   tried ntpviz this morning, and it failed due to variance in
> > stats path assumption.
> 
> Easy to fix, just tell ntpviz where your stats are.

Was just following directions at: https://blog.ntpsec.org/2016/12/19/ntpviz-intro.html which didn't indicate that telling ntpviz where the stats were was necessary.

> 
> > I see from the command line help that command line and a config file
> > is an option, and that is good because I will likely post-process
> > these somewhere else because the clock system doesn't have gnuplot
> > installed (2nd failure) and my cross-build system ran out of disk (3rd
> > failure, done for the day)
> 
> So copy the files to a server with space and gnuplot.  Rsync, scp, NFS, etc.

Had planned to rsync the files like the current systems before seeing the ntpviz thing. That system doesn't currently have gnuplot either, but that is fixable.

> 
> 
> > That said, I
> > had expected that ntpviz would read the ntp.conf file for the
> > location of the stats dir,
> 
> Bad expectation.  Precisely because many people do rsync, scp, nfs
> the stat files to unexpected places.

Just following directions.

> 
> And when you change a default in one place you should expect to need to change
> defaults in other places.  For example, ntpviz has no way of even knowing
> which ntp.conf you are using.  There really is no standard place for
> ntp.conf.

That is a problem. Until the recent changes, on FreeBSD I would have said you could look in rc.conf to find out which ntp.conf is being used, but someone decided to make that concise location as diversified and difficult as linux ...  ;(

> 
> 
> 
> > so maybe have it try its command line
> > option, then its config file, then ntp.conf, then the existing
> > default, would allow for post-processing while tracking with where
> > ntpd has been told to put the stats, if it was told something
> > specific.
> 
> Ugh.  Once you go off plan, ntpviz could never read your mind and get
> back on plan.

I was simply thinking in terms of inserting ntp.conf in the search sequence, without otherwise impacting what is there. I understand about mind reading. The other way to look at it is that the web page assumes a specific location for the stats files without indicating that, or what options there are for resolving differences. Personally the command line is fine because I really don't want yet another conf file to maintain, and if a central system is processing for several servers it doesn't make sense to be editing a file or changing files for every run. 
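
As a rough illustration of the lookup order suggested above (command line, then ntpviz's own config, then ntp.conf, then the built-in default), here is a sketch. It is not how ntpviz works; every name in it is hypothetical, and the only real syntax it relies on is the `statsdir` directive in ntp.conf.

```python
def statsdir_from_ntp_conf(conf_text):
    """Return the path given by a 'statsdir' directive, or None if absent."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()     # drop comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "statsdir":
            return parts[1]
    return None

def resolve_statsdir(cli_value, viz_conf_value, ntp_conf_text, default):
    """First non-empty source wins: CLI, ntpviz config, ntp.conf, default."""
    candidates = (
        cli_value,
        viz_conf_value,
        statsdir_from_ntp_conf(ntp_conf_text or ""),
        default,
    )
    for value in candidates:
        if value:
            return value
    return None
```

With this ordering a central system that post-processes stats for several servers would simply pass the directory on the command line, and the ntp.conf lookup only matters when nothing more specific was given.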

Tony

> 
> RGDS
> GARY
> ---------------------------------------------------------------------------
> Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
> 	gem at rellim.com  Tel:+1 541 382 8588
> 
> 	    Veritas liberabit vos. -- Quid est veritas?
>     "If you can’t measure it, you can’t improve it." - Lord Kelvin