nmea refclock not locking to the pps

Tony Hain tony at tndh.net
Tue Mar 7 21:31:14 UTC 2017


Gary E. Miller wrote:
> Yo Tony!
> 
> On Sat, 4 Mar 2017 17:29:56 -0800
> "Tony Hain" <tony at tndh.net> wrote:
> 
> > > Which sounds similar to the problem you now report on the nmea
> > > driver. Just with fewer tools to use.  And by gpsd, I hope you mean
> > > the SHM driver, not the flakey gpsd-ng driver.
> >
> > Flakey doesn't even begin to describe it:
> > # gpsmon
> > gpsmon:ERROR: SER: device open of tcp://localhost:2947 failed: No such
> > file or directory - retrying read-only
> 
> Odd, something wrong with your setup.  My gpsd is working fine on IPv6:

I didn't do anything to specify it should build IPv6, so if the default is to build without it, that is what I am running. I was just following the instructions on catb, so I will look at the ports makefile when I get a chance to see if there is something that I should have specified before the make.

> 
> tcp6       0      0 ::1:2947                :::*                    LISTEN      25354/gpsd
> 
> BTW, I never use gpsmon.  I much prefer cgps.  Many tools in the gpsd
> basket.  Try another one.  Or just tell gpsmon to use ipv4:
>     gpsmon 127.0.0.1
> 

scons didn't build cgps. With the 3.14 port there was a cgps, but it fails with a lib error if used with the 3.17 gpsd. After a fresh install of the ports 3.14 build, cgps worked and showed the pps was turned off, consistent with the log file stating it was being shut down due to lack of a serial data stream. After removing the 3.14 port and running scons on the 3.17 src, there is no cgps.
~/src/gpsd # which cgps
cgps: Command not found.
~/src/gpsd # ls /usr/local/bin | grep gps
gpsmon
ntploggps

> 
> 
> > So, yes the SHM driver is the one I have been testing and it is not
> > playing well with ntpd.
> 
> Odd, I have been using that on large numbers of hosts for well over a decade
> with no issues.  Many others as well.  Care to elaborate?

I am aware lots of people run it without issue. I have wanted it to work so I could have other clients pulling GSA samples, but on periodic attempts I have never had luck getting it to play with ntpd with any accuracy, even on linux. In the subsequent note to the one you are responding to, I suggested a place to look: at least ppsthread.c treats "non-linux" pps differently, and there is discussion of ppsonly in various files, but no place to set the flag that enables it. :::::

-- Ok, so part of the issue with gpsd is that ppsthread.c assumes that "non-linux" 
-- systems always have the pps on the primary serial port control pins. That might 
-- be a reasonable default assumption if  /dev/ppsx isn't provided, but ...
--
-- A related issue, and maybe core to why this is failing is gpsd.c and others talk about 
-- ppsonly in   if def (PPS_ENABLE)  blocks, but I can't find in any of the files where 
-- PPS_ENABLE gets defined. So providing /dev/pps0 as a second source should 
-- resolve the first problem, but only when a ppsonly device is enabled. Given content
--  in the logs when debug is turned on, /dev/pps0 is tried, but when no serial stream
-- comes in it is shut down. How would one go about forcing PPS_ENABLE so that the 
-- ppsonly /dev/pps0 device is not turned off? Wouldn't it make more sense to test 
-- a port that times out the serial stream for a pps signal before shutting it down? 
--
-- I backed up to 3.14 in the FreeBSD ports tree, and by magic, gpsmon started 
-- working (ntpd still doesn't like the JSON data). I don't know what build option 
-- differences there might be, so that is something to look into Monday. In any case, 
-- getting the maintainer to move from 3.14 to something current needs to happen.

> 
> > > I doubt it is a driver issue, my guess is that it is a dice toss
> > > thing. With your setup NTP, of either flavor will lock onto the
> > > wrong source now and again.
> >
> > I would buy that if it weren't a consistent behavior where the ntp-sec
> > nmea driver is off by > 10x the offset shown by the 4.2.8p9 nmea
> > driver, because other than  the refclock xxx / server 127.127.x.x
> > syntax issues, the config is identical.
> 
> Identically bad.

Both less than accurate, but not 'identical'. They are an order of magnitude off from each other. 

> 
> > > Your expectations do not match mine.  That looks bad to me and is
> > > fixable.
> >
> > That is the flag1 0 configuration where the nmea stream has no
> > reference to the top-of-second mark, so any variance in the start of
> > transmission will show as an offset from ntp time.
> 
> Which is why 'prefer'ing it is a bad idea.
> 
> > > Looks fairly good.  My experiments on RasPi 3 shows that
> > > minpoll=maxpoll=2 will give best results.
> >
> > I can't find anyplace in the code that actually specifies what the
> > range is,
> 
> Lost me, range of what?  You mean poll?  It is documented as one to I think
> 2048.  Anything past 8 is just wrong for PPS.

Yes, poll, but a minpoll of 2 doesn't result in a poll interval of 4 sec. It polls every 8 sec, which is identical to minpoll = 3.

> 
> 
> >  but for 15+ years the documentation has said that the minimum is 4,
> 
> I'm pretty sure the RasPi is not that old.  minpoll is very host specific.

The ntp documentation has never discussed poll in a host-specific way that I recall. Look back at all the ntp4 documentation you can find, and the minimum number discussed is 4.

> 
> > but my experimentation has always shown that the minimum is 3.
> 
> I did a series of experiments, documented on devel at ntpsec.org list.
> They showed poll =1 or poll =2 is clearly better, depending on what you are
> optimizing.  Check the list archives.

That must be an ntpsec-specific fix, because when you first said that I set minpoll=2 maxpoll=2 in 4.2.8p9, which was running at the time, and the period was locked at 8 sec, just as it always has been. When I set that in the ntpsec config for the run below, it does poll at 4 sec.
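
For reference, the minpoll/maxpoll values are exponents of two, so the intervals being compared work out as in the short Python sketch below. It is only an illustration: the `floor` parameter is a hypothetical stand-in for whatever lower bound a given build enforces, with 3 chosen to match the 8 sec behavior reported here for 4.2.8p9. It is not a documented ntpd constant.

```python
def poll_interval(exponent: int, floor: int = 0) -> int:
    """Return the poll interval in seconds for a poll exponent.

    `floor` is a hypothetical lower clamp used only for illustration.
    """
    return 2 ** max(exponent, floor)


for exp in (1, 2, 3, 4):
    print(f"minpoll {exp}: {poll_interval(exp)} sec unclamped, "
          f"{poll_interval(exp, floor=3)} sec with a floor of 3")
```

With a floor of 3, minpoll 2 and minpoll 3 both give 8 sec, which is the symptom described for 4.2.8p9. Without a floor, minpoll 2 gives the 4 sec seen on ntpsec.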

> 
> The other reason to set poll low is so that ntpd will lock onto the PPS before
> it has the chance to be misled by other clocks.

I understand, and that is why I have always used 3 because that was the minimum demonstrated to work, and I didn't want some code update to decide that 1 or 2 were "out of bounds" and to be ignored.

> 
> > I understand that. The pps driver says it is disabled unless there is
> > a preferred server in the surviving set, or when another driver is
> > tracking pps. When I make the nmea driver the only preferred option,
> > the pps driver drops out because when the nmea driver is flag1 0, its
> > offset is so large that it becomes a false ticker so there are no
> > preferred survivors, and with flag1 1 the nmea driver takes over the
> > pps tracking.
> 
> Right, so don't do that!

Don't do what? Setting flag1 1 works ~fine on 4.2.8p9, and tracks the pps as it should. It looks like the offset I am seeing from the local reference systems in the 4.2.8p9 case might be explained by an asymmetry in the stack. Ping times are about 2x the offset longer when initiated from the BBB side than they are when initiated from the reference systems (which are on the same switch, or one / two local router hops apart). That is being chased on the FreeBSD arm list. 

For the purposes here, running with flag1 0 is only shown for completeness, and to demonstrate that the pps api is functional. It is not the way I intend to operate. 

> 
> > Right now the configuration is SHM :::
> > refclock shm unit 0 refid GPS time1 0.4429166667
> 
> If your offset is really 442 milli Sec then that is a really bad time1.

I can see on the scope that the serial stream starts around 350 ms after the pps, but I can't get a good clean mark because it apparently varies based on the time to process the number of satellites in view. The vendor manual says it starts 'several hundred ms ...', so it appears to be expected behavior. As I said in the conf comments, experimentation shows that adding 370ms to the sentence time results in +/- 50ms offset from the pps mark over the course of the observations.

In any case, as I said earlier the 4.2.8p9 nmea driver in pps enable mode compensates for that, so the offset is:
--- ZDA standard includes timezone offset fields not sent by sirf3, and number of characters may vary by offset 
---     the printed char length here is always 33 
# Time2 -- end of ZDA incl chksum/crlf:   35 char @4800bps(480cps) 0.072916667

peer 127.127.20.0 mode 8 minpoll 2 maxpoll 2 prefer
fudge 127.127.20.0 refid GPS stratum 0 flag1 1 time1 0.000000337160 time2 0.0729166667

or

refclock nmea refid GPS baud 4800 mode 8 prefer flag1 1 flag4 1 time1 0.000000337160 time2 0.072916667
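
The fudge values above follow from simple serial-line arithmetic, sketched here in Python. The 10 bits per character figure is an assumption (8N1 framing), chosen because it matches the 480 cps stated in the conf comment. The 370 ms term is the experimentally chosen start delay mentioned earlier, and the function name is only illustrative.

```python
BITS_PER_CHAR = 10  # assumed 8N1 framing: start bit + 8 data bits + stop bit


def sentence_time(chars: int, baud: int) -> float:
    """Seconds needed to transmit `chars` characters at `baud` bits/sec."""
    return chars * BITS_PER_CHAR / baud


# ZDA sentence: 33 printed characters plus CR/LF = 35 characters at 4800 bps
zda = sentence_time(35, 4800)

# SHM time1: the ~370 ms observed start delay plus the sentence time
shm_time1 = 0.370 + zda

print(f"time2     = {zda:.10f}")
print(f"SHM time1 = {shm_time1:.10f}")
```

This gives the time2 value used on the nmea lines and the 0.4429166667 used as time1 on the SHM unit 0 line.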

> 
> > refclock shm unit 1 refid PPS prefer minpoll 3 maxpoll 3 time1
> > 0.000000337160
> 
> Close, minpoll and maxpoll will do better at 2.
> 
> Where is your 3rd chimer?  Having just 2 is bad.

I normally run with 3 internal and 2 different externals on each. For the purposes of focus here, it is cut back to the nmea / gpio-pps, raspbian, and a freebsd 8. I realize that is a minimal set, but the point is it 'works' with 4.2.8p9 in a configuration that is functionally identical except for the refclock/127.127 syntax changes. I added another freebsd 8 system to the mix again, but I have done that before and it makes no difference.

# ntpmon
     remote           refid      st t when poll reach   delay   offset   jitter
o127.127.20.0    .GPS.            0 s    5    8  377    0.000   -0.002    0.002
+2001:470:e930:2 .PPS.            1 u   14   16  377    0.634   -0.021    0.986
*express.tndh.ne .GPS.            1 u    8   64  377    1.230   -0.026    0.917
+2001:470:e930:7 .GPS.            1 u    4   64  377    0.817   -0.031    1.369
ntpd 4.2.8p9-a (1)                          Last update: 2017-03-06T21:13:15

# ls -l /etc/ntp.conf
lrwxr-xr-x  1 root  wheel  20 Mar  6 19:15 /etc/ntp.conf -> /etc/ntp-legacy.conf
# rm /etc/ntp.conf
# ln -s /etc/ntp-sec.conf /etc/ntp.conf
# ls -l /usr/sbin/ntpd
lrwxr-xr-x  1 root  wheel        21 Mar  6 19:16 ntpd -> /usr/sbin/ntpd-legacy
# rm /usr/sbin/ntpd
# ln -s /usr/local/sbin/ntpd /usr/sbin/ntpd
# service ntpd restart
...

# ntpmon
     remote           refid      st t when poll reach   delay   offset   jitter
oNMEA(0)         .GPS.            0 l   50   64  377   0.0000  -0.0115   0.0043
*2001:470:e930:2 .PPS.            1 u    4   64  377   0.7986  -0.1119   0.0187
+express.tndh.ne .GPS.            1 u   48   64  377   1.1708  -0.1617   0.0779
+2001:470:e930:7 .GPS.            1 u   16   64  377   0.9215  -0.1506   0.0154
ntpd ntpsec-0.9.6+536 2017-02-22T20:26:50Z Last update: 2017-03-07T08:50:40

edit ntp.conf : swap from nmea to shm refclocks
	since gpsd refuses to enable pps0, turn on that refclock as well
# gpsd -n /dev/gps0 /dev/pps0
# service ntpd restart
...

# ntpmon
     remote           refid      st t when poll reach   delay   offset   jitter
oPPS(0)          .PPS.            0 l    -    4  375   0.0000   0.0370   0.0008
xSHM(0)          .GPS.            0 l   23   64  377   0.0000  34.3316   0.8796
 SHM(1)          .PPS.            0 l    -    8    0   0.0000   0.0000   0.0000
*2001:470:e930:2 .PPS.            1 u    5    8  377   0.6840  -0.0151   2.2705
+express.tndh.ne .GPS.            1 u    3   16  377   1.2744   0.1288   1.7483
+2001:470:e930:7 .GPS.            1 u    2   16  377   0.9668   0.0085   0.4302
ntpd ntpsec-0.9.6+536 2017-02-22T20:26:50Z  Last update: 2017-03-07T13:26:18
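
For anyone wanting to compare these billboards mechanically, here is a minimal Python sketch that splits a peer line into its tally code and columns. The tally meanings are the usual ntpq ones, and the offset is assumed to be in milliseconds as in ntpq. The function and dictionary names are made up for this example.

```python
TALLY = {
    "*": "system peer",
    "o": "PPS peer",
    "+": "candidate",
    "x": "falseticker",
    " ": "not selected",
}


def parse_peer(line: str) -> dict:
    """Split one billboard line into its tally code and a few columns."""
    tally, fields = line[0], line[1:].split()
    remote, refid, st, typ, when, poll, reach, delay, offset, jitter = fields
    return {
        "tally": TALLY.get(tally, tally),
        "remote": remote,
        "poll": int(poll),
        "offset_ms": float(offset),  # assumed milliseconds, as in ntpq
    }


sample = """\
oPPS(0)          .PPS.            0 l    -    4  375   0.0000   0.0370   0.0008
xSHM(0)          .GPS.            0 l   23   64  377   0.0000  34.3316   0.8796
 SHM(1)          .PPS.            0 l    -    8    0   0.0000   0.0000   0.0000
"""

for line in sample.splitlines():
    peer = parse_peer(line)
    print(f"{peer['remote']:8} {peer['tally']:13} "
          f"poll={peer['poll']:<3} offset={peer['offset_ms']} ms")
```

Read this way, SHM(0) is rejected as a falseticker with an offset of about 34 ms, and SHM(1) has a reach of 0, so it never delivered a sample.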


> 
> > # ppsapitest  /dev/pps0
> > 1488676128 .949531317 451379 0 .000000000 0
> 
> Not useful until you say how long it has been running.

That was just the first sample from a truncated list, simply to show that the ppsapi is 'functional' even if it is only tracking the leading edge. The fact that gpsd can't find it is most likely explained by the "non-linux" code blocks.
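
As an aid to reading that sample, the Python sketch below splits a ppsapitest line into its fields. The field order (assert seconds, nanoseconds, sequence, then the same three for clear) is an assumption about the tool's output format, and the function name is made up for this example.

```python
from datetime import datetime, timezone


def parse_ppsapitest(line: str) -> dict:
    """Split one ppsapitest line into assert/clear timestamps and counts.

    Assumed field order: assert sec, .nsec, seq, clear sec, .nsec, seq.
    """
    a_sec, a_nsec, a_seq, c_sec, c_nsec, c_seq = line.split()
    return {
        "assert_utc": datetime.fromtimestamp(int(a_sec), tz=timezone.utc),
        "assert_nsec": int(a_nsec.lstrip(".")),
        "assert_seq": int(a_seq),
        "clear_sec": int(c_sec),
        "clear_nsec": int(c_nsec.lstrip(".")),
        "clear_seq": int(c_seq),
    }


sample = parse_ppsapitest("1488676128 .949531317 451379 0 .000000000 0")
print(sample["assert_utc"].isoformat(), sample["assert_nsec"], sample["assert_seq"])
print("clear edge captured:", sample["clear_seq"] > 0)
```

Under that reading, the all-zero clear fields are what one would expect when only the leading (assert) edge is being captured.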

> 
> really better if you present an ntpviz URL.

Haven't been able to run ntpviz yet. It is only installed on the BBB system without gnuplot.

> 
> > Starting gpsd -D 5 shows :::
> 
> Whoa!  Huge red flag!
> 
> Mar  4 16:52:22 tic gpsd[31863]: gpsd:WARN: KPPS:/dev/pps0 kernel PPS
> unavailable, PPS accuracy will suffer
> 
> No point continuing until that is fixed.

It is not clear whether that is because the /dev/gps0 device does not have a corresponding pps (the code appears to assume that the pps will be on the control lines of a /dev/gpsN device), or whether it came from the explicit second entry, /dev/pps0, timing out its serial stream. The ppsapitest output, and the fact that neither the ntpd 4.2.8p9 nor the ntp-sec pps driver has a problem with the ppsapi, say that it is functional and suggest the thing that needs fixing is gpsd.

KPPS:/dev/pps0 gps_fd:-2 not a tty, can not use TIOMCIWAIT
------------------------------^^^^^^^^^^------------------

The /dev/pps0 device was explicitly provided as a second source but, since this isn't a linux system, the gpsd code shuts the interface down when there is no serial stream. My reading of ppsthread.c is that, since this is gpio on FreeBSD, it is designed to fail. If the pps were on the control lines associated with the serial interface it would work, or the gpio device would work if the OS were linux.

> 
> 
> > That is a problem. Until the recent changes, on FreeBSD I would have
> > said you could look in rc.conf to find out which ntp.conf is being
> > used, but someone decided to make that concise location as diversified
> > and difficult as linux ...  ;(
> 
> NTPsec runs on way more than FreeBSD, and Linux.  And even on Linux the
> ntp.conf file has at least 10 different locations that I can name off the top of
> my head.  And no preferred order of searching!  For example Gentoo
> supplies a large number configs in a config directory and starts ntpd using one
> that matches the local configuration.

Prior to linux emerging and splitting / moving the file(s) all over the place, the "most likely" place was /etc/rc.conf. Standards being what they are, even that was not always true. Unfortunately the FreeBSD team has recently been infected by the concept that 1000s of files in random places is better than a concise sequential file.

> 
> RGDS
> GARY
> ---------------------------------------------------------------------------
> Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
> 	gem at rellim.com  Tel:+1 541 382 8588
> 
> 	    Veritas liberabit vos. -- Quid est veritas?
>     "If you can’t measure it, you can’t improve it." - Lord Kelvin


