nmea refclock not locking to the pps

Tony Hain tony at tndh.net
Wed Mar 8 02:16:12 UTC 2017


Gary E. Miller wrote:
...
> > > Yo Tony!
> > >
> > > On Sat, 4 Mar 2017 17:29:56 -0800
> > > "Tony Hain" <tony at tndh.net> wrote:
> > >
> > > > > Which sounds similar to the problem you now report on the nmea
> > > > > driver. Just with fewer tools to use.  And by gpsd, I hope you
> > > > > mean the SHM driver, not the flakey gpsd-ng driver.
> > > >
> > > > Flakey doesn't even begin to describe it:
> > > > # gpsmon
> > > > gpsmon:ERROR: SER: device open of tcp://localhost:2947 failed: No
> > > > such file or directory - retrying read-only
> > >
> > > Odd, something wrong with your setup.  My gpsd is working fine on
> > > IPv6:
> >
> > I didn't do anything to specify it should build IPv6, so if the
> > default is to build without it, that is what I am running. I was just
> > following the instructions on catb, so I will look at the ports
> > makefile when I get a chance to see if there is something that I
> > should have specified before the make.
> 
> The default is to build with IPv6.
> 
> > scons didn't build cgps.
> 
> cgps has never been an option.  It is always built.

This was built from the example at 
http://www.catb.org/gpsd/gpsd-time-service-howto.html
simply changing the port speed
scons timeservice=yes nmea0183=yes fixed_port_speed=4800 fixed_stop_bits=1

# scons --help
scons: Reading SConscript files ...
GPS regression tests suppressed because socket_export or python is off.
...
>
>  # which python
>  /usr/local/bin/python
>
>  # python -V
>  Python 2.7.13
>
...
RTCM2 regression tests suppressed because rtcm104v2 is off.
AIVDM regression tests suppressed because aivdm is off.
Part of the website build requires asciidoc, not installed.
scons: done reading SConscript files.
...
clientdebug: client debugging support (yes|no)
    default: True
    actual: False

control_socket: control socket for hotplug notifications (yes|no)
    default: True
    actual: False

controlsend: allow gpsctl/gpsmon to change device settings (yes|no)
    default: True
    actual: False
...
fixed_port_speed: fixed serial port speed
    default: 0
    actual: 4800

fixed_stop_bits: fixed serial port stop bits
    default: 0
    actual: 1
...
gpsdclients: gspd client programs (yes|no)
    default: True
    actual: False
...
ipv6: build IPv6 support (yes|no)
    default: True
    actual: False
...
nmea0183: NMEA0183 support (yes|no)
    default: True
    actual: True
...
ntp: NTP time hinting support (yes|no)
    default: True
    actual: True

ntpshm: NTP time hinting via shared memory (yes|no)
    default: True
    actual: True
...
pps: PPS time syncing support (yes|no)
    default: True
    actual: True
...
timeservice: time-service configuration (yes|no)
    default: False
    actual: True

> 
> 
> > With the 3.14 port there was a cgps, but it fails on a lib error if
> > used with the 3.17 gpsd.
> 
> Of course, the shared lib of 3.14 is not compatible with that of 3.17.
> 
> > Fresh installing the
> > ports 3.14 build, cgps worked and showed the pps was turned off
> > consistent with the log file stating it was being shut down due to lack
> > of a serial data stream.
> 
> Uh, no, the log file output you sent me said no KPPS.  That is a compile time
> problem.
> 
> You have a LOT of compile time problems.  Very odd.

I can recompile forever, but until the "not linux" sections get replaced by code that doesn't assume pps is on the serial port control lines, it will not make any difference:

ppsthread.c ---

    /*
     * This next code block abuses "ret" by storing the filedescriptor
     * to use for RFC2783 calls.
     */
#ifndef __clang_analyzer__
    ret = -1;  /* this ret will not be unneeded when the 'else' part
                * of the followinng ifdef becomes an #elif */
#endif /* __clang_analyzer__ */
#ifdef __linux__
    /*
     * Some Linuxes, like the RasPi's, have PPS devices preexisting.
     * Other OS have no way to automatically determine the proper /dev/ppsX.
     * Allow user to pass in an explicit PPS device path.
     *
     * (We use strncpy() here because this might be compiled where
     * strlcpy() is not available.)
     */
    if (strncmp(pps_thread->devicename, "/dev/pps", 8) == 0) {
...
        (void)strncpy(path, pps_thread->devicename, sizeof(path)-1);
...

#else /* not __linux__ */
    /*
     * On BSDs that support RFC2783, one uses the API calls on serial
     * port file descriptor.
     *
     * FIXME! need more specific than 'not linux'
     */
    (void)strlcpy(path, pps_thread->devicename, sizeof(path));
    // cppcheck-suppress redundantAssignment
    ret  = pps_thread->devicefd;
#endif
    /* assert(ret >= 0); */
    pps_thread->log_hook(pps_thread, THREAD_INF,
                "KPPS:%s RFC2783 path:%s, fd is %d\n",
                pps_thread->devicename, path,
                ret);

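Note the difference in how the two branches treat pps_thread->devicename: the linux block checks for a /dev/pps prefix and copies with strncpy, while the non-linux block copies with strlcpy and takes its fd from pps_thread->devicefd.
fwiw: FreeBSD 10+ defaults to clang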

gpsd.c ---

#if defined(PPS_ENABLE)
        /* propagate this in-band-time to all PPS-only devices */
        for (ppsonly = devices; ppsonly < devices + MAX_DEVICES; ppsonly++)
            if (ppsonly->sourcetype == source_pps)
                pps_thread_fixin(&ppsonly->pps_thread, &td);
#endif /* PPS_ENABLE */

Clearly the gpio is a ppsonly device, but it gets tested for a serial stream and then times out as 'no tty'. I can't find that block again, but when searching all the files for PPS_ENABLE, I never found a place where it is defined; every occurrence is an ifdef test.

> 
> 
> > > > So, yes the SHM driver is the one I have been testing and it is
> > > > not playing well with ntpd.
> > >
> > > Odd, I have been using that on large numbers of hosts for well over
> > > a decade with no issues.  Many others as well.  Care to elaborate?
> >
> > I am aware lots of people run it without issue.
> 
> So once again, pointing to an issue about your host.
> 
> > -- Ok, so part of the issue with gpsd is that ppsthread.c assumes that
> > "non-linux" -- systems always have the pps on the primary serial port
> > control pins.
> 
> Yes, by default.  Just override on the command line.  But pointless until you
> fix KPPS.

To isolate whether the code is trying to attach pps to /dev/gps0, I ran the following (I did this earlier but didn't save the logs):
gpsd -n /dev/gps0 /dev/pps1

Mar  7 16:06:50 tic gpsd[53648]: gpsd:INFO: KPPS:/dev/gps0 pps_caps 0x1133
Mar  7 16:06:50 tic gpsd[53648]: gpsd:INFO: stashing device /dev/pps1 at slot 1
Mar  7 16:06:50 tic gpsd[53648]: gpsd:PROG: PPS:/dev/pps1 chrony socket /var/run/chrony.pps1.sock doesn't exist
Mar  7 16:06:50 tic gpsd[53648]: gpsd:INFO: KPPS:/dev/pps1 RFC2783 path:/dev/pps1, fd is -2
Mar  7 16:06:50 tic gpsd[53648]: gpsd:INFO: KPPS:/dev/pps1 time_pps_create(-2) failed: Bad file descriptor
Mar  7 16:06:50 tic gpsd[53648]: gpsd:WARN: KPPS:/dev/pps1 kernel PPS unavailable, PPS accuracy will suffer
Mar  7 16:06:50 tic gpsd[53648]: gpsd:PROG: PPS:/dev/pps1 thread launched
Mar  7 16:06:50 tic gpsd[53648]: gpsd:PROG: KPPS:/dev/pps1 gps_fd:-2 not a tty, can not use TIOMCIWAIT
Mar  7 16:06:50 tic gpsd[53648]: gpsd:WARN: PPS:/dev/pps1 die: no TIOMCIWAIT, nor RFC2783 CANWAIT
Mar  7 16:06:50 tic gpsd[53648]: gpsd:PROG: PPS:/dev/pps1 gpsd_ppsmonitor exited.
Mar  7 16:06:50 tic gpsd[53648]: gpsd:INFO: PPS:/dev/pps1 ntpshm_link_activate: 0
Mar  7 16:06:50 tic gpsd[53648]: gpsd:INFO: device /dev/pps1 activated

So the fd of -2 is a problem. I don't know whether that is due to the difference between the strlcpy and strncpy statements, or whether it didn't like the device. Since ppsapitest and both ntp pps drivers have no problem with it as either pps0 or pps1, the implication is that gpsd didn't actually construct the proper fd.
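
The log suggests the non-linux branch handed time_pps_create() the -2 held in pps_thread->devicefd rather than a descriptor for /dev/pps1. A minimal sketch of one possible alternative, assuming a standalone PPS device can simply be opened read-only when no serial descriptor is available. pps_resolve_fd is a hypothetical helper for illustration, not a gpsd function:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of gpsd: choose the descriptor to hand
 * to the RFC 2783 calls. If the caller already holds an open descriptor
 * (PPS on the serial control lines), reuse it; otherwise open the named
 * device. Returns a descriptor >= 0, or -1 with errno set by open().
 */
static int pps_resolve_fd(const char *path, int devicefd)
{
    if (devicefd >= 0)
        return devicefd;            /* PPS rides on the already-open port */
    return open(path, O_RDONLY);    /* standalone PPS device, e.g. /dev/pps1 */
}
```

With something like this, a devicefd of -2 would fall through to open() on the path instead of being passed on unchanged. Whether read-only access is sufficient on a given platform is an assumption that would need checking.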

> 
> > getting the
> > maintainer to move from 3.14 to something current needs to happen.
> 
> Yes, disappointing to fix bugs and add features and not have them go
> downstream.
> 
> > > Lost me, range of what?  You mean poll?  It is documented as one to
> > > I think 2048.  Anything past 8 is just wrong for PPS.
> >
> > Yes poll, but the min range of 2 doesn't result in a poll of 4. It
> > polls at 8 which is identical to minpoll = 3.
> 
> No.  minpoll is 1. Not 2.  Your best minpoll is 2, or maybe 3, depending on the
> result you want.
> 
> > > I'm pretty sure the RasPi is not that old.  minpoll is very host
> > > specific.
> >
> > The ntp documentation has never discussed poll in a host specific way
> > that I recall. Look back at all the ntp4 documentation you can find,
> > and the minimum number discussed is 4.
> 
> If you can point me to a specific doc error I can fix it.  And remember, the doc
> is not the place for discussion, that belongs in howto's, faq's and blogs.
> 
> > > I did a series of experiments, documented on devel at ntpsec.org list.
> > > They showed poll =1 or poll =2 is clearly better, depending on what
> > > you are optimizing.  Check the list archives.
> >
> > That must be an ntpsec specific fix, because when you first said that
> > I set minpoll=2 maxpoll=2 in 4.2.8p9 which was running at the time and
> > the period was locked at 8 sec, just as it always has. When I set that
> > on the ntpsec config  for the run below it does poll at 4.
> 
> I don't keep track of what NTP Classic does.  This is the NTPsec list.

That is simply a place that the code differs in a good way. 
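
For reference, minpoll and maxpoll are exponents of two in seconds, so the arithmetic behind the numbers above can be sketched as follows. poll_seconds is a hypothetical name used only for illustration:

```c
/*
 * Poll interval in seconds for an ntp.conf minpoll/maxpoll exponent.
 * Hypothetical helper for illustration; the exponent is assumed to be
 * small enough that the shift does not overflow.
 */
static unsigned long poll_seconds(unsigned int exponent)
{
    return 1UL << exponent;
}
```

So minpoll=2 maxpoll=2 asks for a 4 second period, and the 8 second period seen under 4.2.8p9 corresponds to an exponent of 3. The sketch only shows the arithmetic; it says nothing about why 4.2.8p9 ends up at 3.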

> 
> > > > I understand that. The pps driver says it is disabled unless there
> > > > is a preferred server in the surviving set, or when another driver
> > > > is tracking pps. When I make the nmea driver the only preferred
> > > > option, the pps driver drops out because when the nmea driver is
> > > > flag1 0, its offset is so large that it becomes a false ticker so
> > > > there are no preferred survivors, and with flag1 1 the nmea driver
> > > > takes over the pps tracking.
> > >
> > > Right, so don't do that!
> >
> > Don't do what? Setting flag1 1 works ~fine on 4.2.8p9, and tracks the
> > pps as it should.
> 
> Yes, sometimes, not always.  IMHO you are seeing experimenter error.  You
> can't judge NTP Classic or NTPsec until your setup has run untouched for
> 24 hours.  You have been violating that rule.

Just since I started the thread, because it was converging back to the point where it had been when I ran it for 48 hours before deciding that something was amiss.

> 
> > It looks like the offset I am seeing from the local reference systems
> > in the 4.2.8p9 case might be explained by an asymmetry in the stack.
> 
> ntpviz should help you see that.
> 
> 
> > > If your offset is really 442 milliseconds then that is a really bad
> > > time1.
> >
> > I can see on the scope that the serial stream starts around 350 ms
> > after the pps, but can't get a good clean mark because it apparently
> > varies based on the time to process the number of satellites in view.
> > The vendor manual says it starts 'several hundred ms ...', so it
> > appears to be expected behavior.
> 
> Once again, you do not want the NMEA offset set correctly, you want it set
> WRONG!  ntpviz will trivially calculate for you the optimum offset, then use
> something very different.
> 
> 
> > As I said in the conf comments,
> > experimentation shows that adding 370ms to the sentence time results
> > in +/- 50ms offset from the pps mark over the course of the
> > observations.
> 
> And as I said in previous emails, that means NOT to use 370!  Use something
> very wrong, like 500.
> 
> > In any case, as I said earlier the 4.2.8p9 nmea driver in pps enable
> > mode compensates for that, so the offset is: --- ZDA standard includes
> > timezone offset fields not sent by sirf3, and number of
> > characters may vary by offset ---     the printed char length here is
> > always 33 # Time2 -- end of ZDA incl chksum/crlf:   35 char
> > @4800bps(480cps) 0.072916667
> 
> You think it compensates, but it does so badly.
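
The Time2 figure in that conf comment is just the time needed to clock the sentence out of the port. A minimal sketch of the arithmetic, assuming 10 bits on the wire per character (start + 8 data + 1 stop, no parity), which matches the 480 cps quoted for 4800 bps. serial_tx_seconds is a hypothetical name:

```c
/*
 * Seconds needed to transmit n_chars at the given bit rate, assuming
 * 10 bits on the wire per character (start + 8 data + 1 stop, no parity).
 * Hypothetical helper for illustration only.
 */
static double serial_tx_seconds(unsigned int n_chars, unsigned int bits_per_second)
{
    const unsigned int bits_per_char = 10;
    return (double)n_chars * bits_per_char / bits_per_second;
}
```

For the 35 character ZDA sentence at 4800 bps this gives 350/4800, or about 0.0729 s, in line with the value in the comment.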
> 
> > > Where is your 3rd chimer?  Having just 2 is bad.
> >
> > I normally run with 3 internal and 2 different external's on each.
> > For the purposes of focus here,
> 
> And then you wonder why you don't get the results you have before?
> Just change one thing during testing!

I have been doing this for a while. After changing one thing at a time trying to isolate what might be introducing error, I started the thread after I had cut it back to a minimum set of things that were repeatable. Occasionally I put several things back in just to make sure I hadn't overlooked a change that was due to subtle incremental moves.

> 
> > > Not useful until you say how long it has been running.
> >
> > That was just the first sample from a truncated list, simply to show
> > that the ppsapi is 'functional' even if it is only tracking the
> > leading edge.
> 
> I know what you did and why, don't do that!
> 
> > The fact that gpsd can't find it is most likely explained by the
> > "non-linux" code blocks.
> 
> As I have told you in previous emails, the missing KPPS is game over.
> 
> Fix that or give up now.

I tried changing the strlcpy to strncpy to match the linux block, but it made no difference. The rest of the linux discussion with magic device names is difficult to follow, but there is clearly error checking that doesn't happen in the non-linux case. I copied the error check from the linux block to just after the "ret =" assignment and before the endif, but there was no "running as" message in the log:
    if ( 0 > ret ) {
        char errbuf[BUFSIZ] = "unknown error";
        (void)strerror_r(errno, errbuf, sizeof(errbuf));
        pps_thread->log_hook(pps_thread, THREAD_INF,
                    "KPPS:%s running as %d/%d, cannot open %s: %s\n",
                    pps_thread->devicename,
                    getuid(), geteuid(),
                    path, errbuf);
        return -1;
    }

It printed the line right after the endif:

    pps_thread->log_hook(pps_thread, THREAD_INF,
                "KPPS:%s RFC2783 path:%s, fd is %d\n",
                pps_thread->devicename, path,
                ret);

Mar  7 17:51:02 tic gpsd[54405]: gpsd:PROG: PPS:/dev/pps1 chrony socket /var/run/chrony.pps1.sock doesn't exist
Mar  7 17:51:02 tic gpsd[54405]: gpsd:INFO: KPPS:/dev/pps1 RFC2783 path:/dev/pps1, fd is -2
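
That swapping strlcpy for strncpy changed nothing is what one would expect: the copy only fills in the path string that gets logged, while the fd in the non-linux branch comes from pps_thread->devicefd. A minimal sketch showing that the two copy styles give the same string when the source fits. snprintf stands in for strlcpy here because some libcs lack strlcpy, and both function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* strncpy style: copy at most dstsize-1 bytes, then terminate explicitly. */
static void copy_strncpy(char *dst, size_t dstsize, const char *src)
{
    (void)strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
}

/*
 * Stand-in for strlcpy(): snprintf() also truncates to fit and always
 * terminates the destination when dstsize is nonzero.
 */
static void copy_bounded(char *dst, size_t dstsize, const char *src)
{
    (void)snprintf(dst, dstsize, "%s", src);
}
```

Either way the buffer ends up holding /dev/pps1, which is consistent with the path printed correctly in the log while the fd stays at -2.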


> 
> > >
> > > really better if you present an ntpviz URL.
> >
> > Haven't been able to run ntpviz yet. It is only installed on the BBB
> > system without gnuplot.
> 
> No one says ntpviz has to run on the BBB.  Few do, they run it remotely.
> 
> > > > Starting gpsd -D 5 shows :::
> > >
> > > Whoa!  Huge red flag!
> > >
> > > Mar  4 16:52:22 tic gpsd[31863]: gpsd:WARN: KPPS:/dev/pps0 kernel
> > > PPS unavailable, PPS accuracy will suffer
> > >
> > > No point continuing until that is fixed.
> >
> > It is not clear if that is because /dev/gps0 device does not have a
> > corresponding pps (because the code appears to assume that the pps
> > will be on the control lines of a /dev/gpsN device), or if it was from
> > the explicit second entry of /dev/pps0 timing out its serial stream.
> 
> Makes no difference.
> 
> > > NTPsec runs on way more than FreeBSD, and Linux.  And even on Linux
> > > the ntp.conf file has at least 10 different locations that I can
> > > name off the top of my head.  And no preferred order of searching!
> > > For example Gentoo supplies a large number configs in a config
> > > directory and starts ntpd using one that matches the local
> > > configuration.
> >
> > Prior to linux emerging and splitting / moving the file(s) all over
> > the place, the "most likely" place was /etc/rc.conf. Standards being
> > what they are, even that was not always true. Unfortunately the
> > FreeBSD team has recently been infected by the concept that 1000's of
> > files in random places is better than a concise sequential file.
> 
> And now you have perfectly described why we can never assume which
> ntp.conf is being used.
> 
> RGDS
> GARY
> ---------------------------------------------------------------------------
> Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
> 	gem at rellim.com  Tel:+1 541 382 8588
> 
> 	    Veritas liberabit vos. -- Quid est veritas?
>     "If you can’t measure it, you can’t improve it." - Lord Kelvin


