Stratum one autonomy and assumptions about GPS

Thu Aug 25 04:19:46 UTC 2016

This was going to be a note to just Hal originally, but it will do the
rest of the team no harm to know more about the scenarios and
assumptions driving some of my design choices.

Hal objected (off list) to me drawing a conclusion from today's
offset multiplot that check servers aren't necessary when you have
a local GPS - a Stratum 1 really can run autonomously. He said,
correctly of course, that the check servers aren't there to improve
time accuracy when the GPS has sat lock, but to backstop the GPS when
it flakes out.

I shall now discuss three interlocking reasons this possibility does
not loom as large in my mind as it does in Hal's.

1. GPS outage lengths and frequencies are decreasing

I've been watching GPS technology evolve since I took the maintainer's
baton of GPSD in 2004.  A very clear trend is that GPS reliability has
gone way up. Weak-signal sensitivity keeps improving, and the number of
channels in the engines has gone from 1 or 2 to 12 and (with newer
devices) up into the 18-24 range.

It's now pretty normal for state-of-market chips like the ublox-8 or
MTK3339 to be able to operate *continuously* indoors - I know this
because I have a row of six of them on the Official Windowsill of Mad
Science and can see their blinkenlights in my upper peripheral vision when
I look at my main monitor.  And this is with tall trees right outside.

If your expectations are formed by experience with older hardware,
you're going to overestimate the frequency and mean duration of
outages significantly.  And things will get better - in fact, are
still getting better fast enough that improvements are visible on
roughly 6-month timescales.

(Case in point: Gary is playing with a recent chip that does *centimeter*
accuracy in *real time* - not postprocessed.)

2. The autonomy scenarios I think about are not hobbyist-budget productions

The user story I have in mind when I think about supporting autonomous
Stratum 1 is not the Official Windowsill of Mad Science. It's a big
data center, an oil tanker, or a military base - an organization that
wants to firewall out NTP packets and can afford to put a radio mast (or
three) on a roof.

Thus we get to assume good skyview from the get-go.  We also get to assume
high-end GPS hardware and extended antennas and other good things. It's not
like paying for quality in this space is expensive - by the standards of
anyone who can put up a radio mast it's actually trivial.

3. There's a lower bound below which outages don't matter; we may be there.

Any given fixed accuracy target for deviation from UTC, combined with a maximum
crystal drift rate, defines a longest tolerable GPS outage. 

When I think about this, I consider 10ms total deviation from UTC as
the target for WAN service.  Let's say your local clock drifts by 3ms
per hour.  Then you can tolerate a GPS outage of a bit over 3 hours
before you will start shipping bad time.  I have *never* seen a GPS
outage that long, even with older hardware.
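
To make that arithmetic concrete, here is a minimal Python sketch. It assumes the clock enters the outage with zero offset from UTC and then drifts linearly at a constant worst-case rate; the helper name max_tolerable_outage_hours is hypothetical, purely for illustration.

```python
def max_tolerable_outage_hours(budget_ms: float, drift_ms_per_hour: float) -> float:
    """Longest GPS outage (hours) before a free-running clock exceeds the budget.

    Assumption: zero offset at the start of the outage, constant linear drift.
    """
    if drift_ms_per_hour <= 0:
        raise ValueError("drift rate must be positive")
    return budget_ms / drift_ms_per_hour


# 10 ms WAN budget, 3 ms/hour local clock drift
hours = max_tolerable_outage_hours(10.0, 3.0)
print(f"{hours:.2f} h ({hours * 60:.0f} min)")  # prints "3.33 h (200 min)"
```

A real clock's drift varies with temperature and age, so in practice you would plug in a pessimistic drift figure rather than a typical one.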

My observed worst case these days is about 20 minutes, and that is *rare* -
usually only when cold-booting.  For that to be intolerably long, the local
clock would have to drift by 30ms per hour.  Actually, I seldom see outages
longer than 5 minutes, which implies a critical drift of 120ms/hr.  And
this is with my consumer-grade hardware, indoors on a windowsill with
trees outside.
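
Running the same relationship backwards gives the critical drift rate for an observed outage length. Another illustrative sketch, with a hypothetical helper name and the same linear-drift assumption:

```python
def critical_drift_ms_per_hour(budget_ms: float, outage_minutes: float) -> float:
    """Drift rate at which an outage of this length just exhausts the budget.

    Assumption: zero offset at the start of the outage, constant linear drift.
    """
    if outage_minutes <= 0:
        raise ValueError("outage length must be positive")
    return budget_ms / (outage_minutes / 60.0)


# Observed outage lengths in minutes: rare worst case, then typical worst case
for outage in (20, 5):
    rate = critical_drift_ms_per_hour(10.0, outage)
    print(f"{outage:>2} min outage -> critical drift {rate:.0f} ms/h")
```

Any oscillator that drifts more slowly than the printed rate keeps time within the 10ms budget through an outage of that length.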

We may already be at a technological place where GPS outages don't bust the
tolerable-error budget, even with cheap hardware. If we aren't, we'll
probably be there soon.  One of my medium-term agenda items is to measure
and see.
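
A measurement along those lines could start from something as simple as this sketch. It assumes you already have a time-ordered log of (timestamp in seconds, has-fix) samples collected by some means; that log format and the outage_durations helper are hypothetical, not part of GPSD. It reports every no-fix stretch so the longest one can be compared with the tolerable outage implied by the error budget.

```python
from typing import Iterable, List, Optional, Tuple


def outage_durations(samples: Iterable[Tuple[float, bool]]) -> List[float]:
    """Durations (seconds) of each no-fix stretch in time-ordered samples.

    An outage starts at the first sample without a fix and ends at the next
    sample that has one. An outage still open at the end of the log is
    measured up to the last sample.
    """
    durations: List[float] = []
    start: Optional[float] = None
    last_t: Optional[float] = None
    for t, has_fix in samples:
        if not has_fix and start is None:
            start = t                        # fix just lost
        elif has_fix and start is not None:
            durations.append(t - start)      # fix regained
            start = None
        last_t = t
    if start is not None and last_t is not None:
        durations.append(last_t - start)     # outage open at end of log
    return durations


# Hypothetical log: (seconds since start, has_fix)
log = [(0, True), (60, False), (120, False), (300, True),
       (360, True), (400, False), (460, True)]
gaps = outage_durations(log)
limit_s = 10.0 / 3.0 * 3600   # 10 ms budget, assumed 3 ms/hour drift
print(gaps, max(gaps) <= limit_s)  # prints "[240, 60] True"
```

Sampling resolution bounds the accuracy here: an outage can only be measured to within the interval between samples.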
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>