Driver strategy - we need to decide among incompatible goals

Achim Gratz Stromeko at nexgo.de
Sun Aug 11 12:51:13 UTC 2019


Eric S. Raymond via devel writes:
> You've forgotten much, then. I remind you of the Type 2 Bancomm, the
> Type 45 Spectracom TSync PCI, and the Type 16 Bancomm GPS/IRIG
> Receiver.

Type 2 was actually the venerable Trak GPS, which was available in a
number of output configurations.  NTPd probably only supported PPS/RS232
and as such would not have needed any blobs.  Both the Spectracom and
Bancomm you cite (which I've never seen in real life) are add-in cards
(PCI in one case and VME in the other), so they would have needed a
device driver in the kernel and likely (based on some SMPTE hardware
from around the same timeframe that I did get my hands on briefly) also
some firmware that you'd need to download for them to actually start.
The Bancomm PCI cards (variously branded as Datum and Symmetricom) must
have been built by the shipload, as you can still buy them NOS; the later
models at least seem to have EEPROM, so there is no need to download
the firmware anymore.

In other words, while there may have been blobs there, none of them were
actually in NTPd.  I started using xntpd on Sun hardware when the
method of distribution was still a QIC250 tape that was sent around via
the postal service, so I would very much have noticed if there were any
blobs besides the actual source code.

>> > I wrote about this bit of history because it's a precedent for
>> > narrowing our hardware support in order to improve our security
>> > and reduce our expected downstream defect rate.
>> 
>> Before you start to go down that road remind us again what threat model
>> you are trying to protect against.  Any talk about security is hollow
>> theater without that bit of information.
>
> Whatever your threat model is, reducing attack surface is effective
> security hardening.  Reducing total LOC and complexity in the codebase
> reduces attack surface.  Thus, reducing LOC anywhere you can do it 
> is a hardening strategy.

As a strategy it's fine; as a criterion for deciding what to let go it's
useless.  That is very much the point I was trying to make: your
proposed criterion doesn't actually tell us anything about which threat
you are going to mitigate.

> If you're only just now noticing that this is NTPsec development's
> central thrust, and has been since 2015, and that judging by CVEs in
> Classic that we've evaded it has been rather spectacularly successful,
> maybe you ought to be paying closer attention to what we're actually
> doing and achieving before you criticize.

How many of these were related to device support, obsolete or not?

>> > NTPsec aims to be highly secure and reliable.  If we're serious about
>> > that, we need to reduce our vulnerability to defects from these
>> > wraparound/rollover problems. 
>> 
>> You won't make even a tiny step in that direction based on your current
>> understanding of the issues.
>
> Please read https://docs.ntpsec.org/latest/rollover.html so you won't 
> be under any misapprehensions about what we understand.  You might
> also want to read the big comment at  
>
> https://gitlab.com/gpsd/gpsd/blob/master/timebase.c
>
> You can see from that how firm a grasp Gary Miller and I had on these
> problems before NTPsec.

Appeal to authority won't get you anywhere while you continue to skirt
the actual discussion.  But the first of the two citations is in fact a
lot more careful and nuanced in its claims than your broad-brushed
missive regarding device support.

> Yes, in the presence of era wraparounds perfect resolution of absolute time
> is not possible. We're not under any illusion that it is. What  *is* 
> achievable is to reduce the complexity of the failure cases and make the
> code better at self-auditing and notifying a human when it enters a bad 
> state.

Yet you haven't addressed the actual failure cases and how you plan to
mitigate them.

> Generally speaking, you can tell improvement of this kind is happening
> any time you rip out old shims.  The code that prevented autonomous
> operation from working at all before I fixed it in 2017 was, I believe,
> an old shim from the early days of the Y2K panic.

More anecdotal arguing.

>> > My thinking was that we would eventually drop all of the 2-digit-only
>> > modes and drivers, and say "if your refclock doesn't ship 4-digit
>> > years, it's disqualified".  Besides the autonomy issue, devices with
>> > this quirk are often very old hardware with wraparound problems.
>> 
>> So, all GPS receivers, to start with? 
>
> No, but it is conceivable that we might someday disqualify NMEA receivers
> that don't ship a ZDA sentence.

Based on what argument?

> Yes, of course the ZDA payload will be wrong after a wraparound. By
> removing the kludges that try to deduce a century from a two-digit
> year, though, we'd make the code to detect failure cases easier to
> reason about and be able to assert stronger invariants.

You've already cited an ntpsec document that (correctly) states that
this is just not going to happen, as each GPS receiver does internal
pivoting and the reasoning about failure states you are talking about
just isn't possible without knowing exactly how that's done.  The only
invariant is that you have to treat each and every data point as suspect
unless you can line it up with other, independently derived data points
that produce a converging confidence interval.
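
To illustrate why the receiver's internal pivoting matters, here is a
minimal sketch of how a 10-bit GPS week number might be expanded to a
full date.  The function names and the pivot rule are assumptions made
for illustration; real firmware differs from vendor to vendor and is
usually undocumented, which is exactly the problem.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)   # start of GPS week 0
ROLLOVER_WEEKS = 1024          # the broadcast week number is 10 bits wide


def resolve_week(week10, pivot_week):
    """Expand a 10-bit week number to the first full week >= pivot_week.

    pivot_week is a hypothetical firmware constant, e.g. the week the
    firmware was built; the receiver assumes time never precedes it.
    """
    full = pivot_week - (pivot_week % ROLLOVER_WEEKS) + week10
    if full < pivot_week:
        full += ROLLOVER_WEEKS
    return full


def week_start(full_week):
    """Calendar date on which the given full GPS week begins."""
    return GPS_EPOCH + timedelta(weeks=full_week)


# The same broadcast value decodes to dates 1024 weeks apart,
# depending only on the pivot baked into the receiver.
print(week_start(resolve_week(18, 2048)))  # pivot after the 2019 rollover
print(week_start(resolve_week(18, 1024)))  # pivot from 1999-era firmware
```

Both results are internally consistent, so nothing in the receiver's
output alone tells you which pivot was applied.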

> You don't get to a clock that never breaks this way.  You *do* get to
> an ntpd that is less likely to fuck up in some obscure way (even when
> the hardware is sane) because of unintended effects of shims and
> crocks added to support 2-digit years.

Even if I buy that argument, I'm still left with an ntpd that starts
using data it should know is bogus, because it places undue trust in the
fact that the data was coming from what it assumes to be a GPS in full
working order.  I'd very much suggest working on the real problem before
tackling imagined ones, however plausible.
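
The cross-checking against independent sources that I argue for above
can be sketched as an interval intersection in the style of Marzullo's
algorithm.  This is an illustration only, not the selection code of
ntpd or NTPsec; the function name and the (low, high) input format are
assumptions.

```python
def best_overlap(intervals):
    """Return (count, lo, hi) for the range covered by the most sources.

    Each interval is a (low, high) pair: one source's offset estimate
    minus and plus its error bound.
    """
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # an interval opens here
        edges.append((hi, +1))  # an interval closes here
    # Sort by offset; at equal offsets opens (-1) sort before closes (+1),
    # so intervals that merely touch still count as overlapping.
    edges.sort()

    best = count = 0
    best_lo = best_hi = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind
        if count > best:
            # A new maximum can only start at an opening edge, so a
            # following edge always exists and marks the end of the range.
            best = count
            best_lo = offset
            best_hi = edges[i + 1][0]
    return best, best_lo, best_hi


# Three sources that agree and one that is far off (say, a receiver
# that applied the wrong rollover pivot); values are offsets in seconds.
sources = [(8, 12), (11, 13), (10, 12), (100, 101)]
print(best_overlap(sources))
```

A source that disagrees with the majority simply does not contribute
to the winning range, regardless of how trustworthy its hardware is
assumed to be.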

>> > Now we have a request to remove the deprecation marker from the Oncore
>> > driver. The Oncore product line is EOL, but we are told there are 
>> > receivers still in production that can use this driver.
>> 
>> Would it surprise you to hear that while the Oncore receivers aren't
>> available anymore, there are (at least used to be) more modern receivers
>> that behave like one?  Ditto for the Motorola M12 and other "classic"
>> GPS.
>
> Not surprised at all.  In fact, that news was in issue #608.
>
> The question is *whether we ought to give a shit* about modern hardware
> that emulates an Oncore or an M12.  If you start from the assumption that
> our highest priority is nursemaiding crufty old designs and crufty old
> clock protocols, then maybe you'll never see beyond that assumption.

I'm not going to continue that line of discussion.  Again, what you
continue to ignore is that I haven't said _anything_ about any
particular driver getting the axe or not, but that the criteria you
propose to make that decision are bunk.

> On the other hand, if you grasp that modern primary clocks are *not
> expensive*, then maybe you start seeing support for that old hardware
> as a source of technical debt that should be cleared.

A GPS receiver is not a primary clock; when it's used like that it's a
clock distribution system based on common view principles.  At least
some of the old stuff you want to throw out actually has clocks in it
that allow limited hold-over.  Getting that when buying new is _not_
cheap by any means, even if you're willing to spend considerable effort
in building the system yourself.

>> How about an option 4. where you admit that full autonomy not only is
>> provably impossible to result in an absolute time with bounded error
>> margin, but also not even interesting to NTP.  That might get you to
>> recognize that the only way to synchronize to some notionally correct
>> time is to use as many _independent_ sources of time as you can get hold
>> of.
>
> Fine, we agree on that.  But there's no rule that says any of those 
> multiple sources must be a network peer, and important deployment cases 
> in which you'd like them to be (say) GPSes watching three different
> constellations with two rubidium clocks for holdover/backup.

I said independent.  Two GPS receivers sharing the same antenna are not
independent, even when they watch two different constellations, for
instance.

> Classic couldn't handle that case.  NTPsec *can*.  And there's room for
> more improvement in that area.

Classic in fact could, just not in the way you envision it.  It would
have spread the radios / clocks over a number of stratum-1 servers and a
secondary layer of stratum-2 with symmetric peering would have presented
the resulting amalgam clock to your network, in a way that is much more
resilient than the setup you allude to above, which would need to pull
the function of the secondary layer up into the clock source handling of
ntpd.  Now that we have NTS we could get symmetric peering back without
the non-security implications it had.


Regarding your blurb on driver retention on the website:

"It may actually be the case that all the Stratum 1 sites running
non-custom ntpd instances are using GPSes now."

Bzzzt. Wrong.  Right now I have:

+isis.uni-paderborn.de       .DCF.            1 u    8   64  377 22.657ms 126.19us 184.79us
-h-213.61.224.35.host.de.col .DCFa.           1 u   49  128  377 29.603ms -9.225ms 203.35us

I regularly get Meinberg references from the pool as well (PZF) and DCF
phase modulation receivers (DCFp).  I have seen MSF once or twice, but
not TDF (but that's not my pool region anyway).  DCF is actually very
common with colocation servers, where VLF reception is easy to
establish, but GPS would require infrastructure the provider may not be
prepared to provide or only at high expense.

Type 1: This clock type is used as a fallback clock source (often
stratum-15) in several distributions to keep NTPd running in degraded
mode during outages of other clock sources.

Type 17: Datum?  They were a very respected manufacturer of (mostly)
Rubidium frequency standards and one of their divisions was Bancomm.
They bought Ball Efratom (hence you can find a lot of Datum Efratom
Rubidiums still), then got themselves bought by Symmetricom
(Datum-Symmetricom was a branding for quite some time), which got bought
by Microsemi, which is now owned by Microchip.  As to what one of their
later systems looked like, see here:

https://www.realhamradio.com/datum2000.htm

Their distinction (as with the Symmetricom stuff that came a bit later)
was that they only needed "good" GPS about once a day to keep time to
below a ms and could hold over to the low-ms range for at least one week
(if you sprung for the expensive high-stability option).  These were
pretty common in the telco space and going around quite a bit when the
telcos upgraded to the latest and greatest.  There's a French company
(HEOL Design) that still supports the more common models of these with
upgrades (receiver modules and bug-fixed firmware).


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

