FWD: NTPsec panic and abort

Hal Murray halmurray at sonic.net
Fri Mar 18 07:38:39 UTC 2022


------- Forwarded Message

Date: Fri, 18 Mar 2022 06:02:51 +0530
From: Mukund Sivaraman <muks at mukund.org>
To: hmurray at megapathdsl.net
Subject: NTPsec panic and abort
Message-ID: <YjPTM/9y9kZPtB04 at d1>


Hi Hal

I apologize for emailing you directly instead of creating an NTPsec
issue, but I am currently unable to log in to my GitLab account.

I am reporting an ntpd abort/crash. It is from a stock Fedora RPM:

> [muks at gw1 ~]$ rpm -q ntpsec
> ntpsec-1.2.1-4.fc35.x86_64
> [muks at gw1 ~]$

The computer has a Garmin 18x LVC GPS receiver device hooked up to a
serial port, and ntpd's builtin NMEA driver is used to interface with it
directly. It also has a working PPS signal. The relevant ntp.conf config
lines are:

> server 127.127.20.0 mode 1 prefer minpoll 4
> fudge 127.127.20.0 flag1 1 flag2 0 flag3 0 flag4 1 time2 0.5100621

This is how it looks normally (the device's datasheet claims 1us
accuracy):

> [muks at gw1 ~]$ ntpq -np
>      remote                     refid      st t when poll reach   delay   offset   jitter
> =========================================================================================
> oNMEA(0)                   .GPS.            0 l   15   16  377   0.0000   0.0002   0.0003

> [muks at gw1 ~]$ ntpq -c clocklist
> associd=0 status=0000 no events, clk_unspec,
> name="NMEA",
> timecode="$GPRMC,002016,A,____.____,_,_____.____,_,000.0,315.7,180322,001.5,W*__",
> poll=103, noreply=0, badformat=0, baddata=0, fudgetime2=510.062, stratum=0, refid=GPS,
> flags=9, device="NMEA GPS Clock"
> [muks at gw1 ~]$

I've been running it this way for several years now, previously with the
ntp.org implementation of ntpd, and for a few months now with NTPsec.

The Garmin 18x LVC GPS receiver device stopped working yesterday due to
hardware failure, and I replaced it with another identical unit with the
same firmware version. Within a few hours of running ntpd with the new
device, the ntpd process terminated with the following syslog message:

> Mar 18 05:10:10 gw1 ntpd[2200]: CLOCK: Panic: offset too big: -604800.000
> Mar 18 05:10:10 gw1 systemd[1]: ntpd.service: Main process exited, code=exited, status=1/FAILURE
> Mar 18 05:10:10 gw1 systemd[1]: ntpd.service: Failed with result 'exit-code'.

It appears that the GPS receiver sent a faulty date in the $GPRMC NMEA
sentence. The following is a sequence of lines from
/var/log/ntpstats/clockstats:

> 59655 85114.640 NMEA(0) $GPRMC,233834,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 85130.640 NMEA(0) $GPRMC,233850,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 85146.640 NMEA(0) $GPRMC,233906,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 85162.640 NMEA(0) $GPRMC,233922,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 85178.640 NMEA(0) $GPRMC,233938,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 85194.640 NMEA(0) $GPRMC,233954,A,____.____,_,_____.____,_,000.0,315.7,100322,001.5,W*__
> 59655 85982.616 NMEA(0) $GPRMC,235302,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 85998.616 NMEA(0) $GPRMC,235318,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 86014.616 NMEA(0) $GPRMC,235334,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
> 59655 86030.616 NMEA(0) $GPRMC,235350,A,____.____,_,_____.____,_,000.0,315.7,170322,001.5,W*__
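
For reading the log lines above: the first two columns of a clockstats
line are a Modified Julian Date and the seconds past UTC midnight. The
following is a minimal sketch, not part of ntpd, that converts them to a
UTC timestamp; the function name clockstats_to_utc is made up for
illustration.

```python
from datetime import datetime, timedelta, timezone

# MJD day 0 is 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def clockstats_to_utc(mjd: int, seconds: float) -> datetime:
    """Convert a clockstats (MJD, seconds past midnight) pair to UTC.

    Hypothetical helper for illustration; it is not an NTPsec API.
    """
    return MJD_EPOCH + timedelta(days=mjd, seconds=seconds)


# Columns taken from the clockstats line that carries the odd sentence.
stamp = clockstats_to_utc(59655, 85194.640)
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))
```

The system clock puts that line on 17 March 2022 at 23:39:54 UTC, which
agrees with the 233954 time field of the sentence logged on it.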

Note the spurious "100322" date, which is 1 week in the past. -604800
from the syslog message is -1 week in seconds (-7 * 24 * 3600). Note also
that "A" is returned in the status (<2>) field of the $GPRMC sentence, so
the GPS receiver still claims to have a fix and a valid position. If
you want a reference for the $GPRMC NMEA sentence for this receiver,
please see page 18 of:

https://static.garmin.com/pumac/GPS_18x_Tech_Specs.pdf
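
The one-week arithmetic can be checked with a short sketch. It assumes
the time is field 1 (hhmmss) and the date is field 9 (ddmmyy) of the
comma-separated $GPRMC sentence, as in the log lines quoted above. The
helper rmc_datetime is made up for illustration, and the expected time is
the clockstats timestamp of the same line (MJD 59655, 85194 s), truncated
to whole seconds.

```python
from datetime import datetime, timezone


def rmc_datetime(sentence: str) -> datetime:
    """Return the UTC date and time claimed by a $GPRMC sentence.

    Hypothetical helper; it does no checksum or status validation.
    """
    fields = sentence.split(",")
    hhmmss, ddmmyy = fields[1], fields[9]
    parsed = datetime.strptime(ddmmyy + hhmmss[:6], "%d%m%y%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


# The spurious sentence, with position and checksum masked as in the log.
bad = "$GPRMC,233954,A,____.____,_,_____.____,_,000.0,315.7,100322,001.5,W*__"

# System time for that clockstats line, truncated to whole seconds.
expected = datetime(2022, 3, 17, 23, 39, 54, tzinfo=timezone.utc)

offset = (rmc_datetime(bad) - expected).total_seconds()
print(offset)
```

The result is -604800.0 seconds, the same magnitude as in the panic
message.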

It appears that some bogus condition has occurred within the GPS
receiver and it has sent a spurious $GPRMC sentence. However, it seems
too extreme for ntpd to abort due to this. Could it ignore the sentence
with the big offset instead? The GPS receiver appears to correct itself
eventually. If ntpd aborts, the NTP service is no longer available,
which causes other problems.
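
To illustrate the kind of handling being asked about, here is a rough
sketch of a gate that drops an isolated sample with a huge offset and
only lets a large offset through after several consecutive samples
exceed the limit. This is not how ntpd works today and not a proposed
patch; the class name, the 1000 s limit, and the count of 4 are all
assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class OutlierGate:
    """Hypothetical filter; none of these names exist in NTPsec."""

    limit: float = 1000.0  # assumed threshold in seconds
    confirm: int = 4       # assumed number of consecutive large samples
    _streak: int = 0       # large samples seen in a row so far

    def accept(self, offset: float) -> bool:
        """Return True if the sample should be used, False to ignore it."""
        if abs(offset) <= self.limit:
            self._streak = 0
            return True
        # Large offset: ignore it until it has persisted long enough.
        self._streak += 1
        return self._streak >= self.confirm


gate = OutlierGate()
offsets = [0.0002, 0.0003, -604800.0, 0.0002]
decisions = [gate.accept(o) for o in offsets]
print(decisions)
```

With the offsets above, only the single -604800 sample is ignored and the
samples on either side of it are used as normal.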

I have never come across such an ntpd abort before. This is the first
time I'm seeing it.

Can this condition be handled in any other way, so that the service
doesn't terminate?

		Mukund


------- End of Forwarded Message


-- 
These are my opinions.  I hate spam.




