From hmurray at megapathdsl.net Mon Aug 1 00:13:23 2016
From: hmurray at megapathdsl.net (Hal Murray)
Date: Sun, 31 Jul 2016 17:13:23 -0700
Subject: Removing the worst cruft
In-Reply-To: Message from Mark Atwood of "Sun, 31 Jul 2016 23:38:46 -0000."
Message-ID: <20160801001323.F0BA9406057@ip-64-139-1-69.sjc.megapath.net>

> Can the palisade/trimble driver be replaced with a parse driver?

I doubt it, but I'm far from familiar with the parse driver.

Based on Eric's previous comments, the parse driver handles devices that provide the time in an easy to parse format. TSIP might fit that if all goes well.

But there are many variations of TSIP. One covers reversing the normal PPS operation. Instead of needing kernel support to time stamp the PPS pulse, you send it a pulse by flapping one of the modem control signals and it tells you the time that happened.

My vote would be to not rock that boat. There are more important things to work on.

--
These are my opinions. I hate spam.

From esr at thyrsus.com Mon Aug 1 03:13:25 2016
From: esr at thyrsus.com (Eric S. Raymond)
Date: Sun, 31 Jul 2016 23:13:25 -0400
Subject: Removing the worst cruft
In-Reply-To: <20160801001323.F0BA9406057@ip-64-139-1-69.sjc.megapath.net>
References: <20160801001323.F0BA9406057@ip-64-139-1-69.sjc.megapath.net>
Message-ID: <20160801031325.GA18480@thyrsus.com>

Hal Murray :
> > Can the palisade/trimble driver be replaced with a parse driver?
>
> I doubt it, but I'm far from familiar with the parse driver.
>
> Based on Eric's previous comments, the parse driver handles devices that
> provide the time in an easy to parse format. TSIP might fit that if all goes
> well.
>
> But there are many variations of TSIP. One covers reversing the normal PPS
> operation. Instead of needing kernel support to time stamp the PPS pulse,
> you send it a pulse by flapping one of the modem control signals and it tells
> you the time that happened.
>
> My vote would be to not rock that boat.
> There are more important things to work on.

I concur with this.

While I like the idea of replacing as many as possible of the remaining legacy drivers with modes of the generic driver, an absolute requirement for this is that we be able to live-test with the equipment. Obviously we're not set up for that yet.
--
Eric S. Raymond

From fallenpegasus at gmail.com Mon Aug 1 03:14:32 2016
From: fallenpegasus at gmail.com (Mark Atwood)
Date: Mon, 01 Aug 2016 03:14:32 +0000
Subject: Removing the worst cruft
In-Reply-To: <20160801031325.GA18480@thyrsus.com>
References: <20160801001323.F0BA9406057@ip-64-139-1-69.sjc.megapath.net> <20160801031325.GA18480@thyrsus.com>
Message-ID:

Good point.

On Sun, Jul 31, 2016, 8:13 PM Eric S. Raymond wrote:

> Hal Murray :
> > > Can the palisade/trimble driver be replaced with a parse driver?
> >
> > I doubt it, but I'm far from familiar with the parse driver.
> >
> > Based on Eric's previous comments, the parse driver handles devices that
> > provide the time in an easy to parse format. TSIP might fit that if all
> > goes well.
> >
> > But there are many variations of TSIP. One covers reversing the normal
> > PPS operation. Instead of needing kernel support to time stamp the PPS
> > pulse, you send it a pulse by flapping one of the modem control signals
> > and it tells you the time that happened.
> >
> > My vote would be to not rock that boat. There are more important things
> > to work on.
>
> I concur with this.
>
> While I like the idea of replacing as many as possible of the
> remaining legacy drivers with modes of the generic driver, an absolute
> requirement for this is that we be able to live-test with the equipment.
> Obviously we're not set up for that yet.
> --
> Eric S. Raymond
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hmurray at megapathdsl.net Mon Aug 1 09:40:11 2016
From: hmurray at megapathdsl.net (Hal Murray)
Date: Mon, 01 Aug 2016 02:40:11 -0700
Subject: Kernel PLL graphs
Message-ID: <20160801094011.D771A406057@ip-64-139-1-69.sjc.megapath.net>

There are two parts to PPS processing in the kernel. RFC 2783 describes an API for capturing time stamps. RFC 1589 describes a PLL that lives in the kernel.

Most Linux distros don't support RFC 1589. The code is in the kernel, but it doesn't work with the shipped kernels. It requires !NO_HZ, but most distros prefer NO_HZ.

I pulled over the sources and built my own kernel.

Here are the before and after graphs:
  http://users.megapathdsl.net/~hmurray/ntpsec/PPS-kernel.png
The data is from two separate days so this isn't a clean comparison. I don't know what that machine was doing on either day.

Here is a zoom in on the Kernel PLL day.
  http://users.megapathdsl.net/~hmurray/ntpsec/PPS-kernel2.png
Note that the peak offset is less than a microsecond.

------------

We should see if we can get similar results on a Raspberry Pi. I haven't tried building an ARM kernel.

I think we should be able to run the PLL code outside the kernel. The PPS time stamp is key. The PLL calculations don't need to be run in the kernel. They need to be run soon after the PPS, but not interrupt level immediately. The API has an option to wakeup on PPS. I don't know if it is implemented on Linux.

The no-PLL test was run at the default maxpoll of 6. I should try faster. I also need a standard test load.

I remember various FreeBSD-is-better type comments from many years ago. I don't know if the PLL was working in Linux at the time. I should setup a test case.

--
These are my opinions. I hate spam.

From gem at rellim.com Mon Aug 1 22:22:00 2016
From: gem at rellim.com (Gary E.
Miller)
Date: Mon, 1 Aug 2016 15:22:00 -0700
Subject: Kernel PLL graphs
In-Reply-To: <20160801094011.D771A406057@ip-64-139-1-69.sjc.megapath.net>
References: <20160801094011.D771A406057@ip-64-139-1-69.sjc.megapath.net>
Message-ID: <20160801152200.69d752c5@spidey.rellim.com>

Yo Hal!

On Mon, 01 Aug 2016 02:40:11 -0700
Hal Murray wrote:

> Here are the before and after graphs:
> http://users.megapathdsl.net/~hmurray/ntpsec/PPS-kernel.png
> The data is from two separate days so this isn't a clean comparison.
> I don't know what that machine was doing on either day.

I would have expected your non-kernel PLL to be 30x or 100x better. Even on a RasPi2 I get much better, -400 ?Sec to +800?Sec offset on the PPS:

https://pi2.rellim.com/day/#PPS

Are you sure your kernel PPS (RFC 2783) time stamping is working?

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588
-------------- next part --------------
A non-text attachment was scrubbed...
Name: remote-peerstats.127.127.28.1.png
Type: image/png
Size: 15365 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL:

From gem at rellim.com Tue Aug 2 02:40:18 2016
From: gem at rellim.com (Gary E. Miller)
Date: Mon, 1 Aug 2016 19:40:18 -0700
Subject: ✘drift file
Message-ID: <20160801194018.17e7a3e6@spidey.rellim.com>

Yo All!

Eric asked me to write up why I thought the chrony drift file handling is better than NTPsec's handling.

1. On startup chronyd checks the time stamp on the drift file.
   if the timestamp > sysclock, the sysclock is set to the timestamp

   This is a nice sanity check on the system clock.

2. ntpd stores the frequency ppm offset in the driftfile.
chronyd stores the frequency ppm offset and the 'skew' (estimated accuracy of the existing frequency value).

Knowing the 'skew' at startup allows chrony to better reject bad reclock input.

I can see that saving the 'skew' is a nice touch, but I suspect much of the good chronyd startup behavior is explained elsewhere.

In a related topic, it would be nice (maybe an option) for ntpd to hold off logging the initial awful data until after the -g option has set the system clock. And a bit longer, so the wonky startup data is masked.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588

From gem at rellim.com Tue Aug 2 03:40:27 2016
From: gem at rellim.com (Gary E. Miller)
Date: Mon, 1 Aug 2016 20:40:27 -0700
Subject: ✘ntpviz
Message-ID: <20160801204027.0e9d6518@spidey.rellim.com>

Yo Eric!

No rush on these, but I was playing with stats today, so I looked at ntpstats. Looking good, but not ready for me to try to use live.

1. not installed by default

2. no help:

   # ./ntpviz -h
   option -h not recognized

3. not finding my Liberation fonts:

   # ./ntpviz -d /var/log/ntpstats
   warning: liberation truetype fonts not found

   chrony-graph finds them just fine, they are in:
   /usr/share/fonts/liberation-fonts/

   And I have the Sans-regular:
   /usr/share/fonts/liberation-fonts/LiberationSans-Regular.ttf

3. not working options:

   # ./ntpviz -d /var/log/ntpstats --clock-offset
   option --clock-offset not recognized
   # ./ntpviz -d /var/log/ntpstats --clock-jitter
   option --clock-jitter not recognized
   # ./ntpviz -d /var/log/ntpstats --clock-stability > tmp.plot
   option --clock-stability not recognized

4. --peer-jitters, works. Missing 99%, 90%, etc. lines and values.
Why '--peer-jitters' when it only takes one argument, and can only be used once?

5. --peer-offsets, works. Missing 50% line and value. I'd also like the 90% and 5% lines/values.
   Why '--peer-offsets' when it only takes one argument, and can only be used once?

6. error handling needs work:

   # ./ntpviz -d /var/log/ntpstats --peer-offsets 204.17.205.8 > tmp.plot
   Traceback (most recent call last):
     File "./ntpviz", line 107, in <module>
       plot = stats.peer_offsets_gnuplot(show_peer_offsets)
     File "/u1/src/NTP/ntpsec/ntpstats/ntpstats.py", line 225, in peer_offsets_gnuplot
       return self.peerstats_gnuplot(peerlist, 4, "Peer clock offset")
     File "/u1/src/NTP/ntpsec/ntpstats/ntpstats.py", line 221, in peerstats_gnuplot
       stderr.write("No such peer as %s" % key)
   NameError: global name 'stderr' is not defined

7. "-n name" seems to do nothing.

8. looks good: -all-peer-offsets

9. looks good: -all-peer-jitters

10. -s broken:

   # ./ntpviz -d /var/log/ntpstats --peer-offsets 204.17.205.1 -s 1470021644 > tmp.plot
   Traceback (most recent call last):
     File "./ntpviz", line 48, in <module>
       starttime = iso_to_unix(val)
     File "/u1/src/NTP/ntpsec/ntpstats/ntpstats.py", line 282, in iso_to_unix
       return calendar.timelocal(time.strptime(tv, "%Y-%m-%dT%H:%M:%S"))
   AttributeError: 'module' object has no attribute 'timelocal'

11. "-e endtime" would be nice. Then I could do a nice week or month plot without having to run exactly after the period end.

12. seems to do nothing by default:

   # ./ntpviz
   #

13. not sure what to do about html...

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL:

From fallenpegasus at gmail.com Tue Aug 2 19:34:16 2016
From: fallenpegasus at gmail.com (Mark Atwood)
Date: Tue, 02 Aug 2016 19:34:16 +0000
Subject: Re: ✘drift file
In-Reply-To: <20160801194018.17e7a3e6@spidey.rellim.com>
References: <20160801194018.17e7a3e6@spidey.rellim.com>
Message-ID:

If we make this change, framing it as "it's how chronyd has been doing it for the past N years" makes it a much easier sell. Especially if we can make the file format the same.

What principled objections would the hardcore time nerds have? We do have to keep their needs firmly in mind.

..m

On Mon, Aug 1, 2016 at 7:40 PM Gary E. Miller wrote:

> Yo All!
>
> Eric asked me to write up why I thought the chrony drift file handling
> is better than NTPsec's handling.
>
> 1. On startup chronyd checks the time stamp on the drift file.
>    if the timestamp > sysclock, the sysclock is set to the timestamp
>
>    This is a nice sanity check on the system clock.
>
> 2. ntpd stores the frequency ppm offset in the driftfile.
>    chronyd stores the frequency ppm offset and the 'skew' (estimated
>    accuracy of the existing frequency value).
>
>    Knowing the 'skew' at startup allows chrony to better reject bad
>    reclock input.
>
> I can see that saving the 'skew' is a nice touch, but I suspect much of the
> good chronyd startup behavior is explained elsewhere.
>
> In a related topic, it would be nice (maybe an option) for ntpd to hold
> off logging the initial awful data until after the -g option has
> set the system clock. And a bit longer, so the wonky startup data is
> masked.
>
> RGDS
> GARY
> ---------------------------------------------------------------------------
> Gary E.
> Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
> gem at rellim.com Tel:+1 541 382 8588
> _______________________________________________
> devel mailing list
> devel at ntpsec.org
> http://lists.ntpsec.org/mailman/listinfo/devel

From gem at rellim.com Tue Aug 2 19:53:09 2016
From: gem at rellim.com (Gary E. Miller)
Date: Tue, 2 Aug 2016 12:53:09 -0700
Subject: ✘drift file
In-Reply-To:
References: <20160801194018.17e7a3e6@spidey.rellim.com>
Message-ID: <20160802125309.4480bc3f@spidey.rellim.com>

Yo Mark!

On Tue, 02 Aug 2016 19:34:16 +0000
Mark Atwood wrote:

> If we make this change, framing it as "it's how chronyd has been
> doing it for the past N years" makes it a much easier sell.

chronyd setting sysclock to driftfile time is only since Aug 2014. In Oct 2015 they made it optional with their -s option, musta been some problems for some people.

With the non-default -s option it is backward compatible and people can use it if they want.

A lot of distros, like Gentoo, precede ntpd with a shell script to set a worst case sysclock from a file touched on shutdown. So the -s is just internalizing common practice. Gentoo calls it /etc/init.d/swclock.

> Especially if we can make the file format the same.

ntpd puts just the freq ppm offset on one line in the driftfile. chronyd puts the freq ppm and the freq skew on one line in the file.

It would be easy to change ntpd to accept the second parameter, and always write the second parameter. Then maybe only use the second parameter with a non-default option. Only if people like it and gain confidence would it then become Best Practice, or even default.

> What principled objections would the hardcore time nerds have? We
> do have to keep their needs firmly in mind.

I can't see how they can object if the two features are non-default.
Saving the freq skew for restart could improve startup performance, but how much is TBD. With the option we can test it.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588

From gem at rellim.com Tue Aug 2 20:00:02 2016
From: gem at rellim.com (Gary E. Miller)
Date: Tue, 2 Aug 2016 13:00:02 -0700
Subject: ✘drift file
In-Reply-To: <20160802125309.4480bc3f@spidey.rellim.com>
References: <20160801194018.17e7a3e6@spidey.rellim.com> <20160802125309.4480bc3f@spidey.rellim.com>
Message-ID: <20160802130002.0ed11a48@spidey.rellim.com>

Yo Mark!

On Tue, 02 Aug 2016 19:34:16 +0000
Mark Atwood wrote:

> If we make this change, framing it as "it's how chronyd has been
> doing it for the past N years" makes it a much easier sell.

Forgot an important part. chronyd has saved the freq ppm and freq skew in the driftfile since Jan 2006. Over 10 years ago.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588

From hmurray at megapathdsl.net Wed Aug 3 10:13:47 2016
From: hmurray at megapathdsl.net (Hal Murray)
Date: Wed, 03 Aug 2016 03:13:47 -0700
Subject: driftMime-Version: 1.0
Message-ID: <20160803101347.3ADDD406060@ip-64-139-1-69.sjc.megapath.net>

gem at rellim.com said:
> 1. On startup chronyd checks the time stamp on the drift file.
>    if the timestamp > sysclock, the sysclock is set to the timestamp

I vote that we don't do anything, not even make it optional behind a command line switch. We have more important things to do.

The OS should be doing that sort of thing, probably using the root directory. Why stop with the drift file? Should we check the log files too?

It's the sort of code that is hard to test and likely to have subtle problems.

I think it's a good item to put on the what-do-customers-want list.

> 2. ntpd stores the frequency ppm offset in the driftfile.
>    chronyd stores the frequency ppm offset and the 'skew'
>    (estimated accuracy of the existing frequency value).

> I can see that saving the 'skew' is a nice touch, but I suspect much of the
> good chronyd startup behavior is explained elsewhere.

I'm not sure that ntpd has a parameter equivalent to skew.

Again, I vote that we don't do anything now. The current startup stuff is broken. There is no point in working on things like this until we understand and fix the current problems.

gem at rellim.com said:
> In a related topic, it would be nice (maybe an option) for ntpd to hold off
> logging the initial awful data until after the -g option has set the system
> clock. And a bit longer, so the wonky startup data is masked.

But that is when you really really want the logging.

I might agree to put it someplace other than the normal place.

--
These are my opinions. I hate spam.

From Matthew.Selsky at twosigma.com Wed Aug 3 18:49:44 2016
From: Matthew.Selsky at twosigma.com (Matthew Selsky)
Date: Wed, 3 Aug 2016 14:49:44 -0400
Subject: Kernel PLL graphs
In-Reply-To: <20160801094011.D771A406057@ip-64-139-1-69.sjc.megapath.net>
References: <20160801094011.D771A406057@ip-64-139-1-69.sjc.megapath.net>
Message-ID: <20160803184944.GE25969@twosigma.com>

On Mon, Aug 01, 2016 at 02:40:11AM -0700, Hal Murray wrote:
>
> There are two parts to PPS processing in the kernel. RFC 2783 describes an
> API for capturing time stamps.
> RFC 1589 describes a PLL that lives in the kernel.
>
> Most Linux distros don't support RFC 1589. The code is in the kernel, but it doesn't work with the shipped kernels. It requires !NO_HZ, but most distros prefer NO_HZ.
>
> I pulled over the sources and built my own kernel.
>
> Here are the before and after graphs:
> http://users.megapathdsl.net/~hmurray/ntpsec/PPS-kernel.png
> The data is from two separate days so this isn't a clean comparison. I don't know what that machine was doing on either day.
>
> Here is a zoom in on the Kernel PLL day.
> http://users.megapathdsl.net/~hmurray/ntpsec/PPS-kernel2.png
> Note that the peak offset is less than a microsecond.
>
> ------------
>
> We should see if we can get similar results on a Raspberry Pi. I haven't tried building an ARM kernel.
>
> I think we should be able to run the PLL code outside the kernel. The PPS time stamp is key. The PLL calculations don't need to be run in the kernel. They need to be run soon after the PPS, but not interrupt level immediately. The API has an option to wakeup on PPS. I don't know if it is implemented on Linux.
>
> The no-PLL test was run at the default maxpoll of 6. I should try faster. I also need a standard test load.
>
> I remember various FreeBSD-is-better type comments from many years ago. I don't know if the PLL was working in Linux at the time. I should setup a test case.

Hey Hal,

I'm using maxpoll of 1 on my stratum 1 servers. And I have !NO_HZ set. My offsets stay below 1 microsecond as reported by ntpq. If we switched the units to nanoseconds, that might be interesting.

I don't have !NO_HZ set on my stratum 2 servers, but I'm looking at the ramifications of that.

I'm curious what your results are.

Cheers,
-Matt

From gem at rellim.com Wed Aug 3 19:26:09 2016
From: gem at rellim.com (Gary E.
Miller)
Date: Wed, 3 Aug 2016 12:26:09 -0700
Subject: driftMime-Version: 1.0
In-Reply-To: <20160803101347.3ADDD406060@ip-64-139-1-69.sjc.megapath.net>
References: <20160803101347.3ADDD406060@ip-64-139-1-69.sjc.megapath.net>
Message-ID: <20160803122609.2847616f@spidey.rellim.com>

Yo Hal!

On Wed, 03 Aug 2016 03:13:47 -0700
Hal Murray wrote:

> gem at rellim.com said:
> > 1. On startup chronyd checks the time stamp on the drift file.
> >    if the timestamp > sysclock, the sysclock is set to the
> >    timestamp

> We have more important things to do.

A few lines of code, faster to code than to argue about it. Better yet, steal the chronyd code for it.

> The OS should be doing that sort of thing, probably using the root
> directory. Why stop with the drift file? Should we check the log
> files too?

We can't trust the OS to do the right thing about time. The OS trusts ntpd to do the right thing, but it does not.

> It's the sort of code that is hard to test and likely to have subtle
> problems.

Really? Just 'touch /var/log/ntpstats/driftfile' to a time in the future, start ntpd, and then check the system time.

> I think it's a good item to put on the what-do-customers-want list.

I assure you RasPi users really want this. The lack of an RTC causes real problems. Every time I reboot a RasPi I curse ntpd's poor startup behavior. Almost enough to go back to chronyd.

> > 2. ntpd stores the frequency ppm offset in the driftfile.
> >    chronyd stores the frequency ppm offset and the 'skew'
> >    (estimated accuracy of the existing frequency value).
>
> > I can see that saving the 'skew' is a nice touch, but I suspect
> > much of the good chronyd startup behavior is explained elsewhere.
>
> I'm not sure that ntpd has a parameter equivalent to skew.

I think this is what ntpd calls 'RMS Jitter'. I'm not clear on the ntpd internals, but I do know it has methods for clock selection, and they fail badly on startup.

ntpd does not need to exactly mirror what chronyd does.
But it clearly needs better start up behavior, and since chronyd starts up much better than ntpd that is a good place to start looking.

> Again, I vote that we don't do anything now. The current startup
> stuff is broken. There is no point in working on things like this
> until we understand and fix the current problems.

Clearly chronyd understands the problem, so looking at that code is part of the path to understanding and fixing the problem.

> gem at rellim.com said:
> > In a related topic, it would be nice (maybe an option) for ntpd to
> > hold off logging the initial awful data until after the -g option
> > has set the system clock. And a bit longer, so the wonky startup
> > data is masked.
>
> But that is when you really really want the logging.

Sometimes, but then it takes a week for my graphs to be readable again...

> I might agree to put it someplace other than the normal place.

Works for me, or maybe a switch. Or maybe fix the startup problems.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588

From gem at rellim.com Wed Aug 3 19:45:53 2016
From: gem at rellim.com (Gary E. Miller)
Date: Wed, 3 Aug 2016 12:45:53 -0700
Subject: Kernel PLL graphs
In-Reply-To: <20160803184944.GE25969@twosigma.com>
References: <20160801094011.D771A406057@ip-64-139-1-69.sjc.megapath.net> <20160803184944.GE25969@twosigma.com>
Message-ID: <20160803124553.3cd9d755@spidey.rellim.com>

Yo Matthew!

On Wed, 3 Aug 2016 14:49:44 -0400
Matthew Selsky wrote:

> I'm using maxpoll of 1 on my stratum 1 servers. And I have !NO_HZ
> set. My offsets stay below 1 microsecond as reported by ntpq. If
> we switched the units to nanoseconds, that might be interesting.
chronyc reports to the nanoSec. ntpd puts nanoSec in the peerstats log file.

> I don't have !NO_HZ set on my stratum 2 servers, but I'm looking at
> the ramifications of that.

I found no huge difference, but a reduction in the odd wild hare measurement. Noticeable on chrony-graph.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588

From gem at rellim.com Wed Aug 3 20:35:26 2016
From: gem at rellim.com (Gary E. Miller)
Date: Wed, 3 Aug 2016 13:35:26 -0700
Subject: ✘build with Python 3.4 broken.
Message-ID: <20160803133526.04adf280@spidey.rellim.com>

Yo All!

I just tried to compile ntpsec on a new gentoo install. It failed.

pi ntpsec # ../Do-config-ntpsec
../Do-config-ntpsec: line 3: cd: ntpsec: No such file or directory
Waf: The wscript in '/usr/local/src/NTP/ntpsec' is unreadable
Traceback (most recent call last):
  File "/usr/local/src/NTP/ntpsec/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 104, in waf_entry_point
    set_main_module(os.path.normpath(os.path.join(Context.run_dir,Context.WSCRIPT_FILE)))
  File "/usr/local/src/NTP/ntpsec/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 129, in set_main_module
    Context.g_module=Context.load_module(file_path)
  File "/usr/local/src/NTP/ntpsec/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 354, in load_module
    try:exec(compile(code,path,'exec'),module.__dict__)
  File "/usr/local/src/NTP/ntpsec/wscript", line 11, in <module>
    from pylib.configure import cmd_configure
  File "/usr/local/src/NTP/ntpsec/pylib/configure.py", line 154
    print "ID Description"
                         ^
SyntaxError: Missing parentheses in call to 'print'
Waf: The wscript in
'/usr/local/src/NTP/ntpsec' is unreadable
Traceback (most recent call last):
  File "/usr/local/src/NTP/ntpsec/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 104, in waf_entry_point
    set_main_module(os.path.normpath(os.path.join(Context.run_dir,Context.WSCRIPT_FILE)))
  File "/usr/local/src/NTP/ntpsec/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 129, in set_main_module
    Context.g_module=Context.load_module(file_path)
  File "/usr/local/src/NTP/ntpsec/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 354, in load_module
    try:exec(compile(code,path,'exec'),module.__dict__)
  File "/usr/local/src/NTP/ntpsec/wscript", line 11, in <module>
    from pylib.configure import cmd_configure
  File "/usr/local/src/NTP/ntpsec/pylib/configure.py", line 154
    print "ID Description"
                         ^
SyntaxError: Missing parentheses in call to 'print'

Ah, Gentoo now uses Python 3.4 by default. Switching back to 2.7 and things are better.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
gem at rellim.com Tel:+1 541 382 8588

From verm at darkbeer.org Wed Aug 3 20:40:29 2016
From: verm at darkbeer.org (Amar Takhar)
Date: Wed, 3 Aug 2016 20:40:29 +0000
Subject: ✘build with Python 3.4 broken.
In-Reply-To: <20160803133526.04adf280@spidey.rellim.com>
References: <20160803133526.04adf280@spidey.rellim.com>
Message-ID: <20160803204029.GA49838@darkbeer.org>

On 2016-08-03 13:35 -0700, Gary E. Miller wrote:
> Yo All!
>
> I just tried to compile ntpsec on a new gentoo install. It failed.

There is a ticket for this. I will try to get my patch in soon.

Amar.
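
For readers following the traceback: the SyntaxError is Python 3 rejecting the Python 2 print statement at line 154 of pylib/configure.py. The sketch below is only an illustration of one portable spelling and assumes nothing about how the ticketed patch is actually written; the names `header_line` and `show_header` are hypothetical and do not come from the ntpsec sources.

```python
# Sketch only: header_line and show_header are hypothetical names,
# not taken from pylib/configure.py.


def header_line():
    """Return the header text that the failing print statement emits."""
    return "ID Description"


def show_header():
    # Python 2 only:   print "ID Description"
    # With exactly one parenthesized argument the line below parses on
    # both 2.7 (print statement, parenthesized expression) and 3.x
    # (print function). Multiple arguments or file=... would need
    # "from __future__ import print_function" at the top of the module.
    print(header_line())


if __name__ == "__main__":
    show_header()
```

Keeping the text in a function that returns a string also makes the output checkable without capturing stdout.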
From hmurray at megapathdsl.net Wed Aug 3 21:24:01 2016
From: hmurray at megapathdsl.net (Hal Murray)
Date: Wed, 03 Aug 2016 14:24:01 -0700
Subject: Kernel PLL graphs
In-Reply-To: Message from Matthew Selsky of "Wed, 03 Aug 2016 14:49:44 EDT." <20160803184944.GE25969@twosigma.com>
Message-ID: <20160803212401.75CE5406057@ip-64-139-1-69.sjc.megapath.net>

Matthew.Selsky at twosigma.com said:
> I'm using maxpoll of 1 on my stratum 1 servers. And I have !NO_HZ set. My
> offsets stay below 1 microsecond as reported by ntpq. If we switched the
> units to nanoseconds, that might be interesting.

Time to make sure I've got the right number of negatives... "I have !NO_HZ set" means you have unset NO_HZ which probably means you had to build your own kernel.

Do you have flag3 turned on? If so, the kernel does all the work and maxpoll is essentially ignored.

I thought there was a min to maxpoll so I'm a bit surprised you could set it to 1.

> I don't have !NO_HZ set on my stratum 2 servers, but I'm looking at the
> ramifications of that.

At least for the effect I'm discussing, it only matters if you have a PPS.

> I'm curious what your results are.

--
These are my opinions. I hate spam.

From Matthew.Selsky at twosigma.com Wed Aug 3 21:49:41 2016
From: Matthew.Selsky at twosigma.com (Matthew Selsky)
Date: Wed, 3 Aug 2016 17:49:41 -0400
Subject: Kernel PLL graphs
In-Reply-To: <20160803212401.75CE5406057@ip-64-139-1-69.sjc.megapath.net>
References: <20160803184944.GE25969@twosigma.com> <20160803212401.75CE5406057@ip-64-139-1-69.sjc.megapath.net>
Message-ID: <20160803214940.GF25969@twosigma.com>

On Wed, Aug 03, 2016 at 02:24:01PM -0700, Hal Murray wrote:
> Time to make sure I've got the right number of negatives... "I have !NO_HZ
> set" means you have unset NO_HZ which probably means you had to build your
> own kernel.

We build our own kernels and we boot our stratum 1 clocks with "nohz=off"

> Do you have flag3 turned on?
> If so, the kernel does all the work and maxpoll is essentially ignored.

We do not have flag3 turned on. ntpd is reading the GPS via shared memory. The GPS comes with a lightweight daemon that reads /dev/refclock0 and stuffs the time in shared memory.

> I thought there was a min to maxpoll so I'm a bit surprised you could set it
> to 1.

We had a local patch to change the minimum value of minpoll/maxpoll to 1. Eric recently committed a similar patch upstream.
https://gitlab.com/NTPsec/ntpsec/commit/a3047c7a375877436d422e04a138aace7ce1bd06

> At least for the effect I'm discussing, it only matters if you have a PPS.

We're using PCIe GPS receiver cards.

Cheers,
-Matt

From hmurray at megapathdsl.net Wed Aug 3 22:15:42 2016
From: hmurray at megapathdsl.net (Hal Murray)
Date: Wed, 03 Aug 2016 15:15:42 -0700
Subject: driftMime-Version: 1.0
In-Reply-To: Message from "Gary E. Miller" of "Wed, 03 Aug 2016 12:26:09 PDT." <20160803122609.2847616f@spidey.rellim.com>
Message-ID: <20160803221542.7434B406057@ip-64-139-1-69.sjc.megapath.net>

gem at rellim.com said:
> A few lines of code, faster to code than to argue about it. Better yet,
> steal the chronyd code for it.

I could make a similar argument. In this case, I think it doesn't apply.

I can kludge up a one time test. We can't test all the strange cases. Yes, the risk is low, but we need to focus on important things.

We need to get Eric working on TESTFRAME. I think it's long past time to stop getting distracted by things like this. I already spend too much time sorting out minor bugs introduced by changes.

--
These are my opinions. I hate spam.

From gem at rellim.com Wed Aug 3 22:23:37 2016
From: gem at rellim.com (Gary E.
Miller) Date: Wed, 3 Aug 2016 15:23:37 -0700 Subject: driftMime-Version: 1.0 In-Reply-To: <20160803221542.7434B406057@ip-64-139-1-69.sjc.megapath.net> References: <20160803122609.2847616f@spidey.rellim.com> <20160803221542.7434B406057@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160803152337.63be27f7@spidey.rellim.com> Yo Hal! On Wed, 03 Aug 2016 15:15:42 -0700 Hal Murray wrote: > gem at rellim.com said: > > A few lines of code, faster to code than to argue about it. Better > > yet, steal the chronyd code for it. > > I could make a similar argument. In this case, I think it doesn't > apply. > > I can kludge up a one-time test. We can't test all the strange > cases. Yes, the risk is low, but we need to focus on important > things. Anything that might improve startup will be a big win. Saving a couple more state variables could be that win. > We need to get Eric working on TESTFRAME. I think it's long past > time to stop getting distracted by things like this. I already spend > too much time sorting out minor bugs introduced by changes. Yes, Eric needs to focus on TESTFRAME, once he finishes the ntpviz. The ntpviz at least tells you the build is working. Something lacking now. chrony-graph has picked up a lot of subtle things for me. RGDS GARY --------------------------------------------------------------------------- Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703 gem at rellim.com Tel:+1 541 382 8588 From hmurray at megapathdsl.net Wed Aug 3 22:54:19 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Wed, 03 Aug 2016 15:54:19 -0700 Subject: driftMime-Version: 1.0 In-Reply-To: Message from "Gary E. Miller" of "Wed, 03 Aug 2016 15:23:37 PDT."
<20160803152337.63be27f7@spidey.rellim.com> Message-ID: <20160803225419.AF369406057@ip-64-139-1-69.sjc.megapath.net> gem at rellim.com said: > Yes, Eric needs to focus on TESTFRAME, once he finishes the ntpviz. The > ntpviz at least tells you the build is working. Something lacking now. > chrony-graph has picked up a lot of subtle things for me. I'd put ntpviz behind TESTFRAME. We'll have plenty of time to work on visualization while collecting interesting test cases to feed to TESTFRAME. -- These are my opinions. I hate spam. From gem at rellim.com Wed Aug 3 22:58:19 2016 From: gem at rellim.com (Gary E. Miller) Date: Wed, 3 Aug 2016 15:58:19 -0700 Subject: driftMime-Version: 1.0 In-Reply-To: <20160803225419.AF369406057@ip-64-139-1-69.sjc.megapath.net> References: <20160803152337.63be27f7@spidey.rellim.com> <20160803225419.AF369406057@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160803155819.44887045@spidey.rellim.com> Yo Hal! On Wed, 03 Aug 2016 15:54:19 -0700 Hal Murray wrote: > gem at rellim.com said: > > Yes, Eric needs to focus on TESTFRAME, once he finishes the ntpviz. > > The ntpviz at least tells you the build is working. Something > > lacking now. chrony-graph has picked up a lot of subtle things for > > me. > > I'd put ntpviz behind TESTFRAME. We'll have plenty of time to work > on visualization while collecting interesting test cases to feed to > TESTFRAME. Except ntpviz only took a few days, and likely only needs another day or two. Just needs some tweaks to the graphs, and a harness to create all the graphs automagically from the ntp.conf. Plus maybe an html generator. Basically the hard part is copied from chrony-graph, and a bit more to do. TESTFRAME might take the rest of the year. RGDS GARY --------------------------------------------------------------------------- Gary E.
Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703 gem at rellim.com Tel:+1 541 382 8588 From hmurray at megapathdsl.net Wed Aug 3 23:23:44 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Wed, 03 Aug 2016 16:23:44 -0700 Subject: driftMime-Version: 1.0 In-Reply-To: Message from "Gary E. Miller" of "Wed, 03 Aug 2016 15:58:19 PDT." <20160803155819.44887045@spidey.rellim.com> Message-ID: <20160803232345.09426406057@ip-64-139-1-69.sjc.megapath.net> gem at rellim.com said: > Except ntpviz only took a few days, and likely only needs another day or > two. Just needs some tweaks to the graphs, and a harness to create all the > graphs automagically from the ntp.conf. Plus maybe an html generator. Plus all the other stuff that you toss in once that gets working well enough to see what else you want. > TESTFRAME might take the rest of the year. It will take even longer if Eric keeps working on other things. We need to get Eric focused on TESTFRAME. -- These are my opinions. I hate spam. From gem at rellim.com Wed Aug 3 23:52:31 2016 From: gem at rellim.com (Gary E. Miller) Date: Wed, 3 Aug 2016 16:52:31 -0700 Subject: driftMime-Version: 1.0 In-Reply-To: <20160803232345.09426406057@ip-64-139-1-69.sjc.megapath.net> References: <20160803155819.44887045@spidey.rellim.com> <20160803232345.09426406057@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160803165231.18debae8@spidey.rellim.com> Yo Hal! On Wed, 03 Aug 2016 16:23:44 -0700 Hal Murray wrote: > gem at rellim.com said: > > Except ntpviz only took a few days, and likely only needs another > > day or two. Just needs some tweaks to the graphs, and a harness to > > create all the graphs automagically from the ntp.conf. Plus maybe > > an html generator.
> > Plus all the other stuff that you toss in once that gets working well > enough to see what else you want. I've been using chrony-graph for months now. When ntpviz mostly duplicates that, I'll be happy for the near term. > > TESTFRAME might take the rest of the year. > > It will take even longer if Eric keeps working on other things. > > We need to get Eric focused on TESTFRAME. I suspect that Eric has not dived into TESTFRAME since he does not have a good mental model he likes yet. That is the hard part, and the part I think he is unsure on. If it were me, I'd consider shimming a few syscalls: adjtime(), send(), recv(), clock_gettime(). Maybe at the runtime linker level, maybe at the libntpd_lib.a level. Then run a good ntpd in record mode. With that saved data, a test ntpd could be run in playback mode. To keep the playback time small, maybe also a way to save internal state to a file, and reload that state in a test ntpd. A nice state dump could also be a good debugging tool. RGDS GARY --------------------------------------------------------------------------- Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703 gem at rellim.com Tel:+1 541 382 8588 From christian.ehrhardt at canonical.com Tue Aug 9 15:10:16 2016 From: christian.ehrhardt at canonical.com (Christian Ehrhardt) Date: Tue, 9 Aug 2016 17:10:16 +0200 Subject: Discussion about PR: WIP: Snapify ntpsec Message-ID: Hi, I wanted to give the ML a ping as well about this, so that it does not exist only as a Pull Request. Perhaps someone here will want to chime in as well.
There is a prototype to snap ntpsec at https://gitlab.com/NTPsec/ntpsec/merge_requests/49 I'll quote my PR text here and hope for a great discussion: "Hi, on one hand I worked on packaging ntp (classic) recently and on the other hand I worked a bit with snapcraft (=> http://snapcraft.io/). I really think ntpsec would be a perfect candidate to exploit snap packaging. Please consider this an RFC for now - following the spirit of NTPsec contribution policy "Before starting significant work, please propose it and discuss it first" I'll also write to the ML linking to this branch. But I also did not want to just mention snapcraft and run away - instead I thought to provide a prototype that can be tested, but discuss motivation, tech and details before doing some more heavy lifting work. My current example is meant for a daily build, but this can easily be changed to whatever you prefer. Snapcraft could - for example - build from a stable branch of your tree automatically or whatever else you want. Benefits of exploiting snap(craft) in ntpsec (in my opinion): - for security it is often important to be able to push fixes fast to consumers, snaps are great for that as they somewhat cut out the distributions as a gatekeeper of a release process - ntpsec isn't packaged in distributions yet, an upload to the snapstore would make you instantly available on multiple distributions - faster development iteration cycles, which is especially useful for new (or newly forked) projects - and of course all the benefits listed at http://snapcraft.io/ Limitations: - this doesn't use any of the great snap isolation features yet (still using --devmode to get the prototype fast). Implementing those will need a few new interfaces and that effort should be spent after the discussion (but on the good side, you haven't lost anything - just not gained all of the snap isolation features yet).
- currently there is no snapcraft plugin for waf, so I provided one (but I also started to push it to snapcraft already so it can be dropped from ntpsec in a bit) I'm looking forward and hope that the security improvements of ntpsec and those of snaps for packaging will one day stack up to be even better together. Let's discuss. Kind Regards Christian P.S. FYI - I'm soon going on vacation - so please don't be surprised if there is no response between 13th and 23rd August. OTOH this gives everyone more time to play and experiment with it." -- Christian Ehrhardt Software Engineer, Ubuntu Server Canonical Ltd From esr at thyrsus.com Tue Aug 9 15:38:55 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 9 Aug 2016 11:38:55 -0400 Subject: Discussion about PR: WIP: Snapify ntpsec In-Reply-To: References: Message-ID: <20160809153855.GA30444@thyrsus.com> Christian Ehrhardt : > I'm looking forward and hope that the security improvements of ntpsec and > those of snaps for packaging will one day stack up to be even better > together. Let's discuss. This looks like good work to me. I'm encouraging Christian to go forward. -- Eric S. Raymond From fallenpegasus at gmail.com Tue Aug 9 15:58:47 2016 From: fallenpegasus at gmail.com (Mark Atwood) Date: Tue, 09 Aug 2016 15:58:47 +0000 Subject: Discussion about PR: WIP: Snapify ntpsec In-Reply-To: References: Message-ID: This looks great, Christian. Is there anything we need to do to have our buildbot system test it? ..m On Tue, Aug 9, 2016 at 8:10 AM Christian Ehrhardt < christian.ehrhardt at canonical.com> wrote: > Hi, > I wanted to give the ML a ping as well about this, so that not only the > Pull Request is existing. > Eventually one here might chime in as well.
> > There is a prototype to snap ntpsec at > https://gitlab.com/NTPsec/ntpsec/merge_requests/49 > > I'll quote my PR text here and hope for a great discussion: > > "Hi, on one hand I worked on packaging ntp (classic) recently and on the > other hand I worked a bit with snapcraft (=> http://snapcraft.io/). I > really think ntpsec would be a perfect candidate to exploit snap packaging. > > Please consider this an RFC for now - following the spirit of NTPsec > contribution policy "Before starting significant work, please propose it > and discuss it first" I'll also write to the ML linking to this branch. But > also did I not just want to mention snapcraft and run away - instead I > thought to provide a prototype that can be tested, but discuss motivation, > tech and details before doing some more heavy lifting work. > > My current example is meant for a daily build, but this can easily be > changed to whatever you prefer. Snapcraft could - for example - build from > a stable branch of your tree automatically or whatever else you want. > > Benefits of exploiting snap(craft) in ntpsec (in my opinion): > > - for security it is often important to be able to push fixes fast to > consumers, snaps are great for that as it somewhat cut's out the > distributions as a gatekeeper of a release process > - ntpsec isn't packaged in distributions yet, an upload to the > snapstore would make you instantly available on multiple distributions > - faster development iteration cycles, which is especially useful for > new (or newly forked) projects > - and of course all the benefits listed at http://snapcraft.io/ > > Limitations: > > - this doesn't use any of the great snap isolation features yet (still > using --devmode to get the prototype fast). Implementing those will need a > few new interfaces and that effort should be spent after the discussion > (but on the good side, you haven't lost anything - just not gained all of > the snap isolation features yet). 
> - currently there is no snapcraft plugin for waf, so I provided one > (but I also started to push it to snapcraft already so it can be dropped > from ntpsec in a bit) > > I'm looking forward and hope that the security improvements of ntpsec and > those of snap's for packaging will one day stack up to be even better > together. Let's discuss. > > Kind Regards Christian > > P.S. FYI - I'm soon going on vacation - so please don't wonder if there is > kind of no-response between 13th and 23rd August. OTOH this gives everyone > more time to play and experiment with it." > > > > -- > Christian Ehrhardt > Software Engineer, Ubuntu Server > Canonical Ltd > _______________________________________________ > devel mailing list > devel at ntpsec.org > http://lists.ntpsec.org/mailman/listinfo/devel From verm at darkbeer.org Tue Aug 9 16:02:14 2016 From: verm at darkbeer.org (Amar Takhar) Date: Tue, 9 Aug 2016 16:02:14 +0000 Subject: Discussion about PR: WIP: Snapify ntpsec In-Reply-To: References: Message-ID: <20160809160214.GA16897@darkbeer.org> On 2016-08-09 15:58 +0000, Mark Atwood wrote: > This looks great, Christian. > > Is there anything we need to do to have our buildbot system test it? I will take a look at this later on and let you know. Amar. From christian.ehrhardt at canonical.com Wed Aug 10 08:27:11 2016 From: christian.ehrhardt at canonical.com (Christian Ehrhardt) Date: Wed, 10 Aug 2016 10:27:11 +0200 Subject: Discussion about PR: WIP: Snapify ntpsec In-Reply-To: <20160809160214.GA16897@darkbeer.org> References: <20160809160214.GA16897@darkbeer.org> Message-ID: Thank you all so much for your positive feedback and support on this. I'll try to pick it up again when I'm back from vacation in about two weeks.
My personal plan would be: - get the devmode version rebased and make sure it has no obvious faults via more tests and discussions - get the PR accepted into ntpsec + here you can start to spin off a buildbot-driven-auto-upload if you want that - work step by step on all the pieces needed to make it run in strict confinement keeping you in the loop (that will be a long term effort) Kind Regards Christian On Tue, Aug 9, 2016 at 6:02 PM, Amar Takhar wrote: > On 2016-08-09 15:58 +0000, Mark Atwood wrote: > > This looks great, Christian. > > > > Is there anything we need to do to have our buildbot system test it? > > I will take a look at this later on and let you know. > > > Amar. > _______________________________________________ > devel mailing list > devel at ntpsec.org > http://lists.ntpsec.org/mailman/listinfo/devel > -- Christian Ehrhardt Software Engineer, Ubuntu Server Canonical Ltd From esr at thyrsus.com Thu Aug 11 12:11:05 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 11 Aug 2016 08:11:05 -0400 (EDT) Subject: Need modification to waf build recipe Message-ID: <20160811121105.C8C9513A0E7D@snark.thyrsus.com> This is a heads-up to Amar. As part of the effort to rewrite ntpq in Python, I have renamed the old pylib/ directory to wafhelpers/ and created a new pylib/ that contains an installable Python module. (The rename is to maintain parallelism with perllib). On install, I need the build recipe to copy this module directory to rootspace where Python programs can import it under the name 'ntp'. I looked in the waf book to try to do this myself but have not found a recipe. Eventually this will replace perllib as we translate all the old Perl scripts to Python. -- Eric S. Raymond "Those who make peaceful revolution impossible will make violent revolution inevitable." -- John F.
Kennedy From verm at darkbeer.org Thu Aug 11 12:46:46 2016 From: verm at darkbeer.org (Amar Takhar) Date: Thu, 11 Aug 2016 12:46:46 +0000 Subject: Need modification to waf build recipe In-Reply-To: <20160811121105.C8C9513A0E7D@snark.thyrsus.com> References: <20160811121105.C8C9513A0E7D@snark.thyrsus.com> Message-ID: <20160811124646.GA70530@darkbeer.org> On 2016-08-11 08:11 -0400, Eric S. Raymond wrote: > This is a heads-up to Amar. > > As part of the effort to rewrite ntpq in Python, I have renamed the old > pylib/ directory to wafhelpers/ and created a new pylib/ that contains > an installable Python module. (The rename is to maintain parallelism > with perllib). wafhelpers is a terrible name since they aren't helpers; it's the entire build. Please call it 'wafbuild'. Some projects do use it as re-usable modules but that is not how ours is written. > On install, I need the build recipe to copy this module directory to > rootspace where Python programs can import it under the name 'ntp'. I > looked in the waf book to try to do this myself but have not found a > recipe. There is built-in support in waf to do this. It's been a long time since I used it; I will take a look. > Eventually this will replace perllib as we translate all the old > Perl scripts to Python. Great! Amar. From esr at thyrsus.com Thu Aug 11 13:04:28 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 11 Aug 2016 09:04:28 -0400 Subject: Need modification to waf build recipe In-Reply-To: <20160811124646.GA70530@darkbeer.org> References: <20160811121105.C8C9513A0E7D@snark.thyrsus.com> <20160811124646.GA70530@darkbeer.org> Message-ID: <20160811130428.GA2513@thyrsus.com> Amar Takhar : > On 2016-08-11 08:11 -0400, Eric S. Raymond wrote: > > This is a heads-up to Amar. > > > > As part of the effort to rewrite ntpq in Python, I have renamed the old > > pylib/ directory to wafhelpers/ and created a new pylib/ that contains > > an installable Python module.
(The rename is to maintain parallelism > > with perllib). > > wafhelpers is a terrible name since they aren't helpers; it's the entire build. How is it the entire build? There are lots of wscript files not in there... > Please call it 'wafbuild'. Some projects do use it as re-usable modules but > that is not how ours is written. 'wafbuild' was in fact my first choice. Can't use it; there's a collision with something named 'wafbuild' that waf is using internally. I don't really care what it's named. -- Eric S. Raymond From verm at darkbeer.org Thu Aug 11 13:26:49 2016 From: verm at darkbeer.org (Amar Takhar) Date: Thu, 11 Aug 2016 13:26:49 +0000 Subject: Need modification to waf build recipe In-Reply-To: <20160811130428.GA2513@thyrsus.com> References: <20160811121105.C8C9513A0E7D@snark.thyrsus.com> <20160811124646.GA70530@darkbeer.org> <20160811130428.GA2513@thyrsus.com> Message-ID: <20160811132649.GA71311@darkbeer.org> On 2016-08-11 09:04 -0400, Eric S. Raymond wrote: > How is it the entire build? There are lots of wscript files not in there... Those are the source designation files; all the logic is in the former 'pylib' dir. > 'wafbuild' was in fact my first choice. Can't use it; there's a collision with > something named 'wafbuild' that waf is using internally. > > I don't really care what it's named. I'll figure something out. More notice would have been nice. Amar. From fallenpegasus at gmail.com Thu Aug 11 21:28:56 2016 From: fallenpegasus at gmail.com (Mark Atwood) Date: Thu, 11 Aug 2016 21:28:56 +0000 Subject: Next point release Message-ID: It is time for another point release. I've received enough private communications from people who are successfully running lab machines on tip, and people who are successfully running lab machines on the previous point release, and that my work with the CII is illuminating the need for proof of motion, that I have decided to make another point release.
I understand the objection that we do not yet have a formalized release criteria system, nor do we yet have a formalized checklist or automated release process. As nobody wise is yet running us in load production, that is not yet an issue. This point release will not have Daniel's state machine patch merged in, because it is a minor risk I don't want to take for the point release just yet. Other than that, everyone please chime in: Daniel Eric Amar Hal everyone else? Please get your low risk bug fixes in in the next couple of days. Unless there is a credible stop reason, I will issue the tag this coming Tuesday morning. Thank you everyone! ..m From verm at darkbeer.org Thu Aug 11 21:33:25 2016 From: verm at darkbeer.org (Amar Takhar) Date: Thu, 11 Aug 2016 21:33:25 +0000 Subject: Next point release In-Reply-To: References: Message-ID: <20160811213325.GA81884@darkbeer.org> On 2016-08-11 21:28 +0000, Mark Atwood wrote: > It is time for another point release. > > I've received enough private communications from people who are successfully > running lab machines on tip, and people who are successfully running lab > machines on the previous point release, and that my work with the CII is > illuminating the need for proof of motion, that I have decided to make another > point release. Can you quantify what 'lab machine' means here? Are we talking scientific laboratories running science equipment or general purpose lab workstations? > I understand the objection that we do not yet have a formalized release > criteria system, nor do we yet have a formalized checklist or automated release > process. As nobody wise is yet running us in load production, that is not yet > an issue. That's not really a reason to not have a formalised process; if we're going to do it in the future we should do it now.
It lets us have a chance at getting organised under it; when it comes time for it to be a hard set of rules to follow, it will be easier. > Please get your low risk bug fixes in in the next couple of days. > > Unless there is a credible stop reason, I will issue the tag this coming > Tuesday morning. Tuesday should be fine. There are some build changes I need to make to handle the new Python module; I will do that this weekend. Amar. From gem at rellim.com Thu Aug 11 21:34:53 2016 From: gem at rellim.com (Gary E. Miller) Date: Thu, 11 Aug 2016 14:34:53 -0700 Subject: Next point release In-Reply-To: References: Message-ID: <20160811143453.7761bfc0@spidey.rellim.com> Yo Mark! On Thu, 11 Aug 2016 21:28:56 +0000 Mark Atwood wrote: > It is time for another point release. Not yet, unless something is urgent. In the middle of August no one will see it. It will get lost in the haze of summer. People are out and about, or watching the Olympics. If you wait until most Universities are back in session then all the script kiddies will jump on it. RGDS GARY --------------------------------------------------------------------------- Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703 gem at rellim.com Tel:+1 541 382 8588 From esr at thyrsus.com Thu Aug 11 22:28:40 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 11 Aug 2016 18:28:40 -0400 Subject: Next point release In-Reply-To: References: Message-ID: <20160811222840.GB12690@thyrsus.com> Mark Atwood : > It is time for another point release. No blockers. I was expecting this and have been cleaning up various minor things on the issues list. > This point release will not have Daniel's state machine patch merged in, > because it is a minor risk I don't want to take for the point release just > yet. Good plan.
I have every confidence in Daniel, but rushing the new code into production would be a needless risk. I'd like to have it simmering on the test farm machines for a month or so before we ship it. > Please get your low risk bug fixes in in the next couple of days. I should have ntpviz ready to ship this weekend, so we'll have a nice tasty feature to brag about in the release notes. > Unless there is a credible stop reason, I will issue the tag this coming > Tuesday morning. Hal, I think that tells us when we dive into TESTFRAME - right after that. -- Eric S. Raymond From hmurray at megapathdsl.net Thu Aug 11 23:05:36 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Thu, 11 Aug 2016 16:05:36 -0700 Subject: Using first server to respond, Issue #68 Message-ID: <20160811230536.E60D1406076@ip-64-139-1-69.sjc.megapath.net> I think I have found the problem. minsane defaults to 1 so ntpd is "happy" as soon as a server gets past the individual server filtering. I don't see any simple fix. Even if minsane is set (via ntp.conf) to something larger, there is still the possibility that a majority of the first N servers are not the ones you want. We might be able to work out something based on how many servers you are using, but that gets tangled up with how many servers to use and why. (If you haven't been around for a while, that's an endless topic.) We could add a warning. Does anybody look at them? It won't break anything, so I guess I'll give it a try. -- These are my opinions. I hate spam. From hmurray at megapathdsl.net Fri Aug 12 03:10:22 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Thu, 11 Aug 2016 20:10:22 -0700 Subject: Next point release Message-ID: <20160812031022.C662B406076@ip-64-139-1-69.sjc.megapath.net> I don't know of any problems that need fixing. 
fallenpegasus at gmail.com said: > I understand the objection that we do not yet have a formalized release > criteria system, nor do we yet have a formalized checklist or automated > release process. As nobody wise is yet running us in load production, that > is not yet an issue. Please take notes on what you do for the release. If you send them to me, I'll try to turn them into some sort of document. I'm not looking for anything ultra-detailed, just enough reminders that a step won't fall through the cracks. Chunks that can be copy/pasted would be good. If the purpose isn't obvious, then I'd like a few words to describe why/what. -- These are my opinions. I hate spam. From fallenpegasus at gmail.com Fri Aug 12 15:10:13 2016 From: fallenpegasus at gmail.com (Mark Atwood) Date: Fri, 12 Aug 2016 15:10:13 +0000 Subject: Using first server to respond, Issue #68 In-Reply-To: <20160811230536.E60D1406076@ip-64-139-1-69.sjc.megapath.net> References: <20160811230536.E60D1406076@ip-64-139-1-69.sjc.megapath.net> Message-ID: Start with the warning, while we think of a solution. Thank you, Hal. On Thu, Aug 11, 2016 at 4:05 PM Hal Murray wrote: > > I think I have found the problem. > > minsane defaults to 1 so ntpd is "happy" as soon as a server gets past the > individual server filtering. > > I don't see any simple fix. Even if minsane is set (via ntp.conf) to > something larger, there is still the possibility that a majority of the > first > N servers are not the ones you want. > > We might be able to work out something based on how many servers you are > using, but that gets tangled up with how many servers to use and why. (If > you haven't been around for a while, that's an endless topic.) > > We could add a warning. Does anybody look at them? It won't break > anything, > so I guess I'll give it a try. > > > -- > These are my opinions. I hate spam. 
> > > > _______________________________________________ > devel mailing list > devel at ntpsec.org > http://lists.ntpsec.org/mailman/listinfo/devel > From hmurray at megapathdsl.net Sun Aug 14 18:07:47 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Sun, 14 Aug 2016 11:07:47 -0700 Subject: Why did you remove that "unnecessary" link? In-Reply-To: Message from esr@thyrsus.com (Eric S. Raymond) of "Sun, 14 Aug 2016 13:04:47 EDT." <20160814170447.754FE13A0E7D@snark.thyrsus.com> Message-ID: <20160814180747.A206B406078@ip-64-139-1-69.sjc.megapath.net> esr at thyrsus.com said: > You broke my test setup. The purpose of that link was to allow the > neighboring pyntpq script to import ntp.packet even when the corresponding > Python module under pylib hasn't been installed in rootspace. Which it > never is, yet - the waf production to do that hasn't landed. > I guess I'll add a README so this doesn't happen again. Sorry. I thought I was helping by removing some cruft. It built and tested without that link. Can you add a test that will catch it? That opens up the whole mess of testing with local libraries vs installed libraries. I remember lots of "fun" in that area for gpsd. -- These are my opinions. I hate spam. From esr at thyrsus.com Sun Aug 14 18:22:16 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Aug 2016 14:22:16 -0400 Subject: Why did you remove that "unnecessary" link? In-Reply-To: <20160814180747.A206B406078@ip-64-139-1-69.sjc.megapath.net> References: <20160814170447.754FE13A0E7D@snark.thyrsus.com> <20160814180747.A206B406078@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160814182216.GA9614@thyrsus.com> Hal Murray : > Can you add a test that will catch it? Yes, maybe a pretty trivial one. But we don't really have the larger check framework for it to live inside yet; that gets into some waf issues that I need to chase down in my copious free time.
> That opens up the whole mess of testing with local libraries vs installed > libraries. I remember lots of "fun" in that area for gpsd. You misspelled "I'd rather have root canal than go through that again." Yeesh... On NTPsec, I'm planning to stay strictly away from building shared libraries and Python extensions written in C for that exact reason. They were a fun stunt in GPSD but not worth the downstream complexity cost. -- Eric S. Raymond From hmurray at megapathdsl.net Sun Aug 14 22:49:15 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Sun, 14 Aug 2016 15:49:15 -0700 Subject: Possible abuse from fetching the leap second file Message-ID: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> Matt Selsky is working on Pythonizing the script that grabs a new leap second file. The idea is to run a cron job that keeps it up to date. That opens an interesting can of worms. As a general rule, you shouldn't use a resource on a system that you don't own without permission from the owner. Informed consent might be a better term. A system open to occasional downloads by a human might not be willing to support automated fetches from many, many systems. This case is nasty in two ways. First, the load will normally be light but then go up sharply 60 days before the file expires. (The doc mentions a crontab, but I can't find specifics.) That could easily turn into a DDoS. Second, the URL from NIST is unreliable[1] and the IETF clone is out of date. It's not obvious that NIST is expecting to support non-US clients or that either NIST or IETF is prepared to support high volumes of automated fetches. The clean solution is for us to provide the server(s), or at least the DNS so we can provide the servers tomorrow. That commits us to long term support, but since we have control of everything we can fix it if something goes wrong. Does anybody know how many downloads/hour a cloud server can support?
I'm interested in this simple case, just downloading a small file, no fancy database processing. Are there special web server packages designed for this case? How many clients are we expecting to run this code? Another approach might be to get the time zone people to distribute the leap second file too. That seems to get updated often enough. --------- 1] The current URL is ftp://time.nist.gov/pub/leap-seconds.list DNS for time.nist.gov is setup for time, not ftp. It rotates through all their public NTP servers and many of them don't support ftp. Matt: The current code has an option to restart ntpd. The current ntpd will check for a new leap file on SIGHUP but that will kill ntp-classic. Please see if you can find a simple way to spread the load. We can reduce the load on the servers by a factor of 30 if you can spread that out over a month. -- These are my opinions. I hate spam. From dan-ntp at drown.org Mon Aug 15 00:15:32 2016 From: dan-ntp at drown.org (Dan Drown) Date: Sun, 14 Aug 2016 19:15:32 -0500 Subject: Possible abuse from fetching the leap second file Message-ID: <20160814191532.Horde.4LVSOxSUn-r6y6wqTb9LOCq@mail.drown.org> Quoting Hal Murray : > Matt Selsky is working on Pythonizing the script that grabs a new leap second > file. The idea is to run a cron job that keeps it up to date. That opens an > interesting can of worms. > > As a general rule, you shouldn't use a resource on a system that you don't > own without permission from the owner. Informed consent might be a better > term. A system open to occasional downloads by a human might not be willing > to support automated fetches from many many systems. > > This case is doubly nasty in two ways. > > First, the load will normally be light but then go up sharply 60 days before > the file expires. (The doc mentions a crontab, but I can't find specifics.) > That could easily turn into a DDoS. I agree that it's impolite to automate this. What's ok for 100 servers to do isn't ok for 1 million. 
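The load-spreading asked for above (fetches spread over a month rather than bunched together) could be sketched roughly as follows. This is only an illustration in Python, not code from ntpleapfetch: the name fetch_delay, the 30-day window, and the idea of seeding the delay from a host identifier are all assumptions.

```python
import hashlib
import random

# Assumed spreading window: 30 days, matching the "factor of 30" estimate.
SPREAD_SECONDS = 30 * 24 * 3600


def fetch_delay(host_id=None, window=SPREAD_SECONDS):
    """Return how many seconds to wait before fetching the leap file.

    With a host_id (hypothetical: e.g. the machine's hostname) the delay is
    stable across runs, so each host keeps its own slot in the window.
    Without one, fall back to a fresh uniform random delay.
    """
    if host_id is None:
        return random.randrange(window)
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # Fold the first 8 bytes of the hash into the window.
    return int.from_bytes(digest[:8], "big") % window
```

A cron wrapper could sleep for fetch_delay(hostname) seconds before doing the download. The hashed variant keeps a given host on the same offset every cycle; the random variant needs no stable identifier but moves around from run to run.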
> Second, the URL from NIST is unreliable[1] and the IEFT clone is out of date. > It's not obvious that NIST is expecting to support non US clients or that > either NIST or IEFT is prepared to support high volumes of automated fetches. > > The clean solution is for us to provide the server(s), or at least the DNS so > we can provide the servers tomorrow. That commits us to long term support, > but since we have control of everything we can fix it if something > goes wrong. > > Does anybody know how many downloads/hour a cloud server can suppor? I'm > interested in this simple case, just downloading a small file, no fancy > database processing. Are there special web server packages designed for this > case? There are a few webservers designed for high connection count static file serving - lighttpd, nginx are two examples I'd guess downloads/hour would be mainly limited on the packets per second side of things (especially on a cloud server, which are usually bad at high packets per second rates). Starting with 100k packets per second and 21 packets to complete a http GET for the leapsecond file. This gives a rate of 4,761 requests per second completed (and 409Mbit/s rate). After an hour, that's 17 million requests completed (182.4 Gbyte out, ~$16 in EC2). Looking at it in a different way, let's take a theoretical cloud server that includes 4TB/month transfer. That plan would cover around 372 million requests for the leapsecond file over a month (at an average rate of around 143 requests/second). This is also a thing that would be easy to mirror. You'd want to distribute a gpg external signature with the file (updated every 6 months?), so end users could be confident the leapsecond file wasn't messed with by a mirror. All those numbers were with HTTP overhead, HTTPS overhead reduces these numbers by around 33%. > How many clients are we expecting to run this code? > > Another approach might be to get the time zone people to distribute the leap > second file too. 
That seems to get updated often enough. I'm using chrony's feature to read the leapsecond from the timezone files: https://chrony.tuxfamily.org/manual.html#leapsectz-directive I like this because the leapsecond updates come with regular OS updates. Doesn't look like Ubuntu or Fedora have the Dec 31, 2016 leap second yet, though. [2015] $ TZ=right/UTC date -d 'Jun 30 2015 23:59:60' Tue Jun 30 23:59:60 UTC 2015 [2016] $ TZ=right/UTC date -d 'Dec 31 2016 23:59:60' date: invalid date ‘Dec 31 2016 23:59:60’ > 1] The current URL is ftp://time.nist.gov/pub/leap-seconds.list > DNS for time.nist.gov is setup for time, not ftp. It rotates through all > their public NTP servers and many of them don't support ftp. > > > Matt: The current code has an option to restart ntpd. The current ntpd will > check for a new leap file on SIGHUP but that will kill ntp-classic. > > Please see if you can find a simple way to spread the load. We can reduce > the load on the servers by a factor of 30 if you can spread that out over a > month. From esr at thyrsus.com Mon Aug 15 01:28:01 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Aug 2016 21:28:01 -0400 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> References: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815012801.GA16465@thyrsus.com> Hal Murray : > Matt Selsky is working on Pythonizing the script that grabs a new leap second > file. The idea is to run a cron job that keeps it up to date. That opens an > interesting can of worms. > > As a general rule, you shouldn't use a resource on a system that you don't > own without permission from the owner. Informed consent might be a better > term. A system open to occasional downloads by a human might not be willing > to support automated fetches from many many systems.
While I accept this as a general principle, is there anything about the new ntpleapfetch that inflicts a heavier load than the old ntpleapfetch has been causing for decades with the tolerance of NIST and USNO? If not, then I think we get to mutter "customary usage" and move on. I will also note that the GPSD build process has actually been doing something very like ntpleapfetch (to get the current leap-second so it can be compiled into the build) for about a decade. I didn't see it as a potential problem when I wrote it, and nobody associated with the targeted servers has ever complained to me. -- Eric S. Raymond From hmurray at megapathdsl.net Mon Aug 15 06:02:08 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Sun, 14 Aug 2016 23:02:08 -0700 Subject: Possible abuse from fetching the leap second file In-Reply-To: Message from "Eric S. Raymond" of "Sun, 14 Aug 2016 21:28:01 EDT." <20160815012801.GA16465@thyrsus.com> Message-ID: <20160815060208.3D77A406070@ip-64-139-1-69.sjc.megapath.net> esr at thyrsus.com said: > While I accept this as a general principle, is there anything about the new > ntpleapfetch that inflicts a heavier load than the old ntpleapfetch has been > causing for decades with the tolerance of NIST and USNO? The old stuff has poor publicity. None of the major distros/OSes come setup to run it from a cron job. As long as you don't change that we won't have any problems. The problem will happen if somebody improves our documentation enough so that somebody notices, and that seems reasonably likely. > I will also note that the GPSD build process has actually been doing > something very like ntpleapfetch (to get the current leap-second so it can > be compiled into the build) for about a decade. I didn't see it as a > potential problem when I wrote it, and nobody associated with the targeted > servers has ever complained to me. Two things: One, that only happens when building from source. 
All the systems running gpsd as provided by their distro aren't doing that. Two, it's not synchronized in a way that makes a zillion systems all try to do it 60 days before the file expires. -- These are my opinions. I hate spam. From kurt at roeckx.be Mon Aug 15 09:29:28 2016 From: kurt at roeckx.be (Kurt Roeckx) Date: Mon, 15 Aug 2016 11:29:28 +0200 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> References: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815092928.GA1814@roeckx.be> On Sun, Aug 14, 2016 at 03:49:15PM -0700, Hal Murray wrote: > Matt Selsky is working on Pythonizing the script that grabs a new leap second > file. The idea is to run a cron job that keeps it up to date. That opens an > interesting can of worms. I've been pondering using ntp-classic's script for that in Debian, but decided not to. I also don't know if there is any other Linux distribution that automates this. However, I already get a /usr/share/zoneinfo/leap-seconds.list from the tzdata package that updates the timezone information and I should probably look into using that file to update things. Those files get regular updates. The tz at iana.org list has been notified, the git repository updated, there just hasn't been a release since. There has been a proposal by Poul-Henning Kamp to use a DNS A-record for it, see: http://phk.freebsd.dk/time/20151122.html I don't know if this went anywhere. Kurt From esr at thyrsus.com Mon Aug 15 12:37:14 2016 From: esr at thyrsus.com (Eric S.
Raymond) Date: Mon, 15 Aug 2016 08:37:14 -0400 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815060208.3D77A406070@ip-64-139-1-69.sjc.megapath.net> References: <20160815012801.GA16465@thyrsus.com> <20160815060208.3D77A406070@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815123714.GA11020@thyrsus.com> Hal Murray : > > esr at thyrsus.com said: > > While I accept this as a general principle, is there anything about the new > > ntpleapfetch that inflicts a heavier load than the old ntpleapfetch has been > > causing for decades with the tolerance of NIST and USNO? > > The old stuff has poor publicity. None of the major distros/OSes come setup > to run it from a cron job. As long as you don't change that we won't have > any problems. > > The problem will happen if somebody improves our documentation enough so that > somebody notices, and that seems reasonably likely. I've thought about this some more, and now I am in doubt that the general principle (don't use other peoples' resources without their permission) applies here. I think we need to apply what tort law would call a reasonable-person test. Some kinds of public-facing offer of a resource clearly constitute an implied invitation to download it as needed. Consider, for example, a web page. I think the NIST/IERS public offer of an authoritative leap-second resource constitutes the same sort of invitation. If you disagree, ask yourself if your evaluation would change if that data were in HTML and accessed through port 80, or accessed by anonymous FTP. Surely the mechanics of how it's downloaded are irrelevant to the ethics of the situation! That said, I think we do have a duty in this case, which is to implement some load-spreading so that the process doesn't hit those servers harder than it has to. A random delay on the fetch would be polite. -- Eric S. 
Raymond From kurt at roeckx.be Mon Aug 15 13:11:07 2016 From: kurt at roeckx.be (Kurt Roeckx) Date: Mon, 15 Aug 2016 15:11:07 +0200 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815123714.GA11020@thyrsus.com> References: <20160815012801.GA16465@thyrsus.com> <20160815060208.3D77A406070@ip-64-139-1-69.sjc.megapath.net> <20160815123714.GA11020@thyrsus.com> Message-ID: <20160815131107.vsj5ozlh3dlb5ip4@roeckx.be> On Mon, Aug 15, 2016 at 08:37:14AM -0400, Eric S. Raymond wrote: > Hal Murray : > > > > esr at thyrsus.com said: > > > While I accept this as a general principle, is there anything about the new > > > ntpleapfetch that inflicts a heavier load than the old ntpleapfetch has been > > > causing for decades with the tolerance of NIST and USNO? > > > > The old stuff has poor publicity. None of the major distros/OSes come setup > > to run it from a cron job. As long as you don't change that we won't have > > any problems. > > > > The problem will happen if somebody improves our documentation enough so that > > somebody notices, and that seems reasonably likely. > > I've thought about this some more, and now I am in doubt that the > general principle (don't use other peoples' resources without their > permission) applies here. I think we need to apply what tort law > would call a reasonable-person test. > > Some kinds of public-facing offer of a resource clearly constitute an > implied invitation to download it as needed. Consider, for example, a web page. > > I think the NIST/IERS public offer of an authoritative leap-second resource > constitutes the same sort of invitation. If you disagree, ask yourself > if your evaluation would change if that data were in HTML and accessed > through port 80, or accessed by anonymous FTP. Surely the mechanics > of how it's downloaded are irrelevant to the ethics of the situation! 
That doesn't mean you should go and have millions of clients change from not checking it to downloading it without at least warning them about it. It doesn't help anybody if we overload the servers. Kurt From esr at thyrsus.com Mon Aug 15 14:29:25 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Aug 2016 10:29:25 -0400 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815131107.vsj5ozlh3dlb5ip4@roeckx.be> References: <20160815012801.GA16465@thyrsus.com> <20160815060208.3D77A406070@ip-64-139-1-69.sjc.megapath.net> <20160815123714.GA11020@thyrsus.com> <20160815131107.vsj5ozlh3dlb5ip4@roeckx.be> Message-ID: <20160815142925.GA13318@thyrsus.com> Kurt Roeckx : > That doesn't mean you should go and have millions of clients > change from not checking it to downloading it without at least > warning them about it. It doesn't help anybody if we overload the > servers. Fair enough. Can we identify the responsible persons to communicate with? -- Eric S. Raymond From fallenpegasus at gmail.com Mon Aug 15 14:58:07 2016 From: fallenpegasus at gmail.com (Mark Atwood) Date: Mon, 15 Aug 2016 14:58:07 +0000 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> References: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> Message-ID: Long term, I like DNS for solutions to this kind of problem. But, under what name? Other solutions are putting it in AWS & Cloudfront, and in their equivalents at Azure and at Google Cloud. To take that route, I would want to arrange that Amazon, Microsoft, and Google donate that capacity. Those 3 cloud CDNs could handle that load. But, that will take negotiation time, and programming time we don't have right now. An even faster-to-implement solution would be to put it in github.com.
We could do that today, and it would cost us nothing, and github on their backend smoothly pours very high demand raw pages into the assorted worldwide cloud providers and into the CDNs. Plus it versions the data, and they have well-known TLS certs. Let's do that! Hal, others, do you happen to have copies of all the past leap files, so we can synthesize a git history for it? ..m On Sun, Aug 14, 2016 at 3:49 PM Hal Murray wrote: > Matt Selsky is working on Pythonizing the script that grabs a new leap > second > file. The idea is to run a cron job that keeps it up to date. That opens > an > interesting can of worms. > > As a general rule, you shouldn't use a resource on a system that you don't > own without permission from the owner. Informed consent might be a better > term. A system open to occasional downloads by a human might not be > willing > to support automated fetches from many many systems. > > This case is doubly nasty in two ways. > > First, the load will normally be light but then go up sharply 60 days > before > the file expires. (The doc mentions a crontab, but I can't find > specifics.) > That could easily turn into a DDoS. > > Second, the URL from NIST is unreliable[1] and the IEFT clone is out of > date. > It's not obvious that NIST is expecting to support non US clients or that > either NIST or IEFT is prepared to support high volumes of automated > fetches. > > The clean solution is for us to provide the server(s), or at least the DNS > so > we can provide the servers tomorrow. That commits us to long term support, > but since we have control of everything we can fix it if something goes > wrong. > > Does anybody know how many downloads/hour a cloud server can suppor? I'm > interested in this simple case, just downloading a small file, no fancy > database processing. Are there special web server packages designed for > this > case? > > How many clients are we expecting to run this code?
> > Another approach might be to get the time zone people to distribute the > leap > second file too. That seems to get updated often enough. > > --------- > > 1] The current URL is ftp://time.nist.gov/pub/leap-seconds.list > DNS for time.nist.gov is setup for time, not ftp. It rotates through all > their public NTP servers and many of them don't support ftp. > > > Matt: The current code has an option to restart ntpd. The current ntpd > will > check for a new leap file on SIGHUP but that will kill ntp-classic. > > Please see if you can find a simple way to spread the load. We can reduce > the load on the servers by a factor of 30 if you can spread that out over a > month. > > > -- > These are my opinions. I hate spam. > > > > _______________________________________________ > devel mailing list > devel at ntpsec.org > http://lists.ntpsec.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From esr at thyrsus.com Mon Aug 15 15:14:10 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Aug 2016 11:14:10 -0400 Subject: Possible abuse from fetching the leap second file In-Reply-To: References: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815151410.GA14218@thyrsus.com> Mark Atwood : > Let's do that! Hal, others, do you happen to have copies of all the past > leap files, so we can synthesize a git history for it? We don't need all the past leap files. The data is only ever modified by appending a line. -- Eric S. Raymond From kurt at roeckx.be Mon Aug 15 15:44:36 2016 From: kurt at roeckx.be (Kurt Roeckx) Date: Mon, 15 Aug 2016 17:44:36 +0200 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815151410.GA14218@thyrsus.com> References: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> <20160815151410.GA14218@thyrsus.com> Message-ID: <20160815154436.jvsgmgdzbljkqict@roeckx.be> On Mon, Aug 15, 2016 at 11:14:10AM -0400, Eric S. 
Raymond wrote: > Mark Atwood : > > Let's do that! Hal, others, do you happen to have copies of all the past > > leap files, so we can synthesize a git history for it? > > We don't need all the past leap files. The data is only ever modified by > appending a line. At the least, the timestamp saying until when it's valid also gets updated. It also contains a hash sum. They have actually changed the text over the years. You can at least get a short history at: https://github.com/eggert/tz/commits/master/leap-seconds.list If you want to contact the NIST people, there is actually a contact address in the file. Kurt From fallenpegasus at gmail.com Mon Aug 15 17:17:06 2016 From: fallenpegasus at gmail.com (Mark Atwood) Date: Mon, 15 Aug 2016 17:17:06 +0000 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815151410.GA14218@thyrsus.com> References: <20160814224915.BC72A406070@ip-64-139-1-69.sjc.megapath.net> <20160815151410.GA14218@thyrsus.com> Message-ID: On Mon, Aug 15, 2016 at 8:14 AM Eric S. Raymond wrote: > We don't need all the past leap files. The data is only ever modified by > appending a line. > > I'm a crazy completionist. Git resources need histories. OTOH, it looks like the eggert/tz repo already exists. On the other other hand, it doesn't look like it's got the full history. ..m From hmurray at megapathdsl.net Mon Aug 15 20:26:31 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Mon, 15 Aug 2016 13:26:31 -0700 Subject: Possible abuse from fetching the leap second file In-Reply-To: Message from Kurt Roeckx of "Mon, 15 Aug 2016 11:29:28 +0200."
<20160815092928.GA1814@roeckx.be> Message-ID: <20160815202631.3F337406077@ip-64-139-1-69.sjc.megapath.net> kurt at roeckx.be said: > However, I already get a /usr/share/zoneinfo/leap-seconds.list from the > tzdata package that updates the timezone information and I should probably > look into using that file to update things. Those files get regular updates. > The tz at iana.org list has been notified, the git repository updated, there > just hans't been a release since. Thanks. I didn't know it was getting distributed. That looks like the way to go. You don't need to "update things". Just point ntp.conf at that file. I don't see that file in Fedora yet. Time to see if I can help get it there. -- These are my opinions. I hate spam. From hmurray at megapathdsl.net Mon Aug 15 20:40:31 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Mon, 15 Aug 2016 13:40:31 -0700 Subject: Possible abuse from fetching the leap second file In-Reply-To: Message from "Eric S. Raymond" of "Mon, 15 Aug 2016 08:37:14 EDT." <20160815123714.GA11020@thyrsus.com> Message-ID: <20160815204032.03169406077@ip-64-139-1-69.sjc.megapath.net> esr at thyrsus.com said: > I've thought about this some more, and now I am in doubt that the general > principle (don't use other peoples' resources without their permission) > applies here. I think we need to apply what tort law would call a > reasonable-person test. > Some kinds of public-facing offer of a resource clearly constitute an > implied invitation to download it as needed. Consider, for example, a web > page. I think you are just plain wrong. The invitation to download a web page has some expectations of reasonable use. What's reasonable for an individual trying to answer a question using google and following links is no longer reasonable if it gets automated so that every box running ntp tries to download a file on the same day. I'm not a lawyer etc. Consider a public drinking fountain. 
That's not an invitation to fill up your tanker truck to water your lawn. Besides, as Kurt Roeckx has pointed out, there is a much much better way. Just use the copy that gets distributed by the time zone package. It's already there on Debian and Ubuntu and Raspberian. -- These are my opinions. I hate spam. From Matthew.Selsky at twosigma.com Mon Aug 15 20:50:22 2016 From: Matthew.Selsky at twosigma.com (Matthew Selsky) Date: Mon, 15 Aug 2016 16:50:22 -0400 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815204032.03169406077@ip-64-139-1-69.sjc.megapath.net> References: <20160815123714.GA11020@thyrsus.com> <20160815204032.03169406077@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815205022.GR25969@twosigma.com> On Mon, Aug 15, 2016 at 01:40:31PM -0700, Hal Murray wrote: > Besides, as Kurt Roeckx has pointed out, there is a much much better way. > Just use the copy that gets distributed by the time zone package. It's > already there on Debian and Ubuntu and Raspberian. This file is only on some distributions: Jessie has the file: https://packages.debian.org/jessie/all/tzdata/filelist Wheezy does not have the file: https://packages.debian.org/wheezy/all/tzdata/filelist Cheers, -Matt From kurt at roeckx.be Mon Aug 15 20:59:16 2016 From: kurt at roeckx.be (Kurt Roeckx) Date: Mon, 15 Aug 2016 22:59:16 +0200 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815202631.3F337406077@ip-64-139-1-69.sjc.megapath.net> References: <20160815092928.GA1814@roeckx.be> <20160815202631.3F337406077@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815205915.kgldgucwzpciryvs@roeckx.be> On Mon, Aug 15, 2016 at 01:26:31PM -0700, Hal Murray wrote: > > kurt at roeckx.be said: > > However, I already get a /usr/share/zoneinfo/leap-seconds.list from the > > tzdata package that updates the timezone information and I should probably > > look into using that file to update things. Those files get regular updates. 
> > The tz at iana.org list has been notified, the git repository updated, there > > just hans't been a release since. > > Thanks. I didn't know it was getting distributed. That looks like the way > to go. > > You don't need to "update things". Just point ntp.conf at that file. With "update things" I was just thinking that I would need to add a trigger when tzdata is updated so I can either reload the config file or restart ntpd, depending on what's needed to get it to check the file. In any case, I will never use a cron job in Debian for this, I will use the file provided by tzdata. Kurt From hmurray at megapathdsl.net Mon Aug 15 21:03:23 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Mon, 15 Aug 2016 14:03:23 -0700 Subject: Possible abuse from fetching the leap second file In-Reply-To: Message from Mark Atwood of "Mon, 15 Aug 2016 14:58:07 -0000." Message-ID: <20160815210323.785EF406077@ip-64-139-1-69.sjc.megapath.net> fallenpegasus at gmail.com said: > The long term, I like the DNS for solutions to this kind of problem. But, > under what name? I think piggybacking on the time zone database is a better solution. There are 2 parts to the problem. One is when the next leap second is scheduled. The other is the whole chain of leap seconds. The kernel needs the first, which is what ntpd tells it. The second is needed by time conversion routines. The current DNS implementation is a demo. If you want to use it for real, you will have to set up an infrastructure to run highly reliable servers that are likely to be targets for script kiddies. fallenpegasus at gmail.com said: > Let's do that! Hal, others, do you happen to have copies of all the past > leap files, so we can synthesize a git history for it? I have some, not all. There is a good chance you could get them all from NIST. -- These are my opinions. I hate spam.
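Both parts of the problem described above (the most recent scheduled leap second and the whole chain) can be read out of a leap-seconds.list file, along with its expiry. What follows is a minimal Python sketch, not ntpd's parser. It assumes the NIST-style layout in which the "#@" line carries the expiry and each data line is a timestamp in seconds since 1900-01-01 followed by the TAI-UTC offset. The name parse_leap_file and the SAMPLE text are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Timestamps in the file count seconds from the NTP epoch, 1900-01-01 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_datetime(seconds):
    """Convert seconds since the NTP epoch to an aware UTC datetime."""
    return NTP_EPOCH + timedelta(seconds=seconds)


def parse_leap_file(text):
    """Return (expiry, chain) parsed from leap-seconds.list content.

    expiry is the datetime from the "#@" line, or None if absent.
    chain is a list of (effective datetime, TAI-UTC offset) tuples,
    in file order.
    """
    expires = None
    chain = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#@"):
            expires = ntp_to_datetime(int(line[2:].split()[0]))
        elif line and not line.startswith("#"):
            fields = line.split()
            chain.append((ntp_to_datetime(int(fields[0])), int(fields[1])))
    return expires, chain


# Illustrative excerpt only; a real file has many more lines.
SAMPLE = """\
#@ 3707596800
3644697600 36 # 1 Jul 2015
3692217600 37 # 1 Jan 2017
"""

expiry, chain = parse_leap_file(SAMPLE)
latest_when, latest_offset = chain[-1]
```

Here chain[-1] is the piece ntpd hands to the kernel, while the full chain is what time conversion routines need. A packaging trigger could compare expiry against the current time to decide whether the installed file is stale.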
From hmurray at megapathdsl.net Mon Aug 15 21:09:07 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Mon, 15 Aug 2016 14:09:07 -0700 Subject: Possible abuse from fetching the leap second file In-Reply-To: Message from Kurt Roeckx of "Mon, 15 Aug 2016 22:59:16 +0200." <20160815205915.kgldgucwzpciryvs@roeckx.be> Message-ID: <20160815210908.0687C406077@ip-64-139-1-69.sjc.megapath.net> kurt at roeckx.be said: > You don't need to "update things". Just point ntp.conf at that file. > With update things I was just thinking that I would need to add a trigger > when tzdata is update so I can either reload the config file or restart > ntpd, depending on what's needed to get it to check the file. If you send SIGHUP to ntpd it will reload a new file. You can see that in syslog or the ntpd log file. If you do nothing, ntpd checks once a day so it will pick up the new version within 24 hours. SIGHUP will kill ntp classic so doing nothing seems like the conservative approach. I think it's good enough. -- These are my opinions. I hate spam. From esr at thyrsus.com Mon Aug 15 21:17:48 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Aug 2016 17:17:48 -0400 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815210323.785EF406077@ip-64-139-1-69.sjc.megapath.net> References: <20160815210323.785EF406077@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815211748.GC19860@thyrsus.com> Hal Murray : > > fallenpegasus at gmail.com said: > > The long term, I like the DNS for solutions to this kind of problem. But, > > under what name? > > I think piggybacking on the time zone database is a better solution. I concur. Anything that touches DNS administration tends to get messy. -- Eric S. 
Raymond From hmurray at megapathdsl.net Mon Aug 15 21:20:23 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Mon, 15 Aug 2016 14:20:23 -0700 Subject: Possible abuse from fetching the leap second file In-Reply-To: Message from Matthew Selsky of "Mon, 15 Aug 2016 16:50:22 EDT." <20160815205022.GR25969@twosigma.com> Message-ID: <20160815212023.EAC37406077@ip-64-139-1-69.sjc.megapath.net> Matthew.Selsky at twosigma.com said: > This file is only on some distributions: Jessie has the file: https:// > packages.debian.org/jessie/all/tzdata/filelist Wheezy does not have the > file: https://packages.debian.org/wheezy/all/tzdata/filelist I was assuming that it would be good enough if the file was available on the latest supported release. That's assuming that future releases will continue to support it. The current code works without a leap file. It uses a local refclock or a majority vote of the servers it is using. -- These are my opinions. I hate spam. From gem at rellim.com Mon Aug 15 21:41:26 2016 From: gem at rellim.com (Gary E. Miller) Date: Mon, 15 Aug 2016 14:41:26 -0700 Subject: Possible abuse from fetching the leap second file In-Reply-To: <20160815210908.0687C406077@ip-64-139-1-69.sjc.megapath.net> References: <20160815205915.kgldgucwzpciryvs@roeckx.be> <20160815210908.0687C406077@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160815144126.321370e4@spidey.rellim.com> Yo Hal! On Mon, 15 Aug 2016 14:09:07 -0700 Hal Murray wrote: > kurt at roeckx.be said: > > You don't need to "update things". Just point ntp.conf at that > > file. With update things I was just thinking that I would need to > > add a trigger when tzdata is update so I can either reload the > > config file or restart ntpd, depending on what's needed to get it > > to check the file. > > If you send SIGHUP to ntpd it will reload a new file. You can see > that in syslog or the ntpd log file. I just pulled git. 
I tried -HUP, this is all that happened: 08-15T14:39:18 ntpd[19189]: Saw SIGHUP 08-15T14:39:18 ntpd[19189]: reopen_logfile: same length, ignored What I need ntpd to do on a -HUP is reread ntp.conf. > SIGHUP will kill ntp classic so doing nothing seems like the > conservative approach. I think it's good enough. Yes. RGDS GARY --------------------------------------------------------------------------- Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703 gem at rellim.com Tel:+1 541 382 8588 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From esr at thyrsus.com Thu Aug 18 14:16:24 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 18 Aug 2016 10:16:24 -0400 Subject: stats directory is gone - now, how is logfile pruning done? Message-ID: <20160818141624.GA6606@thyrsus.com> Because ntpviz is now working and documented, I removed a pretty large volume of crufty ancient scripts for postprocessing statfiles from util/ and util/stats this morning. I will be astonished if anyone misses them. I merged the documentation under util/stats into docs/, where it should have been in the first place. Being able to do this was a pretty major victory for the maintainability of the codebase, cutting down the number of languages we have to worry about by 2 - awk and S. Eventually Perl will also be banished, leaving Python and sh as the only scripting languages in place. At which point I will be looking hard at getting rid of sh. This does, however, leave me with a question: How are we doing pruning of statfiles at this point? I seem to recall someone who is not me working on this. The facility, whatever it is, needs to be documented. -- Eric S. 
Raymond From hmurray at megapathdsl.net Fri Aug 19 00:27:14 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Thu, 18 Aug 2016 17:27:14 -0700 Subject: stats directory is gone - now, how is logfile pruning done? In-Reply-To: Message from "Eric S. Raymond" of "Thu, 18 Aug 2016 10:16:24 EDT." <20160818141624.GA6606@thyrsus.com> Message-ID: <20160819002714.DA217406077@ip-64-139-1-69.sjc.megapath.net> esr at thyrsus.com said: > This does, however, leave me with a question: How are we doing pruning of > statfiles at this point? I seem to recall someone who is not me working on > this. The facility, whatever it is, needs to be documented. Whatever you do, make sure it's easy to get the no-prune option. I suspect that's going to be a distro option and whatever you do will just be an example. Most people won't care about how well their clock is working as long at it works well enough so they don't have to pay any attention to it. Debian sets things up so that /etc/cron.daily/ntp compresses things and only keeps the last week. I don't know why they did it that way rather than letting logrotate do it. -- These are my opinions. I hate spam. From esr at thyrsus.com Fri Aug 19 03:01:15 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 18 Aug 2016 23:01:15 -0400 Subject: stats directory is gone - now, how is logfile pruning done? In-Reply-To: <20160819002714.DA217406077@ip-64-139-1-69.sjc.megapath.net> References: <20160818141624.GA6606@thyrsus.com> <20160819002714.DA217406077@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160819030115.GA15248@thyrsus.com> Heads up, Mark! Policy sanity check requested. Hal Murray : > > esr at thyrsus.com said: > > This does, however, leave me with a question: How are we doing pruning of > > statfiles at this point? I seem to recall someone who is not me working on > > this. The facility, whatever it is, needs to be documented. > > Whatever you do, make sure it's easy to get the no-prune option. 
> > I suspect that's going to be a distro option and whatever you do will just be > an example. Most people won't care about how well their clock is working as > long at it works well enough so they don't have to pay any attention to it. > > Debian sets things up so that /etc/cron.daily/ntp > compresses things and only keeps the last week. > I don't know why they did it that way rather than letting logrotate do it. OK, you've told me the important thing: the NTP suite is not historically expected to do statfile pruning itself. That's fine, then. My decision: We won't try taking the job over from the distros - I'm quite happy to let this be somebody else's problem. OTOH. if someone pushes a well-documented and neatly-packaged solution upstream to us (like, say, a logrotate recipe) we'll keep it in etc/ as an option for distro packagers. If Mark has any larger-context reason to overrule this it won't bother me any. But I doubt he will. -- Eric S. Raymond From esr at thyrsus.com Fri Aug 19 14:41:39 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 19 Aug 2016 10:41:39 -0400 (EDT) Subject: Python module installation Message-ID: <20160819144139.5C41813A0C5B@snark.thyrsus.com> This is heads-up to Amar. I need the waf recipe modified so that the contents of pylib is installed as a Python module named 'ntp'. (Right now this is for the packet module. I intend to move the ntpstats.py used by ntpviz there as well. Later there may well be an sntp client module added so we can replace ntpdig.) I've created a wscript file in that directory that is properly called by the top-level wscript. What I don't know is how to write the build production. Presently it's a stub. 
-- From verm at darkbeer.org Fri Aug 19 15:22:43 2016 From: verm at darkbeer.org (Amar Takhar) Date: Fri, 19 Aug 2016 15:22:43 +0000 Subject: Python module installation In-Reply-To: <20160819144139.5C41813A0C5B@snark.thyrsus.com> References: <20160819144139.5C41813A0C5B@snark.thyrsus.com> Message-ID: <20160819152243.GA6925@darkbeer.org> On 2016-08-19 10:41 -0400, Eric S. Raymond wrote: > This is heads-up to Amar. > > I need the waf recipe modified so that the contents of pylib is > installed as a Python module named 'ntp'. > > (Right now this is for the packet module. I intend to move > the ntpstats.py used by ntpviz there as well. Later there may > well be an sntp client module added so we can replace ntpdig.) > > I've created a wscript file in that directory that is properly > called by the top-level wscript. What I don't know is how to write > the build production. Presently it's a stub. OK, I'm busy with my day job right now but I can look at this tonight / tomorrow. I did start something to get this installing properly it won't take long to finish it. Amar. From fallenpegasus at gmail.com Fri Aug 19 16:52:13 2016 From: fallenpegasus at gmail.com (Mark Atwood) Date: Fri, 19 Aug 2016 16:52:13 +0000 Subject: stats directory is gone - now, how is logfile pruning done? In-Reply-To: <20160819030115.GA15248@thyrsus.com> References: <20160818141624.GA6606@thyrsus.com> <20160819002714.DA217406077@ip-64-139-1-69.sjc.megapath.net> <20160819030115.GA15248@thyrsus.com> Message-ID: I have no overrule on this point. Pray continue. And welcome back, Hal. On Thu, Aug 18, 2016 at 8:01 PM Eric S. Raymond wrote: > Heads up, Mark! Policy sanity check requested. > > Hal Murray : > > > > esr at thyrsus.com said: > > > This does, however, leave me with a question: How are we doing pruning > of > > > statfiles at this point? I seem to recall someone who is not me > working on > > > this. The facility, whatever it is, needs to be documented. 
> > > > Whatever you do, make sure it's easy to get the no-prune option. > > > > I suspect that's going to be a distro option and whatever you do will > just be > > an example. Most people won't care about how well their clock is > working as > > long at it works well enough so they don't have to pay any attention to > it. > > > > Debian sets things up so that /etc/cron.daily/ntp > > compresses things and only keeps the last week. > > I don't know why they did it that way rather than letting logrotate do > it. > > OK, you've told me the important thing: the NTP suite is not historically > expected to do statfile pruning itself. > > That's fine, then. My decision: We won't try taking the job over from > the distros - I'm quite happy to let this be somebody else's problem. > > OTOH. if someone pushes a well-documented and neatly-packaged solution > upstream to us (like, say, a logrotate recipe) we'll keep it in etc/ > as an option for distro packagers. > > If Mark has any larger-context reason to overrule this it won't bother > me any. But I doubt he will. > -- > Eric S. Raymond > _______________________________________________ > devel mailing list > devel at ntpsec.org > http://lists.ntpsec.org/mailman/listinfo/devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From esr at thyrsus.com Fri Aug 19 19:03:22 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 19 Aug 2016 15:03:22 -0400 Subject: Python module installation In-Reply-To: <20160819152243.GA6925@darkbeer.org> References: <20160819144139.5C41813A0C5B@snark.thyrsus.com> <20160819152243.GA6925@darkbeer.org> Message-ID: <20160819190322.GA27369@thyrsus.com> Amar Takhar : > OK, I'm busy with my day job right now but I can look at this tonight / > tomorrow. > > I did start something to get this installing properly it won't take long to > finish it. I've solved it already. Got some help from the waf author. -- Eric S. 
Raymond From esr at thyrsus.com Sun Aug 21 20:15:46 2016 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 21 Aug 2016 16:15:46 -0400 Subject: Linux Journal article on NTPsec Message-ID: <20160821201546.GA14610@thyrsus.com> This will be published in Linux Journal, probably in October, possibly as the cover story. They asked me for cover concepts: I suggested either Salvador Dali's "The Persistence of Memory" (the one with the melting clocks) or the famous silent-film image of Harold Lloyd hanging from the hands of a tower clock. Mark has already reviewed it and suggested one very minor change. There's still time for corrections. Daniel, if you have harder numbers for how we've dodged CVEs, that 'graph could be updated. -- Eric S. Raymond -------------- next part -------------- = NTPsec - a secure, hardened NTP implementation = // This file is marked up in asciidoc == Introduction == Network time synchronization - aligning your computer's clock to the same Universal Coordinated Time (UTC) that everyone else is using - is both necessary and a hard problem. Many Internet protocols rely on being able to exchange UTC timestamps accurate to small tolerances, but the clock crystal in your computer drifts (its frequency varies by temperature) so it needs occasional adjustments. That's where life gets complicated. Sure, you can get another computer to tell you what time it thinks it is, but if you don't know how long that packet took to get to you the report isn't very useful. On top of that, its clock might be broken. Or lying. To get anywhere, you need to exchange packets with several computers that allow you to compare your notion of UTC with theirs, estimate network delays, apply statistical cluster analysis to the resulting inputs to get a plausible approximation of real UTC, and then adjust your local clock to it. 
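As a rough sketch of the arithmetic behind one such packet exchange (this is the standard four-timestamp calculation, not NTPsec's actual code; the function names and the median shortcut are invented here for illustration, and a real implementation's filtering and clustering are far more elaborate):

```python
from statistics import median

def offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    # Positive offset means the server's clock is ahead of ours.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    # Round trip minus the time the server spent holding the packet.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

def crude_consensus(samples):
    """Hypothetical stand-in for the statistical step: take the median
    of the per-server offsets so one wild clock cannot drag the result."""
    return median(offset_and_delay(*s)[0] for s in samples)

# Times in milliseconds. Two servers agree we are 10 ms slow;
# a third (broken or lying) claims we are nearly half a second off.
samples = [(0, 15, 16, 11), (0, 14, 15, 9), (0, 500, 501, 10)]
print(offset_and_delay(*samples[0]))  # (10.0, 10)
print(crude_consensus(samples))       # 10.0
```

The offset formula assumes the outbound and return paths take equal time; when they do not, the estimate can be wrong by up to half the measured delay.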
Generally speaking you can get sustained accuracy on the close order of 10 milliseconds this way, though asymmetrical routing delays can make it much worse if you're in a bad neighborhood of the Internet. The protocol for doing this is called "NTP", Network Time Protocol, and the original implementation was written near the dawn of Internet time by an eccentric genius named Dave Mills. Legend has it that Dr. Mills was the person who got a kid named Vint Cerf interested in this ARPANET thing. Whether that's true or not, for decades Mills was *the* go-to guy for computers and high-precision time measurement. Eventually, though, Dave Mills semi-retired, then retired completely. His implementation (which we now call "NTP Classic") was left in the hands of the Network Time Foundation and Harlan Stenn, the man Information Week feted as "Father Time" in 2015 <<FT>>. Unfortunately, on NTF's watch some serious problems accumulated. By that year the codebase was already more than a quarter-century old, and techniques that had been state-of-the-art when it was first built were showing their age. The code had become rigid and difficult to modify, a problem exacerbated by the fact that very few people actually understood the Byzantine time-synchronization algorithms at its core. Among the real-world symptoms of these problems were serious security issues. That same year of 2015, infosec researchers began to realize that NTP Classic installations were being routinely used as DDoS amplifiers - ways for crackers to packet-lash target sites by remote control. NTF, which had complained for years of being underbudgeted and understaffed, seemed unable to fix these bugs. This is intended to be a technical article, so I'm going to pass lightly over the political and fundraising complications that ensued. There was, alas, a certain amount of drama.
When the dust finally settled, a very reluctant fork of the Mills implementation had been performed in early June 2015 and named 'NTPsec', I had been funded on an effectively full-time basis by the Linux Foundation to be NTPsec's architect/tech-lead, and we had both the nucleus of a capable development team and some serious challenges. This much about the drama I will say because it is technically relevant: One of NTF's major problems was that though NTP Classic was nominally under an open-source license, NTF retained pre-open-source habits of mind. Development was closed and secretive, technically and socially isolated by NTF's determination to keep using the BitKeeper version-control system. One of our mandates from the Linux Foundation was to fix this, and one of our first serious challenges was simply moving the code history to git. This is never trivial for a codebase as large and old as NTP Classic, and it's especially problematic when the old version-control system is proprietary with code you can't touch. I ended up having to heavily revise Andrew Tridgell's sourcepuller utility - yes, the same code that triggered Linus Torvalds's famous public break with BitKeeper back in '05 - to do part of the work. The rest was tedious and difficult hand-patching with reposurgeon <<RS>>. A year later in May 2016 - far too late to be helpful - BitKeeper went open-source. == Strategy and challenges == Getting a clean history conversion to git took ten weeks and, gruelling as that was, it was only the beginning. I had a problem: I was expected to harden and secure the NTP code, but came in knowing very little about time service and even less about security engineering. I'd picked up a few clues about the former from my work leading GPSD <<GPSD>>, which is widely used for time service.
About the latter, I had some basics about how to harden code - because when you get right down to it, *that* kind of security engineering is a special case of reliability engineering, which I *do* understand. But I had no experience at "adversarial mindset", the kind of active defense that good infosec people do, nor any instinct for it. A way forward came to me when I remembered a famous quote by C. A. R. Hoare: "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." A slightly different angle on this was the perhaps better-known aphorism by St.-Exupéry that I was to adopt as NTPsec's motto: "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." In the language of modern infosec, Hoare was talking about reducing attack surface, global complexity, and the scope for unintended interactions leading to exploitable holes. This was bracing, because it suggested that maybe I didn't actually need to learn to think like an infosec specialist or a time service expert. If I could refactor, cut, and simplify the NTP Classic codebase enough, maybe all those domain-specific problems would come out in the wash. And if not, then at least taking the pure software-engineering approach I was comfortable with might buy me enough time to learn the domain-specific things I needed to know. I went all-in on this strategy. It drove my argument for one of the very first decisions we made, which was to code to a fully modern API - pure POSIX and C99. This was only partly a move for ensuring portability; mainly I wanted a principled reason (one we could give potential users and allies) for ditching all the cruft in the codebase from the big-iron Unix era. And there was a *lot* of that. 
The code was snarled with portability #ifdefs and shims for a dozen ancient Unix systems: SunOS, AT&T System V, HP-UX, UNICOS, DEC OSF/1, Dynix, AIX, and others more obscure. All relics from the days before API standardization really took hold. The NTP Classic people were too terrified of offending their legacy customers to remove any of this stuff, but I knew something they apparently didn't. Back around 2006 I had done a cruft-removal pass over GPSD, pulling it up to pretty strict POSIX conformance - and nobody from GPSD's highly varied userbase ever said boo about it or told me they missed the ancient portability shims at all. Thus, what I had in my pocket was nine years of subsequent GPSD field experience telling me that the standards people had won their game without most Unix systems programmers actually capturing all the implications of that victory. So I decrufted the NTP code *ruthlessly*. Sometimes I had to fight my own reflexes in order to do it. I too have long been part of the culture that says "Oh, leave in that old portability shim, you never know, there just *might* still be a VAX running ISC/5 out there, and it's not doing any harm." But when your principal concern is reducing complexity and attack surface, that thinking is wrong. No individual piece of obsolete code costs very much, but in a codebase as aged as NTP Classic the cumulative burden on readability and maintainability becomes massive and paralyzing. You have to be hard about this; it all has to go, or exceptions will pile up on you and you'll never achieve the mission objective. I'm emphasizing this point because I think much of what landed NTP Classic in trouble was not want of skill but a continuing failure of what one might call surgical courage - the kind of confidence and determination it takes to make that first incision, knowing that you're likely to have to make a bloody mess on the way to fixing what's actually wrong.
Software systems architects working on legacy infrastructure code need this quality almost as much as surgeons do. The same applies to superannuated features. The NTP Classic codebase was full of dead ends, false starts, failed experiments, drivers for obsolete clock hardware, and other code that might have been a good idea once but had long outlived the assumptions behind it. Mode 7 control messages. Interleave mode. Autokey. An SNMP daemon that was never conformant to the published standard and never finished. Half a dozen other smaller warts. Some of these (Mode 7 handling and Autokey especially) were major attractors for security defects. As with the port shims, these lingered in the NTP Classic codebase not because they couldn't have been removed, but because NTF cherished compatibility back to the year zero and had an allergic reaction to the thought of removing any features at all. Then there were the incidental problems, the largest of which was Classic's build system. It was a huge, crumbling, buggy, poorly-documented pile of autoconf macrology. One of the things that jumped out at me when I studied NTF's part of the code history was that in recent years they seemed to spend as much or more effort fighting defects in their build system as they did modifying code. But there was one amazingly good thing about the NTP Classic code: that despite all these problems it *still worked*. It wheezed and clanked and was rife with incidental security holes, but it did the job it was supposed to do. When all was said and done and all the problems admitted, Dave Mills had been a brilliant systems architect and, even groaning under the weight of decades of unfortunate accretions, NTP Classic still functioned. 
Thus, the big bet on Hoare's advice at the heart of our technical strategy unpacked to two assumptions: (a) that beneath the cruft and barnacles the NTP Classic codebase was fundamentally sound, and (b) that it would be practically possible to clean it up without breaking that soundness. Neither assumption was trivial. This could have been the a priori *right* bet on the odds and still failed because the Dread God Finagle and his mad prophet Murphy micturated in our soup. Or, the code left after we scraped off the barnacles could actually turn out to be unsound, fundamentally flawed. Nevertheless, the success of the team and the project at its declared objectives was riding on these premises. Through 2015 and early 2016 that was a constant worry in the back of my mind. *What if I was wrong?* What if I was like the drunk in that old joke, looking for his keys under the streetlamp when he's dropped them two darkened streets over because "Offisher, this is where I can see". The final verdict is not quite in on that question; as I write, NTPsec is still in beta. But, as we shall see, there are now (in August 2016) solid indications that the project is on the right track. == Stripping down, cleaning up == One of our team's earliest victories after getting the code history moved to git was throwing out the autoconf build recipe and replacing it with one written in a new-school build engine called waf (also used by Samba and RTEMS). Builds became *much* faster and more reliable. Just as importantly, this made the build recipe an order of magnitude smaller so it could be comprehended as a whole and maintained. Another early focus was cleaning up and updating the NTP documentation. We did this before most of the code modifications because the research required to get it done was an excellent way to build knowledge about what was actually going on in the codebase. These moves began a virtuous cycle.
With the build recipe no longer a buggy and opaque mess, the code could be modified more rapidly and with more confidence. Each bit of cruft removal lowered the total complexity of the codebase, making the next one slightly easier. Testing was pretty ad-hoc at first. Around May 2016, for reasons not originally related to NTPsec, I became interested in Raspberry Pis. Then it occurred to me that they would make an excellent way to run long-term stability tests on NTPsec builds. Thus it came to be that the windowsill above my home-office desk is now home to six headless Raspberry Pis, all equipped with on-board GPSes, all running stability and correctness tests on NTPsec 24/7. Just as good as a conventional rack full of servers, but far less bulky and expensive! We got a lot done over our first eighteen months. The headline number that shows just how much was the change in the codebase's total size. We went from 227KLOC to 88KLOC, cutting the total line count by almost a factor of three. Dramatic as that sounds, it actually understates the attack-surface reduction we achieved, because complexity was not evenly distributed in the codebase. The worst technical debt, and the security holes, tended to lurk in the obsolete and semi-obsolete code that hadn't gotten any developer attention in a long time. NTP Classic was not exceptional in this; I've seen the same pattern in other large, old codebases I've worked on. Another important measure was systematically hunting down and replacing all unsafe C function calls with equivalents that can provably not cause buffer overruns. I'll quote from NTPsec's hacking guide: ------------------------------------------------------------------------ * strcpy, strncpy, strcat: Use strlcpy and strlcat instead. * sprintf, vsprintf: use snprintf and vsnprintf instead. * In scanf and friends, the %s format without length limit is banned. * strtok: use strtok_r() or unroll this into the obvious loop. * gets: Use fgets instead. 
* gmtime(), localtime(), asctime(), ctime(): use the reentrant *_r variants. * tmpnam() - use mkstemp() or tmpfile() instead. * dirname() - the Linux version is re-entrant but this property is not portable. ------------------------------------------------------------------------ This formalized an approach I'd used successfully on GPSD - instead of fixing defects and security holes after the fact, constrain your code so that it *cannot have* entire classes of defects. The experienced C programmers out there are thinking "What about wild-pointer and wild-index problems?" And it's true that the achtung verboten above will not prevent those kinds of overruns. That's why another prong of the strategy was systematic use of static code analyzers like Coverity, which actually is pretty good at picking up the defects that cause that sort of thing. Not 100% perfect, C will always allow you to shoot yourself in the foot, but I knew from prior success with GPSD that the combination of careful coding with automatic defect scanning can reduce your bug load a very great deal. To help defect scanners do a better job, we enriched the type information in the code. The largest single change of this kind was changing int variables to C99 bools everywhere they were being used as booleans. Little things also mattered, like fixing all compiler warnings. I thought it was shockingly sloppy that the NTP Classic maintainers hadn't done this. The pattern detectors behind those warnings are there because they often point at real defects. Also, voluminous warnings make it too easy to miss actual errors that break your build. And you never want to break your build, because later on that will make bisection testing more difficult. An early sign that this systematic defect-prevention approach was working was the extremely low rate of bugs we detected by testing as having been introduced during our cleanup.
In the first fourteen months we averaged less than one iatrogenic C bug every ninety days. I would have had a lot of trouble believing that if GPSD hadn't posted a defect frequency nearly as low over the previous five years. A major lesson from both projects is that applying best practices in coding and testing really works. I pushed this point back in 2012 in my essay on GPSD for 'The Architecture of Open Source Applications, Volume 2' <<AOS2>>; what NTPsec shows is that GPSD is not a fluke. I think this is one of the most important takeaways from both projects. We really don't have to settle for what have historically been considered "normal" defect rates in C code. Modern tools and practices can go a very long way towards driving those defect rates towards zero. It's no longer even very difficult to do the right thing; what's too often missing is a grasp of the possibility and the determination to pursue it. And here's the real payoff. Early in 2016, CVEs (security alerts) started issuing against NTP Classic that NTPsec dodged because we had already cut out their attack surface before we knew there was a bug! This actually became a regular thing, with the percentage of dodged bullets increasing over time. Somewhere, Hoare and St.-Exupéry might be smiling. The cleanup isn't done yet. We're testing a major refactoring and simplification of the central protocol machine for processing NTP packets. We believe this has already revealed a significant number of potential security defects nobody ever had a clue about before. Every one of these will be another dodged bullet attributable to getting our practice and strategic direction right. == Features? What features? == I have yet to mention new features because NTPsec doesn't have many; that's not where our energy has been going. But here's one that came directly out of the cleanup work... When NTP was originally written, computer clocks only delivered microsecond precision.
Now they deliver nanosecond precision (though not all of that precision is accurate). By changing some internal representations we have made NTPsec able to use the full precision of modern clocks when stepping them, which can result in a factor of 10 or more of accuracy improvement with real hardware such as GPSDOs and dedicated time radios. Fixing this was about a four-line patch. It might have been noticed sooner if the code hadn't been using an uneasy mixture of microsecond and nanosecond precision for historical reasons. As it is, anything short of the kind of systematic API-usage update we were doing would have been quite unlikely to spot the problem. A longstanding pain point we've begun to address is the nigh-impenetrable syntax of the ntp.conf file. We've already implemented a new syntax for declaring reference clocks that is far easier to understand than the old. We have more work planned towards making composing NTP configurations less of a black art. The diagnostic tools shipped with NTP Classic were messy, undocumented, and archaic. We have a new tool, ntpviz, which gives time-server operators a graphical and much more informative view of what's been going on in the server logfiles. This will assist in understanding and mitigating various sources of inaccuracy. == Where we go from here == We don't think our 1.0 release is far in the future - in fact, given normal publication delays, it might well have shipped by the time you read this. An early-adopter contingent - including at least one high-frequency-trading company for which accurate time is business-critical - is already happily using NTPsec for production. There remains much work to be done after 1.0. We're cooperating closely with IETF to develop a replacement for Autokey public-key authentication that actually works. We want to move as much of the C code as possible outside ntpd itself to Python in order to reduce long-term maintenance load.
There's a possibility that the core daemon itself might be split in two to separate the TCP/IP parts from the handling of local reference clocks, drastically reducing global complexity. Beyond that, we're gaining insight into the core time-synchronization algorithms and suspect there are real possibilities for improvement in those. Better statistical filtering that's sensitive to measurements of network weather and topology looks possible. It's an adventure, and we welcome anyone who'd like to join in. NTP is vital infrastructure, and keeping it healthy over a time-frame of decades will need a large, flourishing community. You can learn more about how to take part at our project website <<NTPSEC>>. == References == [bibliography] [[[FT]]] http://www.informationweek.com/it-life/ntps-fate-hinges-on-father-time/d/d-id/1319432[NTP's Fate Hinges On "Father Time"] [[[RS]]] http://www.catb.org/esr/reposurgeon/[reposurgeon] [[[GPSD]]] http://catb.org/gpsd/[GPSD] [[[AOS2]]] http://www.aosabook.org/en/gpsd.html[GPSD in AOS2] [[[NTPSEC]]] https://www.ntpsec.org/[Welcome to NTPsec] From hmurray at megapathdsl.net Sun Aug 21 23:17:33 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Sun, 21 Aug 2016 16:17:33 -0700 Subject: Big picture Message-ID: <20160821231733.32369406061@ip-64-139-1-69.sjc.megapath.net> Look ahead a year or two. Assume the NTPWG has agreed upon a good crypto layer and that we are happy with our code. What happens next? Somebody is going to need to deploy enough well run NTP servers. My straw man is that it will be roughly parallel to DNS. We will need a diverse collection of stratum 1 servers parallel to the DNS root servers. Then ISPs will have to run servers for their clients similar to the way many/most of them run caching DNS servers. Are any of the people you talk to thinking about that area? -- These are my opinions. I hate spam. From esr at thyrsus.com Mon Aug 22 01:07:51 2016 From: esr at thyrsus.com (Eric S.
Raymond) Date: Sun, 21 Aug 2016 21:07:51 -0400 Subject: Big picture In-Reply-To: <20160821231733.32369406061@ip-64-139-1-69.sjc.megapath.net> References: <20160821231733.32369406061@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160822010751.GC15767@thyrsus.com> Hal Murray : > Look ahead a year or two. Assume the NTPWG has agreed upon a good crypto > layer and that we are happy with our code. What happens next? Er, most likely I hand off to a successor who doesn't need to be a forensic systems architect (I'm grooming Daniel Franke for this) and go looking for another big codebase that badly needs to be fixed. -- Eric S. Raymond From hmurray at megapathdsl.net Mon Aug 22 05:36:06 2016 From: hmurray at megapathdsl.net (Hal Murray) Date: Sun, 21 Aug 2016 22:36:06 -0700 Subject: Wish list - hack to monitor a server Message-ID: <20160822053606.14F16406061@ip-64-139-1-69.sjc.megapath.net> I'm thinking of something that will monitor a NTP server and send mail/SMS/whatever when it finds troubles. I can see two types of trouble. One is when the server isn't responding. The other is that one or more of the upstream servers or refclocks it is using stops responding. There are lots of possible variations/options/parameters. This should probably wait until Eric gets the python library ready for use under ntpq. I mention it now because I thought of it and/or it might provide input to the requirements for that library. -- These are my opinions. I hate spam. From gem at rellim.com Mon Aug 22 18:31:27 2016 From: gem at rellim.com (Gary E. Miller) Date: Mon, 22 Aug 2016 11:31:27 -0700 Subject: Wish list - hack to monitor a server In-Reply-To: <20160822053606.14F16406061@ip-64-139-1-69.sjc.megapath.net> References: <20160822053606.14F16406061@ip-64-139-1-69.sjc.megapath.net> Message-ID: <20160822113127.0ad32300@spidey.rellim.com> Yo Hal! 
On Sun, 21 Aug 2016 22:36:06 -0700 Hal Murray wrote: > I'm thinking of something that will monitor a NTP server and send > mail/SMS/whatever when it finds troubles. I use Icinga2 (nagios fork) for that. NTPsec prolly just needs to document that existing usage. > I can see two types of trouble. One is when the server isn't > responding. The other is that one or more of the upstream servers or > refclocks it is using stops responding. I can see us including a nagios/icinga2 plugin for that, but the existing plugins have worked great for me for over a decade. Here are the existing plugins: /usr/lib64/nagios/plugins/check_ntp_peer /usr/lib64/nagios/plugins/check_ntp_time > There are lots of possible variations/options/parameters. Yeah. Here are the existing options. spidey ntpsec # /usr/lib64/nagios/plugins/check_ntp_peer Usage: check_ntp_peer -H [-4|-6] [-w ] [-c ] [-W ] [-C ] [-j ] [-k ] [-v verbose] spidey ntpsec # /usr/lib64/nagios/plugins/check_ntp_time Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o