[Git][NTPsec/ntpsec][master] Added section on async DNS to the tour document.

Mon Aug 29 20:03:22 UTC 2016

Eric S. Raymond pushed to branch master at NTPsec / ntpsec


Commits:
743dd381 by Eric S. Raymond at 2016-08-29T16:02:43-04:00
Added section on async DNS to the tour document.

- - - - -


2 changed files:

- devel/hacking.txt
- devel/tour.txt


Changes:

=====================================
devel/hacking.txt
=====================================

--- a/devel/hacking.txt
+++ b/devel/hacking.txt
@@ -12,6 +12,9 @@ documented here.
 
 == General notes ==
 
+If you want to learn more about the code internals, find tour.txt.
+This document is about development practices and project conventions.
+
 === Build system ===
 
 The build uses waf, replacing a huge ancient autoconf hairball that


=====================================
devel/tour.txt
=====================================
--- a/devel/tour.txt
+++ b/devel/tour.txt
@@ -170,4 +170,50 @@ when a specific event occurs on a file descriptor or after a timeout
 has been reached.  Other NTP programs, notably ntpd and ntpq, could
 use it, but would require serious rewrites to do so.
 
+== Asynchronous DNS lookup ==
+
+There are a great many complications in the code that arise from wanting
+to avoid stalling the main loop while it waits for a DNS lookup to
+return. And DNS lookups can take a *long* time.  Hal Murray notes that
+he thinks he's seen 40 seconds on a failing case.
+
+One reason for the complications is that the async-DNS support seems
+somewhat overengineered.  Whoever built it was thinking in terms of a
+general async-worker facility and implemented things that this use
+of it probably doesn't need - notably an input-buffer pool.
+
+This code is a candidate to be replaced by an async-DNS library such
+as c-ares. One attempt at this was made, but it was abandoned because
+the async-worker interface to the rest of the code is pretty gnarly.
+
+The DNS lookups during initialization - of hostnames specified on the
+command line or in ntp.conf - could be done synchronously.  But there are
+two cases we know of where ntpd has to do a DNS lookup after its
+main loop gets started.
+
+One is the retry when DNS for the normal server case doesn't work during
+initialization.  ntpd will try again occasionally until it gets an answer
+(which might be negative).
+
+The main case is the pool code trying for a new server.  There are
+several possible extensions in this area.  The main one would be to verify that
+a server you are using is still in the pool.  (There isn't a way to do
+that yet - the pool doesn't have any DNS support for that.)  Another
+would be to try replacing the poorest server rather than only
+replacing dead servers.
+
+As long as we get packet receive timestamps from the OS, synchronous
+DNS delays probably won't introduce any lies on the normal path.  We
+could test that by putting a sleep in the main loop.  (There is a
+filter to reject packets that take too long, but Hal thinks that's
+time-in-flight and excludes time sitting on the server.)
+
+There are two known cases where a pause in ntpd would cause trouble.
+One is that it would mess up refclocks.  The other is that packets
+will get dropped if too many of them arrive during the stall.
+
+This probably means we could go synchronous-only and use the pool
+command on a system without refclocks.  That covers end nodes and
+maybe lightly loaded servers.
+
 // end
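
The stall-avoidance idea in the new tour section can be sketched roughly as follows. This is a minimal Python illustration, not ntpd's C implementation: the names `start_lookup`, `poll_lookup`, and `main_loop_demo` are hypothetical, and a single-thread executor stands in for the async-worker facility the text describes. The blocking `getaddrinfo()` call runs in the worker while the main loop keeps ticking and only polls for the result.

```python
import concurrent.futures
import socket
import time

# One worker thread stands in for an async-DNS worker (an assumption
# made for illustration; ntpd has its own worker facility written in C).
_resolver = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def start_lookup(hostname, port=123):
    """Hand a blocking getaddrinfo() call to the worker; return at once."""
    return _resolver.submit(
        socket.getaddrinfo, hostname, port, type=socket.SOCK_DGRAM
    )


def poll_lookup(future):
    """Return None while pending, else a sorted list of addresses.

    A failed (negative) lookup is reported as an empty list.
    """
    if not future.done():
        return None
    try:
        return sorted({info[4][0] for info in future.result()})
    except socket.gaierror:
        return []


def main_loop_demo(hostname):
    """Toy main loop: keeps iterating while the lookup is outstanding."""
    pending = start_lookup(hostname)
    ticks = 0
    while True:
        addrs = poll_lookup(pending)
        if addrs is not None:
            return addrs, ticks
        ticks += 1          # other main-loop work would happen here
        time.sleep(0.01)
```

A call such as `main_loop_demo("pool.ntp.org")` would return the resolved addresses together with the number of loop iterations that ran while the lookup was in flight. In the synchronous alternative the section discusses, that count would always be zero and the loop would simply block for the duration of the lookup.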



View it on GitLab: https://gitlab.com/NTPsec/ntpsec/commit/743dd381b92a6193198ebd54ba483e2bb753f75b