Use of pool servers reveals unacceptable crash rate in async DNS

Eric S. Raymond esr at
Mon Aug 29 19:03:40 UTC 2016

Processing old mail...

Hal Murray <hmurray at>:
> > I believe you're right that these platforms don't have it.  The question is,
> > how important is that fact?  Is the performance hit from synchronous DNS
> > really a showstopper?  I don't know the answer. 
> There are two cases I know of where ntpd does a DNS lookup after it gets 
> started.
> One is the retry when DNS for the normal server case doesn't work during 
> initialization.  It will try again occasionally until it gets an answer 
> (which might be negative).
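
A minimal sketch of that retry behaviour, in Python rather than ntpd's C. The function name, the injectable resolver and sleep parameters, and the retry limits are all hypothetical, chosen so the logic can be exercised without a network. It assumes a temporary failure shows up as EAI_AGAIN and treats any other resolver error as a definitive (negative) answer.

```python
import socket
import time


def resolve_with_retry(host, resolver=socket.getaddrinfo, retry_interval=1.0,
                       max_attempts=5, sleep=time.sleep):
    """Return a list of addresses, or [] on a definitive negative answer.

    Raises TimeoutError if every attempt ends in a temporary failure.
    """
    for _ in range(max_attempts):
        try:
            infos = resolver(host, 123, type=socket.SOCK_DGRAM)
            # Each entry is (family, type, proto, canonname, sockaddr).
            return [info[4][0] for info in infos]
        except socket.gaierror as err:
            if err.errno != socket.EAI_AGAIN:
                # Not a temporary failure: treat it as a negative answer.
                return []
            # Temporary failure: wait, then try again.
            sleep(retry_interval)
    raise TimeoutError(f"no DNS answer for {host} after {max_attempts} tries")
```

A real daemon would schedule the retry from its timer instead of sleeping in line; the sleep here only keeps the sketch short.
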
> The main one is the pool code trying for a new server.  I think we should be 
> extending this rather than dropping it.  There are several possibilities in this 
> area.  The main one would be to verify that a server you are using is still 
> in the pool.  (There isn't a way to do that yet - the pool doesn't have any 
> DNS support for that.)  The other would be to try replacing the poorest 
> server rather than only replacing dead servers.
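
The "replace the poorest server" idea could be sketched as below. This is not how ntpd selects peers; the Peer class and its distance field are hypothetical stand-ins for whatever quality measure the daemon would actually use (lower meaning better).

```python
from dataclasses import dataclass


@dataclass
class Peer:
    address: str
    reachable: bool
    distance: float  # hypothetical quality score; lower is better


def choose_victim(peers):
    """Pick the peer to replace: a dead one if any, else the poorest."""
    if not peers:
        return None
    dead = [p for p in peers if not p.reachable]
    if dead:
        # Current behaviour as described above: dead servers go first.
        return dead[0]
    # Proposed extension: otherwise give up the worst-scoring live server.
    return max(peers, key=lambda p: p.distance)
```

For example, with three reachable peers scoring 0.02, 0.30 and 0.05, the one scoring 0.30 is chosen; as soon as any peer is unreachable, that one is chosen instead.
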
> DNS lookups can take a LONG time.  I think I've seen 40 seconds on a failing 
> case.
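
To make the cost of a slow lookup concrete, here is a sketch of the general technique of pushing the lookup onto a worker thread while the main loop keeps ticking. It is an illustration only, not ntpd's async DNS code; the function names, the tick length and the injectable resolver are assumptions.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def lookup(host, resolver=socket.getaddrinfo):
    """Blocking lookup; returns the sorted, de-duplicated addresses."""
    infos = resolver(host, 123)
    return sorted({info[4][0] for info in infos})


def run_main_loop(host, resolver=socket.getaddrinfo, tick=0.01, max_ticks=1000):
    """Poll a background lookup; return (addresses, ticks_serviced)."""
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(lookup, host, resolver)
        while not future.done() and ticks < max_ticks:
            # Stand-in for the real work: refclock polls, packet I/O.
            time.sleep(tick)
            ticks += 1
        # future.result() re-raises any error from the lookup.
        addresses = future.result() if future.done() else None
        # Leaving the "with" block still waits for the worker to finish.
    return addresses, ticks
```

With a synchronous lookup, ticks_serviced would be zero for the whole duration of the lookup, which is the pause the following paragraphs worry about.
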
> If we get the recv time stamp from the OS, I think the DNS delays won't 
> introduce any lies on the normal path.  We could test that by putting a sleep 
> in the main loop.  (There is a filter to reject packets that take too long, 
> but I think that's time-in-flight and excludes time sitting on the server.)
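
A worked example of why the source of the receive timestamp matters, using the standard NTP on-wire formulas for offset and delay. The numbers are invented: clocks in perfect agreement, 10 ms each way, 1 ms spent in the server, and a 0.5 s pause in the client before it reads the reply.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


T1, T2, T3 = 0.000, 0.010, 0.011
ARRIVAL = 0.021   # when the reply actually reached the host
PAUSE = 0.5       # main loop blocked (e.g. in a DNS lookup) before reading

# Receive timestamp taken by the OS at arrival: the pause is invisible.
os_stamped = offset_and_delay(T1, T2, T3, ARRIVAL)

# Receive timestamp taken when the daemon finally reads the packet:
# the offset is wrong by -PAUSE/2 and the delay is inflated by PAUSE.
read_stamped = offset_and_delay(T1, T2, T3, ARRIVAL + PAUSE)
```

Where the platform can supply an arrival timestamp (for example through the SO_TIMESTAMP socket option), the first case applies and the pause tells no lies; otherwise the error is half the pause.
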
> There are two cases I can think of where a pause in ntpd would cause 
> troubles.  One is that it would mess up refclocks.  The other is that packets 
> will get dropped if too many of them arrive.
> I think that means we could use the pool command on a system without 
> refclocks.  That covers end nodes and maybe lightly loaded servers.
> -------
> It's worth checking out the input buffering side of things.  There may be 
> some code there that we don't need.  I think there is a pool of buffers.  
> Where can a buffer sit other than on the free queue?  Why do we need a pool?
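
One possible answer to where else a buffer can sit, as a sketch and not a description of ntpd's actual input code: a buffer that has been filled by the I/O side but not yet handled by the main loop has to wait somewhere. The BufferPool class and its two queues below are hypothetical. The sketch also shows how a stalled main loop turns into dropped packets once the free queue runs dry.

```python
from collections import deque


class BufferPool:
    def __init__(self, count, size):
        # Buffers are allocated once and recycled between the two queues.
        self.free = deque(bytearray(size) for _ in range(count))
        self.full = deque()  # filled buffers waiting for the main loop

    def receive(self, data):
        """I/O side: fill a free buffer; return False (drop) if none left."""
        if not self.free:
            return False
        buf = self.free.popleft()
        data = data[:len(buf)]      # truncate oversized packets
        buf[:len(data)] = data
        self.full.append((buf, len(data)))
        return True

    def process(self):
        """Main-loop side: take the oldest filled buffer, then recycle it."""
        if not self.full:
            return None
        buf, length = self.full.popleft()
        payload = bytes(buf[:length])
        self.free.append(buf)
        return payload
```

With a pool of two buffers, a third packet arriving before process() has run is dropped; after one call to process() there is room again.
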

The project has more important priorities than chasing this down.  But: I have
edited this text, adding a few details I have learned since, into a new
section for the internals tour (devel/tour.txt).  That will give somebody
a better-than-nothing place to start if we ever again try something like
the c-ares replacement.
		Eric S. Raymond
