Use of pool servers reveals unacceptable crash rate in async DNS
Eric S. Raymond
esr at thyrsus.com
Sun Jun 26 02:21:13 UTC 2016
Mark: Heads up! Policy issue. Important but not urgent.
Hal Murray <hmurray at megapathdsl.net>:
> esr at thyrsus.com said:
> > I think the hack is to force libgcc_s to be loaded early. I don't know how
> > to do that in waf.
> There are two problems in this area. One is the end-of-thread code not
> getting locked into memory. I think that is what you are running into.
> The other is a tangle of error handling on out-of-memory issues by things
> like pthread_create and DNS lookup. I think the latter end up with a retry
> error code. I think I fixed some/many of them to crash rather than retry on
> the assumption that memory wasn't going to get freed and I didn't know of any
> other reason to retry. But that was a long time ago (maybe pre fork) and I
> don't remember the details.
> I think we should copy the warmup code from ntp classic. It's basically an
> upstream bug. Warmup seems like a reasonable work around.
We could do that. But I'm opposed to the idea. Not because I think the
warmup code is of itself bad, but because adding complexity seems like
the wrong direction to go in general.
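
For concreteness, a minimal sketch of what a thread warmup of this kind might look like. It is not the code from ntp classic. It assumes glibc, where the thread-exit path loads libgcc_s lazily, and the names thread_warmup and warmup_body are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical warmup thread: it does no work and exits at once.
 * On glibc, pthread_exit() is what triggers the lazy load of libgcc_s,
 * so running it once brings the end-of-thread code into the process. */
static void *warmup_body(void *arg)
{
    (void)arg;
    pthread_exit(NULL);
}

/* Start the throwaway thread and wait for it to finish.
 * Returns 0 on success, or the pthread error number on failure. */
int thread_warmup(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, warmup_body, NULL);

    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

The intent would be to call thread_warmup() once at startup, before memory is locked and before any lookup thread is started, so that the exit path never has to load anything later. Whether that ordering is sufficient on every platform is an assumption, not something established here.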
The project motto is "Perfection is achieved, not when there is
nothing more to add, but when there is nothing left to take away." I
didn't pick it out of a hat. I wasn't just quoting it as a tribal
shibboleth. I *meant* it, and I've acted on it to the project's
Given a choice, I will almost always opt for the fix that removes
complexity and code bulk even if it sacrifices a feature I consider
marginal. My being relentless about this is the direct reason we've
dodged so many CVEs; that is real-world feedback telling me to keep up
the simplifying pressure.
In this case, we have two possible complexity-reducing fixes. One is
to drop the memlock feature entirely. The other is to drop the
buggy homebrew asynchronous-DNS lookup from Classic and use libc's.
Before I will willingly sign off on any solution that adds code, someone
needs to explain to me why neither of those approaches will fly.
It could be, for example, that Daniel thinks we need memlocking for
crypto security. (I'm not going to buy "performance", not when modern
systems swap so seldom that many people have stopped bothering with
swap partitions.) But if so, I want to hear him explain that and
establish that the memory-locking code is worth its weight.
It could be that Mark judges there's a really important platform out
there that has POSIX threads but is non-libc, so getaddrinfo_a() is an
unacceptable port blocker that can be solved with the homebrew code.
But if so, I want to hear him explain that and establish that the homebrew
lookup code is worth its weight.
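
As a rough illustration of the libc route, here is a sketch of a lookup through getaddrinfo_a(). It assumes glibc (the call is a GNU extension, and older glibc versions need -lanl at link time). The helper name lookup_async, the port string "123", and the choice to block in gai_suspend() are all assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: queue one lookup with getaddrinfo_a(), then wait
 * for it. Returns 0 and sets *res on success, else an EAI_* code.
 * The caller releases *res with freeaddrinfo(). */
static int lookup_async(const char *host, int flags, struct addrinfo **res)
{
    struct addrinfo hints;
    struct gaicb req;
    struct gaicb *list[1];
    int rc;

    assert(host != NULL && res != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = flags;

    memset(&req, 0, sizeof(req));
    req.ar_name = host;
    req.ar_service = "123";      /* numeric port, assumed for the example */
    req.ar_request = &hints;
    list[0] = &req;

    /* GAI_NOWAIT queues the request and returns immediately. */
    rc = getaddrinfo_a(GAI_NOWAIT, list, 1, NULL);
    if (rc != 0)
        return rc;

    /* A daemon would go on with other work and check gai_error() later;
     * this sketch simply blocks until the request completes. */
    do {
        rc = gai_suspend((const struct gaicb *const *)list, 1, NULL);
    } while (rc == EAI_INTR);

    rc = gai_error(&req);
    if (rc == 0)
        *res = req.ar_result;
    return rc;
}
```

Calling lookup_async("127.0.0.1", AI_NUMERICHOST, &res) exercises the path without touching the network. The resolver threads are created inside libc, so this shows only the calling side and says nothing about how it interacts with memory locking.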
Nothing that increases our defect rate gets to stay in purely on
historical inertia. Show me the use case, please.
Eric S. Raymond <http://www.catb.org/~esr/>