Use of pool servers reveals unacceptable crash rate in async DNS

Sat Jun 25 18:04:09 UTC 2016

esr at thyrsus.com said:
> 1. Apply Classic's workaround for the problem, which I don't remember the
> details of but involved some dodgy nonstandard linker hacks done through the
> build system.  *However, I did not trust this method when I understood it.*
> It seemed sure to cause porting difficulties and is inherently fragile. 

kurt at roeckx.be said:
> If it's the one I'm thinking about, I think the solution is to remove the
> locking of memory. 

We may be confusing several bugs.

There was a problem with locking stuff into memory.  Some library needed by 
end-of-thread processing wasn't loaded yet when memory got locked, and things 
worked out such that with the default amount of lockable memory, 32-bit 
systems worked but 64-bit systems didn't have enough room.

I think one solution was to create a dummy thread early on to get that module 
loaded.  Or disable memory locking, or tell it to use more memory, or ...
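
For what it's worth, here is roughly what I mean by the dummy-thread idea.
This is only a sketch from memory, not the code that was actually used.  The
function names (dummy_thread, warm_up_thread_exit, lock_memory_after_warm_up)
are made up for illustration.  The assumption is that running one throwaway
thread through its exit path before calling mlockall() gets whatever library
thread exit needs loaded while there is still room.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: thread body that only records that it ran, then exits. */
static void *dummy_thread(void *arg)
{
    int *ran = arg;

    *ran = 1;
    pthread_exit(NULL);     /* go through the thread-exit path on purpose */
    return NULL;            /* not reached */
}

/*
 * Hypothetical helper: start and join one throwaway thread so that any
 * library needed by end-of-thread processing is loaded before memory is
 * locked.  Returns 0 on success, -1 on failure.
 */
int warm_up_thread_exit(void)
{
    pthread_t tid;
    int ran = 0;

    if (pthread_create(&tid, NULL, dummy_thread, &ran) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return ran ? 0 : -1;
}

/*
 * Hypothetical helper: warm up first, then lock current and future pages.
 * mlockall() can fail (permissions, resource limits); the result is passed
 * back to the caller rather than treated as fatal here.
 */
int lock_memory_after_warm_up(void)
{
    if (warm_up_thread_exit() != 0)
        return -1;
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

The ordering is the whole point: the warm-up has to happen before the
mlockall() call.  The other options (no locking, or a bigger limit) avoid the
problem without needing the extra thread.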

> 2. Fix the actual problem. Well, that'd be nice, but Hal looked into it
> months ago and said he understood it but couldn't generate a fix. IIRC, he
> said it needed a full rewrite.  That tells me the code is probably not
> salvageable. 

I don't remember that part.  I use the pool command on several systems.  I 
haven't seen a crash in ages.

There was another interesting problem in this area.  It was a bug in 
FreeBSD's trap handler.  ntpd managed to trigger it consistently.

.....

> I favor #4.

I favor understanding things more.

Can you get a stack trace?

-- 
These are my opinions.  I hate spam.