My task list

Hal Murray hmurray at
Thu Jun 30 07:29:55 UTC 2016

> 1. Try replacing our buggy async-DNS code with the c-ares library.

You keep calling the existing code "buggy".  Is that correct, or are you just 
being sloppy because you don't like it (perhaps justifiably) and it has 
triggered bugs/quirks in other parts of the system?

As far as I can tell, our code is innocent.  The recent troubles are some 
combination of libc/memlockall and pthreads not working well together.  We 
just happened to trigger it reliably enough to cause trouble but not 
reliably enough to make testing simple.

> 2. If that succeeds, reinstate memlocking long enough to check if the
>    crash bug recurs.  If it doesn't, leave memlocking in.

The old memlock code, or a simplified lock-everything (no parameters) version?

If any new code uses threads, it's going to have the same problem.  I'd vote 
against restoring the old code until you have figured out how to test it.

> 3. Collect the results from my first profiling runs, now about 14 days of
>    data.  Learn how to graph and interpret them.

You might do that first since you will probably want to tweak something and 
collect more data.

Data for a day will tell you most of what you will ever get.  If you have 
lots of data, then you have to scan it looking for glitches.
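
Scanning a long run for glitches can be scripted.  Here is a rough sketch, 
not anything from our tree: it assumes you have already pulled the offsets 
(in seconds) out of your logs into a list, and it flags samples that sit 
far from the median.  The function name and the threshold are made up for 
illustration.

```python
import statistics


def find_glitches(offsets_s, threshold_s):
    """Return the indices of samples that lie more than threshold_s
    away from the median of the whole series."""
    median = statistics.median(offsets_s)
    return [i for i, value in enumerate(offsets_s)
            if abs(value - median) > threshold_s]


if __name__ == "__main__":
    # Hypothetical offsets in seconds; the fourth sample is the glitch.
    samples = [0.001, 0.002, 0.001, 0.150, 0.002]
    print(find_glitches(samples, threshold_s=0.05))
```

The median is used rather than the mean so that a single large glitch does 
not drag the baseline toward itself.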

Consider bumping the clock and watching it recover.  (util/bumpclock)  There 
are two interesting cases.  One is a big bump so it will "step" the clock to 
recover.  The other is a small bump so it will slew (slowly) to recover.  The 
split is 128 ms.  So I'd try 200 ms and 100 ms.
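
To make the two cases concrete, here is a simplified sketch of the split.  
It only compares the size of the offset against the 128 ms threshold; the 
real daemon has more state than this, and the function name is made up 
for illustration.

```python
STEP_THRESHOLD_S = 0.128  # the 128 ms split mentioned above


def recovery_mode(offset_s, threshold_s=STEP_THRESHOLD_S):
    """Return "step" if the offset magnitude exceeds the threshold,
    otherwise "slew".  Simplified: ignores any timers or filtering."""
    return "step" if abs(offset_s) > threshold_s else "slew"


if __name__ == "__main__":
    # The two suggested test bumps, in milliseconds.
    for bump_ms in (200, 100):
        print(bump_ms, "ms ->", recovery_mode(bump_ms / 1000.0))
```

A 200 ms bump lands on the step side and a 100 ms bump on the slew side, 
which is why those two values cover both cases.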

> 5. Do the cleanup required to get the code compiling under -std=c99. 

What does that involve?

TESTFRAME is missing.  How about we both clear our schedules and desks and 
give it another try?  How about next Wed?

These are my opinions.  I hate spam.
