My task list
Eric S. Raymond
esr at thyrsus.com
Thu Jun 30 13:02:52 UTC 2016
Hal Murray <hmurray at megapathdsl.net>:
> > 1. Try replacing our buggy async-DNS code with the c-ares library.
> You keep calling the existing code "buggy". Is that correct, or are you just
> being sloppy since you don't like it (perhaps justifiably) and it has
> triggered bugs/quirks in other parts of the system?
You told me it was buggy yourself, some months ago. Something about the
handling of the ring buffers not being right in a way you couldn't see
a fix for. If that got repaired it's news to me.
I'm aware that this is a separate issue from the mlockall-threads mess.
But it's enough reason for me to distrust the code and want to get rid of it.
Besides, I think farming the async-DNS support out to people who specialize
in maintaining that and have a track record at it is a good idea. I might
want to do it even if I weren't suspicious of ours, just to reduce the
KLOC we have to maintain.
> > 2. If that succeeds, reinstate memlocking long enough to check if the
> > crash bug recurs. If it doesn't, leave memlocking in.
> The old memlock code, or a simplified lock-everything (no parameters) version?
I'll test the old code first, and if that fails certainly try chrony-like
> If any new code uses threads, it's going to have the same problem. I'd vote
> against restoring the old code until you have figured out how to test it.
Well, first, I don't consider that a given. Maybe they have a workaround
that we wouldn't have to maintain. If this is really a general problem,
their odds of having had to deal with it seem pretty high.
Second, I didn't have to figure out how to test it. The bad
combination crashes *very* frequently on the Great Beast. Like, every
three or four minutes if I run it that often.
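For reference, the "lock-everything (no parameters)" variant mentioned above might look roughly like the sketch below. It uses only the POSIX mlockall(2) call; the function name lock_all_memory is hypothetical, and this is neither the old memlock code nor anything taken from chrony.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Lock all current and future pages of this process into RAM.
 * Returns 0 on success, -1 on failure with errno preserved from mlockall().
 * (Hypothetical helper name; not taken from the NTP or chrony sources.)
 */
static int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        int saved = errno;      /* fprintf may clobber errno */
        fprintf(stderr, "mlockall failed: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

A failure here usually means the process lacks the privilege or the RLIMIT_MEMLOCK headroom to lock its pages, so a caller would probably log it and carry on unlocked rather than abort.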
> > 3. Collect the results from my first profiling runs, now about 14 days of
> > data
> > Learn how to graph and interpret them.
> You might do that first since you will probably want to tweak something and
> collect more data.
Yeah, on the other hand c-ares is likely to be a fast fix for a real
problem, while the data reduction is going to be several days of
exploration. (I have new tools to learn before I can even start.)
> Data for a day will tell you most of what you will ever get. If you have
> lots of data, then you have to scan it looking for glitches.
> Consider bumping the clock and watching it recover. (util/bumpclock) There
> are two interesting cases. One is a big bump so it will "step" the clock to
> recover. The other is a small bump so it will slew (slowly) to recover. The
> split is 128 ms. So I'd try 200 ms and 100 ms.
This seems like good advice of exactly the kind I expect from you. Will do.
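To make the two test cases concrete, here is a minimal sketch of the step/slew split as described above. The 128 ms figure comes from the thread; the names and the treatment of an offset of exactly 128 ms are assumptions, and the real daemon's decision involves more state than a single comparison.

```c
/* The "split" quoted in the thread, in milliseconds. */
#define STEP_THRESHOLD_MS 128.0

enum recovery { RECOVER_SLEW, RECOVER_STEP };

/*
 * Classify how the clock would be expected to recover from a bump.
 * Offsets larger in magnitude than the threshold are stepped; anything
 * else is slewed. (Boundary handling here is an assumption.)
 */
static enum recovery expected_recovery(double offset_ms)
{
    double magnitude = offset_ms < 0 ? -offset_ms : offset_ms;

    return magnitude > STEP_THRESHOLD_MS ? RECOVER_STEP : RECOVER_SLEW;
}
```

With this, the suggested 200 ms bump classifies as RECOVER_STEP and the 100 ms bump as RECOVER_SLEW, which is the pair of cases worth exercising.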
> > 5. Do the cleanup required to get the code compiling under -std=c99.
> What does that involve?
Getting rid of some GNUisms, notably the u_long/u_int/u_short typedefs
that NTP uses a lot. That's a simple change that touches a zillion files
in a zillion places - huge but dumb.
It's either that or finding a better way to conditionalize the definitions. Right
now we have this:
#if _XOPEN_SOURCE >= 600
/*
 * Supply GCCisms that stop being visible if we tell it we need the
 * prototype for strptime(3).
 */
typedef unsigned long u_long;
typedef unsigned short u_short;
typedef unsigned int u_int;
#endif
Ideally I'd just write something like
#if (_XOPEN_SOURCE >= 600) || defined (STD_C_99)
but the right predefine doesn't seem to exist. I'm still looking.
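One way to hunt for a usable condition is a small probe that reports what the compiler actually predefines, run once with default flags and once with -std=c99. The sketch below is only that: the function names are hypothetical, and whether __STRICT_ANSI__ (which GCC documents as set for -ansi and strict -std= modes) is the right thing to test is an untested assumption.

```c
#include <stdio.h>

/* Return the value of __STDC_VERSION__, or 0 if the compiler does not set it. */
static long stdc_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Return 1 if __STRICT_ANSI__ is defined, else 0. */
static int strict_iso(void)
{
#ifdef __STRICT_ANSI__
    return 1;
#else
    return 0;
#endif
}

/* Print the macros relevant to the typedef conditional. */
static void report_predefines(void)
{
    printf("__STDC_VERSION__ = %ldL\n", stdc_version());
    printf("__STRICT_ANSI__  = %s\n", strict_iso() ? "defined" : "not defined");
#ifdef _XOPEN_SOURCE
    /* "+ 0" keeps this valid even if the macro is defined with no value. */
    printf("_XOPEN_SOURCE    = %ld\n", (long)(_XOPEN_SOURCE + 0));
#else
    printf("_XOPEN_SOURCE    = not defined\n");
#endif
}
```

Note that __STDC_VERSION__ alone cannot distinguish -std=c99 from -std=gnu99, since both report the same version, so it is unlikely to be sufficient by itself.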
> TESTFRAME is missing. How about we both clear our schedules and desks and
> give it another try? How about next Wed?
That might be good timing. I have been contemplating another whack at it;
the magic-address elimination patches were partly a way of getting close
to that code again.
Maybe you don't know about those - I'm not sure I discussed them here and
I sometimes forget you don't watch our other channels. I have *entirely*
confined the 127.127.t.u assumption about clock addresses to the config
parser. Doing this required some changes to ntp_io.c and ntp_proto.c that
are near the hairball.
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>