Technical strategy and performance
Eric S. Raymond
esr at thyrsus.com
Thu Jun 30 11:53:20 UTC 2016
Hal Murray <hmurray at megapathdsl.net>:
> Removing cruft is good. Removing features is not. There is a trade off
> between the cruftiness of the code and the importance of any features it
Of course. Exercising that judgment is part of what LF is paying me to
do; it's part of what "technical lead" means. Part of Mark's job is
to check my judgment from a point of view that is more clued in to the
corporate and political context of our funding. Part of *your* job,
and the reason I described you as "voice of wisdom" on the website, is
to check my judgment from a position of greater domain expertise and
experience than I have.
I think the three of us are a very well-balanced steering committee and a
good bet to get the tradeoffs right when we put our heads together. That
is, one software architect and one domain expert and one business-case
expert seems pretty close to optimal, especially since none of us
are clueless in each other's wheelhouses.
> I seem to be the only one who occasionally pushes back when you hint at
> removing stuff.
You're the only person who makes a regular practice of it. (Joel popped
up once to object when I proposed ditching the Windows code.) That's fine,
it's part of what I expect from you.
> I can't tell if I'm making the right amount of noise or not
> enough or too much.
Well, in principle, I can't tell that either.
On the one hand, if you *never* objected to a removal, I'd figure I
wasn't being bold enough in my code reductions. On the other, if you
were constantly saying "bad idea!" and urging me to slow down, I'd
figure it was time to cool my jets a little. Occasional pushback,
like explaining why I shouldn't drop the hpgps driver or griping about
memlock, seems to me like a healthy situation with both of us doing
exactly what we should.
In general, I experience your feedback as valuable and constructive,
and put a great deal of trust in your judgment. Please don't lose
sight of that when I argue with you - a certain amount of wrangling is
part of the process.
Of course we could both be modeling the problem space wrong. But
there are many reasons I like having Mark running things, and a major
one is that I think he is very competent to make final decisions if
you and I are deadlocked or thrashing. (Something I don't consider to
have really happened yet.)
> Most of the cruft you remove looks like progress to me,
> but I can't tell if/when you are going too far. It's a judgment call.
> Sometimes I don't care much. Sometimes I do.
In principle I can't tell with 100% certainty either. But I can't let
that paralyze me. The pressures are real, the decisions have to be
made, I'm on the spot to make them (or at least do the technical
triage, expecting you and Mark to second-guess me sometimes), and I do
so knowing that the odds are I will sometimes err. That's OK. Errors
can be fixed and I do trust the two of you to help me course-correct.
In the absence of objective certainty about what choices will lead to
what real-world outcomes, all we can do is watch the reality feedback
and pay attention to the quality and health of our decision-making
process. I think the feedback says we're doing well on all counts.
The only case that troubles me, looking back, is ntpdate. I still
think I most likely did the right thing to replace that awful code
with an emulation, but if anyone asked where I am most likely to
have messed up, I would put that at the top of the list.
> One of the complications for this case is that we don't have a good way to
> test things. This feels like the sort of problem that might come back as a
> hard to debug example way off in a far away datacenter where it would be even
> harder to debug. I don't like that sort of problem so I'm probably willing
> to put up with a bit of cruft in the code in order to reduce the risk.
That is a completely reasonable concern and a rational way to address
it. The fact that I push hard *against* cruft in the code (and that
this remains the right thing for me to be doing from my center in
software engineering) does not mean I fail to recognize the merit in
I guess a point I should emphasize again is that my deletion of the
memlock code doesn't necessarily mean I have a final position on how
we resolve this issue. It would have, back when reversion was a
more costly and uncertain operation, but because git is git I don't
have the incentive to dig in my heels about it that I would have had
in former times.
> You haven't convinced me that modern hardware will make this problem go away.
> Yes, it will reduce it, but that also makes it harder to test. Your comment
> about no swap space was timely. I lost a cron job a few days ago because it
> ran out of memory. I don't know enough about modern data center operations.
> On VM systems, they charge for memory. ...
Again, fair points. But one benefit of admins often running without
swap space is that OOM conditions get really obvious. Thus, I have more
confidence than you that we'll get prompt feedback if we make the
> Did you consider simplifying things rather than removing everything? (Sorry
> for not suggesting this sooner.) Most of the cruft was in figuring out how
> much to lock. Would locking everything be simple enough?
Yes, I have considered this. Matt Selsky pointed me at a code model in chrony.
This is a good time to put that in the record. From #ntpsec:
01:33:45 selsky | esr: chrony does the mlockall() stuff for clock stability.
| it was added in 2009 https://git.tuxfamily.org/chrony/chrony.git/commit/?id=35e662d810290b43e98e436f8128eddc72b5123d#l200
| they set the limits to unlimited instead of making the user
| pick a value, as Classic does
05:39:12 esr | selsky: Interesting. Does chrony do async DNS? and if so,
| did they run into the same conflict?
05:41:43 esr | Hm. "You don't need it unless you really have a requirement
| for extreme clock stability."
05:42:09 esr | I wonder what their predicate for "extreme" is?
09:25:59 selsky | esr: yes, chrony supports async DNS via its own code.
| see https://git.tuxfamily.org/chrony/chrony.git/tree/nameserv_async.c
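For readers who want to see the shape of the "lock everything" approach
described in the log above, here is a minimal sketch. It is not copied from
chrony or from Classic; the function name lock_all_memory() is hypothetical,
and it assumes a Linux or BSD system where RLIMIT_MEMLOCK is available. It
raises the locked-memory limit to unlimited instead of asking the user for a
value, then locks all current and future pages.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (illustration only, not chrony or NTPsec code).
 * Raises the locked-memory limit to unlimited, then locks every page
 * the process has mapped now or maps later.
 * Returns 0 on success, -1 on failure (usually missing privileges).
 */
int lock_all_memory(void)
{
    struct rlimit rlim;

    rlim.rlim_cur = RLIM_INFINITY;
    rlim.rlim_max = RLIM_INFINITY;

    /* Assumption: RLIMIT_MEMLOCK exists on this platform. */
    if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) {
        perror("setrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }

    /* MCL_CURRENT covers existing mappings, MCL_FUTURE covers new ones. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
        perror("mlockall");
        return -1;
    }

    return 0;
}
```

A daemon would presumably call this once at startup, while it still has the
privileges to raise the limit, and could reasonably treat a failure as a
warning rather than a fatal error.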
> I thought there was a command line switch to use the real-time scheduler but
> I can't find it. If it's there, it might be cruft to clean up. If it's not
> there, it might be a good feature. There would be complications with lots of
> traffic locking up the CPU.
There is no such switch. I had already thought of swiping the implementation
My current plan, assuming all goes well at each step, is
1. Drop in c-ares
2. Restore the Classic memlocking code
3. Take another look at the chrony patch to see if we can and should adopt
the chrony behavior and their setpriority option.
You didn't hear about 3 before because my knowledge of the chrony option
is very recent.
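As a rough illustration of what a real-time scheduler switch might do, here
is a sketch using the POSIX sched_setscheduler() call with SCHED_FIFO. This
is an assumption about how such a feature could be built, not a description
of the chrony patch or of any existing NTPsec code; the names
clamp_fifo_priority() and enter_realtime() are hypothetical.

```c
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: force a requested priority into the range the
 * system accepts for SCHED_FIFO.
 */
int clamp_fifo_priority(int requested)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}

/*
 * Hypothetical helper: move the calling process to the SCHED_FIFO
 * real-time class. Returns 0 on success, -1 on failure (usually
 * missing privileges).
 */
int enter_realtime(int requested)
{
    struct sched_param sp;

    sp.sched_priority = clamp_fifo_priority(requested);

    /* A pid of 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

The concern quoted above about heavy traffic locking up the CPU applies
directly here: a SCHED_FIFO process that never blocks can starve ordinary
processes, so any such switch would likely want to be opt-in.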
> There is another interesting consideration when using old hardware. They
> take a lot of power. At some point, it's cheaper to buy new gear that
> doesn't use as much power and has more memory while you are at it. I
> computed the pay back time once, and it seemed like a good excuse to get some
> new toys. The next time I did the calculations, I got an answer I didn't
> like as much.
Indeed. You might find this amusing:
Last night I bought another $79 Jetway brick from Phil that he had sitting
around unused. I'm going to replace my wife Cathy's tower PC with
it, again with the objective of a relatively prompt payback from cutting
the power bill. (Reducing fan noise and the longer expected service life
from a no-moving-parts system also play into the calculation.)
But at any significantly higher price point it would have made more sense
to wait. As it is I'm wincing just a little at what the SSD is going to
cost. I certainly can't afford to match the terabyte of spinning rust
in her tower; fortunately she's only using 80GB of it.
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>