Using Go for NTPsec

Eric S. Raymond esr at thyrsus.com
Tue Jul 6 19:12:41 UTC 2021


Hal Murray <halmurray at sonic.net>:
> 
> [timing of GC]
> 
> > which shows their measured STW pauses are bounded to about 95% by 600us and
> > typically less than 400us. This is consistent with other reports I've seen,
> > and that's why I took 600us as a worst case STW we're likely to see. 
> 
> I didn't see any description about what was actually going on.  How many 
> threads?  How big are the stacks?  How big is the heap?  What are the right 
> parameters?

I don't know all those numbers yet.  But: given that NTPsec currently
has only 2 threads and our allocations are typically occurring one
second apart or less per upstream or downstream, I can't even
plausibly *imagine* a Raft implementation having lower memory churn
than we do.

> > There is one other that is more important: Rust does not have a stable API,
> > certainly not one that we can count on to be solid on decadal scales. Nor
> > does it have the kind of development culture that is conducive to API
> > stability over decades.  It's a very young language, still in "move fast and
> > break things" mode. 
> 
> You may be right.  I haven't written/maintained enough code to confirm or deny 
> that.
> 
> The language and libraries/crates are both versioned.  It looks like a good 
> story to me.

It is, if we have a dev team that is able and willing to keep up with
Rust's change velocity.  The problem with that, though, is it would
tend to crowd out concentration on *our* issues.  Too much time
fighting instability and breaking changes in the toolchain means less
time actually spent on improving time sync and security and device
support.

The grief thousands of projects went through during the Python 2 to
Python 3 transition is an example of how draining and distracting
such a change can be. I have a major goal of not making choices that
buy that kind of trouble for our project.

Good versioning practice isn't enough to salvage the situation,
because it doesn't prevent breaking changes that could be huge
headaches.  Rust's past record doesn't conduce to optimism
on this score.  Admittedly they're getting better, but not, as far
as I can tell, better enough yet.

*Eventually* they'll get to where we need them to be, I think.  But I
don't foresee it happening within NTPsec's present planning horizon.

Rust's development culture would have to change a lot first - in
game-theory terms they'd need to shift from explore to exploit, and
start valuing API stability a lot more.

Two years ago, some core Rust people told me they weren't ready to
support decadal-scale stability yet.  This hasn't changed since.
They're not ready for it to change.  It's not even clear that they
should *want* to be ready yet; the Rust way of carving up the world
is so new and so radical that there is, I think, much more exploration
still to be done.

By contrast, the Go people went for a far more conservative design
approach and consciously made the decision to bake in a hard
forward-compatibility promise when they shipped 1.0.

Until I see an analogous change happen in the Rust world, I'm going to
consider Rust disqualified for our use.

> > Maybe. But I know from previous experience that trying to make major changes
> > to a program's architecture *while you're porting it to a new language* is an
> > invitation to disaster.
> 
> > The only strategy that works is to do a stupid, literal, unidiomatic port
> > first, verify it, then clean it up and make it idiomatic. 
> 
> That would make sense if the structure of the starting code was reasonably 
> close to the final goal.  Does it hold if you are going to make major changes 
> to the architecture?

Yes, it does. My most recent serious learning experience along this
line was moving reposurgeon from Python to Go.

I knew I was going to need to change reposurgeon's architecture
significantly at some point.  The objective of the port was to
dramatically improve performance on large repositories, but ... the
collapse of Dennard scaling was a serious blocker.  I simply couldn't
count on getting better performance by changing the hardware underneath.
That meant the expensive parts (especially the Subversion stream
reader) needed to go concurrent.

I had 14 KLOC of algorithmically dense Python to move (much less code
volume than NTPsec).  Of course it was tempting to try to combine the
rewrite with the port, but I had previous experience telling me that
combination causes fatal complexity explosions. So I didn't try it.

I did the stupidest, most literal port I could imagine.  Didn't use a
single channel or goroutine (oh, except for the one read channel that
replaced catching signals - that was irreducible).  Once I had the
entire codebase in Go and it was passing the test suite, *then* I
changed it to exploit concurrency.

The positive takeaway is that this worked. I got the performance
gains I was looking for, and used them to make lifting the history
of GCC practical.

Even so, I had enough trouble with the second phase to convince me
that the port would have foundered if I had *combined* it with trying
to rethink the architecture.  I'm good, but I'm not *that* good - I'd
be surprised if anybody is.

This is why I have the same intention for NTPsec. "First make it work.
Then make it right.  Then make it fast."

> [Split out stuff so we can write the time-critical parts in C or Rust.]
> 
> > I want to stay away from mixing languages if at all possible.  The joints
> > between them are always *serious* defect attractors and major sources of
> > maintenance complexity. 
> 
> I envision the APIs being text strings over stdin/stdout.  I think they will 
> be simple enough so that the joints won't be a serious problem.  Put it on 
> your list of options in case you decide you have to do something about GC.

Noted.  That does reduce my reluctance somewhat.

> > Picking up new languages *is* one of my strong points, yet I found Rust
> > rebarbative in the extreme. This did nothing to make me optimistic about
> > finding developers to work in it. 
> 
> I think that is the root of our "discussion".  Your version of good/clean 
> focuses on the language/environment.  You are willing to (try to) dance around 
> run time quirks.  Mine focuses on the runtime.  I'm willing to struggle with 
> the language/environment.  Or at least struggle some more until I learn 
> something critical.

No.  You think I didn't struggle with Go when I was doing the reposurgeon port?

Before that I had written exactly one Go program, loccount. 2.1KLOC -
near trivial.  Reposurgeon is extremely algorithmically dense -
porting it was *hard*, took me a year of work.

I'm willing enough to struggle with the language/environment.  Given
that you describe it as "not one of your strong points", I'm probably
more willing than you are.  That would be unsurprising, as in my past
experience I have been more willing to handle that kind of novelty
than almost anybody around me.

No, I'm pushing Rust away - and determined to exit from C - because of
reasons in the larger context. We need to get to a memory-safe language,
we need decadal stability, and we need one with a reasonably low
barrier to entry for new devs.

Rust fails two of those tests.  Go passes all of them.  If that means
we need to do some acrobatics to deal with GC-induced latency spikes,
that's a cost I'm willing to incur.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



