Using Go for NTPsec
Eric S. Raymond
esr at thyrsus.com
Tue Jul 6 16:44:05 UTC 2021
Hal Murray <halmurray at sonic.net>:
> You have a new toy. The only tool needed is a simple lock.
Oh? What about the concurrent DNS thread we already have?
At this point I have two years of heavy experience in Go, so the toy
is no longer new. If a better upgrade from C existed, I would know
about it - Rust comes closest but, alas, doesn't make the cut yet,
though it might with five more years of maturity.
> There are N server threads. They are each in a loop that does a recvfrom(),
> fills in the reply, then a sendto().
>
> Filling in the reply needs to access some data that is in globals in the
> current code. That needs a simple lock. If the packet is authenticated,
> accessing the key needs another lock.
You're thinking in C. This is a problem, because mutex-and-mailbox
architectures don't scale up well. Human brains get overwhelmed at
about 4 mutexes; beyond that the proliferation of corner cases gets to
be more than the meat can handle.
In obedience to "never write code that is as clever as you can
imagine, because you have to be more clever than that to debug it",
we'd really want to hold NTPsec down to 3 mutexes or fewer.
You may think you can keep a fully concurrentized NTPsec down to 3
mutexes, but I doubt it. We're actually already at 4 in the design
you're talking about, counting the DNS and logging mutexes.
This is the universe's way of telling us we're going to need more
tractable concurrency primitives going forward. Languages in which
"more tractable" has actually been implemented in a production-quality
toolchain that can meet our porting requirements are few and far
between.
In fact, as far as I can tell, that set is a singleton.
> >> You can't put it on an output queue. Handing it to the kernel is part of
> >> the time critical region.
> > I meant output *channel*. With a goroutine spinning on the end of it.
>
> Either I wasn't clear or you were really focused on using your new toy.
>
> The case we are trying to protect from the GC is the transmit side. The
> critical chunk of code starts with putting a time stamp into the packet,
> optionally adds the authentication, then sends it. At that point, the packet
> is gone. There is nothing to put on a queue or into a channel.
Yes, we've apparently been talking past each other. Let's try to fix that.
In Go (or any other language that supports Communicating Sequential
Processes and lightweight threads, which includes Rust), it's bad
style to use mutexes. When you have a critical region, good style is
to make a service thread that owns that region, spinning forever
reading requests from a channel and shipping results to another
channel. Channels are thread-safe queues; they provide serialization,
replacing the role mutexes play in C.
Each Go program that uses concurrency is a flock of channel-service
threads flying in formation, tossing state around via channels. It is
much easier to describe and enforce invariants for this kind of design
than for a mutex-and-mailbox architecture, because each service thread
is a serial state machine. The major remaining challenge is avoiding
deadlocks.
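To make the pattern concrete, here's a minimal Go sketch. The counter
service is an invented stand-in for whatever state a real NTPsec
service thread would own; the point is that the channel does all the
serialization:

    package main

    import "fmt"

    // request asks the service thread to update the state it owns and
    // carries a reply channel for the result.
    type request struct {
            delta int
            reply chan int
    }

    // counterService owns the counter outright. No other goroutine ever
    // touches it; the request channel serializes all access, so there
    // is no mutex anywhere.
    func counterService(requests <-chan request) {
            counter := 0
            for req := range requests {
                    counter += req.delta
                    req.reply <- counter
            }
    }

    func main() {
            requests := make(chan request)
            go counterService(requests)

            reply := make(chan int)
            requests <- request{delta: 1, reply: reply}
            fmt.Println("counter is now", <-reply)
    }

Because nothing but counterService ever sees the counter, any invariant
on it only has to be checked against one serial state machine.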
What you're telling me (and I recognize it is correct) is that one
of our service threads has to have the sequence:
1. Wait on the input channel for a packet to send.
2. Turn off GC.
3. Timestamp the packet.
4. Ship the packet.
5. Turn on GC.
6. Loop back to 1.
Look, ma, no mutex!
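Here's roughly what that service thread could look like in Go. This is
a sketch, not NTPsec code: the wire layout (a raw nanosecond stamp at
offset 0) is a placeholder, authentication and error handling are
omitted, and debug.SetGCPercent(-1) is the closest thing Go's runtime
exposes to a GC-off switch:

    package transmit

    import (
            "encoding/binary"
            "net"
            "runtime/debug"
            "time"
    )

    // SendLoop is the transmit-side service thread sketched above.
    // conn is assumed to be a connected UDP socket.
    func SendLoop(conn *net.UDPConn, outgoing <-chan []byte) {
            for pkt := range outgoing { // 1. wait on the input channel
                    prev := debug.SetGCPercent(-1) // 2. turn off GC

                    // 3. timestamp the packet as late as possible
                    // (placeholder layout: raw nanoseconds at offset 0)
                    binary.BigEndian.PutUint64(pkt[:8], uint64(time.Now().UnixNano()))

                    conn.Write(pkt) // 4. ship it; the packet is gone

                    debug.SetGCPercent(prev) // 5. turn GC back on
            } // 6. loop back to 1
    }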
> > Do you know what our actual bottlenecks are? Have you profiled them?
>
> Have I used Eric's favorite profiling tool? Probably not.
>
> But I think I have a pretty good idea of where the CPU cycles are going. Who
> do you think wrote all the timing hacks in the attic?
OK, one of the most useful things you could do, then, is write a whitepaper
describing where the bottlenecks are and why you think so. That knowledge
needs to get out of your head to where other people can use it.
> I have a multi-threaded echo server and enough other hacks to drive it and
> collect data. It has a knob to inject spin between the recvfrom and sendto.
> The idea is to simulate the NTP computations and/or crypto. Do you want
> graphs?
Yes, but it's not just me that needs them. Anyone working on this code
needs to know where the bottlenecks are.
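For comparison purposes it might also be worth a Go version of that
echo server. Here's a toy sketch - the flag names and buffer size are
invented, but the recvfrom/spin/sendto shape matches what you describe:

    package main

    import (
            "flag"
            "log"
            "net"
            "time"
    )

    // A toy UDP echo server with a spin knob: busy-wait between the
    // read and the reply to simulate NTP computation and/or crypto.
    func main() {
            spin := flag.Duration("spin", 0, "busy-wait this long before replying")
            listen := flag.String("addr", ":12345", "UDP address to listen on")
            flag.Parse()

            addr, err := net.ResolveUDPAddr("udp", *listen)
            if err != nil {
                    log.Fatal(err)
            }
            conn, err := net.ListenUDP("udp", addr)
            if err != nil {
                    log.Fatal(err)
            }

            buf := make([]byte, 2048)
            for {
                    n, peer, err := conn.ReadFromUDP(buf) // recvfrom
                    if err != nil {
                            log.Fatal(err)
                    }
                    for start := time.Now(); time.Since(start) < *spin; {
                            // burn CPU to simulate packet-processing cost
                    }
                    conn.WriteToUDP(buf[:n], peer) // sendto
            }
    }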
--
Eric S. Raymond <http://www.catb.org/~esr/>