Go vs GC

Eric S. Raymond esr at thyrsus.com
Wed Jun 30 02:54:19 UTC 2021


Hal Murray <halmurray at sonic.net>:
> > Well, first, the historical target for accuracy of WAN time service is more
> > than an order of magnitude higher than 1ms.
> 
> Time marches on.  We need to do better today, much better.
> 
> NTP is used on LANs.

Then we'll need to go to watching for GC pauses and skipping samples that might have
been distorted by them.

> > turning GC off
> 
> Is that lightweight or heavyweight?
> 
> How does that interact with threads?

It's a fast operation, if that's what you mean.  The way Go GC works,
there is only one GC-enable flag for the whole process, not one per
thread.  The flag tells the Go runtime whether or not to GC when the
normal memory-usage threshold is reached.

> What happens if there are lots of threads and they are all turning
> it off/on very frequently and probably overlapping?

That flag has to be protected by a mutex, and you have whatever value
happened to be set last regardless of how many threads are running.

If we think contention for that lock is going to be an issue, there's
a pretty standard and simple way of dealing with it using an auxiliary
semaphore.

> I'm assuming the mainline server path won't require any allocations
> or frees.  Total CPU time to process a simple request is under 10
> microseconds.

The main source of memory churn is going to be allocations for
incoming packets, and deallocations when they're no longer referenced
and get GCed. Allocations are fast.  GC is slow, but isn't performed
very often.

> Is there a subset of Go that doesn't use GC?  Or something like that.

Not really.  If you want to not use GC, you turn GC off.  Then everything
works as it normally does, but your memory usage grows without bound
until you re-enable GC, which can trigger an immediate GC sweep.

I analyzed this years ago and discovered two kinds of code span where
unexpected latency spikes could mess things up.

One is right around where the adjtimex call or equivalent is done.
That's a very narrow code section that's going to run in near
constant time and not do any allocations; we can guard it just by
turning GC off at the start of the span and on at the end so that any
other thread that *is* doing allocations cannot induce a latency spike
during the critical section.

The other is during sample collection from local refclocks.  That's a
little trickier because the read from device is a blocking operation
that can and will do memory allocation.  I think what we have to do
in that case is take a timestamp before the read, then after it check
to see if there was a GC between that timestamp and now, and if so discard
the sample.

Outside those places the code is not really stall-sensitive because
all the data flying around has enough timestamping.

With these mitigation measures I think performance can be expected to
be C-like, except that once in a great while a GC stop will be detected
to have occurred during refclock sampling and cause that sample to
get tossed out.

I say "once in a great while" because a program with ntpd's memory
usage pattern is not going to trigger GCs very often. Most of the passes through
critical regions won't collide with a GC latency spike. We can log
these exceptions to check, of course.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

More information about the devel mailing list