How not to design a wire protocol
Eric S. Raymond
esr at thyrsus.com
Thu Mar 7 06:56:55 UTC 2019
Hal Murray <hmurray at megapathdsl.net>:
> Interesting, but...
>
> Why isn't refclock_gpsd a good example?
The protocol isn't bad. The implementation is kludgy and shoddy enough
to make me want to raze it to the ground and start over. Maybe I
ought to do exactly that...
> Is there a good package for working with JSON?
Not certain how inclusive a sense of "working" you mean, but it tends
to be dead easy to encode/decode JSON in any language with a
first-class map/dict and late binding. Python, in particular; it's
almost absurdly easy there and there's excellent standard-library
support.
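As a minimal sketch of how little code that takes, here is a round trip
through Python's standard-library json module. The field names are
borrowed from the TPV example later in this message; the variable names
are only illustrative.

```python
import json

# A GPSD-style report held as an ordinary dict (illustrative values).
report = {"class": "TPV", "lat": 46.498204497, "lon": 7.568061439,
          "accurate": True}

# Encode to compact JSON text, then decode back to a dict.
wire = json.dumps(report, separators=(",", ":"))
decoded = json.loads(wire)

print(wire)
print(decoded == report)
```

The dict maps directly onto a JSON object, so no schema or generated
code is needed on either side.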
Go, alas, is less good than most in that class for
type-safety-enforcement reasons - I keep hearing people complain the Go
support is awful but I think it's merely clumsy compared to Python.
Giving up late binding has costs and no question this is a place they
chafe a bit more than usual.
JSON in C is normally *very* awkward, vastly worse than Go, because a
full JSON parse needs to do fancy dancing with dynamic memory to
handle heterogeneous arrays. The good news is that I ran into this
problem back in 2009 on GPSD, solved it then and spun it out as a
little library, microjson.
In order to get away with using only fixed-extent storage and having
near-constant runtime (no system calls, no context switches),
microjson only implements a subset of JSON. The main restriction is
that arrays have to be type-homogeneous. Also you can't pass the JSON
null value. But that subset was rich enough for GPSD and is *plenty*
rich enough to support NTP, which has a less demanding type ontology.
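The two restrictions named above can be sketched as a checker over
already-decoded JSON. This is not microjson's code or API, only an
illustration in Python; the names kind() and in_subset() are
hypothetical, and treating integers and reals as one "number" kind is an
assumption made here.

```python
import json

def kind(value):
    """Classify a decoded JSON value; bool is checked before int on purpose."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return type(value).__name__

def in_subset(value):
    """True if value contains no null and every array is type-homogeneous."""
    if value is None:
        return False
    if isinstance(value, dict):
        return all(in_subset(v) for v in value.values())
    if isinstance(value, list):
        kinds = {kind(v) for v in value}
        return len(kinds) <= 1 and all(in_subset(v) for v in value)
    return True

print(in_subset(json.loads('{"a":[1,2,3],"b":"x"}')))
print(in_subset(json.loads('{"a":[1,"two"]}')))
print(in_subset(json.loads('{"a":null}')))
```

Knowing each array's element type up front is what lets a parser of this
subset fill fixed-size storage instead of allocating per element.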
So yes, we have good tools in any of our plausible implementation
languages. If we didn't, I'd write them and they'd Just Plain Work.
Center of my wheelhouse sort of thing.
> I'm not convinced that NTP is a good example. Sure, in hindsight, we can see
> some problems, but it's not obvious to me that JSON is the answer. Are there
> any interesting alternatives?
I've actually been thinking about that fairly hard. And not really,
not among the established metaprotocols. XML would be absurdly heavy.
YAML would be pointless; we don't need any of the parts of YAML that
aren't JSON.
We can only do better than JSON, relative to the existing
alternatives, by writing our own tag-value-pair serialization format
that somehow improves on it.
To anyone with a sense of the design space around this class of
protocols it would seem that any such attempt is doomed. That is,
that anything functionally JSON-like is necessarily going to look so much
like JSON that making it *not* syntactically coincident with JSON
would be pointless wankery. The glyphs in your "improved"
serialization might change but the grammar is pretty strongly
constrained by what you're trying to do.
And I think this is almost true! It's true enough that JSON basically
stomped every competing idea about how to ASCII-serialize datagrams
flat between 2006 when it came out and about 2010.
But note that I said "almost". I have been kicking around the idea
that we might do *less* - throw away the parts of JSON we don't need.
Say we change just one premise: tag values cannot have whitespace or
non-alphamerics in them. Then we throw away the delimiters we don't
need any more. The JSON
{"class":"TPV","time":"2010-04-30T11:48:20.10Z","ept":0.005,
"lat":46.498204497,"lon":7.568061439,"alt":1327.689,
"epx":15.319,"epy":17.054,"epv":124.484,"track":10.3797,
"speed":0.091,"climb":-0.085,"eps":34.11,"label":"Ketchikam",
"accurate":true}
becomes "ISON":
{class:"TPV",time:2010-04-30T11:48:20.10Z,ept:0.005,
lat:46.498204497,lon:7.568061439,alt:1327.689,
epx:15.319,epy:17.054,epv:124.484,track:10.3797,
speed:0.091,climb:-0.085,eps:34.11,label:"Ketchikam",
accurate:t}
Maybe we even throw out the commas:
{class:"TPV" time:2010-04-30T11:48:20.10Z ept:0.005
lat:46.498204497 lon:7.568061439 alt:1327.689
epx:15.319 epy:17.054 epv:124.484 track:10.3797
speed:0.091 climb:-0.085 eps:34.11 label:"Ketchikam"
accurate:t}
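To give a feel for how small a recognizer for this could be, here is a
rough Python sketch. ISON has no specification, so the grammar below is
an assumption read off the examples: one flat object, unquoted names,
values that are either double-quoted strings without escapes or bare
tokens, t/f for booleans, and optional commas. parse_ison() and convert()
are hypothetical names.

```python
import re

# One name:value pair. The value is a quoted string or a bare token;
# a trailing comma is tolerated so both ISON variants parse.
PAIR = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*):(?:"([^"]*)"|([^\s,}"]+))\s*,?')

def convert(token):
    """Map a bare token to bool, int, or float, else keep it as a string."""
    if token == "t":
        return True
    if token == "f":
        return False
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token  # e.g. an ISO 8601 timestamp

def parse_ison(text):
    """Parse one flat {name:value ...} object into a dict."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected {...}")
    body = text[1:-1]
    result, pos = {}, 0
    while pos < len(body) and body[pos:].strip():
        m = PAIR.match(body, pos)
        if m is None:
            raise ValueError(f"bad pair at offset {pos}")
        name, quoted, bare = m.groups()
        result[name] = quoted if quoted is not None else convert(bare)
        pos = m.end()
    return result

sample = '{class:"TPV" time:2010-04-30T11:48:20.10Z climb:-0.085 accurate:t}'
print(parse_ison(sample))
```

Python's int() and float() accept a few spellings (such as "inf") that a
real wire format would probably reject, so a production recognizer would
want a stricter number rule.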
I'm still not sure this "ISON" is a good tradeoff. The packet gets a little
shorter and lighter, the parsing a touch faster; that's good. We lose JSON
compatibility; that's bad.
But only a *little* bad, as in tolerably little for what we get. There
are deployments (web microservices for example) where deviating from
JSON compatibility would be nucking futs. This isn't one of them.
Outside of monitoring tools like wireshark that are in the business of
being adaptable to odd formats, nothing speaks NTP packets other than
NTP itself.
Well, you *did* ask...
> The complexities of changing NTP's packet format are partly due to it's
> success. If it wasn't so widely deployed there would be less pressure for
> backward compatibility.
>
> Being able to change the wire packet format doesn't solve the problem. You
> still have to write the code to do the right thing with the new format.
I've scoped the job. It's not hard. One marshaller, one unmarshaller;
the existing parsed-packet structure at one end, JSON or ISON at the
other. If we use microjson it's almost trivial. If we do ISON I take a
copy of microjson.c and carve away everything that's not ISON; it
probably drops back to about 300 LOC.
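The shape of that job can be sketched in Python with JSON at the wire
end. ParsedPacket below is a hypothetical stand-in for the existing
parsed-packet structure, carrying only a handful of NTP header fields;
marshal() and unmarshal() are illustrative names, and silently ignoring
unknown tags is an assumption about how extensions might be tolerated.

```python
import json
from dataclasses import dataclass, asdict, fields

@dataclass
class ParsedPacket:
    """Hypothetical stand-in for the in-memory parsed packet."""
    leap: int
    version: int
    mode: int
    stratum: int
    poll: int
    precision: int

def marshal(pkt):
    """Parsed packet -> compact JSON text."""
    return json.dumps(asdict(pkt), separators=(",", ":"))

def unmarshal(text):
    """JSON text -> parsed packet; unknown tags are dropped (assumption)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    names = {f.name for f in fields(ParsedPacket)}
    return ParsedPacket(**{k: v for k, v in data.items() if k in names})

pkt = ParsedPacket(leap=0, version=4, mode=3, stratum=2, poll=6, precision=-20)
wire = marshal(pkt)
print(wire)
print(unmarshal(wire) == pkt)
```

A missing required tag surfaces as a TypeError from the constructor,
which a real unmarshaller would turn into a protocol error.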
> One of the areas on my list of hard unsolved computer science problems is how
> to upgrade a popular protocol. It's usually easy to add new features. It's
> hard to get rid of the old clients. There is a chicken/egg in there. If it's
> hard to get rid of them, then support lingers on. If enough support is
> available there is not enough incentive for people running old software to
> upgrade.
Ah. But I've pulled off this exact thing once before, upgrading the old bad
pre-JSON GPSD protocol. I know how to do it!
> Consider SMTP. No binary packing there, but plenty of complexity and quirks.
Yes. SMTP is an excellent demonstration, up there with NMEA 0183, that you can
get one aspect of good protocol design right and still botch others enough
to produce a mess with poor discoverability. Sad but true.
> How much of this problem is changing skill sets?
I'm not sure of the referent of "this" in your sentence.
What I do know is that for someone with my skills and background, none
of what I've been talking about doing is actually difficult. Yeah,
you have to do it carefully and write your unit tests, but there isn't
any real depth here. These are solvable problems; I've solved some
of them before.
(Note: there might have been depth if I hadn't had the basic insight
about how to constrain JSON so the recognizer can run really tight and
light back ten years ago, and had to invent and debug it now on the
fly. But I did, so we're fine.)
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.