argparse vs getopt

Sat Jun 10 13:45:17 UTC 2017

Ian Bruene via devel <devel at ntpsec.org>:
> First: I am not considering performance here *whatsoever*, even if there
> were a meaningful difference, which I doubt, option parsing happens once
> during program startup, and ntpq doesn't need high speed anyway.

And you should not be.  See Knuth's dictum: Premature (performance)
optimization is the root of all evil.

Less poetically: *until you have measured*, paying complexity to get
anticipated performance gains is almost always a bad deal - increased
bug load due to complexity tends to swamp those gains. Human intuition
about performance tuning in programs is notoriously poor, and does not
improve very much with experience.  Therefore: get it working, and if
it's slow use profiling to identify the bottlenecks *before* you tune.
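
As a minimal sketch of "measure before you tune", the standard-library
cProfile and pstats modules can wrap a suspect call and report where the
time actually goes. The function parse_options below is a hypothetical
stand-in, not code from ntpq, and profile_call is just an illustrative
helper name.

```python
import cProfile
import io
import pstats


def parse_options(argv):
    # Hypothetical stand-in for whatever startup work is suspected of being slow.
    return [arg for arg in argv if arg.startswith("-")]


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(parse_options, ["-d", "-n", "host"])
    print(report)
```

Only entries that show up near the top of such a report are worth
spending complexity on.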

The above is one of the fundamental right practices of software
engineering.  You may already know this, but it bears reinforcing
because a lot of programmers talk like they know it and then fail
to follow through.  Don't be that guy.

> Advantages of getopt
>     getopt is simpler, it only needs argv + some definitions fed into a
> single function and you get parsed options out the other end.
>     The definitions are themselves simpler; just a string of the short
> arguments, and a list of strings for the long arguments.
>     Because of that, the "setup" such as it is, is very compact.
>     getopt results are simpler, just a list of (option, value) pairs that
> are easy to loop through.
>     Since getopt does not handle usage notes the usage string will be
> defined in one place, and formatted exactly as the author intends.
>     Generally speaking getopt handles basic option parsing and no more. This
> lack of options means there is no need to tell it which bells, whistles, and
> gongs should or should not be used.
> 
> Advantages of argparse
>     All of the relevant information about an option is defined in the same
> place, this includes both short and long forms, casting, repetition, and
> usage string. This is a clear win for self-documentation despite being more
> verbose.
>     argparse can automatically create the usage note from the information
> given to it in the definitions.
>     The previous features result in a Single Point of Truth.
>     While argparse's setup is far longer it is visually simpler than the big
> bag of bytes that getopt wants.
>     argparse can cast and sanity check inputs where relevant.
>     argparse can provide default values where relevant.
>     argparse returns its results in a form that is more complicated, but
> does not require a loop to find and assign from.
>     argparse supports options with optional values. The whole reason this
> thing started.

There is one thing you have failed to check: whether argparse is
portable to the oldest Python we support.

The only claim in the above I find questionable is "[argparse] is
visually simpler than the big bag of bytes that getopt wants."  Maybe;
but that's misleading if the "visual simplicity" impairs readability,
and I find argparse has a problem here.  Getopt code may be a bigger
bag of bytes, but it has the Python virtue of explicitness - you
can see all the way down, there's little or nothing in the way of
hidden magic and odd side effects.
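
To illustrate the explicitness point, here is a sketch of the getopt
style for a small, made-up option set (the option names are assumptions
for illustration, not ntpq's real ones). Every effect an option has is
spelled out in the loop.

```python
import getopt


def parse_with_getopt(argv):
    """Parse a small, hypothetical option set the explicit getopt way."""
    settings = {"debug": 0, "numeric": False, "logfile": None}
    # Short options: "d" and "n" are switches, "l:" takes a value.
    opts, args = getopt.getopt(argv, "dnl:", ["debug", "numeric", "logfile="])
    for opt, val in opts:
        if opt in ("-d", "--debug"):
            settings["debug"] += 1          # repeated -d raises the level
        elif opt in ("-n", "--numeric"):
            settings["numeric"] = True
        elif opt in ("-l", "--logfile"):
            settings["logfile"] = val
    return settings, args


if __name__ == "__main__":
    print(parse_with_getopt(["-d", "-d", "-n", "--logfile=x.log", "host"]))
```

Nothing happens to settings that is not visible in the body of the loop,
which is the property being praised here.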

The rest of your observations about both getopt and argparse are
certainly on target.

> The general tradeoff between getopt vs argparse would appear to be a matter
> of option complexity: programs with few, simple options should use getopt.
> Programs which have many options, or options with complex argument or
> exclusivity requirements should use argparse as they will have lower overall
> complexity, and will also be easier to read.
> 
> TL;DR: if the program is xkcd 1168 compliant use argparse.

I think you are reasoning about the problem in a basically correct way
qualitatively speaking. As noted above, I question the "easier to
read".  So while the form of your utility function is appropriate, you
may have mis-evaluated one of the inputs.

> As it currently stands I believe that ntpq is small enough to be below the
> complexity crossover point. Of the 14 currently existing options 6 are
> switches, 2 are knobs that take ints, 2 are for the auth system one taking
> an int the other a filename, 2 are commands, and 2 are debugs. No fancy
> parsing or exclusion is required, and the total number is low enough that
> there is little need to pull hair out when tinkering with the system. Under
> those circumstances I do not believe the transition cost is worth it, unless
> consistency across the python toolchain is a goal.

Yes, "complexity crossover point" is a common feature/problem in
situations like this - that was well spotted.

You might want to think about this, though: the problem with argparse
is that it has hidden state and magic action at a distance.  You
shouldn't stop at the correct observation that the compactness of an
argparse spec makes it look simpler; you should ask whether the gains
in compactness are outdone by the probable increase in downstream
error rates due to unexpected behavior that you can't spot by looking
at the surface code.
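
One concrete instance of behavior that the surface code does not show:
by default argparse accepts any unambiguous prefix of a long option.
The sketch below uses made-up option names; nothing in the definitions
hints that "--log" or "--deb" will be accepted.

```python
import argparse


def build_parser():
    # Hypothetical options, for illustration only.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--logfile")
    parser.add_argument("-d", "--debug", action="count", default=0)
    return parser


def parse_abbreviated():
    # Neither "--log" nor "--deb" is defined above, yet both are accepted
    # because each is an unambiguous prefix of a defined long option.
    ns = build_parser().parse_args(["--log", "out.txt", "--deb"])
    return ns.logfile, ns.debug


if __name__ == "__main__":
    print(parse_abbreviated())
```

Python 3.5 and later let you turn this off by passing allow_abbrev=False
to ArgumentParser, but you have to know the behavior exists to go
looking for the switch.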

This may sound like an argument against encapsulation in general, but
it's not.  Magic boxes like argparse work well when they have strong,
easily retained invariants that make their behavior easy for our poor
distractible meat brains to reason about.  The trouble with argparse -
and the reason I posed this as a teaching example for you - is that
it's in a gray zone.  Its invariants are not quite strong and simple
enough to make it obviously the right thing, and it's not so baroque
and overcomplex that it's obviously the wrong thing.

> However.
> 
> The reason this subject is being discussed in the first place is because a
> currently existing snafu with the debug options would be best solved by
> adding separate options to control logging to a file. With getopt this
> requires 2 additional options because it doesn't support optional arguments.
> There is also the possibility of adding an option in the future to display
> the raw packets that are sent and received.
> 
> Given /these/ circumstances I believe the conversion to argparse is
> justifiable, with ntpq just coming over the threshold of benefiting from the
> SPOT and self-documenting aspects of argparse more than it loses from the
> increased setup costs. I am weighting the self-doc relatively high in this
> case because ntpq is part of NTPsec.
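
For reference, an option with an optional value can be sketched in
argparse with nargs="?" plus a const fallback. The option name and the
default filename "demo.log" are assumptions made up for this example,
not the actual ntpq design.

```python
import argparse


def build_log_parser():
    parser = argparse.ArgumentParser(prog="demo")
    # Absent: logfile is None.  Bare "-l": logfile is the const value.
    # "-l NAME" or "--logfile=NAME": logfile is NAME.
    parser.add_argument("-l", "--logfile", nargs="?",
                        const="demo.log", default=None)
    return parser


if __name__ == "__main__":
    parser = build_log_parser()
    for argv in ([], ["-l"], ["--logfile=x.log"]):
        print(argv, parser.parse_args(argv).logfile)
```

One caveat with this form: a bare "-l" followed by a positional argument
will take that argument as its value, so the option has to come last or
be written with "=".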

I might have called it the other way, but I am not so sure of the right
thing that I'm going to override you.

Do check the portability back to earliest supported Python, though. If
there's a problem there it's a showstopper.  I think Gary may have 
hit this before.
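
A rough sketch of how that check could be done: argparse entered the
standard library in Python 2.7 and 3.2, so the question reduces to
whether the oldest supported interpreter is at least that new, or
whether the module can be imported there anyway. The helper names below
are illustrative only.

```python
def argparse_in_stdlib(version_info):
    """True if a Python with this (major, minor) ships argparse in its stdlib."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    return (major, minor) >= (3, 2)


def have_argparse():
    """True if argparse can be imported in the running interpreter."""
    try:
        import argparse  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    import sys
    print(argparse_in_stdlib(sys.version_info), have_argparse())
```

Running have_argparse() under the oldest interpreter actually in use is
the more trustworthy test, since a backport may or may not be installed
there.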

Overall, well done.  I said I was looking for quality of fact gathering
and reasoning, and you supplied that.  Once you've done the portability
check I'll call it successful task completion.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Please consider contributing to my Patreon page at https://www.patreon.com/esr
so I can keep the invisible wheels of the Internet turning. Give generously -
the civilization you save might be your own.