Dump bison

Tue Jun 19 15:15:44 UTC 2018

Hal Murray <hmurray at megapathdsl.net>:
> 
> Eric said:
> >  The problem with the new-parser-generator theory is that as much fun as I'd
> > have doing it, the net effect on stability and maintainability would probably
> > be negative.  There's that how-do-you-know-you-specced-the-same-grammar
> > problem again.
> 
> Actually, I think the stability would improve.  We would probably build a test 
> harness that the current setup lacks.

That claim puzzles me.

What I think one needs for a test harness is the ability to dump the parse tree
structure in a textual form.  Changing parser generators won't give us that.
What really matters is whether our target language has introspection capability
and dynamic dump code.

E.g. in Python one can say print(x) and the print function will dynamically
do the right thing according to the type of x, including going through
user-defined __str__() methods on classes if required.

Go has this too (pretty clear influence from Python in the design) but C
does not.  As long as we still generate C we're going to have to jump
through very elaborate and bug-prone hoops to partially simulate this.
I really do not think changing parser generators will help.
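
To make the Python point concrete, here is a minimal sketch of the kind of
textual dump a test harness could compare against expected output.  The node
classes (ServerDecl, Option) and the dump() helper are hypothetical names
invented for illustration; they are not taken from our parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Option:
    # Hypothetical leaf node: a bare option keyword on a config line.
    name: str


@dataclass
class ServerDecl:
    # Hypothetical node for one "server" line and its trailing options.
    address: str
    options: List[Option] = field(default_factory=list)

    def __str__(self) -> str:
        # User-defined rendering; print() and str() pick this up automatically.
        opts = " ".join(o.name for o in self.options)
        return f"server {self.address} {opts}".rstrip()


def dump(tree: List[ServerDecl]) -> str:
    """Render a parse tree as text that a test can compare line by line."""
    return "\n".join(str(node) for node in tree)


tree = [ServerDecl("pool.example.org", [Option("iburst")]), ServerDecl("192.0.2.1")]
print(dump(tree))     # goes through the user-defined __str__
print(repr(tree[1]))  # field-by-field repr generated by the dataclass decorator
```

Nothing here had to be written per field: the dataclass decorator generates
the repr from the class definition, which is the capability C lacks.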

> When we ripped that stuff out, why didn't we add something better?

Because it wasn't actually used in the unit tests anywhere.  I
concluded from this that it was probably broken and nobody had had the
round tuits to fix it or remove it. So I removed it.

> There are several ways we could fix the warnings.
> 
> One would be to patch the generated code.  Run bison.  Edit by hand until the 
> compiler is happy.  Save those edits as a patch file.  Add the patch file to 
> git as source code.  Teach waf to apply the patch...  As long as the patch is 
> something simple like dropping in an endcase, that's probably reasonable to 
> maintain.

Agreed, that would work.  

> The simplest solution is probably John Bell's suggestion.  Hack waf to filter 
> the flags used to compile the parser to not use the option that generates the 
> warnings, even if it would be used for the rest of the compiles.

That would work too.
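
Since waf build scripts are Python, the filtering step itself is small.  What
follows is only a sketch of the list manipulation; filter_flags() is a
hypothetical helper, "-Wexample-warning" is a placeholder rather than the
actual offending option, and how the result gets wired into the parser's
compile task in waf is deliberately left out.

```python
from typing import Iterable, List, Set


def filter_flags(cflags: Iterable[str], unwanted: Set[str]) -> List[str]:
    """Return cflags in their original order, minus any flag in unwanted."""
    return [flag for flag in cflags if flag not in unwanted]


# Placeholder flag names, assumed for illustration only.
project_cflags = ["-O2", "-Wall", "-Wexample-warning"]
parser_cflags = filter_flags(project_cflags, {"-Wexample-warning"})
print(parser_cflags)
```

The rest of the build keeps using project_cflags unchanged; only the parser
compile would get the filtered list.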

> > So, in sum, I think living with a warning or two is the least bad option. The
> > second least bad is that I slightly customize the Bison parser skeleton to
> > make the problem go away; of course then we'd have to maintain that through
> > Bison upgrades. 
> 
> I agree that living with a few warnings is probably the right approach.  At 
> least for now.  But I think it's good to discuss other options.

The clean long term fix for test-harnessing the parser would be "move the
whole codebase to Go" (so we get that introspective capability). This
would fix the warnings problem as a side effect.

We haven't committed to this, of course, but the prospect that we
might in the future reduces my interest in shorter-term, uglier
patches against either problem.

> Maintaining a fixed version of Bison seems like a bad idea.  For something 
> like this, I'd expect upstream would accept a patch.  That fixes the warnings 
> when a new version of Bison is released and distros adopt it.

Note that I wasn't talking about forking Bison itself, just the parser skeleton
file shipped with it.  The skeleton is a C code template into which the parser
generator drops its lookup tables.

Bison, like Yacc before it, has an option to use a custom skeleton
file.  I'm an old hand at hacking these; many years ago I wrote a
custom skeleton for System V Yacc that fixed the #^%@! interface,
abolishing all globals by packing them into a state structure to be
passed to the yacc() driver function.  This is the way it should have
been done in the first place, and would have been if Steve Johnson had
designed Yacc after C passed out of its cuneiform-tablets stage.

I haven't repeated that work on Bison only because someone else did it
first.  If you invoke the option to generate what the Bison docs call
(reasonably) a "re-entrant" parser this is what you get - a parser
generated with a skeleton file that works exactly like my old Yacc
hack.  (There's basically only one way to do it right, modulo
structure naming.)  I know this works because cvs-fast-export uses it
to parallelize parsing of trees of CVS masters on a multicore
processor.  It's a helluva speedup - I can literally have a thousand
Yacc instances running simultaneously in threads and only blocking on
disk I/O.
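
The shape of the idea, independent of Yacc or Bison, can be sketched in a few
lines.  This is a toy in Python, not the skeleton code: ParserState,
parse_sum() and parse_all() are hypothetical names, and the "parser" just adds
up integer tokens.  It shows the structure only (all per-parse data lives in a
state object handed to the driver, so concurrent parses cannot trample each
other) and makes no claim about speedup.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParserState:
    # Everything a classic Yacc parser would keep in globals lives here instead.
    tokens: List[str]
    pos: int = 0
    errors: int = 0


def parse_sum(state: ParserState) -> int:
    """Toy driver: sums integer tokens and counts anything else as an error."""
    total = 0
    while state.pos < len(state.tokens):
        tok = state.tokens[state.pos]
        state.pos += 1
        if tok.isdigit():
            total += int(tok)
        else:
            state.errors += 1
    return total


def parse_all(inputs: Dict[str, str]) -> Dict[str, int]:
    """Parse every input in its own thread, each with a private ParserState."""
    results: Dict[str, int] = {}

    def worker(name: str, text: str) -> None:
        # Each thread writes only its own key, so no extra locking is needed.
        results[name] = parse_sum(ParserState(text.split()))

    threads = [threading.Thread(target=worker, args=item) for item in inputs.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(parse_all({"a": "1 2 3", "b": "10 x 5"}))
```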

Yes, you'd expect upstream would accept a fix patch.  Unfortunately I
have tried to fix a minor Bison bug and found whoever is behind the
bug address to be unresponsive. I fear we cannot expect help from that
quarter.

*grumble*  If I could get whoever it is to hand off, I'd cheerfully
take over maintaining Bison myself. And do a better job.

> Currently, we require Bison in the build environment.  We don't really need 
> that.  We only need Bison to run on some system we have access to.  Then we 
> could treat the output as source and put it in git.  It adds a layer of 
> complexity that I'm happy we don't have, but it is a possibility.
> 
> The approach of saving the Bison output also applies to using a modern parser. 
>  If it isn't widely available, we have the option of running it on one system 
> and capturing the output to use on other systems.

While I won't die on this hill, I'm reluctant.  I've seen projects go this
route before and my experience is that this tactic always eventually turns
into a pain in the ass.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.