tech-toolchain archive

cleaning up the make parser

I have been muttering for ages about cleaning up the make parser so it
generates some kind of abstract syntax tree that can be eval'd,
instead of going straight to graph nodes. This will have a number of
benefits: right now there are various places where strings are
rescanned in not always consistent ways, and that should stop; also,
it will allow parsing multiply-included files only once, which should
speed up pkgsrc; it will allow doing optimization passes before eval,
particularly on loops, which may also help pkgsrc; and by cleaning up
the internals it will make it reasonably possible to add new
functionality like variable types, multiline macros, and perhaps

To be clear I'm talking about an abstract syntax tree for the macro
preprocessing language, not so much for the rules and commands. The
idea is that you parse files into the abstract representation,
perhaps typecheck and/or optimize it, then eval it to generate the
lower-level internal graph representation of rules and commands.
(Cleaning up that representation is desirable too but a different
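
As a rough illustration of the parse-then-eval split described above,
here is a minimal sketch in C. Everything in it is hypothetical: the
node kinds (NODE_ASSIGN, NODE_IFDEF), struct env, and the function
names are invented for this example and are not taken from make's
sources. It only shows the shape of the idea: the tree is built once,
is never modified, and can be evaluated any number of times.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds for the macro language; not from make(1). */
enum node_kind { NODE_ASSIGN, NODE_IFDEF };

struct node {
	enum node_kind kind;
	const char *name;		/* variable assigned or tested */
	const char *value;		/* NODE_ASSIGN only */
	const struct node *body;	/* NODE_IFDEF only: guarded block */
	const struct node *next;	/* next sibling, or NULL */
};

/* Toy variable table standing in for make's real variable scopes. */
#define MAXVARS 16
struct env {
	const char *names[MAXVARS];
	const char *values[MAXVARS];
	size_t count;
};

/* Look up a variable; returns NULL if it is not set. */
static const char *
env_get(const struct env *env, const char *name)
{
	for (size_t i = 0; i < env->count; i++)
		if (strcmp(env->names[i], name) == 0)
			return env->values[i];
	return NULL;
}

/* Set a variable, overwriting any previous value. */
static void
env_set(struct env *env, const char *name, const char *value)
{
	for (size_t i = 0; i < env->count; i++) {
		if (strcmp(env->names[i], name) == 0) {
			env->values[i] = value;
			return;
		}
	}
	assert(env->count < MAXVARS);
	env->names[env->count] = name;
	env->values[env->count] = value;
	env->count++;
}

/* Evaluate a list of nodes against env; the tree is left untouched. */
static void
eval(const struct node *n, struct env *env)
{
	for (; n != NULL; n = n->next) {
		switch (n->kind) {
		case NODE_ASSIGN:
			env_set(env, n->name, n->value);
			break;
		case NODE_IFDEF:
			if (env_get(env, n->name) != NULL)
				eval(n->body, env);
			break;
		}
	}
}
```

Because eval() takes the tree as const, the same parsed file can be
evaluated against different variable environments, which is the
property that would let a multiply-included file be parsed only once.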

With luck this will also result in a parser that runs faster, because
it should be able to spend a lot less time scanning strings; however,
even if it doesn't, cleaner code will make it at least vaguely
possible to pursue other ways of speeding things up.

The goal therefore is to divide the existing parser code into a
frontend, a backend, and perhaps some intermediate checking passes,
all of which operate on a suitable abstract syntax representation.
Because the internals are a little messy, it will take quite some time
to get there. I think the steps are:

(1) change the file reading logic to read a whole file at a time into
a single object, which can then be fed to the existing parser logic,
and throw away the current file reading code that's intertwined with
the parser.

(2) add a layer that divides the single object into lines, which is a
simple textual kind of intermediate representation, and feed the lines
one at a time into (most of) the existing parser logic, and throw away
or migrate the current line-splitting code that's also fairly
intertwined with other things.

(3) migrate the classification of line types to the new regime.

(4) migrate the parsing of the contents of the macro language lines
(variable assignments and directives, not rules or commands) to the
new regime; this will yield the first draft of the real abstract
syntax.

(5) pull the parsing stuff and the graph-building code fully apart, so
there's a separation between the frontend and backend, and the
intermediate representation can be worked on.

(6) move the parsing of variable representations into the frontend, so
variable substitution no longer has to do this on the fly.

(7) add support for boolean variables, lists, for-loop hoisting,
constant propagation, or whatever else.

(8) fun and profit!
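
To make step 2 a bit more concrete, here is a minimal sketch in C of
a layer that walks a whole-file buffer and hands out logical lines
one at a time. The names (struct linecursor, struct logical_line,
next_line) are hypothetical. The sketch assumes that a newline
preceded by an odd number of backslashes is a continuation; it does
not collapse the continuation or strip comments, which a real
implementation would still need to do somewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cursor over a whole makefile held in memory. */
struct linecursor {
	const char *buf;	/* start of the file contents */
	size_t len;		/* total length in bytes */
	size_t pos;		/* offset of the next unread byte */
	unsigned lineno;	/* physical line number at pos */
};

/* A logical line: a span of the buffer, not a copy. */
struct logical_line {
	const char *text;
	size_t len;		/* excludes the terminating newline */
	unsigned lineno;	/* physical line where it starts */
};

/* True if the newline at buf[nl] follows an odd run of backslashes. */
static bool
newline_is_escaped(const char *buf, size_t start, size_t nl)
{
	size_t backslashes = 0;

	while (nl > start + backslashes &&
	    buf[nl - 1 - backslashes] == '\\')
		backslashes++;
	return (backslashes % 2) == 1;
}

/* Fetch the next logical line; returns false at end of input. */
static bool
next_line(struct linecursor *c, struct logical_line *out)
{
	size_t start = c->pos;
	size_t i = start;

	assert(c->pos <= c->len);
	if (start >= c->len)
		return false;
	out->lineno = c->lineno;
	while (i < c->len) {
		if (c->buf[i] == '\n') {
			c->lineno++;
			if (!newline_is_escaped(c->buf, start, i))
				break;
		}
		i++;
	}
	out->text = c->buf + start;
	out->len = i - start;
	/* Step past the newline unless the file ended without one. */
	c->pos = (i < c->len) ? i + 1 : i;
	return true;
}
```

Each logical line keeps the physical line number where it started,
so later stages can still report errors against the right place in
the file even though they never see the raw buffer.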

In the long run I'd also like to set up a mechanism where recursive
make invocations can share pre-parsed representations of include
files; while there are a lot of potential problems with this it would
probably make it practical to do real work by invoking make at the top
of the pkgsrc tree. However, that's a long way off as none of these
steps are entirely trivial.

Attached is a candidate patch for step 1. I have a couple questions
related to compat logic for tools and for make where that's different:

  - is there any real system that anyone still cares at all about that
    doesn't have <sys/mman.h> and mmap? Right now I don't have a
    HAS_SYS_MMAN_H test but it's easily added.

  - is it safe to use __printflike() in tools? __c99inline? grepping
    src/tools finds nothing for either of them; also, make has to be
    treated specially...
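
For anyone who wants something concrete to react to on the mmap
question, here is a minimal sketch in C of whole-file loading with a
read() fallback. It is not the attached patch: struct loadedfile,
loadfile(), loadfile_read(), and unloadfile() are names made up for
this example, and HAS_SYS_MMAN_H is used here only as the
hypothetical configure result mentioned above (defaulted to 1).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical configure result; no such test exists yet. */
#ifndef HAS_SYS_MMAN_H
#define HAS_SYS_MMAN_H 1
#endif
#if HAS_SYS_MMAN_H
#include <sys/mman.h>
#endif

/* Hypothetical whole-file object; contents are NOT NUL-terminated. */
struct loadedfile {
	char *buf;
	size_t len;
	int mapped;	/* nonzero if buf came from mmap() */
};

/* Slow path: read() everything into a growing malloc'd buffer. */
static int
loadfile_read(int fd, struct loadedfile *lf)
{
	size_t cap = 4096, used = 0;
	ssize_t n;
	char *b = malloc(cap);

	if (b == NULL)
		return -1;
	while ((n = read(fd, b + used, cap - used)) > 0) {
		used += (size_t)n;
		if (used == cap) {
			char *nb = realloc(b, cap * 2);
			if (nb == NULL) {
				free(b);
				return -1;
			}
			b = nb;
			cap *= 2;
		}
	}
	if (n < 0) {
		free(b);
		return -1;
	}
	lf->buf = b;
	lf->len = used;
	lf->mapped = 0;
	return 0;
}

/* Load all of path into lf; returns 0 on success, -1 on error. */
static int
loadfile(const char *path, struct loadedfile *lf)
{
	int fd, result;

	assert(path != NULL && lf != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
#if HAS_SYS_MMAN_H
	struct stat st;
	/* Only map non-empty regular files; anything else is read(). */
	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
		void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
		    MAP_PRIVATE, fd, 0);
		if (p != MAP_FAILED) {
			lf->buf = p;
			lf->len = (size_t)st.st_size;
			lf->mapped = 1;
			close(fd);
			return 0;
		}
	}
#endif
	result = loadfile_read(fd, lf);
	close(fd);
	return result;
}

/* Release whatever loadfile() handed out. */
static void
unloadfile(struct loadedfile *lf)
{
#if HAS_SYS_MMAN_H
	if (lf->mapped) {
		munmap(lf->buf, lf->len);
		return;
	}
#endif
	free(lf->buf);
}
```

The read() path is needed regardless of whether mmap exists, since
pipes and empty files cannot usefully be mapped, so a compile-time
guard of this kind would cost very little extra code.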

The patch has so far only passed simple tests; before I commit it I'm
going to make sure it can build the world, and do some pkgsrc

David A. Holland
