Subject: Re: Making the (asm) world safe for modern cpp
To: None <tech-toolchain@NetBSD.org>
From: M L Riechers <mlr@rse.com>
List: tech-toolchain
Date: 09/21/2003 16:47:41
Fri, 19 Sep 2003 05:32:48 -0400 (EDT) der Mouse
<mouse@Rodents.Montreal.QC.CA>:

> > [Explicit comment delimiters] are probably not truly required ...
> 
> Well, they're not truly required by the _machine_'s architecture on any
> machine; ...

> As for the SPARC, this is true only if you ignore assembler
> pseudo-instructions like inc.  SPARC assemblers generally support both
> "inc %o0" and "inc %o0,3".  There are also save and restore, which are
> normally accepted without operands, implying "%g0,0,%g0".
> 
> But assembly languages that allow an expression as the last
> ... must either use self-delimiting
> expression syntax or explicit comment delimiters:
> 
>         addl3   iblen, 512, r8  The byte count of this piece is iblen
>         movc3   work, (r7), 256  + 512 - sizeof(int), but first we need
>         subl2   4, r8            to save the packet-so-far from work.
>         movc3   (r9), work, r8
> 
> On that second line, it's not clear where the comment starts, ...

Heh, thank you, you've made my day (he says, holding back serious
chuckles).  Once again, I'm hoisted on my own petard. You've obviously
done some serious assembly programming.

But you don't expect facts to get in the way of a satisfyingly good
rant, do you? ;<)

Also, I was hoping to skirt the issue of how the IBM 360 assembler got
around this: the comment delimiter was simply the first space
character found in the operand list; therefore, no spaces were allowed
in the operand list!

later, on Fri, 19 Sep 2003 06:40:20 -0400 (EDT) (actually extracted from
Perry's email below):
> der Mouse <mouse@Rodents.Montreal.QC.CA> writes:
> > > One of the reasons is that .S files do go through the preprocessor
> > > and assembler comments are not preprocessor comments.
> > 
> > Of course, the fundamental problem here is the use of a C-specific tool
> > (the C preprocessor) to process non-C text - the same basic problem
> > that X's imake suffers from.

I (mlr) agree.

and on 19 Sep 2003 10:33:36 -0400 "Perry E. Metzger" <perry@piermont.com>
responds,

> Unfortunately, the use of cpp makes sense for this, because it allows
> us to share common definitions of system constants between the .S
> files and .c files in the kernel. I agree, though, that it would be
> nice if there was a simple alternative. Sadly, I don't think there is.

and Fri, 19 Sep 2003 10:21:15 -0400 (EDT) Jim Wise <jwise@draga.com>
responds to the same with:

> True in principle, but the benefit of doing so (being able to share .h
> files between assembly and c sources) is pretty major.

Yes, I totally agree with Perry and Jim with respect to the sharing of
common definitions, and with Perry both in his wish that "it would be
nice if there was a simple alternative" and in his conclusion that
none currently exists.

The trouble with sharing .h files from C to assembly is:

1.  With some exceptions, the various assemblers do not understand the
    C syntax at all.  This means that if an .h file is to be fed to an
    assembler, it must be carefully crafted to be understood by the
    assembler.

2.  The various assemblers mostly manage to agree with their
    counterpart C compilers with respect to what a "word" (int)
    is.  But from there, it gets much worse.  They (the assemblers) do
    not know how to interpret, or floor-plan, if you will, data
    lay-outs such as arrays or structs.  Nor do they know what to do
    with typedefs.  Additionally, cpp has no notion of how to cope
    with these things, either.  It is the actual C compiler that is
    the final authority on, and effective agent in, these matters (at
    least to the extent of interpreting and enforcing the rules).

So, to keep things flexible and useful, people have made procedures
and constructs such as the "genassym.cf" found in (pretty sure) each
NetBSD port.  Its purpose is to produce an "assym.h" file
which may or may not translate the C constructs to a syntax with which
the assembler can deal, but mainly fairly reflects the structs with
which the (necessary) assembler programs must deal.  But easily
maintainable they're not.

I've often wished for a "not so simple alternative" to just passing
the cpp over the assembler source for "includes", "defines", and,
possibly, macro expansion.  My wish would be a train of programs,
along the lines of the following, in the pre-processor pass:

1.  The initial scanner would be a program that is machine specific
    assembler aware.  It would separate the assembler lines from the C
    lines, and pass the C lines to the cpp.

2.  The cpp would grab the include files, expand the macros, and
    resolve the defines, as it always has (but this time the cpp is
    the standard C one, since the initial scanner has withheld all
    offending assembler lines).  It would naturally be machine
    independent.  The cpp would then pass its results to:

3.  The actual C compiler (or some program written to function pretty
    much like it, at least in the initial stages) would then resolve
    and interpret, and emit its results in a form understandable by
    the specific assembler.  (The C compiler does emit assembler now,
    it is true, but in a limited fashion: i.e. the C compiler sees no
    need to output its notion of the (abstract) data structures, and
    keeps this info pretty much private, emitting data addresses and
    code; on the other hand, we're not interested in the code, we're
    interested in the lay-out of the data structures.)

4.  The initial scanner, or some other program, would reinsert the
    assembler lines, so expanded, in place among the C lines it passed
    to the cpp, and present the result to the machine's assembler.

Just wishing away again.  Perhaps someday I'd have the time to deal
with this.

Best regards,

-Mike