tech-kern archive


re: Time to merge the pgoyette-compat branch



As I expected, my attempts to respond to your comments and explain
things have made matters worse rather than better.  And I'm pretty
sure that any further attempts to explain will simply confuse
things even more.  So I'm not going to pursue this effort further
without some substantial encouragement and involvement from others.

Some final comments:

WRT module names, we're already quite dependent on identifying modules
by their names.  It provides an additional qualifier, and removing
this part of the change set would be nearly as much work as adding it
was in the first place.  IMHO, it doesn't hurt, it might help, and it
doesn't cost very much (the checking happens only at module load time).
And whether you like the current error values being used or not, having
different error values for different conditions is (again, IMHO) quite
useful.

(Also, while not really relevant to the current discussion, you asked
"why do we care about names?"  A module's name is the _only_ way to
identify a module when it is being unloaded, and it is the _only_ way
to identify a built-in module that is being re-enabled after having
been "unloaded."  Agreed that neither of these reasons-for-caring is
sufficient to require the introduction of aliases.)

WRT the .o vs. .a situation, I could probably make it work equally well
with either method.  But it works just fine with .a and I see no reason
to go back to the .o method.  Again, lots more work for no/little gain.




On Fri, 7 Sep 2018, matthew green wrote:

* Introduction of module "aliases".  In addition to its own name, a
   module can now provide alias names.  This is useful for the
   monolithic compat module, which now contains the functionality of
   the many version-specific modules.  If you load the monolithic module,
   its aliases will prevent you from also loading individual-version
   compat modules.

i'm not sure i understand the point of this.  how does the normal
duplicate symbol detection not cause a failure?  why do we care
about names, vs what is actually imported or exported?  ie, what
happens differently if this change is excluded?

This issue arises in a situation where you have none of the compat
code built in.  First you load (for example) the compat_70 module,
and then you try to load the monolithic compat module.  You will
of course get duplicate symbols, and the load will fail with ENOEXEC.
But if you try to otherwise load a duplicate module, the load will
fail with EEXIST.  The alias mechanism allows the monolithic compat
module to have multiple names, including compat_70, so the load can
fail with the (IMHO) more informative EEXIST - module already exists.
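
To make the two failure paths concrete, here is a minimal userland
sketch in C.  It is not the kernel's module code: struct toy_module,
toy_load() and the symbol names are all hypothetical, and the order of
the checks (names and aliases first, symbols second) is an assumption
made for illustration.  It only shows how an alias list lets a
name-based check report EEXIST before a symbol clash would report
ENOEXEC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LOADED 8

/* Hypothetical, simplified module record; NOT the kernel's modinfo_t. */
struct toy_module {
	const char *name;
	const char *const *aliases;	/* NULL-terminated list, or NULL */
	const char *const *symbols;	/* NULL-terminated exported symbols */
};

static const struct toy_module *toy_loaded[TOY_MAX_LOADED];
static size_t toy_nloaded;

/* Return 1 if string s appears in the NULL-terminated list. */
static int
toy_list_has(const char *const *list, const char *s)
{
	if (list == NULL)
		return 0;
	for (; *list != NULL; list++)
		if (strcmp(*list, s) == 0)
			return 1;
	return 0;
}

/* Does module m answer to this name, directly or through an alias? */
static int
toy_answers_to(const struct toy_module *m, const char *name)
{
	return strcmp(m->name, name) == 0 || toy_list_has(m->aliases, name);
}

/* Forget all loaded modules. */
void
toy_reset(void)
{
	toy_nloaded = 0;
}

/* Try to load m; returns 0 or an errno value. */
int
toy_load(const struct toy_module *m)
{
	const char *const *p;
	size_t i;

	assert(m != NULL && m->name != NULL);

	/* Name check: is our name, or any of our aliases, already present? */
	for (i = 0; i < toy_nloaded; i++) {
		if (toy_answers_to(toy_loaded[i], m->name))
			return EEXIST;
		for (p = m->aliases; p != NULL && *p != NULL; p++)
			if (toy_answers_to(toy_loaded[i], *p))
				return EEXIST;
	}

	/* Symbol check: stands in for the linker's duplicate detection. */
	for (i = 0; i < toy_nloaded; i++)
		for (p = m->symbols; p != NULL && *p != NULL; p++)
			if (toy_list_has(toy_loaded[i]->symbols, *p))
				return ENOEXEC;

	if (toy_nloaded == TOY_MAX_LOADED)
		return ENOMEM;
	toy_loaded[toy_nloaded++] = m;
	return 0;
}

/* Example data; the symbol names are made up for this sketch. */
static const char *const toy_syms_70[] = { "compat_70_example", NULL };
static const char *const toy_syms_all[] = {
	"compat_60_example", "compat_70_example", NULL
};
static const char *const toy_aliases[] = { "compat_60", "compat_70", NULL };

const struct toy_module toy_compat_70 = { "compat_70", NULL, toy_syms_70 };
const struct toy_module toy_compat_aliased = {
	"compat", toy_aliases, toy_syms_all
};
const struct toy_module toy_compat_plain = { "compat", NULL, toy_syms_all };
```

With toy_compat_70 loaded first, toy_load(&toy_compat_aliased) fails on
the name check with EEXIST, while toy_load(&toy_compat_plain), which
has no aliases, gets as far as the symbol check and fails with ENOEXEC.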

hm.  i don't know.  i don't think this is useful, as it is
additional checking that isn't needed, and i don't really
agree with "more informative".  if a module can't load
because a symbol already exists, i'd expect, eg, EBUSY or
something more specific than either ENOEXEC or EEXIST.

i'd really rather this part be not merged.

* Removed linking of the .o kernel compat library into all kernels.
   This caused problems, since the library included lots of compat
   symbols, but did not include module linkage; attempts to subsequently
   load some modules would fail due to multiply-defined symbols.

can you expand upon this more?  what exactly has changed how?

When maxv did (one of) his rototills a while ago, he replaced the .a
compat library with a .o library.  As I recall, the reason was that
there were some shared routines that weren't strictly compat (I don't
have the original discussion handy).  By using a .o library, we end up
with much of the compat code included in the kernel even without the
COMPAT_xx options.  (And the routines that caused maxv to do this have
now been included in a module of their own, and are required by their
callers.)

On the branch, we return to a .a library, only including in the
kernel those things that are actually needed to resolve symbols in
the kernel.

i think you've misunderstood why we have .o vs .a for kernel
libraries.  in fact, the whole reason we have a .o version of
them is _only for modular kernels_.  if there is a problem
with a .o version for modular kernels, then that library is
being built or used wrongly, perhaps even the code should be
elsewhere.

eg, the libcompat.{a,o} difference entirely has to do with
what ends up in the linked kernel image.  for modular kernels,
that are expected to be able to load modules, any support code
for them should be present, regardless of known use.  this is
why we use the library linked as a .o in this case normally.
for non-modular kernels, that are 100% statically linked, we
can build the kernel libraries as a .a, and let the linker
only include the parts we use.
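
The difference in what ends up in the image can be sketched with a toy
model in C.  This is not how ld(1) is implemented; the member names,
the symbols, and the one-definition-per-member layout are assumptions
made for illustration.  Linking the library "as a .o" takes every
member, while linking it "as a .a" takes only the members needed to
resolve undefined symbols, including symbols that pulled-in members
themselves reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical library member: defines one symbol, may reference one. */
struct toy_member {
	const char *file;
	const char *defines;
	const char *needs;	/* referenced symbol, or NULL */
};

enum { TOY_NMEMBERS = 4 };

/* Made-up contents for a toy compat library. */
const struct toy_member toy_lib[TOY_NMEMBERS] = {
	{ "compat_a.o", "compat_a_fn",   NULL },
	{ "compat_b.o", "compat_b_fn",   "compat_helper" },
	{ "helper.o",   "compat_helper", NULL },
	{ "compat_c.o", "compat_c_fn",   NULL },
};

/* ".o" style: every member goes into the image, used or not. */
size_t
toy_link_whole(int included[TOY_NMEMBERS])
{
	size_t i;

	assert(included != NULL);
	for (i = 0; i < TOY_NMEMBERS; i++)
		included[i] = 1;
	return TOY_NMEMBERS;
}

/* Is sym wanted by the kernel or by a member already pulled in? */
static int
toy_is_needed(const char *sym, const char *const *undef,
    const int included[TOY_NMEMBERS])
{
	size_t i;

	for (; undef != NULL && *undef != NULL; undef++)
		if (strcmp(*undef, sym) == 0)
			return 1;
	for (i = 0; i < TOY_NMEMBERS; i++)
		if (included[i] && toy_lib[i].needs != NULL &&
		    strcmp(toy_lib[i].needs, sym) == 0)
			return 1;
	return 0;
}

/* ".a" style: pull in members until nothing new is required. */
size_t
toy_link_archive(const char *const *undef, int included[TOY_NMEMBERS])
{
	size_t i, count = 0;
	int changed;

	assert(included != NULL);
	for (i = 0; i < TOY_NMEMBERS; i++)
		included[i] = 0;
	do {
		changed = 0;
		for (i = 0; i < TOY_NMEMBERS; i++) {
			if (!included[i] &&
			    toy_is_needed(toy_lib[i].defines, undef, included)) {
				included[i] = 1;
				count++;
				changed = 1;
			}
		}
	} while (changed);
	return count;
}
```

If the kernel's only undefined symbol is compat_b_fn, the archive-style
link takes compat_b.o and helper.o and leaves the other two members
out, whereas the whole-object link takes all four.  In this model, any
symbol left out of the image can still be supplied later by a loadable
module without a duplicate-definition clash.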

so, if there is a problem with modular and the .o, then it
sounds like there is a problem with the usage, not that
the linkage should be (IMO) broken.  your method sounds like
it _should_ work without this change, but i can imagine that
unclean usages will trigger the failures you've seen.


.mrg.




+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+

