tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: updating COMPAT_LINUX for linux 2.6.x support (take 2)



On Wed, Jun 16, 2010 at 03:56:44PM -0700, Chuck Silvers wrote:

> On Mon, Jun 14, 2010 at 03:25:56PM +0000, Andrew Doran wrote:
> > On Sun, Jun 13, 2010 at 10:59:45PM -0700, Chuck Silvers wrote:
> > > hi folks,
> > > 
> > > ok, more progress.  linux32 is working now and I fixed a few other bugs
> > > along the way.
> > > 
> > > the updated diff is in
> > > ftp://ftp.netbsd.org/pub/NetBSD/misc/chs/linux/diff.linux-nptl-take2.36
> > 
> > - mips pcb_tls. Can you use curlwp->l_private instead and add a 
> >   cpu_lwp_setprivate() ala i386 to handle this?  As it would be the same
> >   mechanism that we'd use for native TLS.  There is a __HAVE flag for
> >   this in machine/types.h as far as I remember, see sys_lwp.c.  I created
> >   patches for a bunch of other architectures to do this, mjf@ is sitting
> >   on them I think.
> 
> ok, I did that for all the platforms where I added TLS code for linux.
> actually, I created an lwp_setprivate() which sets l->l_private and calls
> cpu_lwp_setprivate() if there is one, and changed all the linux TLS code
> and sys__lwp_setprivate() to use that.  for mips, there isn't any
> hardware register so we don't need a cpu_lwp_setprivate() there.

Ah right.. The rdhwr thing can just read l_private. 
 
> > - In x86 sys_machdep.c, I'd feel better if wrmsr() and set of pcb_gs etc 
> >   were bracketed with kpreempt_disable()/kpreempt_enable().  Likewise
> >   memcpy() to pcb_gsd and friends in Linux compat code.
> > 
> > - For the Linux compat setting of %gs/%fs, I'd rather this was done via
> >   a function call into native x86 code because in the past we've ended
> >   up with stragglers in this code, where someone working on compat does
> >   the wrong thing or where someone working on x86 fails to update the
> >   compat code. Not a strong opinion just a preference.
> 
> with the above change to use lwp_setprivate(), the linux code is now
> free of anything that fiddles with TLS thread state directly.
> 
> 
> > - FYI I think I disabled Linux ptrace() because I was concerned about
> >   potential security issues and bitrot in the code. Dunno if that's
> >   still the case.
> 
> you're right, more would be needed there to be safe.
> I put the checks to disallow ptrace on multi-LWP processes
> back the way they were (and added one on powerpc, where it was missing).
> 
> 
> > - Re: the Linux +ucas_int() hack, preemption implies MULTIPROCESSOR
> >   so the kpreempt toggles aren't needed.. Maybe worthwhile as a sort
> >   of documentation though.
> >   We may context switch during the copyout so I don't see how this
> >   can be atomic.  If copyin() is somehow wacky I guess we could switch
> >   there too.  Any reason not to say "implement user space CAS or your
> >   port loses Linux emulation"?
> 
> interesting, I didn't realize that preemption wasn't supported on
> non-MP kernels.

This is mainly down the fact that we need kernel_lock to bracket "legacy"
sections of code that aren't preemption safe.  I think MULTIPROCESSOR
should be sent off to the glue factory but that's another discussion :-).

> I think the hacky version is actually safe.  the copyin() will have
> fetched the page into memory and created a pmap mapping for it.
> in a UP kernel, nothing can happen between the copyin() and the copyout()
> that could change the page's contents or invalidate the mapping,
> so the copyout() can't fault and the value that copyin() read can't
> have changed before the copyout().

Ok, maybe a brief comment above would do.

> as for just disabling COMPAT_LINUX on these platforms,
> I really don't like disabling other people's features
> just because it would be convenient for me.

Perhaps my choice of words was bad.  What we've found is some ports 
lagging behind in the feature stakes, and what works is to say "please
implement this because some point down the road X will break without it".
The presumption here is that it's unreasonable to ask one person to go
implement feature Z on N different ports.  Given the short timeframe for
this change I don't object to the CAS hack.. It is horrible though!

> > - The dup code for fork1() code makes me uncomfortable.  Maybe it's
> >   worthwhile changing our native code so that LIDs are always allocated
> >   from the PID table or something along those lines?  Tend to think these
> >   should be globally unique with the system and not just within a process.
> >   Could also be of potential help with things like inter-process pthread
> >   objects in shared memory.
> 
> could you be more specific about the duplicated code that concerns you?
> do you mean the "Set the new LWP running" chunk at the end of
> linux_clone_nptl()?  that's copied from sys__lwp_create() rather than
> fork1(), BTW.

That type of code, right.
 
> there are 7 callers (8 with my new one) of lwp_create(),
> and all of them have different code for making the new LWP runnable
> (or not) afterward.  do you suppose it would be worthwhile to
> collect all of that together into an lwp_launch() or somesuch,
> with a bunch of flags to select the behaviour?

They all have different requirements.. I would prefer us to limit the
proliferation of low-level stuff outside of kern/ but, I don't have a
stong opinion, it's fine as it was.
 
> I'd rather not get into changing LID allocation behaviour of native
> processes as part of this linux work.

Sure, it's something we can revisit.

> > - When resetting l_lid, for safety we should hold p_lock (allthough 
> >   during early fork we'd probably get away with it due to the process
> >   being SIDL).
> >   If we take an approach like the above then we wouldn't need to reset
> >   l_lid at all.
> 
> I added a flag to lwp_create() to have it allocate a new PID to use for
> the LID of the new thread.  now the only places that the LID of an existing
> LWP is changed are in execve1() (which already did that) and in the
> linux fork and exec callbacks (at which point the process has only 1 LWP).
> do we need to take p_lock in these places?

In fork is the process still SIDL?  In that case no.. Exec yes I think
we'd need p_lock held there since the process is still "live" as it were.
p_reflock may be held to lock out the debugger (can't remember) but
otherwise the proc is visible.  Re: the existing LID reset in execve1()
I'll have a look and possibly open a PR.  The only reason to reset the
LID there is so it looks pretty for ps/top, there should be no assumptions
about LID 1.
 
> > > there is one odd thing that I haven't been able to figure out:
> > > I don't think we should need to reload %gs in INTRFASTEXIT on amd64
> > > since swapgs will put back the user's gs.base value.
> > > but if I take out the reload of %gs and run the 32-bit pthread tests,
> > > within a few minutes the system will either reset or lock up.
> > > anyone have any idea why this would be?  I'd appreciate it if
> > > someone could look over the asm code changes in general.
> > 
> > No clear idea about the %gs issue, sorry.
> 
> well, it doesn't break anything that's currently working
> and it won't cause a problem for TLS, so I'll just leave it
> that the way I have it for now.
> 
> 
> the updated diff is at:
> ftp://ftp.netbsd.org/pub/NetBSD/misc/chs/linux/diff.linux-nptl-take2.39

Thanks!

> 
> -Chuck


Home | Main Index | Thread Index | Old Index