Subject: Re: Massive lossage with -current as of tonight?
To: None <current-users@netbsd.org>
From: Christos Zoulas <christos@astron.com>
List: current-users
Date: 11/05/2006 02:52:59
In article <454D1B08.1000500@warped.com>,
Scott Ellis  <scotte@warped.com> wrote:
>Eric Haszlakiewicz wrote:
>> On Fri, Nov 03, 2006 at 11:25:57PM -0800, Scott Ellis wrote:
>>> Well, after cvs updating and doing a complete rebuild (so using -current 
>>> as of ~10pm PST Nov 3rd), I get the same behavior as before: Various 
>>> programs appear to hang when booting multi-user.
>>>
>>> Going back to libc.so.12.147 "fixes" things (mostly), but sshd still 
>>> fails, and now I see the new, even more exciting behavior that prevents 
>>> logging in:
>[snip]
>> 	Given that you can fix your problem by reverting libc, and I
>> haven't updated anything beyond the ipf binaries, we might have
>> separate issues here.
>
>Well, I'm starting to suspect some of the kauth changes here.
>
>Booting a -current (Nov 4th, cvs updated moments ago) kernel works fine 
>with the October 26th userland.
>
>Updating to Nov 4th userland breaks just as it did when originally 
>reported (stuff like named hanging on "load: 0.95  cmd: named 795 
>[piperd] 0.00u 0.00s 0% 1808k", but being able to be ^C'ed).  My gut 
>tells me this is really sh that's hanging, since we're really running 
>through rc.local and the rc.d/ scripts at this point.  But I digress...
>
>Reverting to Oct 26th binaries, but Nov 4th /lib and /usr/lib "mostly" 
>works.  Most everything is functional (the system works more-or-less as 
>expected) except for some weird permission problems.  For example, 
>during boot I see:
>raidctl: unable to open device file: raid0
>
>And trying to run atactl (Oct 24th or Nov 4th) yields:
>atactl: wd0: Operation not permitted
>
>A ktrace of this shows:
>499      1 atactl   NAMI  "/dev/rwd0d"
>499      1 atactl   RET   open -1 errno 1 Operation not permitted
>
>Using the Oct 24th libc (and other libs), this works fine.
>
>I'm quickly running out of clues.  Can anyone suggest what additional 
>debugging to collect, or what steps to take to try and root-cause this? 
>  My build machine is only an Athlon64 3400+, so rebuilding userland for 
>every day between Oct 24th and Nov 4th seems time prohibitive.
>

I have no idea. I am running -current here on two machines and it seems to
work, but those are i386, not amd64.
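
On the cost of rebuilding userland for every day between Oct 24th and
Nov 4th: a bisection over dates should need at most four builds instead of
ten, assuming the breakage stays broken once it appears. A minimal sketch
of the idea follows. The is_good callback is hypothetical; it stands for
"check out the tree as of that date, build, install and test", which is
not shown here.

```python
from datetime import date, timedelta


def first_bad_date(good, bad, is_good):
    """Bisect between a known-good and a known-bad date.

    Assumes `good` works, `bad` fails, and every date after the first
    failing one also fails. Returns (first failing date, builds tested).
    """
    tested = 0
    while (bad - good).days > 1:
        # Pick the date halfway between the current bounds.
        mid = good + timedelta(days=(bad - good).days // 2)
        tested += 1
        if is_good(mid):
            good = mid   # breakage was introduced after mid
        else:
            bad = mid    # breakage was introduced on or before mid
    return bad, tested


# Demo with a made-up predicate: pretend the breaking commit landed on
# Oct 31st (purely illustrative, not a claim about the real cause).
result = first_bad_date(
    date(2006, 10, 24),
    date(2006, 11, 4),
    lambda d: d < date(2006, 10, 31),
)
```

With the made-up predicate above, the search tests Oct 29th, Nov 1st,
Oct 30th and Oct 31st, then stops. The commits made on the date it
returns would be the place to start looking.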

christos