netbsd-bugs: Re: kern/29291

Subject: Re: kern/29291
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Frederick Bruckman <fredb@immanent.net>
List: netbsd-bugs
Date: 02/14/2005 17:50:02

The following reply was made to PR kern/29291; it has been noted by GNATS.

From: fredb@immanent.net (Frederick Bruckman)
To: Jens Kessmeier <j.kessmeier@teles.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/29291
Date: Mon, 14 Feb 2005 11:48:57 -0600 (CST)

 In article <20050214155701.A371963B845@narn.netbsd.org>,
 	Jens Kessmeier <j.kessmeier@teles.de> writes:
 >  
 >  You say: In my experience, after a failed "sync" attempt, the scsi driver is
 >  usually too horked for the dump to take.
 >  
 >  Right, that is our problem. If you repeat my test, or anything else
 >  that triggers a panic, you will see no dump 80 times and a dump
 >  once. Your solution is fine in development. For a crashed server
 >  machine far from home, in a dark room, on a weekend, at night with your
 >  girlfriend, will you drive, or whatever, to the console? Please, please say
 >  that is not what you want.

 Huh? So why don't you tell us exactly what you want to hear?

 Notice that, in your original example, panic(9) with no ddb
 still managed to reboot, which is generally what you'd want
 in a production environment. The specific issue is that the
 second panic, in scsipi_execute_xs(), caused an immediate reboot
 (as panic(9) is documented to do), because continuing under
 those conditions could damage the file store and is unlikely
 to produce a useful core in any case.
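
 To illustrate the behavior described above, here is a minimal
 user-space sketch of a panic path that treats a nested panic
 differently from the first one. It is only a model of what this
 message describes, not the kernel's actual code: the names
 sk_panic_actions, sk_in_panic and the SK_* flags are hypothetical,
 and the real logic lives in the kernel's panic() and reboot path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical action bits for this sketch; these are NOT the kernel's RB_* values. */
enum { SK_REBOOT = 0x1, SK_NOSYNC = 0x2, SK_DUMP = 0x4 };

/* Set once the first panic has been seen (assumed global state for the model). */
static bool sk_in_panic = false;

/*
 * Return the actions a panic would request in this simplified model.
 * First panic: try to dump core, then reboot.
 * Nested panic (e.g. one raised while handling the first): skip the
 * sync and the dump, and go straight to the reboot.
 */
int sk_panic_actions(const char *msg)
{
    int actions = SK_REBOOT;

    assert(msg != NULL);
    fprintf(stderr, "panic: %s\n", msg);

    if (sk_in_panic) {
        /* Kernel state is already suspect; touching the disks again is risky. */
        actions |= SK_NOSYNC;
    } else {
        sk_in_panic = true;
        actions |= SK_DUMP;
    }
    return actions;
}
```

 In this model, the first call returns SK_REBOOT | SK_DUMP, while a
 second call made before the reboot completes returns
 SK_REBOOT | SK_NOSYNC, which corresponds to the "immediate reboot"
 case above.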

 >  Let us talk about another solution. Why did the "sync" fail, or why is the
 >  scsi driver too horked?

 My guess is that your LKM corrupted some critical kernel data
 structure. Keep in mind that this is a monolithic kernel: an LKM
 runs in the kernel's own address space, so it's always going to
 be possible for an LKM to screw things up very badly, so badly
 that even a core dump is impossible.

 Frederick