Subject: Re: Qube2 crashes every night
To: NetBSD Cobalt list <port-cobalt@netbsd.org>
From: Andy Ruhl <acruhl@gmail.com>
List: tech-kern
Date: 11/08/2005 05:25:55
On 11/8/05, Andy Ruhl <acruhl@gmail.com> wrote:
> On 11/7/05, Andy Ruhl <acruhl@gmail.com> wrote:
> > On 11/5/05, Pete Rushmere <pete@rushmere.org> wrote:
> > > Andy,
> > >
> > > Same time every night? There's a cron job that runs at 3 every
> > > morning and does some disk clean up stuff... amongst other things.
> > >
> > > Kind Regards,
> > > Pete.
> > >
> > >
> > > At 12:58 05/11/2005, Andy Ruhl wrote:
> > > >I recently built release-3 for my Qube2. It had a bad disk (I think?)
> > > >so I replaced it with an IBM 60 gig drive.
> > > >
> > > >Ever since I did that, the Qube crashes every night. I'm guessing
> > > >while doing updatedb.
> > > >
> > > >Here's the end of the dmesg:
> > > >
> > > >dev = 0x600 bno = 26929766 bsize = 16384, size = 16384, fs = /
> > > >panic: blkfree: bad size
> > > >
> > > >I'll see if I can save the panic and get a backtrace from it.
> > > >
> > > >On searching on this, it seems to be either a bad disk or a bad controller?
> >
> > Ok, so maybe I spoke too soon earlier when another person on the list
> > bashed NetBSD for not being stable :)
> >
> > I put in another disk, one that I know works pretty good otherwise.
> >
> > I got another crash today, when I was doing a build of php5 and I was
> > ftp'ing some data off the Qube. Here's some output:
> >
> > db> bt
> > r5k_pdcache_wb_range_32+60 (cb72fe20,cb7303e0,5ea,5ea) ra 801da6bc sz 0
> > 801da60c+b0 (cb72fe20,cb7303e0,5ea,5ea) ra 0 sz 0
> > User-level: pid 13774.1
> >
> > And the end of the dmesg:
> >
> > root on wd0a dumps on wd0b
> > root file system type: ffs
> > trap: TLB miss (load or instr. fetch) in kernel mode
> > status=0x2403, cause=0x8008, epc=0x801d50a4, vaddr=0xcb730000
> > pid=13774 cmd=ftpd usp=0x7fffcd10 ksp=0xcb75fb08
> > db>
> >
> > I'll see if I can get the crash to go to the dump device and go from there.
> >
> > Thanks for any help. I'll start searching on this stuff in the morning.
>
> Copying tech-kern this time.
>
> And again:
>
> Here's the bt output:
> db> bt
> 801c9564+214 (89ffe000,0,bc800000,d) ra 801475ec sz 0
> panic+190 (89ffe000,d,bc800000,65) ra 800d3fb8 sz 40
> 800d38b8+700 (89ffe000,d,bc800000,65) ra 0 sz 0
> User-level: pid 6.1
>
> And here's the dmesg:
>
> root on wd0a dumps on wd0b
> root file system type: ffs
> dev = 0x600, bno = 6179277 bsize = 16384, size = 16384, fs = /
> panic: blkfree: bad size
> db>
>
> This is a generic release-3 kernel. All I've done is I've changed some
> of the vm. sysctls to see how it affects things based on another
> thread I've been following recently. I also have softdeps set.
>
> Also, I notice this because an ssh session no longer responds to
> commands, so I log in via the serial console. I have ddb.fromconsole=0
> (default is 1) so it doesn't cause a panic (I had that problem
> before).
>
> This may be related to some other open bugs, but I'm not sure. I could
> try a -current kernel I suppose.
>
> Thanks.
And again this morning when using ftp (sorry if this is getting old,
I won't post too much more of this stuff):
login: trap: TLB miss (load or instr. fetch) in kernel mode
status=0x2403, cause=0x8008, epc=0x801d50a4, vaddr=0xcb744000
pid=1023 cmd=ftpd usp=0x7fffcd10 ksp=0xcb765b08
Stopped in pid 1023.1 (ftpd) at netbsd:r5k_pdcache_wb_range_32+0x60: cache 0x19,0x1e0(a0)
db> bt
r5k_pdcache_wb_range_32+60 (cb743e20,cb7443e0,5ea,5ea) ra 801da6bc sz 0
801da60c+b0 (cb743e20,cb7443e0,5ea,5ea) ra 0 sz 0
User-level: pid 1023.1
db> ps
PID   PPID  PGRP  UID   S  FLAGS    LWPS  COMMAND     WAIT
745   1     745   0     2  0x4002   1     getty       ttyin
695   1     695   0     2  0        1     ntpd        pause
381   400   381   0     2  0x4002   1     tcsh        ttyin
400   399   400   1000  2  0x4102   1     su          wait
399   350   399   1000  2  0x4002   1     tcsh        pause
350   373   350   0     2  0x4103   1     login       wait
373   365   365   0     2  0x4000   1     telnetd     poll
338   1     338   0     2  0        1     cron        nanosle
365   1     365   0     2  0        1     inetd       kqread
300   1     300   0     2  0        1     sshd        select
182   0     0     0     2  0x20200  1     nfsio       nfsidl
171   0     0     0     2  0x20200  1     nfsio       nfsidl
174   0     0     0     2  0x20200  1     nfsio       nfsidl
180   0     0     0     2  0x20200  1     nfsio       nfsidl
120   1     120   0     2  0        1     syslogd
7     0     0     0     2  0x20200  1     aiodoned    aiodone
6     0     0     0     2  0x20200  1     ioflush     syncer
5     0     0     0     2  0x20200  1     pagedaemon  pgdaemo
4     0     0     0     2  0x20200  1     atabus1     atath
3     0     0     0     2  0x20200  1     atabus0     atath
2     0     0     0     2  0x20200  1     cryptoret   crypto_
1     0     1     0     2  0x4000   1     init        wait
0     -1    0     0     2  0x20200  1     swapper     schedul
db>
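
A quick arithmetic check on the two ftpd traps, offered as a sketch rather than a diagnosis. It assumes the first argument shown by bt is still in register a0 at the faulting instruction, and that pages are 4 KiB; neither is confirmed by the output above. The function names are made up for illustration.

```python
# Sanity check on the numbers in the two ftpd traps above.
# Assumptions (not confirmed by the logs): the first bt argument is still
# in register a0 at the faulting instruction, and pages are 4 KiB.

PAGE_SIZE = 4096  # assumed page size


def cache_op_target(a0, offset=0x1E0):
    """Address touched by `cache 0x19,0x1e0(a0)` for a given a0."""
    return (a0 + offset) & 0xFFFFFFFF


def exc_code(cause):
    """ExcCode field (bits 6..2) of the MIPS Cause register."""
    return (cause >> 2) & 0x1F


# (first bt argument, reported vaddr) for each ftpd crash
crashes = [
    (0xCB72FE20, 0xCB730000),
    (0xCB743E20, 0xCB744000),
]

for a0, vaddr in crashes:
    target = cache_op_target(a0)
    # address, whether it equals the reported vaddr, whether it is page-aligned
    print(hex(target), target == vaddr, target % PAGE_SIZE == 0)

# ExcCode for cause=0x8008
print(exc_code(0x8008))
```

If those assumptions hold, a0 + 0x1e0 equals the reported vaddr in both traps and sits on the first byte of a page, which would suggest the writeback loop steps onto a page with no TLB mapping. ExcCode 2 is the MIPS code for a TLB miss on load or instruction fetch, which agrees with the trap message.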
Thanks.
Andy