Subject: Re: Qube2 crashes every night
To: NetBSD Cobalt list <port-cobalt@netbsd.org>
From: Andy Ruhl <acruhl@gmail.com>
List: tech-kern
Date: 11/08/2005 05:25:55
On 11/8/05, Andy Ruhl <acruhl@gmail.com> wrote:
> On 11/7/05, Andy Ruhl <acruhl@gmail.com> wrote:
> > On 11/5/05, Pete Rushmere <pete@rushmere.org> wrote:
> > > Andy,
> > >
> > >          Same time every night? There's a cron job that runs at 3 every
> > > morning and does some disk clean up stuff... amongst other things.
> > >
> > > Kind Regards,
> > > Pete.
> > >
> > >
> > > At 12:58 05/11/2005, Andy Ruhl wrote:
> > > > >I recently built release-3 for my Qube2. It had a bad disk (I think?)
> > > >so I replaced it with an IBM 60 gig drive.
> > > >
> > > >Ever since I did that, the Qube crashes every night. I'm guessing
> > > >while doing updatedb.
> > > >
> > > >Here's the end of the dmesg:
> > > >
> > > > >dev = 0x600 bno = 26929766 bsize = 16384, size = 16384, fs = /
> > > >panic: blkfree: bad size
> > > >
> > > >I'll see if I can save the panic and get a backtrace from it.
> > > >
> > > > >On searching on this, it seems to be either a bad disk or a bad controller?
> >
> > Ok, so maybe I spoke too soon earlier when another person on the list
> > bashed NetBSD for not being stable :)
> >
> > I put in another disk, one that I know works pretty good otherwise.
> >
> > I got another crash today, when I was doing a build of php5 and I was
> > ftp'ing some data off the Qube. Here's some output:
> >
> > db> bt
> > r5k_pdcache_wb_range_32+60 (cb72fe20,cb7303e0,5ea,5ea) ra 801da6bc sz 0
> > 801da60c+b0 (cb72fe20,cb7303e0,5ea,5ea) ra 0 sz 0
> > User-level: pid 13774.1
> >
> > And the end of the dmesg:
> >
> > root on wd0a dumps on wd0b
> > root file system type: ffs
> > trap: TLB miss (load or instr. fetch) in kernel mode
> > status=0x2403, cause=0x8008, epc=0x801d50a4, vaddr=0xcb730000
> > pid=13774 cmd=ftpd usp=0x7fffcd10 ksp=0xcb75fb08
> > db>
> >
> > I'll see if I can get the crash to go to the dump device and go from there.
> >
> > Thanks for any help. I'll start searching on this stuff in the morning.
>
> Copying tech-kern this time.
>
> And again:
>
> Here's the bt output:
> db> bt
> 801c9564+214 (89ffe000,0,bc800000,d) ra 801475ec sz 0
> panic+190 (89ffe000,d,bc800000,65) ra 800d3fb8 sz 40
> 800d38b8+700 (89ffe000,d,bc800000,65) ra 0 sz 0
> User-level: pid 6.1
>
> And here's the dmesg:
>
> root on wd0a dumps on wd0b
> root file system type: ffs
> dev = 0x600, bno = 6179277 bsize = 16384, size = 16384, fs = /
> panic: blkfree: bad size
> db>
>
> This is a generic release-3 kernel. All I've done is I've changed some
> of the vm. sysctls to see how it affects things based on another
> thread I've been following recently. I also have softdeps set.
>
> Also, I notice this because an ssh session no longer responds to
> commands, so I log in via the serial console. I have ddb.fromconsole=0
> (default is 1) so it doesn't cause a panic (I had that problem
> before).
>
> This may be related to some other open bugs, but I'm not sure. I could
> try a -current kernel I suppose.
>
> Thanks.
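
For anyone reading along, here is a rough sketch of why both block numbers above could trip the "blkfree: bad size" check. It mimics, from memory, the sanity test FFS applies before freeing a block: the size must not exceed the block size, must be a whole number of fragments, and the fragments being freed must not run past the end of the block that contains bno. The block size (16384) comes from the panic message. The fragment size of 2048 is an ASSUMPTION (a common newfs pairing with 16K blocks); it is not stated anywhere in this thread, and the helper names are only illustrative.

```python
# Values taken from the panic messages in this thread.
FS_BSIZE = 16384          # "bsize = 16384"
PANIC_BLOCKS = [26929766, 6179277]   # the two "bno" values reported
PANIC_SIZE = 16384        # "size = 16384"

# ASSUMPTION: fragment size is 2048 bytes, giving 8 fragments per block.
# The thread never states this; check with dumpfs on the real file system.
FS_FSIZE = 2048
FS_FRAG = FS_BSIZE // FS_FSIZE


def fragnum(bno):
    """Position of fragment address bno within its block (0 .. FS_FRAG-1)."""
    return bno & (FS_FRAG - 1)


def numfrags(size):
    """Number of fragments covered by size bytes."""
    return size // FS_FSIZE


def fragoff(size):
    """Bytes left over when size is not a whole number of fragments."""
    return size & (FS_FSIZE - 1)


def blkfree_bad_size(bno, size):
    """Illustrative version of the check that leads to the panic."""
    return (size > FS_BSIZE
            or fragoff(size) != 0
            or fragnum(bno) + numfrags(size) > FS_FRAG)


if __name__ == "__main__":
    for bno in PANIC_BLOCKS:
        print(bno, "fragnum =", fragnum(bno),
              "bad size =", blkfree_bad_size(bno, PANIC_SIZE))
```

Under that assumption, both panics are a request to free a full 16K block starting at a fragment address that is not block-aligned (offsets 6 and 5 within the block). That points at a corrupted block pointer rather than a bad size as such, although it says nothing about where the corruption came from.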

And again this morning when using ftp:

(Sorry if it's getting old, I won't post too much more of this stuff):

login: trap: TLB miss (load or instr. fetch) in kernel mode
status=0x2403, cause=0x8008, epc=0x801d50a4, vaddr=0xcb744000
pid=1023 cmd=ftpd usp=0x7fffcd10 ksp=0xcb765b08
Stopped in pid 1023.1 (ftpd) at netbsd:r5k_pdcache_wb_range_32+0x60:    cache 0x19,0x1e0(a0)
db> bt
r5k_pdcache_wb_range_32+60 (cb743e20,cb7443e0,5ea,5ea) ra 801da6bc sz 0
801da60c+b0 (cb743e20,cb7443e0,5ea,5ea) ra 0 sz 0
User-level: pid 1023.1
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 745              1      745          0 2  0x4002    1            getty   ttyin
 695              1      695          0 2       0    1             ntpd   pause
 381            400      381          0 2  0x4002    1             tcsh   ttyin
 400            399      400       1000 2  0x4102    1               su    wait
 399            350      399       1000 2  0x4002    1             tcsh   pause
 350            373      350          0 2  0x4103    1            login    wait
 373            365      365          0 2  0x4000    1          telnetd    poll
 338              1      338          0 2       0    1             cron nanosle
 365              1      365          0 2       0    1            inetd  kqread
 300              1      300          0 2       0    1             sshd  select
 182              0        0          0 2 0x20200    1            nfsio  nfsidl
 171              0        0          0 2 0x20200    1            nfsio  nfsidl
 174              0        0          0 2 0x20200    1            nfsio  nfsidl
 180              0        0          0 2 0x20200    1            nfsio  nfsidl
 120              1      120          0 2       0    1          syslogd
 7                0        0          0 2 0x20200    1         aiodoned aiodone
 6                0        0          0 2 0x20200    1          ioflush  syncer
 5                0        0          0 2 0x20200    1       pagedaemon pgdaemo
 4                0        0          0 2 0x20200    1          atabus1   atath
 3                0        0          0 2 0x20200    1          atabus0   atath
 2                0        0          0 2 0x20200    1        cryptoret crypto_
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper schedul
db>
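
A small sketch of how the numbers in the two ftpd traps fit together, in case it helps whoever looks at this. It decodes the exception code from the MIPS Cause register (bits 6..2), and it adds the 0x1e0 offset from the faulting "cache 0x19,0x1e0(a0)" instruction to the first backtrace argument, on the ASSUMPTION that ddb's first argument reflects a0 at the time of the trap. The 4096-byte page size is also an assumption, and the function names are only illustrative.

```python
# ASSUMPTION: 4 KB pages, as is usual for NetBSD on this class of MIPS CPU.
PAGE_SIZE = 4096

# Offset taken from the faulting instruction: cache 0x19,0x1e0(a0)
CACHE_OP_OFFSET = 0x1E0

# (first bt argument, reported vaddr) for the two ftpd crashes in this thread.
CRASHES = [
    (0xCB72FE20, 0xCB730000),
    (0xCB743E20, 0xCB744000),
]


def exc_code(cause):
    """Extract the ExcCode field (bits 6..2) from a MIPS Cause value."""
    return (cause >> 2) & 0x1F


def cache_op_address(a0, offset=CACHE_OP_OFFSET):
    """Effective address of 'cache op,offset(a0)' on a 32-bit CPU."""
    return (a0 + offset) & 0xFFFFFFFF


def same_page(addr_a, addr_b):
    """True if both addresses fall within the same page."""
    return addr_a // PAGE_SIZE == addr_b // PAGE_SIZE


if __name__ == "__main__":
    # ExcCode 2 is TLBL: TLB miss on load or instruction fetch.
    print("ExcCode:", exc_code(0x8008))
    for a0, vaddr in CRASHES:
        target = cache_op_address(a0)
        print(hex(a0), "->", hex(target),
              "matches vaddr:", target == vaddr,
              "same page as a0:", same_page(a0, target))
```

In both crashes a0 + 0x1e0 lands exactly on the reported vaddr, which is the first byte of the page after the one a0 points into. So the writeback loop appears to walk off the end of a mapped page into one that has no TLB entry. Whether the range passed in is wrong or the next page should have been mapped is not something these numbers can tell us.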

Thanks.

Andy