Subject: ccd(4): kernel memory corruption?
To: None <current-users@netbsd.org>
From: Jukka Salmi <j+nbsd@2007.salmi.ch>
List: current-users
Date: 06/26/2007 12:04:10
Hi,
I've been using a ccd(4) striping (parts of) two disks on a -current
i386 system for some time without any problems. Recently I added two
more disks to the system and tried to use parts of them for another
ccd(4):
$ cat /etc/ccd.conf
# ileave flags
ccd0 128 none /dev/wd2f /dev/wd3f
ccd1 128 none /dev/wd0e /dev/wd1e
While booting, both ccd(4)s are configured just fine. But once the
system is running multiuser, if I unconfigure ccd1 and then try to
configure it again, the attempt fails most of the time:
$ sudo ccdconfig -vu ccd1
ccd1 unconfigured
$ sudo ccdconfig -vc ccd1 128 none /dev/wd0e /dev/wd1e
ccdconfig: ioctl (CCDIOCSET): /dev/ccd1d: No such file or directory
But sometimes, without my changing anything, it suddenly works:
$ sudo ccdconfig -vc ccd1 128 none /dev/wd0e /dev/wd1e
ccd1: 2 components (wd0e, wd1e), 201706752 blocks interleaved at 128 blocks
Furthermore, it always works in single-user mode. On the other hand, I
can always unconfigure and reconfigure my old ccd (ccd0), even in
multiuser mode.
I traced both successful and failing ccdconfig calls for ccd1:
Success:
3137 1 ccdconfig CALL __stat30(0xbfbfeb90,0xbfbfe0b8)
3137 1 ccdconfig NAMI "/dev/wd0e"
3137 1 ccdconfig RET __stat30 0
3137 1 ccdconfig CALL __stat30(0xbfbfeb9a,0xbfbfe0b8)
3137 1 ccdconfig NAMI "/dev/wd1e"
3137 1 ccdconfig RET __stat30 0
3137 1 ccdconfig CALL open(0x804e030,2,0x1a0)
3137 1 ccdconfig NAMI "/dev/ccd1d"
3137 1 ccdconfig RET open 3
3137 1 ccdconfig CALL ioctl(3,CCDIOCSET,0xbfbfe11c)
3137 1 ccdconfig GIO fd 3 wrote 24 bytes
"@\M-`\^D\b\^B\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
3137 1 ccdconfig NAMI "/dev/wd0e"
3137 1 ccdconfig NAMI "/dev/wd1e"
3137 1 ccdconfig GIO fd 3 read 24 bytes
Failure:
3751 1 ccdconfig CALL __stat30(0xbfbfeb90,0xbfbfe0b8)
3751 1 ccdconfig NAMI "/dev/wd0e"
3751 1 ccdconfig RET __stat30 0
3751 1 ccdconfig CALL __stat30(0xbfbfeb9a,0xbfbfe0b8)
3751 1 ccdconfig NAMI "/dev/wd1e"
3751 1 ccdconfig RET __stat30 0
3751 1 ccdconfig CALL open(0x804e030,2,0x1a0)
3751 1 ccdconfig NAMI "/dev/ccd1d"
3751 1 ccdconfig RET open 3
3751 1 ccdconfig CALL ioctl(3,CCDIOCSET,0xbfbfe11c)
3751 1 ccdconfig GIO fd 3 wrote 24 bytes
"@\M-`\^D\b\^B\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
3751 1 ccdconfig NAMI "/dev/wd0e"
3751 1 ccdconfig NAMI "œ»ðÿ¿¿"
3751 1 ccdconfig RET ioctl -1 errno 2 No such file or directory
Note the strange characters where I would have expected "/dev/wd1e".
I added some debug printfs to ccdioctl() in sys/dev/ccd.c and noticed
that *(ccio->ccio_disks+1) is NULL even though ccio->ccio_ndisks is 2,
which leaves cpp[1] containing garbage. Unfortunately, I'm not familiar
enough with kernel code to track down the problem.
Help is appreciated!
TIA, Jukka
-- 
bashian roulette:
$ ((RANDOM%6)) || rm -rf ~