Subject: ccd(4): kernel memory corruption?
To: None <current-users@netbsd.org>
From: Jukka Salmi <j+nbsd@2007.salmi.ch>
List: current-users
Date: 06/26/2007 12:04:10
Hi,

I've been using a ccd(4) striping (parts of) two disks on a -current
i386 system without any problems for some time. Recently I added two
additional disks to the system and tried to use parts of them for
another ccd(4).

$ cat /etc/ccd.conf
#	ileave	flags
ccd0	128	none	/dev/wd2f /dev/wd3f
ccd1	128	none	/dev/wd0e /dev/wd1e

At boot, both ccd(4)s are configured just fine. But once the system is
running multiuser and I unconfigure ccd1, configuring it again fails
most of the time:

$ sudo ccdconfig -vu ccd1
ccd1 unconfigured
$ sudo ccdconfig -vc ccd1 128 none /dev/wd0e /dev/wd1e
ccdconfig: ioctl (CCDIOCSET): /dev/ccd1d: No such file or directory

But sometimes - without changing anything - it suddenly works:

$ sudo ccdconfig -vc ccd1 128 none /dev/wd0e /dev/wd1e
ccd1: 2 components (wd0e, wd1e), 201706752 blocks interleaved at 128 blocks

Furthermore, it always works after going to single-user mode. My old
ccd (ccd0), however, can always be unconfigured and configured again,
even in multiuser.

I traced both successful and failing ccdconfig calls for ccd1:

Success:

  3137      1 ccdconfig CALL  __stat30(0xbfbfeb90,0xbfbfe0b8)
  3137      1 ccdconfig NAMI  "/dev/wd0e"
  3137      1 ccdconfig RET   __stat30 0
  3137      1 ccdconfig CALL  __stat30(0xbfbfeb9a,0xbfbfe0b8)
  3137      1 ccdconfig NAMI  "/dev/wd1e"
  3137      1 ccdconfig RET   __stat30 0
  3137      1 ccdconfig CALL  open(0x804e030,2,0x1a0)
  3137      1 ccdconfig NAMI  "/dev/ccd1d"
  3137      1 ccdconfig RET   open 3
  3137      1 ccdconfig CALL  ioctl(3,CCDIOCSET,0xbfbfe11c)
  3137      1 ccdconfig GIO   fd 3 wrote 24 bytes
       "@\M-`\^D\b\^B\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
  3137      1 ccdconfig NAMI  "/dev/wd0e"
  3137      1 ccdconfig NAMI  "/dev/wd1e"
  3137      1 ccdconfig GIO   fd 3 read 24 bytes

Failure:

  3751      1 ccdconfig CALL  __stat30(0xbfbfeb90,0xbfbfe0b8)
  3751      1 ccdconfig NAMI  "/dev/wd0e"
  3751      1 ccdconfig RET   __stat30 0
  3751      1 ccdconfig CALL  __stat30(0xbfbfeb9a,0xbfbfe0b8)
  3751      1 ccdconfig NAMI  "/dev/wd1e"
  3751      1 ccdconfig RET   __stat30 0
  3751      1 ccdconfig CALL  open(0x804e030,2,0x1a0)
  3751      1 ccdconfig NAMI  "/dev/ccd1d"
  3751      1 ccdconfig RET   open 3
  3751      1 ccdconfig CALL  ioctl(3,CCDIOCSET,0xbfbfe11c)
  3751      1 ccdconfig GIO   fd 3 wrote 24 bytes
       "@\M-`\^D\b\^B\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
  3751      1 ccdconfig NAMI  "/dev/wd0e"
  3751      1 ccdconfig NAMI  "œ»ðÿ¿¿"
  3751      1 ccdconfig RET   ioctl -1 errno 2 No such file or directory

Note the strange characters where I would have expected "/dev/wd1e".

I added some debug printfs to ccdioctl() in sys/dev/ccd.c and noticed
that *(ccio->ccio_disks+1) is NULL even though ccio->ccio_ndisks is 2,
causing cpp[1] to contain garbage, but I'm not familiar enough with the
kernel code to find the problem.

Help is appreciated!


TIA, Jukka

[1] http://mail-index.netbsd.org/port-sparc64/2007/01/03/0003.html

-- 
bashian roulette:
$ ((RANDOM%6)) || rm -rf ~