current-users: Re: newfs/vnd problems -- cg 0: bad magic number

Subject: Re: newfs/vnd problems -- cg 0: bad magic number
To: None <port-sgimips@netbsd.org>
From: sgimips NetBSD list <sgimips@mrynet.com>
List: current-users
Date: 02/08/2002 14:10:29
(I sent a similar reply to Wayne, but forgot to CC the lists.  I'm
 attempting to reconstruct my response here :)

> On Wed, 6 Feb 2002, sgimips NetBSD list wrote:
> 
> > Since I last build a miniroot for sgimips, last December,
> > I am now having the following problem:
> >
> > mod80 (355)# newfs /dev/rvnd0a
> > /dev/rvnd0a:    163840 sectors in 5120 cylinders of 1 tracks, 32 sectors
> >         80.0MB in 6 cyl groups (917 c/g, 14.33MB/g, 3328 i/g)
> > super-block backups (for fsck -b #) at:
> >      32,  29376,  58720,  88064, 117408, 146752,
> > cg 0: bad magic number
> >
> > I can't determine if this is a vnd problem or a newfs issue.
> > I can mount a disk image built previously (in December) and it fscks fine.
> >
> > Anyone recognise a problem I'm overlooking?
> 
> Scott,
> 
> It isn't a nice one to track down.  I have seen it several times on other
> ports.
> 
> Possible causes:
>  - Disklabel synchronization issues (a VND device gets a disk label)
>  - Cache coherency issues when using unaligned DMA in the wdsc driver.
>     The thorpej-mips-cache merge occurred in November, so it shouldn't be
>     a cause.  The wdsc.c rewrite also occurred in November, so it is an
>     unlikely cause.
>  - chrtoblktbl[] entries not correct (for the vnd major number)
>  - Raw disk transfers are not transferring the full byte count.

I've now realised that my source is NFS mounted, and that the problem does
not occur on real drives.  I therefore believe the four issues above can be
ruled out: newfs on vnd works fine when the image file is on a real drive,
for both raw and block devices.  When the filesystem image is a file on NFS,
it fails as above for both raw and block devices.
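
For reference, the message appears to come from newfs itself: as far as I
can tell, mkfs.c reads cylinder group 0 back from the device in its block
allocator and complains when the cg_magic field is not CG_MAGIC.  If so, the
error only says that what was read back differs from what was written, which
fits an NFS-backed image going wrong.  The Python sketch below repeats that
write-and-read-back check directly on a file, with vnd out of the path.  It
assumes the 4.4BSD FFS layout (cg_magic as the second 32-bit word of struct
cg, CG_MAGIC = 0x090255) and big-endian byte order as on sgimips;
make_cg_block, check_cg_block and write_then_check are hypothetical helpers,
not code from the NetBSD tree.

```python
import os
import struct

# Assumptions (4.4BSD-derived FFS, <ufs/ffs/fs.h>): struct cg starts with a
# 32-bit cg_firstfield followed by the 32-bit cg_magic, and a valid cylinder
# group holds CG_MAGIC there.  ">" (big-endian) matches sgimips; pass "<"
# for a little-endian image.
CG_MAGIC = 0x090255
CG_MAGIC_OFFSET = 4


def make_cg_block(size, byteorder=">"):
    """Return a zero-filled cylinder-group block with only cg_magic set."""
    block = bytearray(size)
    struct.pack_into(byteorder + "i", block, CG_MAGIC_OFFSET, CG_MAGIC)
    return bytes(block)


def check_cg_block(data, expected_len, byteorder=">"):
    """Classify a block read back: "ok", "short read" or "bad magic number"."""
    if len(data) != expected_len:
        return "short read"
    (magic,) = struct.unpack_from(byteorder + "i", data, CG_MAGIC_OFFSET)
    return "ok" if magic == CG_MAGIC else "bad magic number"


def write_then_check(path, offset, block, byteorder=">"):
    """Write block at offset in path, force it out, then read it back."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data past Python's buffer to the fs
        f.seek(offset)
        data = f.read(len(block))
    return check_cg_block(data, len(block), byteorder)
```

Pointed at a file on the NFS mount (the path, offset and 2048-byte block
size are arbitrary choices), anything other than "ok" would implicate NFS
rather than vnd.  An "ok" only shows that plain userland I/O works; vnd's own
route to the backing file is not exercised.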

> The chrtoblktbl[] entries look to be OK, so other things to check:
> 
> First, check to see if it behaves on a block device (more than likely it
> will)
> 
> Then check whether it is a local disk issue by using NFS- or MFS-based
> filesystems (with the rvnd device).
> 
> Using a binary approach, build kernels from the tree at different dates
> to determine when the problem started.  This is a time-consuming task,
> but often more productive than hit-and-miss techniques.  Start by going
> back to the December kernel, when you think it worked...
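
Wayne's binary approach amounts to bisecting over source dates.  A minimal
sketch, assuming the regression is monotonic (present in every kernel after
it first appears) and assuming a hypothetical is_bad(day) callback that
checks out the tree for that day (for example with cvs update -D), builds and
boots the kernel, and reports whether newfs on the NFS-backed vnd fails:

```python
from datetime import date, timedelta


def bisect_dates(good, bad, is_bad):
    """Return the first day in (good, bad] for which is_bad(day) is True.

    Assumes `good` is a known-good date, `bad` a known-bad one, and that the
    problem, once introduced, stays.  `is_bad` is a hypothetical callback
    that builds and tests a kernel from the tree as of that day.
    """
    lo, hi = good, bad  # invariant: lo is good, hi is bad
    while (hi - lo).days > 1:
        mid = lo + timedelta(days=(hi - lo).days // 2)
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

With December 2 as the known-good kernel and early February as known-bad,
the roughly two-month window should need only six or seven kernel builds,
which is why this tends to beat guessing at individual commits.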

I have now tried this on a kernel built on December 2 last year (NetBSD
1.5Z, as opposed to the current 1.5ZA), and there are no problems at all.

For now, I've reverted to that older kernel for my sgimips release work.
I'm setting up another R4400 to try to narrow down where the problem lies
(NFS?) and when the change occurred.

Thanks Wayne :)

-scott

> -- 
> Wayne Knowles			NetBSD/mipsco port maintainer
> wdk@netbsd.org			http://www.netbsd.org