Subject: swap_pager_clean: clean of page %x failed
To: None <port-sparc@NetBSD.ORG>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
List: port-sparc
Date: 07/02/1996 15:23:33

I was investigating the swap_pager_clean bug (the one that produces
"clean of page %x failed" complaints for no visible reason).  I believe
I may have found the problem.  I'm certain I've found _a_ problem.

My disk was partitioned thusly:

 bbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
cccccccccccccccccccccccccccccccccccccccccccccccccc
 ddddddddddddddddddddddddddddddddddddddddeeeeeeeee

This was because I was planning to swap the partitions, to see whether
the problem was just that b came before a; after copying a to d I was
going to repartition so that a was where d is and b was where e is, then
reboot.
(All this was done while booted off of a second disk I managed to swipe
for the purpose.)

First, I took a dd image of a, in case I really screwed something up.

Then I did "dd if=/dev/rsd0a bs=433152 of=/dev/rsd0d" (a cylinder is
433152 bytes, according to the label, on that disk).  dd complained
about "/dev/rsd0d: Read-only file system.".  What the hell?, thought I;
I checked, twice, that sd0d was not mounted.  I checked that /dev/rsd0d
was in fact the character special device it should be.  I eventually
rebooted to clear any incorrect soft state and redid the dd command,
carefully not touching sd0 after booting before doing the dd.

Same result.

I had a copy of the NetBSD source tree on another machine.  I went
looking for that errant EROFS.  Nothing in /sys/scsi.  I searched
/sys/arch/sparc/*/*.[csh] and found it in arch/sparc/sparc/disksubr.c,
bounds_check_with_label(), and nowhere else.  (I didn't check other
places in the kernel.)

int
bounds_check_with_label(bp, lp, wlabel)
	struct buf *bp;
	struct disklabel *lp;
	int wlabel;
{
#define dkpart(dev) (minor(dev) & 7)

	struct partition *p = lp->d_partitions + dkpart(bp->b_dev);
	int labelsect = lp->d_partitions[0].p_offset;
	int maxsz = p->p_size;
	int sz = (bp->b_bcount + DEV_BSIZE - 1) >> DEV_BSHIFT;

	/* overwriting disk label ? */
	/* XXX should also protect bootstrap in first 8K */
	if (bp->b_blkno + p->p_offset <= LABELSECTOR + labelsect &&
	    (bp->b_flags & B_READ) == 0 && wlabel == 0) {
		bp->b_error = EROFS;
		goto bad;
	}

Looking at this code, it appears that everything before the start of
the a partition is write-protected!  (Except when accessed through
RAW_PART, since in that case sd.c doesn't call
bounds_check_with_label().)
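
A minimal user-space sketch of that test may make the arithmetic easier
to follow.  It is not the kernel code: write_rejected() and the sample
sector numbers are hypothetical, LABELSECTOR is assumed to be 0 on this
port, and offsets are in 512-byte sectors (so the 433152-byte cylinder
is 846 sectors).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: the disk label lives in sector 0 on this port. */
#define LABELSECTOR 0

/*
 * Hypothetical model of the label-protection test quoted above.
 *   blkno        block number relative to the start of the partition
 *   part_offset  absolute start sector of the partition being written
 *   labelsect    offset the check uses as the label's base
 *   is_read      nonzero for a read request
 *   wlabel       nonzero if label writes have been enabled
 */
static bool
write_rejected(long blkno, long part_offset, long labelsect,
    int is_read, int wlabel)
{
	assert(blkno >= 0 && part_offset >= 0 && labelsect >= 0);
	return blkno + part_offset <= LABELSECTOR + labelsect &&
	    !is_read && !wlabel;
}
```

With a hypothetical layout in which b and d start at sector 846 and a
starts at sector 8460, write_rejected(0, 846, 8460, 0, 0) is true, and
it stays true for every block of b, since all of b lies below a's
offset.  Reads, and writes with wlabel set, are not affected.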

Honestly!  Shouldn't that be

	int labelsect = lp->d_partitions[RAW_PART].p_offset;

or even, since this is SPARC-specific code, shouldn't we just delete the
labelsect variable altogether and assume it's zero?  The label _is_
always in sector zero, isn't it?
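
To compare the two choices of labelsect side by side, here is a small
sketch under stated assumptions.  The struct, the layout table and
write_rejected() are hypothetical stand-ins, not the kernel's
definitions; RAW_PART is assumed to be 2 (the c partition) and
LABELSECTOR to be 0.  The cylinder ranges are read off the schematic
diagram, at 846 sectors per cylinder, and only writes with wlabel == 0
are modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define LABELSECTOR 0	/* assumption: label in sector 0 */
#define RAW_PART    2	/* assumption: c is the raw partition */
#define NPARTS      8	/* matches the "& 7" in dkpart() */

struct part { long p_offset; long p_size; };	/* hypothetical; sectors */

/* Schematic layout from the first diagram, 846 sectors per cylinder. */
static const struct part layout[NPARTS] = {
	{  8460, 33840 },	/* a: cylinders 10-49 */
	{   846,  7614 },	/* b: cylinders 1-9 */
	{     0, 42300 },	/* c: whole disk */
	{   846, 33840 },	/* d: cylinders 1-40 */
	{ 34686,  7614 },	/* e: cylinders 41-49 */
};

/*
 * Would a write to block blkno of partition target be refused, if the
 * label base is taken from partition labelpart?
 */
static bool
write_rejected(const struct part *parts, int labelpart, int target,
    long blkno)
{
	long labelsect;

	assert(labelpart >= 0 && labelpart < NPARTS);
	assert(target >= 0 && target < NPARTS);
	labelsect = parts[labelpart].p_offset;
	return blkno + parts[target].p_offset <= LABELSECTOR + labelsect;
}
```

Taking the base from partition 0, a write to block 0 of b or d is
refused; taking it from RAW_PART, the same writes go through, and only
absolute sector 0 remains protected.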

To check this, I repartitioned the disk with a as a little one-cylinder
partition at offset zero and f where a used to be.  I left b alone.
That is,

abbbbbbbbbffffffffffffffffffffffffffffffffffffffff
cccccccccccccccccccccccccccccccccccccccccccccccccc
 ddddddddddddddddddddddddddddddddddddddddeeeeeeeee


I then booted with -a, forced root to sd0f by hand, and exercised it.
I relinked a kernel, which had invariably provoked swap_pager_clean
complaints under the old partitioning.  No complaints at all.

Of _course_ swapping misbehaved before; the driver was refusing to
write to the swap area at all!

I'm going to send-pr this in a few minutes, once I have a patch tested.
This note is mostly for those who read port-sparc but not netbsd-bugs.

					der Mouse

			    mouse@collatz.mcrcim.mcgill.edu
		    01 EE 31 F6 BB 0C 34 36  00 F3 7C 5A C1 A0 67 1D