Subject: kern/11983: under certain conditions, mkdir(2) takes too long.
To: None <gnats-bugs@gnats.netbsd.org>
From: Herb Peyerl <hpeyerl@beer.org>
List: netbsd-bugs
Date: 01/17/2001 13:12:18
>Number:         11983
>Category:       kern
>Synopsis:       under certain conditions, mkdir(2) takes too long.
>Confidential:   yes
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 17 13:12:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Herb Peyerl
>Release:        NetBSD-1.5
>Organization:
>Environment:
	
System: NetBSD nlager 1.5 NetBSD 1.5 (LAGER) #4: Tue Jan 16 05:42:44 MST 2001 hpeyerl@nlager:/usr/src/sys/arch/i386/compile/LAGER i386


>Description:
On my Abit KA7 (Athlon 800, 128 MB RAM), with two 45 GB IBM disks
mirrored (RAID 1) into one large filesystem, mkdir(2) takes 17 seconds
to complete and performs 3500 or more I/Os.  This was discussed on
current-users during the week of 01/14/2001.
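A quick back-of-the-envelope check on the figures above, offered as a sketch: it treats "3500+" as exactly 3500, so the result is an upper bound on the true average. Seventeen seconds over 3500 I/Os is about 4.9 ms per I/O. One plausible reading (an inference, not something the measurement proves) is that the time goes into many small disk accesses waited for one after another rather than into data volume. The helper name avg_ms_per_io is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: average elapsed time per I/O, in milliseconds.
 * With elapsed_s = 17 and nios = 3500 (the figures in this report) the
 * result is about 4.857 ms; since the report says "3500+", the real
 * average can only be smaller.
 */
double
avg_ms_per_io(double elapsed_s, unsigned long nios)
{
	assert(nios > 0);	/* guard against division by zero */
	return elapsed_s * 1000.0 / (double)nios;
}
```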

Here are some particulars:

wd0 and wd1 disklabels:

	# /dev/rwd0d:
	type: ESDI
	disk: IBM-DTLA-307045 
	label: fictitious
	flags:
	bytes/sector: 512
	sectors/track: 63
	tracks/cylinder: 16
	sectors/cylinder: 1008
	cylinders: 16383
	total sectors: 90069840
	rpm: 3600
	interleave: 1
	trackskew: 0
	cylinderskew: 0
	headswitch: 0           # microseconds
	track-to-track seek: 0  # microseconds
	drivedata: 0 

	8 partitions:
	#        size   offset     fstype   [fsize bsize   cpg]
	  a:    40320        0     4.2BSD     1024  8192    16   # (Cyl.    0 - 39)
	  d: 90069840        0     unused        0     0         # (Cyl.    0 - 89354)
	  e: 89529552    40320       RAID                        # (Cyl.   40 - 88858)
	  h:   499968 89569872       swap                        # (Cyl. 88859 - 89354)

raid1 disklabel:
	# /dev/rraid1d:
	type: RAID
	disk: raid
	label: default label
	flags:
	bytes/sector: 512
	sectors/track: 32
	tracks/cylinder: 1
	sectors/cylinder: 32
	cylinders: 2797796
	total sectors: 89529472
	rpm: 3600
	interleave: 1
	trackskew: 0
	cylinderskew: 0
	headswitch: 0           # microseconds
	track-to-track seek: 0  # microseconds
	drivedata: 0 

	4 partitions:
	#        size   offset     fstype   [fsize bsize   cpg]
	  a: 89529472        0     4.2BSD     1024  8192   256   # (Cyl.    0 - 2797795)
	  b:    49968 89029504       swap                        # (Cyl. 2782172 - 2783733*)
	  d: 89529472        0     unused        0     0         # (Cyl.    0 - 2797795)
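The raid1a entry above gives 32 sectors/cylinder and cpg 256, i.e. 8192 sectors (4 MB) per cylinder group. Assuming newfs used those label values unchanged (it can adjust cpg, so treat this as an estimate), the 89529472-sector partition holds roughly 10929 cylinder groups, each with its own cylinder-group block. A search that visits a large fraction of them would touch far more blocks than the 1658 buffers reported in the dmesg output below, which would be consistent with the buffer-cache hypothesis quoted in the Fix section. The helper est_ncg is hypothetical and only does the arithmetic.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimate the number of FFS cylinder groups on a
 * partition from disklabel geometry.  Assumes newfs used the label's
 * sectors/cylinder and cpg values as-is; a partial final group is
 * counted as a full group (rounded up).
 */
unsigned long
est_ncg(unsigned long part_sectors, unsigned long sec_per_cyl,
    unsigned long cpg)
{
	assert(sec_per_cyl > 0 && cpg > 0);
	unsigned long sec_per_cg = sec_per_cyl * cpg;	/* sectors per group */

	return (part_sectors + sec_per_cg - 1) / sec_per_cg;
}
```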

/etc/raid1.conf:
	START array
	1 1 0

	START disks
	/dev/wd0e
	#/dev/wd1e

	START layout
	32 1 1 1

	START queue
	fifo 100

dmesg:
	NetBSD 1.5 (LAGER) #4: Tue Jan 16 05:42:44 MST 2001
	    hpeyerl@nlager:/usr/src/sys/arch/i386/compile/LAGER
	cpu0: AMD K7 (Athlon) (686-class)
	total memory = 127 MB
	avail memory = 112 MB
	using 1658 buffers containing 6632 KB of memory
	BIOS32 rev. 0 found at 0xfb470
	mainbus0 (root)
	pci0 at mainbus0 bus 0: configuration mode 1
	pci0: i/o space, memory space enabled
	pchb0 at pci0 dev 0 function 0
	pchb0: VIA Technologies VT8371 (Apollo KX133) Host Bridge (rev. 0x02)
	ppb0 at pci0 dev 1 function 0: VIA Technologies VT8371 (Apollo KX133) PCI-PCI Bridge (rev. 0x00)
	pci1 at ppb0 bus 1
	pci1: i/o space, memory space enabled
	pcib0 at pci0 dev 7 function 0
	pcib0: VIA Technologies VT82C686A (Apollo KX133) PCI-ISA Bridge (rev. 0x22)
	pciide0 at pci0 dev 7 function 1: VIA Tech VT82C586A IDE Controller (rev. 0x10)
	pciide0: bus-master DMA support present
	pciide0: primary channel configured to compatibility mode
	wd0 at pciide0 channel 0 drive 0: <IBM-DTLA-307045>
	wd0: drive supports 16-sector pio transfers, lba addressing
	wd0: 43979 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 90069840 sectors
	wd0: 32-bit data port
	wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5
	wd1 at pciide0 channel 0 drive 1: <QUANTUM FIREBALLP LM10.2>
	wd1: drive supports 16-sector pio transfers, lba addressing
	wd1: 9797 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 20066251 sectors
	wd1: 32-bit data port
	wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4
	pciide0: primary channel interrupting at irq 14
	wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
	wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
	pciide0: secondary channel configured to compatibility mode
	wd2 at pciide0 channel 1 drive 0: <IBM-DTLA-307045>
	wd2: drive supports 16-sector pio transfers, lba addressing
	wd2: 43979 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 90069840 sectors
	wd2: 32-bit data port
	wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5
	pciide0: secondary channel interrupting at irq 15
	wd2(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
	uhci0 at pci0 dev 7 function 2: VIA Technologies VT83C572 USB Controller (rev. 0x10)
	uhci0: interrupting at irq 11
	usb0 at uhci0: USB revision 1.0
	uhub0 at usb0
	uhub0: VIA Technologie UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
	uhub0: 2 ports with 2 removable, self powered
	uhci1 at pci0 dev 7 function 3: VIA Technologies VT83C572 USB Controller (rev. 0x10)
	uhci1: interrupting at irq 11
	usb1 at uhci1: USB revision 1.0
	uhub1 at usb1
	uhub1: VIA Technologie UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
	uhub1: 2 ports with 2 removable, self powered
	pchb1 at pci0 dev 7 function 4
	pchb1: VIA Technologies VT82C686A SMBus Controller (rev. 0x30)
	fxp0 at pci0 dev 9 function 0: Intel i82557 Ethernet, rev 8
	fxp0: interrupting at irq 10
	fxp0: Ethernet address 00:d0:b7:26:ab:f8, 10/100 Mb/s
	inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
	inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
	ahc1 at pci0 dev 11 function 0
	ahc1: interrupting at irq 11
	ahc1: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
	scsibus0 at ahc1 channel 0: 16 targets, 8 luns per target
	isa0 at pcib0
	pckbc0 at isa0 port 0x60-0x64
	pckbd0 at pckbc0 (kbd slot)
	pckbc0: using irq 1 for kbd slot
	wskbd0 at pckbd0: console keyboard
	lpt0 at isa0 port 0x378-0x37b irq 7
	pcdisplay0 at isa0 port 0x3b0-0x3bf iomem 0xb0000-0xb7fff
	wsdisplay0 at pcdisplay0: console (80x25, vt100 emulation), using wskbd0
	pcppi0 at isa0 port 0x61
	midi0 at pcppi0: PC speaker
	sysbeep0 at pcppi0
	isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
	npx0 at isa0 port 0xf0-0xff: using exception 16
	fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
	isapnp0: no ISA Plug 'n Play devices found
	biomask fb7d netmask ff7d ttymask ffff
	scsibus0: waiting 2 seconds for devices to settle...
	ahc1: target 3 using 8bit transfers
	ahc1: target 3 using asynchronous transfers
	cd0 at scsibus0 target 3 lun 0: <YAMAHA, CDR100, 1.11> SCSI2 4/worm removable
	ahc1: target 5 using 8bit transfers
	ahc1: target 5 synchronous at 5.0MHz, offset = 0xf
	st0 at scsibus0 target 5 lun 0: <SGI, DLT2000, 8519> SCSI2 1/sequential removable
	st0: density code 25, variable blocks, write-enabled
	Kernelized RAIDframe activated
	RAID autoconfigure
	Configuring raid1:
	RAIDFRAME: protectedSectors is 64
	RAIDFRAME: Configure (RAID Level 1): total number of sectors is 89529472 (43715 MB)
	RAIDFRAME(RAID Level 1): Using 6 floating recon bufs with no head sep limit
	boot device: raid1
	root on raid1a dumps on raid1b
	root file system type: ffs



	>How-To-Repeat:
	Unfortunately, although I can reproduce it easily, no one else has
	reported being able to.
	>Fix:
	Bill Sommerfeld suggested the following change as an experiment, and it
	did improve the situation:

	From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
	Message-Id: <20010116054247.E2A862A4B@orchard.arlington.ma.us>

		So, there's one thing mkdir() does in ffs that differs from other
		kinds of file creation: it tries to put the new directory in a
		different cylinder group from the one its parent lives in.

		I think what's going on here is that ffs_dirpref() may be screwing up
		and always picking an initial cylinder group with few directories and
		lots of free inodes, but no free blocks, so it winds up hunting all
		over the disk for free blocks before it finds one for the directory.

		I'm willing to bet that the extra level of indirection required for
		mirroring is causing the "hunt" for free blocks to no longer fit into
		the buffer cache.

		So, the core of ffs_dirpref() in sys/ufs/ffs/ffs_alloc.c is:

			for (cg = 0; cg < fs->fs_ncg; cg++)
				if (fs->fs_cs(fs, cg).cs_ndir < minndir &&
				    fs->fs_cs(fs, cg).cs_nifree >= avgifree) {
					mincg = cg;
					minndir = fs->fs_cs(fs, cg).cs_ndir;
				}

		maybe it should be something more like:

			for (cg = 0; cg < fs->fs_ncg; cg++)
				if (fs->fs_cs(fs, cg).cs_ndir < minndir &&
			    fs->fs_cs(fs, cg).cs_nbfree > 0 &&
				    fs->fs_cs(fs, cg).cs_nifree >= avgifree) {
					mincg = cg;
					minndir = fs->fs_cs(fs, cg).cs_ndir;
				}

		...but I must admit I'm not an expert on ffs guts.
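The two loops quoted above can be compared in a small user-space model. This is a sketch, not kernel code: struct cg_summary and dirpref_sketch are hypothetical stand-ins for fs->fs_cs(fs, cg) and ffs_dirpref(), the function returns a cylinder-group index rather than an inode number, and the initial values (mincg = 0, minndir = inodes per group, avgifree = total free inodes / number of groups) are an assumption about the surrounding code. Setting require_free_block adds the proposed cs_nbfree > 0 test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the per-cylinder-group summary counters. */
struct cg_summary {
	long	cs_ndir;	/* directories in the group */
	long	cs_nbfree;	/* free blocks in the group */
	long	cs_nifree;	/* free inodes in the group */
};

/*
 * Return the group with the fewest directories among those with at
 * least the average number of free inodes.  If require_free_block is
 * nonzero, groups with no free blocks are skipped (the proposed change).
 * Falls back to group 0 when no group qualifies.
 */
int
dirpref_sketch(const struct cg_summary *cs, int ncg, long ipg,
    int require_free_block)
{
	long total_ifree = 0, avgifree, minndir;
	int cg, mincg;

	assert(cs != NULL && ncg > 0);
	for (cg = 0; cg < ncg; cg++)
		total_ifree += cs[cg].cs_nifree;
	avgifree = total_ifree / ncg;

	minndir = ipg;		/* assumed initial value */
	mincg = 0;
	for (cg = 0; cg < ncg; cg++)
		if (cs[cg].cs_ndir < minndir &&
		    (!require_free_block || cs[cg].cs_nbfree > 0) &&
		    cs[cg].cs_nifree >= avgifree) {
			mincg = cg;
			minndir = cs[cg].cs_ndir;
		}
	return mincg;
}
```

For example, take three groups with 100 inodes per group: group 0 has no directories, 100 free inodes and no free blocks; group 1 has 5 directories, 50 free blocks and 90 free inodes; group 2 has 10 directories, 80 free blocks and 80 free inodes (so avgifree is 90). The original loop returns group 0, the full one, and the modified loop returns group 1. If no group passes the tests, both still fall back to group 0, so in this model the change narrows the problem rather than ruling it out.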

>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: