Subject: Reading file causes process to hang on getblk
To: None <current-users@NetBSD.ORG>
From: Dave Huang <khym@bga.com>
List: current-users
Date: 07/05/1997 07:10:19
Hi there... I've got a rather annoying problem :) I've got this file
in my directory, and whenever I try to read from it, the process doing
the read will just hang. A "ps l" shows the process with a WCHAN of
getblk, and a ^T shows the same thing. When this happens, trying to
"ls" the directory the file is in will also hang. I think even an "ls"
in the parent directory will hang too. "shutdown -r now" hangs after
the "syncing disks..." message, and if I break into ddb and do a "ps",
I see that the process trying to read the file is still hanging
around.

The problem is 100% repeatable... I've tried running gzip, cat, cp,
dd, and cksum on the file, and they all hang. However, I _can_ run
"file" on it (which only reads the beginning of the file), so it looks
like it's only having trouble with a certain block of the file:

dd if=frame.01705.ppm of=/dev/null count=48 works just fine, but
dd if=frame.01705.ppm of=/dev/null count=49 will hang.

I don't get any kernel messages, and my drives are fine, so it's not a
bad block on the hard drive or anything...

The filesystem is on a ccd, made up of two partitions striped
together. The directory the file's in has 2004 files in it, which
probably makes it the biggest directory I have. 

I compiled a kernel with debugging symbols, then booted into single
user mode, catted the file, then broke into ddb and made it give me a
crash dump, so I've got a dump and debugging kernel, but I don't know
what to do with it :) Assuming I did this correctly, here's a stack
trace of the process:

#0  mi_switch () at ../../../../kern/kern_synch.c:615
#1  0xf8119c51 in bpendtsleep ()
#2  0xf812e32b in getblk (vp=0xf8891180, blkno=3, size=8192, slpflag=0, 
    slptimeo=0) at ../../../../kern/vfs_bio.c:553
#3  0xf812ee7c in cluster_read (vp=0xf8891180, filesize=230415, lblkno=3, 
    size=8192, cred=0xffffffff, bpp=0xfcc93eb0)
    at ../../../../kern/vfs_cluster.c:133
#4  0xf8196f63 in ffs_read (v=0x0) at ../../../../ufs/ufs/ufs_readwrite.c:126
#5  0xf813732f in vn_read (fp=0xf889b600, uio=0xfcc93f20, cred=0xf87b4300)
    at ../../../../sys/vnode_if.h:269
#6  0xf811e6e3 in sys_read (p=0xf889cd00, v=0xfcc93f88, retval=0xfcc93f80)
    at ../../../../kern/sys_generic.c:112
#7  0xf81aeb58 in syscall (frame={tf_es = 31, tf_ds = 31, tf_edi = 1, 
      tf_esi = 3, tf_ebp = -138421800, tf_ebx = -138421692, tf_edx = 53244, 
      tf_ecx = 69632, tf_eax = 3, tf_trapno = 3, tf_err = 2, tf_eip = 40827, 
      tf_cs = 23, tf_eflags = 518, tf_esp = -138421924, tf_ss = 31, 
      tf_vm86_es = 0, tf_vm86_ds = 0, tf_vm86_fs = 0, tf_vm86_gs = 0})
    at ../../../../arch/i386/i386/trap.c:623

Anyone have some ideas and/or things for me to try?

My system's a Pentium, running a July 4 -current. My previous kernel
(the one I was using when I first saw the problem) is from June 30.
Kernel config file and dmesg stuff follow:

include "arch/i386/conf/std.i386"

options 	I586_CPU	# CPU classes; at least one is REQUIRED
options 	VM86		# Virtual 8086 emulation

#options 	BIOSEXTMEM=80896	# size of extended memory

options 	DUMMY_NOPS	# speed hack; recommended
options 	XSERVER,UCONSOLE
options 	INSECURE	# insecure; allow /dev/mem writing for X

maxusers	32		# estimated number of users
options		RTC_OFFSET=300	# hardware clock is this many mins. west of GMT

options 	DDB		# in-kernel debugger
makeoptions	DEBUG="-g"	# compile full symbol table
options 	DIAGNOSTIC	# internal consistency checks
#options 	DEBUG		# internal debug messages
options 	KTRACE		# system call tracing, a la ktrace(1)
#options 	AUDIO_DEBUG

options 	SYSVMSG		# System V-like message queues
options 	SYSVSEM		# System V-like semaphores
options 	SYSVSHM		# System V-like memory sharing
#options	SHMMAXPGS=1024	# 1024 pages is the default

options 	COMPAT_NOMID	# compatibility with 386BSD, BSDI, NetBSD 0.8,
options 	COMPAT_09	# NetBSD 0.9,
options 	COMPAT_10	# NetBSD 1.0,
options 	COMPAT_11	# NetBSD 1.1,
options 	COMPAT_12	# NetBSD 1.2,
options 	COMPAT_43	# and 4.3BSD
options 	TCP_COMPAT_42	# TCP bug compatibility with 4.2BSD

options 	COMPAT_LINUX	# binary compatibility with Linux
options 	COMPAT_FREEBSD	# binary compatibility with FreeBSD

options 	EXEC_ELF32	# 32-bit ELF executables (SVR4, Linux)

options 	USER_LDT	# user-settable LDT; used by WINE
options 	LKM		# loadable kernel modules

file-system 	FFS		# UFS
#file-system 	MFS		# memory file system
file-system 	NFS		# Network File System client
#file-system 	CD9660		# ISO 9660 + Rock Ridge file system
#file-system 	MSDOSFS		# MS-DOS file system
file-system 	FDESC		# /dev/fd
file-system 	KERNFS		# /kern
file-system 	PROCFS		# /proc
file-system 	UNION		# union file system

options		NFSSERVER	# Network File System server
options 	FIFO		# FIFOs; RECOMMENDED

options 	GATEWAY		# packet forwarding
options 	INET		# IP + ICMP + TCP + UDP
options 	NETATALK	# AppleTalk
options 	PPP_DEFLATE
#options	PFIL_HOOKS	# pfil(9) packet filter hooks (Required
				# if you enable the pseudo-device ipfilter)

config		netbsd root on sd0a type ffs dumps on sd1b

#options 	SCSI_DELAY=10
options		PCIVERBOSE

#options 	SCSIDEBUG
#options 	CDROM_ASYNC

mainbus0 at root

pci0	at mainbus0 bus ?
#eisa0	at mainbus0

pchb*	at pci? dev ? function ?	# PCI-Host bridges
pcib*	at pci? dev ? function ?	# PCI-ISA bridges

isa*	at pcib?			# ISA on PCI-ISA bridge
isa*	at mainbus0			# all other ISA

apm0	at mainbus0			# Advanced power management

#ppb*	at pci? dev ? function ?	# PCI-PCI bridges
#pci*	at ppb? bus ?

npx0	at isa? port 0xf0 irq 13	# math coprocessor

#pc0	at isa? port 0x60 irq 1		# generic PC console device
vt0	at isa? port 0x60 irq 1

com0	at isa? port 0x3f8 irq 4	# standard PC serial ports
com1	at isa? port 0x2f8 irq 3
com2	at isa? port 0x3e8 irq 7
com3	at isa? port 0x2e8 irq 9

#lpt0	at isa? port 0x378 irq 7	# standard PC parallel ports

pms0	at pckbd? irq 12		# PS/2 auxiliary port mouse

ncr*	at pci? dev ? function ?	# NCR 538XX SCSI controllers
scsibus* at ncr?

sd0	at scsibus? target 0 lun 0
sd1	at scsibus? target 1 lun 0
sd*	at scsibus? target ? lun ?	# SCSI disk drives
st*	at scsibus? target ? lun ?	# SCSI tape drives
cd*	at scsibus? target ? lun ?	# SCSI CD-ROM drives
uk*	at scsibus? target ? lun ?	# SCSI unknown

fdc0	at isa? port 0x3f0 irq 6 drq 2	# standard PC floppy controllers
#fdc1	at isa? port 0x370 irq ? drq ?
fd*	at fdc? drive ?

wdc0	at isa? port 0x1f0 irq 14	# ST506, ESDI, and IDE controllers
#wdc1	at isa? port 0x170 irq ?
wd0	at wdc0 drive 0

sb0	at isa? port 0x220 irq 5 drq 1	# SoundBlaster

ep*	at pci? dev ? function ?	# 3C590 ethernet cards

spkr0	at pckbd? port 0x61		# speaker

joy0	at isa? port 0x201

pseudo-device	loop	1		# network loopback
pseudo-device	bpfilter 16		# packet filter
pseudo-device	sl	2		# CSLIP
pseudo-device	ppp	2		# PPP
#pseudo-device	tun	2		# network tunnelling over tty
#pseudo-device	ipfilter		# ip filter

pseudo-device	pty	32		# pseudo-terminals
pseudo-device	vnd	4		# paging to files

pseudo-device	ccd	2		# concatenated disk devices


NetBSD 1.2G (SPIFF) #58: Sat Jul  5 05:00:30 CDT 1997
    khym@dahan.metonymy.com:/usr/src.local/sys/arch/i386/compile/SPIFF
cpu0: family 5 model 2 step c
cpu0: Intel Pentium (P54C) (586-class)
real mem  = 66711552
avail mem = 60567552
using 839 buffers containing 3436544 bytes of memory
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
Intel 82439 (Triton II) TXC Host Bridge (host bridge, revision 0x03) at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 7 function 0
pcib0: Intel 82371SB (Triton II) PCI-ISA Bridge (rev. 0x01)
Intel 82371SB (Triton II) IDE controller (IDE mass storage, interface 0x80) at pci0 dev 7 function 1 not configured
ep0 at pci0 dev 10 function 0: 3Com 3C595 Ethernet
ep0: MAC address 00:a0:24:01:de:fa
ep0: 64KB word-wide FIFO, 3:1 Rx:Tx split, utp/100-TX default utp
ep0: interrupting at irq 15
ncr0 at pci0 dev 11 function 0: NCR 53c810 SCSI
ncr0: interrupting at irq 10
ncr0: restart (scsi reset).
scsibus0 at ncr0: 8 targets
sd0 at scsibus0 targ 0 lun 0: <Quantum, VP32210, L915> SCSI2 0/direct fixed
sd0: sd0(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
2103MB, 4243 cyl, 8 head, 126 sec, 512 bytes/sec x 4308352 sectors
sd1 at scsibus0 targ 1 lun 0: <Quantum, XP32150W, L915> SCSI2 0/direct fixed
sd1: sd1(ncr0:1:0): 10.0 MB/s (100 ns, offset 8)
2151MB, 3907 cyl, 10 head, 112 sec, 512 bytes/sec x 4406960 sectors
cd0 at scsibus0 targ 2 lun 0: <TOSHIBA, CD-ROM XM-5301TA, 1895> SCSI2 5/cdrom removable
probe(ncr0:2:1): 4.0 MB/s (250 ns, offset 8)
sd2 at scsibus0 targ 3 lun 0: <Quantum, XP34300W, L915> SCSI2 0/direct fixed
sd2: sd2(ncr0:3:0): 10.0 MB/s (100 ns, offset 8)
4101MB, 3907 cyl, 20 head, 107 sec, 512 bytes/sec x 8399520 sectors
Matrox MGA Millenium 2064W ("Storm") (VGA display, revision 0x01) at pci0 dev 12 function 0 not configured
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
com2 at isa0 port 0x3e8-0x3ef irq 7: ns8250 or ns16450, no fifo
com3 at isa0 port 0x2e8-0x2ef irq 9: ns8250 or ns16450, no fifo
sb0 at isa0 port 0x220-0x237 irq 5 drq 1: dsp v4.13
npx0 at isa0 port 0xf0-0xff: using exception 16
vt0 at isa0 port 0x60-0x6f irq 1: generic, 80 col, color, 8 scr, mf2-kbd, [R3.32]
pms0 at vt0 irq 12
spkr0 at vt0 port 0x61
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
fd1 at fdc0 drive 1: 1.2MB 80 cyl, 2 head, 15 sec
joy0 at isa0 port 0x201
joy0: joystick not connected
apm0 at mainbus0: Power Management spec V1.1
apm0: A/C state: on
apm0: battery charge state: no battery
biomask 440 netmask 8440 ttymask 9442
boot device: sd0
root on sd0a dumps on sd1b

Name: Dave Huang     |   Mammal, mammal / their names are called /
INet: khym@bga.com   |   they raise a paw / the bat, the cat /
FurryMUCK: Dahan     |   dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 21 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++