Subject: kern/17537: tlp0 powerhook knocks machine off net
To: None <>
From: Chuck Cranor <>
List: netbsd-bugs
Date: 07/09/2002 15:52:33
>Number:         17537
>Category:       kern
>Synopsis:       tlp0 powerhook knocks machine off net
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 09 12:53:00 PDT 2002
>Originator:     Chuck Cranor
>Release:        NetBSD 1.6_BETA4

System: NetBSD 1.6_BETA4 NetBSD 1.6_BETA4 (LLAMA) #3: Tue Jul  9 14:53:57 EDT 2002 i386

Architecture: i386
Machine: i386

	- machine: Pentium PRO 200MHz, Gateway2000 G6-200.   
		   my home desktop system.  
		   also acts as fileserver for diskless systems (shark).

	- upgraded machine from 1.5ZA to 1.6_BETA4, which switched the
	  network driver from de0 to tlp0.

	- when apm decides the system has been idle long enough to power
	  down the monitor, the tlp0 interface is also powered down.   at
	  that point the machine becomes unreachable from the network.
	  NFS service is disrupted and my diskless systems hang.   ooops!

dmesg for system:

NetBSD 1.6_BETA4 (LLAMA) #3: Tue Jul  9 14:53:57 EDT 2002
cpu0: Intel Pentium Pro (686-class), 199.44 MHz
cpu0: I-cache 8 KB 32b/line 4-way, D-cache 8 KB 32b/line 2-way
cpu0: L2 cache 256 KB 32b/line 4-way
cpu0: features f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
cpu0: features f9ff<PGE,MCA,CMOV>
total memory = 97916 KB
avail memory = 87636 KB
using 1249 buffers containing 4996 KB of memory
BIOS32 rev. 0 found at 0xfd9f0
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82441FX PCI and Memory Controller (PMC) (rev. 0x02)
pcib0 at pci0 dev 7 function 0
pcib0: Intel 82371SB PCI-to-ISA Bridge (PIIX3) (rev. 0x01)
pciide0 at pci0 dev 7 function 1: Intel 82371SB IDE Interface (PIIX3) (rev. 0x00)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <Maxtor 90576D4>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 5495 MB, 11166 cyl, 16 head, 63 sec, 512 bytes/sect x 11255328 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
pciide0: primary channel interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
pciide0: secondary channel wired to compatibility mode
atapibus0 at pciide0 channel 1: 2 targets
cd0 at atapibus0 drive 1: <PLEXTOR CD-R   PX-W1210A, , 1.05> type 5 cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2
wd1 at pciide0 channel 1 drive 0: <Maxtor 4W080H6>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 78167 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 160086528 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
pciide0: secondary channel interrupting at irq 15
wd1(pciide0:1:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
cd0(pciide0:1:1): using PIO mode 4, DMA mode 2 (using DMA data transfers)
ahc0 at pci0 dev 11 function 0
ahc0: interrupting at irq 10
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
tlp0 at pci0 dev 15 function 0: DECchip 21140A Ethernet, pass 2.0
tlp0: broken MicroWire interface detected; setting SROM size to 1Kb
tlp0: interrupting at irq 9
tlp0: Ethernet address 00:00:c0:81:0a:e9
nsphy0 at tlp0 phy 3: DP83840 10/100 media interface, rev. 0
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vga1 at pci0 dev 17 function 0: S3 ViRGE/DX (rev. 0x01)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
lpt0 at isa0 port 0x378-0x37b irq 7
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
isapnp0: read port 0x203
sb0 at isapnp0 port 0x220/16,0x330/2,0x388/4 irq 5 drq 1,5
sb0: Creative SB AWE64 Gold Audio: dsp v4.16
audio0 at sb0: full duplex, mmap, independent
mpu0 at sb0
midi0 at mpu0: SB MPU-401 MIDI UART
opl0 at sb0: model OPL3
midi1 at opl0: SB Yamaha OPL3
joy0 at isapnp0 port 0x200/8
joy0: Creative SB AWE64 Gold Game
joy0: joystick not connected
isapnp0: <Creative SB AWE64 Gold, CTL0023, , WaveTable> port 0x620/4 not configured
apm0 at mainbus0: Power Management spec V1.2
biomask ed45 netmask ef45 ttymask ffc7
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST32155W, 0362> SCSI2 0/direct fixed
sd0: 2049 MB, 4177 cyl, 8 head, 125 sec, 512 bytes/sect x 4197405 sectors
sd0: sync (50.0ns offset 8), 16-bit (40.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 4 lun 0: <iomega, jaz 1GB, H.72> SCSI2 0/direct removable
sd1: drive offline
sd1: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers, tagged queueing
cd1 at scsibus0 target 5 lun 0: <PLEXTOR, CD-ROM PX-12TS, 1.02> SCSI2 5/cdrom removable
cd1: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers
IPsec: Initialized Security Association Processing.
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)


	To reproduce: upgrade to a 1.6 BETA kernel and let the machine sit
	idle until apm powers down the monitor.


	Quick fix: the bad behavior goes away if you apply this change to tulip.c:

Index: tulip.c
RCS file: /cvsroot/syssrc/sys/dev/ic/tulip.c,v
retrieving revision 1.113
diff -c -r1.113 tulip.c
*** tulip.c	2002/05/03 08:48:12	1.113
--- tulip.c	2002/07/09 19:22:21
*** 1964,1972 ****
--- 1964,1974 ----
  	switch (why) {
  	case PWR_SUSPEND:
  	case PWR_STANDBY:
+ #if 0
  		tlp_stop(ifp, 0);
  		if (sc->sc_power != NULL)
  			(*sc->sc_power)(sc, why);
+ #endif
  	case PWR_RESUME:
  		if (ifp->if_flags & IFF_UP) {
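
The effect of the quick fix can be pictured with a small userland model.
This is only a sketch, not the driver code: struct nic_model, model_power()
and the MODEL_PWR_* values are hypothetical stand-ins, and the model assumes
each case ends with a break, which the truncated diff above does not show.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's power events; values are arbitrary. */
enum pwr_event { MODEL_PWR_SUSPEND, MODEL_PWR_STANDBY, MODEL_PWR_RESUME };

/* Hypothetical model of the interface state; not the real softc. */
struct nic_model {
	bool if_up;        /* models IFF_UP being set on the interface */
	bool running;      /* true while the chip is passing packets */
	int  reinit_count; /* how many times resume re-initialized the chip */
};

/*
 * Model of a NIC power hook.  With quick_fix false, standby and suspend
 * stop the interface (the behavior reported in this PR).  With quick_fix
 * true, both events are ignored, as with the #if 0 in the diff.
 */
static void
model_power(struct nic_model *n, enum pwr_event why, bool quick_fix)
{
	switch (why) {
	case MODEL_PWR_SUSPEND:
	case MODEL_PWR_STANDBY:
		if (!quick_fix)
			n->running = false;	/* machine drops off the net */
		break;			/* assumption: not visible in the diff */
	case MODEL_PWR_RESUME:
		if (n->if_up) {
			n->running = true;
			n->reinit_count++;
		}
		break;
	}
}
```

Calling model_power() with MODEL_PWR_STANDBY and quick_fix false leaves
running false until a resume event arrives; with quick_fix true the
interface keeps running while the monitor is powered down.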

More generally, we need to rethink how we have applied powerhooks
to the kernel.   another example of this type of problem was the
introduction of the pms0 powerhook... this powerhook confuses
my APM BIOS into thinking that the mouse has been moved, resulting
in the following bad behavior:
	1. system is idle for 20 minutes, APM powers down monitor
	2. pms powerhook switches off mouse
	   << a small amount of random time passes >>
	3. network packet is received
	4. system breaks out of APM sleep
	5. pms powerhook powers up mouse
	6. packet is processed
	7. APM sees mouse state has changed, powers up monitor again
	8. goto 1
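
The loop in steps 1-8 can be sketched as a tiny state model.  This is an
illustration under stated assumptions, not kernel or APM BIOS code: every
name here (struct apm_model, model_run(), and so on) is hypothetical, and
the BIOS is assumed to power up the monitor whenever the resume hook
touches the mouse.

```c
#include <stdbool.h>

/* Hypothetical model of the feedback loop; none of these are kernel APIs. */
struct apm_model {
	bool monitor_on;       /* true while the display is powered */
	bool pms_hook_enabled; /* false models building with PMS_DISABLE_POWERHOOK */
	int  monitor_wakeups;  /* times the monitor was powered back up */
};

/* Steps 1-2: the idle timeout expires and APM powers down the monitor. */
static void
model_idle_timeout(struct apm_model *m)
{
	m->monitor_on = false;
}

/* Steps 3-7: a packet wakes the system; the resume hook may touch the mouse. */
static void
model_packet_received(struct apm_model *m)
{
	/* assumption: the hook powering up the mouse looks like mouse activity */
	bool mouse_state_changed = m->pms_hook_enabled;

	if (!m->monitor_on && mouse_state_changed) {
		m->monitor_on = true;
		m->monitor_wakeups++;
	}
}

/* Step 8: repeat the cycle once per packet; return the number of wakeups. */
static int
model_run(bool pms_hook_enabled, int packets)
{
	struct apm_model m = { true, pms_hook_enabled, 0 };

	for (int i = 0; i < packets; i++) {
		model_idle_timeout(&m);
		model_packet_received(&m);
	}
	return m.monitor_wakeups;
}
```

In this model, model_run(true, n) wakes the monitor once per packet, while
model_run(false, n) never wakes it, which matches the effect described for
PMS_DISABLE_POWERHOOK.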

the only way to fix this is to recompile the kernel with PMS_DISABLE_POWERHOOK,
an option which is commented out in the GENERIC config, but not documented.
so unless you know what you are looking for, you'll have a hard time
debugging this problem (especially since i believe the behavior
depends a lot on your hardware and what APM BIOS you are using).

note that neither of these problems occurs in NetBSD 1.5, and there is
currently no way to fix them other than recompiling the kernel with the
proper hacks/options.