Subject: Re: lock-up during memory shortage
To: NetBSD-current <current-users@netbsd.org>
From: Milos Urbanek <urbanek@openbsd.cz>
List: current-users
Date: 05/14/2003 17:50:32
On Wed, May 14, 2003 at 10:22:31AM -0400, Andreas Wrede wrote:

I also get a "hard lockup" from time to time with NetBSD 1.6P.
The last time it happened (two days ago) was while updating my local cvs tree,
which contained two 45 MB MP3 files. I'm running X all the time, so there is no
chance to get to the debugger.

Hard lockup:
	- unable to ping machine from another one
	- unable to do anything in X (including mouse movement etc)
	- unable to get to debugger / console
	- no logs available
	- shortly before the lockup there is a period of high disk activity
	(writing a file); during that time some pages are swapped out
	(the X server etc.) and the machine becomes less interactive, but it still runs
	- when the lockup occurs all activity is stopped (disk LEDs
	are silent etc.)
	- I'm still able to switch numlock; so some interrupts are probably
	working

The machine has 128 MB of memory. Dmesg follows.
Note that I use a local kernel config that differs slightly from GENERIC:
options         DIAGNOSTIC      # expensive kernel consistency checks
options         DEBUG           # expensive debugging checks/support
options         DDB             # in-kernel debugger
options         DDB_HISTORY_SIZE=512    # enable history editing in DDB  
options         NEW_BUFQ_STRATEGY

	- drivers that are not needed are removed from the kernel

I was able to crash the machine the same way when running NetBSD 1.5ZC
(that time with a GENERIC kernel). I was never able to get into the debugger
to collect any useful information.
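
Since no logs survive the lockup and the debugger is out of reach under X, one possible workaround is to record memory statistics to disk at short intervals, so that the last snapshot before the freeze is still there after a reboot. The following is only a minimal sketch and has not been tried on the machine above. The function names, the log path and the 5-second interval are made up for illustration, and it assumes that `vmstat -s` is a suitable command to sample. The script adds some memory and disk load of its own, which may matter on a 128 MB machine.

```python
import os
import subprocess
import time


def snapshot(cmd, log_path):
    """Run cmd once, append its output to log_path, and force it to disk."""
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    with open(log_path, "a") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write("=== %s ===\n%s\n" % (stamp, out))
        log.flush()
        # fsync so the snapshot is not left sitting in the buffer cache
        # when the machine freezes.
        os.fsync(log.fileno())
    return out


def watch(cmd=("vmstat", "-s"), log_path="/var/log/lockup-watch.log",
          interval=5.0, count=None):
    """Take a snapshot every `interval` seconds; run forever if count is None.

    The default command, path and interval are illustrative assumptions.
    """
    taken = 0
    while count is None or taken < count:
        snapshot(list(cmd), log_path)
        taken += 1
        time.sleep(interval)
```

Calling `watch()` as root before starting the cvs update would leave a series of timestamped snapshots in the log file; the last entries show how free memory and paging counters looked just before everything stopped.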

Milos

Dmesg:

NetBSD 1.6P (OAKLAND) #1: Mon Mar 31 13:10:26 MEST 2003
        root@oakland:/usr/src/sys/arch/i386/compile/OAKLAND
total memory = 127 MB
avail memory = 114 MB
using 1658 buffers containing 6632 KB of memory
BIOS32 rev. 0 found at 0xfb310
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: AMD Duron (686-class), 800.12 MHz, id 0x631
cpu0: features c1c7f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
cpu0: features c1c7f9ff<PGE,MCA,CMOV,PAT,PSE36,PN,MMXX,MMX>
cpu0: features c1c7f9ff<FXSR,3DNOW2,3DNOW>
cpu0: I-cache 64 KB 64b/line 2-way, D-cache 64 KB 64b/line 2-way
cpu0: L2 cache 64 KB 64b/line 16-way
cpu0: ITLB 16 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: DTLB 24 4 KB entries fully associative, 8 4 MB entries 4-way
cpu0: 8 page colors
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: VIA Technologies VT8363 KT133 System Controller (rev. 0x03)
agp0 at pchb0: aperture at 0xd0000000, size 0x10000000
ppb0 at pci0 dev 1 function 0: VIA Technologies VT8363 KT133 PCI to AGP Bridge (rev. 0x00)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga1 at pci1 dev 0 function 0: ATI Technologies Rage XL (AGP) (rev. 0x65)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 7 function 0
pcib0: VIA Technologies VT82C686A (Apollo KX133) PCI-ISA Bridge (rev. 0x40)
pciide0 at pci0 dev 7 function 1: VIA Technologies VT82C686A (Apollo KX133) ATA100 controller
pciide0: bus-master DMA support present
pciide0: primary channel configured to compatibility mode
wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-371350>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 12949 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 26520480 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
wd1 at pciide0 channel 0 drive 1: <WDC WD300AB-00BVA0>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 28629 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 58633344 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
pciide0: primary channel interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
pciide0: secondary channel configured to compatibility mode
pciide0: disabling secondary channel (no drives)
uhci0 at pci0 dev 7 function 2: VIA Technologies VT83C572 USB Controller (rev. 0x16)
uhci0: interrupting at irq 9
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: VIA Technologies UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 7 function 3: VIA Technologies VT83C572 USB Controller (rev. 0x16)
uhci1: interrupting at irq 9
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: VIA Technologies UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
viapm0 at pci0 dev 7 function 4
viaenv0 at viapm0
auvia0 at pci0 dev 7 function 5: VIA VT82C686A AC'97 Audio (rev 0x50)
auvia0: ICEnsemble ICE1232/VT1611A codec; headphone, 18 bit DAC, 18 bit ADC, KS Waves 3D
auvia0: variable rate audio
audio0 at auvia0: full duplex, mmap, independent
ex0 at pci0 dev 18 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
ex0: interrupting at irq 11
ex0: MAC address 00:01:02:db:4f:a5
bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
lpt0 at isa0 port 0x378-0x37b irq 7
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
isapnp0: no ISA Plug 'n Play devices found
apm0 at mainbus0: Power Management spec V1.2 (slowidle)
IPsec: Initialized Security Association Processing.
uhub0: port error, restarting port 1
uhub0: port error, giving up port 1
uhub0: port error, restarting port 2
uhub0: port error, giving up port 2
uhub1: port error, restarting port 1
uhub1: port error, giving up port 1
uhub1: port error, restarting port 2
uhub1: port error, giving up port 2
boot device: wd0
root on wd0a dumps on wd0b
mountroot: trying smbfs...
mountroot: trying coda...
mountroot: trying msdos...
mountroot: trying cd9660...
isofs: session offset (part 0) 0
mountroot: trying ntfs...
mountroot: trying nfs...
mountroot: trying ext2fs...
mountroot: trying ffs...
root file system type: ffs
init: copying out path `/sbin/init' 11

It happens when running e.g. 
> I am running 1.6.1 on a /i386 machine with 128Mb memory. I get regular
> lock-ups during times of high-memory, high-swap activity, typically
> during a nightly rsync run against a large filesystem on a remote
> machine.  I can still break into the debugger, but all higher level
> functions are frozen. (I don't even get character echo on the serial
> console). The tracebacks for the last two lock-ups are:
> 
> pmap_extract(c0542400,c0cf1000,cbc98cac,c0241d56) at pmap_extract+0xc
> uvm_km_pgremove_intrsafe(c0cdf000,c0d28000,c0cdf000,c0307bf0,c04fc8e0,cf8000,30000,0,1727,0,cbc98d0c,c04fc908,c04fc908,0,cbc98d30,c0301b5f,c04fc8e0,c0cdf000,c0d28000,cbc98d2c,c0cf8000,cf8000,30000,e000ffe6,1,3,cbc98d70,cbc98d2c,0,0,cbc98d70,c0300c87,c04fc8e0,c0cdf000,c0d28000,1727,c052c5e0,49000,c052bce0,c024217a,49000,c0c16000,cbc98db0,c023bdf1,49000,c0cdf000,cbc98dc0,c023bdf1,c04fc8e0,0,49000,1,cbcac3d0,49000,1,ffffffff,49000,0,cbc98dd0,ffffffff,49000,0) at uvm_km_pgremove_intrsafe+0x2a
> uvm_unmap_remove(c04fc8e0,c0cdf000,c0d28000,cbc98d2c) at uvm_unmap_remove+0x100
> uvm_unmap(c04fc8e0,c0cdf000,c0d28000,1727,c052c5e0) at uvm_unmap+0x8f
> uvm_km_kmemalloc(c04fc8e0,0,49000,1,cbcac3d0) at uvm_km_kmemalloc+0x7f
> malloc(49000,52,1,0,cbb952f8) at malloc+0x249
> amap_copy(cbb952f8,cbcb0780,1,1,f9eb000,f9eb001,cbc98f48,286) at amap_copy+0x16e
> uvmfault_amapcopy(cbc98f34,6,0,1,0) at uvmfault_amapcopy+0x128
> uvm_fault(cbb952f8,f9eb000,0,2,4811f094) at uvm_fault+0x1b6
> trap() at trap+0x4d4
> --- trap (number 6) ---
> 0x481112e4:
> 
> 
> and
> 
> uvm_pagealloc_strat(0,c7e000,0,0,1,0,0,1727) at uvm_pagealloc_strat+0x141
> uvm_km_kmemalloc(c04fc8e0,0,49000,1,cba5fd48) at uvm_km_kmemalloc+0xbd
> malloc(49000,52,1,0,cb9c0478) at malloc+0x249
> amap_copy(cb9c0478,cba465d0,1,1,1a2bb000,1a2bb001,cbbf2f48,202) at amap_copy+0x16e
> uvmfault_amapcopy(cbbf2f34,6,0,1,0) at uvmfault_amapcopy+0x128
> uvm_fault(cb9c0478,1a2bb000,0,2,4811f094) at uvm_fault+0x1b6
> trap() at trap+0x4d4
> --- trap (number 6) ---
> 0x481112e4:
> 
> 
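
A side note on the numbers in the two tracebacks above: assuming DDB prints arguments in hexadecimal and that the first argument to the kernel malloc() is the requested size, both traces show the same request of 0x49000 bytes. The small calculation below is only a sketch under those assumptions (the 4 KB page size is the usual i386 value; the variable names are made up). It converts the size to KB and pages and checks it against the address range passed to uvm_unmap() in the first trace.

```python
PAGE_SIZE = 4096              # assumed i386 page size in bytes

alloc_size = 0x49000          # first argument to malloc() in both traces
va_start = 0xc0cdf000         # uvm_unmap() start address, first trace
va_end = 0xc0d28000           # uvm_unmap() end address, first trace

alloc_kb = alloc_size // 1024
alloc_pages = alloc_size // PAGE_SIZE
unmapped = va_end - va_start  # size of the range being unmapped

print(alloc_kb, alloc_pages, hex(unmapped))
# prints: 292 73 0x49000
```

So the request is 292 KB (73 pages), and the range being unmapped in the first trace is exactly that size. For comparison, the top output quoted below reports 300K free at about the same time, so a single kernel allocation of this size is close to all the free memory the machine had left.
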
> I caught the last one on systat vm and top:
> 
>     2 users    Load  2.68  1.96  1.05                  Wed May 14 05:57:51
> 
>           memory totals (in KB)             PAGING   SWAPPING      Interrupts
>          real   virtual    free             in  out   in  out      1425 total
> Active  67988    374524    1020     ops   1320    3                 100 irq0
> All    123180    429716  469092     pages        20                     irq4
>                                                                         irq6
> Proc:r  d  s  w    Csw   Trp   Sys  Int  Sof   Flt        forks         irq10
>         1  9      1344  2651    40 1426 1432  1330        fkppw       2 irq11
>                                                           fksvm    1323 irq12
>    6.4% Sy  20.8% Us   0.0% Ni   2.0% In  70.9% Id        pwait         irq15
> |    |    |    |    |    |    |    |    |    |    |  1320 relck
> ===>>>>>>>>>>>%                                      1320 rlkok
>                                                           noram
> Namei         Sys-cache     Proc-cache                    ndcpy
>     Calls     hits    %     hits     %                    fltcp
>                                                           zfod
>                                                           cow
> Discs  cd0  wd0  wd1  fd0  md0                         64 fmin
> seeks                                                  85 ftarg
> xfers       659  664                                 8596 itarg
> Kbyte      2671 2704                                  436 wired
> %busy      45.7 28.7                                 1308 pdfre
> 
> 
> load averages:  2.71,  1.97,  1.06                                     05:57:55
> 62 processes:  1 runnable, 60 sleeping, 1 on processor
> CPU states:  3.0% user,  0.0% nice,  6.5% system,  2.5% interrupt, 88.1% idle
> Memory: 67M Act, 34M Inact, 1744K Wired, 2408K Exec, 1868K File, 300K Free
> Swap: 756M Total, 299M Used, 457M Free
> 
>   PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
>  2012 root      -5    0   290M   96M RUN        2:20 25.68% 25.68% rsync
>     6 root     -18    0     0K   14M pgdaemon   0:12  0.44%  0.44% [pagedaemon]
>     8 root      18    0     0K   14M syncer     0:24  0.00%  0.00% [ioflush]
>  1009 root      28    0   244K  804K CPU        0:16  0.00%  0.00% top
>  1008 root       3    0   472K  676K ttyin      0:07  0.00%  0.00% systat
>   315 andreas    2    0   524K 1308K select     0:06  0.00%  0.00% sshd
>   243 root       2    0  4908K 1200K select     0:06  0.00%  0.00% squid
>   238 root       2    0   380K    4K select     0:05  0.00%  0.00% <sshd>
>   293 andreas    2    0   524K 1308K select     0:03  0.00%  0.00% sshd
>   219 root      18  -12   716K 1552K pause      0:02  0.00%  0.00% ntpd
>   121 root      10    0   672K  468K nanoslee   0:01  0.00%  0.00% ipmon
>  2015 root       2    0   476K    4K select     0:01  0.00%  0.00% <rcmd>
>     9 root     -18    0     0K   14M aiodoned   0:01  0.00%  0.00% [aiodoned]
>   377 root      18    0   540K    4K pause      0:00  0.00%  0.00% <ksh>
>   335 root      18    0   540K    4K pause      0:00  0.00%  0.00% <ksh>
>   316 andreas   18    0   504K    4K pause      0:00  0.00%  0.00% <ksh>
>   294 andreas   18    0   504K    4K pause      0:00  0.00%  0.00% <ksh>
> 
> 
> ----
> 
> The problem did occur with a -current kernel from early February.
> I am about to build a -current kernel for the machine to see if the
> problem persists.
> 
> I suspect I can "fix" this problem by installing more memory and I'll
> probably do that after the real problem is fixed.
> 
> Any suggestions as to how I should proceed?
> 
> -- 
>     - aew
> 
> 
> NetBSD 1.6.1 (PLANIX) #2: Sun May 11 12:00:04 EDT 2003
>     root@wonder.wrede.pvt:/usr/src/sys/arch/i386/compile/PLANIX
> cpu0: Intel Pentium III (Coppermine) (686-class), 701.68 MHz
> cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 2-way
> cpu0: L2 cache 256 KB 32b/line 8-way
> cpu0: features 387f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
> cpu0: features 387f9ff<PGE,MCA,CMOV,FGPAT,PSE36,PN,MMX>
> cpu0: features 387f9ff<FXSR,SSE>
> cpu0: serial number 0000-0683-0003-017B-4660-0B30
> total memory = 127 MB
> avail memory = 113 MB
> using 1658 buffers containing 6632 KB of memory
> BIOS32 rev. 0 found at 0xf04e0
> mainbus0 (root)
> pci0 at mainbus0 bus 0: configuration mode 1
> pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
> pchb0 at pci0 dev 0 function 0
> pchb0: VIA Technologies VT82C691 (Apollo Pro) Host-PCI (rev. 0x42)
> agp0 at pchb0: aperture at 0xe4000000, size 0x10000000
> ppb0 at pci0 dev 1 function 0: VIA Technologies VT82C598 (Apollo MVP3) CPU-AGP Bridge (rev. 0x00)
> pci1 at ppb0 bus 1
> pci1: i/o space, memory space enabled
> vga1 at pci1 dev 0 function 0: 3Dfx Interactive Voodoo3 (rev. 0x01)
> wsdisplay0 at vga1 kbdmux 1
> wsmux1: connecting to wsdisplay0
> pcib0 at pci0 dev 7 function 0
> pcib0: VIA Technologies VT82C596A (Apollo Pro) PCI-ISA Bridge (rev. 0x12)
> pciide0 at pci0 dev 7 function 1: VIA Technologies VT82C596A (Apollo Pro) ATA66 controller
> pciide0: bus-master DMA support present
> pciide0: primary channel configured to compatibility mode
> pciide0: primary channel ignored (disabled)
> pciide0: secondary channel configured to compatibility mode
> atapibus0 at pciide0 channel 1: 2 targets
> cd0 at atapibus0 drive 0: <MATSHITA CR-571, , 1.0d> type 5 cdrom removable
> cd0: 32-bit data port
> cd0: drive supports PIO mode 3
> pciide0: secondary channel interrupting at irq 15
> cd0(pciide0:1:0): using PIO mode 3
> uhci0 at pci0 dev 7 function 2: VIA Technologies VT83C572 USB Controller (rev. 0x08)
> uhci0: interrupting at irq 14
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0
> uhub0: VIA Technologie UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> pchb1 at pci0 dev 7 function 3
> pchb1: VIA Technologies product 0x3050 (rev. 0x20)
> eap0 at pci0 dev 9 function 0: Ensoniq AudioPCI 97 ES1373B (rev. 0x06)
> eap0: interrupting at irq 14
> eap0: Crystal CS4297 codec; headphone, 18 bit DAC, 18 bit ADC, no 3D stereo
> audio0 at eap0: full duplex, mmap, independent
> midi0 at eap0: AudioPCI MIDI UART
> pciide1 at pci0 dev 10 function 0: Promise Ultra100TX2/ATA Bus Master IDE Accelerator (rev. 0x01)
> pciide1: bus-master DMA support present
> pciide1: primary channel configured to native-PCI mode
> pciide1: using irq 12 for native-PCI interrupt
> wd0 at pciide1 channel 0 drive 0: <WDC WD200BB-00AUA1>
> wd0: drive supports 16-sector PIO transfers, LBA addressing
> wd0: 19092 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 39102336 sectors
> wd0: 32-bit data port
> wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
> wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
> pciide1: secondary channel configured to native-PCI mode
> wd1 at pciide1 channel 1 drive 0: <IC35L120AVVA07-0>
> wd1: drive supports 16-sector PIO transfers, LBA addressing
> wd1: 115 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 241254720 sectors
> wd1: 32-bit data port
> wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
> wd1(pciide1:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
> vr0 at pci0 dev 11 function 0: VIA VT3043 (Rhine) 10/100 Ethernet
> vr0: interrupting at irq 10
> vr0: Ethernet address: 00:50:ba:aa:23:6f
> ukphy0 at vr0 phy 8: Generic IEEE 802.3u media interface
> ukphy0: Am79C873 10/100 media interface (OUI 0x000676, model 0x0000), rev. 0
> ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> ex0 at pci0 dev 12 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
> ex0: interrupting at irq 11
> ex0: MAC address 00:50:da:c6:d3:ef
> bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
> bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> isa0 at pcib0
> com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
> com0: console
> com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
> pckbc0 at isa0 port 0x60-0x64
> az0 at isa0 port 0x350: Aztech/PackardBell
> radio0 at az0
> pcppi0 at isa0 port 0x61
> midi1 at pcppi0: PC speaker
> spkr0 at pcppi0
> sysbeep0 at pcppi0
> isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
> npx0 at isa0 port 0xf0-0xff: using exception 16
> fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
> fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
> isapnp0: no ISA Plug 'n Play devices found
> apm0 at mainbus0: Power Management spec V1.2
> APM power mgmt engage (device 1): power management disabled (0x10f)
> biomask f3e7 netmask ffe7 ttymask ffe7
> Kernelized RAIDframe activated
> IPsec: Initialized Security Association Processing.
> boot device: wd0
> root on wd0a dumps on wd0b
> root file system type: ffs
> IP Filter: v3.4.29 initialized.  Default = pass all, Logging = enabled
> wsdisplay0: screen 1 added (80x25, vt100 emulation)
> wsdisplay0: screen 2 added (80x25, vt100 emulation)
> wsdisplay0: screen 3 added (80x25, vt100 emulation)
> wsdisplay0: screen 4 added (80x25, vt100 emulation)

--