Port-xen archive


Re: Large file copy seems to cause dom0 kernel panic



On Oct 15, 2011, at 3:31 PM, Steven Senator wrote:

> I saw and reported this ~1 year ago. (See:
> http://mail-index.netbsd.org/port-xen/2010/07/27/msg006181.html) It is
> the same kernel stack trace. I only saw problems when doing
>    dd if=/dev/zero of=file-backed-domu-boot-disk-image bs=1048576 count=8192
> within a domU. I could not reproduce this in the GENERIC kernel. My
> motherboard is an Opteron SuperMicro H8QM8-2+ (multiprocessor, high
> memory=24Gb). Also, I was using solid state disks (Intel X-25.) My
> suspicion is that because these "disks" are fast, with a high memory
> footprint the VM system was running into edge cases where pages got
> pushed out to disk faster than with traditional spinning platters, and
> there was probably a missing lock that wasn't exposed normally with
> slower I/O. Unfortunately, when I get a panic there is a secondary
> panic which prevents the dump from happening, so I could only get a
> screenshot.
> 
> I can provide remote access to the system if it would be helpful.
> 
> -Steve Senator
> 
> 
> On Sat, Oct 15, 2011 at 4:41 AM, Stephen M. Jones <smj%cirr.com@localhost> wrote:
>>> please enable ddb (sysctl -w ddb.onpanic=1) and report the real stack
>>> trace.
>>> 
>>> I've never noticed this myself, although I occasionally do large file
>>> copies in dom0.
>> 
>> uvm_fault(0xffffffff80bfffc0, 0xffffffff81400000, 1) -> e
>> fatal page fault in supervisor mode
>> trap type 6 code 0 rip ffffffff804fb673 cs e030 rflags 10286 cr2 ffffffff81400028 cpl 0 rsp ffffa0006ce638d0
>> kernel: page fault trap, code=0
>> Stopped in pid 4889.1 (cp) at   netbsd:pmap_kenter_pa+0x173:    movq    0(%rax),%rsi
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> ubc_alloc() at netbsd:ubc_alloc+0x25d
>> ubc_uiomove() at netbsd:ubc_uiomove+0xba
>> ffs_write() at netbsd:ffs_write+0x5c2
>> VOP_WRITE() at netbsd:VOP_WRITE+0x2d
>> vn_write() at netbsd:vn_write+0xce
>> dofilewrite() at netbsd:dofilewrite+0x7f
>> sys_write() at netbsd:sys_write+0x72
>> syscall() at netbsd:syscall+0xb4
>> ds          0
>> es          0x3920
>> fs          0
>> gs          0xdd38
>> rdi         0xffffa0006752c000
>> rsi         0xcbe05000
>> rbp         0xffffa0006ce63900
>> rbx         0xcbe05
>> rdx         0x7f8000000000
>> rcx         0
>> rax         0xffffffff81400028
>> r8          0xffffffff80bab900  cpu_info_primary
>> r9          0xffffa0000615d9c0
>> r10         0xffffa00007cdf160
>> r11         0xffffa0006ce63920
>> r12         0x3
>> r13         0x7fd00033a960
>> r14         0xffffa00067a9dd38
>> r15         0xffffa0006752c000
>> rip         0xffffffff804fb673  pmap_kenter_pa+0x173
>> cs          0xe030
>> rflags      0x10286
>> rsp         0xffffa0006ce638d0
>> ss          0xe02b
>> netbsd:pmap_kenter_pa+0x173:    movq    0(%rax),%rsi
>> db>
>> db> bt
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> ubc_alloc() at netbsd:ubc_alloc+0x25d
>> ubc_uiomove() at netbsd:ubc_uiomove+0xba
>> ffs_write() at netbsd:ffs_write+0x5c2
>> VOP_WRITE() at netbsd:VOP_WRITE+0x2d
>> vn_write() at netbsd:vn_write+0xce
>> dofilewrite() at netbsd:dofilewrite+0x7f
>> sys_write() at netbsd:sys_write+0x72
>> syscall() at netbsd:syscall+0xb4
>> db> trace
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> ubc_alloc() at netbsd:ubc_alloc+0x25d
>> ubc_uiomove() at netbsd:ubc_uiomove+0xba
>> ffs_write() at netbsd:ffs_write+0x5c2
>> VOP_WRITE() at netbsd:VOP_WRITE+0x2d
>> vn_write() at netbsd:vn_write+0xce
>> dofilewrite() at netbsd:dofilewrite+0x7f
>> sys_write() at netbsd:sys_write+0x72
>> syscall() at netbsd:syscall+0xb4
>> db> reboot
>> syncing disks... 12 11 done
>> unmounting file systems...
>> unmounting /proc (procfs)...uvm_fault(0xffffffff80bfffc0, 0xffffffff81400000, 1) -> e
>> fatal page fault in supervisor mode
>> trap type 6 code 0 rip ffffffff804fb673 cs e030 rflags 10282 cr2 ffffffff814000a8 cpl 6 rsp ffffa0006ce630e0
>> kernel: page fault trap, code=0
>> Stopped in pid 4889.1 (cp) at   netbsd:pmap_kenter_pa+0x173:    movq    0(%rax),%rsi
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> uvm_km_alloc() at netbsd:uvm_km_alloc+0x169
>> pool_grow() at netbsd:pool_grow+0x36
>> pool_get() at netbsd:pool_get+0x68
>> pool_cache_put_slow() at netbsd:pool_cache_put_slow+0x1d0
>> pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0xe1
>> vnfree() at netbsd:vnfree+0x5b
>> vrelel() at netbsd:vrelel+0x3f9
>> vflush() at netbsd:vflush+0x2d7
>> procfs_unmount() at netbsd:procfs_unmount+0x2b
>> dounmount() at netbsd:dounmount+0xd5
>> vfs_unmountall() at netbsd:vfs_unmountall+0x7c
>> cpu_reboot() at netbsd:cpu_reboot+0xe1
>> db_reboot_cmd() at netbsd:db_reboot_cmd+0x47
>> db_command() at netbsd:db_command+0xb0
>> db_command_loop() at netbsd:db_command_loop+0xe9
>> db_trap() at netbsd:db_trap+0xdd
>> kdb_trap() at netbsd:kdb_trap+0xc2
>> trap() at netbsd:trap+0x345
>> 
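
For what it's worth, the addresses in the two traces above line up as plain page arithmetic. The sketch below is Python and assumes 4 KiB pages on amd64; the helper names are mine, not kernel symbols. It checks that both cr2 values fall inside the page that was passed to uvm_fault(), and that rbx equals rsi shifted right by 12 bits. Reading rsi and rbx as an address and its page frame number is only a guess from that relation; the trace itself does not say what they hold.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT

# Values copied from the two ddb traces.
FAULT_VA = 0xffffffff81400000   # second argument to uvm_fault()
CR2_FIRST = 0xffffffff81400028  # first panic (also the value in rax)
CR2_SECOND = 0xffffffff814000a8 # second panic, during unmount
RSI = 0xcbe05000
RBX = 0xcbe05


def trunc_page(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


def page_offset(addr):
    """Byte offset of an address within its page."""
    return addr & (PAGE_SIZE - 1)


if __name__ == "__main__":
    for name, cr2 in (("first", CR2_FIRST), ("second", CR2_SECOND)):
        same_page = trunc_page(cr2) == FAULT_VA
        print(f"{name} fault: page {trunc_page(cr2):#x}, "
              f"offset {page_offset(cr2):#x}, same page: {same_page}")
    print(f"rsi >> {PAGE_SHIFT} == rbx: {(RSI >> PAGE_SHIFT) == RBX}")
```

Both faults land in the same kernel virtual page, at different offsets, and from two different call paths into pmap_kenter_pa().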

So this may be drive specific?  I recently swapped out a ST3250620AS for
a HDS725050KLA360 and began to have this problem.

I did not see any panics from large file copies in single-user mode,
which is how I migrated the data off the ST3250620AS. The problem is
definitely repeatable, though: a non-root user can cause a kernel panic
by doing a large file copy on the HDS725050KLA360 disk.

Here is the output of atactl identify for both drives, but I'm not sure
there is anything useful in it.

Model: ST3250620AS, Rev: 3.AAK, Serial #:             9QE20PS
Device type: ATA, fixed
Cylinders: 16383, heads: 16, sec/track: 63, total sectors: 268435455
Device supports command queue depth of 31
Device capabilities:
        DMA
        LBA
        ATA standby timer values
        IORDY operation
        IORDY disabling
Device supports following standards:
ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 ATA-7 
Command set support:
        READ BUFFER command (enabled)
        WRITE BUFFER command (enabled)
        Host Protected Area feature set (enabled)
        look-ahead (enabled)
        write cache (enabled)
        Power Management feature set (enabled)
        Security Mode feature set (disabled)
        SMART feature set (enabled)
        FLUSH CACHE EXT command (enabled) 
        FLUSH CACHE command (enabled)
        Device Configuration Overlay feature set (enabled)
        48-bit Address feature set (enabled)
        SET MAX security extension (disabled) 
        DOWNLOAD MICROCODE command (enabled)
        General Purpose Logging feature set
        SMART self-test
        SMART error logging
Serial ATA capabilities:
        1.5Gb/s signaling  
        3.0Gb/s signaling  
        Native Command Queuing
        PHY Event Counters
Serial ATA features:
        Device-Initiated Interface Power Managment (disabled)
        Software Settings Preservation (enabled)

---
Model: HDS725050KLA360, Rev: K2AOAB0, Serial #:       KRVN65ZBHBW54
Device type: ATA, fixed
Cylinders: 16383, heads: 16, sec/track: 63, total sectors: 268435455
Device supports command queue depth of 31
Device capabilities:
        DMA
        LBA
        ATA standby timer values
        IORDY operation
        IORDY disabling
Device supports following standards:
ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 ATA-7 
Command set support:
        READ BUFFER command (enabled)
        WRITE BUFFER command (enabled)
        Host Protected Area feature set (enabled)
        look-ahead (enabled)
        write cache (enabled)
        Power Management feature set (enabled)
        Security Mode feature set (disabled)
        SMART feature set (enabled)
        FLUSH CACHE EXT command (enabled)
        FLUSH CACHE command (enabled)
        Device Configuration Overlay feature set (enabled)
        48-bit Address feature set (enabled)
        Automatic Acoustic Management feature set (disabled)
        SET MAX security extension (disabled)
        SET FEATURES required to spin-up after power-up (disabled)
        Power-Up In Standby feature set (disabled)
        Advanced Power Management feature set (disabled)
        DOWNLOAD MICROCODE command (enabled)    
        URG bit for WRITE STREAM DMA/PIO  
        URG bit for READ STREAM DMA/PIO
        World Wide name
        WRITE DMA/MULTIPLE FUA EXT commands
        General Purpose Logging feature set
        Streaming feature set
        SMART self-test
        SMART error logging
Serial ATA capabilities:
        1.5Gb/s signaling
        Native Command Queuing
        Host-Initiated Interface Power Management
Serial ATA features:
        Non-zero Offset DMA (disabled)
        DMA Setup Auto Activate (disabled)
        Device-Initiated Interface Power Managment (disabled)
        In-order Data Delivery (disabled)
        Software Settings Preservation (enabled)
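
One way to look for anything useful in the two dumps is to compare their feature lists mechanically. Below is a rough sketch in Python. It assumes each identify dump is available as text and that every indented line is one capability; the function names are made up for illustration. The sample strings are just the "Serial ATA capabilities" sections from the two drives above.

```python
def feature_lines(identify_text):
    """Collect the indented capability lines of one identify dump."""
    features = set()
    for line in identify_text.splitlines():
        # Section headers start in column 0; capabilities are indented.
        if line[:1].isspace() and line.strip():
            features.add(line.strip())
    return features


def compare(old_text, new_text):
    """Return (only in old, only in new) as sorted lists."""
    old, new = feature_lines(old_text), feature_lines(new_text)
    return sorted(old - new), sorted(new - old)


ST3250620AS_SATA = """Serial ATA capabilities:
        1.5Gb/s signaling
        3.0Gb/s signaling
        Native Command Queuing
        PHY Event Counters
"""

HDS725050KLA360_SATA = """Serial ATA capabilities:
        1.5Gb/s signaling
        Native Command Queuing
        Host-Initiated Interface Power Management
"""

if __name__ == "__main__":
    only_old, only_new = compare(ST3250620AS_SATA, HDS725050KLA360_SATA)
    print("only ST3250620AS:", only_old)
    print("only HDS725050KLA360:", only_new)
```

Run over the full dumps, this would also list the streaming, FUA, and power-management entries that only the HDS725050KLA360 reports. Whether any of those differences matter for the panic is a separate question.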





