Re: is this crash while coredumping known? (forget the link to NFS)

To: NetBSD-current Users's Discussion List <current-users%netbsd.org@localhost>
Subject: Re: is this crash while coredumping known? (forget the link to NFS)
From: "Greg A. Woods" <woods%planix.ca@localhost>
Date: Sat, 11 Jul 2020 23:29:05 -0700

So it doesn't seem like this crash has anything to do with NFS after all.

I've been doing package builds in a sandboxctl chroot that access NFS
sources (read-only) but are otherwise entirely confined to a local
filesystem, albeit through sandboxctl's null mounts.  After many core
dumps (mostly from GNU configure scripts), one eventually caused another
similar-looking crash.

This time the kernel did write a crash dump, but savecore didn't think
there was enough free space left in /var/crash to recover it (even though
there is enough space for dozens of the compressed cores if they compress
as well as the last one).

(Below are the original crash messages, for comparison.)


[ 200974.6716318] fatal double fault in supervisor mode
[ 200974.6716318] trap type 13 code 0 rip 0xffffffff80e3c127 cs 0x8 rflags 0x10286 cr2 0xffff9a02af3e6f88
e6f90
[ 200974.6816277] curlwp 0xffff90f14a2e2bc0 pid 1591.1591 lowest kstack 0xffff9a02af3e52c0
kernel: double fault trap, code=0

Stopped in pid 1591.1591 (conftest) at  netbsd:radix_tree_gang_lookup_node+0x1a:        movq    %rdx,ffff)
radix_tree_gang_lookup_node() at netbsd:radix_tree_gang_lookup_node+0x1a
uvm_page_array_fill() at netbsd:uvm_page_array_fill+0x14b
uvm_page_array_fill_and_peek() at netbsd:uvm_page_array_fill_and_peek+0x1e
uvn_findpage() at netbsd:uvn_findpage+0x88
uvn_findpages() at netbsd:uvn_findpages+0xcd
genfs_getpages() at netbsd:genfs_getpages+0x959
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x58
uvn_get() at netbsd:uvn_get+0x57
ubc_fault() at netbsd:ubc_fault+0x182
uvm_fault_internal() at netbsd:uvm_fault_internal+0x51e
trap() at netbsd:trap+0x4e5
--- trap (number 6) ---
kcopy() at netbsd:kcopy+0x15
uiomove() at netbsd:uiomove+0xb7
ubc_uiomove() at netbsd:ubc_uiomove+0x156
ffs_write() at netbsd:ffs_write+0x251
layer_bypass() at netbsd:layer_bypass+0x102
VOP_WRITE() at netbsd:VOP_WRITE+0x40
vn_rdwr() at netbsd:vn_rdwr+0xcc
coredump_write() at netbsd:coredump_write+0xa0
coredump_elf64() at netbsd:coredump_elf64+0x43a
coredump() at netbsd:coredump+0x650
sigexit() at netbsd:sigexit+0x27c
sendsig_siginfo() at netbsd:sendsig_siginfo+0x323
trapsignal() at netbsd:trapsignal+0x371
trap() at netbsd:trap+0x8e7
--- trap (number 6) ---
400581:
ds          23
es          23
fs          0
gs          0
rdi         ffff90eb408bdd58
rsi         0
rbp         ffff9a02af3e7080
rbx         ffff9a02af3e7190
rdx         ffff9a02af3e71b0
rcx         1
rax         ffffffff80e3c10d    radix_tree_gang_lookup_node
r8          0
r9          1
r10         0
r11         2
r12         ffff90eb408bdd40
r13         1
r14         0
r15         ffff90eb408bdd58
rip         ffffffff80e3c127    radix_tree_gang_lookup_node+0x1a
cs          8
rflags      10286
rsp         ffff9a02af3e6f90
ss          0
netbsd:radix_tree_gang_lookup_node+0x1a:        movq    %rdx,ffffffffffffff10(%rbp)
db{3}>




savecore: reboot after panic: reboot forced via kernel debugger
savecore: system went down at Sat Jul 11 19:35:25 2020
savecore: no dump, not enough free space in /var/crash


$ df -h /var/crash/
Filesystem        Size      Used     Avail %Cap MountedOn
/dev/dk2          3.9G      1.5G      2.2G  40% /var
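
One possible explanation for the savecore refusal, sketched below, is that
the free-space test is made against the raw (uncompressed) dump size plus
any reserve configured in /var/crash/minfree, rather than against the size
of the compressed result.  This is only an illustration under that
assumption: the helper name savecore_would_fit() and both dump sizes are
hypothetical, and only the 2.2G "Avail" figure comes from the df output
above.

```python
def savecore_would_fit(avail_kb, dump_kb, kernel_kb=0, minfree_kb=0):
    """Return True if free space after the copy stays at or above minfree.

    Hypothetical helper: it models a check against the UNCOMPRESSED dump
    size, which is an assumption about savecore, not its actual source.
    """
    return avail_kb - (dump_kb + kernel_kb) >= minfree_kb


GIB_KB = 1024 * 1024            # kilobytes per GiB
avail_kb = int(2.2 * GIB_KB)    # "Avail" column from the df output above

# Hypothetical sizes, for illustration only (the real dump size is unknown).
print(savecore_would_fit(avail_kb, dump_kb=8 * GIB_KB))    # raw 8 GiB dump
print(savecore_would_fit(avail_kb, dump_kb=GIB_KB // 10))  # ~100 MiB compressed
```

With these made-up sizes the raw dump fails the test while the compressed
one would pass easily, which would match the symptom described above.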



At Thu, 09 Jul 2020 18:03:23 -0700, "Greg A. Woods" <woods%planix.ca@localhost> wrote:
Subject: is this crash while coredumping to NFS known?
>
> Here's what was on the console:
>
> [ 71887.4479952] fatal double fault in supervisor mode
> [ 71887.4479952] trap type 13 code 0 rip 0xffffffff809c5051 cs 0x8 rflags 0x10286 cr2 0xffff8b827c3e4f98 i
> 3e4fa0
> [ 71887.4479952] curlwp 0xffff8693578524c0 pid 29079.29079 lowest kstack 0xffff8b827c3e32c0
> kernel: double fault trap, code=0
> Stopped in pid 29079.29079 (tpgsqltime) at      netbsd:ip_output+0x14:  movq    %rsi,fffffffffffffe68(%rbp
>
> ip_output() at netbsd:ip_output+0x14
> tcp_output() at netbsd:tcp_output+0xc68
> tcp_send_wrapper() at netbsd:tcp_send_wrapper+0x9a
> sosend() at netbsd:sosend+0x7e4
> nfs_send() at netbsd:nfs_send+0x86
> nfs_request() at netbsd:nfs_request+0x3d4
> nfs_readrpc() at netbsd:nfs_readrpc+0x204
> nfs_doio() at netbsd:nfs_doio+0x731
> VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x64
> genfs_getpages() at netbsd:genfs_getpages+0x1400
> nfs_getpages() at netbsd:nfs_getpages+0x5d
> VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x80
> uvm_fault_internal() at netbsd:uvm_fault_internal+0x1895
> trap() at netbsd:trap+0x4e5
> --- trap (number 6) ---
> copyin() at netbsd:copyin+0x2f
> uiomove() at netbsd:uiomove+0xb7
> ubc_uiomove() at netbsd:ubc_uiomove+0x156
> nfs_write() at netbsd:nfs_write+0x129
> VOP_WRITE() at netbsd:VOP_WRITE+0x65
> vn_rdwr() at netbsd:vn_rdwr+0xcc
> coredump_write() at netbsd:coredump_write+0x56
> coredump_elf64() at netbsd:coredump_elf64+0x89c
> coredump() at netbsd:coredump+0x650
> sigexit() at netbsd:sigexit+0x27c
> sendsig() at netbsd:sendsig
> lwp_userret() at netbsd:lwp_userret+0x1c5
> trap() at netbsd:trap+0x9b7
> --- trap (number 6) ---
> 7c5294:
> ds          23
> es          23
> fs          0
> gs          0
> rdi         ffff869202438bc0
> rsi         0
> rbp         ffff8b827c3e5160
> rbx         ffff8693660f4988
> rdx         ffff869364f08818
> rcx         400
> rax         0
> r8          0
> r9          ffff869364f087b8
> r10         ffff869202438bc0
> r11         0
> r12         ffff869364a93040
> r13         a0
> r14         ffff869364a930b0
> r15         6c
> rip         ffffffff809c5051    ip_output+0x14
> cs          8
> rflags      10286
> rsp         ffff8b827c3e4fa0
> ss          0
> netbsd:ip_output+0x14:  movq    %rsi,fffffffffffffe68(%rbp)
> db{0}> machine cpu
> addr                    dev     id      flags   ipis    spl curlwp
> 0xffffffff8163a800      cpu0    0       3009    0       8  0xffff8693578524c0
> 0xffff8b825ded0000      cpu1    4       f002    0       0  0xffff868c2a81e1c0
> 0xffff8b825e0ec000      cpu2    2       f002    0       4  0xffff86934acd26c0
> 0xffff8b825e16d000      cpu3    6       f002    0       0  0xffff868c2ad6c200
> 0xffff8b825e19e000      cpu4    1       f002    0       0  0xffff868c2a9ec340
> 0xffff8b825e1cf000      cpu5    5       f002    0       0  0xffff868c2aa9d080
> 0xffff8b825e200000      cpu6    3       f002    0       0  0xffff868c2aa8e100
> 0xffff8b825e231000      cpu7    7       f002    0       0  0xffff868c2ab3f180
> db{0}> ps
> PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
> 29079>29079 7   0   1000000   ffff8693578524c0         tpgsqltime
>
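
As a rough cross-check of the two register dumps above, the sketch below
relates rsp, cr2 and the "lowest kstack" value printed for each crash.  It
only does address arithmetic on numbers copied from the console output; the
4 KiB page size and the helper name stack_summary() are assumptions made
for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on amd64


def stack_summary(rsp, cr2, lowest_kstack):
    """Relate the faulting address (cr2) to rsp and the lowest stack address."""
    def page(addr):
        return addr & ~(PAGE_SIZE - 1)

    return {
        "rsp_minus_cr2": rsp - cr2,
        "rsp_above_lowest": rsp - lowest_kstack,
        "cr2_pages_above_lowest": (page(cr2) - page(lowest_kstack)) // PAGE_SIZE,
        "cr2_page_offset": cr2 & (PAGE_SIZE - 1),
    }


# (rsp, cr2, lowest kstack) as printed on the console for each crash.
crashes = {
    "radix_tree_gang_lookup_node": (
        0xffff9a02af3e6f90, 0xffff9a02af3e6f88, 0xffff9a02af3e52c0),
    "ip_output": (
        0xffff8b827c3e4fa0, 0xffff8b827c3e4f98, 0xffff8b827c3e32c0),
}

for name, regs in crashes.items():
    print(name, {k: hex(v) for k, v in stack_summary(*regs).items()})
```

In both crashes cr2 is 8 bytes below rsp, near the top of the page
immediately above the one containing "lowest kstack".  That would be
consistent with the kernel stack running into an unmapped guard page during
the nested fault, but the traces alone do not confirm it.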


--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>

