Port-arm archive


Re: Increasing amount of cached (unflushed/un-invalidated) file pages causing kernel panics



Thank you, Michael van Elst and Alexander Schreiber, for your responses.


Here are some additional findings from my earlier tests. You have already explained that most of them are not relevant, but I'll post them anyway:
        * sysctl knobs with promising names (e.g. vm.bufmem_hiwater or vm.filemax) set to lower values seemingly didn't affect the problem
        * files created on or read from a tmpfs filesystem don't increase the amount of cached file pages, hence don't affect the problem
        * on port-amd64 :
                sysctl -w hw.acpi.sleep.state=3
                prints "Flushing disks cache: done" (kern_pmf.c), yet doesn't reduce the amount of cached file pages
                Unfortunately, I don't know of any aarch64 SBC that supports an S3-like sleep state on NetBSD

        * umount -Fv ...  - doesn't flush/invalidate cached file pages for a given FFS filesystem
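
To compare the cached file page count before and after each of the experiments above, a small helper like the following could be used. It is only a sketch: the function name file_pages is made up here, and it assumes that `vmstat -s` prints a line of the form "<number> cached file pages".

```shell
#!/bin/sh
# Hypothetical helper: print the number from the "cached file pages" line
# of vmstat -s style output read on stdin.
# Assumption: the line looks like "   123456 cached file pages".
file_pages() {
    awk '/cached file pages/ { print $1; exit }'
}

# Example use on a live system (not run here):
#   before=$(vmstat -s | file_pages)
#   ... run the experiment, e.g. read a large file from FFS ...
#   after=$(vmstat -s | file_pages)
#   echo "cached file pages: $before -> $after"
```

Reading from stdin keeps the helper independent of where the counter text comes from, so a saved `vmstat -s` dump can be checked the same way.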


I really appreciate your "kern.maxvnodes" suggestion, as it seems to have hit the nail on the head. I've seen it in sysctl.conf on a default FreeBSD installation (set to 400000) and have been tinkering with vnode-related *.c kernel source files for some time, so I can hardly believe that I totally overlooked a simple 'sysctl -a | grep vnode'  ;)


During last week's spare time I tested custom kern.maxvnodes values against different (so far problematic) workloads on different hardware/software configurations, to rule out the possibility that I'm dealing with several overlapping, separate issues (which is probably the case, so the aim is at least to reduce their number).

I set kern.maxvnodes to 2000, and that resolved the crashes with the following workloads:
        * building release
        * building larger software from pkgsrc (like Alexander, I've had problems building Rust, which I forgot to mention in my first post)
        * unpacking big tar.gz archives
        * transferring big (several GB) files to an NFS server at once
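
For anyone who wants to repeat the test, the setting can be applied at runtime with sysctl -w and kept across reboots through /etc/sysctl.conf. The snippet below is a minimal sketch; set_sysctl_conf is a hypothetical helper name, and it assumes the file uses plain name=value lines.

```shell
#!/bin/sh
# Hypothetical helper: store "key=value" in a sysctl.conf-style file,
# replacing any earlier line for the same key.
# Assumption: settings are written as plain "name=value" lines.
set_sysctl_conf() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    # Keep every existing line that does not start with "key=".
    if [ -f "$conf" ]; then
        awk -v k="$key=" 'index($0, k) != 1' "$conf" > "$tmp"
    fi
    # Append the new setting and write the result back in place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example use as root (not run here):
#   sysctl -w kern.maxvnodes=2000         # takes effect immediately
#   set_sysctl_conf kern.maxvnodes 2000   # persists across reboots
```

The value 2000 is simply the one used in the tests above, not a general recommendation.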

I'm wondering what the default value of kern.maxvnodes is based on; as I've noticed, it varies from one board to another.




Nonetheless, one issue persists: a kernel panic that happens 10-40 min after boot on a NetBSD system working as an NFS server. top usually reports ~2000MB-~3000MB of free memory at the moment of the panic. No ZFS usage. No custom NFS server parameters. The served share is /share, a directory on the root partition, which is an FFS filesystem. The load isn't heavy, ~5MB/s on average. The workload is mixed, with no long runs of writing a single big file.


The following trace is from a Rock64 (4xA53, 4GB LPDDR3, running off a 16GB eMMC card, not an SD card) with a locally built aarch64 image (from vanilla 9.99.88; only MKZFS=YES in mk.conf, no local patches or any other base or GENERIC64 config customizations). I've put "..." in place of my build path in the first line of the trace; it is otherwise unmodified. I hope it clarifies things a bit.


[ 781.6927155] panic: kernel diagnostic assertion "len <= buflen" failed: file "/.../usr/src/sys/kern/uipc_mbuf.c", line 1822 
[ 781.6927155] cpu1: Begin traceback...
[ 781.6927155] trace fp ffffc000b015fa50
[ 781.6927155] fp ffffc000b015fa80 vpanic() at ffffc0000056a4dc netbsd:vpanic+0x14c
[ 781.7027167] fp ffffc000b015fae0 kern_assert() at ffffc000007cade8 netbsd:kern_assert+0x58
[ 781.7027167] fp ffffc000b015fb70 m_align() at ffffc00000598f44 netbsd:m_align+0x114
[ 781.7027167] fp ffffc000b015fba0 m_split_internal() at ffffc00000599e48 netbsd:m_split_internal+0xc8
[ 781.7127197] fp ffffc000b015fbf0 nfsrv_getstream() at ffffc000004713fc netbsd:nfsrv_getstream+0xac
[ 781.7127197] fp ffffc000b015fc40 nfsrv_rcv() at ffffc000004716e8 netbsd:nfsrv_rcv+0x1c8
[ 781.7127197] fp ffffc000b015fcd0 do_nfssvc.part.0() at ffffc000004768fc netbsd:do_nfssvc.part.0+0xf5c
[ 781.7227219] fp ffffc000b015fe20 syscall() at ffffc000000aaa54 netbsd:syscall+0x194
[ 781.7227219] tf ffffc000b015fed0 el0_trap() at ffffc000000adff0 netbsd:el0_trap
[ 781.7227219] ---- trapframe 0xffffc000b015fed0 (304 bytes) ----
[ 781.7227219]     pc=0000fd365bc8bbe8,   spsr=0000000080000000
[ 781.7227219]    esr=000000005600009b,    far=0000fd365bc00000
[ 781.7227219]     x0=0000000000000004,     x1=0000fd365abeff40
[ 781.7227219]     x2=0000000000000000,     x3=0000000000000000
[ 781.7227219]     x4=0000fd365bc30d88,     x5=0000000000000001
[ 781.7227219]     x6=0000fd365bc00006,     x7=00000000000000e3
[ 781.7327208]     x8=0000000000000000,     x9=0000000000000002
[ 781.7327208]    x10=0000000000000000,    x11=0000000000000000
[ 781.7327208]    x12=0000000000000000,    x13=0000fd3658e008c0
[ 781.7327208]    x14=0000000000000000,    x15=0000000000000000
[ 781.7327208]    x16=00000002001131d8,    x17=0000fd365bc8bbe4
[ 781.7327208]    x18=0000000000000064,    x19=0000fd365bc30c00
[ 781.7327208]    x20=0000000000800000,    x21=0000fd365a3f0000
[ 781.7327208]    x22=0000000000000000,    x23=0000fd365bc30ca0
[ 781.7327208]    x24=0000fd365c07d440,    x25=0000fd365bc30c00
[ 781.7327208]    x26=0000fd365bc1c000,    x27=0000000000000000
[ 781.7327208]    x28=0000000000000004, fp=x29=0000fd365abeff20
[ 781.7327208] lr=x30=0000000200101690,     sp=0000fd365abeff20
[ 781.7327208] ------------------------------------------------
[ 781.7327208] cpu1: End traceback...
Stopped in pid 1002.875 (nfsd) at       netbsd:cpu_Debugger+0x4:        ret
?
x0          1
x1          0
x2          ffffc00000c45d80    cpu_info_store+0x780
x3          0
x4          3
x5          0
x6          1
x7          ffffffc8
x8          0
x9          ffffc000b015f98f
x10         0
x11         ffffc000b015f830
x12         0
x13         ffffc00000dcb628    db_symtab+0xad748
x14         ffffc000009c9f58    ostype+0xa1748
x15         ffffc000b015f4d8
x16         ffffc000000040a0    pic_default_splx
x17         18
x18         1000
x19         ffffc0000111c630    scratchstr.0
x20         ffffc00000928908    ostype+0xf8
x21         ffffc0000111c608    panicstr
x22         ffffc0000111c000    phpool+0x8f0
x23         104
x24         ffffc0000111b000    cpu_counts+0x48
x25         0
x26         ffff0000fc948100
x27         2
x28         ffffc00000cf6c28    nfsrtton
x29         ffffc000b015fa80
x30         ffffc0000056a4e0    vpanic+0x150
sp          ffffc000b015fa80
pc          ffffc000000a7460    cpu_Debugger+0x4
spsr        20000005
netbsd:cpu_Debugger+0x4:        ret


Best regards,
Marcin

