NetBSD-Users archive


Re: Xen storage for NetBSD guests: performance vs. consistent backups (sanity check)



Hello!

> Previously, in essentially this configuration, I had run into
> reproducible freezes during dump -X. With the current setup, however, I
> can no longer reproduce this behaviour.

That "freeze" is more likely a kernel panic. fss(4) is known to corrupt the kernel memory pool under high I/O load. The two commands below reliably crash any NetBSD kernel (tested on 10.1 and later) within a few seconds:

fssconfig -cx fss0 / /root/backup
dd if=/dev/fss0 of=/dev/null bs=1024k

After a few seconds:

uvm_fault(0xffffffff81976c20, 0x0, 1) -> e
[ 883.0589643] fatal page fault in supervisor mode
[ 883.0589643] trap type 6 code 0 rip 0xffffffff80e261f3 cs 0x8 rflags 0x10202 cr2 0x9 ilevel 0 rsp 0xffff9d04c49d4c10
[ 883.0589643] curlwp 0xffff8b60ae5fe400 pid 0.1215 lowest kstack 0xffff9d04c49d02c0
kernel: page fault trap, code=0
Stopped in pid 0.1215 (system) at netbsd:pool_get+0x2b9: cmpq %r15,8(%rax)
pool_get() at netbsd:pool_get+0x2b9
allocbuf() at netbsd:allocbuf+0x113
getblk() at netbsd:getblk+0x18c
bio_doread() at netbsd:bio_doread+0x1d
breadn() at netbsd:breadn+0x8b
ffs_snapshot_read() at netbsd:ffs_snapshot_read+0x1b2
VOP_READ() at netbsd:VOP_READ+0x42
vn_rdwr() at netbsd:vn_rdwr+0xf1
fss_bs_io() at netbsd:fss_bs_io+0x89
fss_bs_thread() at netbsd:fss_bs_thread+0x50f

The stack trace makes it clear that the trigger is fss(4).

I see the same crashes with "dump -X", although there it usually takes around 20 minutes for the kernel to crash. I reported this as https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=59663 on 21 Sep 2025, but there has been no reply so far.

LVM snapshots take a different code path, one that is much easier to understand than fss(4), which operates at the filesystem level, so they should work without problems.
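
For illustration only, a snapshot-based backup taken from the Dom0 might look roughly like the sketch below. It is a dry run: the function only prints the commands so they can be reviewed before anything is executed. The volume group, logical volume name, snapshot size, and output path are hypothetical placeholders, and the sketch assumes the Dom0's LVM tools accept the usual lvcreate -s / lvremove -f syntax and expose volumes as /dev/<vg>/<lv>.

```sh
#!/bin/sh
# Dry-run sketch: print the commands for a snapshot-based backup of a DomU
# volume, run from the Dom0. Every name used here is a hypothetical
# placeholder, not taken from the setup described in this thread.

snapshot_backup_plan() {
    vg=$1      # volume group, e.g. vg0 (hypothetical)
    lv=$2      # logical volume backing the DomU, e.g. domuroot (hypothetical)
    out=$3     # image file to write, e.g. /mnt/domuroot.img (hypothetical)
    snap="${lv}_snap"

    # 1. Create a copy-on-write snapshot; 2G of change space is an assumption.
    echo "lvcreate -s -L 2G -n $snap /dev/$vg/$lv"
    # 2. Copy the frozen block device; this path does not go through fss(4).
    echo "dd if=/dev/$vg/$snap of=$out bs=1024k"
    # 3. Remove the snapshot so it stops consuming change space.
    echo "lvremove -f /dev/$vg/$snap"
}
```

Calling snapshot_backup_plan vg0 domuroot /mnt/domuroot.img prints the three commands in order. Note that an image taken this way is only crash-consistent from the guest's point of view, since the DomU is not told to quiesce its filesystem first.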

On 1/25/26 17:58, Matthias Petermann wrote:
Hi all,

thanks a lot for the many thoughtful replies and perspectives in this thread - I found them genuinely helpful.

I wanted to add a short follow-up with a concrete data point, as I had the chance today to set up a clean test environment to re-check an issue I had observed some months ago.

Test setup:

- NetBSD 10.1_STABLE (built from 2026-01-08), Xen 4.18
- NetBSD Dom0 and identical NetBSD DomU
- DomU filesystem hosted on an LVM volume provided by the NetBSD Dom0
- FFSv2ea + WAPBL as DomU root filesystem
- Backups using dump -X (FSS)

Previously, in essentially this configuration, I had run into reproducible freezes during dump -X. With the current setup, however, I can no longer reproduce this behaviour.

For reference:

- DomU: 192.168.2.252
- Backup target: USB disk mounted at /mnt
- DomU filesystem ~6 GB used

Tests were run repeatedly and also under mixed load (a bonnie++ load test running in parallel inside the DomU).

I did not observe any stalls, hangs, or other instabilities.

The test loop used was:

```
for i in `seq 100 199`; do
   ssh root@192.168.2.252 "/sbin/dump -X -h 0 -b 64 -0auf - /" \
     > /mnt/backup$i.dump
done
```

Under these conditions, the LVM-based approach appears to work reliably again and is, at least for me, back to being a viable option.

Many thanks again to everyone for sharing their experiences and insights - they were the motivation to revisit and re-test this properly.

Best regards,
Matthias


