NetBSD-Bugs archive
Re: port-xen/53074: Daily 8.99.12 DOMU panic
The following reply was made to PR port-xen/53074; it has been noted by GNATS.
From: Brad Spencer <brad%anduin.eldar.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-xen-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost
Subject: Re: port-xen/53074: Daily 8.99.12 DOMU panic
Date: Wed, 07 Mar 2018 20:43:56 -0500
coypu%sdf.org@localhost writes:
> The following reply was made to PR port-xen/53074; it has been noted by GNATS.
>
> From: coypu%sdf.org@localhost
> To: gnats-bugs%NetBSD.org@localhost
> Cc:
> Subject: Re: port-xen/53074: Daily 8.99.12 DOMU panic
> Date: Tue, 6 Mar 2018 01:49:26 +0000
>
> You can parse the backtrace like so (normally, it would report the
> functions, but that is a bug).
>
> gdb /my-netbsd-kernel
> info symbol 0xffffffff804f74b8
>
> and so on.
>
[snip]
I noticed something odd about the daily panic I am having. It always
happened around 9:15 in the morning, rather than 24 hours after the last
reboot. At that time, the DOMU was performing a backup using a line much
like this:
/sbin/dump -0u -a -f - $fs | ssh backup%server.eldar.org@localhost "/usr/pkg/bin/buffer | /usr/bin/bzip2 -9c > some_backup.bz2"
The dump would proceed for a while and then at some point panic 100% of
the time. The latest kernel, which I compiled with debugging symbols,
produced a slightly different panic than was reported initially in the
PR. I have attached the more recent panic to the end of this email along
with the decoding of the symbols. The original panic that was reported
does not happen with the new kernel.
I suspected some sort of file system damage, but that was ruled out.
Multiple fsck runs come back clean, and even copying the data using pax
to a new volume produced the same panic on the new volume. The file
system is FFSv2, fslevel 5. WAPBL was enabled on the file system, but
that was also removed, although the panic was slightly different when
WAPBL was present. The volumes are presented to the DOMU as raw lvm
devices.
This DOMU was updated from a 6.99 era build to 8.99.12 recently and
performed this sort of backup every day in the past. The DOM0 is
NetBSD/amd64 running 7.1_STABLE, also recent, with Xen 4.5.1. The DOMU
is given a sched_credit weight of 32 and a CPU cap of 10.
I can help if given some guidance as to what may be needed.
panic: biodone2 already
cpu0: Begin traceback...
?() at ffffffff804f74b8
?() at ffffffff804f7575
?() at ffffffff80536841
?() at ffffffff8053689c
?() at ffffffff804cc257
cpu0: End traceback...
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0xffffffff802057a5 cs 0xe030 rflags 0x202 cr2 0xffffa0002b9fe000 ilevel 0 rsp 0xffffa0002b353dc0
curlwp 0xffffa00000741000 pid 0.4 lowest kstack 0xffffa0002b3502c0
Stopped in pid 0.4 (system) at ffffffff802057a5: leave
ds ffff
es 0
fs 3dd0
gs 3d70
rdi 0
rsi a
rbp ffffa0002b353dc0
rbx 104
rdx 1
rcx 0
rax 0
r8 ffffffff807373c0
r9 0
r10 75
r11 e02b
r12 ffffffff8068dbcc
r13 ffffa0002b353e08
r14 ffffffff8068da62
r15 ffffa00000741000
rip ffffffff802057a5
cs e030
rflags 202
rsp ffffa0002b353dc0
ss e02b
ffffffff802057a5: leave
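Because this kernel prints only raw addresses in the traceback, each frame has to be looked up by hand. The following is a minimal sketch, not part of the original report, of how the addresses could be pulled out of a saved panic log and turned into gdb `list` commands. The function names and the `back_up` parameter are hypothetical. `back_up` exists because a traceback address is normally a return address, which points just past the call instruction, so gdb may attribute it to the following source line or even the following function; passing `back_up=1` is a common way to land inside the call itself.

```python
import re

# Matches ddb traceback lines such as "?() at ffffffff804f74b8".
FRAME_RE = re.compile(r"^\S*\(\) at (?:0x)?([0-9a-fA-F]+)\s*$")


def traceback_addresses(panic_text):
    """Return frame addresses found between the Begin/End traceback markers."""
    addresses = []
    in_traceback = False
    for line in panic_text.splitlines():
        line = line.strip()
        if "Begin traceback" in line:
            in_traceback = True
        elif "End traceback" in line:
            in_traceback = False
        elif in_traceback:
            match = FRAME_RE.match(line)
            if match:
                addresses.append(int(match.group(1), 16))
    return addresses


def gdb_commands(addresses, back_up=0):
    """Build one gdb 'list' command per address.

    back_up is subtracted from every address (hypothetical knob, see above).
    """
    return ["list *(0x%x)" % (addr - back_up) for addr in addresses]


if __name__ == "__main__":
    sample = """panic: biodone2 already
cpu0: Begin traceback...
?() at ffffffff804f74b8
?() at ffffffff804f7575
cpu0: End traceback...
"""
    for command in gdb_commands(traceback_addresses(sample)):
        print(command)
```

The printed commands can be pasted at the (gdb) prompt after loading the kernel that has debugging symbols.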
(gdb) l *(0xffffffff804f74b8)
0xffffffff804f74b8 is in vpanic (../../../../kern/subr_prf.c:342).
337 kdbpanic();
338 #endif
339 #ifdef DDB
340 db_panic();
341 #endif
342 cpu_reboot(bootopt, NULL);
343 }
344
345 /*
346 * kernel logging functions: log, logpri, addlog
(gdb) l *(0xffffffff804f7575)
0xffffffff804f7575 is in snprintf (../../../../kern/subr_prf.c:1075).
1070 /*
1071 * snprintf: print a message to a buffer
1072 */
1073 int
1074 snprintf(char *bf, size_t size, const char *fmt, ...)
1075 {
1076 int retval;
1077 va_list ap;
1078
1079 va_start(ap, fmt);
(gdb) l *(0xffffffff80536841)
0xffffffff80536841 is in biointr (../../../../kern/vfs_bio.c:1654).
1649 }
1650 }
1651
1652 static void
1653 biointr(void *cookie)
1654 {
1655 struct cpu_info *ci;
1656 buf_t *bp;
1657 int s;
1658
(gdb) l *(0xffffffff8053689c)
0xffffffff8053689c is in biointr (./x86/intr.h:187).
182
183 static inline int
184 splraiseipl(ipl_cookie_t icookie)
185 {
186
187 return splraise(icookie._ipl);
188 }
189
190 #include <sys/spl.h>
191
(gdb) l *(0xffffffff804cc257)
0xffffffff804cc257 is in softint_thread (/usr/src/sys/arch/amd64/compile/XEN3_DOMU_ISCSI/xen-ma/machine/cpu.h:55).
50 __inline static struct cpu_info * __unused
51 x86_curcpu(void)
52 {
53 struct cpu_info *ci;
54
55 __asm volatile("movq %%gs:%1, %0" :
56 "=r" (ci) :
57 "m"
58 (*(struct cpu_info * const *)offsetof(struct cpu_info, ci_self)));
59 return ci;
--
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS
http://anduin.eldar.org - & - http://anduin.ipv6.eldar.org [IPv6 only]