NetBSD-Bugs archive


Re: port-xen/53074: Daily 8.99.12 DOMU panic



coypu%sdf.org@localhost writes:

> The following reply was made to PR port-xen/53074; it has been noted by GNATS.
>
> From: coypu%sdf.org@localhost
> To: gnats-bugs%NetBSD.org@localhost
> Cc: 
> Subject: Re: port-xen/53074: Daily 8.99.12 DOMU panic
> Date: Tue, 6 Mar 2018 01:49:26 +0000
>
>  You can parse the backtrace like so (normally, it would report the
>  functions, but that is a bug).
>  
>  gdb /my-netbsd-kernel
>  info symbol 0xffffffff804f74b8
>  
>  and so on.
>  

[snip]

I noticed something odd about the daily panic I am having.  It always
happened around 9:15 in the morning, not every 24 hours from a reboot.
At that time, the DOMU was performing a backup using a line much like
this:

/sbin/dump -0u -a -f - $fs | ssh backup%server.eldar.org@localhost "/usr/pkg/bin/buffer | /usr/bin/bzip2 -9c > some_backup.bz2"

The dump would proceed for a while and then at some point panic 100% of
the time.  The latest kernel, which I compiled with debugging symbols,
produced a slightly different panic than was reported initially in the
PR.  I have attached the more recent panic to the end of this email along
with the decoding of the symbols.  The original panic that was reported
does not happen with the new kernel.
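
For anyone wanting to repeat the decoding without typing each address by
hand, here is a rough sketch of how the traceback lines could be turned
into gdb commands.  The function name traceback_to_gdb and the file names
panic.txt and cmds.gdb are made up for illustration, and it assumes the
console output was saved with one "?() at <address>" entry per line:

```shell
#!/bin/sh
# Sketch only: traceback_to_gdb is a hypothetical helper name.
# Reads ddb console output on stdin and prints one gdb "list" command
# for every line of the form "?() at <hex address>".
traceback_to_gdb() {
    sed -n 's/^?() at \([0-9a-f][0-9a-f]*\)$/list *(0x\1)/p'
}

# Assumed usage (panic.txt and cmds.gdb are placeholder file names;
# /my-netbsd-kernel is the kernel image built with debugging symbols):
#   traceback_to_gdb < panic.txt > cmds.gdb
#   gdb -batch -x cmds.gdb /my-netbsd-kernel
```

Lines that are not traceback entries (registers, the panic message) are
skipped, so the whole console capture can be fed in unedited.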

I suspected some sort of file system damage, but that was ruled out.
Multiple fsck runs came back clean, and even copying the data using pax
to a new volume produced the same panic on the new volume.  The file
system is FFSv2, fslevel 5.  WAPBL was enabled on the file system, but
that was also removed, although the panic was slightly different when
WAPBL was present.  The volumes are presented to the DOMU as raw lvm
devices.

This DOMU was updated from a 6.99 era build to 8.99.12 recently and
performed this sort of backup every day in the past.  The DOMU has a
DOM0 which is a NetBSD/amd64 Xen 4.5.1 DOM0 running 7.1_STABLE, also
recent, and is given a sched_credit weight of 32 and CPU cap of 10.

I can help if given some guidance as to what may be needed.



panic: biodone2 already
cpu0: Begin traceback...
?() at ffffffff804f74b8
?() at ffffffff804f7575
?() at ffffffff80536841
?() at ffffffff8053689c
?() at ffffffff804cc257
cpu0: End traceback...
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0xffffffff802057a5 cs 0xe030 rflags 0x202 cr2 0xffffa0002b9fe000 ilevel 0 rsp 0xffffa0002b353dc0
curlwp 0xffffa00000741000 pid 0.4 lowest kstack 0xffffa0002b3502c0
Stopped in pid 0.4 (system) at  ffffffff802057a5:       leave
ds          ffff
es          0
fs          3dd0
gs          3d70
rdi         0
rsi         a
rbp         ffffa0002b353dc0
rbx         104
rdx         1
rcx         0
rax         0
r8          ffffffff807373c0
r9          0
r10         75
r11         e02b
r12         ffffffff8068dbcc
r13         ffffa0002b353e08
r14         ffffffff8068da62
r15         ffffa00000741000
rip         ffffffff802057a5
cs          e030
rflags      202
rsp         ffffa0002b353dc0
ss          e02b
ffffffff802057a5:       leave

(gdb) l *(0xffffffff804f74b8)
0xffffffff804f74b8 is in vpanic (../../../../kern/subr_prf.c:342).
337                     kdbpanic();
338     #endif
339     #ifdef DDB
340             db_panic();
341     #endif
342             cpu_reboot(bootopt, NULL);
343     }
344
345     /*
346      * kernel logging functions: log, logpri, addlog

(gdb) l *(0xffffffff804f7575)
0xffffffff804f7575 is in snprintf (../../../../kern/subr_prf.c:1075).
1070    /*
1071     * snprintf: print a message to a buffer
1072     */
1073    int
1074    snprintf(char *bf, size_t size, const char *fmt, ...)
1075    {
1076            int retval;
1077            va_list ap;
1078
1079            va_start(ap, fmt);

(gdb) l *(0xffffffff80536841)
0xffffffff80536841 is in biointr (../../../../kern/vfs_bio.c:1654).
1649            }
1650    }
1651
1652    static void
1653    biointr(void *cookie)
1654    {
1655            struct cpu_info *ci;
1656            buf_t *bp;
1657            int s;
1658

(gdb) l *(0xffffffff8053689c)
0xffffffff8053689c is in biointr (./x86/intr.h:187).
182
183     static inline int
184     splraiseipl(ipl_cookie_t icookie)
185     {
186
187             return splraise(icookie._ipl);
188     }
189
190     #include <sys/spl.h>
191

(gdb) l *(0xffffffff804cc257)
0xffffffff804cc257 is in softint_thread (/usr/src/sys/arch/amd64/compile/XEN3_DOMU_ISCSI/xen-ma/machine/cpu.h:55).
50      __inline static struct cpu_info * __unused
51      x86_curcpu(void)
52      {
53              struct cpu_info *ci;
54
55              __asm volatile("movq %%gs:%1, %0" :
56                  "=r" (ci) :
57                  "m"
58                  (*(struct cpu_info * const *)offsetof(struct cpu_info, ci_self)));
59              return ci;




-- 
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS
http://anduin.eldar.org  - & -  http://anduin.ipv6.eldar.org [IPv6 only]

