NetBSD-Bugs archive


Re: port-xen/53074: Daily 8.99.12 DOMU panic



The following reply was made to PR port-xen/53074; it has been noted by GNATS.

From: Brad Spencer <brad%anduin.eldar.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-xen-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        netbsd-bugs%netbsd.org@localhost
Subject: Re: port-xen/53074: Daily 8.99.12 DOMU panic
Date: Wed, 07 Mar 2018 20:43:56 -0500

 coypu%sdf.org@localhost writes:
 
 > The following reply was made to PR port-xen/53074; it has been noted by GNATS.
 >
 > From: coypu%sdf.org@localhost
 > To: gnats-bugs%NetBSD.org@localhost
 > Cc: 
 > Subject: Re: port-xen/53074: Daily 8.99.12 DOMU panic
 > Date: Tue, 6 Mar 2018 01:49:26 +0000
 >
 >  You can parse the backtrace like so (normally, it would report the
 >  functions, but that is a bug).
 >  
 >  gdb /my-netbsd-kernel
 >  info symbol 0xffffffff804f74b8
 >  
 >  and so on.
 >  
 
 [snip]
 
 I noticed something odd about the daily panic I am having.  It always
 happened around 9:15 in the morning, not every 24 hours from a reboot.
 At that time, the DOMU was performing a backup using a line much like
 this:
 
 /sbin/dump -0u -a -f - $fs | ssh backup%server.eldar.org@localhost "/usr/pkg/bin/buffer | /usr/bin/bzip2 -9c > some_backup.bz2"
 
 The dump would proceed for a while and then at some point panic 100% of
 the time.  The latest kernel, which I compiled with debugging symbols,
 produced a slightly different panic than was reported initially in the
 PR.  I have attached the more recent panic to the end of this email along
 with the decoding of the symbols.  The original panic that was reported
 does not happen with the new kernel.
 
 I suspected some sort of file system damage, but that was ruled out.
 Multiple fsck runs are clean, and even copying the data using pax to a
 new volume produced the same panic on the new volume.  The file system is
 an FFSv2, fslevel 5.  WAPBL was enabled on the file system; it has since
 been removed and the panic still occurs, although the panic was slightly
 different when WAPBL was present.  The volumes are presented to the DOMU
 as raw lvm devices.
 
 This DOMU was recently updated from a 6.99 era build to 8.99.12 and
 performed this sort of backup every day in the past.  Its DOM0 is a
 NetBSD/amd64 Xen 4.5.1 DOM0 running 7.1_STABLE, also recent.  The DOMU
 is given a sched_credit weight of 32 and a CPU cap of 10.
 
 I can help if given some guidance as to what may be needed.
 
 
 
 panic: biodone2 already
 cpu0: Begin traceback...
 ?() at ffffffff804f74b8
 ?() at ffffffff804f7575
 ?() at ffffffff80536841
 ?() at ffffffff8053689c
 ?() at ffffffff804cc257
 cpu0: End traceback...
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip 0xffffffff802057a5 cs 0xe030 rflags 0x202 cr2 0xffffa0002b9fe000 ilevel 0 rsp 0xffffa0002b353dc0
 curlwp 0xffffa00000741000 pid 0.4 lowest kstack 0xffffa0002b3502c0
 Stopped in pid 0.4 (system) at  ffffffff802057a5:       leave
 ds          ffff
 es          0
 fs          3dd0
 gs          3d70
 rdi         0
 rsi         a
 rbp         ffffa0002b353dc0
 rbx         104
 rdx         1
 rcx         0
 rax         0
 r8          ffffffff807373c0
 r9          0
 r10         75
 r11         e02b
 r12         ffffffff8068dbcc
 r13         ffffa0002b353e08
 r14         ffffffff8068da62
 r15         ffffa00000741000
 rip         ffffffff802057a5
 cs          e030
 rflags      202
 rsp         ffffa0002b353dc0
 ss          e02b
 ffffffff802057a5:       leave
 
 (gdb) l *(0xffffffff804f74b8)
 0xffffffff804f74b8 is in vpanic (../../../../kern/subr_prf.c:342).
 337                     kdbpanic();
 338     #endif
 339     #ifdef DDB
 340             db_panic();
 341     #endif
 342             cpu_reboot(bootopt, NULL);
 343     }
 344
 345     /*
 346      * kernel logging functions: log, logpri, addlog
 
 (gdb) l *(0xffffffff804f7575)
 0xffffffff804f7575 is in snprintf (../../../../kern/subr_prf.c:1075).
 1070    /*
 1071     * snprintf: print a message to a buffer
 1072     */
 1073    int
 1074    snprintf(char *bf, size_t size, const char *fmt, ...)
 1075    {
 1076            int retval;
 1077            va_list ap;
 1078
 1079            va_start(ap, fmt);
 
 (gdb) l *(0xffffffff80536841)
 0xffffffff80536841 is in biointr (../../../../kern/vfs_bio.c:1654).
 1649            }
 1650    }
 1651
 1652    static void
 1653    biointr(void *cookie)
 1654    {
 1655            struct cpu_info *ci;
 1656            buf_t *bp;
 1657            int s;
 1658
 
 (gdb) l *(0xffffffff8053689c)
 0xffffffff8053689c is in biointr (./x86/intr.h:187).
 182
 183     static inline int
 184     splraiseipl(ipl_cookie_t icookie)
 185     {
 186
 187             return splraise(icookie._ipl);
 188     }
 189
 190     #include <sys/spl.h>
 191
 
 (gdb) l *(0xffffffff804cc257)
 0xffffffff804cc257 is in softint_thread (/usr/src/sys/arch/amd64/compile/XEN3_DOMU_ISCSI/xen-ma/machine/cpu.h:55).
 50      __inline static struct cpu_info * __unused
 51      x86_curcpu(void)
 52      {
 53              struct cpu_info *ci;
 54
 55              __asm volatile("movq %%gs:%1, %0" :
 56                  "=r" (ci) :
 57                  "m"
 58                  (*(struct cpu_info * const *)offsetof(struct cpu_info, ci_self)));
 59              return ci;
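
 For anyone who wants to repeat the decoding above without typing each
 address by hand, here is a minimal sketch in Python.  It only pulls the
 frame addresses out of the traceback text and prints the matching gdb
 commands ("info symbol" as suggested earlier in the thread, plus the
 "l *(addr)" form used above).  The helper names and the SAMPLE string are
 my own illustration, not part of any NetBSD tool, and the regular
 expression assumes frames look exactly like the "?() at <hex>" lines in
 this panic.

```python
import re

# Matches traceback lines of the form "?() at ffffffff804f74b8".
# Assumption: every frame line ends with a bare hex address, as in this panic.
FRAME_RE = re.compile(r"^\s*\S*\(\)\s+at\s+([0-9a-fA-F]{8,16})\s*$")


def traceback_addresses(text):
    """Return the frame addresses found in the traceback, in order."""
    addrs = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            addrs.append("0x" + m.group(1).lower())
    return addrs


def gdb_commands(addrs):
    """Build one 'info symbol' and one 'list' command per address."""
    cmds = []
    for addr in addrs:
        cmds.append(f"info symbol {addr}")
        cmds.append(f"list *({addr})")
    return cmds


# Traceback copied from the panic earlier in this message.
SAMPLE = """\
cpu0: Begin traceback...
?() at ffffffff804f74b8
?() at ffffffff804f7575
?() at ffffffff80536841
?() at ffffffff8053689c
?() at ffffffff804cc257
cpu0: End traceback...
"""

if __name__ == "__main__":
    for cmd in gdb_commands(traceback_addresses(SAMPLE)):
        print(cmd)
```

 The printed commands can be saved to a file and run against a kernel
 image that was built with debugging symbols, for example with gdb's
 -batch and -x options; the kernel path is whatever matches the running
 kernel.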
 
 
 
 
 -- 
 Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS
 http://anduin.eldar.org  - & -  http://anduin.ipv6.eldar.org [IPv6 only]
 

