Port-xen archive

Re: Instability issues with NetBSD-9, xen-4.11 and the xbdb backend driver



	Hello Jaromir.  I had actually already applied the patch that claims to
fix that bug to my 5.2 sources.  When it didn't fix the issue, I began
investigating further.  I'll try dropping in an 8.1 kernel and see if that
gives different results.

-thanks
-Brian

On Nov 13,  1:03pm, Jaromír Doleček wrote:
} Subject: Re: Instability issues with NetBSD-9, xen-4.11 and the xbdb backe
} Indeed, this smells the same as http://gnats.netbsd.org/53506, which was
} fixed in October 2018.
} 
} Can you try using a newer kernel inside the domu? The fix was pulled up to
} netbsd-8, so it should be included in the 8.1 release. You can just use a
} newer 8.1 kernel while leaving the userland intact.
} 
} Jaromir
} 
} On Wed, Nov 13, 2019 at 10:43, Brian Buhrow <buhrow%nfbcal.org@localhost> wrote:
} 
} >         Hello.  After thinking about Jaromir's message for a while, I began
} > looking into this issue more.  It's definitely a software issue.  As part
} > of that process, I put some instrumentation in
} > src/sys/arch/xen/xen/xbd_xenbus.c to see if I could figure out what was
} > going on.  I was inspired by the port-xen/53506 bug report.
} > After a bunch of trial and error, I've narrowed down the issue, or, at
} > least, I think I have.  The problem seems to be that we get duplicate
} > requests off of the ring between the domu and the backend from time to time
} > in xbd_handler().  In the debug output below, I've added two parenthetical
} > numbers in the xbd_handler printf where the bp the handler is working on is
} > shown.  The first represents i in the for loop of xbd_handler() and the
} > second represents the value of resp_prod.  The bug triggers when the
} > difference between i and resp_prod is greater than 1.
} >         Given that these domu's work flawlessly on Xen-3.3.2 and on
} > FreeBSD running as dom0 on xen-4.12, I'm thinking this behavior is a
} > symptom of the problem, rather than the cause of the problem.
} >
} > Given this additional information, does anyone have an idea what might be
} > going on or what I might try next to resolve the issue?
} > I tried checking to see if the bp was the same on sequential passes through
} > the for loop and not calling biodone on the second pass.  That stops the
} > panic, but freezes the domu in physio.  So, I think I'm close to the
} > problem.
} >
} > -thanks
} > -Brian
} >
} >
} > <good trip through xbd_handler()>
} > xbdstrategy(0xffffa0000fa06d20): b_bcount = 16384
} > xbdstart(0xffffa0000fa06d20): b_bcount = 16384
} > xbd_handler(xbd0)
} > xbd_handler(0xffffa0000fa06d20): b_bcount = 16384 (376, 377)
} > xbd_handler(xbd0)
} >
} > . . .
} >
} >
} > <Bad trip through xbd_handler()>
} > xbdstrategy(0xffffa0000fa06d20): b_bcount = 32768
} > xbdstart(0xffffa0000fa06d20): b_bcount = 32768
} > xbd_handler(xbd0)
} > xbdstrategy(0xffffa0000fa06e38): b_bcount = 32768
} > xbdstart(0xffffa0000fa06e38): b_bcount = 32768
} > xbd_handler(xbd0)
} > xbd_handler(0xffffa0000fa06e38): b_bcount = 32768 (383, 385)
} > xbd_handler(0xffffa0000fa06e38): b_bcount = 32768 (384, 385)
} > panic: biodone2 already
} > fatal breakpoint trapxbd_handler(xbd0)
} >  in supervisor mode
} > trap type 1 code 0 rip ffffffff8031d08d cs e030 rflags 246 cr2
} > 7f7ffd60a087 cpl 0 rsp ffffa00055407b00
} > Stopped in pid 0.4 (system) at  netbsd:breakpoint+0x5:  leave
} > db> bt
} > breakpoint() at netbsd:breakpoint+0x5
} > panic() at netbsd:panic+0x242
} > biodone2() at netbsd:biodone2+0xd8
} > biointr() at netbsd:biointr+0x31
} > softint_thread() at netbsd:softint_thread+0x66
} > db>
} >
} 
>-- End of excerpt from Jaromír Doleček
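
For readers following the trace above, here is a minimal toy model of a response-ring consumer loop. It is only a sketch under stated assumptions, not the code in xbd_xenbus.c: the names Buf, Ring, push_response and handler, the ring size, and the request ids are all hypothetical. It illustrates one possible reading of the log, namely that two responses consumed in a single pass (i = 383 and 384 with resp_prod = 385) both resolve to the same bp. It also shows why simply skipping the second completion could leave the other request outstanding forever, which would be consistent with the freeze in physio described above.

```python
from dataclasses import dataclass, field

RING_SIZE = 32  # hypothetical ring size for this toy model


@dataclass
class Buf:
    """Stand-in for a struct buf; only tracks whether it was completed."""
    name: str
    done: bool = False


@dataclass
class Ring:
    """Toy response ring: the backend produces, the frontend consumes."""
    slots: list = field(default_factory=lambda: [None] * RING_SIZE)
    resp_prod: int = 0  # advanced by the (simulated) backend
    resp_cons: int = 0  # advanced by the frontend handler

    def push_response(self, req_id):
        self.slots[self.resp_prod % RING_SIZE] = req_id
        self.resp_prod += 1


def handler(ring, requests):
    """Consume all pending responses; return (i, resp_prod, buf, status) tuples."""
    trace = []
    resp_prod = ring.resp_prod  # snapshot the producer index once per pass
    for i in range(ring.resp_cons, resp_prod):
        buf = requests[ring.slots[i % RING_SIZE]]
        if buf.done:
            # Completing again is what would trip "biodone2 already";
            # this model records the duplicate and skips it instead.
            trace.append((i, resp_prod, buf.name, "duplicate"))
            continue
        buf.done = True
        trace.append((i, resp_prod, buf.name, "completed"))
    ring.resp_cons = resp_prod
    return trace


# Two requests in flight, but both responses carry the id of the second one.
requests = {0: Buf("bp_a"), 1: Buf("bp_b")}
ring = Ring()
ring.push_response(1)
ring.push_response(1)
trace = handler(ring, requests)
for entry in trace:
    print(entry)
print("bp_a completed:", requests[0].done)
```

In this model the duplicate is detected and skipped, yet bp_a never completes, so anything waiting on it would block indefinitely. Whether the real ring delivers a duplicated id or the frontend maps ids to the wrong bp is not something the trace alone settles.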



