NetBSD-Bugs archive
Re: kern/59497: Panic in ucompoll
The following reply was made to PR kern/59497; it has been noted by GNATS.
From: Paul Ripke <stix%stix.id.au@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost, stix%stix.id.au@localhost
Subject: Re: kern/59497: Panic in ucompoll
Date: Thu, 3 Jul 2025 21:25:18 +1000
On Tue, Jul 01, 2025 at 11:00:02PM +0000, Christoph Badura via gnats wrote:
> On Tue, Jul 01, 2025 at 09:20:00AM +0000, stix%stix.id.au@localhost wrote:
> > Crash appears due to intermittent disconnect/reconnect of a uplcom device while open.
>
> Are you sure this is a genuine Prolific device? I've tried to get some
> Prolific USB serial fobs at the start of the year and found that the market
> is swamped with buggy fake Prolific chips. Even supposedly reputable
> manufacturers had fake chips on the fobs that claimed to be PL2303HX /
> PL2303HXD. In the end I managed to get some fobs with genuine Prolific
> chips for some USD 20 per fob. The fake ones all sold for about USD 3-4 and
> were easily identifiable by the missing part number and Prolific logo on the
> SSOP chip.
I'm really not sure - it's old, and it was cheap. I have used it for the
serial console on an old Sun SPARCserver 5, but that system now has dodgy RAM
that needs replacing.
> The real ones also don't periodically disconnect/reconnect. :-)
I should hope not :)
I was considering shopping around for a USB FTDI-based serial adapter -
but I wonder if there are also fakes of those on the market...
> Of course, using the fake chips shouldn't crash the system.
Indeed.
> Obviously you were running a process that had the corresponding ttyUX open
> when the crash happened. Otherwise it wouldn't have been triggered from
> the select(2) code. Can you please describe what command exactly you were
> running and what its command line options and other configuration settings
> were. I'd like to try to reproduce this locally.
That could be challenging. I had it hooked up to a Tandy Color Computer (coco1)
at 38400 baud, via alligator clips, and the software was drivewire.py:
https://github.com/n6il/pyDriveWire
Basically doing remote floppy disk access over the serial port.
> > crash> bt
> > __kernel_end() at 0
> > kern_reboot() at sys_reboot
> > vpanic() at vpanic+0x18d
> > panic() at vprintf
> > trap() at startlwp
> > --- trap (number 6) ---
> > ucompoll() at ucompoll+0x2a
> > cdev_poll() at cdev_poll+0x87
> > spec_poll() at spec_poll+0x6a
> > VOP_POLL() at VOP_POLL+0x5d
> > sel_do_scan() at sel_do_scan+0x3ba
> > selcommon() at selcommon+0x309
> > sys___select50() at sys___select50+0x75
> > syscall() at syscall+0x1fc
> > --- syscall (number 417) ---
> > syscall+0x1fc:
> >
> > Have core and kernel with symbols.
>
> Could you try to disassemble the ucompoll() until the offending
> instruction?
That's easy, it's a tiny function:
(gdb) x/20i ucompoll
0xffffffff804960a5 <ucompoll>: push %rbp
0xffffffff804960a6 <ucompoll+1>: mov %rsp,%rbp
0xffffffff804960a9 <ucompoll+4>: push %r13
0xffffffff804960ab <ucompoll+6>: push %r12
0xffffffff804960ad <ucompoll+8>: mov %esi,%r12d
0xffffffff804960b0 <ucompoll+11>: mov %rdx,%r13
0xffffffff804960b3 <ucompoll+14>: mov %edi,%eax
0xffffffff804960b5 <ucompoll+16>: shr $0xc,%eax
0xffffffff804960b8 <ucompoll+19>: movzbl %dil,%esi
0xffffffff804960bc <ucompoll+23>: and $0x3ff00,%eax
0xffffffff804960c1 <ucompoll+28>: or %eax,%esi
0xffffffff804960c3 <ucompoll+30>: mov $0xffffffff81896660,%rdi
0xffffffff804960ca <ucompoll+37>: call 0xffffffff80e42be0 <device_lookup_private>
0xffffffff804960cf <ucompoll+42>: mov 0xe8(%rax),%rdi <------
0xffffffff804960d6 <ucompoll+49>: mov 0x168(%rdi),%rax
0xffffffff804960dd <ucompoll+56>: mov 0x60(%rax),%rax
0xffffffff804960e1 <ucompoll+60>: mov %r13,%rdx
0xffffffff804960e4 <ucompoll+63>: mov %r12d,%esi
0xffffffff804960e7 <ucompoll+66>: pop %r12
0xffffffff804960e9 <ucompoll+68>: pop %r13
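For readers following the listing: the instructions at ucompoll+14 through ucompoll+28 build the unit number passed to device_lookup_private() from the dev_t in %edi. A minimal userspace sketch of that arithmetic, taken only from the disassembly above (the function name unit_from_dev is hypothetical, and the mapping to any particular kernel macro is an assumption):

```c
#include <stdint.h>

/*
 * Mirrors the arithmetic seen at ucompoll+14..+28.
 * unit_from_dev is a hypothetical name used for illustration only.
 */
static uint32_t
unit_from_dev(uint32_t dev)
{
	uint32_t hi = (dev >> 12) & 0x3ff00;	/* shr $0xc,%eax; and $0x3ff00,%eax */
	uint32_t lo = dev & 0xff;		/* movzbl %dil,%esi */

	return hi | lo;				/* or %eax,%esi */
}
```

The result is the second argument to device_lookup_private(); the marked instruction then loads from offset 0xe8 of whatever pointer that call returned.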
> Could you try to find out if TS_CANCEL is set in tp->t_state?
Yeah, I was actually wondering how to do that. I can't figure out for the
life of me how to switch between cpu stacks in gdb. I realize most of the
kernel debugging I've done has been on single cpu machines...
However, doesn't this imply sc is null?
(gdb) p ucom_cd
$9 = {
cd_list = {
le_next = 0xffffffff818966a0 <umidi_cd>,
le_prev = 0xffffffff81896620 <ugen_cd>
},
cd_attach = {
lh_first = 0xffffffff81815260 <ucom_ca>
},
cd_devs = 0x0,
cd_name = 0xffffffff813e59e8 "ucom",
cd_class = DV_DULL,
cd_ndevs = 0,
cd_attrs = 0x0
}
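A sketch of why cd_devs = 0x0 and cd_ndevs = 0 would lead to a NULL softc: a bounds-checked unit lookup has nothing to return once the device table is empty. The structures and the function below are simplified, hypothetical stand-ins, not the kernel's actual definitions of struct cfdriver or device_lookup_private().

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_device {
	void *dv_private;		/* driver softc */
};

struct mock_cfdriver {
	struct mock_device **cd_devs;	/* NULL in the dump above */
	int cd_ndevs;			/* 0 in the dump above */
};

/* Return the softc for a unit, or NULL if no such unit is attached. */
static void *
mock_lookup_private(const struct mock_cfdriver *cd, int unit)
{
	if (unit < 0 || unit >= cd->cd_ndevs)
		return NULL;		/* table empty or unit out of range */
	if (cd->cd_devs[unit] == NULL)
		return NULL;		/* slot exists but nothing attached */
	return cd->cd_devs[unit]->dv_private;
}
```

With cd_ndevs == 0 every unit falls into the first branch, so a caller that dereferences the result without checking it faults, which is consistent with the trap at ucompoll+0x2a.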
> This might be relatively easy to work around.
>
> ucycom(4) has (https://nxr.netbsd.org/xref/src/sys/dev/usb/ucycom.c#897):
>
> if (sc->sc_dying)
> return EIO;
>
> of course, it should return POLLHUP.
>
> uhso has (https://nxr.netbsd.org/xref/src/sys/dev/usb/uhso.c#1791):
>
> if (!device_is_active(sc->sc_dev))
> return POLLHUP;
>
> So apparently there is no agreement on how this should be handled.
>
> Could you try adding
>
> if (sc->sc_dying)
> return POLLHUP;
>
> before line 853 in ucom.c and see if that makes the symptoms go away?
or perhaps:
if (sc == NULL)
return POLLHUP;
?
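The two guards discussed here can be combined. The following is a userspace sketch only, with hypothetical mock_* types standing in for the ucom softc and tty; it is not the actual ucom.c code, and whether POLLHUP is the right return value is still the open question in this thread.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver state; not the real ucom structures. */
struct mock_tty {
	int ready_events;		/* events the line discipline would report */
};

struct mock_softc {
	struct mock_tty *sc_tty;
	int sc_dying;			/* set once the device is detaching */
};

/* Poll entry point with both guards applied before any dereference of sc_tty. */
static int
mock_ucompoll(struct mock_softc *sc, int events)
{
	if (sc == NULL || sc->sc_dying)
		return POLLHUP;		/* device gone or going away */
	return events & sc->sc_tty->ready_events;
}
```

Checking sc == NULL first matters because sc_dying lives inside the softc, so testing it alone would still fault when the lookup returns NULL.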
> But maybe the right fix would be to make ttycancel() deal with any pending
> select()s too? Or something similar that ties in with the d_cancel
> framework?
Yeah, I haven't studied the code that much as yet.
--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.