NetBSD-Bugs archive

Re: kern/59497: Panic in ucompoll



The following reply was made to PR kern/59497; it has been noted by GNATS.

From: Paul Ripke <stix%stix.id.au@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
	netbsd-bugs%netbsd.org@localhost, stix%stix.id.au@localhost
Subject: Re: kern/59497: Panic in ucompoll
Date: Thu, 3 Jul 2025 21:25:18 +1000

 On Tue, Jul 01, 2025 at 11:00:02PM +0000, Christoph Badura via gnats wrote:
 >  On Tue, Jul 01, 2025 at 09:20:00AM +0000, stix%stix.id.au@localhost wrote:
 >  > Crash appears due to intermittent disconnect/reconnect of a uplcom device while open.
 >  
 >  Are you sure this is a genuine Prolific device?  I've tried to get some
 >  Prolific USB serial fobs at the start of the year and found that the market
 >  is swamped with buggy fake prolific chips.  Even supposedly reputable
 >  manufacturers had fake chips on the fobs that claimed to be PL2303HX /
 >  PL2303HXD.  In the end I managed to get some fobs with genuine Prolific
 >  chips for some USD 20 per fob.  The fake ones all sold for about USD 3-4 and
 >  were easily identifiable by the missing part number and Prolific logo on the
 >  SSOP chip.
 
 I'm really not sure - it's old, and it was cheap. I have used it for the
 serial console on an old Sun SPARCserver 5, but that system now has dodgy RAM
 that needs replacing.
 
 >  The real ones also don't periodically disconnect/reconnect. :-)
 
 I should hope not :)
 I was considering shopping around for a USB FTDI-based serial adapter -
 but I wonder if there are also fakes of those on the market...
 
 >  Of course, using the fake chips shouldn't crash the system.
 
 Indeed.
 
 >  Obviously you were running a process that had the corresponding ttyUX open
 >  when the crash happened.  Otherwise it wouldn't have been triggered from
 >  the select(2) code.  Can you please describe what command exactly you were
 >  running and what its command line options and other configuration settings
 >  were.  I'd like to try to reproduce this locally.
 
 That could be challenging. I had it hooked up to a Tandy Color Computer (coco1)
 at 38400 baud, via alligator clips, and the software was drivewire.py:
 
 https://github.com/n6il/pyDriveWire
 
 Basically doing remote floppy disk access over the serial port.
 
 >  > crash> bt
 >  > __kernel_end() at 0
 >  > kern_reboot() at sys_reboot
 >  > vpanic() at vpanic+0x18d
 >  > panic() at vprintf
 >  > trap() at startlwp
 >  > --- trap (number 6) ---
 >  > ucompoll() at ucompoll+0x2a
 >  > cdev_poll() at cdev_poll+0x87
 >  > spec_poll() at spec_poll+0x6a
 >  > VOP_POLL() at VOP_POLL+0x5d
 >  > sel_do_scan() at sel_do_scan+0x3ba
 >  > selcommon() at selcommon+0x309
 >  > sys___select50() at sys___select50+0x75
 >  > syscall() at syscall+0x1fc
 >  > --- syscall (number 417) ---
 >  > syscall+0x1fc:
 >  > 
 >  > Have core and kernel with symbols.
 >  
 >  Could you try to disassemble the ucompoll() until the offending
 >  instruction?
 
 That's easy; it's a tiny function. The arrow marks ucompoll+42 (0x2a), the
 instruction the backtrace points at:
 
 (gdb) x/20i ucompoll
    0xffffffff804960a5 <ucompoll>:       push   %rbp
    0xffffffff804960a6 <ucompoll+1>:     mov    %rsp,%rbp
    0xffffffff804960a9 <ucompoll+4>:     push   %r13
    0xffffffff804960ab <ucompoll+6>:     push   %r12
    0xffffffff804960ad <ucompoll+8>:     mov    %esi,%r12d
    0xffffffff804960b0 <ucompoll+11>:    mov    %rdx,%r13
    0xffffffff804960b3 <ucompoll+14>:    mov    %edi,%eax
    0xffffffff804960b5 <ucompoll+16>:    shr    $0xc,%eax
    0xffffffff804960b8 <ucompoll+19>:    movzbl %dil,%esi
    0xffffffff804960bc <ucompoll+23>:    and    $0x3ff00,%eax
    0xffffffff804960c1 <ucompoll+28>:    or     %eax,%esi
    0xffffffff804960c3 <ucompoll+30>:    mov    $0xffffffff81896660,%rdi
    0xffffffff804960ca <ucompoll+37>:    call   0xffffffff80e42be0 <device_lookup_private>
    0xffffffff804960cf <ucompoll+42>:    mov    0xe8(%rax),%rdi		<------
    0xffffffff804960d6 <ucompoll+49>:    mov    0x168(%rdi),%rax
    0xffffffff804960dd <ucompoll+56>:    mov    0x60(%rax),%rax
    0xffffffff804960e1 <ucompoll+60>:    mov    %r13,%rdx
    0xffffffff804960e4 <ucompoll+63>:    mov    %r12d,%esi
    0xffffffff804960e7 <ucompoll+66>:    pop    %r12
    0xffffffff804960e9 <ucompoll+68>:    pop    %r13
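 
 As a reading aid, here is a rough C model of what the listing does before
 the marked instruction: it builds a unit number from the dev_t argument
 and passes it to device_lookup_private(), then loads from offset 0xe8 of
 the returned pointer. This is presumably the UCOMUNIT(dev) computation,
 but that mapping is an assumption; the function names below are made up
 for the sketch and are not kernel APIs.

```c
#include <stdint.h>

/*
 * Mirrors ucompoll+14 .. ucompoll+28 from the listing above.
 * The function name is hypothetical, for illustration only.
 */
static uint32_t
unit_from_dev_model(uint64_t dev)
{
	uint32_t d = (uint32_t)dev;		/* mov %edi,%eax */
	uint32_t hi = (d >> 12) & 0x3ff00;	/* shr $0xc ; and $0x3ff00 */
	uint32_t lo = d & 0xff;			/* movzbl %dil,%esi */

	return hi | lo;				/* or %eax,%esi */
}

/*
 * Address read by "mov 0xe8(%rax),%rdi" for a given returned pointer.
 * If the lookup returned NULL, the load targets address 0xe8.
 */
static uintptr_t
first_load_address_model(uintptr_t softc)
{
	return softc + 0xe8;
}
```

 If device_lookup_private() returned NULL, the very first load after the
 call would touch address 0xe8, which would fit a trap at ucompoll+0x2a.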
 
 >  Could you try to find out if TS_CANCEL is set in tp->t_state?
 
 Yeah, I was actually wondering how to do that. I can't figure out for the
 life of me how to switch between cpu stacks in gdb. I realize most of the
 kernel debugging I've done has been on single cpu machines...
 
 However, doesn't this imply that sc is NULL? ucom_cd in the core has
 cd_devs = 0x0 and cd_ndevs = 0:
 
 (gdb) p ucom_cd
 $9 = {
   cd_list = {
     le_next = 0xffffffff818966a0 <umidi_cd>,
     le_prev = 0xffffffff81896620 <ugen_cd>
   },
   cd_attach = {
     lh_first = 0xffffffff81815260 <ucom_ca>
   },
   cd_devs = 0x0,
   cd_name = 0xffffffff813e59e8 "ucom",
   cd_class = DV_DULL,
   cd_ndevs = 0,
   cd_attrs = 0x0
 }
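 
 To spell out why an empty ucom_cd would give a NULL sc, here is a
 simplified userland model of a unit lookup against such a table. It is
 only a sketch: the struct and function names are hypothetical, it keeps
 just the two fields shown in the gdb output, and the real
 device_lookup_private() does more than this.

```c
#include <stddef.h>

/* Cut-down stand-in for the driver table: only the two fields printed above. */
struct cfdriver_model {
	void	**cd_devs;	/* per-unit table, NULL when nothing is attached */
	int	cd_ndevs;	/* number of slots in cd_devs */
};

/* Return the per-unit pointer, or NULL if the unit is out of range. */
static void *
lookup_private_model(const struct cfdriver_model *cd, int unit)
{
	if (unit < 0 || unit >= cd->cd_ndevs)
		return NULL;
	return cd->cd_devs[unit];
}
```

 With cd_ndevs = 0, every unit is out of range, so in this model any
 lookup returns NULL regardless of which ttyU minor was open.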
 
 >  This might be relatively easy to work around.
 >  
 >  ucycom(4) has (https://nxr.netbsd.org/xref/src/sys/dev/usb/ucycom.c#897):
 >  
 >  	if (sc->sc_dying)
 >  		return EIO;
 >  
 >  of course, it should return POLLHUP.
 >  
 >  uhso has (https://nxr.netbsd.org/xref/src/sys/dev/usb/uhso.c#1791):
 >  
 >  	if (!device_is_active(sc->sc_dev))
 >  		return POLLHUP;
 >  
 >  So apparently there is no agreement how this should be handled.
 >  
 >  Could you try adding
 >  
 >  	if (sc->sc_dying)
 >  		return POLLHUP;
 >  
 >  before line 853 in ucom.c and see if that makes the symptoms go away?
 
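 Or perhaps, since sc itself appears to be NULL and sc->sc_dying could not
 be read safely, something like this instead?
 
 	if (sc == NULL)
 		return POLLHUP;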
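 
 A rough userland model of the two guards discussed here, to show the
 ordering issue: the NULL test has to come before anything that reads
 through sc. Everything below is a hypothetical stand-in (the struct,
 its fields as modelled, and both functions); it is not the ucom.c code
 and says nothing about locking or a detach racing with the lookup.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical cut-down softc: a dying flag and a poll callback. */
struct ucom_softc_model {
	int	sc_dying;		/* set once detach has started */
	int	(*sc_poll)(int events);	/* stand-in for the tty poll path */
};

/* Stub poll routine that reports the device as readable only. */
static int
always_readable_model(int events)
{
	return events & POLLIN;
}

static int
ucompoll_model(struct ucom_softc_model *sc, int events)
{
	if (sc == NULL)		/* lookup found no device */
		return POLLHUP;
	if (sc->sc_dying)	/* device present but detaching */
		return POLLHUP;
	return sc->sc_poll(events);
}
```

 In this model a NULL sc and a dying sc both report POLLHUP to the
 caller, while a live device falls through to its normal poll routine.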
 
 >  But maybe the right fix would be to make ttycancel() deal with any pending
 >  select()s too?  Or something similar that ties in with the d_cancel
 >  framework?
 
 Yeah, I haven't studied the code that much as yet.
 
 -- 
 Paul Ripke
 "Great minds discuss ideas, average minds discuss events, small minds
  discuss people."
 -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
 

