Jason Mitchell <jmitchel%bigjar.com@localhost> writes:
There is something that causes frequent crashes when using an FTDI USB to
serial converter with a Raspberry Pi 3B (and maybe other evbarm
devices). The easiest way to reproduce this bug is to:
1) Insert an FTDI cable
2) Run minicom and connect to the ucom port associated with the FTDI adapter
3) Remove the FTDI cable
I could NOT make this happen with an aarch64 machine (Libre LePotato),
but I did not extensively test.
I have seen crashes on NetBSD-8 when disconnecting at least one kind of
USB/serial adaptor while it was opened by a program. So I am not at all
sure that this is an arm problem, but I am of course not sure that it isn't.
The console output is below. I caused the crash twice to make sure it
was reproducible.
Could someone let me know what the next steps are to troubleshoot this?
Speculating from experience: when a device is removed, various bits of
state are deallocated, and if that is not done exactly right, other
state can be left holding dangling pointers. This is very tricky to get
right. The hard part is finding out what went wrong; once that is known,
it's usually fairly easy to fix.
Sometimes devices set a variable to indicate that they are being torn
down and all other code is supposed to check that and return an error
rather than access anything else. I would suggest reading the ucom and
uftdi driver code if you are up for it.
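For illustration only, the teardown-flag pattern looks roughly like the
following C sketch. All names here (toy_softc, dying, toy_open,
toy_detach) are made up for the example; this is not the actual ucom or
uftdi code, and a real driver also needs locking and reference counting,
which are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; NOT the real ucom(4) softc. */
struct toy_softc {
	bool dying;      /* set once detach has begun */
	int  open_count; /* number of successful opens */
	int *hw_state;   /* stands in for state released on detach */
};

/* Detach: mark the device as dying before releasing anything. */
static void
toy_detach(struct toy_softc *sc)
{
	assert(sc != NULL);
	sc->dying = true;
	sc->hw_state = NULL; /* a real driver would free resources here */
}

/* Open: check the flag first and fail cleanly instead of using hw_state. */
static int
toy_open(struct toy_softc *sc)
{
	if (sc == NULL || sc->dying)
		return EIO;
	*sc->hw_state = 1; /* only reached while the device is still attached */
	sc->open_count++;
	return 0;
}
```

If an entry point skips the check, or the check races with detach, an
open that arrives after the cable is pulled ends up dereferencing state
that is already gone, which is the kind of fault I am guessing at here.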
You are dropping into ddb. Rather than c for continue, which you have
established doesn't work :-), do bt for backtrace. See ddb(4) for more
instructions. What you are trying to do is find out the very first
instruction that faulted, and then find what source line that
corresponds to.
So do this again, run bt, and see what the address is of the last frame
before the trap. Or just post the backtrace. Also explain what kernel
version you are running and where you got it (downloaded from X, built
yourself, date of sources, branch, etc.).
I am somewhat hesitant to believe the traceback after continue, but it
seems that continue after a fault prints the backtrace and reboots. It
looks like ucomopen+0x58. I wonder if your terminal program is doing a
close/open when it gets an error, and it is catching the device in a
half-closed state.
I would try kermit or cu or something different and see if you can
provoke the crash there. Or disable any auto close/open behavior, if you
can figure out how, and see if that makes it not crash. (I don't mean
that there isn't a bug; just the more narrowly we can characterize it,
the easier it is to find.)
[ 185922.2110825] 0x812afc24: netbsd:address_exception_entry+0x5c
[ 185922.2210826] 0x812afc94: netbsd:ucomopen+0x58
[ 185922.2210826] 0x812afd0c: netbsd:spec_open+0x20c
[ 185922.2310842] 0x812afd34: netbsd:VOP_OPEN+0x44
[ 185922.2310842] 0x812afe0c: netbsd:vn_open+0x200
[ 185922.2410844] 0x812afe8c: netbsd:do_open+0xac
[ 185922.2410844] 0x812afec4: netbsd:do_sys_openat+0xa4
[ 185922.2510843] 0x812afeec: netbsd:sys_open+0x38
[ 185922.2510843] 0x812affac: netbsd:syscall+0x12c
So this looks like your minicom program did the open syscall, which got
passed down to ucomopen, which faulted at the instruction at offset 0x58
into that function.
This needs to be associated with a C line, which needs your specific
kernel information and maybe debug info from it.
Also, if you have a different chipset serial adaptor, it would be useful
to see if that faults, or not. (Or if anyone else on the list has FTDI
and something else.)