USB, NetBSD 7/amd64: crashes

To: tech-kern%netbsd.org@localhost
Subject: USB, NetBSD 7/amd64: crashes
From: tlaronde%polynum.com@localhost
Date: Tue, 7 Jul 2015 08:40:11 +0200

On Thu, Jul 02, 2015 at 10:18:23AM +0100, Nick Hudson wrote:
> On 07/02/15 10:07, tlaronde%polynum.com@localhost wrote:
> >Hello,
> >
> >On NetBSD 6.1.5/amd64, when I connect a second USB disk to the
> >machine, NetBSD freezes. I cannot connect remotely; a hard reboot is
> >required.
> >
> 
> Can you try netbsd-7 or better still -current?

I have tried a netbsd-7 kernel and it crashes as well; the problem still
appears to be with locking.

Here is the backtrace:

mutex_oncpu.part.0() at netbsd:mutex_oncpu.part.0+0x8
mutex_vector_enter() at netbsd:mutex_vector_enter+0x93
sdopen() at netbsd:sdopen+0x87
cdev_open() at netbsd:cdev_open+0xb2
spec_open() at netbsd:spec_open+0x250
VOP_OPEN() at netbsd:VOP_OPEN+0x33
vn_open() at netbsd:vn_open+0x1ea
do_open() at netbsd:do_open+0x112
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x9c
---syscall (number 5)---
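The backtrace shows sdopen() going into mutex_vector_enter(), and the 6.1.5 console log in this message shows "sd1: detached" just before the panic. One plausible reading, not confirmed anywhere in this thread, is an open racing with the detach of the same unit, so that the mutex being entered belongs to state that is being torn down. The sketch below shows the usual defence in plain userland C with POSIX threads: lookup and reference-taking happen under a single lock, and detach marks the unit as going away and waits for outstanding references to drain. All names (struct unit, unit_acquire, unit_detach, ...) are hypothetical; this is not the actual sd(4) code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-unit state; not NetBSD's struct sd_softc. */
struct unit {
    bool present;     /* attached and usable */
    bool detaching;   /* detach in progress: refuse new opens */
    int refs;         /* opens that still hold a reference */
};

#define NUNITS 4

static struct unit units[NUNITS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t refs_drained = PTHREAD_COND_INITIALIZER;

/* Look the unit up and take a reference under one lock, so the unit
 * cannot be torn down between the lookup and the first use of its state. */
static struct unit *unit_acquire(int n)
{
    struct unit *u = NULL;

    pthread_mutex_lock(&table_lock);
    if (n >= 0 && n < NUNITS && units[n].present && !units[n].detaching) {
        u = &units[n];
        u->refs++;
    }
    pthread_mutex_unlock(&table_lock);
    return u;   /* NULL means "no such device" (ENXIO in a driver) */
}

/* Drop a reference; wake a waiting detach when the last one goes. */
static void unit_release(struct unit *u)
{
    pthread_mutex_lock(&table_lock);
    assert(u->refs > 0);
    if (--u->refs == 0)
        pthread_cond_broadcast(&refs_drained);
    pthread_mutex_unlock(&table_lock);
}

static void unit_attach(int n)
{
    pthread_mutex_lock(&table_lock);
    units[n] = (struct unit){ .present = true };
    pthread_mutex_unlock(&table_lock);
}

/* Refuse new opens first, then wait until every open has released its
 * reference before the unit's state is invalidated. */
static void unit_detach(int n)
{
    pthread_mutex_lock(&table_lock);
    units[n].detaching = true;
    while (units[n].refs > 0)
        pthread_cond_wait(&refs_drained, &table_lock);
    units[n].present = false;
    units[n].detaching = false;
    pthread_mutex_unlock(&table_lock);
}
```

The key property is that an open either gets a reference before detach starts (and detach then waits for it) or sees the unit as absent; it never touches a half-destroyed unit.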

There is no problem if both disks are connected at boot. (How can
concurrency be handled when device numbering depends on how many devices
are connected? How are two devices that attach concurrently named, when
both have the same "right" to claim the very same name, sd0 for example?
Sequential enumeration at boot, with both disks connected, obviously
causes no problem; but how can dynamic, concurrent attachment be managed
if one has to remember how many devices are already connected, since the
number depends on that, while an already connected device may be detached
at the same time (not the case here)? Would it not be simpler to assign a
fixed name per USB port? No pun intended: I'm just trying to understand
how it works.)
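In the abstract, the race asked about above does not require remembering a count. A minimal sketch, assuming a "lowest free unit number wins" policy (an assumption for illustration; this is not NetBSD's autoconf code): checking for a free number and claiming it happen under one lock, so two concurrent attachments can never both receive sd0, and a concurrent detach simply returns its number to the pool. The names unit_alloc and unit_free are hypothetical; this is plain userland C with POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAXUNITS 8

/* Which unit numbers ("sd0", "sd1", ...) are currently in use. */
static bool unit_used[MAXUNITS];
static pthread_mutex_t unit_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim the lowest free unit number. The scan and the claim happen under
 * the same lock, so two attachments racing for "sd0" cannot both get it:
 * whichever takes the lock first wins, the other sees the slot taken. */
static int unit_alloc(void)
{
    int n = -1;

    pthread_mutex_lock(&unit_lock);
    for (int i = 0; i < MAXUNITS; i++) {
        if (!unit_used[i]) {
            unit_used[i] = true;
            n = i;
            break;
        }
    }
    pthread_mutex_unlock(&unit_lock);
    return n;   /* -1: no free unit */
}

/* Give a unit number back when its device detaches, making it reusable. */
static void unit_free(int n)
{
    pthread_mutex_lock(&unit_lock);
    assert(n >= 0 && n < MAXUNITS && unit_used[n]);
    unit_used[n] = false;
    pthread_mutex_unlock(&unit_lock);
}
```

With this policy the name still depends on arrival order rather than on the port, which is the trade-off behind the fixed-per-port question: a per-port scheme gives stable names but has to map hub topology to names, while lowest-free allocation is simple but order-dependent.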

Disaster occurs when a disk is added while another one is already
connected. FWIW, when rebooting after the crash with both disks
connected, the "second" one (the one added later) is detected as sd0,
while the first one becomes sd1 (in case this variable enumeration has
something to do with the resulting havoc).
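The log below shows the first disk on uhub3 port 1 and the added one on uhub2 port 6. One possible explanation for the swap, not verified here, is that at boot the buses are walked in topology order rather than in the order the disks were plugged in, so whichever disk is reached first becomes sd0. The sketch models that hypothesis only; struct found_disk and enumerate_at_boot are hypothetical, and the "ascending hub, then port" walk order is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* A disk as seen during a (hypothetical) boot-time bus walk. */
struct found_disk {
    int hub;    /* N in "uhubN" */
    int port;   /* port on that hub */
    int unit;   /* sd unit number assigned below */
};

/* Order by hub number, then by port number. */
static int by_hub_then_port(const void *a, const void *b)
{
    const struct found_disk *x = a, *y = b;

    if (x->hub != y->hub)
        return (x->hub > y->hub) - (x->hub < y->hub);
    return (x->port > y->port) - (x->port < y->port);
}

/* Assumption: at boot, hubs are visited in ascending order, then ports,
 * and each disk receives the next unit number as it is found. */
static void enumerate_at_boot(struct found_disk *d, size_t n)
{
    qsort(d, n, sizeof d[0], by_hub_then_port);
    for (size_t i = 0; i < n; i++)
        d[i].unit = (int)i;
}
```

Under this assumption the disk on uhub2 port 6 is numbered before the one on uhub3 port 1 at boot, even though it was plugged in second while the system was running.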

For reference, the same happened on 6.1.5:

---8<---
umass0: at uhub3 port 1 (addr 3) disconnected
umass0 at uhub3 port 1 configuration 1 interface 0
umass0: Western Digital Elements 10A2, rev 2.10/10.42, addr 3
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <WD, Elements 10A2, 1042> disk fixed
sd0: fabricating a geometry
sd0: 931 GB, 953837 cyl, 64 head, 32 sec, 512 bytes/sect x 1953458176 sectors
sd0: fabricating a geometry
sd0: GPT GUID: 960d762c-1cf3-11e5-b5f3-448a5b9b9f0f
dk0 at sd0: Basic data partition
dk0: 1953454080 blocks at 2048, type: 
umass1 at uhub2 port 6 configuration 1 interface 0
umass1: Western Digital Elements 10A8, rev 2.10/10.42, addr 3
umass1: using SCSI over Bulk-Only
scsibus1 at umass1: 2 targets, 1 lun per target
sd1 at scsibus1 target 0 lun 0: <WD, Elements 10A8, 1042> disk fixed
sd1(umass1:0:0:0):  Check Condition on CDB: 0x00 00 00 00 00 00
    SENSE KEY:  Not Ready
     ASC/ASCQ:  Logical Unit Is in Process Of Becoming Ready

sd1: drive offline
sd1: fabricating a geometry
sd1: GPT GUID: f3d6ceb3-2183-11e5-8a35-448a5b9b9f0f
sd1: detached
uvm_fault(0xffffffff80771320, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff80238c1f cs 8 rflags 10287 cr2  8 cpl 0 rsp fffffe81111976b0
panic: trap
cpu1: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
dkwedge_add() at netbsd:dkwedge_add+0x1d1
dkwedge_discover_gpt() at netbsd:dkwedge_discover_gpt+0x492
dkwedge_discover() at netbsd:dkwedge_discover+0x128
sdattach() at netbsd:sdattach+0x1cb
config_attach_loc() at netbsd:config_attach_loc+0x1bb
scsi_probe_bus() at netbsd:scsi_probe_bus+0x537
scsibus_config() at netbsd:scsibus_config+0x74
scsipi_completion_thread() at netbsd:scsipi_completion_thread+0x23
cpu1: End traceback...
--->8---

When dropping into ddb on panic, it stops, more precisely, at:

Stopped in pid 1.57 (system) at netbsd:mutex_vector_enter+0x80: movq 18(%r15),%rax

This has nothing to do with MBR or GPT, since I have tested both. It
happens systematically whenever one disk is connected first and a second
one is then added.

After a reboot with both disks connected, both are correctly
accessible.

Note: FWIW, the first (and only) disk is sd0. After rebooting, the
device names are swapped: the second disk becomes sd0 and the first
one becomes sd1.

Question: is there a way to name partitions independently of the
variable hardware enumeration (via wedge names? But that would imply
storing the name persistently, so I guess in the GPT? Is there such a
thing?)
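One direction, assuming the wedge name is taken from the GPT partition name (the log line "dk0 at sd0: Basic data partition" above suggests it is): look partitions up by that stored name instead of by sdN/dkN. The sketch only shows the resolution step; the wedge table is hard-coded and hypothetical, and how to obtain it on a running system is left open here. It also refuses ambiguous names, which matters because a generic default such as "Basic data partition" is likely to repeat across disks, so each partition would need a distinct name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One wedge as it might be reported after autoconfiguration.
 * The entries used with this are illustrative only. */
struct wedge {
    const char *dev;     /* e.g. "dk0": depends on discovery order */
    const char *parent;  /* e.g. "sd0": depends on discovery order */
    const char *name;    /* GPT partition name: stored on the disk */
};

/* Resolve a persistent partition name to whatever device node it got
 * this time. Returns NULL if no wedge carries that name, or if several
 * do, since the caller could then pick the wrong disk. */
static const char *wedge_by_name(const struct wedge *w, size_t n,
                                 const char *name)
{
    const char *found = NULL;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(w[i].name, name) != 0)
            continue;
        if (found != NULL)
            return NULL;   /* ambiguous: two partitions share the name */
        found = w[i].dev;
    }
    return found;
}
```

The same name resolves correctly whether the disk came up as sd0 or sd1, which is the independence from enumeration order asked about above.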

-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                     http://www.arts-po.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

Follow-Ups:
- Re: USB, NetBSD 7/amd64: crashes
  - From: Manuel Bouyer

References:
- USB, NetBSD 6.1.5/amd64: freezes when 2 umass connected
  - From: tlaronde
- Re: USB, NetBSD 6.1.5/amd64: freezes when 2 umass connected
  - From: Nick Hudson