NetBSD-Users archive
NetBSD iSCSI target on ZVOL used as block device for Qemu - iSCSI: NOP timeout
Hello all,
Here is another small (or big?) problem related to virtualisation ;-)
The scenario is as follows: There is a NetBSD 9.3 server with ZFS and
a ZVOL on it. The server makes the ZVOL available via iSCSI. There is
also a NetBSD 9.3 client with Qemu/nvmm. The Qemu guest on the client
boots from the ZVOL provided via iSCSI.
I use the following test configuration for this:
## Server
```
saturn$ cat /etc/iscsi/targets
extent0 /dev/zvol/rdsk/tank/backup/vhost/vol/iot 0 16GB
target0 rw extent0 0.0.0.0/0
```
## Client
```
HOSTNAME=netbsd
CORES=1
RAM=1G
qemu-system-x86_64 -nodefaults -machine pc-i440fx-7.0 -smp $CORES -m $RAM -monitor stdio \
  -k de -vga std -usbdevice tablet -boot c \
  -object iothread,id=t0 \
  -drive file=iscsi://192.168.2.20:3260/iqn.1994-04.org.netbsd.iscsi-target:target0/0,format=raw \
  -netdev user,id=vioif0 -device virtio-net-pci,netdev=vioif0 \
  -iscsi initiator-name=iqn.1994-04.org.netbsd.iscsi-target:target0,timeout=0 \
  -accel nvmm
```
## Observation
To my delight, booting on the client works quite well at first. However,
there are long pauses while the kernel is loading (when the spinner is
displayed on the console). The spinner stops for a few seconds and then
continues to spin. Each time the spinner resumes, a message appears on
the Qemu console:
```
qemu-system-x86_64: iSCSI: NOP timeout. Reconnecting...
qemu-system-x86_64: iSCSI: NOP timeout. Reconnecting...
...
```
On the server, there is no indication of the cause of the timeouts -
only syslog entries showing that a reconnect takes place at regular
intervals:
```
...
Sep 18 08:24:03 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 140969396928512, TSIH 182
Sep 18 08:24:28 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 141669559369728, TSIH 183
Sep 18 08:24:53 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 141210014318592, TSIH 184
Sep 18 08:25:18 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 140802084110336, TSIH 185
```
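To check whether the reconnects really follow a fixed rhythm, the gaps between successive login entries can be computed from the syslog. The following is only a small sketch: the function name `login_intervals` and the log path in the usage comment are my assumptions, and it assumes all entries fall on the same day.

```shell
# login_intervals: read syslog lines on stdin and print the number of
# seconds between successive "iSCSI Normal login successful" entries.
# Assumes the third field is HH:MM:SS and that no entry crosses midnight.
login_intervals() {
    awk '/iSCSI Normal login successful/ {
        split($3, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) print now - prev
        prev = now
        seen = 1
    }'
}

# Usage (log path is an assumption):
#   login_intervals < /var/log/messages
```

For the four entries in the excerpt above, the timestamps are each 25 seconds apart, which suggests a fixed timer rather than random network trouble.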
These irregularities do not seem to matter much during booting (I
assume that the BIOS routines in Qemu are tolerant enough here). Later,
however, when the emulated ATA controller is initialised, they first
cause a downgrade from DMA to PIO4, then a series of "lost interrupt"
messages, which lead to "device timeout" on wd0, and finally a system
that is caught in a never-ending retry loop.
What I can rule out:
- Problems with the network quality - the devices involved are wired
with 1 Gbit/s LAN and have no problems in other network-heavy scenarios.
- A firewall - there is none.
- High CPU load - even during the "hangs" there is none on the systems involved.
What else I noticed:
- During the "hangs", the iscsi-target process on the server is stuck in
the "netio/0" state. When the system has recovered and data is flowing,
it switches between "netio/0" and "netio/1" every second or so.
This is certainly a very special scenario and I suspect that I will have
to test the whole thing without ZFS involvement (i.e. with a VND).
However, if anyone has a tip or even experience with this, I would be
very grateful.
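The test without ZFS could be prepared roughly as sketched below. Everything specific here is an assumption and not part of the setup described above: the image path, the `vnd0` unit, the raw device name `/dev/rvnd0d`, and the helper `targets_for`, which only prints an extent/target pair in the same layout as the targets file shown under "Server".

```shell
# targets_for BACKING SIZE: print an extent/target pair in the layout of
# the /etc/iscsi/targets file above, with a different backing device.
targets_for() {
    printf 'extent0 %s 0 %s\n' "$1" "$2"
    printf 'target0 rw extent0 0.0.0.0/0\n'
}

# Possible preparation on the server (hypothetical path and vnd unit,
# to be run as root and adapted before use):
#   dd if=/dev/zero of=/var/tmp/iot-test.img bs=1m count=16384
#   vnconfig vnd0 /var/tmp/iot-test.img
#   targets_for /dev/rvnd0d 16GB > /etc/iscsi/targets
#   (then restart the iscsi-target daemon)
```

If the NOP timeouts disappear with this backing store, ZFS becomes the more likely suspect; if they persist, the problem is more likely in the iSCSI path itself.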
Kind regards
Matthias