NetBSD-Users archive


NetBSD iSCSI target on ZVOL used as block device for Qemu - iSCSI: NOP timeout



Hello all,

here is another small (or big?) problem related to virtualisation ;-)

The scenario is as follows: a NetBSD 9.3 server with ZFS hosts a ZVOL and exports it via iSCSI. A NetBSD 9.3 client runs Qemu/nvmm, and the guest on that client boots from the ZVOL provided via iSCSI.

I use the following test configuration for this:

## Server

```
saturn$ cat /etc/iscsi/targets

extent0         /dev/zvol/rdsk/tank/backup/vhost/vol/iot        0       16GB
target0         rw      extent0         0.0.0.0/0
```

## Client

```
HOSTNAME=netbsd
CORES=1
RAM=1G

qemu-system-x86_64 -nodefaults -machine pc-i440fx-7.0 -smp $CORES -m $RAM -monitor stdio \
                               -k de -vga std -usbdevice tablet -boot c \
                               -object iothread,id=t0 \
                               -drive file=iscsi://192.168.2.20:3260/iqn.1994-04.org.netbsd.iscsi-target:target0/0,format=raw \
                               -netdev user,id=vioif0 -device virtio-net-pci,netdev=vioif0 \
                               -iscsi initiator-name=iqn.1994-04.org.netbsd.iscsi-target:target0,timeout=0 \
                               -accel nvmm
```

## Observation

To my delight, booting on the client works quite well at first. However, there are long pauses while the kernel is being loaded (while the spinner is displayed on the console). The spinner stops for a few seconds and then continues. Each time it resumes, a message appears on the Qemu console:

```
qemu-system-x86_64: iSCSI: NOP timeout. Reconnecting...
qemu-system-x86_64: iSCSI: NOP timeout. Reconnecting...
...
```

On the server, there is no indication of the cause of the timeouts. The syslog only shows that a reconnect takes place at regular intervals (every 25 seconds in the excerpt below):

```
...
Sep 18 08:24:03 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 140969396928512, TSIH 182
Sep 18 08:24:28 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 141669559369728, TSIH 183
Sep 18 08:24:53 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 141210014318592, TSIH 184
Sep 18 08:25:18 saturn iscsi-target: > iSCSI Normal login successful from iqn.1994-04.org.netbsd.iscsi-target:target0 on 192.168.2.140 disk 0, ISID 140802084110336, TSIH 185
```
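To check how regular the reconnects really are over a longer period, the gaps between consecutive login entries can be computed from the log. The following is only a minimal sketch: the function name `login_intervals` is made up for illustration, it assumes classic syslog timestamps (`Mon DD HH:MM:SS`), and it does not handle entries that span midnight.

```shell
# Print the gap in seconds between consecutive iSCSI login entries.
# Assumption: the third whitespace-separated field is HH:MM:SS and all
# matching entries fall on the same day.
login_intervals() {
    awk '/iSCSI Normal login successful/ {
        split($3, t, ":")                      # t[1]=HH, t[2]=MM, t[3]=SS
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (seen) print now - prev             # gap to the previous login
        prev = now
        seen = 1
    }' "$@"
}
```

It reads the files given as arguments, or standard input if there are none, for example `login_intervals /var/log/messages` (the log path is an assumption; adjust it to wherever the iscsi-target messages end up). For the four entries quoted above it prints 25 three times.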

At boot time these irregularities do not seem to matter much (I assume that the BIOS routines in Qemu are tolerant enough here). Later on, however, when the emulated ATA controller is initialised, they first lead to a downgrade from DMA to PIO4, then to a series of "lost interrupt" messages and "device timeout" on wd0, and finally to a system that is caught in a never-ending retry loop.

What I can rule out:

- Problems with the network quality: the devices involved are connected via wired 1 Gbit/s LAN and have no problems in other network-heavy scenarios.

- A firewall: there is none.

- Even during the "hangs" there is no high CPU load on the systems involved.

What else I noticed:

- During the "hangs", the iscsi-target process on the server is stuck in the "netio/0" state. When the system has recovered and data is flowing, it switches between "netio/0" and "netio/1" every second or so.

This is certainly a very special scenario and I suspect that I will have to test the whole thing without ZFS involvement (i.e. with a VND). However, if anyone has a tip or even experience with this, I would be very grateful.
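A minimal sketch of what the server configuration for such a test could look like. It assumes that a copy of the volume has been written to a regular file and attached as `vnd0` with vnconfig(8), and that `d` is the raw whole-disk partition (as on amd64); the device name is an assumption, not something tested here.

```
# /etc/iscsi/targets, variant for a test without ZFS
# Assumption: /dev/rvnd0d is a vnd backed by a copy of the 16GB volume.
extent0         /dev/rvnd0d     0       16GB
target0         rw      extent0         0.0.0.0/0
```

If the NOP timeouts disappear with this extent and the otherwise unchanged client command line, that would point towards the ZVOL; if they persist, the cause is more likely in the iSCSI target or initiator.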

Kind regards
Matthias

