Port-xen archive


Re: Instability issues with NetBSD-9, xen-4.11 and the xbdb backend driver



	Hello.  After even more instrumentation and examination of the output,
I now think I know what's going wrong.

	As part of the process of receiving requests from the domU, dom0
calls xbdback_map_shm(), which in turn calls xen_shm_map(), which is in
sys/arch/xen/x86/xen_shm_machdep.c.  In this file, the allocation of memory
succeeds, but after the grant mapping is requested, the status field of
one of the slots in the op array is not 0.  In fact, it is -1.  This
-1 is returned as the error from the xen_shm_map() call, which is then returned
to the domU as an error.  Unfortunately, nothing deals with this error
gracefully: not NetBSD-5.2, not NetBSD-8.1, and, if inspection is to be
believed, not -current either.
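
To make that failure path concrete, here is a minimal userland sketch of the per-slot status check.  It is not the kernel code: struct map_op_model and first_failed_slot() are hypothetical names, and the struct keeps only the two fields that matter for this discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for one map-grant operation slot. */
struct map_op_model {
	int16_t  status;	/* 0 on success, negative on failure */
	uint32_t handle;	/* meaningful only when status == 0 */
};

/*
 * Scan a completed batch.  Handles of slots that mapped successfully are
 * copied into handles[] until the first failure is seen.  Returns the index
 * of the first failed slot, or -1 if every slot succeeded.  Slots after the
 * first failure are not examined, which mirrors the early return in the
 * excerpt below.
 */
static int
first_failed_slot(const struct map_op_model *op, size_t n, uint32_t *handles)
{
	assert(op != NULL && handles != NULL);
	for (size_t i = 0; i < n; i++) {
		if (op[i].status != 0)
			return (int)i;
		handles[i] = op[i].handle;
	}
	return -1;
}
```

In the failing case from the log, slot 0 of a 4-entry batch has status -1, so a scan like this would stop at index 0 without recording any handles.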

	This may be a ZFS issue, since it only seems to happen when new files
are being created and written to on the domU.

	Below is the snippet of code that generates the error, as well as a snippet
of log that shows the error in play.  The debugging output from the
xen_shm_map() call is information I added to help me understand what's
going on.  The two lines there are somewhat redundant, but the second line
prints only when the error fires and the first line always prints, so I could see
what happens when things are working.

	Further reading shows the -1 error is returned from the hypervisor,
where it is defined as GNTST_general_error.  If I run the debug build of Xen,
I see the following when the error hits:

(XEN) grant_table.c:591:d0v0 Bad flags (0) or dom (0). (expected dom 0)
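
For reading logs like this, a small decoder for the status value can help.  The sketch below is an assumption-laden illustration: gntst_name() and format_status() are hypothetical helpers, and apart from -1 (GNTST_general_error, as noted above) the numeric values are what I believe Xen's public grant_table.h defines, so check them against your own tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Map a grant-table status code to a name.  Only -1 is confirmed by the
 * discussion above; 0, -2 and -3 are assumed from Xen's public header.
 */
static const char *
gntst_name(int status)
{
	switch (status) {
	case 0:
		return "GNTST_okay";
	case -1:
		return "GNTST_general_error";
	case -2:
		return "GNTST_bad_domain";
	case -3:
		return "GNTST_bad_gntref";
	default:
		return "unknown status";
	}
}

/* Format "status <n> (<name>)" into buf; returns snprintf's result. */
static int
format_status(char *buf, size_t len, int status)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "status %d (%s)", status,
	    gntst_name(status));
}
```

A general error is the least specific code, which is why the debug-build message from grant_table.c is the more useful clue here.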

	I think the bug strikes when there is a segment break.  The Xen error
message suggests to me that perhaps the grant reference array isn't
properly filled in when there is a segment break.  It doesn't break
every time there is a segment break, but I think I'll look at that tomorrow.
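
For anyone following along, here is how I read the segment-break arithmetic in the log, written as a small userland model.  It is only a sketch: struct io_run, run_append() and run_bytes() are hypothetical names, not the xbdback code, and the 512-byte sector size is an assumption that happens to agree with the log (4 segments of 8 sectors flushed as 16384 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u	/* assumption, consistent with the log */

/* Hypothetical accumulator for a run of contiguous segments. */
struct io_run {
	uint64_t start_sect;	/* first sector of the run */
	uint32_t nsect;		/* sectors accumulated so far */
	uint32_t ngrants;	/* grant references accumulated so far */
};

/*
 * Try to append a segment to the run.  Returns 1 if it was appended,
 * or 0 if its start sector is not the one the run was hoping for,
 * i.e. a "segment break"; in that case the run is left unchanged.
 */
static int
run_append(struct io_run *r, uint64_t sect, uint32_t nsect)
{
	assert(r != NULL);
	if (r->ngrants != 0 && sect != r->start_sect + r->nsect)
		return 0;
	if (r->ngrants == 0)
		r->start_sect = sect;
	r->nsect += nsect;
	r->ngrants++;
	return 1;
}

/* Size in bytes of the I/O that flushing this run would issue. */
static uint64_t
run_bytes(const struct io_run *r)
{
	assert(r != NULL);
	return (uint64_t)r->nsect * MODEL_SECTOR_SIZE;
}
```

Feeding it the four segments from the log (start sectors 67117280, 67117288, 67117296 and 67117304, 8 sectors each) leaves the run hoping for sector 67117312, so the next request at 67117376 is rejected as a break and the pending run is 16384 bytes with 4 grants, matching the flush and map lines.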

Again, any thoughts would be greatly appreciated.
I think this is close to working, but it's not quite there yet.
-thanks
-Brian


<sys/arch/xen/x86/xen_shm_machdep.c excerpt>

	for (i = 0; i < nentries; i++) {
		op[i].host_addr = new_va + i * PAGE_SIZE;
		op[i].dom = domid;
		op[i].ref = grefp[i];
		op[i].flags = GNTMAP_host_map |
		    ((flags & XSHM_RO) ? GNTMAP_readonly : 0);
	}

	ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nentries);
	if (__predict_false(ret)) {
		panic("xen_shm_map: HYPERVISOR_grant_table_op failed");
	}

	for (i = 0; i < nentries; i++) {
		if (__predict_false(op[i].status))
			return op[i].status; //This is the line that gets the error
		handlep[i] = op[i].handle;
	}



<log segment>

Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: iodone ptr 0xffffaf8010ded038
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_unmap_shm handle 230 231 232 233 234 235 237 238 239 240 241 242 243 244 245 246 
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: end request 5 error=0
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_send_reply notify 5
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: end request 21 error=0
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_send_reply notify 5
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: end request 27 error=0
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_send_reply notify 5
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: end request 22 error=0
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_send_reply notify 5
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: first,last_sect[0]=00,07
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: appending grant 121
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: start sect 67117280 size 8
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: first,last_sect[1]=00,07
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: appending grant 124
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: start sect 67117288 size 8
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: first,last_sect[2]=00,07
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: appending grant 125
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: start sect 67117296 size 8
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: first,last_sect[3]=00,07
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: appending grant 448
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: start sect 67117304 size 8
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback op 1 req_cons 0xef37 req_prod 0xef38 resp_prod 0xef36 id 20
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: hoping for sector 67117312; got 67117376
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: segment break
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: flush sect 67117280 size 16384 ptr 0xffffaf8010ded038
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_map_shm map grant 121 124 125 448 xen_shm_map: nentries: 4
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xen_shm_map: op[0].status = -1 (4)
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_map_shm: xen_shm error -1 xbdback_io domain 5: iodone ptr 0xffffaf8010ded038
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbd IO domain 5: error -1
Nov 13 18:02:06 xen-hardconnect /netbsd: [ 2971.8000777] xbdback_io domain 5: end request 21 error=-1

