Port-xen archive


Re: Dom0 PAE panic when starting xend



Jean-Yves Migeon wrote:
> Jean-Yves Migeon wrote:
>> Christoph Egger wrote:
>>> jym: Does your save/restore/migration work have some fixes related to
>>> machine-to-phys / phys-to-machine tables?
>>> If yes, can you commit them, please?
>>
>> I'm actively investigating this matter, as I have similar problems
>> with the page fault handler during a migration with an updated
>> current, on a call to xc_map_foreign_batch().
>>
>> During the live save, accesses to vaddr 0xbba9c000 <> 0xbba9d000 trigger a
>> page fault in dom0, which keeps calling privpgop_fault in a loop and
>> leads to a hang.
>>
>> These faults happen when dom0 maps the p2m translation tables. I am
>> looking at it.
> 
> Alright, little update, as this thing is a real pain to track down.
> 
> From what I have gathered so far, the p2m/m2p tables are handled correctly by
> NetBSD (manually auditing the content of the tables does not reveal any
> bogus entry).
> 
> However, I discovered today that the privcmd routines seem to handle
> their associated ioctls (the IOCTL_PRIVCMD_MMAP{BATCH} commands)
> incorrectly. I frequently get "off by one" errors (that is, the correct
> expected data is found in index 1 of an array instead of 0, for example).
> 
> The first element of the array contains a poison (see Christoph's mail),
> and the fault routine manipulates incorrect values, which results either
> in an endless loop, or a crash during a mmu_update.
> 
> In my case, it happens with the mfn array during xc_map_foreign_batch.
> The array[0] value contains 4101 (== 0x1005, the "poison entry"), and
> array[1] contains the correct mfn to map.
> 
> I am now looking at the interaction between privcmd and uvm. From my
> PoV, the bug lies somewhere in there (an alignment issue, like an improper
> cast, I don't know specifically yet), but IMHO, it is not Xen's direct
> fault.
> 
> Somewhere between uvm_map and privpgop_fault, the mfns are not passed
> down correctly.
> 
> Stay tuned.

Does it matter if you use PAE or non-PAE? On amd64, I can't reproduce
it the way I described in my earlier mail.
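
To make the quoted symptom concrete (array[0] holds 0x1005 and array[1]
holds the first real mfn), here is a minimal standalone sketch of a check
one could drop into a debugging session. It is not NetBSD or Xen code:
POISON_ENTRY and first_valid_index() are made-up names, and treating
0x1005 as a fixed marker value is only an assumption taken from the
report above.

```c
#include <stddef.h>

/* Assumption: the poison value 0x1005 (4101) reported in the quoted mail. */
#define POISON_ENTRY 0x1005UL

/*
 * Hypothetical helper: return the index of the first entry that is not
 * the poison value, or n if every entry is poisoned.
 *
 * A result of 0 means the array starts with a plausible mfn; a result
 * of 1 matches the "off by one" symptom described in the thread.
 */
static size_t
first_valid_index(const unsigned long *mfns, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (mfns[i] != POISON_ENTRY)
			return i;
	}
	return n;
}
```

Calling it on the mfn array just before the ioctl and again at the start
of the fault handler would show on which side of that path the shift
appears, which might help narrow down whether PAE is involved.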

Christoph

