NetBSD-Users archive


Re: double fault when using fssconfig + repetitive savecore problem



Juergen Hannken-Illjes wrote:
On Sun, Feb 20, 2011 at 09:48:57PM +0100, theo borm wrote:
[snip]
The error message scrolling by during the kernel panic was:

# fssconfig -c -x fss0 / /tmp/backing
uvm_fault(0xffffffff80c911c0, 0xffff800001a13999, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803bc40e cs 8 rflags 10206 cr2

What do you get from

        gdb netbsd.gdb
        list *0xffffffff803bc40e


0xffffffff803bc40e is in ffs_copyonwrite 
(../../../../ufs/ffs/ffs_snapshot.c:1787).
1782            snapblklist = si->si_snapblklist;
1783            upper = (snapblklist != NULL ? snapblklist[0] -1 : 0);
1784            lower = 1;
1785            while (lower <= upper) {
1786                    mid = (lower + upper)/2;
1787                    if (snapblklist[mid] == lbn)
1788                            break;
1789                    if (snapblklist[mid] < lbn)
1790                            lower = mid + 1;
1791                    else
(gdb)

OK... it's starting to make some sense again.

The way I read this section of code, snapblklist[0] is supposed to contain the 
number of items (lbns) in snapblklist, while snapblklist[1..upper] contains the 
sorted lbns. The page fault occurs during the binary search over this sorted 
list....

I added a print statement just after line 1784, and when the fault occurs, 
upper equals 5523455. If (I haven't delved deeper, so I can't tell whether this 
assumption is correct) the lbns are related to the filesystem fsize (2048) or 
bsize (16384), then this number is nonsensical: the filesystem is only 
8257014K (4128507 blocks) large, so a snapblklist longer than that makes no 
sense. Besides, at 64 bits per daddr_t, such a snapblklist would take ~45MB, 
and I don't see that allocation happening.

But why should any of this be KVM-specific, and why should handling this 
error cause further page faults?

My feeling is that this is not the root cause, and that using fss merely 
triggers an underlying problem. Could this be related to differences in memory 
layout between KVM and real hardware?

regards, Theo


