Re: Some alpha problems in current

To: port-alpha%NetBSD.org@localhost
Subject: Re: Some alpha problems in current
From: "Michael L. Hitch" <mhitch%NetBSD.org@localhost>
Date: Sun, 10 Feb 2008 21:36:04 -0700 (MST)

On Fri, 8 Feb 2008, Michael L. Hitch wrote:

But that is exactly what is happening. I have a dump file in which I think I was able to locate the jobs variable on the stack, and it points to a pmap_tlb_shootdown_job that links to itself:
*** stack frame for pmap_do_tlb_shootdown:

(gdb) x/10gx 0xfffffe000e501e58
0xfffffe000e501e58:     0xfffffc0000840468      0xfffffe0000084c70
                       ^ RA from call to pmap_do_tlb_shootdown
0xfffffe000e501e68:     0xfffffc0000b31ea8      0x0000000000000003
0xfffffe000e501e78:     0xfffffc0000b3b2f8      0xfffffc0000b6e3c8
0xfffffe000e501e88:     0xfffffc006f92e2c0      0xfffffc006f92e2c0
                       ^  jobs TAILQ_HEAD
0xfffffe000e501e98:     0xfffffc000083fd84      0xfffffc0000b31ea8

*** pmap_tlb_shootdown_job pointed to by jobs:

(gdb) x/x 0xfffffc006f92e2c0
0xfffffc006f92e2c0:     0xfffffc006f92e2c0
(gdb) print (struct pmap_tlb_shootdown_job)* 0xfffffc006f92e2c0
$4 = {pj_list = {tqe_next = 0xfffffc006f92e2c0, tqe_prev =0xfffffe000e501e88},
                           ^^^^^^^^^^^^^^^^^^
                           EEEK!!!!!
 pj_va = 18446741874823061504, pj_pmap = 0xfffffc0000ba68a8, pj_pte = 16}

Another oddity: the pmap_tlb_shootdown_q entry for CPU 0 shows a different count:
(gdb) print pmap_tlb_shootdown_q[0]
$5 = {pq_head = {tqh_first = 0x0, tqh_last = 0xfffffc0000b77480}, pq_lock ={
   mtx_pad1 = 1025, mtx_pad2 = 1}, pq_pte = 16, pq_count = 2, pq_tbia =  0,
 pq_pad = '\0' <repeats 23 times>}

The pq_count indicates there should be 2 entries in the job queue, yet the jobs list holds only the one self-linked entry.
Something somewhere is corrupting the job queue, but I haven't been able to spot it. All the accesses look like they should be properly protected via the pq_lock mutex. I guess the next step will be to put checks in to verify the queue entries and see if I can find where it's getting corrupted.
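A sketch of the kind of queue check that could be added, assuming a debug-only helper (job_queue_check() is a hypothetical name, not existing NetBSD code) called with pq_lock held. It verifies every back pointer and compares the number of entries against the expected count, bounding the walk so a queue that loops back on itself is reported instead of spinning:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct pmap_tlb_shootdown_job. */
struct job {
	TAILQ_ENTRY(job) pj_list;
};
TAILQ_HEAD(job_head, job);

/*
 * Walk the queue and check its invariants:
 *  - each entry's tqe_prev points at the link that points to it,
 *  - there are exactly `expected` entries (e.g. pq_count),
 *  - tqh_last points at the final entry's tqe_next.
 * Returns 0 if the queue is consistent, -1 otherwise.
 */
int
job_queue_check(struct job_head *head, int expected)
{
	struct job **linkp = &TAILQ_FIRST(head);
	struct job *pj;
	int n = 0;

	for (pj = TAILQ_FIRST(head); pj != NULL; pj = TAILQ_NEXT(pj, pj_list)) {
		if (pj->pj_list.tqe_prev != linkp)
			return -1;	/* back pointer does not match */
		if (++n > expected)
			return -1;	/* too many entries, or a cycle */
		linkp = &TAILQ_NEXT(pj, pj_list);
	}
	if (head->tqh_last != linkp)
		return -1;		/* tail pointer is stale */
	return n == expected ? 0 : -1;
}
```

Calling it just before and after each TAILQ_INSERT_TAIL() would narrow down which insertion first breaks the queue.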


What I have found so far:

The pool_cache_get() in pmap_tlb_shootdown() is returning a pool entry which is already on the job queue, which results in that entry getting linked to itself. Adding checks in pool_cache_get_paddr() and pool_cache_put_paddr() caught pool_cache_get_paddr() with two consecutive pool entries in the cache with the same address. The corresponding check in pool_cache_put_paddr() didn't see the duplicate entry being put back in the cache, so I don't know where it came from.
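A sketch of what duplicate checks on get and put might look like, using a deliberately simplified free-object stack. struct cache_group, CACHE_SLOTS and the two functions are hypothetical and do not reproduce the real pool_cache internals; the point is only where the address comparisons go:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CACHE_SLOTS 16	/* hypothetical group size */

/* Hypothetical cache group: a stack of free object pointers. */
struct cache_group {
	void	*objs[CACHE_SLOTS];
	int	avail;		/* number of valid entries in objs[] */
};

/* Return 1 if obj appears among the first n entries of the group. */
static int
cache_contains(const struct cache_group *cg, int n, const void *obj)
{
	for (int i = 0; i < n; i++)
		if (cg->objs[i] == obj)
			return 1;
	return 0;
}

/* Put with a check: report and refuse an object already cached. */
int
cache_put_checked(struct cache_group *cg, void *obj)
{
	if (cg->avail == CACHE_SLOTS)
		return -1;		/* group full */
	if (cache_contains(cg, cg->avail, obj)) {
		fprintf(stderr, "put: %p already in cache\n", obj);
		return -1;
	}
	cg->objs[cg->avail++] = obj;
	return 0;
}

/* Get with a check: the object handed out must not still appear in
 * the remaining entries; if it does, report it and return NULL. */
void *
cache_get_checked(struct cache_group *cg)
{
	void *obj;

	if (cg->avail == 0)
		return NULL;
	obj = cg->objs[--cg->avail];
	if (cache_contains(cg, cg->avail, obj)) {
		fprintf(stderr, "get: %p duplicated in cache\n", obj);
		return NULL;
	}
	return obj;
}
```

If the get-side check fires while the put-side check never does, the duplicate got into the cache by some path other than the checked put, which matches the situation described above.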

I have now started looking at a different approach to this. Since the length of the job queue is now limited to 6 entries (after which the shootdown just invalidates all the TLB entries), I thought I'd try just allocating the job queue as a static array in each pmap_tlb_shootdown_q entry and not using the pool_cache at all. Initially, I ran into problems with the kernel_lock (big lock) spinning out and crashing while trying to rebuild the parity on my RAIDframe disk. After letting the parity rewrite complete in single-user mode, I went multi-user, and my system has now been running for 7 hours. I've repeatedly done the operations that would usually induce the duplicate job queue entry fairly quickly, and have not experienced any problems so far (although that's not saying much).
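A rough sketch of the static-array idea, assuming the 6-entry limit above and a fall-back to a full TLB invalidate on overflow. pq_count and pq_tbia mirror fields seen in the gdb dump; pq_jobs, struct shootdown_q and both functions are hypothetical names, and the locking done with pq_lock in the real structure is omitted:

```c
#include <assert.h>
#include <stddef.h>

#define PMAP_TLB_SHOOTDOWN_MAXJOBS 6	/* limit mentioned in the message */

/* Simplified job with the fields shown in the gdb dump. */
struct shootdown_job {
	unsigned long	pj_va;
	void		*pj_pmap;
	unsigned long	pj_pte;
};

/* Hypothetical per-CPU queue: fixed array instead of a pool_cache TAILQ. */
struct shootdown_q {
	struct shootdown_job	pq_jobs[PMAP_TLB_SHOOTDOWN_MAXJOBS];
	int			pq_count;
	int			pq_tbia;	/* nonzero: invalidate entire TLB */
};

/* Queue one VA; once the array is full, switch to a full invalidate. */
void
shootdown_enqueue(struct shootdown_q *pq, void *pmap, unsigned long va,
    unsigned long pte)
{
	struct shootdown_job *pj;

	if (pq->pq_tbia)
		return;			/* full invalidate already pending */
	if (pq->pq_count == PMAP_TLB_SHOOTDOWN_MAXJOBS) {
		pq->pq_tbia = 1;
		pq->pq_count = 0;	/* individual jobs are now redundant */
		return;
	}
	pj = &pq->pq_jobs[pq->pq_count++];
	pj->pj_va = va;
	pj->pj_pmap = pmap;
	pj->pj_pte = pte;
}

/* Drain on the target CPU: returns how many single-VA invalidations
 * would be done, or -1 when the whole TLB would be invalidated. */
int
shootdown_drain(struct shootdown_q *pq)
{
	int n = pq->pq_tbia ? -1 : pq->pq_count;

	/* ...invalidate pq_jobs[0 .. pq_count - 1], or the whole TLB... */
	pq->pq_count = 0;
	pq->pq_tbia = 0;
	return n;
}
```

Because the slots belong to the queue itself, a job can never be handed out while it is still queued, which removes the allocator from the picture entirely.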


--
Michael L. Hitch                        mhitch%NetBSD.org@localhost
