Re: CVS commit: [simonb-wapbl] src/sys/kern

To: Jason Thorpe <thorpej%shagadelic.org@localhost>
Subject: Re: CVS commit: [simonb-wapbl] src/sys/kern
From: Andrew Doran <ad%NetBSD.org@localhost>
Date: Tue, 29 Jul 2008 13:01:59 +0000

On Mon, Jul 28, 2008 at 08:12:34PM -0700, Jason Thorpe wrote:

>          * If we are releasing a write lock, then prefer to wake all
>          * outstanding readers.  Otherwise, wake one writer if there
>          * are outstanding readers, or all writers if there are no
>          * pending readers.  If waking one specific writer, the writer
>          * is handed the lock here.  If waking multiple writers, we
>          * set WRITE_WANTED to block out new readers, and let them
>          * do the work of acquring the lock in rw_vector_enter().
> 
> So, my guess is that there is writer starvation, which is exactly why  
> pthread_rwlock *ALWAYS* prefers writers (per SUSv3). However, it seems  
> as though kern_rwlock will sometimes prefer readers.

It flip-flops between readers and writers.

> When you always acquire as a writer, then you fall into the "wake all
> writers" case because there are never outstanding readers.  You get
> thundering-herd, but luck in how the scheduler is implemented allows the
> journal flush through.

Along with the adaptive spinning it has a dramatic effect on build.sh. A
delay in getting the lock to a thread that needs it can cause back pressure
and sometimes the system idles needlessly, with lots of threads sitting in
tstile.

There is a lot of contention on vnode locks due to namei() and the exec path
does evil things with the locks. We could fix those paths to suck less but
that's an awful lot of work.

> Andy, why was the current sometimes-prefer-readers behavior implemented?

Only to try and avoid starving readers.
 
> Oh, BTW, I think I see a bug in the current rw_vector_exit():
> 
>                         /* Give the lock to the longest waiting  
> writer. */
>                         l = TS_FIRST(ts, TS_WRITER_Q);
>                         new = (uintptr_t)l | RW_WRITE_LOCKED |  
> RW_HAS_WAITERS;
>                         if (wcnt != 0)
>                                 new |= RW_WRITE_WANTED;
> 
>                         ^^^^^
>                         should be "if (wcnt > 1)"
> 
> I think the current test will always set RW_WRITE_WANTED, which is not  
> what you want if you're waking the last writer.  Andy, do you agree?

Yeah, that's a bug. It should be (wcnt > 1) as you've written.

> >11) It really feels like there is a lack of backpressure somewhere,
> >and that WAPBL is just allowed to create files with wild abandon.
> >
> >Thoughts/comments/solutions are welcome.

I've said it before, I think a rwlock is the wrong tool for the job here. My
two thoughts on it are:

- rwlocks can be tweaked but are always going to perform poorly if the
  writer:reader ratio sucks.

- Adopting a producer-consumer model where journal writeback is handled by a
  kthread would remove the need for the rwlock. It could be done by
  maintaining transaction sequence numbers. For example, each in-core inode
  could contain a record of the last transaction made and the kthread a
  global sequence number indicating the last transaction that is known to be
  in stable storage.

Thanks,
Andrew

Follow-Ups:
- Re: CVS commit: [simonb-wapbl] src/sys/kern
  - From: Jason Thorpe
- Re: CVS commit: [simonb-wapbl] src/sys/kern
  - From: Jason Thorpe

References:
- Re: CVS commit: [simonb-wapbl] src/sys/kern
  - From: Jason Thorpe

Prev by Date: Re: CVS commit: [simonb-wapbl] src/sys/kern
Next by Date: Re: CVS commit: [simonb-wapbl] src/sys/kern
Previous by Thread: Re: CVS commit: [simonb-wapbl] src/sys/kern
Next by Thread: Re: CVS commit: [simonb-wapbl] src/sys/kern
Indexes:

Home | Main Index | Thread Index | Old Index