tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: CVS commit: [simonb-wapbl] src/sys/kern

On Mon, Jul 28, 2008 at 08:12:34PM -0700, Jason Thorpe wrote:

>          * If we are releasing a write lock, then prefer to wake all
>          * outstanding readers.  Otherwise, wake one writer if there
>          * are outstanding readers, or all writers if there are no
>          * pending readers.  If waking one specific writer, the writer
>          * is handed the lock here.  If waking multiple writers, we
>          * set WRITE_WANTED to block out new readers, and let them
>          * do the work of acquring the lock in rw_vector_enter().
> So, my guess is that there is writer starvation, which is exactly why  
> pthread_rwlock *ALWAYS* prefers writers (per SUSv3). However, it seems  
> as though kern_rwlock will sometimes prefer readers.

It flip-flops between readers and writers.

> When you always acquire as a writer, then you fall into the "wake all
> writers" case because there are never outstanding readers.  You get
> thundering-herd, but luck in how the scheduler is implemented allows the
> journal flush through.

Along with the adaptive spinning it has a dramatic effect on A
delay in getting the lock to a thread that needs it can cause back pressure
and sometimes the system idles needlessly, with lots of threads sitting in

There is a lot of contention on vnode locks due to namei() and the exec path
does evil things with the locks. We could fix those paths to suck less but
that's an awful lot of work.

> Andy, why was the current sometimes-prefer-readers behavior implemented?

Only to try and avoid starving readers.
> Oh, BTW, I think I see a bug in the current rw_vector_exit():
>                         /* Give the lock to the longest waiting  
> writer. */
>                         l = TS_FIRST(ts, TS_WRITER_Q);
>                         new = (uintptr_t)l | RW_WRITE_LOCKED |  
>                         if (wcnt != 0)
>                                 new |= RW_WRITE_WANTED;
>                         ^^^^^
>                         should be "if (wcnt > 1)"
> I think the current test will always set RW_WRITE_WANTED, which is not  
> what you want if you're waking the last writer.  Andy, do you agree?

Yeah, that's a bug. It should be (wcnt > 1) as you've written.

> >11) It really feels like there is a lack of backpressure somewhere,
> >and that WAPBL is just allowed to create files with wild abandon.
> >
> >Thoughts/comments/solutions are welcome.

I've said it before, I think a rwlock is the wrong tool for the job here. My
two thoughts on it are:

- rwlocks can be tweaked but are always going to perform poorly if the
  writer:reader ratio sucks.

- Adopting a producer-consumer model where journal writeback is handled by a
  kthread would remove the need for the rwlock. It could be done by
  maintaining transaction sequence numbers. For example, each in-core inode
  could contain a record of the last transaction made and the kthread a
  global sequence number indicating the last transaction that is known to be
  in stable storage.


Home | Main Index | Thread Index | Old Index