tech-kern archive


Re: ioflush kernel thread chewing CPU time

Hi Andy,

Simon Burge wrote:

> Andrew Doran wrote:
> > I suggest putting in some counters to see what the syncer
> > is doing. For example:
> > 
> > - number VDIR vnodes flushed
> > - number VREG vnodes flushed
> > - number VT_VFS vnodes flushed (sync vnodes)
> > 
> > If you put an integer switch in the kernel you can turn the counters on at
> > runtime using gdb, when the problem starts to occur.
> I'll try this before trying a gprof kernel.  Actually, maybe both - I'm 
> not worried about the performance hit of profiling on this box.

A netbsd-5 gprof kernel just reset the system as soon as it loaded/started.
I'll dig around with that a bit more when I get a chance.

I sprinkled some event counters in sched_sync().  Over a 300 second period
where I was seeing ioflush chewing its usual 20ish% CPU time, the counts were:

 - just before the while loop inside the for loop:      254
 - at the top of the while loop:                        137
 - after vget success:                                  137
      type VDIR:                                        0
      type VREG:                                        12
      type VBLK:                                        0
      type VCHR:                                        0
      tag VFS:                                          125

> > > Before I start digging, anyone else seen anything like this before? 
> > 
> > Nope. But, processing a sync vnode involves a trawl through all vnodes
> > associated with every file system. It sounds like that could be happening
> > too often, or for some reason perhaps vnodes on the worklist aren't getting
> > flushed.
> That seems like a pretty reasonable assumption - maxvnodes is set to
> 128k here, and dropping it to 8k sees ioflush go pretty much idle!
> ps shows that thread now using 1.05 cpu seconds of CPU time over 60
> seconds.  Bumping maxvnodes back to 128k still shows ioflush idle, but
> based on past experience I guess it's not going to show a problem for 48
> or more hours.

I've also just rebooted into a kernel with your recent ffs_sync() change.
I'll let you know the results in a day or two :-)

