Subject: Re: FFSv1 performance on large filesystems
To: None <tech-perform@netbsd.org, tech-kern@netbsd.org>
From: Matthias Scheler <tron@zhadum.de>
List: tech-kern
Date: 03/03/2005 16:32:38
On Thu, Mar 03, 2005 at 10:32:42AM -0500, Thor Lancelot Simon wrote:
> > Filesystem Size		Device	Write Performance(*)
> > 1.9G			raid0	34MB/Sec
> > 101G			raid0	22MB/Sec <--
> 
> How large is the underlying disk?

wd0 at atabus0 drive 0: <WDC WD1600JB-32EVA0>
wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
wd1 at atabus0 drive 1: <WDC WD1600JB-32EVA0>
wd1: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
raid0: Components: /dev/wd0a /dev/wd1a

> You should see a significant difference in performance from center to
> edge of the disk.

I had that idea too, which is why I tested another partition ...

26G	raid1		33MB/Sec

... which is at the end of a pair of 80GB disks.
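
If somebody wants to check the center-to-edge effect independently of
the filesystem, reading from the raw device at two different offsets
should show it. A rough sketch - the device name and the skip value are
only examples and have to be adapted to the actual disk:

	# sequential read near the start of the disk (outer tracks)
	dd if=/dev/rwd0d of=/dev/null bs=1m count=256
	# same amount of data near the end of the disk (inner tracks);
	# "skip" is counted in blocks of "bs", so 150000 * 1MB is close
	# to the end of a 160GB drive
	dd if=/dev/rwd0d of=/dev/null bs=1m count=256 skip=150000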

I've run another test on my NetBSD-current system on a single disk ...

wd1 at atabus3 drive 0: <WDC WD1600JD-00GBB0>
wd1: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors

... and performance is much better here ...

> dd if=/dev/zero of=test.img bs=1024k count=256
256+0 records in
256+0 records out
268435456 bytes transferred in 5.739 secs (46773907 bytes/sec)

... on an even bigger partition:

Filesystem  1K-blocks      Used     Avail Capacity  Mounted on
/dev/wd1f   111479275   2122056 103783256     2%    /export/scratch
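
In case somebody wants to compare the parameters of the fast and the
slow filesystem, dumpfs prints the relevant superblock fields (block
size, fragment size, maxbpg etc.); the device below is just the one
from the df output above:

	# dump the superblock parameters of the fast filesystem
	dumpfs /dev/rwd1f | head -30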

So either somebody "fixed" FFSv1 after NetBSD 2.0.1 or the problem isn't
related to FFSv1. Possible other reasons:

1.) The disks
    The only difference between the disks is the interface (PATA vs. SATA).
    And in my tests ("dd" from a raw device) that didn't make much
    difference.

2.) RAIDframe
    I'm also not convinced that RAIDframe causes the problem because all
    initial test cases used RAIDframe RAID 1 devices and the problem
    only affected a single partition.
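
For anybody who wants to double-check 2.) on their own setup, the
component and parity status of a RAIDframe set can be queried like
this (raid0 is the device name from the dmesg output above; a degraded
set or a parity rewrite in progress would explain poor write
performance):

	# show component, spare and parity status of the RAID 1 set
	raidctl -s raid0
	# check the parity status only
	raidctl -p raid0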

Any other ideas?

> How large is the file you're writing?

256MB

> I actually think that, in any case where we have more than 100 or so
> cylinder groups, we should default -e to be a full group (it would be
> nice if it could meaningfully be _more_ than a group).

After "tunefs -e 40960 ..." the performance dropped below 20MB/Sec.

	Kind regards

-- 
Matthias Scheler                                  http://scheler.de/~matthias/