NetBSD-Users archive


Re: Re: About GCC optimizations



On Tue, Mar 11 2008 - 17:47, Valeriy E. Ushakov wrote:
> Joel CARNAT <joel%carnat.net@localhost> wrote:
> 
> > I'm trying to get the best performance from my VIA C3 machine,
> > so I began looking into GCC optimizations. According to the Gentoo
> > Linux wiki, a good set of flags would be "-march=c3-2 -Os
> > -fomit-frame-pointer". I've compiled a few things to check binary
> > size and run crash tests, and it looks OK. But I still have a few questions:
> > 
> > 1. As far as I understand, "-Os" builds small code that fits well in
> > the C3's small cache. Does this mean that the binary will only load faster,
> > or does it mean that it will also run better because each function
> > will fit better in the small cache?
> 
> It's hard to give any specific answer.  It's hard to talk about
> performance "in general" (if at all meaningful).
> 
> IIRC, the main alleged benefit of -Os (vs. -O2) is more compact code and
> hence less paging (fewer pages to load on start, fewer pages to page out
> under memory shortage, fewer pages to keep in the page cache and so likely
> less need for paging them out).  Cache effects are likely to be in the
> noise compared to paging for something like ls.
> 
> ISTR hearing a story (mid 90s) about Sun folks who were looking for
> "generic" performance tuning for Solaris, and they settled on -Os for
> these reasons.  (Apologies to Sun folks if I remember wrong.)
> 
> OTOH, for certain types of long-running programs (e.g. media
> transcoding) cache effects will dominate the picture.
> 
> I'd guess that on a system with plenty of memory and not under any
> heavy load the difference between -Os and -O2 is going to be very
> small, and if most of your code is hot in the page cache, -O2 might
> actually be faster b/c the "less paging" benefit of -Os no longer plays
> any role.
> 
> I use -Os -freorder-blocks for sh3 instead of -O2 b/c the -falign-*
> options included in -O2 tend to increase sh3 code size quite a bit,
> and sh3 machines often have little memory (64MB in usl-p5, 16MB in
> Jornada 680).  CF size also used to be a consideration until recently
> (now 1GB CFs seem to be "entry level" models, compared to 64MB a
> few years ago).  Not that I bothered to actually measure any
> performance impact, though :)
> 

well... I'm talking here about a 1GHz i386-like CPU with 64K of cache
and 512MB of RAM. Does this fit in the horsepower section ? ;-)

the only measurement I have is that top says MPlayer uses 70% of the CPU,
Xorg uses 20%, and the DVD display sometimes freezes or drops frames.

it seems, though, that freezes and frame drops occur less often when playing a
local video than an NFS-shared one. but I never reach the "less
than 10% CPU usage" that is claimed on the web(c).

> Having worked with people who do hardcore performance tuning as their
> day job, one thing I've learned from them is that you probably
> shouldn't even bother thinking about performance unless you have measured
> it and you understand the specific conditions under which your system runs
> and what kind of workloads you are optimizing for.
> 
> So don't lose much time on this; you'll waste more time than any
> improvements from highly fine-tuned -O* -f* are going to save you in
> the following few years while the system you are optimizing is still
> useful. :) If -Os is what makes you feel good, just stick with it :)
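Following the "measure first" advice above, here is a minimal C sketch of a timing harness, assuming a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC. The names checksum_bytes() and time_checksum() are hypothetical; the byte-summing loop only stands in for whatever hot code you actually care about.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: sum of all bytes in a buffer. */
unsigned long checksum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Runs the workload `reps` times and returns the elapsed wall-clock
   seconds from the monotonic clock, or -1.0 if the clock call fails.
   If `result` is non-NULL, the accumulated checksum is stored there. */
double time_checksum(const unsigned char *buf, size_t len, int reps,
                     unsigned long *result)
{
    struct timespec start, end;
    volatile unsigned long sink = 0; /* keeps the optimizer from dropping the loop */

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (int r = 0; r < reps; r++)
        sink += checksum_bytes(buf, len);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    if (result != NULL)
        *result = sink;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

One way to use it: add a small main() that fills a large buffer and prints the result of time_checksum(), build it once with gcc -march=c3-2 -Os and once with gcc -march=c3-2 -O2, and compare the timings over several runs on the actual machine.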

heh, not that it particularly matters at all.
after all, Slackware claims not to set any specific optimization,
and indeed the SlackBuild files prove that, yet the distro runs well on
low-end computers.

maybe I'm approaching it from the wrong side.
but how do I make sure that things that can be done at the hardware level
are not done at the software level? I'm thinking of hardware MPEG2
decoding, or MMX/SSE CPU features.
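On the runtime side of that question, a program can ask the CPU which extensions it reports. A minimal sketch, assuming a GCC or Clang recent enough to provide __builtin_cpu_init() and __builtin_cpu_supports() (these appeared in GCC 4.8, well after the compilers discussed in this thread); cpu_has_mmx() and cpu_has_sse() are hypothetical helper names. This only tells you the CPU has MMX/SSE; whether MPlayer actually uses them depends on how it was built and configured.

```c
/* Returns 1 if the running CPU reports MMX support, 0 otherwise.
   On non-x86 builds the check is unavailable and 0 is returned. */
int cpu_has_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();  /* initialize CPU feature data; safe to repeat */
    return __builtin_cpu_supports("mmx") ? 1 : 0;
#else
    return 0;
#endif
}

/* Same check for SSE. __builtin_cpu_supports() needs a string literal,
   so each feature gets its own function. */
int cpu_has_sse(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") ? 1 : 0;
#else
    return 0;
#endif
}
```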

it's really weird how a 2.5" multimedia disk (running an embedded Linux)
can play DVDs smoothly when a real computer can't.
don't get me wrong, I'm not blaming NetBSD! DVD playback works well
on my Core2Duo 2GHz with 4MB cache and 4GB of RAM running 4.99.55 :-)
I obviously must be doing something wrong with my C3 installation :-)
but I just don't get what :-p

> 
> > 2. My C3 is a multimedia station (running freevo and MPlayer). Should I
> > add flags like "-mmmx" or "-msse", or are they included in
> > "-march=c3-2"?
> > 
> > 3. More generally, how do I know which "-m" or "-f" options are
> > included in the "-march=FOO" parameter?
> 
> It's all in gcc.info.
> 

ok - I just read the gcc manpage and a few URLs from Google.
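For question 3, besides gcc.info, you can ask the compiler itself: `gcc -march=c3-2 -dM -E - < /dev/null` dumps the predefined macros, and GCC defines __MMX__, __SSE__ and similar macros for each instruction-set extension that -march (or an explicit -mmmx/-msse) enables. The GCC manual describes c3-2 as a VIA C3-2 with MMX and SSE support, so -march=c3-2 should already imply -mmmx -msse. Below is a minimal C sketch that reports the same thing from inside a program; enabled_simd() is a hypothetical helper, and only the macro names come from GCC.

```c
#include <string.h>

/* Builds a space-separated list of the x86 SIMD extensions this file was
   compiled to use, based on GCC's predefined feature macros. The answer
   reflects the -march/-m flags used at build time, not the CPU it runs on. */
const char *enabled_simd(void)
{
    static char buf[32];  /* longest result "mmx 3dnow sse sse2 " fits */

    buf[0] = '\0';
#ifdef __MMX__
    strcat(buf, "mmx ");
#endif
#ifdef __3dNOW__
    strcat(buf, "3dnow ");
#endif
#ifdef __SSE__
    strcat(buf, "sse ");
#endif
#ifdef __SSE2__
    strcat(buf, "sse2 ");
#endif
    if (buf[0] == '\0')
        strcpy(buf, "none");
    return buf;
}
```

Printing enabled_simd() from a small main() built with -march=c3-2 should list mmx and sse if the manual's description holds.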

Regards,
        Jo



