Gentoo Forums
gcc optimize for p3, p4 & xp
krinn
Watchman

Joined: 02 May 2003
Posts: 7197

Posted: Sun Nov 21, 2004 7:12 am    Post subject: gcc optimize for p3, p4 & xp

Just found this, and it looks cool!

From the gcc man page:

Code:

`-mfpmath=UNIT'
     Generate floating point arithmetics for selected unit UNIT.  The
     choices for UNIT are:

    `387'
          Use the standard 387 floating point coprocessor present
          majority of chips and emulated otherwise.  Code compiled with
          this option will run almost everywhere.  The temporary
          results are computed in 80bit precision instead of precision
          specified by the type resulting in slightly different results
          compared to most of other chips. See `-ffloat-store' for more
          detailed description.

          This is the default choice for i386 compiler.

    `sse'
          Use scalar floating point instructions present in the SSE
          instruction set.  This instruction set is supported by
          Pentium3 and newer chips, in the AMD line by Athlon-4,
          Athlon-xp and Athlon-mp chips.  The earlier version of SSE
          instruction set supports only single precision arithmetics,
          thus the double and extended precision arithmetics is still
          done using 387.  Later version, present only in Pentium4 and
          the future AMD x86-64 chips supports double precision
          arithmetics too.

          For i387 you need to use `-march=CPU-TYPE', `-msse' or
          `-msse2' switches to enable SSE extensions and make this
          option effective.  For x86-64 compiler, these extensions are
          enabled by default.

          [b]The resulting code should be considerably faster in the
          majority of cases[/b] and avoid the numerical instability
          problems of 387 code, but may break some existing code that
          expects temporaries to be 80bit.

          This is the default choice for the x86-64 compiler.


Got it?
Pentium 3, Pentium 4 and Athlon XP users can use it :p
Code:

CFLAGS="-march=pentium4 -mtune=pentium4 -O3 -pipe -msse2 -msse -mfpmath=sse -mmmx"


It should be safe as it's the default choice for x86-64...

krinn
Watchman

Joined: 02 May 2003
Posts: 7197

Posted: Sun Nov 21, 2004 7:18 am

Well, I wasn't really sure I should post this one (it looks more dangerous than the other), but if you feel crazy enough:

Code:

    sse,387
          Attempt to utilize both instruction sets at once.  This
          effectively double the amount of available registers and on
          chips with separate execution units for 387 and SSE the
          execution resources too.  Use this option with care, as it is
          still experimental, because the GCC register allocator does
          not model separate functional units well resulting in
          instable performance.


Code:

CFLAGS="-march=pentium4 -mtune=pentium4 -O3 -pipe -msse2 -msse -mfpmath=sse,387 -mmmx"


And don't miss this part: "Use this option with care, as it is still experimental".

frenkel
Veteran

Joined: 13 May 2003
Posts: 1034
Location: .nl

Posted: Sun Nov 21, 2004 10:57 am

I've been using this -mfpmath=sse,387 flag since I installed this system about a year ago (Athlon XP 2800+) and have never had any problems with it. I use this system every day.

Frank
_________________
http://techfield.org

Dolio
l33t

Joined: 17 Jun 2002
Posts: 650

Posted: Mon Nov 22, 2004 3:48 am

The only flag here that probably does anything is '-mfpmath=sse,387' and that only because it's experimental.

When you set '-march=whatever' it should automatically signal gcc to use '-msse -mmmx' etc. as appropriate to the architecture you specify. The only reason to use those flags is if you want to use -march=i386 and enable everything else manually, or if something weird is going on with your CPU (like you have an Athlon Thunderbird that magically developed SSE2 instructions :)).

Otherwise, it's either redundant (since it's already being specified by march) or potentially dangerous (since you could generate code that doesn't execute on your processor).
_________________
They don't have a good bathroom to do coke in.

augury
l33t

Joined: 22 May 2004
Posts: 722
Location: philadelphia

Posted: Mon Nov 22, 2004 6:59 am

-mfpmath=sse,387 doesnt do anything worth the effort

-msse3 with -march=prescott will have an effect if you use gcc-3.4.3.
The devs took it out, I don't know exactly why; I think maybe it was too much as a default, or just broken.

frenkel
Veteran

Joined: 13 May 2003
Posts: 1034
Location: .nl

Posted: Mon Nov 22, 2004 3:54 pm

augury wrote:
-mfpmath=sse,387 doesnt do anything worth the effort

What is this based on?

Frank
_________________
http://techfield.org

rhill
Retired Dev

Joined: 22 Oct 2004
Posts: 1629
Location: sk.ca

Posted: Wed Dec 01, 2004 1:27 am

http://www.coyotegulch.com/products/acovea/acovea_original.html
http://www.coyotegulch.com/products/acovea/acovea_4.html

i was also just browsing the gcc mailing list for reference to sse,387 sucking, and instead found an example to the contrary. in fact, for the P4, 'sse,387' > '387' > 'sse'. not right now (they were discussing a recent patch for gcc 4.0), but it's good to see that it's being looked at. :)

but i've heard a lot about how sse,387 doesn't work, is broken, or runs slower than the defaults. who knows, if it works for you, go for it. as with everything, it depends what you're running and what you're running it on.
_________________
by design, by neglect
for a fact or just for effect

MighMoS
Guru

Joined: 24 Apr 2003
Posts: 416
Location: @ ~

Posted: Wed Dec 01, 2004 2:14 am

This cut GNOME's startup time in half, as well as maploads for UT2k4 (I relinked the libs)
_________________
jabber: MighMoS@jabber.org

localhost # export HOME=`which heart`

opm8
n00b

Joined: 10 Sep 2003
Posts: 56

Posted: Wed Dec 01, 2004 7:13 am

MighMoS,

What's the command to relink libs?

MighMoS wrote:
This cut GNOME's startup time in half, as well as maploads for UT2k4 (I relinked the libs)

ARC2300
Apprentice

Joined: 30 Mar 2003
Posts: 260
Location: Odenton, MD

Posted: Sun Dec 05, 2004 10:53 am

opm8 wrote:
MighMoS,

What's the command to relink libs?

MighMoS wrote:
This cut GNOME's startup time in half, as well as maploads for UT2k4 (I relinked the libs)


I believe you're looking for "ldconfig".
_________________
It's fun to take a trip
Put acid in your veins

yngwin
Retired Dev

Joined: 19 Dec 2002
Posts: 4572
Location: Suzhou, China

Posted: Mon Dec 06, 2004 10:01 am

Actually on athlon-xp -mfpmath=387 is faster than the other options...
_________________
"Those who deny freedom to others deserve it not for themselves." - Abraham Lincoln
Free Culture | Defective by Design | EFF

thechris
Veteran

Joined: 12 Oct 2003
Posts: 1203

Posted: Mon Dec 06, 2004 6:13 pm

In every test I've done and every one I've seen, -mfpmath=anything was worse than omitting the option. I can only assume the compiler can determine these things better. In the future 387,sse should be faster.

Genkaku
n00b

Joined: 26 Aug 2004
Posts: 72
Location: Poland

Posted: Mon Dec 06, 2004 6:59 pm

MighMoS, what CPU do you have? And did you choose -mfpmath=387, -mfpmath=sse or -mfpmath=sse,387?

krinn
Watchman

Joined: 02 May 2003
Posts: 7197

Posted: Wed Dec 08, 2004 11:27 pm

OK, after a few days of testing -mfpmath=sse,387 I can say:

- Speed: I can't really see a difference, as I haven't tuned anything yet, except maybe GNOME, which seems to respond faster, but that could be psychological... and loading seems really better...
- Stability: no problems with the binaries so far, no crashes... the code is stable for me...

augury: I'm aware of the flags for prescott, nocona, sse3... but I opened this thread for -mfpmath, which I didn't know about. Everyone talks about the others, but I had never seen a thread about this one.
Maybe a lot of people know it, but as nobody wrote it down, I didn't find out about it until now...

dirtyepic: both links are dead, could you post some others?

dolio: yep, but 1/ redundant isn't dangerous (my gcc like it), and 2/ mtune will automatically set them, not march.
ie: -march=pentium4 -msse3 == -march=pentium4 -mtune=prescott
So if you only set march=pentium4 and got a prescott, you will not have sse3 code until mtune or msse3 specified... As you see, march gives general architecture optimization, but you need to tune to your processor implementation.

Does anyone have a real test case measured with "time"?
PS: it should be a program that helps gcc produce code for sse,387... some equations maybe. And the result should fail a "diff nonoptimizedversion optimizedversion".

bi3l
Apprentice

Joined: 06 Feb 2003
Posts: 268
Location: France

Posted: Wed Dec 08, 2004 11:49 pm

krinn wrote:
dolio: yep, but 1/ redundant isn't dangerous (my gcc like it), and 2/ mtune will automatically set them, not march.
ie: -march=pentium4 -msse3 == -march=pentium4 -mtune=prescott
So if you only set march=pentium4 and got a prescott, you will not have sse3 code until mtune or msse3 specified... As you see, march gives general architecture optimization, but you need to tune to your processor implementation.

That's not exactly true as you can just set -march=prescott and according to the man page of gcc:
Quote:
specifying -march=cpu-type implies -mtune=cpu-type.

krinn
Watchman

Joined: 02 May 2003
Posts: 7197

Posted: Thu Dec 09, 2004 2:07 am

good catch :D
All times are GMT