Gentoo Forums
(not so) safe cflags and per package cflags
darkbasic
Tux's lil' helper

Joined: 06 Sep 2006
Posts: 133

Posted: Mon Mar 01, 2010 2:05 pm    Post subject: (not so) safe cflags and per package cflags

I have been experimenting a bit on my laptop and wondering whether there is any way to gain some extra performance, without putting ugly CFLAGS in make.conf of course.
What I aim to do is tweak make.conf's CFLAGS a bit, but put there only optimizations which do not usually hurt performance (so nothing like -O3) and which build the vast majority of packages flawlessly.
The second step is adding per-package CFLAGS for the few remaining packages which do not build with the tweaked global CFLAGS.
Last but not least, build some packages which benefit from it and are known to be safe with the icc compiler (for example python gets an extra 15% boost and the kernel is _much_ quicker at context switching, although maybe it's not so safe to compile it with icc), and tweak the gcc flags for some key packages.

I was wondering whether someone has already run some benchmarks, and which flags they use. I played mainly with icc, and usually it is as fast as gcc-4.4 on the amd64 architecture (on a 45nm Core 2 Duo), with a few noticeable exceptions.
It would be interesting to build a database with the best flags for each package and keep make.conf as clean and safe as possible.
_________________
Computers are like air conditioners:
they stop working properly when you open Windows...

Coltiva Linux, Windows si pianta da solo.


http://www.linuxsystems.it/

nikaya
Veteran

Joined: 13 May 2006
Posts: 1471
Location: Germany

Posted: Tue Mar 02, 2010 9:23 am

http://usrportage.de/archives/898-Specific-env-vars-for-Gentoo-packages.html
_________________
Notes on Dhamma
How to waste your time: look for an explanation of consciousness, ask to know what feeling is. (Nanavira Thera)

darkbasic
Tux's lil' helper

Joined: 06 Sep 2006
Posts: 133

Posted: Tue Mar 02, 2010 11:57 am

nikaya wrote:
http://usrportage.de/archives/898-Specific-env-vars-for-Gentoo-packages.html

Thank you for the link, but I already use /etc/portage/env and I have already modified my bashrc to add custom flags for icc.

The hard part of the work is finding which flags and which compiler are best for a package.

Something like:
Code:
ICC:
dev-lang/python -O3 -ipo -xSSE4.1 -gcc
media-sound/lame -O2 -ip -xSSE4.1 -gcc


Code:
pybench

Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   112ms   133ms  -15.9%   119ms   135ms  -11.7%
           BuiltinMethodLookup:    92ms   108ms  -15.3%    95ms   110ms  -13.5%
                 CompareFloats:   119ms   138ms  -13.5%   122ms   139ms  -12.1%
         CompareFloatsIntegers:   110ms   125ms  -12.0%   113ms   126ms  -10.5%
               CompareIntegers:    90ms   120ms  -25.3%    91ms   121ms  -24.8%
        CompareInternedStrings:   122ms   110ms  +10.9%   124ms   112ms  +10.0%
                  CompareLongs:    86ms    99ms  -13.2%    87ms   100ms  -13.3%
                CompareStrings:    85ms   127ms  -33.3%    89ms   129ms  -31.3%
                CompareUnicode:   115ms   102ms  +13.3%   117ms   103ms  +14.3%
    ComplexPythonFunctionCalls:   121ms   138ms  -11.9%   125ms   140ms  -10.8%
                 ConcatStrings:   131ms   148ms  -11.5%   149ms   170ms  -12.4%
                 ConcatUnicode:   124ms   192ms  -35.5%   128ms   211ms  -39.2%
               CreateInstances:   125ms   130ms   -3.9%   129ms   132ms   -2.3%
            CreateNewInstances:   101ms    96ms   +4.7%   104ms    98ms   +6.1%
       CreateStringsWithConcat:   107ms   150ms  -28.7%   109ms   153ms  -28.6%
       CreateUnicodeWithConcat:   112ms   116ms   -3.1%   120ms   121ms   -0.8%
                  DictCreation:    91ms   101ms   -9.4%    94ms   102ms   -7.7%
             DictWithFloatKeys:   102ms   123ms  -17.0%   105ms   123ms  -15.1%
           DictWithIntegerKeys:   101ms   118ms  -14.1%   105ms   120ms  -12.5%
            DictWithStringKeys:    94ms   107ms  -12.8%    96ms   108ms  -11.3%
                      ForLoops:    68ms    78ms  -12.3%    71ms    79ms   -9.7%
                    IfThenElse:   106ms   105ms   +1.1%   108ms   106ms   +2.3%
                   ListSlicing:   107ms   112ms   -4.9%   125ms   113ms  +10.8%
                NestedForLoops:    92ms   113ms  -18.0%    96ms   113ms  -15.8%
      NestedListComprehensions:   116ms   145ms  -20.1%   120ms   148ms  -19.1%
          NormalClassAttribute:   105ms   112ms   -6.2%   107ms   113ms   -4.9%
       NormalInstanceAttribute:   104ms    99ms   +5.1%   107ms   101ms   +6.7%
           PythonFunctionCalls:   116ms   126ms   -7.6%   118ms   127ms   -6.6%
             PythonMethodCalls:   137ms   148ms   -7.2%   141ms   150ms   -5.6%
                     Recursion:   166ms   163ms   +1.9%   171ms   163ms   +4.6%
                  SecondImport:    94ms   103ms   -8.6%    99ms   104ms   -5.4%
           SecondPackageImport:    97ms   110ms  -11.5%   101ms   111ms   -8.8%
         SecondSubmoduleImport:   126ms   142ms  -10.8%   130ms   143ms   -9.0%
       SimpleComplexArithmetic:   121ms   127ms   -4.9%   123ms   128ms   -3.4%
        SimpleDictManipulation:   106ms   117ms   -9.6%   111ms   121ms   -8.5%
         SimpleFloatArithmetic:   116ms   133ms  -12.2%   121ms   135ms  -10.6%
      SimpleIntFloatArithmetic:    79ms    92ms  -14.0%    81ms    93ms  -12.7%
       SimpleIntegerArithmetic:    80ms    94ms  -14.7%    81ms    94ms  -13.7%
      SimpleListComprehensions:    97ms   121ms  -19.6%   102ms   123ms  -17.4%
        SimpleListManipulation:    84ms    98ms  -13.8%    87ms    98ms  -10.9%
          SimpleLongArithmetic:   106ms   108ms   -1.8%   107ms   109ms   -1.3%
                    SmallLists:   112ms   118ms   -5.4%   114ms   120ms   -5.2%
                   SmallTuples:   100ms   114ms  -12.4%   101ms   116ms  -12.8%
         SpecialClassAttribute:   105ms   110ms   -4.7%   107ms   112ms   -5.2%
      SpecialInstanceAttribute:   125ms   208ms  -39.9%   127ms   211ms  -39.9%
                StringMappings:   118ms   107ms  +10.2%   119ms   107ms  +10.5%
              StringPredicates:   110ms   135ms  -18.6%   112ms   136ms  -17.7%
                 StringSlicing:   111ms   117ms   -5.2%   120ms   125ms   -3.5%
                     TryExcept:    68ms    90ms  -24.4%    69ms    90ms  -24.0%
                    TryFinally:    97ms   106ms   -8.3%    99ms   107ms   -7.2%
                TryRaiseExcept:   104ms   111ms   -6.6%   105ms   113ms   -6.8%
                  TupleSlicing:   114ms   133ms  -14.7%   119ms   142ms  -16.2%
               UnicodeMappings:   109ms   147ms  -26.2%   111ms   148ms  -25.2%
             UnicodePredicates:   117ms   128ms   -8.7%   119ms   132ms  -10.5%
             UnicodeProperties:   116ms   117ms   -1.3%   120ms   123ms   -3.0%
                UnicodeSlicing:   125ms   128ms   -2.9%   128ms   134ms   -4.6%
                   WithFinally:   135ms   150ms   -9.6%   141ms   152ms   -6.7%
               WithRaiseExcept:   117ms   124ms   -5.4%   122ms   125ms   -2.5%
-------------------------------------------------------------------------------
Totals:                          6247ms  7070ms  -11.6%  6459ms  7217ms  -10.5%

(this=iccO3.pybench, other=gccO2.pybench)
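
For anyone reading the table: the "diff" columns appear to be the relative difference of "this" against "other". A minimal sketch that reproduces the Totals line, assuming that formula (pct_diff is just an illustrative helper name, not part of pybench):

```shell
#!/bin/sh
# pct_diff THIS OTHER: relative difference of THIS against OTHER, in percent.
# Assumption: diff = (this - other) / other * 100, rounded to one decimal.
pct_diff() {
    awk -v this="$1" -v other="$2" \
        'BEGIN { printf "%+.1f%%\n", (this - other) / other * 100 }'
}

pct_diff 6247 7070   # minimum run-time totals, prints -11.6%
pct_diff 6459 7217   # average run-time totals, prints -10.5%
```

The per-test rows may be off by a few tenths when recomputed this way, because the millisecond values shown in the table are already rounded.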


If someone uses a package a lot and has found the best compiler/flags for it, then by sharing them we can easily build a little database of all the packages which benefit greatly from some optimizations.
Once they are added to /etc/portage/packages.gcc-cflags or packages.icc-cflags, portage will automatically use the better flags for any known package.

I'm pretty sure I'm not the only one who has already experimented with custom CFLAGS for some packages.
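
To give an idea of how such a list could be consumed, here is a rough sketch of a lookup helper that a bashrc hook might call. Everything in it is an assumption for illustration: the function name lookup_flags is hypothetical, and the file format ("category/package flags...", with # for comments) is simply modeled on the ICC list above. It is not a Portage feature.

```shell
#!/bin/sh
# lookup_flags FILE ATOM: print the flags recorded for ATOM in FILE.
# Assumed line format: "<category>/<package> <flags...>"; lines starting
# with '#' and blank lines are ignored. Prints nothing if ATOM is not listed.
lookup_flags() {
    awk -v pkg="$2" '
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        $1 == pkg {
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")  # drop the package column
            print                            # what is left are the flags
            exit
        }
    ' "$1"
}

# Demo with the two entries from the ICC list above.
flags_file=$(mktemp)
cat > "$flags_file" <<'EOF'
# ICC flags
dev-lang/python -O3 -ipo -xSSE4.1 -gcc
media-sound/lame -O2 -ip -xSSE4.1 -gcc
EOF
lookup_flags "$flags_file" media-sound/lame   # prints -O2 -ip -xSSE4.1 -gcc
rm -f "$flags_file"
```

A hook could then export the result as CFLAGS only when the lookup returns something, so unknown packages keep the global flags from make.conf.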

Spaulding
Apprentice

Joined: 16 Apr 2006
Posts: 159
Location: /dev/vagina

Posted: Tue Mar 02, 2010 6:10 pm

We could create a mailing list or a web interface where users can add their options and results. But I have only one question: is it worth it?

darkbasic
Tux's lil' helper

Joined: 06 Sep 2006
Posts: 133

Posted: Tue Mar 02, 2010 7:20 pm

It depends... usually the gains are in the range of 1-15%, which is quite enough in my opinion, and sometimes they are even greater (for example Sun Studio's auto-parallelization technology doubles (2x!) the performance in SPEC CPU2006).
We don't have to find the best flags for every package, so if someone finds a better flag/compiler for a package and shares it, it is worth it.

robnotts
Guru

Joined: 15 Mar 2004
Posts: 405
Location: Nottingham, UK

Posted: Wed Mar 03, 2010 5:58 am

For info, I have been successfully running my laptop, which seems stable and fast, with these...

Code:
CFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -pipe"
CXXFLAGS="${CFLAGS}"
CPPFLAGS="${CFLAGS}"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"


...with overrides for...

Code:

# /etc/portage/env/app-office/openoffice
CFLAGS="-O2 -march=native -fomit-frame-pointer -pipe"
CXXFLAGS="-pipe"
CPPFLAGS="-pipe"

# /etc/portage/env/dev-db/mysql
CFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -fno-strict-aliasing -pipe"
CXXFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -fno-strict-aliasing -pipe"
CPPFLAGS="-O2 -march=native -ftree-vectorize -fomit-frame-pointer -fno-strict-aliasing -pipe"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"

# /etc/portage/env/sys-libs/libstdc++-v3
CFLAGS="-O2 -fomit-frame-pointer -pipe"
CXXFLAGS="-O2 -fomit-frame-pointer -pipe"
CPPFLAGS="-O2 -fomit-frame-pointer -pipe"


... the openoffice and libstdc++-v3 flags are there to allow them to compile at all, and the mysql flags work around a bug, listed somewhere, which was causing database corruption.
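
For readers on a newer Portage, the same kind of override can probably be expressed without a bashrc hook, through /etc/portage/package.env. The fragment below is only a sketch under that assumption: the file name no-vectorize.conf is made up for illustration, and the flags are simply the global set with -ftree-vectorize dropped.

```shell
# /etc/portage/env/no-vectorize.conf   (hypothetical file name)
# Global flags minus -ftree-vectorize, for packages that fail to build with it.
CFLAGS="-O2 -march=native -fomit-frame-pointer -pipe"
CXXFLAGS="-O2 -march=native -fomit-frame-pointer -pipe"

# /etc/portage/package.env
# <package atom>          <file under /etc/portage/env/>
app-office/openoffice     no-vectorize.conf
```

The two parts are separate files; they are shown in one block only to keep the example short.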

I specifically went for the vectorisation flags as I use this laptop as my main work machine for browsing, photo editing, etc., so I wanted it to be as fast as possible and to use as few CPU resources as possible. It does run in 64-bit.

I guess the next step would be to try some of the graphite loop manipulation flags on some of the media-libs to see if they make any difference.

Rob.
_________________
---

Gentoo Phenom][ X4 955 on AMD790 + Geforce 220GT 8GB/1.75TB (Desktop)
+ MythTV (3xFreeview,1xFreesat HD) on 1080p
Gentoo Turion64 X2 Geforce 6150 2GB/120GB (Laptop)

Yamakuzure
Advocate

Joined: 21 Jun 2006
Posts: 2273
Location: Bardowick, Germany

Posted: Wed Mar 03, 2010 11:09 am

My make.conf uses this, and my laptop works flawlessly with it:
Code:
## CFLAGS:
#-------------#
CFLAGS="-march=native -O2 -pipe -mssse3" ## Default and safe flags
CFLAGS="${CFLAGS} -ftree-vectorize"      ## For non tool chain
CFLAGS="${CFLAGS} -mno-push-args"        ## Should not be added unless safety is known

## LDFLAGS:
#-------------#
LDFLAGS="${LDFLAGS} -Wl,--sort-common -Wl,--hash-style=gnu" ## Default and safe flags
LDFLAGS="${LDFLAGS} -Wl,--as-needed"                        ## Optimization - if merges break due to unknown symbols, disable this!
LDFLAGS="${LDFLAGS} -Wl,-O1 -s"                             ## Flags for stripping and optimizing binaries                         
Note: The comments are for me, and not added for this post. So I do not know whether my "commented in" thoughts are entirely correct. ;)

Note 2: I comment lines for individual packages as I see fit.
_________________
Important German:
  1. "Aha" - German reaction to pretend that you are really interested while giving no f*ck.
  2. "Tja" - German reaction to the apocalypse, nuclear war, an alien invasion or no bread in the house.

darkbasic
Tux's lil' helper

Joined: 06 Sep 2006
Posts: 133

Posted: Wed Mar 03, 2010 11:39 am

robnotts wrote:
I guess the next step would be to try some of the graphite loop manipulation flags on some of the media-libs to see if they make any difference.


I'm experimenting with "-floop-interchange -floop-strip-mine -floop-block" but I still have to benchmark it. It seems to be quite safe.

Maybe something more aggressive like "-floop-parallelize-all -ftree-parallelize-loops=4" is worth trying too...

SithMaddox
Tux's lil' helper

Joined: 02 Jul 2004
Posts: 149

Posted: Sun Mar 14, 2010 8:25 pm

Yamakuzure wrote:
My make.conf uses this, and my laptop works flawlessly with it:
Code:
## CFLAGS:
#-------------#
CFLAGS="-march=native -O2 -pipe -mssse3" ## Default and safe flags
CFLAGS="${CFLAGS} -ftree-vectorize"      ## For non tool chain
CFLAGS="${CFLAGS} -mno-push-args"        ## Should not be added unless safety is known

## LDFLAGS:
#-------------#
LDFLAGS="${LDFLAGS} -Wl,--sort-common -Wl,--hash-style=gnu" ## Default and safe flags
LDFLAGS="${LDFLAGS} -Wl,--as-needed"                        ## Optimization - if merges break due to unknown symbols, disable this!
LDFLAGS="${LDFLAGS} -Wl,-O1 -s"                             ## Flags for stripping and optimizing binaries                         
Note: The comments are for me, and not added for this post. So I do not know whether my "commented in" thoughts are entirely correct. ;)

Note 2: I comment lines for individual packages as I see fit.


Doesn't -march=native imply -mssse3?

darkbasic
Tux's lil' helper

Joined: 06 Sep 2006
Posts: 133

Posted: Mon Mar 15, 2010 11:47 am

SithMaddox wrote:
Doesn't -march=native imply -mssse3?


Uhm... I think so but I'm not sure...

amade
n00b

Joined: 30 Mar 2009
Posts: 8

Posted: Mon Mar 15, 2010 12:22 pm

Code:

# gcc -march=native -Q --help=target
...
  -msseregparm                      [disabled]
  -mssse3                           [disabled]
  -mstack-arg-probe                 [disabled]
...
# gcc -march=native -Q --help=target -mssse3
...
  -msseregparm                      [disabled]
  -mssse3                           [enabled]
  -mstack-arg-probe                 [disabled]
...

loftwyr
l33t

Joined: 29 Dec 2004
Posts: 970
Location: 43°38'23.62"N 79°27'8.60"W

Posted: Mon Mar 15, 2010 1:05 pm

That output only shows which options are explicitly set. It doesn't show what is implied by the -march/-mtune settings generated by native.

If you have a CPU that supports the SSE sets, then they are enabled, but -msseX will still show as disabled.
_________________
My emerge --info
Have you run revdep-rebuild lately? It's in gentoolkit and it's worth a shot if things don't work well.
Celebrating 5 years of Gentoo-ing.

mv
Watchman

Joined: 20 Apr 2005
Posts: 6279

Posted: Mon Mar 15, 2010 7:08 pm

-mssse3 is implied by -march=native (if supported by the CPU, of course). Here is how to find out (if your CPU supports it):
Code:
gcc -v -c -Q -march=native -O2 -o /dev/null -x c - 2>&1 <<PROG
int main(){return 0;}
PROG
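
Another way to check, as far as I know, is to ask the compiler which macros it predefines: GCC defines __SSSE3__ when SSSE3 code generation is enabled. A small sketch along those lines (has_macro is a hypothetical helper name; the gcc part is skipped if gcc is not installed):

```shell
#!/bin/sh
# has_macro NAME: read a preprocessor macro dump on stdin and succeed
# if NAME is defined in it.
has_macro() {
    grep -q "^#define $1 "
}

# Dump the macros predefined under -march=native and look for __SSSE3__.
if command -v gcc >/dev/null 2>&1; then
    if gcc -march=native -dM -E - </dev/null | has_macro __SSSE3__; then
        echo "SSSE3 is enabled by -march=native"
    else
        echo "SSSE3 is not enabled by -march=native"
    fi
fi
```

Unlike the -Q --help=target listing, this reflects what the compiler will actually use, so it should not depend on whether the option was given explicitly.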

All times are GMT