Gentoo Forums
Tip: Maths performance tweak
taviso
Retired Dev


Joined: 15 Apr 2003
Posts: 261
Location: United Kingdom

Posted: Wed Mar 23, 2005 4:00 pm    Post subject: Tip: Maths performance tweak

If you have an Intel CPU, try this:
Code:
# emerge dev-lang/icc
$ wget http://dev.gentoo.org/~taviso/cpml.c
$ gcc -O2 -o cpml cpml.c -ldl
$ ./cpml

If you get interesting results, you can switch to using libimf globally. Take some benchmarks first:
Code:

# emerge nbench
$ nbench
# echo /opt/intel/compiler80/lib/libimf.so >> /etc/ld.so.preload
$ nbench   
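
Writing to /etc/ld.so.preload affects every dynamically linked program on the system. For a first comparison, the same effect can be had for a single process through the LD_PRELOAD environment variable. A minimal sketch, assuming the library lives at the path used above; the helper name with_libimf and the LIBIMF variable are made up for illustration:

```shell
# Hypothetical helper: run one command with libimf preloaded for that process only.
# LIBIMF defaults to the path used earlier in this thread; adjust it for your install.
LIBIMF="${LIBIMF:-/opt/intel/compiler80/lib/libimf.so}"

with_libimf() {
    # Refuse to run without a command, so LD_PRELOAD is never set in the shell itself.
    if [ "$#" -eq 0 ]; then
        echo "usage: with_libimf COMMAND [ARGS...]" >&2
        return 2
    fi
    # Bail out early if the library is missing or unreadable.
    if [ ! -r "$LIBIMF" ]; then
        echo "libimf not found at $LIBIMF" >&2
        return 1
    fi
    # The assignment applies only to the environment of the command being run.
    LD_PRELOAD="$LIBIMF" "$@"
}

# Usage (compare against a plain run):
#   nbench
#   with_libimf nbench
```

This leaves /etc/ld.so.preload untouched, so there is nothing to undo afterwards.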

Probably not a big difference for most people, but it's still interesting :) I was investigating this to see how the (aging) cpml library compared to glibc and found it interesting, so I added x86 support. How do the benchmarks look for other people when preloading libimf? What about games performance? Might be interesting to find out :)
_________________
--------------------------------------
Gentoo on Alpha, is your penguin 64bit?
--------------------------------------------------------
ballero
n00b


Joined: 10 Jul 2004
Posts: 62

Posted: Thu Mar 24, 2005 11:59 am

some benchies:

Code:
acos:                           icc 8.1                         icc 9.0
           libm.so->acos()      (373 cycles)
         libimf.so->acos()      (206 cycles)                  (204 cycles)

asin:
           libm.so->asin()      (372 cycles)
         libimf.so->asin()      (187 cycles)                  (186 cycles)

atan:
           libm.so->atan()      (296 cycles)
         libimf.so->atan()      (118 cycles)                  (85 cycles)

atan2:
           libm.so->atan2()     (303 cycles)
         libimf.so->atan2()     (62 cycles)                   (62 cycles)

cos:
           libm.so->cos()       (226 cycles)
         libimf.so->cos()       (102 cycles)                  (101 cycles)

exp:
           libm.so->exp()       (383 cycles)
         libimf.so->exp()       (89 cycles)                   (89 cycles)

hypot:
           libm.so->hypot()     (128 cycles)
         libimf.so->hypot()     (45 cycles)                   (47 cycles)

log:
           libm.so->log()       (200 cycles)
         libimf.so->log()       (122 cycles)                  (92 cycles)

log10:
           libm.so->log10()     (208 cycles)
         libimf.so->log10()     (128 cycles)                  (100 cycles)

pow:
           libm.so->pow()       (859 cycles)
         libimf.so->pow()       (211 cycles)                  (111 cycles)

sin:
           libm.so->sin()       (253 cycles)
         libimf.so->sin()       (102 cycles)                  (101 cycles)

sqrt:
           libm.so->sqrt()      (64 cycles)
         libimf.so->sqrt()      (53 cycles)                   (40 cycles)

tan:
           libm.so->tan()       (354 cycles)
         libimf.so->tan()       (186 cycles)                  (184 cycles)       


w/o preload
Code:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1352.8  :      34.69  :      11.39
STRING SORT         :              73  :      32.62  :       5.05
BITFIELD            :      7.1218e+08  :     122.16  :      25.52
FP EMULATION        :          232.24  :     111.44  :      25.71
FOURIER             :           20760  :      23.61  :      13.26
ASSIGNMENT          :          39.553  :     150.50  :      39.04
IDEA                :          2430.2  :      37.17  :      11.04
HUFFMAN             :          2170.4  :      60.19  :      19.22
NEURAL NET          :            30.8  :      49.48  :      20.81
LU DECOMPOSITION    :          1248.2  :      64.67  :      46.69
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 65.526
FLOATING-POINT INDEX: 42.271
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz 3681MHz
L2 Cache            : 1024 KB
OS                  : Linux 2.6.11-gentoo-r4
C compiler          : 3.4.3-20050110
libc                :
MEMORY INDEX        : 17.133
INTEGER INDEX       : 15.789
FLOATING-POINT INDEX: 23.445
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


with preload
Code:
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1345.8  :      34.51  :      11.33
STRING SORT         :          73.291  :      32.75  :       5.07
BITFIELD            :      7.1013e+08  :     121.81  :      25.44
FP EMULATION        :          232.16  :     111.40  :      25.71
FOURIER             :           35437  :      40.30  :      22.64
ASSIGNMENT          :          39.568  :     150.56  :      39.05
IDEA                :          2435.1  :      37.24  :      11.06
HUFFMAN             :          2179.3  :      60.43  :      19.30
NEURAL NET          :          30.631  :      49.21  :      20.70
LU DECOMPOSITION    :          1232.7  :      63.86  :      46.11
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 65.546
FLOATING-POINT INDEX: 50.217
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : Dual GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz 3681MHz
L2 Cache            : 1024 KB
OS                  : Linux 2.6.11-gentoo-r4
C compiler          : 3.4.3-20050110
libc                :
MEMORY INDEX        : 17.141
INTEGER INDEX       : 15.791
FLOATING-POINT INDEX: 27.852
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


There's a huge difference in FOURIER, useful to people who use SETI@home.
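
To put numbers on that, the relative change can be computed from the two nbench runs above. A small sketch; pct_change is a hypothetical helper, and the inputs are the FOURIER rate and the Linux FLOATING-POINT INDEX copied from the two tables:

```shell
# pct_change OLD NEW: print the relative change from OLD to NEW as a percentage.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

pct_change 20760 35437     # FOURIER iterations/sec, prints 70.7
pct_change 23.445 27.852   # FLOATING-POINT INDEX (Linux baseline), prints 18.8
```

The other nine tests move by about 1% or less between the two runs, so the gain in the floating-point index comes almost entirely from FOURIER.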

ftp://pi.super-computing.org/Linux/super_pi.tar.gz
Code:
./super_pi 20


Code:
====================== super_pi ==================================

                             with preload          w/o preload

Total calculation(I/O) time=    26.598               28.784
==================================================================




Code:
====================== Quake3-timedemo ==================================

                             with preload          w/o preload

demo001 (fps) =                   374                   370
demo002 (fps) =                   366                   364
=========================================================================




Great tips, taviso. Thanks. :)


Last edited by ballero on Sat Aug 06, 2005 10:46 am; edited 1 time in total
Mac-or
n00b


Joined: 20 Mar 2005
Posts: 15

Posted: Fri Apr 15, 2005 1:06 am

You opened a very interesting thread.

./cpml brought me interesting results, too.
acos:
libm.so->acos() (274 cycles)
libimf.so->acos() (163 cycles)

asin:
libm.so->asin() (282 cycles)
libimf.so->asin() (166 cycles)

atan:
libm.so->atan() (173 cycles)
libimf.so->atan() (86 cycles)

atan2:
libm.so->atan2() (410 cycles)
libimf.so->atan2() (274 cycles)

cos:
libm.so->cos() (109 cycles)
libimf.so->cos() (174 cycles)

exp:
libm.so->exp() (209 cycles)
libimf.so->exp() (137 cycles)

hypot:
libm.so->hypot() (677 cycles)
libimf.so->hypot() (199 cycles)

log:
libm.so->log() (153 cycles)
libimf.so->log() (105 cycles)

log10:
libm.so->log10() (162 cycles)
libimf.so->log10() (126 cycles)

pow:
libm.so->pow() (1164 cycles)
libimf.so->pow() (604 cycles)

sin:
libm.so->sin() (128 cycles)
libimf.so->sin() (166 cycles)

sqrt:
libm.so->sqrt() (91 cycles)
libimf.so->sqrt() (73 cycles)

tan:
libm.so->tan() (182 cycles)
libimf.so->tan() (197 cycles)

Note cos, sin and tan. (Actually, I don't understand why those are slower while the rest of the functions are so much faster.)

I had an emerge running in the background but I think that this should be responsible for those results. I ran cpml a few times with no change.
My system is a PIII and my CFLAGS include -O3, etc.

Nevertheless I will investigate this further and use it.

BTW: Is prelink aware of /etc/ld.so.preload?
roothorick
Tux's lil' helper


Joined: 30 May 2004
Posts: 83
Location: Menasha, WI

Posted: Sat Apr 16, 2005 3:03 am

Out of curiosity, tried it on an AMD Athlon XP 2200+:

Quote:
acos:
libm.so->acos() (296 cycles)
libimf.so->acos() (137 cycles)

asin:
libm.so->asin() (269 cycles)
libimf.so->asin() (138 cycles)

atan:
libm.so->atan() (203 cycles)
libimf.so->atan() (72 cycles)

atan2:
libm.so->atan2() (197 cycles)
libimf.so->atan2() (195 cycles)

cos:
libm.so->cos() (128 cycles)
libimf.so->cos() (167 cycles)

exp:
libm.so->exp() (144 cycles)
libimf.so->exp() (128 cycles)

hypot:
libm.so->hypot() (79 cycles)
libimf.so->hypot() (61 cycles)

log:
libm.so->log() (211 cycles)
libimf.so->log() (116 cycles)

log10:
libm.so->log10() (234 cycles)
libimf.so->log10() (142 cycles)

pow:
libm.so->pow() (419 cycles)
libimf.so->pow() (221 cycles)

sin:
libm.so->sin() (74 cycles)
libimf.so->sin() (173 cycles)

sqrt:
libm.so->sqrt() (64 cycles)
libimf.so->sqrt() (32 cycles)

tan:
libm.so->tan() (148 cycles)
libimf.so->tan() (206 cycles)



Interesting, to say the least. libm edges out libimf on the basic trig functions (sine, cosine, tangent) while libimf beats libm, usually quite badly, in everything else. I'll take a crack at nbench.

-UPDATE- I tried Doom 3 both ways; no performance difference whatsoever in caching timedemo (timedemo demo1 usecache); it came back 27.9fps both times. UT2k4 may or may not be different, not sure. (D3 might use statically linked or built-in math functions.)
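
A preloaded library can only take over calls that the dynamic linker resolves at load time; math code that is statically linked or inlined by the compiler never reaches libimf. One way to check is to look for undefined dynamic symbols in the binary. A rough sketch, assuming GNU binutils' nm is installed; imports_math_symbol is a made-up helper and the path in the usage comment is a placeholder:

```shell
# imports_math_symbol BINARY SYMBOL
# Succeeds if BINARY has an undefined dynamic reference to SYMBOL, meaning the
# call is resolved at load time, where a preloaded library can take effect.
imports_math_symbol() {
    nm -D --undefined-only "$1" 2>/dev/null |
        awk -v s="$2" '
            { name = $NF; sub(/@.*/, "", name)   # drop any @VERSION suffix
              if (name == s) found = 1 }
            END { exit !found }'
}

# Usage (placeholder path, not taken from this thread):
#   imports_math_symbol /path/to/game-binary sin && echo "sin is resolved dynamically"
```

If the function fails for sin, cos, sqrt and friends, the binary most likely carries its own math code and preloading libimf would not be expected to change its timings.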
_________________
Note: This user has been arrested under the DMCA for copyright infringement based on a complaint from The International Cliche Company. He is also facing charges for violating US patents describing the encoding of text in digital form.
caslca
Tux's lil' helper


Joined: 24 Aug 2003
Posts: 85

Posted: Sat Apr 16, 2005 1:56 pm

I show improvement on all calls:
Code:


acos:
      libm.so->acos()   (305 cycles)
    libimf.so->acos()   (186 cycles)

asin:
      libm.so->asin()   (304 cycles)
    libimf.so->asin()   (165 cycles)

atan:
      libm.so->atan()   (244 cycles)
    libimf.so->atan()   (111 cycles)

atan2:
      libm.so->atan2()   (312 cycles)
    libimf.so->atan2()   (50 cycles)

cos:
      libm.so->cos()   (196 cycles)
    libimf.so->cos()   (89 cycles)

exp:
      libm.so->exp()   (309 cycles)
    libimf.so->exp()   (80 cycles)

hypot:
      libm.so->hypot()   (124 cycles)
    libimf.so->hypot()   (52 cycles)

log:
      libm.so->log()   (178 cycles)
    libimf.so->log()   (106 cycles)

log10:
      libm.so->log10()   (185 cycles)
    libimf.so->log10()   (109 cycles)

pow:
      libm.so->pow()   (855 cycles)
    libimf.so->pow()   (175 cycles)

sin:
      libm.so->sin()   (220 cycles)
    libimf.so->sin()   (89 cycles)

sqrt:
      libm.so->sqrt()   (55 cycles)
    libimf.so->sqrt()   (44 cycles)

tan:
      libm.so->tan()   (300 cycles)
    libimf.so->tan()   (131 cycles)


P4 3.2 HT laptop/768MB RAM
sn4ip3r
Guru


Joined: 14 Dec 2002
Posts: 325
Location: Tallinn, Estonia

Posted: Sat Apr 16, 2005 3:54 pm

Improvement on all calls on a Pentium M (Dothan) 1500.
Code:
acos:
           libm.so->acos()      (274 cycles)
         libimf.so->acos()      (150 cycles)

asin:
           libm.so->asin()      (275 cycles)
         libimf.so->asin()      (139 cycles)

atan:
           libm.so->atan()      (162 cycles)
         libimf.so->atan()      (80 cycles)

atan2:
           libm.so->atan2()     (149 cycles)
         libimf.so->atan2()     (40 cycles)

cos:
           libm.so->cos()       (114 cycles)
         libimf.so->cos()       (84 cycles)

exp:
           libm.so->exp()       (199 cycles)
         libimf.so->exp()       (59 cycles)

hypot:
           libm.so->hypot()     (121 cycles)
         libimf.so->hypot()     (99 cycles)

log:
           libm.so->log()       (144 cycles)
         libimf.so->log()       (75 cycles)

log10:
           libm.so->log10()     (149 cycles)
         libimf.so->log10()     (80 cycles)

pow:
           libm.so->pow()       (344 cycles)
         libimf.so->pow()       (112 cycles)

sin:
           libm.so->sin()       (122 cycles)
         libimf.so->sin()       (85 cycles)

sqrt:
           libm.so->sqrt()      (86 cycles)
         libimf.so->sqrt()      (71 cycles)

tan:
           libm.so->tan()       (169 cycles)
         libimf.so->tan()       (127 cycles)
All times are GMT