Gentoo Forums
Latest LinuxDNA kernel benchmarks - icc faster than gcc
Thaidog
Veteran


Joined: 19 May 2004
Posts: 1053

Posted: Sat Feb 06, 2010 10:01 pm    Post subject: Latest LinuxDNA kernel benchmarks - icc faster than gcc

Here are the latest benchmarks we have run for our ICC-compiled LinuxDNA kernel patch. The results are impressive, context switching in particular. Benchmarks were done with LMbench 3.0 (Linus' favorite bench):

Code:

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
atom-gcc  Linux 2.6.33-        x86_64-linux-gnu 1600          64 1.0000    1
atom-icc  Linux 2.6.33-        x86_64-linux-gnu 1600          64 1.0000    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
atom-gcc  Linux 2.6.33- 1600 0.20 0.36 1.97 5.81 7.44 0.49 2.69 461. 1417 4756
atom-icc  Linux 2.6.33- 1600 0.21 0.36 2.07 5.85 7.18 0.49 2.90 332. 1320 4565

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr intgr  intgr  intgr  intgr
                          bit   add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
atom-gcc  Linux 2.6.33- 0.6400 0.4100 0.2500   40.4   40.5
atom-icc  Linux 2.6.33- 0.6300 0.4100 0.2800   40.1   40.2

Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS int64  int64  int64  int64  int64
                         bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
atom-gcc  Linux 2.6.33-  0.630        0.7600   96.2   96.5
atom-icc  Linux 2.6.33-  0.630        0.7800   95.3   96.0

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
atom-gcc  Linux 2.6.33- 3.1000 2.4900   20.8   28.0
atom-icc  Linux 2.6.33- 3.1100 2.4900   20.6   27.8

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  double double double double
                         add    mul    div    bogo
--------- ------------- ------  ------ ------ ------
atom-gcc  Linux 2.6.33- 3.1000 3.1300   39.1   47.0
atom-icc  Linux 2.6.33- 3.1100 3.1200   38.8   46.7

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
atom-gcc  Linux 2.6.33-   15.0   16.6   13.5   17.9   24.2    22.9    27.0
atom-icc  Linux 2.6.33- 4.0300 7.2000 4.3400 9.0700   14.2    12.3    17.8

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
atom-gcc  Linux 2.6.33-  15.0  34.6 34.2  71.1        90.9       158.
atom-icc  Linux 2.6.33- 4.030  11.5 24.1  49.5        73.2       216.

*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS   UDP  RPC/  TCP   RPC/ TCP
                               UDP         TCP  conn
--------- ------------- ----- ----- ----- ----- ----
atom-gcc  Linux 2.6.33-
atom-icc  Linux 2.6.33-

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
atom-gcc  Linux 2.6.33-   40.2   30.3   96.7   44.0   46.8K 0.896         3.713
atom-icc  Linux 2.6.33-   46.4   34.6  101.1   45.3   45.2K 1.017         3.575

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
atom-gcc  Linux 2.6.33- 522. 1106 384. 1189.1 2970.8  963.9  968.3 2401 1130.
atom-icc  Linux 2.6.33- 767. 620. 350. 1196.4 2993.0  986.9  978.8 2430 1145.

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
atom-gcc  Linux 2.6.33-  1600 1.9090 9.6120        39.7       286.8
atom-icc  Linux 2.6.33-  1600 1.9010 9.5580        39.2       284.3
make[1]: Leaving directory `/root/lmbench-3.0-a9/lmbench-3.0-a9/results'


icc kernel is compiled with -O3 -xSSE3_ATOM -ip -fp-model fast=2 -unroll-aggressive -vec-guard-write
gcc kernel is compiled with -O3 -march=atom -mtune=atom

Tests were done on an Atom 330 dual core 64bit cpu.
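
To put a number on the context switching rows, here is a minimal Python sketch that computes how much lower the icc latencies are relative to gcc. The values are copied from the table above; the function and variable names are only for illustration.

```python
# Context switching latencies in microseconds, copied from the LMbench table above.
LABELS = ["2p/0K", "2p/16K", "2p/64K", "8p/16K", "8p/64K", "16p/16K", "16p/64K"]
GCC_US = [15.0, 16.6, 13.5, 17.9, 24.2, 22.9, 27.0]
ICC_US = [4.03, 7.20, 4.34, 9.07, 14.2, 12.3, 17.8]


def latency_reduction_pct(baseline, candidate):
    """Percentage by which candidate latency is lower than baseline latency."""
    return (baseline - candidate) / baseline * 100.0


# One reduction figure per process-count / working-set-size combination.
reductions = [latency_reduction_pct(g, i) for g, i in zip(GCC_US, ICC_US)]

for label, pct in zip(LABELS, reductions):
    print(f"{label:>8}: {pct:5.1f}% lower latency with icc")

# Plain arithmetic mean across the seven columns.
mean_reduction = sum(reductions) / len(reductions)
print(f"    mean: {mean_reduction:5.1f}%")
```

The reduction ranges from roughly a third (16p/64K) to almost three quarters (2p/0K), and the plain mean across the seven columns comes out a little over 50%.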
_________________
Registered Linux User: 437619
"I'm a big believer in technology over politics" - Linus Torvalds
r3tep
Tux's lil' helper


Joined: 10 Sep 2005
Posts: 108

Posted: Sun Feb 07, 2010 10:35 am

Did you do any benchmarking with compatible CPUs not manufactured by Intel?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 43599
Location: 56N 3W

Posted: Sun Feb 07, 2010 12:18 pm

Moved from Gentoo Chat to Unsupported Software.

There is nothing Gentoo-related there.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Thaidog
Veteran


Joined: 19 May 2004
Posts: 1053

Posted: Sun Feb 07, 2010 5:59 pm

r3tep wrote:
Did you do any benchmarking with compatible CPUs not manufactured by Intel?


Not yet... we don't have the hardware right now. If anyone has an AMD chip they can try it though :)
Sadako
Advocate


Joined: 05 Aug 2004
Posts: 3789
Location: sleeping in the bathtub

Posted: Sun Feb 07, 2010 6:16 pm

Just curious, why -O3?
The L1 cache on the Atom is fairly small, a little less than the Pentium II AFAIK, so it'd be interesting to see what the comparison is like with -Os, too.

Also, looking at the results, while Intel is a clear winner wrt context switching as you pointed out, in most other areas gcc is ahead as often as behind.

-march=atom, so you used a build of gcc 4.5?

Thaidog wrote:
r3tep wrote:
Did you do any benchmarking with compatible CPUs not manufactured by Intel?


Not yet... we don't have the hardware right now. If anyone has an AMD chip they can try it though :)

I could try it (just got a shiny new Phenom II 965, \o/), but I read before that binaries compiled with ICC are more or less intentionally gimped when running on non-Intel processors. Any idea if this is the case or not?

edit: does ICC require multilib on x86_64?
If not, then there's no reason I couldn't try it.

edit #2: forget what I said about the Atom L1 cache. The Pentium II only had 32 KB in total; I thought it had 32 + 32 for instruction + data, my bad...
The Core 2s up to the i7s only have 32 + 32, so the 32 + 24 on the Atom is much better than I had thought.
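
On the "ahead as often as behind" point, a rough way to check it is to tally wins per column. This is only a sketch in Python using the "Processor, Processes" rows from the first post (smaller is better); the column labels are my reading of the two-line LMbench header and the `tally` helper is made up for illustration.

```python
# "Processor, Processes" rows from the LMbench output (microseconds, smaller is better).
COLUMNS = ["null call", "null I/O", "stat", "open/clos", "slct TCP",
           "sig inst", "sig hndl", "fork proc", "exec proc", "sh proc"]
GCC = [0.20, 0.36, 1.97, 5.81, 7.44, 0.49, 2.69, 461.0, 1417.0, 4756.0]
ICC = [0.21, 0.36, 2.07, 5.85, 7.18, 0.49, 2.90, 332.0, 1320.0, 4565.0]


def tally(gcc, icc):
    """Count columns where each compiler has the lower (better) time."""
    counts = {"gcc": 0, "icc": 0, "tie": 0}
    for g, i in zip(gcc, icc):
        if g < i:
            counts["gcc"] += 1
        elif i < g:
            counts["icc"] += 1
        else:
            counts["tie"] += 1
    return counts


result = tally(GCC, ICC)
print(result)

# List which compiler is ahead per column, to see where the wins cluster.
for name, g, i in zip(COLUMNS, GCC, ICC):
    winner = "gcc" if g < i else "icc" if i < g else "tie"
    print(f"{name:>10}: {winner}")
```

For this table the tally is even, with the icc wins clustered in the fork/exec/sh columns. A plain win count ignores how large each difference is, so it is only a first look.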
_________________
"You have to invite me in"
Thaidog
Veteran


Joined: 19 May 2004
Posts: 1053

Posted: Sun Feb 07, 2010 11:09 pm

-O3 is actually the default for compiling the kernel with both GCC and ICC, unless you choose "optimize for size", in which case it's -Os. The other benchmarks are close, but on average ICC wins every other category but one, and since context switching is close to 50% faster, that makes for a seriously noticeable performance increase, especially for things like multitasking.
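
If anyone wants to check a per-category average themselves, one option is a geometric mean of the icc/gcc ratios. Below is a minimal Python sketch using the local bandwidth rows from the first post (MB/s, bigger is better). The helper name is made up for illustration, and a geometric mean is just one reasonable choice of average, not necessarily how the categories were compared above.

```python
import math

# "*Local* Communication bandwidths" rows from the LMbench output (MB/s, bigger is better).
COLUMNS = ["Pipe", "AF UNIX", "TCP", "File reread", "Mmap reread",
           "Bcopy (libc)", "Bcopy (hand)", "Mem read", "Mem write"]
GCC_MBS = [522.0, 1106.0, 384.0, 1189.1, 2970.8, 963.9, 968.3, 2401.0, 1130.0]
ICC_MBS = [767.0, 620.0, 350.0, 1196.4, 2993.0, 986.9, 978.8, 2430.0, 1145.0]


def geometric_mean_ratio(candidate, baseline):
    """Geometric mean of candidate/baseline, computed via the mean of the logs."""
    logs = [math.log(c / b) for c, b in zip(candidate, baseline)]
    return math.exp(sum(logs) / len(logs))


# Per-column ratio: above 1.0 means icc has the higher bandwidth.
ratios = [i / g for i, g in zip(ICC_MBS, GCC_MBS)]
for name, r in zip(COLUMNS, ratios):
    print(f"{name:>13}: icc/gcc = {r:.3f}")

gm = geometric_mean_ratio(ICC_MBS, GCC_MBS)
print(f"geometric mean: {gm:.3f}")
```

The same helper works for the latency tables if the arguments are swapped, since smaller is better there.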

There are a few files in the kernel that are still 32-bit and need to be compiled with icc:

the files under: arch/x86/boot/*
and: arch/x86/kernel/acpi/realmode/*

There are ways around non-native code execution on non-Intel CPUs. I think this thread has the info in it:

http://groups.google.com/group/linuxdna/browse_thread/thread/c43035b9512c6ace/2b72ff9a47cff6fc?lnk=gst&q=64+bit+linuxdna#2b72ff9a47cff6fc

More than likely you need Zack's modified iccvars_intel64.sh file that can be downloaded off the google group. Look for it under "Files".
All times are GMT