Gentoo Forums
[SOLVED] 4.9.16: System does not use all cores to compile
seVes
n00b

Joined: 06 Jan 2011
Posts: 54
Location: Germany

Posted: Wed Apr 05, 2017 2:20 am    Post subject: [SOLVED] 4.9.16: System does not use all cores to compile

Hey guys, I'm not quite sure whether this is related to this thread too, but maybe it is.

First, let me say it's amazing to see how you investigate this. 8O 8)

I'm running 4.9.16 on a remote machine with an i7-4770 (Haswell) and on my laptop with an i7-3520M (Ivy Bridge).

laptop uname -a:

Linux x230 4.9.16-gentoo #2 SMP PREEMPT Sat Apr 1 14:19:51 CEST 2017 x86_64 Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz GenuineIntel GNU/Linux


laptop lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Stepping:              9
CPU MHz:               1279.370
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5786.71
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts


laptop dmesg:

[    0.000000] Linux version 4.9.16-gentoo (root@x230) (gcc version 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4) ) #2 SMP PREEMPT Sat Apr 1 14:19:51 CEST 2017
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: Allowing 8 CPUs, 4 hotplug CPUs
[    0.031691] Freeing SMP alternatives memory: 28K (ffffffff81d7a000 - ffffffff81d81000)
[    0.033922] smpboot: Max logical packages: 4
[    0.044368] smpboot: CPU0: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz (family: 0x6, model: 0x3a, stepping: 0x9)
[    0.061462] x86: Booting SMP configuration:
[    0.061469] .... node  #0, CPUs:      #1 #2 #3
[    0.261698] x86: Booted up 1 node, 4 CPUs
[    0.261703] smpboot: Total of 4 processors activated (23159.82 BogoMIPS)


laptop make.conf:

CFLAGS="-march=ivybridge -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CPU_FLAGS_X86="aes avx mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"
CXXFLAGS="${CFLAGS}"
EMERGE_DEFAULT_OPTS="--jobs 4 --load-average 4.0"
MAKEOPTS="--jobs 4 --load-average 4.0"


remote i7 uname -a:

Linux ex40 4.9.16-gentoo #4 SMP PREEMPT Wed Apr 5 03:38:22 CEST 2017 x86_64 Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz GenuineIntel GNU/Linux


remote i7 lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Stepping:              3
CPU MHz:               3903.234
CPU max MHz:           3900.0000
CPU min MHz:           800.0000
BogoMIPS:              6799.87
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts


remote i7 dmesg:

[    0.000000] Linux version 4.9.16-gentoo (root@ex40) (gcc version 4.9.4 (Gentoo 4.9.4 p1.0, pie-0.6.4) ) #4 SMP PREEMPT Wed Apr 5 03:38:22 CEST 2017
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[    0.006827] Freeing SMP alternatives memory: 24K (ffffffff81d5f000 - ffffffff81d65000)
[    0.015727] smpboot: Max logical packages: 2
[    0.026250] smpboot: CPU0: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (family: 0x6, model: 0x3c, stepping: 0x3)
[    0.043277] x86: Booting SMP configuration:
[    0.043345] .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
[    0.522637] x86: Booted up 1 node, 8 CPUs
[    0.522737] smpboot: Total of 8 processors activated (54424.93 BogoMIPS)


remote i7 make.conf:

CFLAGS="-march=haswell -mabm -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"
CXXFLAGS="${CFLAGS}"
EMERGE_DEFAULT_OPTS="--jobs 8 --load-average 7.2"
MAKEOPTS="--jobs 8 --load-average 7.2"


Screenshot comparing an emerge of ncftp on the laptop and on the remote machine:
https://picload.org/image/rcrodciw/emerge_compare.png

As you can see, my laptop compiles even this little package on all cores. The remote machine always uses only core 7 (the 8th core), and it behaves the same for every package.

Kernel configs from Laptop and Remote machine:
https://files.fm/u/un9js5z3

--------------

I would really appreciate it if you could take a closer look at this.

I didn't find any issue myself. I started investigating because with kernels >4.4 nftables started to slow down (very, very slow).
I don't know if that is related as well, anyway...

Thanks for helping!!

[Moderator edit: changed [quote] tags to [code] tags to preserve output layout. Cleaned up labeling accordingly.
Split from Only one core using kernel 4.9
-Hu]

_________________
Alex / seVes


Last edited by seVes on Thu Apr 13, 2017 4:53 pm; edited 2 times in total

donmartio
Apprentice

Joined: 11 Dec 2004
Posts: 233

Posted: Thu Apr 06, 2017 5:55 am

Hi,

Your problem is different, since your CPUs are recognized in both cases.
This looks like a compiler switch issue.

Could you post the output of gcc-config -l?

You may try MAKEOPTS="-j9" according to this: https://wiki.gentoo.org/wiki//etc/portage/make.conf#MAKEOPTS
_________________
Always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live.

seVes
n00b

Joined: 06 Jan 2011
Posts: 54
Location: Germany

Posted: Thu Apr 06, 2017 8:02 am

Oh, okay... :oops:

Setting MAKEOPTS to a different number of jobs has no effect; still only core 7 is used.

On both machines I'm using stable gcc 4.9.4.

Quote:
gcc-config -l
[1] x86_64-pc-linux-gnu-4.9.4 *

_________________
Alex / seVes

donmartio
Apprentice

Joined: 11 Dec 2004
Posts: 233

Posted: Thu Apr 06, 2017 6:10 pm

Hmm, this is weird; I can't imagine what causes this behaviour.

It's a shot in the dark, but you may try

CFLAGS="-O2 -pipe -march=native"
_________________
Always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 13836

Posted: Fri Apr 07, 2017 1:15 am

You set --load-average 7.2 on the affected machine. On a compile-heavy workload, this could cause make to decide not to spawn more jobs. Raise --load-average to match --jobs, or remove it entirely.
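
The rule described above can be sketched in a few lines of shell. This is only an illustration of the documented behaviour of make's -l/--load-average option (no new job is started while other jobs are running and the load average is at least the limit); may_start_job is a made-up helper name, not something make or portage provides.

```shell
#!/bin/sh
# Sketch of the documented -l/--load-average rule of GNU make:
# hold back a new job if other jobs are already running AND the
# load average is at least the limit.
# may_start_job is a hypothetical helper, not part of make or portage.
may_start_job() {
    load=$1      # current load average, e.g. first field of /proc/loadavg
    limit=$2     # value given to --load-average
    running=$3   # jobs this make already has running
    # With nothing running, one job is always allowed.
    [ "$running" -eq 0 ] && return 0
    # Floating-point comparison via awk: exit status 0 when load < limit.
    awk -v l="$load" -v m="$limit" 'BEGIN { exit !(l < m) }'
}

# Numbers from this thread: limit 7.2, load assumed to be 7.5 already.
if may_start_job 7.5 7.2 3; then
    echo "start another job"
else
    echo "hold back"
fi
```

A limit like this can only reduce the number of parallel jobs; it cannot decide which core they run on.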

seVes
n00b

Joined: 06 Jan 2011
Posts: 54
Location: Germany

Posted: Fri Apr 07, 2017 12:48 pm

@Hu:
Thanks for splitting; I wanted to do that today, but you were faster. ;-)

--load-average 7.2 was added because I took the formula from the wiki: cores * 0.9 (8 * 0.9).

It makes no difference whether I set 8.0 or nothing at all here.
It also doesn't matter which package I choose, a small one like ca-certificates or a larger one like glibc.

Always just the last core is used for compiling.

--------------------------------------------------------------------------------

@donmartio:
It doesn't matter whether I choose native, haswell or core2 here.

Always just the last core is used for compiling.

Same problem on 4.9.6-r1.

--------------------------------------------------------------------------------

Maybe something is misconfigured in my .config, but i don't see it?
_________________
Alex / seVes

donmartio
Apprentice

Joined: 11 Dec 2004
Posts: 233

Posted: Fri Apr 07, 2017 7:12 pm

Ok, would have been strange but at least worth a try.
Just out of curiosity, did you try an older kernel on the remote machine?

I'll dig into your config a little bit, but at the moment I don't really have an idea what I'm looking for.
_________________
Always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

Posted: Fri Apr 07, 2017 8:32 pm

seVes,

If you can reproduce this behavior, please post your dmesg from a restart.
I would like to see everything from dmesg up to the login prompt.

I'm tempted to build a kernel from your config to see if I can reproduce the effect but I'll look at dmesg first.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 13836

Posted: Sat Apr 08, 2017 12:57 am

Interesting. From the posts made prior to my most recent one above, I thought the problem was that the build would use only 7 cores out of 8 installed/detected cores on the system, which is why I looked at the --load-average, since that could prevent make from launching job #8. From the more recent post, I now understand that it is using exactly one core and that the core it uses is core #7. That is strange. What is the output of taskset -p $$, as run from the same prompt that launched the emerge that uses only 1 core?

seVes
n00b

Joined: 06 Jan 2011
Posts: 54
Location: Germany

Posted: Tue Apr 11, 2017 4:54 pm

Sorry for the delay, I was on a business trip.

@donmartio:
Sure, I was using 4.4.6 before without any problems.

I'm compiling it now to go back.

@NeddySeagoon:
Do you need the dmesg from my laptop as well?

https://drive.google.com/file/d/0B-3_L8gKAknkVlE4TjQ5YUl1Y1E/view?usp=sharing

@Hu:
Here is the output from both the remote machine and the laptop. It differs, but I don't know what it means.
laptop taskset -p $$:
x230 ~ # taskset -p $$
pid 6468's current affinity mask: f
remote taskset -p $$:
ex40 ~ # taskset -p $$
pid 5060's current affinity mask: 80

_________________
Alex / seVes


Last edited by seVes on Tue Apr 11, 2017 5:38 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

Posted: Tue Apr 11, 2017 5:34 pm

seVes,

Your dmesg looks much the same as mine. The notable exceptions are due to my system running -hardened.
My next step is to try to replicate your observed behaviour in a KVM running on my system.

It may take several kernel builds to get one that boots :)

Then attempt to replicate the problem.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

seVes
n00b

Joined: 06 Jan 2011
Posts: 54
Location: Germany

Posted: Tue Apr 11, 2017 5:37 pm

I have now shared all files via Google Drive; you can access them here:

https://drive.google.com/drive/folders/0B-3_L8gKAknkMEZHYmE2aFBDOTQ?usp=sharing

I compiled 4.4.52 with the config from 4.9.16 and the same behaviour occurs.

I shared config-4.4.52-remote and dmesg-4.4.52-remote, see above.

It seems this isn't kernel related; maybe it's more gcc or so?
_________________
Alex / seVes

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

Posted: Tue Apr 11, 2017 8:41 pm

seVes,

End of day update.
I've brought all the bits together, built your kernel with your config. Added in htop and lz4, so it is exactly your kernel and tried to boot.

The console goes blank for 15 sec, then I get the grub menu back.
Fixing the console driver so I can see what happens is the next step.

htop confirmed that your kernel build used all available cores. It's only two just now but I can increase that.
It will do for testing.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jburns
Veteran

Joined: 18 Jan 2007
Posts: 1048
Location: Massachusetts USA

Posted: Tue Apr 11, 2017 9:04 pm

    ex40 ~ # taskset -p $$
    pid 5060's current affinity mask: 80
shows that something has told the task to only use the 8th core. The affinity mask would be ff if all 8 cores were to be used. Try adding
Code:
taskset 0xff
in front of the emerge command.

Edit: added 0x to mask
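
To make the mask reading concrete, here is a small sketch that expands a hexadecimal affinity mask into CPU numbers (bit n set means CPU n is allowed). mask_to_cpus is a hypothetical helper written for this illustration; it is not part of taskset or util-linux.

```shell
#!/bin/sh
# mask_to_cpus is a hypothetical helper: it expands a hexadecimal
# affinity mask, as printed by "taskset -p", into a list of CPU numbers.
mask_to_cpus() {
    mask=$(( 0x${1#0x} ))   # accept "80" as well as "0x80"
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Lowest bit set means the current CPU number is allowed.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

mask_to_cpus f      # laptop:  prints "0 1 2 3"
mask_to_cpus 80     # remote:  prints "7"
mask_to_cpus 0xff   # all 8:   prints "0 1 2 3 4 5 6 7"
```

So the laptop shell may run on CPUs 0 to 3, while the remote shell, and everything started from it, is limited to CPU 7.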

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

Posted: Wed Apr 12, 2017 4:42 pm

seVes,

I've moved your kernel to a KVM, with only the changes required to get it to boot there.

Booting it, going to the kernel tree and running
Code:
make clean
make -j4
shows it using both available cores.

Code:
emerge sys-devel/gcc
builds gcc on both available cores.

I can add a few more cores to the KVM if you think it matters but I'm fairly sure from the demonstration that the problem is not the kernel.
You can review the config I ended up with and diff it with your own.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

seVes
n00b

Joined: 06 Jan 2011
Posts: 54
Location: Germany

Posted: Thu Apr 13, 2017 4:45 pm

Holy crap...

I started investigating this again, again and again.

After we didn't find any issue with the kernel itself, and since it occurs in 4.4.52 too, I looked through the whole server configuration.

Following @jburns' post, I looked at all the init.d scripts and saw that, for some reason, sshd was launched with taskset -c 7. :roll:
This must have come from a misconfigured sed command I ran weeks ago, because I'm launching some stuff on specific cores (frame calculation).
Since CPU affinity is inherited by child processes, every shell I opened over ssh, and every emerge started from it, was pinned to core 7 as well.

OK, no worries: I added an sshd restart to crontab and logged off from all running shell sessions.
After coming back, surprise, surprise: make is not running on core 7 anymore. It's core 0 now.

Investigating again, I saw that dcron is pinned to core 0 via taskset -c 0.
So restarting sshd from dcron just passed on dcron's affinity and wasn't successful.

I tried restarting dcron without pinned cores, but for some reason it only worked after a restart of the remote machine.
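
The inheritance chain described above can be made visible with a short sketch. It assumes a Linux /proc and reads the PPid and Cpus_allowed_list fields from /proc/<pid>/status; show_affinity_chain is a hypothetical helper name chosen for this example.

```shell
#!/bin/sh
# show_affinity_chain is a hypothetical helper. It walks from a PID up
# through its parents and prints each process's allowed-CPU list as
# reported in /proc/<pid>/status (Linux only).
show_affinity_chain() {
    pid=${1:-$$}
    while [ "${pid:-0}" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ { print $2 }' "/proc/$pid/status")
        cpus=$(awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$pid/status")
        printf '%6s  %-16s cpus=%s\n' "$pid" "$name" "$cpus"
        # Move on to the parent; PID 1 reports PPid 0, which ends the loop.
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
    done
}

# Start at the current shell and go up towards init.
show_affinity_chain
```

On a machine in the state described here, the first ancestor showing a narrow list (such as sshd with only CPU 7) is the place where the pinning was introduced. A plain grep -rn taskset /etc/init.d /etc/conf.d is another way to look for the culprit.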

Now, after all that, it is fixed and the thread can be marked as solved.
Finally.

If I could, I would buy you a coffee for your effort. :oops: :oops:

Anyway, besides the core "problem" I have a memory leak with nftables, so maybe we'll see each other in the other thread. :lol:

Thanks for investigating and giving hints!
_________________
Alex / seVes

Hu
Moderator

Joined: 06 Mar 2007
Posts: 13836

Posted: Fri Apr 14, 2017 1:51 am

You could instead have used taskset in its other mode to change the affinity of the running processes, rather than trying to restart them with clean ancestry.
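
A minimal sketch of that other mode, using taskset -cp (set the CPU list of an existing PID). The unpin function and the DRY_RUN switch are hypothetical names for this illustration, and the list 0-7 assumes the 8-CPU remote machine from this thread.

```shell
#!/bin/sh
# unpin is a hypothetical helper: it widens the CPU affinity of already
# running processes with "taskset -cp" instead of restarting them.
# With DRY_RUN=1 it only prints the commands it would run.
unpin() {
    cpus=$1
    shift
    for pid in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "taskset -cp $cpus $pid"
        else
            taskset -cp "$cpus" "$pid"
        fi
    done
}

# Preview the command for the current shell; everything started from the
# shell after a real run would inherit the wider mask.
DRY_RUN=1
unpin 0-7 $$
```

With DRY_RUN unset, something like unpin 0-7 $(pgrep -x sshd) would apply the change to the running sshd processes. Children that already exist keep their old mask, so each affected PID has to be listed.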
All times are GMT