Gentoo Forums
[SOLVED] gentoo-sources 4.11.0 su does not work anymore
miket
Guru

Joined: 28 Apr 2007
Posts: 411
Location: Gainesville, FL, USA

Posted: Fri May 05, 2017 4:30 pm

I have no system affected one way or the other, but one possible villain comes to mind: seats. Recall that awful chain from display manager to consolekit to polkit, coupled with the ugly concept of multiseat computers. The seat you're in is supposed to matter; it may be causing havoc now. For example, some change in the kernel could mean that whatever the display manager uses to identify the console hardware to later steps in the chain now looks somewhat different. There may be some rogue rule in that nasty JavaScript that polkit uses that is now interpreted a bit differently since the kernel change. There could be a change in the way that DBus communicates the value.

In any event, the effect is that you were ejected from your seat, so you'd no longer be authorized to use su.

As a check, you could see if your DE lets you do other things that you would normally do, such as mount USB sticks or shut down the machine.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7128
Location: almost Mile High in the USA

Posted: Fri May 05, 2017 6:08 pm

Based on the evidence so far, it does look like a seat issue with polkit (but not consolekit, as systemd does not use consolekit; that functionality is built in). Also, is there a full DE for Enlightenment?

Make sure you have run etc-update so that all the polkit config files are up to date too.

---

Another test: I logged into a console virtual terminal and ran startx -- :1 with an .xinitrc that starts just "vte"... su still works.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Sat May 06, 2017 2:03 pm

Polkit is up to date.
Code:
emerge -pvO polkit

These are the packages that would be merged, in order:

[ebuild   R    ] sys-auth/polkit-0.113-r2::gentoo  USE="gtk introspection nls pam systemd -elogind -examples -jit -kde (-selinux) {-test}" 0 KiB

and I have never touched its config files.

I tried without the pam flag: same error. I wanted to know whether it is a permission error or something similar, so I did a chmod 644 /etc/shadow.
su then threw "setgid: Operation not permitted", and journalctl showed:
Code:
mai 06 16:52:27 gentoo su[1137]: Successful su for root by costel
mai 06 16:52:27 gentoo su[1137]: + /dev/pts/0 costel:root
mai 06 16:52:27 gentoo su[1137]: bad group ID `0' for user `root': Operation not permitted

But root has uid and gid 0!?

I installed gcc-7.1.0 and did a full system and world rebuild. Now even xfce is affected...
I wouldn't mind fully reinstalling Gentoo from scratch, but I am afraid that the result would be the same.
_________________
Sorry for my English. I'm still learning this language.

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Sat May 06, 2017 2:13 pm

costel78 wrote:
I wouldn't mind fully reinstalling Gentoo from scratch, but I am afraid that the result would be the same.
... and even if that fixed it, it would be very hard afterwards to pinpoint what caused it in the first place. Anyway, my intuition says the same: the problem would not vanish with a complete reinstall.

Just out of curiosity: log in as root (via a virtual console, maybe) and run
as root:
id root
... and paste the results.

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Sat May 06, 2017 2:21 pm

Code:
uid=0(root) gid=0(root) grupuri=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),26(tape),27(video)

"grupuri" translates to "groups"; the two words are similar anyway.
_________________
Sorry for my English. I'm still learning this language.

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Sat May 06, 2017 2:33 pm

Okay... I'm not an expert on the internals or the configuration of pam and su, but this error message:
su[1137]: bad group ID `0' for user `root': Operation not permitted
... is something to watch...
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Mon May 15, 2017 8:48 am

The problem persists with 4.11.1, and the e developers keep blaming the kernel, but of all the DEs only e17 is affected...
_________________
Sorry for my English. I'm still learning this language.

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Mon May 15, 2017 4:49 pm

I'm currently on 4.11. Works fine. I'm running lightdm+i3.

costel78 wrote:
the e developers keep blaming the kernel
That sounds like a ridiculous attitude. Do you know if they have even tried searching for the root cause of the problem?
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...

konspiracy
n00b

Joined: 22 Sep 2015
Posts: 6

Posted: Mon May 15, 2017 11:36 pm

Working for me with git sources.....
Systemd
Code:

~
➔ su
Password:
muh shane # screenfetch
         -/oyddmdhs+:.                root@muh.rig
     -odNMMMMMMMMNNmhy+-`             OS: Gentoo
   -yNMMMMMMMMMMMNNNmmdhy+-           Kernel: x86_64 Linux 4.11.0-rc8
 `omMMMMMMMMMMMMNmdmmmmddhhy/`        Uptime: 23h 16m
 omMMMMMMMMMMMNhhyyyohmdddhhhdo`      Packages: 1032
.ydMMMMMMMMMMdhs++so/smdddhhhhdm+`    Shell: bash 4.4.12
 oyhdmNMMMMMMMNdyooydmddddhhhhyhNd.   Resolution: 1920x1080
  :oyhhdNNMMMMMMMNNNmmdddhhhhhyymMh   DE: GNOME
    .:+sydNMMMMMNNNmmmdddhhhhhhmMmy   WM: GNOME Shell
       /mMMMMMMNNNmmmdddhhhhhmMNhs:   WM Theme:
    `oNMMMMMMMNNNmmmddddhhdmMNhs+`    GTK Theme: Adwaita [GTK2/3]
  `sNMMMMMMMMNNNmmmdddddmNMmhs/.      Icon Theme: Adwaita
 /NMMMMMMMMNNNNmmmdddmNMNdso:`        Font: Cantarell 11
+MMMMMMMNNNNNmmmmdmNMNdso/-           CPU: Intel Core i5-6600K @ 4x 4.6GHz [31.0°C]
yMMNNNNNNNmmmmmNNMmhs+/-`             GPU: GeForce GTX 960
/hMMNNNNNNNNMNdhs++/-`                RAM: 1280MiB / 16015MiB
`/ohdmmddhys+++/:.`                 
  `-//////:--.                       
muh shane #

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Wed May 17, 2017 3:02 pm

Thanks, but only e17 (ver. 19) is affected. Even old e16 works :)
I did a full reinstall. Now xfce4 and xorg work; only enlightenment 19 fails.

I would report the error against the kernel, but what exactly is to blame, which subsystem? And more probably it's an e19 problem, otherwise other DEs would be affected.

Later: Reported on the kernel Bugzilla. It has become very tiresome to switch back and forth to the console.
https://bugzilla.kernel.org/show_bug.cgi?id=195799
_________________
Sorry for my English. I'm still learning this language.

tholin
Apprentice

Joined: 04 Oct 2008
Posts: 168

Posted: Wed May 17, 2017 5:49 pm

costel78 wrote:
I would report the error against the kernel, but what exactly is to blame, which subsystem? And more probably it's an e19 problem, otherwise other DEs would be affected.

If you have the patience you could try doing a kernel git bisect.
https://wiki.gentoo.org/wiki/Kernel_git-bisect

It's tedious and usually takes a few hours.
If you end up with an unbootable kernel or a kernel that doesn't build you can use "git bisect skip" to skip the bad commit.

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Wed May 17, 2017 7:28 pm

tholin wrote:
It's tedious and usually takes a few hours.
If you end up with an unbootable kernel or a kernel that doesn't build you can use "git bisect skip" to skip the bad commit.
That process just screams for something automated...
At least something that can compile several git versions in a row. That would also need a bigger than normal /boot. Maybe the best solution is to temporarily put /boot on a USB stick.

I'll tip my hat to the OP if he pulls this off by hand. Either way, it's a great service to the Linux community if the OP does it and can pinpoint the exact git commit that caused the problem.

What makes things a bit worse is that there have been heaps of commits between those kernel versions, if the bug was introduced in 4.11.0... There is always the efficient way, though: test a kernel built from the commit halfway between 4.10.x and 4.11.0. Then, depending on whether the bug still exists in the kernel being tested, move half of the remaining way again... and so on. This way even a million commits isn't that much.


This is drifting a bit off topic, but I started to think about the process a bit more...

This (very crude) bash script demonstrates the process:
bash script to count the steps to find correct number:
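#!/bin/bash
# Count how many halving steps it takes to land on a "guilty" commit number.
# Usage: findguilty.sh <newest commit number> <guilty commit number>

commit="$1"    # start from the newest (known bad) commit
guilty="$2"    # the commit we pretend introduced the bug
n=0
jump=$(( commit / 2 ))

while [ "$commit" -ne "$guilty" ]
do
    n=$(( n + 1 ))

    # Step toward the guilty commit by the current jump size.
    if [ "$guilty" -gt "$commit" ]
    then
        commit=$(( commit + jump ))
    else
        commit=$(( commit - jump ))
    fi

    # Halve the jump; the +1 keeps it from ever reaching zero.
    jump=$(( jump / 2 + 1 ))
    # Once the jump is down to 2, fall back to single steps.
    if [ "$jump" -eq 2 ]
    then
        jump=1
    fi

    echo "Trying ${commit}..."
done

echo "Guilty number is ${commit}. Took ${n} rounds to find."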

Running it with one of the most "distant" numbers:
findguilty.sh 1000000 500001:
Trying 500000...
Trying 750001...
Trying 625000...
Trying 562499...
Trying 531248...
Trying 515622...
Trying 507808...
Trying 503900...
Trying 501945...
Trying 500967...
Trying 500477...
Trying 500231...
Trying 500107...
Trying 500044...
Trying 500012...
Trying 499995...
Trying 500004...
Trying 499999...
Trying 500002...
Trying 500001...
Guilty number is 500001. Took 20 rounds to find.
It takes 20 test kernels to find the one that introduced the bug among the million (imaginary) commits between 4.10.x and 4.11.0.

So... as long as the commits are chosen carefully, it's not that bad a task. Unless the compilation process takes a lot of time.
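
The 20 rounds in the trace are no accident: if every test rules out half of the remaining candidates, the worst case grows roughly with log2 of the number of commits. The following is only a small sketch of that arithmetic, not part of the original script; the function name bisect_steps and the sample commit counts are made up for illustration.

```shell
#!/bin/sh
# Worst-case number of test builds when every test halves the candidate range.
# bisect_steps is a hypothetical helper name, not a git command.
bisect_steps() {
    remaining=$1
    steps=0
    while [ "$remaining" -gt 1 ]; do
        # Assume the worst case: the culprit is in the larger half.
        remaining=$(( (remaining + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Sample commit counts, chosen arbitrarily.
for n in 1000 15000 1000000; do
    echo "$n commits: at most $(bisect_steps "$n") tests"
done
```

For a million commits this gives 20, the same as the trace above, so the number of kernels to build stays small even when the range of commits is huge.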
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...


Last edited by Zucca on Wed May 17, 2017 8:22 pm; edited 1 time in total

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Wed May 17, 2017 8:10 pm

I will also try git bisect, but not right now. Let's wait and see what the kernel developers have to say.
I really understand the enlightenment developers' position, as this bug is very strange, but why only e19 and not even their own e16?
It really could be a change in the kernel, in which case git bisect wouldn't be a complete waste of time, but it could also be a good fix in the kernel that triggers a hidden error in e.
_________________
Sorry for my English. I'm still learning this language.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 13831

Posted: Thu May 18, 2017 2:14 am

Zucca wrote:
tholin wrote:
It's tedious and usually takes a few hours.
That process just screams for something automated...
At least something that can compile several git versions in a row. That would also need a bigger than normal /boot. Maybe the best solution is to temporarily put /boot on a USB stick.
Git bisect can be automated, if you can find a way for it to programmatically tell whether the chosen test revision has the problem or not. For environments where the bug report is a failure in a previously automated test suite, this is easy. For this case, where the test is for the user to start e19, open a terminal, and try to su, automation is a few steps harder.
Zucca wrote:
What makes things a bit worse is that there have been heaps of commits between those kernel versions, if the bug was introduced in 4.11.0... There is always the efficient way, though: test a kernel built from the commit halfway between 4.10.x and 4.11.0. Then, depending on whether the bug still exists in the kernel being tested, move half of the remaining way again... and so on. This way even a million commits isn't that much.
This is exactly why git bisect exists and is so well loved. It will pick good candidate commits on your behalf, run checkout, then wait for you to tell it whether the commit exhibits the bug. This should generally get the number of steps close to the theoretical minimum.
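
To make the automated variant concrete, here is a toy sketch using git bisect run on a throwaway repository. Everything in it is invented for the demonstration (the temporary repo, the file history.txt, the 16 commits, the BUG marker); for the kernel case the test command would have to build, boot and check su, which is the hard part described above. The test command's exit status tells git the verdict: 0 means good, 1 to 124 means bad, and 125 means "cannot test this commit", which is the automated equivalent of git bisect skip.

```shell
#!/bin/sh
# Toy demonstration of "git bisect run". The repo, file name, commit count
# and BUG marker are all made up; only the git commands themselves are real.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit with a throwaway identity so no global git config is required.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

# Create 16 commits; commit 11 introduces the "bug".
i=1
while [ "$i" -le 16 ]; do
    echo "change $i" >> history.txt
    if [ "$i" -eq 11 ]; then echo "BUG" >> history.txt; fi
    git add history.txt
    commit "commit $i"
    i=$((i + 1))
done

first=$(git rev-list --max-parents=0 HEAD)   # root commit, known good

# Arguments are <bad> then <good>; git checks out a midpoint commit.
git bisect start HEAD "$first" >/dev/null
# The test command exits 0 (good) when BUG is absent and 1 (bad) when present.
git bisect run sh -c '! grep -q BUG history.txt' >/dev/null

# When the bisection finishes, refs/bisect/bad names the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "First bad commit: $culprit"

git bisect reset >/dev/null 2>&1
cd / && rm -rf "$repo"
```

Done by hand, the same session would be git bisect start, then git bisect good, git bisect bad or git bisect skip after each test, and git bisect reset at the end.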

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Thu May 18, 2017 9:09 am

I ran a git bisect and found this patch to be responsible.
I reverted it and now everything works fine with 4.11.1. Hurray!!! :lol:

How smart it was to revert it, I do not know. Those of you with a deeper understanding of kernel internals will have to weigh in on that.
I also updated the info on the kernel bugzilla.

To be on the safe side, I will stay on 4.10.16 until everything is clear.
Thank you for your support and info! I really appreciate it!

Oh, and do not try a git bisect starting from 4.10 with gcc-7: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=474c90156c8dcc2fa815e6716cc9394d7930cb9c
I had to reinstall gcc-6.3.1 and start from scratch after a few rounds of git bisect skip.
_________________
Sorry for my English. I'm still learning this language.

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Thu May 18, 2017 10:02 am

costel78 wrote:
I ran a git bisect and found this patch to be responsible.
I reverted it and now everything works fine with 4.11.1. Hurray!!!

Great!
Please report your findings to the e devs. ;)
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...

tholin
Apprentice

Joined: 04 Oct 2008
Posts: 168

Posted: Thu May 18, 2017 10:16 am

The kernel has a strict "don't break userspace" rule. It doesn't matter how broken userspace programs are: if they stop working because of a kernel change, it's the kernel's fault (with some exceptions).

That patch looks related to POSIX capabilities which is a part of the security subsystem. Set the bugzilla regression field to yes. That might get some more attention. It doesn't look like there is a specific capabilities or security category in the kernel bugzilla. If you don't get a response in a day or two send a mail to linux-security-module@vger.kernel.org and point to that bugzilla and this thread. Some subsystem maintainers ignore the kernel bugzilla.
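
Since the failure is "setgid: Operation not permitted" and the patch touches capabilities, one thing that can be compared between a working and a failing terminal is the capability mask of the shell that launches su. The sketch below decodes the CapBnd (bounding set) line of /proc/<pid>/status; the helper name cap_is_set is made up, and it is only an assumption that the bounding set is the relevant mask in this particular bug. The bit numbers 6 and 7 are the values of CAP_SETGID and CAP_SETUID in linux/capability.h.

```shell
#!/bin/sh
# Check whether CAP_SETGID / CAP_SETUID are in a process's bounding set.
# cap_is_set is a hypothetical helper; looking at CapBnd is an assumption
# about where the problem might show up, not a confirmed diagnosis.
CAP_SETGID=6   # bit numbers from <linux/capability.h>
CAP_SETUID=7

# cap_is_set <hex mask from /proc/PID/status> <capability bit number>
cap_is_set() {
    mask=$(( 0x$1 ))
    [ $(( (mask >> $2) & 1 )) -eq 1 ]
}

# Inspect the PID given as the first argument, or this shell by default.
pid=${1:-$$}
bnd=$(awk '/^CapBnd:/ {print $2}' "/proc/$pid/status")

if cap_is_set "$bnd" "$CAP_SETGID"; then
    echo "CAP_SETGID: present in CapBnd ($bnd)"
else
    echo "CAP_SETGID: missing from CapBnd ($bnd)"
fi
if cap_is_set "$bnd" "$CAP_SETUID"; then
    echo "CAP_SETUID: present in CapBnd ($bnd)"
else
    echo "CAP_SETUID: missing from CapBnd ($bnd)"
fi
```

Running it once from a terminal inside e19 and once from a virtual console, and attaching both outputs to the bug report, would show whether the two environments differ at this level.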

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Thu May 18, 2017 11:42 am

Posted on the enlightenment bug discussion: https://phab.enlightenment.org/T5470
If nothing changes by 24.05.2017, I will send an email to the kernel subsystem maintainers.

Thank you!
_________________
Sorry for my English. I'm still learning this language.

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Thu May 18, 2017 2:02 pm

tholin wrote:
The kernel has a strict "don't break userspace" rule.
And Linus enforces that rule. We've seen what happens when someone actually breaks userspace... What happens after Linus finds out is not pretty. :D

But it really seems like a kernel bug... one that breaks userspace. Uh oh.
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...

costel78
Guru

Joined: 20 Apr 2007
Posts: 310

Posted: Thu May 18, 2017 3:20 pm

Joking aside, it was a very small part of userspace, annoying as it was to deal with. It would be different if it also affected gnome, kde or xfce, as their user bases are much bigger than enlightenment's.
Seriously, is there a way to test for or prevent such isolated cases?
_________________
Sorry for my English. I'm still learning this language.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 13831

Posted: Fri May 19, 2017 1:18 am

It is a small part that we know of. There may be other more popular programs that also break as a result of the same change, but which have not yet been reported because their users have not yet moved to 4.11.

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Fri Jun 02, 2017 12:19 pm

I wonder if this issue has been solved in 4.11.3 already?
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...

tholin
Apprentice

Joined: 04 Oct 2008
Posts: 168

Posted: Fri Jun 02, 2017 12:35 pm

Zucca wrote:
I wonder if this issue has been solved in 4.11.3 already?

Nope, and I don't see the fix in the 4.11 stable queue, but the fix has been confirmed, so I guess it will eventually land in 4.11.5.

https://www.spinics.net/lists/stable/msg173893.html

Jack Krauser
Apprentice

Joined: 19 Jan 2011
Posts: 208

Posted: Tue Jun 13, 2017 5:26 am

It is happening to me right now...
I have the 4.9.16-gentoo kernel and I don't know what I can do :/

Zucca
Veteran

Joined: 14 Jun 2007
Posts: 1519
Location: KUUSANKOSKI, Finland

Posted: Tue Jun 13, 2017 9:36 am

Jack Krauser wrote:
I have the 4.9.16-gentoo kernel and I don't know what I can do :/
... wait? You have this problem with 4.9.x? The bug was supposedly introduced in 4.11...
Please paste the error message and also the lines that appear in dmesg when you try to su.
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...