Gentoo Forums
Inconsistency when booting new kernel

NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Tue May 07, 2019 3:37 am    Post subject: Inconsistency when booting new kernel

Hello all,

Tonight I updated from gentoo-sources-4.17.14 to 5.0.13. I used 'make olddefconfig' and then went in to make some modifications (nothing related to device drivers though). When I rebooted to use the new kernel, it panicked. I rebooted again, but went into 5.0.13 (recovery mode). I didn't change anything, but just looked at the output of 'blkid' and such to make sure that nothing had unexpectedly changed. When I rebooted again, the kernel booted without problem. Since I didn't change anything between the time that it failed to boot and the time that it booted properly, I'm a bit concerned regarding that inconsistent behaviour. Any ideas? I'm happy to post any troubleshooting information that may be helpful.

Thank you in advance.

Cheers,
Nathan Zachary
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
lfs0a
n00b

Joined: 19 Oct 2016
Posts: 62

Posted: Tue May 07, 2019 2:32 pm

Does it panic around "PCI bridge to [bus 15-18]"?
My last kernel that does not panic is 4.14.83.
Later ones, such as 4.19.27 and 4.19.37, panic randomly.
The panic happens at:

[ 0.453204] pci 0000:00:1e.0 PCI bridge to [bus 15-18]

I used 'make olddefconfig' too.
4.14.83 works just fine.
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Tue May 07, 2019 2:43 pm

post your config file for 5.0.13

OR

Check and see if you have
SCHED_MUQSS, SCHED_MC and RQ_MC set

I had inconsistent booting with the shared runqueue enabled.
It would boot one time and hang the next. I turned sharing off ("no sharing").
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Tue May 07, 2019 3:19 pm

Strange. I see SCHED_MC, and it is set. However, I don't see any of the options related to the runqueue (like MuQSS) in 5.0.13. :?
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Tue May 07, 2019 3:28 pm

Code:
~ grep -E "CONFIG_SCHED_|CONFIG_RQ_" .config
CONFIG_SCHED_MUQSS=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_SCHED_SMT=y
# CONFIG_SCHED_MC is not set
CONFIG_RQ_NONE=y
# CONFIG_RQ_SMT is not set
# CONFIG_RQ_SMP is not set
# CONFIG_RQ_ALL is not set


Processor Type and Features -> CPU scheduler runqueue sharing (it's just under multicore support)

Edit to add: or as a kernel parm
Code:
    rqshare=    [X86] Select the MuQSS scheduler runqueue sharing type.
            Format: <string>
            smt -- Share SMT (hyperthread) sibling runqueues
            mc -- Share MC (multicore) sibling runqueues
            smp -- Share SMP runqueues
            none -- Do not share any runqueues
            Default value is mc


ETA2: I'm assuming you're running MUQSS
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Tue May 07, 2019 3:40 pm

Yeah, that's the part that is strange to me:

Code:

/usr/src/linux # ls -lh /usr/src/
total 12K
lrwxrwxrwx  1 root root   19 May  6 21:40 linux -> linux-5.0.13-gentoo
drwxr-xr-x 27 root root 4.0K May  6 21:38 linux-4.17.14-gentoo
drwxr-xr-x 27 root root 4.0K May  7 10:37 linux-5.0.13-gentoo
drwxr-xr-x 25 root root 4.0K May  6 21:36 linux-5.1.0-gentoo

/usr/src/linux # grep -E "CONFIG_SCHED_|CONFIG_RQ_" .config
# CONFIG_SCHED_AUTOGROUP is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_SCHED_HRTICK=y
# CONFIG_SCHED_DEBUG is not set
CONFIG_SCHED_INFO=y
# CONFIG_SCHED_STACK_END_CHECK is not set
# CONFIG_SCHED_TRACER is not set

_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Tue May 07, 2019 3:51 pm

:oops: I forgot I run zen kernel w/muqss, you probably don't have the muqss option.

In that case I have no idea what's causing your problem.

When it crashed, did it get far enough to leave a /var/log/dmesg file? (You would need a rescue CD to get it, because it gets overwritten on each boot.)

The time it crashed, what did you do beforehand: a power-off reset, or ctrl-alt-delete/reboot? And did you do the same before the next boot, the one that succeeded?
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Tue May 07, 2019 4:09 pm

The last thing that I saw was the VFS message about attempting to mount root, and I believe the exact message was:

Code:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,20)


I will probably need to do some further tests when I have time to see if I can get it to fail again. However, since it is up and running right now, I may just let it go until the next time I do a kernel upgrade.
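
For anyone reading along: the two numbers in unknown-block(8,20) are the major and minor device numbers of the device the kernel tried to mount as root. Below is a minimal sketch of decoding them, assuming the usual sd numbering (major 8, 16 minors per disk). decode_sd is a made-up helper name, not an existing tool.

```shell
# Decode a (major, minor) pair into an sdXN name.
# Assumption: major 8 (sd disks), 16 minors per disk, minor 0 = whole disk.
# decode_sd is a hypothetical helper, not part of any package.
decode_sd() {
    major=$1
    minor=$2
    if [ "$major" -ne 8 ]; then
        echo "unsupported major $major" >&2
        return 1
    fi
    disk=$((minor / 16))   # 0 = sda, 1 = sdb, ...
    part=$((minor % 16))   # 0 = whole disk, otherwise the partition number
    letter=$(printf '%s' abcdefghijklmnop | cut -c $((disk + 1)))
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}

decode_sd 8 20   # prints sdb4
```

So the panic names the same sdb4 that root= asks for; whether the disk called sdb at that moment was the right one is a separate question.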
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Thu Jun 13, 2019 3:54 am

I'm still open to suggestions here. Without changing anything at all, it will boot about once out of every 6-10 attempts.
That type of inconsistency leads me to believe that there is some type of regression with the 5.0 kernels.
Thoughts?
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
richard77
Apprentice

Joined: 21 Apr 2004
Posts: 277

Posted: Mon Jun 17, 2019 8:19 am

I'm not sure it is relevant, but since 4.20 the IOMMU is on by default. You could try adding intel_iommu=off to the kernel command line.
_________________
Fletto i muscoli e sono nel vuoto
krinn
Watchman

Joined: 02 May 2003
Posts: 7197

Posted: Mon Jun 17, 2019 11:01 am

You could make a diff of a working dmesg and a non-booting one:
* for the working one, just save dmesg somewhere
* for the non-working one, you have to reboot into a USB/live CD/DVD in order to keep the "bad" dmesg alive, mount the disk, and save it

However, while a kernel race is still possible, most of the time the assumption that "I didn't change anything" is wrong (plugging in a USB device might change the boot order, or the "normal" boot command line might use a wrong blkid while the recovery menu entry uses the good one...). But let's assume you're right there; then for me the only answer to this must be in dmesg.

What I would look at first: when you get the "unknown-block" message, the kernel lists the partitions and disks it could see. An empty list means no working driver, and a non-empty list probably just means that the command line wasn't good.
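
A minimal sketch of that comparison, assuming both logs were saved as plain text files. The timestamps differ on every boot, so they are stripped before diffing. strip_ts and the file names are made up for illustration.

```shell
# Drop the leading "[    0.550760] " timestamp from dmesg lines so that two
# captures can be compared line by line.
# strip_ts is a hypothetical helper name.
strip_ts() {
    sed 's/^\[ *[0-9]*\.[0-9]*] //'
}

# Usage, with made-up file names for the two saved logs:
#   strip_ts < dmesg-good.txt > good.txt
#   strip_ts < dmesg-bad.txt  > bad.txt
#   diff -u good.txt bad.txt
```

Lines that merely moved will still show up in the diff, which is useful here because the order of disk detection is exactly what is in question.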
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Mon Jun 17, 2019 11:15 am

richard77 wrote:
I'm not sure it is relevant, but since 4.20 the IOMMU is on by default. You could try adding intel_iommu=off to the kernel command line.


Or just disable it completely with iommu=off. I'm not sure whether intel_iommu=off disables it completely or just the Intel-specific parts.

I've long been using iommu=pt, though, as I run VMs and you need the IOMMU for that.
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Wed Jun 19, 2019 11:04 pm

I experimented with various iommu kernel parameters, but they didn't help.
I would like to go to the 5.1 kernels as soon as the nvidia-drivers work with them.
My intention is to build a 5.1 kernel from scratch, but I have to wait for nvidia.
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Thu Jun 20, 2019 12:13 am

Paste your 5.0 .config
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Thu Jun 20, 2019 3:01 am

Thanks, Anon-E-moose, for offering to look over the .config:
https://pastebin.com/xk5NmgRY

Cheers,
Nathan Zachary
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Thu Jun 20, 2019 12:57 pm

A few differences (apart from drivers, intel vs amd, network, etc)

I run CONFIG_PREEMPT_RCU=y vs you CONFIG_TREE_RCU=y but not sure that makes any difference

I run CONFIG_UNINLINE_SPIN_UNLOCK=y vs you
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
and these might make a difference simply because the compiler might do something funny with inlining (just a guess)

And I run CONFIG_PREEMPT=y vs you CONFIG_PREEMPT_VOLUNTARY=y but not sure this makes a difference other than responsiveness for things like desktop.

Other than the above, you are using the gcc 9.1 compiler. I'm not sure that makes a difference, but if you have 8.* installed you might build with that and see if there are still problems.
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Tue Jun 25, 2019 10:13 pm

krinn wrote:
You could make a diff of a working dmesg and a non-booting one:
* for the working one, just save dmesg somewhere
* for the non-working one, you have to reboot into a USB/live CD/DVD in order to keep the "bad" dmesg alive, mount the disk, and save it

However, while a kernel race is still possible, most of the time the assumption that "I didn't change anything" is wrong (plugging in a USB device might change the boot order, or the "normal" boot command line might use a wrong blkid while the recovery menu entry uses the good one...). But let's assume you're right there; then for me the only answer to this must be in dmesg.

What I would look at first: when you get the "unknown-block" message, the kernel lists the partitions and disks it could see. An empty list means no working driver, and a non-empty list probably just means that the command line wasn't good.


Since it doesn't mount the root volume (on the times that it panics), there won't be a dmesg to look at (even from a live environment).
When I see the "unknown-block" message, I see partitions listed (it shows sdf* partitions, but I don't see one for any of the sdg* partitions [sdg4 is my root]).
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Tue Jun 25, 2019 10:29 pm

NathanZachary wrote:
When I see the "unknown-block" message, I see partitions listed (it shows sdf* partitions, but I don't see one for any of the sdg* partitions [sdg4 is my root]).


What devices are sdf and sdg: USB, NFS, ...?
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 44178
Location: 56N 3W

Posted: Tue Jun 25, 2019 11:19 pm

NathanZachary,

Don't count on the kernel naming being reliable.
Use root=PARTUUID= on the kernel line and use PARTUUID or UUID in /etc/fstab.

It may not be /dev/sdg that's missing, it may be a normally lower one in the pecking order, so sdg becomes sdf.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Wed Jun 26, 2019 2:24 am

NeddySeagoon wrote:
NathanZachary,

Don't count on the kernel naming being reliable.
Use root=PARTUUID= on the kernel line and use PARTUUID or UUID in /etc/fstab.

It may not be /dev/sdg that's missing, it may be a normally lower one in the pecking order, so sdg becomes sdf.


Thank you for the replies. I actually already have the following in /etc/fstab:

Code:

PARTLABEL=boot      /boot         ext4      noauto,noatime                  0 2
PARTLABEL=swap      none         swap      sw                     0 0
PARTLABEL=rootfs   /         ext4      noatime,discard                  0 1
<snip>
/dev/cdrom      /mnt/cdrom      auto      noauto,ro                  0 0
tmpfs         /var/tmp/portage   tmpfs      size=12G,uid=portage,gid=portage,mode=775,noatime   0 0


Something I didn't think about, though, is how it is referenced in /boot/grub/grub.cfg:

Code:

menuentry 'Gentoo GNU/Linux' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-2c2f74cd-8c48-46ea-8f04-f75631315a3c' {
        load_video
        if [ "x$grub_platform" = xefi ]; then
                set gfxpayload=keep
        fi
        insmod gzio
        insmod part_gpt
        insmod ext2
        set root='hd1,gpt2'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd1,gpt2 --hint-efi=hd1,gpt2 --hint-baremetal=ahci1,gpt2  fddb287d-f875-4a28-aae1-faacd32a2093
        else
          search --no-floppy --fs-uuid --set=root fddb287d-f875-4a28-aae1-faacd32a2093
        fi
        echo    'Loading Linux 5.0.13-gentoo ...'
        linux   /vmlinuz-5.0.13-gentoo root=/dev/sdb4 ro  intel_idle.max_cstate=0
}


However, that's the same as the entry for the 4.17.14 kernel, which is working as intended. I could try putting in the PARTUUID for the root device in the kernel line of grub.cfg if that could potentially help.
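
A rough sketch of what that change to the kernel line would look like, done as a text substitution so it can be checked without touching /boot. with_root_partuuid is a made-up helper name, and the PARTUUID used is the one blkid reports for the rootfs partition later in this thread.

```shell
# Replace whatever root= value a kernel command line carries with
# root=PARTUUID=<id>. Only edits text arriving on stdin.
# with_root_partuuid is a hypothetical helper name.
with_root_partuuid() {
    sed "s|root=[^ ]*|root=PARTUUID=$1|"
}

# The linux line from the menuentry above, with the root= value swapped.
echo 'linux   /vmlinuz-5.0.13-gentoo root=/dev/sdb4 ro  intel_idle.max_cstate=0' |
    with_root_partuuid ebc14d26-4071-489e-b14d-9ef2caabe89f
```

Since grub.cfg is generated by grub-mkconfig, a hand edit there only lasts until the file is regenerated. For a one-off test, the same change can be made from the GRUB menu editor before booting.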
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Wed Jun 26, 2019 9:37 am

is it always able to find grub?

If so, you might check which disks GRUB shows before booting. Do this several times until the problem appears; if disks are showing up out of order, it should be visible in GRUB.

https://www.linux.com/learn/how-rescue-non-booting-grub-2-Linux

Edit to add: what are the sd* devices, separate disks or RAID volumes? If they are separate disks, then the above might help find out what's going on. If they're RAID volumes, then it's possible that whatever you're using (mdadm, etc.) is changing the order. It's hard to say without knowing what your disk subsystem is.

It would be helpful for you to paste a good boot, (/var/log/dmesg), from both 4.17 and 5.0, so we can see if the order of things has changed.
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Wed Jun 26, 2019 10:57 pm

Yes, it's always able to find GRUB and start to boot, but then it will panic with the infamous "Unable to mount root fs on unknown-block(8,20)".
Looking at dmesg from a successful boot, I don't see much of a difference:

Code:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.17.14-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.14-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0
[    0.550760] sd 4:0:0:0: Attached scsi generic sg1 type 0
[    0.550778] sd 4:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[    0.551157] sd 4:0:0:0: [sda] Write Protect is off
[    0.551307] sd 4:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    0.551324] sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.592501]  sda: sda1
[    0.593226] sd 4:0:0:0: [sda] Attached SCSI disk
[    0.634253] sd 6:0:0:0: Attached scsi generic sg2 type 0
[    0.634390] sd 6:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/233 GiB)
[    0.634720] sd 6:0:0:0: [sdb] Write Protect is off
[    0.634870] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    0.634892] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.636500]  sdb: sdb1 sdb2 sdb3 sdb4
[    0.637000] sd 6:0:0:0: [sdb] Attached SCSI disk
[    0.670879] sd 7:0:0:0: Attached scsi generic sg3 type 0
[    0.670905] sd 7:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    0.671273] sd 7:0:0:0: [sdc] 4096-byte physical blocks
[    0.671489] sd 7:0:0:0: [sdc] Write Protect is off
[    0.671834] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    0.672013] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.703018] sdhci: Secure Digital Host Controller Interface driver
[    0.703173] sdhci: Copyright(c) Pierre Ossman
[    1.086802]  sdc: sdc1
[    1.087586] sd 7:0:0:0: [sdc] Attached SCSI disk
[    1.123789] EXT4-fs (sdb4): INFO: recovery required on readonly filesystem
[    1.123948] EXT4-fs (sdb4): write access will be enabled during recovery
[    1.424619] EXT4-fs (sdb4): orphan cleanup on readonly fs
[    1.425690] EXT4-fs (sdb4): 3 orphan inodes deleted
[    1.425851] EXT4-fs (sdb4): recovery complete
[    1.431119] EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
[    2.161752] sd 17:0:0:0: Attached scsi generic sg5 type 0
[    2.180498] sd 17:0:0:1: Attached scsi generic sg6 type 0
[    2.201095] sd 17:0:0:2: Attached scsi generic sg7 type 0
[    2.224531] sd 17:0:0:3: Attached scsi generic sg8 type 0
[    2.300726] sd 17:0:0:0: [sdd] Attached SCSI removable disk
[    2.309823] sd 17:0:0:1: [sde] Attached SCSI removable disk
[    2.317124] sd 17:0:0:2: [sdf] Attached SCSI removable disk
[    2.321412] sd 17:0:0:3: [sdg] Attached SCSI removable disk
[    3.802984] EXT4-fs (sdb4): re-mounted. Opts: discard
[    3.867587] Adding 524284k swap on /dev/sdb3.  Priority:-2 extents:1 across:524284k SS
[    3.943773] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
[    3.977674] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)


Code:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.0.13-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-5.0.13-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0
[    0.544732] sd 4:0:0:0: Attached scsi generic sg1 type 0
[    0.544763] sd 4:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[    0.545143] sd 4:0:0:0: [sda] Write Protect is off
[    0.545531] sd 4:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    0.545610] sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.545618] sd 6:0:0:0: Attached scsi generic sg2 type 0
[    0.545676] sd 6:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/233 GiB)
[    0.545688] sd 6:0:0:0: [sdb] Write Protect is off
[    0.545689] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    0.545708] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.547116] sd 7:0:0:0: Attached scsi generic sg3 type 0
[    0.547125]  sdb: sdb1 sdb2 sdb3 sdb4
[    0.547133] sd 7:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    0.547134] sd 7:0:0:0: [sdc] 4096-byte physical blocks
[    0.547146] sd 7:0:0:0: [sdc] Write Protect is off
[    0.547147] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    0.547162] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.548029] sd 6:0:0:0: [sdb] Attached SCSI disk
[    0.548559]  sdc: sdc1
[    0.549144] sd 7:0:0:0: [sdc] Attached SCSI disk
[    0.585152]  sda: sda1
[    0.585723] sd 4:0:0:0: [sda] Attached SCSI disk
[    0.706034] sdhci: Secure Digital Host Controller Interface driver
[    0.706188] sdhci: Copyright(c) Pierre Ossman
[    0.867676] EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)
[    2.411896] sd 17:0:0:0: Attached scsi generic sg5 type 0
[    2.430637] sd 17:0:0:1: Attached scsi generic sg6 type 0
[    2.435230] sd 17:0:0:0: [sdd] Attached SCSI removable disk
[    2.453109] sd 17:0:0:2: Attached scsi generic sg7 type 0
[    2.471624] sd 17:0:0:3: Attached scsi generic sg8 type 0
[    2.481323] sd 17:0:0:1: [sde] Attached SCSI removable disk
[    2.493083] sd 17:0:0:2: [sdf] Attached SCSI removable disk
[    2.504565] sd 17:0:0:3: [sdg] Attached SCSI removable disk
[    2.652581] EXT4-fs (sdb4): re-mounted. Opts: discard
[    2.717874] Adding 524284k swap on /dev/sdb3.  Priority:-2 extents:1 across:524284k SS
[    2.825647] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
[    2.878508] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)


Something that makes me think the order is getting messed up is what I collected from a failed boot of the 5.0.13 kernel. During the failed boot, one of the last messages I see before the panic relates to the following partitions and corresponding PARTUUIDs:

Code:

sdf2 --> 7878d6c0-c5d7-4508-81c7-cbc7b7f971a8
sdf3 --> 5e95ba2c-c107-4c58-a419-ef021f3a4d80
sdf4 --> ebc14d26-4071-489e-b14d-9ef2caabe89f


However, after a successful boot of the 5.0.13 kernel, I see that those are actually referred to as sdb{2,3,4}, respectively:

Code:

# blkid
/dev/sdb1: PARTLABEL="grub" PARTUUID="0fc5f2ae-4ad9-479c-af6a-056a0c389d0f"
/dev/sdb2: UUID="fddb287d-f875-4a28-aae1-faacd32a2093" TYPE="ext2" PARTLABEL="boot" PARTUUID="7878d6c0-c5d7-4508-81c7-cbc7b7f971a8"
/dev/sdb3: UUID="5e2ed6f0-8d35-4dda-afb4-5e5f8d641ade" TYPE="swap" PARTLABEL="swap" PARTUUID="5e95ba2c-c107-4c58-a419-ef021f3a4d80"
/dev/sdb4: UUID="2c2f74cd-8c48-46ea-8f04-f75631315a3c" TYPE="ext4" PARTLABEL="rootfs" PARTUUID="ebc14d26-4071-489e-b14d-9ef2caabe89f"
/dev/sda1: UUID="3ba15db9-7c87-45d5-a3a6-4609d47aab3f" TYPE="ext4" PARTLABEL="vmdrive" PARTUUID="c5088cb5-3cf0-48d8-b944-6833e28a0abc"
/dev/sdc1: UUID="69ffede7-aa6b-424c-8dc7-bbbdb309180d" TYPE="ext4" PARTLABEL="data" PARTUUID="4ca590d7-0a52-4b80-aaa4-969c6dfa2d72"


The other two disks listed there are ones that house my VMs and one for all my data (separate from drive containing /boot, swap, and the root filesystem).

All of the sd* devices are separate disks.
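
To make that cross-check repeatable, here is a small sketch that looks up which device node currently carries a given PARTUUID in blkid-style output. dev_for_partuuid is a made-up helper name; on a live system the input would come from blkid itself.

```shell
# Print the device node that carries a given PARTUUID, reading blkid-style
# lines on stdin. Prints nothing if the PARTUUID is not present.
# dev_for_partuuid is a hypothetical helper name.
dev_for_partuuid() {
    sed -n "s|^\(/dev/[^:]*\):.* PARTUUID=\"$1\".*|\1|p"
}

# Sample line copied from the blkid output above; prints /dev/sdb4.
printf '%s\n' '/dev/sdb4: UUID="2c2f74cd-8c48-46ea-8f04-f75631315a3c" TYPE="ext4" PARTLABEL="rootfs" PARTUUID="ebc14d26-4071-489e-b14d-9ef2caabe89f"' |
    dev_for_partuuid ebc14d26-4071-489e-b14d-9ef2caabe89f
```

The PARTUUID that the failed boot listed under sdf4 resolves to sdb4 after a good boot, which is the mismatch described above.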

Thanks again for your continued help in troubleshooting this strange problem.

Cheers,
Nathan Zachary
_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Wed Jun 26, 2019 11:11 pm

I see sd[defg] are all labelled as removable SCSI disks. What are they: USB, eSATA, SCSI, ...?

What it sounds like is the ordering of disks is getting messed up, but only occasionally.
4.17 seems to be grabbing things a little quicker, and in order. 5.0 seems a little choppy.
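
One way to put that impression into something comparable is to pull the order of the "Attached SCSI disk" lines out of each log. This is only a sketch; attach_order is a made-up helper name, and the sample lines are copied from the 5.0.13 log posted above.

```shell
# List disks in the order their "Attached SCSI disk" line appears in a dmesg
# capture read from stdin. Removable disks are skipped because their message
# reads "Attached SCSI removable disk".
# attach_order is a hypothetical helper name.
attach_order() {
    sed -n 's/.*\[\(sd[a-z]*\)] Attached SCSI disk$/\1/p' | paste -s -d ' ' -
}

# Three lines from the 5.0.13 log; prints: sdb sdc sda
printf '%s\n' \
    '[    0.548029] sd 6:0:0:0: [sdb] Attached SCSI disk' \
    '[    0.549144] sd 7:0:0:0: [sdc] Attached SCSI disk' \
    '[    0.585723] sd 4:0:0:0: [sda] Attached SCSI disk' |
    attach_order
```

Run over the 4.17.14 log, the same lines come out as sda sdb sdc. That only shows the probes finishing in a different order on 5.0.13; it is consistent with the ordering theory but does not prove it.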
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
NathanZachary
Moderator

Joined: 30 Jan 2007
Posts: 2471
Location: /home/zach

Posted: Wed Jun 26, 2019 11:19 pm

Anon-E-moose wrote:
I see sd[defg] are all labelled as removable SCSI disks. What are they: USB, eSATA, SCSI, ...?

What it sounds like is the ordering of disks is getting messed up, but only occasionally.
4.17 seems to be grabbing things a little quicker, and in order. 5.0 seems a little choppy.

Are you using openrc, and do you have rc_parallel turned off?


They are all SATA disks (OS drive is an SSD, and the other two are spinning disks).
I am using OpenRC and the default is set for rc_parallel:
Code:

$ grep 'rc_parallel=' /etc/rc.conf
#rc_parallel="NO"

_________________
“Truth, like infinity, is to be forever approached but never reached.” --Jean Ayres (1972)
---avatar cropped from =AimanStudio---
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 4334
Location: Dallas area

Posted: Wed Jun 26, 2019 11:26 pm

What I meant was: what type of subsystem are sdd-sdg on? I have SATA drives on the motherboard's SATA controller and others attached as USB drives. The onboard SATA controller does its thing first as a group, then the USB-attached SATA drives fire up, and they do indeed change order (from time to time).

If these are all sata drives on a sata controller, then are there 2 separate controllers?

Again my mb has a main controller sata 1-6 and a second controller for 2 more sata drives.

ETA: I assume this is a desktop, what is the mb make/model?

ETA2: If you haven't tried it, the root=partuuid= option might get rid of some of the problem, at least it should boot and find root.

https://wiki.gentoo.org/wiki/GRUB -- down toward the bottom

Code:
If the root= parameter doesn't match the actual configuration, all is not lost. It is possible to edit the lines before booting. How this can be done, is explained in Knowledge Base:Adjusting GRUB settings for a single boot session

To get the USB disk boot without initramfs regardless of the number of installed disks, use a GPT partition table and the root=PARTUUID= kernel parameter as explained in this external link: Mounting root partition by UUID (no initrd needed)

Since kernel 3.8 and newer it is possible to use MBR 32-bit UUID, so it's possible to use a MBR partition table as well.

In this case PARTUUID refer to an MBR partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-filled hex representation of the 32-bit "NT disk signature", and PP is a zero-filled hex representation of the 1-based partition number.

To get "NT disk signature" one possibility is using fdisk:
root #fdisk -l /dev/sdd

The output will be something like Disk identifier: 0x2d6b036c, so assuming root partition is /dev/sdd2, the resulting line will be root=PARTUUID=2d6b036c-02
More info is available here: Description of PARTUUID feature
Using LABEL or UUID

Kernel boot parameters are real_root=LABEL= or real_root=UUID=.
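
The SSSSSSSS-PP format from the excerpt can be worked through with the excerpt's own numbers. This is a sketch only; mbr_partuuid is a made-up helper name, and it simply formats the two values the way the wiki text describes.

```shell
# Build an MBR-style PARTUUID (SSSSSSSS-PP) from the disk identifier that
# fdisk prints and a 1-based partition number, both zero-filled hex.
# mbr_partuuid is a hypothetical helper name.
mbr_partuuid() {
    printf '%08x-%02x\n' "$1" "$2"
}

# Values from the wiki excerpt: identifier 0x2d6b036c, partition 2.
echo "root=PARTUUID=$(mbr_partuuid 0x2d6b036c 2)"   # prints root=PARTUUID=2d6b036c-02
```

This only applies to MBR disks. The disk in this thread is GPT (grub.cfg loads part_gpt), so the PARTUUID that blkid reports can be used as is.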

_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon