Gentoo Forums
Kernel 4.9 works, 4.12 panics b/c of disk [SOLVED]
jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Thu Oct 05, 2017 9:05 pm    Post subject: Kernel 4.9 works, 4.12 panics b/c of disk [SOLVED]

I can't get my kernel upgrade to work. Currently I'm running 4.9.16 just fine, but getting the new kernel to run is proving frustrating.
It panics with "unable to mount root on unknown-block(x,y)" where x and y are not zero.

https://www.facebook.com/photo.php?fbid=10213160569503112&set=a.1085637895441.15367.1061190698&type=3

It's not my first kernel panic. Normally I would conclude that it's not recognizing the fs and look to be sure the driver is compiled in. I'm using the exact same everything for the old kernel as for the new, same entries in lilo.conf (sorry not changing it), same .config even (using make oldconfig), but 4.9.16 boots and 4.12.12 does not.

Here are the relevant .config entries:

Code:

CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
# CONFIG_EXT3_FS_POSIX_ACL is not set
# CONFIG_EXT3_FS_SECURITY is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set


And I'm pretty sure the root filesystem really is ext4:

Code:

Merckx src # uname -a
Linux Merckx 4.9.16-gentoo #3 SMP PREEMPT Thu Jun 1 17:53:56 CDT 2017 i686 Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz GenuineIntel GNU/Linux
Merckx src # file -sL /dev/sdb2
/dev/sdb2: Linux rev 1.0 ext4 filesystem data, UUID=419902b4-6b85-4a2c-8e81-82a45d498c9c (needs journal recovery) (extents) (large files) (huge files)


Any help would be greatly appreciated.

Jon


Last edited by jesnow on Mon Dec 18, 2017 9:32 pm; edited 1 time in total

Jaglover
Watchman

Joined: 29 May 2005
Posts: 7196
Location: Saint Amant, Acadiana

Posted: Thu Oct 05, 2017 10:55 pm

What about partition table support?
_________________
Please learn how to denote units correctly!

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Thu Oct 05, 2017 11:16 pm

From what you said on the other thread it looks like it's time to switch to amd64, but I don't want to start that process until I understand what's wrong here.

Code:

Merckx linux # grep PARTITION .config
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43559
Location: 56N 3W

Posted: Thu Oct 05, 2017 11:33 pm

jesnow,

Your unknown-block(8,18) is sdb2. Is that where you expect root to be?

The kernel block device names are allocated on a first come, first served basis, so they are not deterministic.
It all depends on the way the devices are arranged on the various buses.
e.g. leaving a USB block device connected can change the drive ordering.

Post your lilo.conf and post the output of blkid

This is going towards changing lilo.conf to use root=PARTUUID= to define the root device.
I think that's transparent to lilo.

If your drives are swapped by your new kernel, fstab won't work any more either. You should use LABELs or UUIDs there.
The kernel understands PARTUUID without any help but needs an initrd to find root by LABEL or UUID
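
For anyone decoding a similar panic by hand: the kernel's device list assigns major 8 to the first 16 SCSI/SATA disks and gives each disk 16 minor numbers, so the minor number splits into a disk index and a partition number. The helper below is only a sketch of that arithmetic; the function name decode_sd is made up for illustration, and it assumes major 8 (sda through sdp) and nothing else.

```shell
# decode_sd MAJOR MINOR  (hypothetical helper, not a standard tool)
# Maps a (major,minor) pair from an "unknown-block(x,y)" panic to the
# name that device would have, assuming major 8 (sda..sdp).
decode_sd() {
    major=$1 minor=$2
    if [ "$major" -ne 8 ]; then
        echo "major $major is outside the range this sketch handles" >&2
        return 1
    fi
    letters=abcdefghijklmnop
    disk=$((minor / 16))      # 16 minors are reserved per disk
    part=$((minor % 16))      # 0 means the whole disk
    letter=$(printf '%s' "$letters" | cut -c "$((disk + 1))")
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}

decode_sd 8 18   # prints sdb2
decode_sd 8 3    # prints sda3
```

Note that this only tells you which name the number corresponds to, not which physical drive ends up with that name on a given boot.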
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Jaglover
Watchman

Joined: 29 May 2005
Posts: 7196
Location: Saint Amant, Acadiana

Posted: Fri Oct 06, 2017 12:17 am

If I recall correctly Lilo won't accept PARTUUID. :( Maybe there is a way to override Lilo config check.
_________________
Please learn how to denote units correctly!

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Fri Oct 06, 2017 12:34 am

I'm familiar with both of those issues. /dev/sdb2 is indeed where / is and is indeed where 4.9.16 finds it just fine.
My lilo.conf has a lot of cruft (commented out old kernels going back a decade, starting with 2.6.16-r11!) but here is the relevant section:

Code:

#
# Start LILO global section
#

# Faster, but won't work on all systems:
compact
# Should work for most systems, and do not have the sector limit:
lba32
# If lba32 do not work, use linear:
#linear

# MBR to install LILO to:
boot = /dev/sda
map = /boot/.map
install = /boot/boot-menu.b   # Note that for lilo-22.5.5 or later you
                              # do not need boot-{text,menu,bmp}.b in
                              # /boot, as they are linked into the lilo
                              # binary.
large-memory                         
menu-scheme=Wb
prompt
# If you always want to see the prompt with a 15 second timeout:
timeout=150
delay = 50

default=4916


image = /boot/vmlinuz-4.9.16-gentoo
        root = /dev/sdb2
        label = 4916

image = /boot/vmlinuz-4.12.5-gentoo
        root = /dev/sdb2
        label=4125

image = /boot/vmlinuz-4.12.12-gentoo
        root = /dev/sdb2
        label = 41212


All of the 4.9's (there are 2 others) work fine. Here too is fstab:

Code:

# <fs>                  <mountpoint>    <type>          <opts>          <dump/pass>
none                    /proc           proc            defaults        0 0
none                    /dev/shm        tmpfs           nodev,nosuid,noexec     0 0

/dev/sdb2               /               ext4            noatime,defaults 0 1
/dev/sdb1               none            swap            sw              0 0
PARTLABEL=Data          /home           auto            noatime,defaults 0 0
/dev/cdrom              /mnt/cdrom      iso9660         user,unhide,noauto,ro   0 0



It's a real headscratcher!

Jaglover
Watchman

Joined: 29 May 2005
Posts: 7196
Location: Saint Amant, Acadiana

Posted: Fri Oct 06, 2017 12:41 am

Then for some reason the new kernel is not enumerating your second drive as sdb. My 2¢.
_________________
Please learn how to denote units correctly!

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7195
Location: almost Mile High in the USA

Posted: Fri Oct 06, 2017 1:37 am

Oh gosh, lilo. haven't used it in so long...

If setting root= doesn't work, you could also try adding it in append=
Code:
append="root=PARTUUID=xxxxxxxx"

to see if this works?
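
A sketch of what a complete stanza along those lines might look like, written as a small shell helper that prints it. The helper name lilo_stanza and the all-zero PARTUUID are placeholders for illustration; the kernel path and label are the ones from the lilo.conf posted earlier. It assumes the "root = /dev/sdb2" line is dropped from the stanza so that only one root= reaches the kernel.

```shell
# lilo_stanza KERNEL LABEL PARTUUID  (hypothetical helper)
# Prints a lilo.conf image stanza that names root by PARTUUID via append=.
lilo_stanza() {
    printf 'image = %s\n' "$1"
    printf '        label = %s\n' "$2"
    printf '        append = "root=PARTUUID=%s"\n' "$3"
}

# Placeholder PARTUUID; on a real system the value would come from
#   blkid -s PARTUUID -o value /dev/sdb2
lilo_stanza /boot/vmlinuz-4.12.12-gentoo 41212 00000000-0000-0000-0000-000000000000
```

After pasting the result into /etc/lilo.conf, /sbin/lilo has to be rerun before the change takes effect.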
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43559
Location: 56N 3W

Posted: Fri Oct 06, 2017 10:55 am

jesnow,

The kernel can't read the filesystem at 8,18, whatever that is.
What filesystems do you have at
Code:
/dev/sd?2
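
One possible way to answer that question in a single step is to filter lsblk output for the second partition of every sd disk. This is a sketch only: second_partitions is a made-up helper name, and it expects two whitespace-separated columns (kernel name, filesystem type) on standard input.

```shell
# second_partitions  (hypothetical helper)
# Reads "KNAME FSTYPE" lines on stdin and prints only partition 2 of each
# sd disk (sda2, sdb2, ...), noting when no filesystem type was reported.
second_partitions() {
    awk '$1 ~ /^sd[a-z]+2$/ { print $1, ($2 == "" ? "(none detected)" : $2) }'
}

# Intended use on a running system (raw output, no header line):
#   lsblk -rno KNAME,FSTYPE | second_partitions
```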

_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Fri Oct 06, 2017 7:52 pm

On 4.9.16 it is for sure my ext4 root fs on sdb2. But maybe 4.12.12 is pointing at a different partition? Did something change in how filesystems are handled between the two versions?


NeddySeagoon wrote:
jesnow,

The kernel can't read the filesystem at 8,18, whatever that is.
What filesystems do you have at
Code:
/dev/sd?2

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43559
Location: 56N 3W

Posted: Fri Oct 06, 2017 8:09 pm

jesnow,

No, filesystems are the same, except maybe bugfixes.
By chance, your drives may enumerate in a different order.

What sort of filesystem is on partition 2 of all your other drives?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Sun Oct 08, 2017 11:30 pm

This is interesting and maybe relevant :

Code:

Merckx mnt # lsblk -o NAME,kname,MAJ:MIN,fstype
NAME   KNAME MAJ:MIN FSTYPE
sdd    sdd     8:48 
`-sdd1 sdd1    8:49  ext4
sdb    sdb     8:16 
|-sdb2 sdb2    8:18  ext4
|-sdb3 sdb3    8:19  ext4
`-sdb1 sdb1    8:17  swap
sr0    sr0    11:0   
loop0  loop0   7:0   iso9660
sda    sda     8:0   
|-sda2 sda2    8:2   ntfs-3g
|-sda3 sda3    8:3   ntfs-3g
`-sda1 sda1    8:1   ntfs-3g


It appears the device number the kernel is trying to boot from doesn't match the one I'm telling lilo to find the boot image on: 8,3 is sda3, not sdb2. This was indeed an issue in the past when lilo would give the warning that the mbr and boot image were on different disks, but would then deal with it gracefully. It seems willing to boot from /dev/sdb2 (as I'm doing right now) as kernel 4.9 but not as 4.12.

Any idea how to troubleshoot this?

Jon.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7195
Location: almost Mile High in the USA

Posted: Mon Oct 09, 2017 12:22 am

Well, it seems your new kernel is missing drivers; that is the only thing I can think of that explains why 4.9 can see the disk but 4.12 can't.
Are both sda and sdb on the same controller? SATA vs USB disks vs accessory PATA controllers?
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43559
Location: 56N 3W

Posted: Mon Oct 09, 2017 9:49 am

jesnow,

Code:
sdb2 sdb2    8:18  ext4
sda2 sda2    8:2   ntfs-3g


Humour me ... build in kernel support for ntfs but NOT the write support. Write support is useless, 'incomplete' but what is there is mostly harmless.

If your drives are being swapped, the ntfs partition will mount read only, which is safe and you will get a different error.
I don't know what the error will be; root will be read only, and lots of things don't work on ntfs.
The main thing is if you don't get the unknown-block panic, it confirms the drive enumeration order.

The fix is to pass root=PARTUUID=... in the append statement in lilo.conf
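
For the diagnostic build, the relevant symbols in kernels of this vintage are CONFIG_NTFS_FS (the driver) and CONFIG_NTFS_RW (write support). The check below is a sketch that confirms a .config has the driver built in with write support left off; the function name ntfs_readonly_builtin is invented for illustration.

```shell
# ntfs_readonly_builtin CONFIG_FILE  (hypothetical helper)
# Succeeds only if the NTFS driver is built in (=y) and write support
# is not enabled, which is the setup the test boot relies on.
ntfs_readonly_builtin() {
    cfg=$1
    grep -q '^CONFIG_NTFS_FS=y$' "$cfg" || return 1
    if grep -q '^CONFIG_NTFS_RW=y$' "$cfg"; then
        return 1
    fi
    return 0
}

# Intended use, assuming the usual Gentoo kernel source location:
#   ntfs_readonly_builtin /usr/src/linux/.config && echo "ok for the test boot"
```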
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Mon Oct 09, 2017 11:41 pm

Ok I did it. The drive ordering does indeed seem to be the issue:

Code:
 VFS: Mounted root (ntfs filesystem) readonly on device 8:18.
.
.
.
Kernel Panic - not syncing: No working init found.


Which should indeed be /dev/sdb2 from the table above that shows the output of lsblk. But if it found an ntfs file system there, it can't be /dev/sdb2, and is most likely /dev/sda2.

I will try your fix.

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Tue Oct 10, 2017 12:04 am

Success.

I now think that if there is device numbering inconsistency between 4.12 and 4.9, that means that
the next time I run lilo, I will no longer be able to boot 4.9 kernels without a root=PARTUUID= directive.

Many thanks for the help. If you're ever in Texas I will buy you a beer, or if I am ever out swimming in the Firth of Forth I will bring you one.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43559
Location: 56N 3W

Posted: Tue Oct 10, 2017 11:27 am

jesnow,

You can use append PARTUUID with both kernels.

You will need to write /etc/fstab in a device-independent way too.
UUID, LABEL or PARTUUID works there.
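
As a rough illustration of that conversion, the filter below rewrites fstab lines that start with /dev/sd* so the first field becomes UUID=..., leaving everything else untouched. It is a sketch under assumptions: fstab_to_uuid and uuid_of are made-up names, the lookup relies on blkid being able to read the device (normally as root), and the output should be reviewed by hand before it replaces /etc/fstab.

```shell
# uuid_of DEVICE: filesystem UUID of a block device, looked up via blkid.
uuid_of() {
    blkid -s UUID -o value "$1"
}

# fstab_to_uuid  (hypothetical helper)
# Reads an fstab on stdin and prints it with /dev/sdXN replaced by UUID=...
# Lines whose device cannot be resolved are passed through unchanged.
fstab_to_uuid() {
    while IFS= read -r line; do
        case $line in
            /dev/sd*)
                dev=${line%%[[:space:]]*}   # first field: the device path
                rest=${line#"$dev"}         # remainder, spacing preserved
                uuid=$(uuid_of "$dev") || uuid=
                if [ -n "$uuid" ]; then
                    printf 'UUID=%s%s\n' "$uuid" "$rest"
                else
                    printf '%s\n' "$line"
                fi
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Intended use, writing to a scratch file rather than over the original:
#   fstab_to_uuid < /etc/fstab > /tmp/fstab.new
```

Entries such as PARTLABEL=Data and the comment lines pass through as they are, since they do not start with /dev/sd.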
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Mon Dec 18, 2017 9:31 pm

I tracked the issue down to a printer with a card slot that was being registered as /dev/sda and throwing all the other disks off by 1.

There are many solutions to this problem, but probably the best one in retrospect is to use one of the many trees in /dev/ that list the disks by device name, device id etc:

Code:

jesnow@Merckx ~ $ dir /dev/disk/   
total 0
drwxr-xr-x  8 root root   160 Dec 18 15:29 .
drwxr-xr-x 15 root root 13760 Dec 18 15:32 ..
drwxr-xr-x  2 root root   400 Dec 18 15:29 by-id
drwxr-xr-x  2 root root   100 Dec 18 15:29 by-label
drwxr-xr-x  2 root root    60 Dec 18 15:29 by-partlabel
drwxr-xr-x  2 root root   180 Dec 18 15:29 by-partuuid
drwxr-xr-x  2 root root   280 Dec 18 15:29 by-path
drwxr-xr-x  2 root root   180 Dec 18 15:29 by-uuid
jesnow@Merckx ~ $ dir /dev/disk/by-label
total 0
drwxr-xr-x 2 root root 100 Dec 18 15:29 .
drwxr-xr-x 8 root root 160 Dec 18 15:29 ..
lrwxrwxrwx 1 root root  10 Dec 18 15:29 HP_RECOVERY -> ../../sdb3
lrwxrwxrwx 1 root root  10 Dec 18 15:29 OS -> ../../sdb2
lrwxrwxrwx 1 root root  10 Dec 18 15:29 SYSTEM -> ../../sdb1
jesnow@Merckx ~ $ dir /dev/disk/by-uuid
total 0
drwxr-xr-x 2 root root 180 Dec 18 15:29 .
drwxr-xr-x 8 root root 160 Dec 18 15:29 ..
lrwxrwxrwx 1 root root  10 Dec 18 15:29 419902b4-6b85-4a2c-8e81-82a45d498c9c -> ../../sdc2
lrwxrwxrwx 1 root root  10 Dec 18 15:29 43d3cc9d-2fae-46c4-af2d-a5e6ffe15be4 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Dec 18 15:29 75c774a6-aa35-4d3a-b891-595f17e28908 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Dec 18 15:29 B0766A4A766A1180 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Dec 18 15:29 EE40262E4025FDC9 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Dec 18 15:29 F6A26B6DA26B3173 -> ../../sdb3
lrwxrwxrwx 1 root root  10 Dec 18 15:29 d7e5829d-9396-4762-acf3-936292735960 -> ../../sdc3
jesnow@Merckx ~ $


Any of these will work in /etc/fstab [EDIT: DO NOT DO THIS see below] and in lilo.conf (I assume grub will use them too). That way the "append=" lines can be avoided, and the insane non-repeatable legacy disk naming system can be jettisoned altogether.


Last edited by jesnow on Tue Jan 02, 2018 7:17 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43559
Location: 56N 3W

Posted: Tue Dec 19, 2017 3:15 pm

jesnow,

Do not write symlink names in /etc/fstab.
There will be a race condition between udev creating the symlinks and localmount trying to use them.
Some things may not mount at boot.

LABEL, UUID, PARTUUID are all safe and unambiguous.
Well, LABELs are up to you.

Read the news item
Code:
$ eselect news read 32
2016-11-04-important_fstab_and_localmount_update
  Title                     Important fstab and localmount update

_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jesnow
Guru

Joined: 26 Apr 2006
Posts: 588

Posted: Tue Jan 02, 2018 7:21 pm

NeddySeagoon, thanks once again for the save.

What is the correct solution, now that the problem is found? How do I absolutely prevent a usb device from claiming /dev/sda?
[Edit: unplug the offending thing is one solution].

Cheers,

Jon.


Last edited by jesnow on Tue Jan 02, 2018 8:58 pm; edited 1 time in total

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 43559
Location: 56N 3W

Posted: Tue Jan 02, 2018 7:35 pm

jesnow,

You can't. It's luck/race conditions.

The correct solution is to use root=PARTUUID= on the kernel command line and the device-independent identifiers in /etc/fstab.
Then your configuration is totally independent of device names allocated by the kernel.

root=UUID= works if you have an initrd that includes the userspace mount command.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Ant P.
Watchman

Joined: 18 Apr 2009
Posts: 5879

Posted: Tue Jan 02, 2018 7:37 pm

Build usb-storage as a module. Otherwise, you don't - that's why we have PARTUUID in the first place.
Back to top
View user's profile Send private message
jesnow
Guru
Guru


Joined: 26 Apr 2006
Posts: 588

PostPosted: Tue Jan 02, 2018 9:36 pm    Post subject: Reply with quote

OK, I have now removed all /dev/sd* references from /etc/fstab and /etc/lilo.conf. I guess they will become deprecated, but they are still very much standard in the Gentoo documentation.

Many thanks. Blkid is indeed now my friend.

Jon.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 14063

Posted: Wed Jan 03, 2018 12:18 am

It is standard because it is easy to explain and, for people who have only one block storage device (or who enjoy perfectly consistent enumeration of block storage devices), using sd names works fine. For anyone who cannot count on predictable enumeration, the techniques in this thread are a good solution.