Gentoo Forums
Failed to initialize the NVIDIA kernel module
Indi008
n00b

Joined: 02 Feb 2020
Posts: 17

Posted: Mon Apr 20, 2020 9:04 am    Post subject: Failed to initialize the NVIDIA kernel module

I am getting a 'Failed to initialize the NVIDIA kernel module' error when running startx. Log file output: http://dpaste.com/2EYHFK1

It was all working fine until yesterday when I updated with

Code:
sudo emerge --update --deep --newuse @world


When I ran that I got an error, something about my kernel not being configured and xorg missing a config file. I've never had this error before, but I am pretty new to Gentoo.

I had possibly changed some kernel settings earlier without rebuilding the kernel (I am not sure), so I figured I'd just rebuild the kernel. I did that and it seemed to work fine, with no problems running anything afterwards. So then I continued the update, which seemed to work.

I forgot that after updating the kernel, the modules need to be rebuilt with
Code:
emerge @module-rebuild
so I didn't do this step until after everything went wrong. Maybe this is what has caused my problems?


The update was either finished or almost finished when I opened a file with VLC media player (not sure if this was related, but it happened right when I opened the file) and the whole system froze. Nothing was responding. Eventually I turned everything off with the power button. The system rebooted fine, but now startx fails with the above error.

At this point I rebuilt the modules and also tried re-emerging the nvidia drivers package but no luck.

I tried
Code:
lsmod | grep nvidia
and got the following output

Code:
nvidia_drm             49152  0
nvidia_modeset       1081344  1 nvidia_drm
nvidia              19976192  1 nvidia_modeset
drm_kms_helper        208896  1 nvidia_drm
drm                   540672  3 drm_kms_helper,nvidia_drm
i2c_core               86016  5 videodev,drm_kms_helper,nvidia,i2c_piix4,drm


I can remove the nvidia_drm, nvidia_modeset, and nvidia modules but it still gives the same error. If I remove and then re-load nvidia module then lsmod again I get:

Code:

nvidia              19976192  0
i2c_core               86016  5 videodev,drm_kms_helper,nvidia,i2c_piix4,drm


but still the same error with startx.

My xorg.conf file is here: http://dpaste.com/290H7GG
But it hasn't changed from what it was when it was working. Same with my .xinitrc file.

I have triple checked my kernel config and it matches up with the NVIDIA drivers page: https://wiki.gentoo.org/wiki/NVIDIA/nvidia-drivers

If I run
Code:
glxinfo | grep direct
then I get
Code:
Error: unable to open display


What else should I try? Any ideas on what might be wrong? Should I try rebuilding my kernel again?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 45435
Location: 56N 3W

Posted: Mon Apr 20, 2020 9:20 am

Indi008,

Put your /var/log/Xorg.0.log onto a pastebin site please.

In your xorg.conf, both Section "InputDevice" blocks will be ignored.
They have been for a very long time now.
Xorg only uses Driver "mouse" or Driver "kbd" if you force it to. You don't.
That's just an interesting aside; it's not your problem, so leave it for now.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Banana
Guru

Joined: 21 May 2004
Posts: 446
Location: Germany

Posted: Mon Apr 20, 2020 9:23 am

Hello Indi008.

What are your kernel and nvidia driver versions?
Which GPU card do you have?

Basically you need the correct kernel config. Build and load it. After that you install the correct drivers and their dependencies. Reboot and everything should work.

EDIT: was too slow. NeddySeagoon was faster.
_________________
My personal space

Indi008
n00b

Joined: 02 Feb 2020
Posts: 17

Posted: Mon Apr 20, 2020 9:50 am

Here is /var/log http://dpaste.com/106MZBC

nvidia driver package I have is x11-drivers/nvidia-drivers-440.64

my kernel is linux-5.4.28-gentoo

my GPU is NVIDIA GeForce GTX 1070

Quote:
Basically you need the correct kernel config. Build and load it. After that you install the correct drivers and their dependencies. Reboot and everything should work.


Since my kernel rebuild seemed to work initially (until the update), does that mean the kernel config is likely fine, or could it still be something I have set wrong in the kernel? The kernel has a lot of options, though, and I am not sure exactly what I changed between the first build and the rebuild. Apart from the settings in the NVIDIA guide I am not sure what to check. I think I saved an old kernel config at some point, but I have also changed things since then that I want to keep, so I might leave that option for now. Maybe I will try rebuilding with the current kernel config, stepping through it all one more time.

Indi008
n00b

Joined: 02 Feb 2020
Posts: 17

Posted: Mon Apr 20, 2020 10:13 am

I messed up.
I ran
Code:
grub-mkconfig -o /boot/grub/grub.cfg

but I don't think I have GRUB. I think I was just using UEFI. Now I can't boot. It says unable to mount root fs on unknown.
How do I fix this?

Indi008
n00b

Joined: 02 Feb 2020
Posts: 17

Posted: Mon Apr 20, 2020 10:33 am

Update: I managed to boot an old version of the kernel. I still seem to have the nvidia error, so it probably wasn't the kernel config causing the issue? (Assuming this one uses a different config; it says 4.19.97-gentoo.) I think I will go to bed and then build a new kernel tomorrow.

fedeliallalinea
Bodhisattva

Joined: 08 Mar 2003
Posts: 24067
Location: here

Posted: Mon Apr 20, 2020 11:19 am

Code:
[   707.208] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[   707.208] (EE) NVIDIA:     system's kernel log for additional error messages and
[   707.208] (EE) NVIDIA:     consult the NVIDIA README for details.

What do you see in dmesg? I saw a similar error, and disabling CONFIG_DRM was the solution; see this howto section.
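
To see how an option such as CONFIG_DRM is currently set, one can read it straight out of the kernel's .config. This is only a minimal sketch: the helper name `kconfig_value` is made up for illustration, and it assumes the config of the kernel you built is at /usr/src/linux/.config, which is only true if the /usr/src/linux symlink points at the right tree.

```shell
#!/bin/sh
# kconfig_value FILE OPTION
# Print the value of OPTION (for example y or m) from a kernel .config file,
# or "not set" if the option is commented out or absent.
# Hypothetical helper, not part of any Gentoo tool.
kconfig_value() {
    file=$1
    opt=$2
    # Match only "OPTION=" so CONFIG_DRM does not also match CONFIG_DRM_FOO.
    val=$(sed -n "s/^${opt}=//p" "$file" | head -n 1)
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    else
        printf 'not set\n'
    fi
}

# Example: inspect the tree the /usr/src/linux symlink points to (assumed path).
if [ -r /usr/src/linux/.config ]; then
    echo "CONFIG_DRM: $(kconfig_value /usr/src/linux/.config CONFIG_DRM)"
fi
```

Whether CONFIG_DRM should be on or off depends on the driver version and the wiki page for it, so treat the output as something to compare against that page rather than as a verdict.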
_________________
Questions are guaranteed in life; Answers aren't.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 45435
Location: 56N 3W

Posted: Mon Apr 20, 2020 12:57 pm

Indi008,

That Xorg.0.log came from the 4.19.97 kernel.
Code:
[  2228.103] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module.

Isn't very useful on its own.

As the rest of the message says, look in dmesg for more detail.
This usually happens when there is something in the kernel that the nvidia kernel module needs to be off, or a kernel module that needs to be unloaded.

Quote:
When I ran that I got an error, something about my kernel not being configured ...

What happens is that you get a new kernel, then run --depclean.
--depclean removes all the old kernel's source files, including the Makefile.
The .config is still there.

The nvidia kernel module follows the /usr/src/linux symlink to find a kernel to build against.
I suspect that you did not run
Code:
eselect kernel
, so
Code:
emerge @module-rebuild
rebuilt your out of tree kernel modules against your old kernel.
Which kernel does
Code:
eselect kernel list
show is active?
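
The check described above can be sketched as a small script. It is an illustration only: the function names are invented, and comparing `uname -r` with the directory name is approximate, because a local version suffix set in the kernel config would make the two strings differ even when the tree is the right one.

```shell
#!/bin/sh
# kernel_tree_release DIR
# Derive a release name from whatever DIR resolves to, e.g.
# /usr/src/linux -> linux-5.4.28-gentoo gives 5.4.28-gentoo.
# Hypothetical helper for illustration.
kernel_tree_release() {
    target=$(readlink -f "$1") || return 1
    basename "$target" | sed 's/^linux-//'
}

# tree_is_buildable DIR
# True only if DIR still has both a Makefile and a .config
# (--depclean can leave an old tree with just the .config).
tree_is_buildable() {
    [ -f "$1/Makefile" ] && [ -f "$1/.config" ]
}

src=/usr/src/linux
if [ -e "$src" ]; then
    running=$(uname -r)
    selected=$(kernel_tree_release "$src")
    echo "running kernel:  $running"
    echo "selected source: $selected"
    [ "$running" = "$selected" ] || echo "WARNING: modules will be built for $selected, not $running"
    tree_is_buildable "$src" || echo "WARNING: $src is missing its Makefile or .config"
fi
```

If the two differ, `eselect kernel set <number>` changes the symlink, after which `emerge @module-rebuild` builds against the newly selected tree.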

icaruslnx
n00b

Joined: 14 Apr 2009
Posts: 13

Posted: Mon Apr 20, 2020 6:52 pm

This is very similar to the issue I'm dealing with. I ran a sync and update, nvidia-drivers was updated from 440.64 to 440.82 and when trying to load the new module the display would freeze and I needed to ssh in to reboot.

The weirdest part (and only clear hint) is what dmesg says (I'm posting from my phone so please excuse typos)

NVRM: API mismatch: the client has the version 440.82, but this kernel module has the version for 440.64. Please make sure that this kernel module and all Nvidia driver components have the same version

What's the difference between the client and kernel version? Why is the kernel still holding onto the old version?
I can verify my kernel is configured properly, uname output matches eselect kernel list.
5.4.14-gentoo

Seeing that Indi008 is on a similar kernel I thought this could be relevant

Edit: also 440.64 is no longer available in the tree

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 45435
Location: 56N 3W

Posted: Mon Apr 20, 2020 7:08 pm

icaruslnx,

The nvidia-drivers package comes in two parts:
the kernel part, which is loaded at boot, and the Xorg part, which is loaded when Xorg starts.

If you restart Xorg after an nvidia-drivers update, you still have the old kernel part but the new Xorg part.
nvidia-drivers checks versions and won't start unless the versions are identical.

You can achieve the same effect by not building nvidia-drivers against the kernel you thought you were, or by not running the kernel you thought you were.

Check
Code:
uname -a
Is that the right kernel version and build date?
It's your running kernel.

Check
Code:
eselect kernel list

The active kernel is the kernel all out of tree kernel modules will build against.

If it all looks correct, reboot to load the new nvidia kernel module.
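
One way to spot a stale kernel part is to compare the version of the nvidia module that is loaded right now with the version of the module file installed for the running kernel. The sketch below assumes the loaded module exposes its version at /sys/module/nvidia/version and that the installed .ko came from the same nvidia-drivers package as the Xorg part; the function names are hypothetical.

```shell
#!/bin/sh
# loaded_module_version NAME [SYSFS_ROOT]
# Version of the module currently loaded in the kernel, read from sysfs.
# Prints nothing if the module is not loaded. Hypothetical helper.
loaded_module_version() {
    root=${2:-/sys/module}
    if [ -r "$root/$1/version" ]; then
        cat "$root/$1/version"
    fi
}

# versions_match A B: true only if both are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

loaded=$(loaded_module_version nvidia)
# modinfo reads the module file installed for the running kernel,
# i.e. what a fresh load (or a reboot) would pick up.
ondisk=$(modinfo -F version nvidia 2>/dev/null || true)
echo "loaded: ${loaded:-none}  on disk: ${ondisk:-none}"
if versions_match "$loaded" "$ondisk"; then
    echo "loaded nvidia module matches the installed one"
else
    echo "mismatch (or module not loaded): reload the module or reboot"
fi
```

A mismatch here would be consistent with the "API mismatch" message in dmesg, but the dmesg text remains the authoritative source for which two versions actually disagree.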

Jaglover
Watchman

Joined: 29 May 2005
Posts: 7677
Location: Saint Amant, Acadiana

Posted: Tue Apr 21, 2020 12:22 am

Quote:
... reboot to load the new nvidia kernel module


Sic transit gloria mundi. First coronavirus, then worldwide economic downturn and now we suggest on Gentoo forums to reboot to reload some trivial kernel modules. 8O
_________________
Please learn how to denote units correctly!

icaruslnx
n00b

Joined: 14 Apr 2009
Posts: 13

Posted: Tue Apr 21, 2020 2:13 am

LOL @Jaglover :lol:

Back on track: if I rmmod nvidia and 'systemctl restart gdm', it loads the proper module and works. After a reboot it comes up with the API mismatch again; after rmmod and restarting gdm it loads (with a resource sanity check message) and works as expected.

Code:
[   10.544809] [drm:nv_drm_init [nvidia_drm]] *ERROR* [nvidia-drm] Version mismatch: nvidia-modeset.ko(440.64) nvidia-drm.ko(440.82)
[   10.615121] NVRM: API mismatch: the client has the version 440.82, but
               NVRM: this kernel module has the version 440.64.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.
...
[   33.554278] NVRM: API mismatch: the client has the version 440.82, but
               NVRM: this kernel module has the version 440.64.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.
[   94.966299] nvidia-modeset: Unloading
[   96.860204] nvidia-nvlink: Unregistered the Nvlink Core, major device number 252
[  121.489887] nvidia-nvlink: Nvlink Core is being initialized, major device number 252
[  121.490372] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[  121.536599] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.82  Wed Apr  1 20:04:33 UTC 2020
[  121.743376] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
[  121.743597] caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
[  122.865344] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  440.82  Wed Apr  1 19:41:29 UTC 2020


Before I tested this I did update Gnome (had to since they seem to update Gnome once every 5 years :P )
So the driver is fine; the boot process is failing because it pulls in the wrong module for some reason. That's something I'll figure out eventually when I have time, but for now I have a workaround, as weird as it is. Working from home I seem to have even less free time than before the world melted down.

Does any of this help Indi008 at all?

Code:
lsmod |grep nvidia
nvidia_modeset       1073152  11
nvidia              19947520  545 nvidia_modeset
i2c_core               53248  4 drm_kms_helper,nvidia,i2c_piix4,drm
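
The rmmod workaround depends on removing the modules in the right order, since nvidia_drm uses nvidia_modeset, which in turn uses nvidia. A rough sketch of working that order out from `lsmod` output follows. The function name is invented, nvidia_uvm is included as an assumption because it does not appear in this thread, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# nvidia_unload_order
# Read `lsmod` output on stdin and print the loaded nvidia modules in an
# order that is safe to remove: dependants first, core "nvidia" last.
# Hypothetical helper for illustration.
nvidia_unload_order() {
    loaded=$(awk '{ print $1 }')
    for m in nvidia_drm nvidia_uvm nvidia_modeset nvidia; do
        if printf '%s\n' "$loaded" | grep -qx "$m"; then
            printf '%s\n' "$m"
        fi
    done
}

# Dry run: print the commands instead of executing them.
if command -v lsmod >/dev/null 2>&1; then
    for m in $(lsmod | nvidia_unload_order); do
        echo "rmmod $m"
    done
    echo "modprobe nvidia"
fi
```

The modules can only be removed while nothing is using them, so Xorg or the display manager has to be stopped first.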

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 45435
Location: 56N 3W

Posted: Tue Apr 21, 2020 9:33 am

Jaglover,

:) :)

I want to know what kernel is in use too. I still suspect it's not the kernel that the OP thinks it is.


icaruslnx,

When it works, what does
Code:
uname -a
show?

icaruslnx
n00b

Joined: 14 Apr 2009
Posts: 13

Posted: Tue Apr 21, 2020 1:04 pm

Kernel is the same as I've been using all month without issues
Code:
icaruslnx@Daedalus ~ $ uname -a
Linux Daedalus 5.4.14-gentoo #6 SMP PREEMPT Wed Apr 1 09:58:30 CDT 2020 x86_64 AMD FX(tm)-6300 Six-Core Processor AuthenticAMD GNU/Linux
icaruslnx@Daedalus ~ $ sudo eselect kernel list
Available kernel symlink targets:
  [1]   linux-4.19.97-gentoo
  [2]   linux-5.2.17-gentoo
  [3]   linux-5.3.18-gentoo
  [4]   linux-5.4.14-gentoo *
  [5]   linux-5.4.28-gentoo
  [6]   linux-5.5.2-gentoo

I do need to clean up my old kernels but I don't have a problem with a random kernel loading at boot

Indi008
n00b

Joined: 02 Feb 2020
Posts: 17

Posted: Wed Apr 22, 2020 10:28 am

Firstly thanks all for the help so far, I know you don't have to spend time helping random strangers so all help is very much appreciated.

It turns out I did originally set grub up, it was just late and I was panicking because it's the first time I've had a system that wouldn't boot.
Unfortunately it still won't boot. Although I can get on to an old kernel so that's something.

NeddySeagoon you are right, I did not run eselect so that is probably the initial cause of the problem. Given what version the log file was it is probably a good assumption that I was not on the kernel I thought I was.

What happened was I found this page: https://wiki.gentoo.org/wiki/Kernel/Rebuild before I found this page: https://wiki.gentoo.org/wiki/Kernel/Upgrade. I think the former is meant to be more a quick reference check but I followed that one when I should have followed the more detailed one.

I have now booted the old kernel (which is probably what I was on initially without realizing) and followed the instructions on the kernel upgrade page, including updating the bootloader. Then I rebooted, but I am getting a kernel panic when trying to boot the new kernel.

I am not sure how to copy that error over to share it, but the last line says:

Code:
---[ end kernel panic  - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---


I can still boot the old kernel though (just not run startx). I will look into how to share the error.

Here is dmesg: http://dpaste.com/27K9HRA

It does seem that the NVIDIA module is the new version, which explains why I can't run startx. Not sure why the new kernel won't boot though. Where do I look to find a more detailed boot error? Is there a way to get a paste of the boot error?

Indi008
n00b

Joined: 02 Feb 2020
Posts: 17

Posted: Wed Apr 22, 2020 10:52 am

Here is some more info

fstab: http://dpaste.com/0RCD42P

lspci -k: http://dpaste.com/2QQXD88

grub.cfg: http://dpaste.com/0RVMPK5

blkid: http://dpaste.com/08KG9N2


So I see in the grub.cfg that when it sets the root location for the old kernel version it uses the UUID, but for the new version it uses /dev/nvme0n1p4. That should be fine, but the fstab says that UUID is more reliable, so I might try changing the fstab to use the UUID.

I also see that the old kernels have an initrd line, so maybe I need to generate that for the new one somehow. I am not sure how to do that, but I think I should be able to find out with a little bit of searching.

I will give these a go and then report back.

Indi008
n00b

Joined: 02 Feb 2020
Posts: 17

Posted: Wed Apr 22, 2020 11:27 am

Yay, boots fine and startx works again :). Turns out I just needed to regenerate the initramfs to fix the boot problem, and make sure to run eselect to fix the startx problem. Thanks all very much for the help :).

Next upgrade should go much better and I am slowly learning where to find info about errors which is very useful.
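
The missing initrd line that gave the problem away can be looked for mechanically. The following is a sketch under assumptions: the function name is invented, the grub.cfg path is the one used earlier in this thread, and an entry without an initrd is only a problem when the kernel actually needs an initramfs to reach its root filesystem.

```shell
#!/bin/sh
# entries_without_initrd
# Read a grub.cfg on stdin and print the kernel image of every menuentry
# that has a "linux" line but no "initrd" line.
# Hypothetical helper for illustration.
entries_without_initrd() {
    awk '
        function report() { if (kernel != "" && !has_initrd) print kernel }
        /^[ \t]*menuentry[ \t]/ { report(); kernel = ""; has_initrd = 0 }
        /^[ \t]*linux[ \t]/     { kernel = $2 }
        /^[ \t]*initrd[ \t]/    { has_initrd = 1 }
        END { report() }
    '
}

cfg=/boot/grub/grub.cfg
if [ -r "$cfg" ]; then
    entries_without_initrd < "$cfg"
fi
```

The thread does not say which tool was used to regenerate the initramfs; genkernel and dracut are two common choices on Gentoo. Once the image is in /boot, rerunning `grub-mkconfig -o /boot/grub/grub.cfg` should let GRUB pick it up.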

All times are GMT