Gentoo Forums
CUDA doesn't recognize gpu? [SOLVED]
scorch_dev
n00b


Joined: 27 Mar 2017
Posts: 5

Posted: Sat Apr 01, 2017 12:58 pm    Post subject: CUDA doesn't recognize gpu? [SOLVED]

I'm running into issues after emerging the CUDA SDK and the CUDA toolkit. Both seem to emerge fine, but CUDA doesn't seem to recognize my GPU. If I run the demo program query_devices from /opt/cuda/demo_suite, it returns "Error 30 ('Unknown Error')" and then exits. I'm running a GTX 1080 with cuda toolkit version 8.0.61 and nvidia sdk version 8.0.61, with nvidia-drivers version 375.26. lspci and dmesg both seem to indicate that nothing is out of the ordinary, though.

Last edited by scorch_dev on Wed Apr 05, 2017 2:11 pm; edited 2 times in total
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3806
Location: Austro Bavaria

Posted: Sat Apr 01, 2017 10:00 pm

I'm not sure if this is related:

https://wiki.gentoo.org/wiki/NVidia/nvidia-drivers

Quote:
Driver fails to initialize when MSI interrupts are enabled

The Linux NVIDIA driver uses Message Signaled Interrupts (MSI) by default. This provides compatibility and scalability benefits, mainly due to the avoidance of IRQ sharing. Some systems have been seen to have problems supporting MSI, while working fine with virtual wire interrupts. These problems manifest as an inability to start X with the NVIDIA driver, or CUDA initialization failures.

MSI interrupts can be disabled via the NVIDIA kernel module parameter NVreg_EnableMSI=0. This can be set on the command line when loading the module, or more appropriately via the distribution's kernel module configuration files (such as those under /etc/modprobe.d/).
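
For reference, a minimal sketch of the module configuration the quote describes. It assumes the kernel module is named nvidia, and the file name nvidia.conf is only an illustration; the quote itself names only the /etc/modprobe.d/ directory and the NVreg_EnableMSI=0 parameter.

```
# /etc/modprobe.d/nvidia.conf  (file name is an assumption)
# Disable Message Signaled Interrupts for the NVIDIA kernel module.
options nvidia NVreg_EnableMSI=0
```

The option is read when the module is loaded, so the module has to be reloaded (or the machine rebooted) before it takes effect.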
scorch_dev
n00b


Joined: 27 Mar 2017
Posts: 5

Posted: Sun Apr 02, 2017 2:05 am

Roman_Gruber wrote:
I'm not sure if this is related:

https://wiki.gentoo.org/wiki/NVidia/nvidia-drivers

Quote:
Driver fails to initialize when MSI interrupts are enabled

The Linux NVIDIA driver uses Message Signaled Interrupts (MSI) by default. This provides compatibility and scalability benefits, mainly due to the avoidance of IRQ sharing. Some systems have been seen to have problems supporting MSI, while working fine with virtual wire interrupts. These problems manifest as an inability to start X with the NVIDIA driver, or CUDA initialization failures.

MSI interrupts can be disabled via the NVIDIA kernel module parameter NVreg_EnableMSI=0. This can be set on the command line when loading the module, or more appropriately via the distribution's kernel module configuration files (such as those under /etc/modprobe.d/).



Thanks for the advice, I can try this. I had run across this and tried disabling the MSI interrupts in-kernel, which didn't fix the problem, but I hadn't checked the /etc/modprobe.d/ directory for the nvidia configuration file. I'll try that and get back.

Also, since my last post I've tried a few things to no avail. I've re-emerged each of the cuda packages (toolkit as well as the sdk), the nvidia-drivers, and llvm. I also tried upgrading my nvidia-drivers package to 378.13 on the off chance this would fix it. Though I don't know the effect of it yet, because it caused an "NVRM: API mismatch" error on starting the x-server. I'll see if I can resolve that error by trawling the forums, and then check, once it is gone, whether the original error is fixed.
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3806
Location: Austro Bavaria

Posted: Sun Apr 02, 2017 4:47 am

scorch_dev wrote:

Though I don't know the effect of it yet, because it caused an "NVRM: API mismatch" error on starting the x-server.


emerge new kernel source
change /usr/src/linux symlink to new kernel source
build new kernel + initramfs when wanted + with always a new name for the kernel. I use a datecode which I append to my kernels

Code:
make --jobs 8 && make --jobs 8 modules_install


Code:
uname -a
Linux ASUS-G75VW 4.9.18-gentoo-28-03-2017 .....
This indicates I built that kernel on the 28th of March. Name the file accordingly, and also set the kernel feature that names the kernel!

boot new kernel (adapt bootloader by swapping kernel name and copy the files over)
verify new kernel is in use: uname -a
emerge nvidia-drivers
(usually reboot, but not needed anymore these days) lazy approach
use the x server

clean up old kernels / kernel sources / old kernel modules in /lib/modules/kernel-name-x-y-...

--

It is not strictly needed anymore these days, but the approach above means less fuss: a clean state.

I'm quite sure guys will post after me saying this is not needed, bla bla ...

When you want to be sure to always have a clean state, without fuss, do the approach above!

Do not use grub scripts of destruction. Adapt the bootloader by hand. It is just a plain text file, very easy to edit with e.g. nano.
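
As an illustration of the datecode naming described above, here is a small sketch that prints a suffix in the same DD-MM-YYYY form as the uname output. It assumes the "kernel feature" meant is the CONFIG_LOCALVERSION option; the helper name datecode_suffix is hypothetical, and the optional argument relies on GNU date's -d option.

```shell
#!/bin/sh
# Print a "-DD-MM-YYYY" suffix like the one in the uname output above.
# With no argument it uses today's date; with an argument it formats
# that date instead ("date -d" is a GNU date option).
datecode_suffix() {
    if [ -n "${1:-}" ]; then
        date -d "$1" +-%d-%m-%Y
    else
        date +-%d-%m-%Y
    fi
}

# Value one could enter for CONFIG_LOCALVERSION before building
# (assumption: this is the naming feature the post refers to).
echo "CONFIG_LOCALVERSION=\"$(datecode_suffix)\""
```

The suffix is set before running the make commands, so the kernel image and its /lib/modules directory carry the same name.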
scorch_dev
n00b


Joined: 27 Mar 2017
Posts: 5

Posted: Wed Apr 05, 2017 2:10 pm

Quote:

emerge new kernel source
change /usr/src/linux symlink to new kernel source
build new kernel + initramfs when wanted + with always a new name for the kernel. I use a datecode which I append to my kernels


This piece of advice led me to an important realization that ultimately led me to the solution after several days. When trying to rebuild my kernel to repair my NVRM error, I realized that my running kernel version didn't match the kernel set in my /usr/src/linux symlink: a quick uname -a showed that the kernel being loaded was version 4.9.6-r1, even though I had upgraded my kernel about two weeks ago to 4.9.16. So I was installing all of the drivers, kernel modules, etc. with a symlink that pointed to 4.9.16, but grub was loading 4.9.6-r1.
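
A minimal sketch of that check, for anyone who wants to script it. It assumes the sources directory is named linux-<release> (for example linux-4.9.16-gentoo) and that the running release starts with that same string; the function name kernel_matches_sources is hypothetical.

```shell
#!/bin/sh
# Succeeds when the running kernel release matches the sources that
# /usr/src/linux points at. Both arguments are optional and default
# to the values of the live system.
kernel_matches_sources() {
    running=${1:-$(uname -r)}
    target=${2:-$(readlink -f /usr/src/linux)}
    sources=${target##*/}       # e.g. linux-4.9.16-gentoo
    sources=${sources#linux-}   # e.g. 4.9.16-gentoo
    # An empty name means the symlink could not be resolved.
    [ -n "$sources" ] || return 1
    # Prefix match, so a local version suffix on the running kernel is fine.
    case "$running" in
        "$sources"*) return 0 ;;
        *) return 1 ;;
    esac
}

if kernel_matches_sources; then
    echo "running kernel matches /usr/src/linux"
else
    echo "mismatch: running $(uname -r), /usr/src/linux -> $(readlink -f /usr/src/linux)"
fi
```

A mismatch here means out-of-tree modules such as nvidia-drivers get built for a kernel other than the one that is actually booted.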

Apparently I had forgotten to update grub after my kernel update. A quick grub-mkconfig call fixed the boot issue, and I then re-emerged the nvidia-drivers, the cuda-toolkit, and the cuda-sdk. After all of this, I was able to successfully query the device and get everything back. Thanks for the help.
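
The fix can be sketched roughly as follows. The output path /boot/grub/grub.cfg and the full package atoms are assumptions (the post only names grub-mkconfig, nvidia-drivers, the cuda-toolkit and the cuda-sdk), and the run wrapper is a hypothetical helper that only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Print each command instead of running it unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

fix_after_kernel_change() {
    # Regenerate the GRUB menu so the new kernel is offered at boot
    # (output path is an assumption).
    run grub-mkconfig -o /boot/grub/grub.cfg
    # Rebuild the driver and the CUDA packages against the kernel
    # sources that /usr/src/linux points at (atoms are assumptions).
    run emerge --oneshot x11-drivers/nvidia-drivers \
        dev-util/nvidia-cuda-toolkit dev-util/nvidia-cuda-sdk
}

fix_after_kernel_change
```

Run as root with APPLY=1 to execute the commands for real, then reboot and confirm with uname -a that the new kernel is the one in use.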
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3806
Location: Austro Bavaria

Posted: Wed Apr 05, 2017 9:47 pm

I recommend updating grub by hand, it's just a plain text file. I usually just change the kernel name and the title of the boot section.
All times are GMT