Gentoo Forums
Issue with amdgpu card and powerplay since kernel update
ZenoOfElea
n00b
Joined: 20 Jan 2017
Posts: 4

Posted: Wed Apr 03, 2019 11:35 am

I have noticed a recent issue with my amdgpu-based R9 380 (Volcanic Islands series) graphics card. I am not sure whether a mistaken kernel configuration causes it or what exactly triggers the problem, but during boot, when DRM KMS takes over from the legacy 80x24 framebuffer, the system freezes for 15 seconds or so and the following is printed to the kernel message buffer.

Code:

[   27.574874] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[   27.574875] amdgpu: [powerplay] Error in phm_get_clock_info
[   27.575091] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[   27.575103] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[   27.575114] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[   27.575442] [drm] Display Core initialized with v3.1.59!
[   27.628472] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   27.628473] [drm] Driver supports precise vblank timestamp query.
[   27.677800] [drm] UVD initialized successfully.
[   27.888890] [drm] VCE initialized successfully.
[   27.890434] [drm] fb mappable at 0xE0E25000
[   27.890435] [drm] vram apper at 0xE0000000
[   27.890436] [drm] size 8294400
[   27.890436] [drm] fb depth is 24
[   27.890437] [drm]    pitch is 7680
[   27.890574] fbcon: amdgpudrmfb (fb0) is primary device
[   28.031451] Console: switching to colour frame buffer device 240x67
[   28.053547] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
[   28.410220] amdgpu: [powerplay]
                failed to send message 5d ret is 0
[   28.760175] amdgpu: [powerplay]
                last message was failed ret is 0
[   29.110123] amdgpu: [powerplay]
                failed to send message 148 ret is 0
[   29.809993] amdgpu: [powerplay]
                last message was failed ret is 0
[   30.159941] amdgpu: [powerplay]
                failed to send message 145 ret is 0
[   30.859815] amdgpu: [powerplay]
                last message was failed ret is 0
[   31.209777] amdgpu: [powerplay]
                failed to send message 146 ret is 0
[   31.568264] amdgpu: [powerplay]
                last message was failed ret is 0
[   31.914555] amdgpu: [powerplay]
                last message was failed ret is 0
[   31.920737] amdgpu: [powerplay]
                failed to send message 155 ret is 0
[   32.267039] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   32.273256] amdgpu: [powerplay]
                last message was failed ret is 0
[   32.625681] amdgpu: [powerplay]
                failed to send message 15b ret is 0
[   32.969465] amdgpu: [powerplay]
                last message was failed ret is 0
[   33.319269] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   34.018608] amdgpu: [powerplay]
                last message was failed ret is 0
[   34.368287] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   35.067659] amdgpu: [powerplay]
                last message was failed ret is 0
[   35.417341] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   36.116695] amdgpu: [powerplay]
                last message was failed ret is 0
[   36.466369] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   37.165719] amdgpu: [powerplay]
                last message was failed ret is 0
[   37.515390] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   38.214738] amdgpu: [powerplay]
                last message was failed ret is 0
[   38.564429] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   39.263779] amdgpu: [powerplay]
                last message was failed ret is 0
[   39.613451] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   40.312770] amdgpu: [powerplay]
                last message was failed ret is 0
[   40.662444] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   41.361799] amdgpu: [powerplay]
                last message was failed ret is 0
[   41.711474] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   42.410827] amdgpu: [powerplay]
                last message was failed ret is 0
[   42.760505] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   43.459853] amdgpu: [powerplay]
                last message was failed ret is 0
[   43.809519] amdgpu: [powerplay]
                failed to send message 260 ret is 0
[   43.809611] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:02:00.0 on minor 0
[   43.809974] [drm] Initialized i915 1.6.0 20180719 for 0000:00:02.0 on minor 1
[   43.825122] [drm] Cannot find any crtc or sizes
[   43.830205] [drm] Cannot find any crtc or sizes
[   43.835234] [drm] Cannot find any crtc or sizes
[   46.272965] amdgpu: [powerplay]
                last message was failed ret is 0
[   46.653008] amdgpu: [powerplay]
                failed to send message 154 ret is 0
[   47.658171] [drm:amdgpu_uvd_ring_test_ib [amdgpu]] *ERROR* amdgpu: (0)IB test timed out.
[   47.658205] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 12 (-110).
[   48.209963] amdgpu: [powerplay]
                last message was failed ret is 0
[   48.561668] amdgpu: [powerplay]
                failed to send message 15a ret is 0
[   48.561885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[   49.942961] amdgpu: [powerplay]
                last message was failed ret is 0
[   50.292627] amdgpu: [powerplay]
                failed to send message 15b ret is 0
[   50.957807] amdgpu: [powerplay]
                last message was failed ret is 0
[   51.307466] amdgpu: [powerplay]
                failed to send message 155 ret is 0


This only becomes an issue after the system boots, when I run the sensors program from the lm_sensors package. The sensors program works but causes a temporary freeze and spams the message buffer with:

Code:


[13835.186423] amdgpu: [powerplay]
                last message was failed ret is 0
[13835.537600] amdgpu: [powerplay]
                failed to send message 282 ret is 0
[13835.888783] amdgpu: [powerplay]
                last message was failed ret is 0
[13836.239905] amdgpu: [powerplay]
                failed to send message 170 ret is 0
[13836.591997] amdgpu: [powerplay]
                last message was failed ret is 0
[13836.943289] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13837.295225] amdgpu: [powerplay]
                last message was failed ret is 0
[13837.646192] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13837.998352] amdgpu: [powerplay]
                last message was failed ret is 0
[13838.349518] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13838.701678] amdgpu: [powerplay]
                last message was failed ret is 0
[13839.052848] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13839.404880] amdgpu: [powerplay]
                last message was failed ret is 0
[13839.755661] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13840.107747] amdgpu: [powerplay]
                last message was failed ret is 0
[13840.459014] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13840.811228] amdgpu: [powerplay]
                last message was failed ret is 0
[13841.162384] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13841.514520] amdgpu: [powerplay]
                last message was failed ret is 0
[13841.865626] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13842.217726] amdgpu: [powerplay]
                last message was failed ret is 0
[13842.569209] amdgpu: [powerplay]
                failed to send message 171 ret is 0
[13842.921494] amdgpu: [powerplay]
                last message was failed ret is 0
[13843.272727] amdgpu: [powerplay]
                failed to send message 171 ret is 0


I am at a loss as to how to tackle this problem; any suggestion or information would be greatly appreciated.
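For anyone comparing logs like the two above, here is a minimal Python sketch that tallies which message IDs appear in the "failed to send message" lines. The function name tally_failed_messages is made up for illustration, and treating the IDs as case-insensitive hexadecimal strings is an assumption based on values such as 5d and 15b in the output.

```python
import re
from collections import Counter

# Matches the second line of each two-line powerplay report, e.g.
# "                failed to send message 260 ret is 0".
# The "last message was failed" lines are deliberately not matched.
FAILED_SEND = re.compile(r"failed to send message ([0-9a-fA-F]+) ret is (\d+)")


def tally_failed_messages(log_text):
    """Count how often each message ID appears in 'failed to send' lines."""
    return Counter(m.group(1).lower() for m in FAILED_SEND.finditer(log_text))
```

Feeding it the text of dmesg and printing tally_failed_messages(text).most_common() should show at a glance which message dominates, for example 260 during the boot freeze and 171 when sensors runs.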
davee
n00b
Joined: 25 May 2019
Posts: 1

Posted: Sat May 25, 2019 9:38 pm

Hey,

I have had a similar issue with AMDGPU on my R9 290 (Sea Islands). While I don't have the same 15 second freeze on boot, I do get intermittent freezes during normal usage of my system. I have also narrowed this down to the lm_sensors package; specifically, the issue occurs after a failure to read the fan1 state.

Code:
# sensors -u
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:
  in0_input: 1.000
fan1:
ERROR: Can't get value of subfeature fan1_input: Can't read
temp1:
  temp1_input: 65.000
  temp1_crit: 104000.000
  temp1_crit_hyst: -273.150
power1:
  power1_average: 66.165
  power1_cap: 225.000


When this happens, I also get a similar powerplay error message in the amdgpu driver:
Code:
amdgpu: [powerplay]
 failed to send message 282 ret is 254


While this error has always been displayed for me, the freezing has only appeared since I updated the kernel from 4.19.27 to 4.19.44. I am looking for a more precise cause for this, but so far I have not found anything. Did you manage to get any further with your issue?
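To spot unreadable subfeatures like the fan1_input case above without reading the whole output by eye, a rough Python sketch follows. It assumes the ERROR line has been captured together with the regular output (sensors may print it to stderr, so something like 2>&1 could be needed), and the name unreadable_subfeatures is only illustrative.

```python
import re

# Error line as it appears in the `sensors -u` output quoted above.
UNREADABLE = re.compile(r"^ERROR: Can't get value of subfeature (\S+): (.+)$")


def unreadable_subfeatures(sensors_output):
    """Return (chip, subfeature, reason) for each subfeature that failed to read."""
    problems = []
    chip = None
    expect_chip = True  # first non-blank line of each block is the chip name
    for line in sensors_output.splitlines():
        stripped = line.strip()
        if not stripped:
            expect_chip = True  # a blank line ends the current chip block
            continue
        if expect_chip:
            chip = stripped
            expect_chip = False
            continue
        match = UNREADABLE.match(stripped)
        if match:
            problems.append((chip, match.group(1), match.group(2)))
    return problems
```

On the output quoted above this should report fan1_input under amdgpu-pci-0100, which narrows down which sysfs attribute is worth testing by hand.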
Goverp
l33t
Joined: 07 Mar 2007
Posts: 668

Posted: Sun May 26, 2019 9:17 pm

FWIW I too get a very annoying 15 sec freeze on booting. Mine is a Radeon RX570. Thanks for the hints about lm-sensors - I'll dig a little. AFAIR I got rid of that package because my old AMD Phenom motherboard tells lies, rendering lm-sensors useless.
_________________
Greybeard
miiichael
n00b
Joined: 12 Jun 2019
Posts: 1

Posted: Wed Jun 12, 2019 8:41 am

Hi,

For the benefit of posters here, and googlers in general, here are my discoveries. R9 290 on Debian (shhh, don't tell anyone!). 4.19.0 AMD64 kernel.

Anyway, I've just noticed that anything touching /sys/class/hwmon/hwmon3/power1_average causes the kernel error messages I get:

Code:
root@joyola:/home/michael# time cat "/sys/class/hwmon/hwmon3/power1_average";tail /var/log/kern.log|grep $(date +%T)
32140000

real    0m0.498s
user    0m0.000s
sys     0m0.497s
Jun 12 16:26:00 joyola kernel: [399556.316094] amdgpu: [powerplay]
Jun 12 16:26:00 joyola kernel: [399556.316094]  failed to send message 282 ret is 254


I found this out by strace'ing /usr/bin/sensors, which on my system is invoked half a dozen times every five minutes via munin-node.

This does confirm suspicions that this is a kernel issue (as opposed to the xorg driver, or other ancillary libraries, etc).

I can't comment on boot delays, as I don't really reboot often enough to be sure (plus I think my boot delay problems relate mostly to both eth0 and my ethernet over power waiting for the other to wake up before waking themselves up...).

Edited to add: BTW I have "radeon.cik_support=0 amdgpu.cik_support=1 radeon.si_support=0 amdgpu.si_support=1 amdgpu.dc_log=1 amdgpu.dc=0" set, if that matters.
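The same timing check can be run over every attribute in the hwmon directory instead of one file at a time. Below is a minimal Python sketch, assuming the card's hwmon directory is known (the hwmon3 index from the command above varies between systems); time_hwmon_reads is a hypothetical helper name, not part of any existing tool.

```python
import os
import time


def time_hwmon_reads(hwmon_dir):
    """Read each regular file in hwmon_dir once.

    Returns a list of (name, seconds, value_or_error), slowest read first.
    """
    results = []
    for name in sorted(os.listdir(hwmon_dir)):
        path = os.path.join(hwmon_dir, name)
        if not os.path.isfile(path):
            continue  # skip directories and symlinks to them (device, subsystem)
        start = time.perf_counter()
        try:
            with open(path, errors="replace") as handle:
                value = handle.read().strip()
        except OSError as exc:
            # An unreadable attribute (as with fan1_input) ends up here.
            value = "error: {}".format(exc.strerror)
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, value))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Calling time_hwmon_reads("/sys/class/hwmon/hwmon3") and printing the first few entries should put an attribute that stalls for about half a second, like power1_average above, at the top of the list.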
TigerJr
Guru
Joined: 19 Jun 2007
Posts: 504
Location: /dev/x0

Posted: Mon Jun 24, 2019 2:25 pm

miiichael wrote:
Hi,

For the benefit of posters here, and googlers in general, here are my discoveries. R9 290 on Debian (shhh, don't tell anyone!). 4.19.0 AMD64 kernel.

Anyway, I've just noticed that anything touching /sys/class/hwmon/hwmon3/power1_average causes the kernel error messages I get:

Code:
root@joyola:/home/michael# time cat "/sys/class/hwmon/hwmon3/power1_average";tail /var/log/kern.log|grep $(date +%T)
32140000

real    0m0.498s
user    0m0.000s
sys     0m0.497s
Jun 12 16:26:00 joyola kernel: [399556.316094] amdgpu: [powerplay]
Jun 12 16:26:00 joyola kernel: [399556.316094]  failed to send message 282 ret is 254


I found this out by strace'ing /usr/bin/sensors, which on my system is invoked half a dozen times every five minutes via munin-node.

This does confirm suspicions that this is a kernel issue (as opposed to the xorg driver, or other ancillary libraries, etc).

I can't comment on boot delays, as I don't really reboot often enough to be sure (plus I think my boot delay problems relate mostly to both eth0 and my ethernet over power waiting for the other to wake up before waking themselves up...).

Edited to add: BTW I have "radeon.cik_support=0 amdgpu.cik_support=1 radeon.si_support=0 amdgpu.si_support=1 amdgpu.dc_log=1 amdgpu.dc=0" set, if that matters.



I don't think grep $(date +%T) is a reliable way to tie kernel messages to their cause, but I'm sure the error is in the amdgpu kernel driver.

Just try a recent 5.0.x kernel, and if the error still occurs, post here.

cat /sys/class/hwmon/hwmon3/in0_input

Do you get the same message when you read the current voltage?
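One way to tie kernel messages to a single read more directly than matching wall-clock seconds is to snapshot the kernel log before and after the read and keep only the new lines. This is a rough Python sketch under a few assumptions: dmesg is readable by the current user, the ring buffer does not wrap between the two snapshots, and the helper names are invented for illustration. Unrelated messages logged in the same window would still show up.

```python
import subprocess


def dmesg_lines():
    """Return the current kernel ring buffer as a list of lines (may need root)."""
    out = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


def new_lines(before, after):
    """Return the lines in `after` that were not already present in `before`."""
    if after[:len(before)] == before:
        return after[len(before):]
    # Fallback if old lines were dropped from the buffer between snapshots.
    seen = set(before)
    return [line for line in after if line not in seen]


def kernel_messages_during_read(path, get_log=dmesg_lines):
    """Read `path` once and return (value, kernel log lines added meanwhile)."""
    before = get_log()
    with open(path) as handle:
        value = handle.read().strip()
    after = get_log()
    return value, new_lines(before, after)
```

Running kernel_messages_during_read on in0_input and then on power1_average would show whether reading the voltage triggers the same powerplay lines as reading the power value.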
_________________

Do not update portage without hotdog!

Xenogentooway?
Goverp
l33t
Joined: 07 Mar 2007
Posts: 668

Posted: Wed Jun 26, 2019 3:05 pm

You may find my partial solution of interest.
_________________
Greybeard
All times are GMT