Gentoo Forums
AMD TR 2950x qemu/kvm hard locks the kernel
myG3nt00
n00b
Joined: 23 Dec 2012
Posts: 2

Posted: Mon Jul 29, 2019 8:51 am    Post subject: AMD TR 2950x qemu/kvm hard locks the kernel

Hi!

I bought an AMD TR2 system to use with kvm for virtualizing a couple of machines, one of which runs Windows Server 2016. Once a day the kernel hard locks and not even the Magic SysRq keys work. The system is as follows:

CPU: AMD ThreadRipper 2950x
MB: Gigabyte X399 AORUS PRO (BIOS version: F2g, Update AGESA 1.1.0.2)
NVMe SSD: 2x Samsung 970 EVO Plus, 500 GB
OS: Arch Linux
Kernels: 5.2.3-arch1-1-ARCH, 4.19.61-1 (LTS)

Before that I had 2 other no-name NVMe disks that crashed the kernel in the same way every time I copied from one to the other. With the new Samsung disks I cannot reproduce that issue, but the system still hard locks once a day. The only way to bring it back is a hard reset.

Reading through various forums I tried the following:
1. Disable/enable IOMMU in BIOS (various kernel params: amd_iommu=on, amd_iommu=pt, amd_iommu=soft)
2. Kernel param for NVMe APST (power state transition) issues: nvme_core.default_ps_max_latency_us=0
3. Tried latest kernel and linux-lts
4. Compiled a kernel with IOMMU debugging options, PCI debugging, etc. Enabled panic on every oops to be able to catch the fault
5. Enabled kernel crash dumps on oops
6. My current kernel parameters are:
kernel.nmi_watchdog = 0 hugepagesz=1G hugepages=48 processor.max_cstate=1 rcu_nocbs=0-31 idle=nomwait nvme_core.default_ps_max_latency_us=0 clocksource=hpet amd_iommu=on iommu=pt pcie_acs_override=downstream vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 rd.driver.pre=vfio-pci netconsole=6665@192.168.2.20/br0,6666@192.168.2.1/c0:25:e9:0f:2a:e3 audit=0 loglevel=8 quiet
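
With this many options on one line it is easy to lose track of what is actually set. The following is a minimal Python sketch, not the kernel's own parser, that splits a command line on whitespace into name/value pairs and decodes the netconsole value, assuming the documented layout src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac. The function names parse_cmdline and parse_netconsole and the shortened EXAMPLE string are made up for illustration; the values in EXAMPLE are taken from the line above.

```python
import shlex

# Shortened example built from parameters quoted in the post.
EXAMPLE = (
    "hugepagesz=1G hugepages=48 amd_iommu=on iommu=pt "
    "netconsole=6665@192.168.2.20/br0,6666@192.168.2.1/c0:25:e9:0f:2a:e3 "
    "loglevel=8 quiet"
)


def parse_cmdline(cmdline):
    """Split a command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        if not name:
            # A stray "=" surrounded by spaces carries no parameter name.
            continue
        params[name] = value if sep else None
    return params


def parse_netconsole(value):
    """Decode 'src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac' into a dict."""
    source, _, target = value.partition(",")
    src_port, _, src_rest = source.partition("@")
    src_ip, _, device = src_rest.partition("/")
    tgt_port, _, tgt_rest = target.partition("@")
    tgt_ip, _, tgt_mac = tgt_rest.partition("/")
    return {
        "source_port": int(src_port) if src_port else None,
        "source_ip": src_ip or None,
        "device": device or None,
        "target_port": int(tgt_port) if tgt_port else None,
        "target_ip": tgt_ip or None,
        "target_mac": tgt_mac or None,
    }


if __name__ == "__main__":
    params = parse_cmdline(EXAMPLE)
    for name, value in sorted(params.items()):
        print(name, "=", value)
    print(parse_netconsole(params["netconsole"]))
```

On a running system the same functions could be fed the contents of /proc/cmdline instead of EXAMPLE, to compare what the bootloader passed with what was intended.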

No matter the configuration, the hang is always the same: no magic sysrq, no logs, no dump.
I am a developer but not a kernel developer, so I am asking whether there is any way to capture this hard lock, so that I can understand what the bug or the hardware issue is.
Is there any Gentoo or generic documentation on how to log these hard locks?

Thank you!
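
The netconsole parameter in the post sends kernel messages over UDP to 192.168.2.1, port 6666, so something has to be listening there for any output to be kept. Below is a minimal Python sketch of such a receiver, assuming the receiving host has Python 3; the function names open_listener, iter_messages and log_to_file are hypothetical. It cannot guarantee a capture: a hard lock may stop the kernel before any message leaves the machine.

```python
import socket


def open_listener(bind_ip="0.0.0.0", port=6666):
    """Open a UDP socket for netconsole datagrams (6666 is the port from the post)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    return sock


def iter_messages(sock, limit=None):
    """Yield (sender_ip, text) for each datagram; stop after `limit` if given."""
    count = 0
    while limit is None or count < limit:
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
        count += 1
        yield sender_ip, data.decode("utf-8", errors="replace").rstrip("\n")


def log_to_file(sock, path, limit=None):
    """Append every received message to `path`, flushing after each line."""
    with open(path, "a", encoding="utf-8") as log:
        for sender_ip, text in iter_messages(sock, limit):
            log.write(f"{sender_ip} {text}\n")
            log.flush()  # keep the file current in case the sender dies next


if __name__ == "__main__":
    # Runs until interrupted; the log file name is an arbitrary choice.
    log_to_file(open_listener(), "netconsole.log")
```

Flushing after every line is deliberate: the last messages before a lockup are the interesting ones, so they should reach the file as soon as they arrive.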
All times are GMT