[SOLVED] Dell E6510, intel_iommu=on causes hang during boot
Tux's lil' helper

Joined: 04 Jun 2012
Posts: 93
Location: Virginia

PostPosted: Wed May 14, 2014 5:19 pm

I am trying to get PCI passthrough to work on a Dell Latitude E6510, without success. I understand I need to have IOMMU enabled in order to do so. After enabling it in the kernel configuration (but leaving it disabled by default), I can boot the computer and OpenRC finishes starting all services fine. If I modify the kernel parameters at boot to include intel_iommu=on, the OpenRC service startup hangs at seemingly random points, but no later than "Wiping /tmp directory". I disabled parallel service starting in rc.conf to make sure that wasn't the problem. When the system hangs, Ctrl-Alt-Del does nothing, and the system does not respond to SysRq commands. I see no output of the kernel panicking or crashing with an Oops, so I have no clue as to what is going on.

Does anyone have any idea what could be going on? What kind of logs should I be looking for? I haven't done in-depth kernel debugging before, so I'm not sure how to use things like netconsole to capture debugging information. As soon as I can get a netconsole log, though, I'll post it.
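For anyone else who has not used netconsole before, here is a rough sketch of the two pieces involved: the boot parameter on the hanging machine and a UDP listener on a second machine. The parameter layout follows the kernel's netconsole documentation (netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]); the IP addresses, the interface name and both helper names are assumptions made up for illustration, and netconsole has to be active before the point where the boot hangs for any of this to help.

```python
import socket


def netconsole_param(tgt_ip, tgt_mac="", tgt_port=6666,
                     src_ip="", src_port=6665, dev="eth0"):
    """Build a netconsole= kernel parameter (hypothetical helper).

    The defaults (ports 6665/6666, eth0, empty MAC meaning broadcast)
    are the ones listed in the kernel's netconsole documentation.
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def listen(port=6666, bind_addr="0.0.0.0"):
    """Print every UDP datagram that arrives on `port`.

    Run this on the receiving machine; stop it with Ctrl-C.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, _sender = sock.recvfrom(65535)
            print(data.decode("utf-8", errors="replace"), end="", flush=True)


if __name__ == "__main__":
    # Both addresses are placeholders: .20 is the laptop that hangs,
    # .10 is the machine that collects the log.
    print(netconsole_param("192.168.1.10", src_ip="192.168.1.20"))
```

The printed string goes on the kernel command line of the laptop, and listen() is started on the other machine before rebooting.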

With regard to system information, the computer in question is a Dell Latitude E6510 with an i7 740QM and 8 GB of RAM. It is running Gentoo ~amd64 with OpenRC as the init system and KDE as the desktop environment. The kernel is 3.14.4. The kernel config is here on Pastebin.

EDIT: emerge --info:
Portage 2.2.10 (default/linux/amd64/13.0/desktop/kde, gcc-4.8.2, glibc-2.19, 3.14.4-gentoo x86_64)
System uname: Linux-3.14.4-gentoo-x86_64-Intel-R-_Core-TM-_i7_CPU_Q_740_@_1.73GHz-with-gentoo-2.2
KiB Mem:     8099972 total,   7094380 free
KiB Swap:   16777212 total,  16777212 free
Timestamp of tree: Wed, 14 May 2014 12:00:01 +0000
ld GNU ld (GNU Binutils) 2.24
app-shells/bash:          4.2_p47
dev-java/java-config:     2.2.0
dev-lang/python:          2.7.6-r1, 3.3.5, 3.4.0
dev-util/pkgconfig:       0.28-r1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.12.4
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.14.1
sys-devel/binutils:       2.24-r2
sys-devel/gcc:            4.8.2
sys-devel/gcc-config:     1.8
sys-devel/libtool:        2.4.2-r1
sys-devel/make:           4.0-r1
sys-kernel/linux-headers: 3.14 (virtual/os-headers)
sys-libs/glibc:           2.19
Repositories: gentoo Local-Overlay
ACCEPT_KEYWORDS="amd64 ~amd64"
CFLAGS="-O2 -pipe -march=corei7 -fomit-frame-pointer"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe -march=corei7 -fomit-frame-pointer"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync"
FFLAGS="-O2 -pipe"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
USE="X a52 aac acl acpi alsa amd64 berkdb bindist bluetooth branding bzip2 cairo cdda cdr cli consolekit cracklib crypt cups cxx dbus declarative dri dts dvd dvdr emboss encode exif fam firefox flac fortran gdbm gif gpm gtk iconv ipv6 jpeg kde kipi lcms ldap libnotify mad mmx mng modules mp3 mp4 mpeg multilib ncurses nls nptl ogg opengl openmp pam pango pcre pdf phonon plasma png policykit ppds qt3support qt4 readline sdl semantic-desktop session spell sse sse2 sse3 sse4 sse4_1 sse4_2 ssl ssse3 startup-notification svg tcpd tiff truetype udev udisks unicode upower usb vorbis wxwidgets x264 xcb xcomposite xinerama xml xscreensaver xv xvid zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="evdev synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-5" 
PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_3 python3_4" RUBY_TARGETS="ruby19 ruby20" USERLAND="GNU" VIDEO_CARDS="nouveau" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
USE_PYTHON="2.7 3.3"

EDIT2: I forgot to mention that if I boot the kernel with the added parameters "intel_iommu=on single init=/bin/bash" it does boot to bash fine, except I'm left in a read-only environment as would be expected due to my other kernel parameters (among them "ro"). I think this means the kernel boots fine, but then something goes south later... I wonder if it has to do with disk writes? I don't know why writing to disk would cause a problem though. Just a thought.

Joined: 05 Aug 2006
Posts: 2152
Location: Berlin, Germany

PostPosted: Thu May 15, 2014 1:55 pm

intel_iommu is known to cause problems with NVIDIA graphics cards. If your laptop has switchable graphics/Optimus, can you try turning off the NVIDIA graphics?

Apart from that, you can check if there is a BIOS upgrade available.
Joined: 16 Sep 2005
Posts: 1368
Location: Montréal

PostPosted: Thu May 15, 2014 7:06 pm

I was not able to boot with my personal kernel configuration of gentoo-sources on an Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz, where IOMMU is enabled by default. Passing the kernel parameter

makes it boot without any problem. To be more specific, the only other kernel parameter that boots is

It disables IOMMU only for my Intel video card, which was, in all likelihood, where the DMA remapping problem was. The rest of the IOMMU features stay on. So, if you enable IOMMU by default in the kernel configuration and pass intel_iommu=igfx_off as a kernel parameter, it may solve the IOMMU problem and keep IOMMU for PCI passthrough.
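To confirm which intel_iommu options the running kernel was actually booted with, a small check along these lines may help. It is only a sketch: the function names and the sample command line are made up for illustration, and it assumes options can be combined with commas and that their names are compared exactly (lowercase), so a misspelled option would simply not match.

```python
def intel_iommu_options(cmdline):
    """Return the options of every intel_iommu= argument, in order."""
    options = []
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == "intel_iommu" and sep:
            # Assumption: several options may be joined with commas.
            options.extend(opt for opt in value.split(",") if opt)
    return options


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently running kernel."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical command line; on a live system use read_cmdline().
    sample = "root=/dev/sda2 ro intel_iommu=on,igfx_off"
    opts = intel_iommu_options(sample)
    print(opts)
    # Exact comparison on purpose: "Igfx_off" would not count here.
    print("igfx_off" in opts)
```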
Tux's lil' helper

Joined: 04 Jun 2012
Posts: 93
Location: Virginia

PostPosted: Thu May 15, 2014 8:17 pm

I upgraded to the newest BIOS/UEFI for this laptop, but it did not help. The laptop does have an NVIDIA card (a rather old one), so I will try disabling it in the near future (the UEFI/BIOS setup does not have an option to turn it off). Currently lspci is not returning anything about an Intel VGA or video controller, which makes me wonder whether I have Intel integrated graphics available at all. I could have just forgotten to enable it in the kernel, and as such lspci fails to pick it up. Maybe.

I tried your suggestion of intel_iommu=igfx_off, but it did not work... I wonder if it has something to do with the fact I'm using UEFI and not BIOS? Thanks for the suggestion, though.

I'll keep poking at this as I have time. I guess next on the list of things to get is an actual netconsole log output and maybe enable more debugging output from the kernel. Thanks for the help!
Tux's lil' helper

Joined: 04 Jun 2012
Posts: 93
Location: Virginia

PostPosted: Fri May 16, 2014 12:54 pm

I still don't have a netconsole output, but I do have something new. I booted the laptop this morning with intel_iommu enabled, and it didn't quite crash, but it wasn't advancing in the boot process. I started to give it Sysrq commands to reboot, and one of them, I think Alt-Sysrq-e, triggered a massive onslaught of messages on my screen. I was able to capture a photo on my phone (scroll lock wasn't working, and any further Sysrq commands were being ignored) and I was able to read the messages:
dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr fffff000
DMAR:[fault reason 02] Present bit in context entry is clear

Some Googling led me to this bug report, which details some problems with bad hardware on a Ricoh card reader, which I do have in my system. Specifically, device 04:00.0 is:
04:00.0 CardBus bridge: Ricoh Co Ltd CardBus bridge (rev 02)
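In case it helps someone with a similar log, the step from the fault message to the device can be sketched as below. The regular expression is written only against the two lines quoted above, so other kernels may word the message differently; the function names are made up for illustration. The bus address it extracts is what lspci accepts with its -s option.

```python
import re

# Matches the fault line quoted above; the format on other kernel
# versions may differ, so treat this pattern as an assumption.
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<direction>Read|Write)\] Request device "
    r"\[(?P<bdf>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\]"
    r"\s+fault addr (?P<addr>[0-9a-fA-F]+)"
)


def parse_dmar_faults(log_text):
    """Return (direction, bus:device.function, fault address) tuples."""
    return [
        (m["direction"], m["bdf"], int(m["addr"], 16))
        for m in FAULT_RE.finditer(log_text)
    ]


def lspci_command(bdf):
    """Build the lspci invocation that shows only the given device."""
    return ["lspci", "-s", bdf]


if __name__ == "__main__":
    log = (
        "dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr fffff000\n"
        "DMAR:[fault reason 02] Present bit in context entry is clear\n"
    )
    for direction, bdf, addr in parse_dmar_faults(log):
        print(direction, bdf, hex(addr), " ".join(lspci_command(bdf)))
```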

Yay for bad hardware! So, I disabled the stupid device in BIOS/UEFI (thankfully it had an option there), and the kernel booted just fine. I guess this is a valid workaround, but not a good solution. The Red Hat bug report makes the comment that there isn't a good way to deal with the kind of problems this hardware has, so there may not be a better solution than to work around it. Since the kernel is booting I will mark this thread as solved. If IOMMU doesn't work as intended, I'll open another thread :-) . Thanks for the help!
