Gentoo Forums
grub gets 'stuck' in discrete graphics mode
thomasZhang
n00b
Joined: 14 Dec 2019
Posts: 7

Posted: Sat Dec 14, 2019 9:19 am    Post subject: grub gets 'stuck' in discrete graphics mode

Hi,
This is my first topic here.

After installing following the manual, GRUB 2 hangs after the 'Loading initial ramdisk ...' message. The keyboard accepts no input and the cursor is not blinking.
I've searched Google but found nothing matching this exact problem.

My guess is that either GRUB is stuck or I don't have a working terminal (a kernel graphics problem).

1. GRUB is installed in EFI mode: GRUB_PLATFORMS="efi-64"
2. lspci: http://dpaste.com/0PAYDMG
3. Kernel config: http://dpaste.com/2M8WGA8
4. Partitions: http://dpaste.com/2MVAJ6B
5. grub.cfg: http://dpaste.com/23FWDKR

In the BIOS, the 'graphic device' setting has two options: "discrete graphics" and "switchable graphics".
If "discrete graphics" is chosen, GRUB hangs; maybe the graphics driver failed, I don't know.
If "switchable graphics" is chosen, GRUB boots the kernel.

How can I get a working system with 'discrete graphics' (NVIDIA RTX 2060)?

Thanks very much!

----------- Extra (possibly irrelevant) notes on what I have done ----
1. I added some grub_dprintf debug messages to grub_efi_finish_boot_services() in GRUB's kern/efi/mm.c:
Code:

grub_err_t
grub_efi_finish_boot_services (grub_efi_uintn_t *outbuf_size, void *outbuf,
                               grub_efi_uintn_t *map_key,
                               grub_efi_uintn_t *efi_desc_size,
                               grub_efi_uint32_t *efi_desc_version)
{
  grub_efi_boot_services_t *b;
  grub_efi_status_t status;

#if defined (__i386__) || defined (__x86_64__)
  const grub_uint16_t apple[] = { 'A', 'p', 'p', 'l', 'e' };
  int is_apple;

  is_apple = (grub_memcmp (grub_efi_system_table->firmware_vendor,
                           apple, sizeof (apple)) == 0);
#endif

  /* finish_mmap_size, finish_mmap_buf, finish_key, finish_desc_size and
     finish_desc_version are declared outside this function in kern/efi/mm.c.  */
  while (1)
    {
      /* Debug: grub_efi_uintn_t values are cast to unsigned long for %lx,
         and pointers are printed with %p instead of %x.  */
      grub_dprintf ("efi", "mmap_size=0x%lx, buf=%p, desc_sz=0x%lx\n",
                    (unsigned long) finish_mmap_size, finish_mmap_buf,
                    (unsigned long) finish_desc_size);
      if (grub_efi_get_memory_map (&finish_mmap_size, finish_mmap_buf, &finish_key,
                                   &finish_desc_size, &finish_desc_version) < 0)
        return grub_error (GRUB_ERR_IO, "couldn't retrieve memory map");

      if (outbuf && *outbuf_size < finish_mmap_size)
        return grub_error (GRUB_ERR_IO, "memory map buffer is too small");

      finish_mmap_buf = grub_malloc (finish_mmap_size);
      if (!finish_mmap_buf)
        return grub_errno;

      if (grub_efi_get_memory_map (&finish_mmap_size, finish_mmap_buf, &finish_key,
                                   &finish_desc_size, &finish_desc_version) <= 0)
        {
          grub_free (finish_mmap_buf);
          return grub_error (GRUB_ERR_IO, "couldn't retrieve memory map");
        }

      b = grub_efi_system_table->boot_services;
      /* Caution: this debug print runs between GetMemoryMap() and
         ExitBootServices().  If printing makes the firmware allocate memory,
         finish_key goes stale and ExitBootServices() fails with
         EFI_INVALID_PARAMETER (0x8000000000000002 on x86_64) on every retry.  */
      grub_dprintf ("efi", "before efi_call_2 %p %p\n",
                    (void *) b, (void *) b->exit_boot_services);
      status = efi_call_2 (b->exit_boot_services, grub_efi_image_handle,
                           finish_key);
      grub_dprintf ("efi", "exit_boot_services return 0x%lx\n",
                    (unsigned long) status);
      if (status == GRUB_EFI_SUCCESS)
        break;

      if (status != GRUB_EFI_INVALID_PARAMETER)
        {
          grub_free (finish_mmap_buf);
          return grub_error (GRUB_ERR_IO, "couldn't terminate EFI services");
        }

      grub_free (finish_mmap_buf);
      grub_printf ("Trying to terminate EFI services again\n");
    }
  grub_efi_is_finished = 1;
  if (outbuf_size)
    *outbuf_size = finish_mmap_size;
  if (outbuf)
    grub_memcpy (outbuf, finish_mmap_buf, finish_mmap_size);
  if (map_key)
    *map_key = finish_key;
  if (efi_desc_size)
    *efi_desc_size = finish_desc_size;
  if (efi_desc_version)
    *efi_desc_version = finish_desc_version;

#if defined (__i386__) || defined (__x86_64__)
  if (is_apple)
    stop_broadcom ();
#endif

  return GRUB_ERR_NONE;
}

and enabled the debug messages with:
Code:
set debug="linux efi err"

The output looks as if GRUB is stuck in a loop (I don't know how to attach a picture, so I typed it out):

kern/efi/mm.c 170 mmap_size=0x8d0,buf=0x1eb8d9e0,desc_sz=0x30
kern/efi/mm.c 192 before efi_call2 0x324eb.. 0x...
kern/efi/mm.c 196: exit_boot_services return 0x2
kern/efi/mm.c 207: Trying to terminate EFI services again
kern/efi/mm.c 170 mmap_size=0x8d0,buf=0x1eb8d9e0,desc_sz=0x30
kern/efi/mm.c 192 before efi_call2 0x324eb.. 0x...
kern/efi/mm.c 196: exit_boot_services return 0x2
kern/efi/mm.c 207: Trying to terminate EFI services again
... loop forever...

At first I was misled by it: I thought GRUB was stuck in an infinite loop, but when I removed the 'set debug' line, the system booted OK (GRUB boots the kernel and the login prompt appears).
In the end I found that whether the system boots with a working terminal depends only on the 'graphic device' selection. These dprintf calls seem to have a bad side effect on the boot process.
Ionen
l33t
Joined: 06 Dec 2018
Posts: 691

Posted: Sat Dec 14, 2019 10:16 am

Maybe someone else has a better GRUB-specific answer, but there's always the option of not using GRUB. I liked replacing LILO with GRUB back on my MBR setups, but on a UEFI system I've preferred to avoid it. You can either boot an EFI stub kernel directly, making your BIOS the boot manager with nothing in between (nice if it has an easy-to-use boot menu, e.g. after pressing a key; note that if you always use the same kernel location(s), there's no need to re-edit EFI variables), or, if you want or need something more powerful, there's rEFInd. I've never tried it, but I typically hear good things about it.

Also, welcome to Gentoo and the forums :)

Edit: I guess I should have caught that it was really a kernel issue :oops:


Last edited by Ionen on Sat Dec 14, 2019 12:59 pm; edited 1 time in total
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 45383
Location: 56N 3W

Posted: Sat Dec 14, 2019 12:11 pm

thomasZhang,

Welcome to Gentoo.

If GRUB can boot the kernel at all, GRUB is doing its thing. It's a kernel problem.
'Loading initial ramdisk ...' is the last message from GRUB.
After that it jumps to the kernel, and it's just the kernel and initrd in memory that bring up your system.

When "switchable graphics" is chosen both your Intel and nVidia GPUs can be used.

But lspci only shows
Code:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106BM [GeForce RTX 2060 Mobile] [10de:1f51] (rev a1)


lspci says
Code:
Intel Corporation Cannon Lake
but your kernel is missing CONFIG_PINCTRL_CANNONLAKE.
That's hidden in the menu until CONFIG_PINCTRL is enabled, and your config has
Code:
# CONFIG_PINCTRL is not set

Some things in your system will not work without that.

When you fix your kernel, also turn on
Code:
# CONFIG_FB_EFI is not set
# CONFIG_FB_SIMPLE is not set


I think your system is booting properly with either BIOS setting, but with discrete graphics you can't see it.
If you have another system, boot so you have a console. Set up ssh and connect via ssh, so you know that ssh works.
Flip the BIOS setting and boot. Does ssh work now?

When you post back, the output of
Code:
uname -a
and
Code:
lspci -nnk

will be useful. Also put your
Code:
dmesg
output onto a pastebin please.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
thomasZhang

Posted: Sat Dec 14, 2019 12:35 pm

Ionen, thanks for your tip about the EFI stub. I will try it after I figure out this issue.
thomasZhang

Posted: Sat Dec 14, 2019 1:09 pm

Hi Neddy,
Thanks.
With 'discrete graphics' chosen in the BIOS and booting from the 'gentoo minimal install' media, lspci only shows the NVIDIA GPU. I tried and found that the 'gentoo minimal install' media does not work with 'switchable graphics'.

dmesg from the 'gentoo minimal install' system: http://dpaste.com/2X2J6PX
lspci -nnk from the 'gentoo minimal install' system: http://dpaste.com/2NSKV6P

Thanks for your info on the kernel config. I've added those options since your post.

After I set up the network successfully, I will check with ssh.
NeddySeagoon

Posted: Sat Dec 14, 2019 1:55 pm

thomasZhang,

I think I'm misunderstanding.

I read your post as saying that you had successfully installed Gentoo, but your Gentoo install will only work with the BIOS set one way.
Is that correct, or are you still booting the liveCD to install?

I wanted to see your dmesg, your lspci -nnk and your uname -a from your own Gentoo install.
The k in lspci -nnk shows the kernel modules being used to drive the hardware.
Code:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106BM [GeForce RTX 2060 Mobile] [10de:1f51] (rev a1)
   Subsystem: Lenovo TU106BM [GeForce RTX 2060 Mobile] [17aa:3fee]
01:00.1 Audio device ...
shows that there is no driver in use for your nVidia card when you boot the liveCD.

If you are still installing, boot the liveCD with whatever works and make your install.
We can make it work with either BIOS setting later.
thomasZhang

Posted: Sat Dec 14, 2019 2:38 pm

Hi Neddy,

Booting my system from the hard disk with 'switchable graphics', lspci shows two VGA controllers; it seems that in 'discrete graphics' mode the BIOS disables one of them.

lspci, uname: http://dpaste.com/16W7Z9J
dmesg: http://dpaste.com/09YZFVB

You are right! The system boots normally with both settings.
lspci: http://dpaste.com/1QN3T0Q
dmesg: http://dpaste.com/1HY18X4

Code:
[    8.255987] nouveau 0000:01:00.0: unknown chipset (166000a1)
[    8.255993] nouveau: probe of 0000:01:00.0 failed with error -12


It's a graphics driver issue.

Thanks very much for your help, Neddy and Ionen!

[Moderator edit: added [code] tags to preserve output layout. -Hu]
thomasZhang

Posted: Sat Dec 14, 2019 2:50 pm

Neddy,
my input is a bit messy, and I am not good at English. The installation had reached the last step (GRUB installed, then reboot) when I posted. 'Discrete' is the default setting in the BIOS, so I stopped before this 'blind' boot; then I tried again and again and found that 'switchable' gives me a working console.

Anyway, I've now finished the installation and got the wired network working. The results show that it's a graphics driver issue; I will try the proprietary driver from NVIDIA...
NeddySeagoon

Posted: Sat Dec 14, 2019 3:18 pm

thomasZhang,

Code:
#6 SMP Sat Dec 14 21:14:22 -00 2019

That's good, it's today. Many posters here fix their problems by rebuilding the kernel, then don't know it because they boot the old kernel.
Checking this can save many hours.

Code:
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 630 (Mobile) [8086:3e9b]
   Subsystem: Lenovo UHD Graphics 630 (Mobile) [17aa:3fee]
   Kernel driver in use: i915

That's what I suspected. It's good to see it confirmed.
The i915 kernel module gets you a framebuffer console.
Code:
[    8.053890] fbcon: inteldrmfb (fb0) is primary device

nouveau does something similar when it works.

While we are here
Code:
00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 07)
   Subsystem: Lenovo Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [17aa:3844]
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model [8086:1911]
   Subsystem: Lenovo Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model [17aa:386c]
00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Lake PCH Thermal Controller [8086:a379] (rev 10)
   Subsystem: Lenovo Cannon Lake PCH Thermal Controller [17aa:383d]

You are missing the drivers for your thermal subsystem.
Putting "8086:1903 kernel" into Google says that you need
Code:
CONFIG_INT340X_THERMAL
CONFIG_INTEL_PCH_THERMAL

I don't get any hits for 8086:1911.

Code:
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 [8086:a368] (rev 10)
   Subsystem: Lenovo Cannon Lake PCH Serial IO I2C Controller [17aa:382f]
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 [8086:a369] (rev 10)
   Subsystem: Lenovo Cannon Lake PCH Serial IO I2C Controller [17aa:3830]
You are also missing the drivers for your I2C subsystem.
That requires
Code:
CONFIG_MFD_INTEL_LPSS_PCI


I'm not sure if
Code:
00:1e.0 Communication controller [0780]: Intel Corporation Device [8086:a328] (rev 10)
   Subsystem: Lenovo Device [17aa:380b]
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:a30d] (rev 10)
   Subsystem: Lenovo Device [17aa:3805]
need drivers.
I was trying to cheat by comparing with the
Code:
lspci -nnk
from the liveCD, but it's no help.

That's all the easy ones.

Code:
[ 8.255987] nouveau 0000:01:00.0: unknown chipset (166000a1)
[ 8.255993] nouveau: probe of 0000:01:00.0 failed with error -12

That means that the kernel does not know how to drive your nVidia GPU.
Try a newer kernel. The Linux version 4.19.86-gentoo you have is a Long Term Support (LTS) kernel.
Unmask the testing gentoo-sources. It's version 5.4.x today.

The nvidia-drivers binary driver is for Xorg only. It does not provide a console. If you want to use nvidia-drivers, you must use either the plain old VGA console or the EFI or Simple framebuffers.
First step is a more up to date kernel with those extra options enabled.
thomasZhang

Posted: Sun Dec 15, 2019 5:16 pm

Neddy,
thanks for your kind help.

The graphics are corrupted with 5.4.1. It's OK if the 'nomodeset' kernel parameter is added.

Code:
uname -a
Linux tgt 5.4.1-gentoo #1 SMP Mon Dec 16 00:09:13 CST 2019 x86_64 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz GenuineIntel GNU/Linux


.config: http://dpaste.com/2P0JV5Y
lspci: http://dpaste.com/1Q2R0YR


I went back to check the 4.19.x kernel (with the new grub.cfg generated during the 5.4.1 installation): the console works in both 'discrete' and 'switchable' modes.
I compared the grub.cfg files: 'gfxpayload=text' was in yesterday's cfg, and it affected whether the console was shown.

So the truth is that I was missing the framebuffer CONFIG in the kernel? ;)
NeddySeagoon

Posted: Sun Dec 15, 2019 6:23 pm

thomasZhang,

Code:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106BM [GeForce RTX 2060 Mobile] [10de:1f51] (rev a1)
   Subsystem: Lenovo TU106BM [GeForce RTX 2060 Mobile] [17aa:3fee]
   Kernel driver in use: nouveau


That says that the nouveau kernel module is in use for your nVidia GPU. That's good.

You have a USB-C port
Code:
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller [10de:1adb] (rev a1)
   Subsystem: Lenovo TU106 USB Type-C Port Policy Controller [17aa:3ffe]

That needs the kernel option
Code:
# CONFIG_TYPEC is not set
turned on if you want to use it.

The rest looks good.

nomodeset defeats the purpose. nouveau uses kernel modesetting, and nomodeset turns that off.
Xorg won't work very well with nomodeset.

What does dmesg show with your 5.4.1 kernel without the nomodeset option?

The kernel nouveau driver in your 4.19 kernel is too old for your GPU, hence no console at all.
It looks like nouveau in 5.4.1 is better but may not yet be good enough. Your dmesg will tell more.
thomasZhang

Posted: Mon Dec 16, 2019 3:23 pm

Neddy,
Sorry for the late response.

Here is the dmesg: http://dpaste.com/3GCRV57
NeddySeagoon

Posted: Mon Dec 16, 2019 5:17 pm

thomasZhang,

Code:
[    8.260667] fb0: switching to nouveaufb from EFI VGA

That looks OK.

Code:
[    8.916301] nouveau 0000:01:00.0: disp: chid 1 stat 00005080 reason 5 [INVALID_STATE] mthd 0200 data 00000001 code 0000002e
...
[   10.915590] nouveau 0000:01:00.0: DRM: core notifier timeout
[   12.915585] nouveau 0000:01:00.0: DRM: wndw-0: timeout

I don't know if that matters or not.

Google hints that it does matter and that it's kernel related.
Two things to try: downgrade your kernel to 5.3.x, or try the latest release candidate from kernel.org, which is 5.5-rc2 today.
rc kernels are expected to have bugs. Some of them can be nasty too, so the downgrade is the preferred option.

Downgrade further if you need to, but stay with the 5 series kernels. Any kernel that reports your video card as UNKNOWN is too old.
All times are GMT