Gentoo Forums
LSI3108 driver failure -- disks not available after boot

Gentoo Forums Forum Index Kernel & Hardware

dbishop
Tux's lil' helper


Joined: 08 Dec 2007
Posts: 99

Posted: Wed Jan 13, 2016 2:26 pm    Post subject: LSI3108 driver failure -- disks not available after boot

I have a new system that uses a Supermicro AOM-S3108M-H8 SAS controller based on an LSI 3108 "invader" part.

The firmware (F/W) seems fine when it runs as a BIOS option ROM: it correctly recognizes the attached disks. When my kernel (gentoo-sources 4.1.12) boots, the megaraid_sas driver loads but reports this in dmesg:

Code:
machine ~ # dmesg | grep -C 2 megasas
[    7.709375] hid-generic 0003:0557:2419.0004: input: USB HID v1.00 Mouse [HID 0557:2419] on usb-0000:00:14.0-13.1/input1
[    8.957241] random: systemd-udevd urandom read with 70 bits of entropy available
[    9.616812] megasas: 06.806.08.00-rc1
[    9.616888] megasas: 0x1000:0x005d:0x15d9:0x0809: bus 3:slot 0:func 0
[    9.617442] megasas: Waiting for FW to come to ready state
[    9.629500] megasas: FW in FAULT state!!
[    9.629503] megaraid_sas 0000:03:00.0: megasas: FW restarted successfully from megasas_init_fw!
[    9.807533] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 4.0.1-k
[    9.807534] ixgbe: Copyright (c) 1999-2014 Intel Corporation.
--
[   11.170768] ixgbe 0000:01:00.0 dmz: renamed from eth0
[   23.430286] random: nonblocking pool is initialized
[   39.662901] megasas: Waiting for FW to come to ready state
[   39.662903] megasas: FW in FAULT state!!
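A minimal sketch for pulling the fault reports out of a kernel log. It assumes the `[ seconds.micro] message` timestamp format shown above, and the function names `megasas_fault_count` and `megasas_fault_times` are hypothetical helpers, not part of any driver tooling. Applied to the log above, it makes it easy to see that the driver hit a second FAULT roughly 30 seconds after reporting a successful restart.

```shell
#!/bin/sh
# Hypothetical helpers: read kernel log lines (e.g. from dmesg) on stdin.

# Print how many times megaraid_sas reported the controller firmware in FAULT state.
megasas_fault_count() {
    # grep -c prints 0 but exits 1 when nothing matches; keep the exit status 0.
    grep -c 'megasas: FW in FAULT state' || true
}

# Print the timestamp (seconds since boot) of each FAULT report, one per line.
megasas_fault_times() {
    sed -n 's/^\[ *\([0-9.]*\)] megasas: FW in FAULT state.*/\1/p'
}
```

For example, `dmesg | megasas_fault_times` on the system above would be expected to list 9.629500 and 39.662903.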


The output from lspci is this:

Code:
machine ~ # lspci -s 03:00.0 -vvxxx
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
        Subsystem: Super Micro Computer Inc MegaRAID SAS-3 3108 [Invader]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 26
        Region 0: I/O ports at 5000 [size=256]
        Region 1: Memory at c7300000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at c7200000 (64-bit, non-prefetchable) [size=1M]
        Expansion ROM at c7100000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <2us, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [d0] Vital Product Data
                Unknown small resource type 00, will not decode more.
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [c0] MSI-X: Enable- Count=97 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [1e0 v1] #19
        Capabilities: [1c0 v1] Power Budgeting <?>
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel modules: megaraid_sas
00: 00 10 5d 00 03 01 10 00 02 00 04 01 08 00 00 00
10: 01 50 00 00 04 00 30 c7 00 00 00 00 04 00 20 c7
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 09 08
30: 00 00 10 c7 50 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 68 03 06 08 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 03 00 00 10 d0 02 00 25 80 00 10
70: 20 28 09 00 83 54 41 00 40 00 83 10 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00
90: 00 00 00 00 0e 00 00 00 03 00 1e 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 05 c0 80 01 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 11 00 60 00 01 e0 00 00 01 f0 00 00 00 00 00 00
d0: 03 a8 00 80 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
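The first rows of the `-xxx` hex dump are the standard PCI configuration header, stored little-endian, so the IDs the driver printed can be read straight out of it: vendor ID at offset 0x00, device ID at 0x02, Command register at 0x04, subsystem vendor ID at 0x2c and subsystem ID at 0x2e. A minimal sketch that reproduces the `0x1000:0x005d:0x15d9:0x0809` tuple from the dmesg line and the `I/O+ Mem+ BusMaster- SERR+` flags from the Control line; the function names `pci_ids_from_dump` and `pci_cmd_flags` are hypothetical, and only four Command bits are decoded.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -xxx` output for ONE device on stdin and print
# vendor:device:subvendor:subdevice plus the raw Command register.
pci_ids_from_dump() {
    awk '
    # In a dump row "00: b0 b1 ... b15", field $(k+2) holds the byte at offset k.
    # Registers are little-endian, so the high byte is printed first.
    $1 == "00:" { vid = $3 $2; did = $5 $4; cmd = $7 $6 }
    $1 == "20:" { svid = $15 $14; sdid = $17 $16 }
    END { printf "0x%s:0x%s:0x%s:0x%s cmd=0x%s\n", vid, did, svid, sdid, cmd }
    '
}

# Hypothetical helper: decode a few bits of a 16-bit PCI Command register value.
pci_cmd_flags() {
    cmd=$(( $1 ))
    out=""
    # value:name pairs for bit 0 (I/O space), bit 1 (memory space),
    # bit 2 (bus master) and bit 8 (SERR# enable).
    for pair in 1:I/O 2:Mem 4:BusMaster 256:SERR; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( cmd & bit )) -ne 0 ]; then out="$out $name+"; else out="$out $name-"; fi
    done
    echo "${out# }"
}
```

For example, `lspci -s 03:00.0 -xxx | pci_ids_from_dump` followed by `pci_cmd_flags 0x0103`.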


Additional information:

Code:
machine ~ # uname -a
Linux bolan 4.1.12-gentoo #1 SMP Wed Jan 13 08:48:15 XST 2016 x86_64 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux


Code:
machine ~ # lsmod
Module                  Size  Used by
ixgbe                 158809  0
x86_pkg_temp_thermal     2896  0
megaraid_sas           90578  0
mdio                    2711  1 ixgbe
tpm_tis                 7733  0
Keruskerfuerst
Advocate


Joined: 01 Feb 2006
Posts: 2288
Location: near Augsburg, Germany

Posted: Thu Jan 14, 2016 7:21 am

What is FW?

Possible causes are:

1. Mainboard defective (PCIe bus)
2. Controller defective
dbishop
Tux's lil' helper


Joined: 08 Dec 2007
Posts: 99

Posted: Thu Jan 14, 2016 12:33 pm    Post subject: A Workaround

I believe that "FW" in this case means "firmware".

I did some additional research on this and got some help from one of the excellent Gentoo devs, who had seen this specific problem some time ago.

The workaround is simply to put up with the time-wasting boot sequence by enabling the controller's option ROM (OpROM) in the BIOS. For me, however, this only works if the firmware is configured in "legacy" mode, not "EFI" mode, which appears to be an issue with Supermicro's BIOS rather than the LSI/Agere code.
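Since the workaround reportedly depends on legacy versus EFI mode, it can help to confirm which mode the running system actually booted in. A minimal sketch, assuming the post's "legacy"/"EFI" setting refers to the system boot mode and that the kernel is built with EFI support (such a kernel creates /sys/firmware/efi only when booted through UEFI). `boot_mode` is a hypothetical helper, and its optional argument exists only so the path can be overridden for testing.

```shell
#!/bin/sh
# Hypothetical helper: report whether this Linux system was booted via UEFI or legacy BIOS.
# Assumes CONFIG_EFI: the kernel populates /sys/firmware/efi only on a UEFI boot.
boot_mode() {
    efi_dir=${1:-/sys/firmware/efi}
    if [ -d "$efi_dir" ]; then
        echo EFI
    else
        echo legacy
    fi
}
```

Running `boot_mode` with no argument checks the real sysfs path.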

The history is this: this particular firmware has had known problems from the beginning. I believe its authors expected these cards to be used in systems that needed to know, before the system BIOS (and now EFI) boot, which disks were available, since those disks might be hardware RAID volumes (in a proprietary format) serving as boot drives. They apparently never expected that their option ROM code might not run first.

Since pointlessly tolerating this time-consuming OpROM ballet is not acceptable for me (boot time is critical, and the disks are non-RAID, data-only drives), I turned the MegaRAID OpROM off. This is neither an unusual nor a discouraged practice; it is like turning off PXE on machines that will never PXE-boot. Scanning disks before boot serves no useful purpose for me. Frustratingly, with the Linux kernel and the megaraid_sas driver, skipping the OpROM causes the firmware to "fail", which I believe in this case means it ends up in a confused state with respect to the kernel and driver. Several driver workarounds have been tried, but clearly they aren't effective. The driver maintainer stated a while back that the real fix needs to be made in the 3108's firmware. In my opinion this will never happen, because Avago couldn't care less about open markets like the Linux market.

The most disturbing and disappointing problem is that Avago seems to be sequestering all Linux support. They do have a CLI control interface, but it is fetch-restricted in Portage, and the link in the ebuild is broken. It took a while to come up with the right search key on Avago's site to get the file to betray its location. This mindset seems to be driven by the same megalomania that grips the likes of Broadcom and Qualcomm: they cannot stand the notion that someone would pay good money to use their wares in a way they can't dictate and forever subjugate. I avoid all three companies' products wherever possible, especially the latter two. I'll change when they change.

I guess I'll have to find an alternative, but for now it works, provided I let the controller's OpROM run before the operating system boots.


Last edited by dbishop on Fri Jan 15, 2016 12:20 pm; edited 2 times in total
Keruskerfuerst
Advocate


Joined: 01 Feb 2006
Posts: 2288
Location: near Augsburg, Germany

Posted: Fri Jan 15, 2016 6:15 am

Is there a firmware update available for the controller?
dbishop
Tux's lil' helper


Joined: 08 Dec 2007
Posts: 99

Posted: Fri Jan 15, 2016 12:13 pm

It is already running the latest firmware. I am sending a request to Avago to see if they want to do something about this, but given that they've known about it for years (going back to the LSI days), I am not especially hopeful.

What I can say is this: as of this post, if you're using Linux (any distribution, not just Gentoo, since this is a limitation/bug in the SoC's firmware), I would strongly advise against any design based on the LSI 3008 or LSI 3108 SoC; in other words, avoid anything that runs the megaraid_sas driver.
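To check whether a given machine has hardware that megaraid_sas would claim, one option is to look at the "Kernel modules:" lines that `lspci -k` (or `-v`) prints, as in the output earlier in this thread. A minimal sketch; `megaraid_devices` is a hypothetical name, and it assumes lspci's default slot format at the start of each device line.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` output on stdin and print the slot of every
# device whose "Kernel modules:" line lists megaraid_sas.
megaraid_devices() {
    awk '
    # Device lines start with a slot such as 03:00.0 (or 0000:03:00.0 with -D).
    /^[0-9a-f:]+\.[0-7] / { slot = $1 }
    /Kernel modules:.*megaraid_sas/ { print slot }
    '
}
```

For example, `lspci -k | megaraid_devices` would be expected to print 03:00.0 on the system described above.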

Areca's products, while not perfect, are much better solutions. I need a built-in, on-board solution: my application is power- and size-restricted, and boot time is critical. So I may need to avoid SAS, and maybe even Supermicro, altogether, since SAS is almost always an add-in PCIe solution these days.
All times are GMT