Why is the interrupt caught in the kernel, but not handled?
Posted: Thu Nov 28, 2019 8:02 am


I am struggling with the following PCI issue and hope someone has an idea.

I am trying to use the uio_pci_generic driver with a custom Xilinx FPGA on an Atom CPU running kernel 4.18.16. I registered the device ID with:
echo "10ee 0007" > /sys/bus/pci/drivers/uio_pci_generic/new_id

I use a userspace application that waits for the interrupt, just as described in the code example here:
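
For readers without the linked example at hand, the following is only a minimal sketch of that kind of wait loop, not the code from the link. It assumes the usual uio_pci_generic pattern: INTx is re-enabled by clearing the INTx Disable bit through the device's sysfs config file, and a blocking 4-byte read on the UIO device then returns the event count. The helper names uio_unmask_intx and uio_wait_irq are made up for illustration.

```c
#include <stdint.h>
#include <unistd.h>

/* The 16-bit PCI Command register lives at config offset 0x04.
 * INTx Disable is bit 10, i.e. bit 2 of the high byte at offset 0x05. */
#define PCI_COMMAND_HIGH_OFFSET 5
#define INTX_DISABLE_HIGH_BIT   0x04

/* Hypothetical helper: clear INTx Disable so the device may raise the
 * next interrupt. cfgfd is an open, read/write fd on the config file.
 * Returns 0 on success, -1 on error. */
static int uio_unmask_intx(int cfgfd)
{
    uint8_t hi;

    if (lseek(cfgfd, PCI_COMMAND_HIGH_OFFSET, SEEK_SET) < 0)
        return -1;
    if (read(cfgfd, &hi, 1) != 1)
        return -1;

    hi &= (uint8_t)~INTX_DISABLE_HIGH_BIT;

    if (lseek(cfgfd, PCI_COMMAND_HIGH_OFFSET, SEEK_SET) < 0)
        return -1;
    if (write(cfgfd, &hi, 1) != 1)
        return -1;
    return 0;
}

/* Hypothetical helper: block until the UIO core reports an interrupt.
 * A read on a UIO device yields a 4-byte event counter.
 * Returns 0 on success, -1 on error or short read. */
static int uio_wait_irq(int uiofd, uint32_t *count)
{
    ssize_t n = read(uiofd, count, sizeof(*count));

    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A caller would open /dev/uio0 and /sys/class/uio/uio0/device/config (both paths assume the device is uio0), then call uio_unmask_intx followed by uio_wait_irq in a loop. If the kernel handler never returns IRQ_HANDLED, uio_wait_irq simply never returns, which matches the silence described here.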

I then trigger an interrupt from the FPGA, but the userspace application prints nothing and the kernel reports:
irq 23: nobody cared (try booting with the "irqpoll" option)
[   91.030760] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.18.16 #6
[   91.037037] Hardware name:  /conga-MA5, BIOS MA50R000 10/30/2019
[   91.043302] Call Trace:
[   91.045881]  <IRQ>
[   91.048002]  dump_stack+0x5c/0x80
[   91.051464]  __report_bad_irq+0x35/0xaf
[   91.055465]  note_interrupt.cold.9+0xa/0x63
[   91.059823]  handle_irq_event_percpu+0x68/0x70
[   91.064470]  handle_irq_event+0x37/0x57
[   91.068481]  handle_fasteoi_irq+0x97/0x150
[   91.072758]  handle_irq+0x1a/0x30
[   91.076230]  do_IRQ+0x44/0xd0
[   91.079333]  common_interrupt+0xf/0xf
[   91.083154]  </IRQ>
[   91.085360] RIP: 0010:cpuidle_enter_state+0x7d/0x220
[   91.090563] Code: e8 18 1a 45 00 41 89 c4 e8 d0 50 b1 ff 65 8b 3d d9 db e5 5f e8 44 4f b1 ff 31 ff 48 89 c3 e8 ea 61 b1 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 eb ba ff ff ff 7f 48 89 d9 48
[   91.110283] RSP: 0018:ffffb20e806b7ea8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffda
[   91.118203] RAX: ffff90133fd214c0 RBX: 00000015316faa09 RCX: 000000000000001f
[   91.125662] RDX: 00000015316faa09 RSI: 0000000000000000 RDI: 0000000000000000
[   91.133133] RBP: ffff90133fd2b300 R08: 0000011741eb842c R09: 0000000000000006
[   91.140605] R10: 00000000ffffffff R11: ffff90133fd205a8 R12: 0000000000000001
[   91.148068] R13: 00000015316f98b9 R14: 0000000000000001 R15: 0000000000000000
[   91.155523]  ? cpuidle_enter_state+0x76/0x220
[   91.160088]  do_idle+0x221/0x260
[   91.163470]  cpu_startup_entry+0x6a/0x70
[   91.167588]  start_secondary+0x1a4/0x1f0
[   91.171676]  secondary_startup_64+0xb7/0xc0
[   91.176043] handlers:
[   91.178419] [<00000000ec05b056>] uio_interrupt
[   91.183054] Disabling IRQ #23

So I started debugging this in the kernel's uio_pci_generic.c, and according to the prints below, it seems that the IRQ is caught but not delivered to userspace:

static irqreturn_t irqhandler(int irq, struct uio_info *info)
{
        struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);

        printk("i'm here 1\n");   /* this is printed */
        if (!pci_check_and_mask_intx(gdev->pdev))
                return IRQ_NONE;
        printk("i'm here 2\n");   /* but this is NOT printed */
        /* UIO core will signal the user process. */
        return IRQ_HANDLED;
}

According to the documentation, pci_check_and_mask_intx checks whether the Interrupt Status bit is set in the device's configuration space. Since it seems to return 0, it must have found that bit clear. But how can the IRQ fire while the status bit is not set in configuration space?
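
To make that check concrete, here is a rough userspace model of the decision, not the kernel function itself. The bit positions are the standard PCI definitions (Interrupt Status is bit 3 of the Status register, INTx Disable is bit 10 of the Command register). The function name model_check_and_mask_intx is invented for this sketch, and it leaves out the locking and the config-space write that the real code performs.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x0400u /* Command register, bit 10 */
#define PCI_STATUS_INTERRUPT     0x0008u /* Status register, bit 3 */

/* Simplified model (assumption: mirrors only the decision logic).
 * cmd_status is the 32-bit dword at config offset 0x04:
 * Command in the low 16 bits, Status in the high 16 bits.
 * Returns true when the handler would claim the IRQ; *new_cmd then
 * holds the Command value with INTx Disable set (the "mask" step).
 * Returns false, with *new_cmd unchanged from the input Command,
 * when the device does not report a pending interrupt. */
static bool model_check_and_mask_intx(uint32_t cmd_status, uint16_t *new_cmd)
{
    uint16_t cmd = (uint16_t)(cmd_status & 0xffffu);
    uint16_t status = (uint16_t)(cmd_status >> 16);

    *new_cmd = cmd;
    if (!(status & PCI_STATUS_INTERRUPT))
        return false;   /* handler returns IRQ_NONE */

    *new_cmd = (uint16_t)(cmd | PCI_COMMAND_INTX_DISABLE);
    return true;        /* handler returns IRQ_HANDLED */
}
```

In this model, a device whose Status register has Interrupt Status clear (shown by lspci as INTx-) always takes the IRQ_NONE path, whatever is happening on the physical interrupt line.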

The device appears as follows in lspci -vv:
02:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID
        Subsystem: Xilinx Corporation Default PCIe endpoint ID
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 23
        Region 0: Memory at 91200000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [58] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 1, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: uio_pci_generic

I also verified that no other device is using IRQ 23.
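
One way to repeat that check is to look at the IRQ's line in /proc/interrupts, where every handler sharing the line is listed. The sketch below only filters the file for one IRQ number; the helper names line_is_for_irq and print_irq_lines are made up for illustration.

```c
#include <stdio.h>

/* Return 1 if a /proc/interrupts line describes the given IRQ number.
 * Numeric lines start with optional spaces, the number, then ':'. */
static int line_is_for_irq(const char *line, int irq)
{
    int n;
    char sep;

    /* "%d" skips leading whitespace; named lines such as "NMI:" and the
     * CPU header line fail to parse as a number and are rejected. */
    if (sscanf(line, "%d%c", &n, &sep) != 2)
        return 0;
    return sep == ':' && n == irq;
}

/* Print every line of `path` that belongs to `irq`.
 * Returns the number of matching lines, or -1 if the file
 * cannot be opened. */
static int print_irq_lines(const char *path, int irq)
{
    char buf[1024];
    int hits = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(buf, sizeof buf, f)) {
        if (line_is_for_irq(buf, irq)) {
            fputs(buf, stdout);
            hits++;
        }
    }
    fclose(f);
    return hits;
}
```

Calling print_irq_lines("/proc/interrupts", 23) prints the line for IRQ 23; more than one driver name at the end of that line would mean the interrupt is shared.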

How can the IRQ be caught but not delivered to userspace?

Is it an issue with the FPGA device? A bug in the kernel? Or are interrupts perhaps disabled by default in the generic PCI UIO driver?

Thank you

[Moderator edit: added [code] tags to preserve output layout. -Hu]
