kern/55506: gpioctl/mcp23s17gpio0/spi0 stalls on cv spixfr

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/55506: gpioctl/mcp23s17gpio0/spi0 stalls on cv spixfr
From: kardel%netbsd.org@localhost
Date: Tue, 21 Jul 2020 06:30:00 +0000 (UTC)

>Number:         55506
>Category:       kern
>Synopsis:       gpioctl/mcp23s17gpio0/spi0 stalls on cv spixfr
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 21 06:30:00 +0000 2020
>Originator:     Frank Kardel
>Release:        NetBSD 9.99.69
>Organization:
	
>Environment:
System: NetBSD assel 9.99.69 NetBSD 9.99.69 (ASSEL) #1: Mon Jul 20 14:14:32 CEST 2020 kardel@Andromeda:/src/NetBSD/cur/src/obj.evbarm/sys/arch/evbarm/compile/ASSEL evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
	This issue has been observed since NetBSD 8.99.xx.
	The board is a Raspberry Pi 2 Model B Rev 1.1.
	A program that polls and sets gpio pins on a piface2 board (mcp23s17) gets stuck
	on the cv spixfr after some hours (on 8.99.xx it took some days).
	In parallel with the gpio polling program, the munin monitoring system executes
	"gpioctl <device> <PIN-name>" commands for all pins every 5 minutes.
	These gpioctl programs also get stuck (interruptible) at the gpio layer.

	The driver is waiting in spi_wait; the wait cannot be interrupted and does not
	time out. It seems that the state machine in spi.c/bcm2835_spi.c never
	reaches the (st->st_flags & SPI_F_DONE) != 0 state that spi_wait waits for.
	Are we missing an interrupt, or do we have another issue such as a
	state handling botch or a bug in the usage of the spi API in mcp23s17.c?
	Reducing the clock from 10 MHz to 1 MHz didn't help. Looking at the Broadcom
	SPI programming notes did not show any obvious errors in the driver.
	The process state and stack trace are:

  UID   PID  PPID  CPU PRI NI   VSZ   RSS WCHAN   STAT TTY        TIME COMMAND
   80   659     1    0  95  0  8096  1464 spixfr  DXs  ?       0:38.37 /usr/pkg/sbin/gpiomon -S /tmp/gpiomon -l 

crash> bt/t 0t659
trace: pid 659 lid 1 at 0xba8f1b4c
0xba8f1b4c: mi_switch+0xc
0xba8f1b74: sleepq_block+0xb0
0xba8f1b9c: cv_wait+0xa0
0xba8f1bbc: spi_wait+0x3c
0xba8f1c5c: spi_send_recv+0xd4
0xba8f1c84: mcp23s17gpio_read+0x4c
0xba8f1c9c: mcp23s17gpio_gpio_pin_read+0x2c
0xba8f1cf4: gpioioctl+0x1e8
0xba8f1d1c: spec_ioctl+0xa8
0xba8f1d4c: VOP_IOCTL+0x4c
0xba8f1e24: vn_ioctl+0xc0
0xba8f1eec: sys_ioctl+0x420
0xba8f1fac: syscall+0x12c
crash> 

>How-To-Repeat:
	Mount a piface2 board on a Raspberry Pi 2 Model B Rev 1.1. Run NetBSD >= 9.0 with
	a gpio polling program getting and setting gpio values and, in parallel, a program
	fetching gpio values via gpioctl.
>Fix:
	Find the state handling issue?
	Workaround: add an emergency timeout after waiting a while?
