NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/57259: ucom serial ports cannot be re-opened "too quickly" with O_NONBLOCK



The following reply was made to PR kern/57259; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: thorpej%me.com@localhost
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/57259: ucom serial ports cannot be re-opened "too quickly" with O_NONBLOCK
Date: Sat, 4 Mar 2023 22:15:31 +0000

 The problem you're encountering is unrelated to the d_cancel logic.
 By the time close(2) returns, sc_closing is set to false.  The logic
 you quoted is only applicable when open(2) and close(2) happen
 concurrently.  You could rip that logic back out if you want a crashy
 kernel that makes you twitch at the thought of removing a USB device
 or running two programs that touch a tty at the same time, but it
 wouldn't help with the problem at hand.
 
 The problem at hand is that a delay which used to happen in close(2)
 (with no opportunity for a nonblocking variant),
 
 	/*
 	 * Hang up if necessary.  Wait a bit, so the other side has time to
 	 * notice even if we immediately open the port again.
 	 */
 	if (ISSET(tp->t_cflag, HUPCL)) {
 		ucom_dtr(sc, 0);
 		/* XXX will only timeout */
 		(void) kpause(ttclos, false, hz, NULL);
 	}
 
 is now implemented by sleeping only for the _remainder_ of the 1sec
 delay in open(2):
 
 	/*
 	 * If the previous use had set DTR on close, wait up to one
 	 * second for the other side to notice we hung up.  After
 	 * sleeping, the tty may have been revoked, so restart the
 	 * whole operation.
 	 *
 	 * XXX The wchan is not ttclose but maybe should be.
 	 */
 	if (timerisset(&sc->sc_hup_time)) {
 		struct timeval now, delta;
 		int ms, ticks;
 
 		microuptime(&now);
 		if (timercmp(&now, &sc->sc_hup_time, <)) {
 			timersub(&sc->sc_hup_time, &now, &delta);
 			ms =3D MIN(INT_MAX - 1000, delta.tv_sec*1000);
 			ms +=3D howmany(delta.tv_usec, 1000);
 			ticks =3D MAX(1, MIN(INT_MAX, mstohz(ms)));
 			error =3D cv_timedwait_sig(&sc->sc_statecv, &sc->sc_lock,
 			    ticks);
 			mutex_exit(&sc->sc_lock);
 			return error ? error : ERESTART;
 		}
 	}
 
 When cv_timedwait_sig times out, it returns EWOULDBLOCK.  This is a
 little confusing, both for cv_timedwait_sig(9) (which needs to be
 replaced by a better API) and for open(2) on a tty (with or without
 O_NONBLOCK).  Perhaps it should map EWOULDBLOCK to ERESTART, or be
 moved back to the close path -- I don't have strong feelings about it.
 But this logic is not unique to ucom(4) -- it was first introduced in
 com(4) and plcom(4) a year and a half ago:
 
    Module Name:    src
    Committed By:   jmcneill
    Date:           Mon Oct 11 18:39:06 UTC 2021
 
    Modified Files:
            src/sys/dev/ic: com.c
 
    Log Message:
    com: speed up close with HUPCL set
 
    Instead of incurring a 1s penalty on close of a com device with HUPCL se=
 t,
    defer the sleep until the next open, and only sleep if necessary.
 
    This has a side effect of making `ttyflags -a` with a default install not
    pause for 1s for every non-console com device, which happens every boot
    via /etc/rc.d/ttys.
 
 https://mail-index.netbsd.org/source-changes/2021/10/11/msg132966.html
 
    Module Name:    src
    Committed By:   jmcneill
    Date:           Sun Oct 17 22:34:17 UTC 2021
 
    Modified Files:
            src/sys/arch/evbarm/dev: plcom.c plcomvar.h
 
    Log Message:
    plcom: speed up close with HUPCL set
 
    Instead of incurring a 1s penalty on close of a plcom device with HUPCL =
 set,
    defer the sleep until the next open, and only sleep if necessary.
 
 https://mail-index.netbsd.org/source-changes/2021/10/17/msg133107.html
 
 I later later copied it in ucom(4) while doing other renovations to
 make ucom(4) safe to yank concurrently with open/close and I/O:
 
    Module Name:    src
    Committed By:   riastradh
    Date:           Mon Mar 28 12:42:37 UTC 2022
 
    Modified Files:
            src/sys/dev/usb: ucom.c
 
    Log Message:
    ucom(4): Rework open/close/attach/detach logic.
 
    - Defer sleep after hangup until open.
 
      No need to make close hang; we just need to make sure some time has
      passed before we next try to open.
 
      This changes the wchan for the sleep.  Oh well.
 
    - Use .d_cfdriver/devtounit/cancel to resolve races between attach,
      detach, open, close, and revoke.
 
    - Use a separate .sc_closing flag instead of a UCOM_CLOSING state.
      ucomcancel/ucomclose owns this flag, and it may be set in any state
      (except UCOM_DEAD).  UCOM_OPENING remains owned by ucomopen, which
      might be interrupted by cancel/close.
 
    - Rework error branches in ucomopen.  Much simpler this way.
 
    - Nix unnecessary reference counting.
 
 https://mail-index.netbsd.org/source-changes/2022/03/28/msg137744.html
 
 If you want to move the sleep back to close(2), I have no objection,
 but it would be nice to find another way to avoid the unnecessary
 boot-time delay that jmcneill had made the change to avoid.
 
 If you want more information about the d_cancel/close split and why it
 is so important for making things safe to yank and/or revoke
 concurrently with open/close, I gave a talk last year at EuroBSDcon
 about it:
 
 https://www.NetBSD.org/gallery/presentations/riastradh/eurobsdcon2022/opend=
 etach.pdf
 
 (I acknowledge the medium is suboptimal for reference purposes, in
 contrast to, say, fixing everything in the driver(9) man page, but my
 round tuit supply for rototilling driver(9) has run short and there
 have been other more pressing issues in netbsd-10 which have been
 consuming my spoons.)
 
 There is a large class of bugs in tty drivers, leading to potential
 deadlocks and/or crashes if open and close happen concurrently, that
 could likely be eliminated by just by setting .d_cancel =3D ttycancel in
 the cdevsw.  I haven't committed that change mostly because I don't
 have hardware for anything other than com(4), plcom(4), and ucom(4),
 and I got sidetracked by trying to make tty locking less of a
 catastrophe -- which ideally would also allow us to remove the code
 you quoted and leave ucom(4) with just .d_cancel =3D ttycancel.
 


Home | Main Index | Thread Index | Old Index