kern/45129: Write handling in puffs(4) broken

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/45129: Write handling in puffs(4) broken
From: tron%zhadum.org.uk@localhost
Date: Sat, 9 Jul 2011 23:30:01 +0000 (UTC)

>Number:         45129
>Category:       kern
>Synopsis:       Write handling in puffs(4) broken
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 09 23:30:00 +0000 2011
>Originator:     tron%zhadum.org.uk@localhost
>Release:        NetBSD 5.99.54 2011-07-09 sources
>Organization:
Matthias Scheler                                  http://zhadum.org.uk/
>Environment:
System: NetBSD lyssa.zhadum.org.uk 5.99.54 NetBSD 5.99.54 (LYSSA) #1: Sat Jul 9 
20:32:01 BST 2011 tron%lyssa.zhadum.org.uk@localhost:/src/sys/compile/LYSSA i386
Architecture: i386
Machine: i386
>Description:
Writing to "fuse-ext2" ("pkgsrc/filesystems/fuse-ext2") works fine
under NetBSD/amd64 5.1_STABLE but fails with a "Protocol Error"
under NetBSD/i386 5.99.54.

Using debugging output I found out that the EPROTO error is returned by this
bit of code in puffs_vnop_write() in "src/sys/fs/puffs/puffs_vnops.c":

                        if (write_msg->pvnr_resid > tomove) {
                                puffs_senderr(pmp, PUFFS_ERR_WRITE,
                                    E2BIG, "resid grew", VPTOPNC(ap->a_vp));
                                error = EPROTO;
                                break;
                        }
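
To make the failing condition concrete, here is a minimal userland sketch of
the same sanity check. It is not the kernel code: "struct write_reply" and
check_write_reply() are hypothetical names, and it assumes that the residual
in the reply counts the bytes the file server did not consume, so it can never
legitimately exceed the number of bytes that were handed over.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the reply field write_msg->pvnr_resid. */
struct write_reply {
	size_t resid;	/* bytes the server reports as NOT written */
};

/*
 * Validate a write reply against the number of bytes we asked to move.
 * Returns 0 and stores the bytes actually written, or EPROTO if the
 * residual is larger than the request ("resid grew").
 */
static int
check_write_reply(const struct write_reply *reply, size_t tomove,
    size_t *written)
{
	assert(reply != NULL && written != NULL);

	if (reply->resid > tomove)
		return EPROTO;	/* reply is inconsistent with the request */

	*written = tomove - reply->resid;
	return 0;
}
```

A clobbered residual (any value larger than "tomove") therefore surfaces as
EPROTO, which matches the "Protocol Error" reported by "cp".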

With more debugging output I found out that write_msg->pvnr_resid
gets clobbered by this call:

                        PUFFS_MSG_ENQUEUEWAIT2(pmp, park_write, vp->v_data,
                            NULL, error);

PUFFS_MSG_ENQUEUEWAIT2() is defined as follows:

#define PUFFS_MSG_ENQUEUEWAIT2(pmp, park, vp1, vp2, var)                \
do {                                                                    \
        puffs_msg_enqueue(pmp, park);                                   \
        var = puffs_msg_wait2(pmp, park, vp1, vp2);                     \
} while (/*CONSTCOND*/0)

I've changed the code in puffs_vnop_write() to call both functions
separately and found out that it is puffs_msg_wait2() which clobbers
write_msg->pvnr_resid. puffs_msg_wait2() is a relatively short
function that calls puffs_msg_wait() and never writes to park_write->preq
(which points to write_msg). So I guess the bug is in puffs_msg_wait().
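
The bracketing technique described above can be sketched in userland as
follows. Everything here is hypothetical and only illustrates the approach:
"struct fake_req" stands in for the request, and step_enqueue() and
step_wait() are invented placeholders, not the real puffs functions. The idea
is to snapshot the field before each call and compare it afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the request that park_write->preq points to. */
struct fake_req {
	size_t resid;
};

typedef void (*step_fn)(struct fake_req *);

/* Invented example steps: the first leaves the field alone, the second
 * overwrites it. */
static void step_enqueue(struct fake_req *req) { (void)req; }
static void step_wait(struct fake_req *req) { req->resid = (size_t)-1; }

/*
 * Run the steps in order and return the index of the first step across
 * which req->resid changed, or -1 if no step changed it.
 */
static int
first_changing_step(struct fake_req *req, const step_fn *steps, int nsteps)
{
	assert(req != NULL && steps != NULL);

	for (int i = 0; i < nsteps; i++) {
		size_t before = req->resid;	/* snapshot before the call */

		steps[i](req);
		if (req->resid != before)
			return i;
	}
	return -1;
}
```

This only tells you across which call the value changes, not which code path
inside that call performs the store.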

Here are the differences in this function between "netbsd-5" (which
works) and HEAD (which doesn't work):

--- old 2011-07-10 00:10:04.000000000 +0100
+++ new 2011-07-10 00:09:43.000000000 +0100
@@ -1,23 +1,35 @@
 int
 puffs_msg_wait(struct puffs_mount *pmp, struct puffs_msgpark *park)
 {
+       lwp_t *l = curlwp;
+       proc_t *p = l->l_proc;
        struct puffs_req *preq = park->park_preq; /* XXX: hmmm */
-       struct mount *mp = PMPTOMP(pmp);
+       sigset_t ss;
+       sigset_t oss;
        int error = 0;
        int rv;
 
+       /*
+        * block unimportant signals.
+        *
+        * The set of "important" signals here was chosen to be same as
+        * nfs interruptible mount.
+        */
+       sigfillset(&ss);
+       sigdelset(&ss, SIGINT);
+       sigdelset(&ss, SIGTERM);
+       sigdelset(&ss, SIGKILL);
+       sigdelset(&ss, SIGHUP);
+       sigdelset(&ss, SIGQUIT);
+       mutex_enter(p->p_lock);
+       sigprocmask1(l, SIG_BLOCK, &ss, &oss);
+       mutex_exit(p->p_lock);
+
        mutex_enter(&pmp->pmp_lock);
        puffs_mp_reference(pmp);
        mutex_exit(&pmp->pmp_lock);
 
        mutex_enter(&park->park_mtx);
-       if ((park->park_flags & PARKFLAG_WANTREPLY) == 0
-           || (park->park_flags & PARKFLAG_CALL)) {
-               mutex_exit(&park->park_mtx);
-               rv = 0;
-               goto skipwait;
-       }
-
        /* did the response beat us to the wait? */
        if (__predict_false((park->park_flags & PARKFLAG_DONE)
            || (park->park_flags & PARKFLAG_HASERROR))) {
@@ -26,6 +38,13 @@
                goto skipwait;
        }
 
+       if ((park->park_flags & PARKFLAG_WANTREPLY) == 0
+           || (park->park_flags & PARKFLAG_CALL)) {
+               mutex_exit(&park->park_mtx);
+               rv = 0;
+               goto skipwait;
+       }
+
        error = cv_wait_sig(&park->park_cv, &park->park_mtx);
        DPRINTF(("puffs_touser: waiter for %p woke up with %d\n",
            park, error));
@@ -74,24 +93,14 @@
                mutex_exit(&park->park_mtx);
        }
 
-       /*
-        * retake the lock and release.  This makes sure (haha,
-        * I'm humorous) that we don't process the same vnode in
-        * multiple threads due to the locks hacks we have in
-        * puffs_lock().  In reality this is well protected by
-        * the biglock, but once that's gone, well, hopefully
-        * this will be fixed for real.  (and when you read this
-        * comment in 2017 and subsequently barf, my condolences ;).
-        */
-       if (rv == 0 && !fstrans_is_owner(mp)) {
-               fstrans_start(mp, FSTRANS_NORMAL);
-               fstrans_done(mp);
-       }
-
  skipwait:
        mutex_enter(&pmp->pmp_lock);
        puffs_mp_release(pmp);
        mutex_exit(&pmp->pmp_lock);
 
+       mutex_enter(p->p_lock);
+       sigprocmask1(l, SIG_SETMASK, &oss, NULL);
+       mutex_exit(p->p_lock);
+
        return rv;
 }

Unfortunately, I cannot find the bug in it.
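
For reference, the signal handling that HEAD adds to puffs_msg_wait() follows
the usual "block, do work, restore" pattern. Below is a userland sketch of
that pattern using sigprocmask(2) instead of the kernel's sigprocmask1(). The
helper names call_with_signals_blocked() and probe_blocked() are hypothetical,
and the sketch assumes a single-threaded process. It is not meant to
reproduce the bug, only to show what the added lines are intended to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the signal number in *arg is currently
 * blocked, 0 if not, -1 on error. */
static int
probe_blocked(void *arg)
{
	sigset_t cur;
	int signo = *(int *)arg;

	/* A NULL "set" leaves the mask unchanged and only queries it. */
	if (sigprocmask(SIG_BLOCK, NULL, &cur) == -1)
		return -1;
	return sigismember(&cur, signo);
}

/*
 * Call fn(arg) with every signal blocked except the "important" ones
 * from the diff, then restore the caller's original mask.
 */
static int
call_with_signals_blocked(int (*fn)(void *), void *arg)
{
	sigset_t ss, oss;
	int rv;

	assert(fn != NULL);

	sigfillset(&ss);
	sigdelset(&ss, SIGINT);
	sigdelset(&ss, SIGTERM);
	sigdelset(&ss, SIGKILL);
	sigdelset(&ss, SIGHUP);
	sigdelset(&ss, SIGQUIT);

	if (sigprocmask(SIG_BLOCK, &ss, &oss) == -1)
		return -1;

	rv = fn(arg);

	/* Restore the saved mask on the way out. */
	if (sigprocmask(SIG_SETMASK, &oss, NULL) == -1)
		return -1;
	return rv;
}
```

Note that "oss" has to be filled in before it is used for the restore, so in
a function shaped like this every exit path must pass through both the block
and the restore step.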

>How-To-Repeat:
1.) Mount "fuse-ext2".
2.) Try to copy a file to it. You'll get a "Protocol Error" message
    from "cp".

>Fix:
Not known.
