kern/50481: races in sys_lseek

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/50481: races in sys_lseek
From: dholland%netbsd.org@localhost
Date: Fri, 27 Nov 2015 02:15:00 +0000 (UTC)

>Number:         50481
>Category:       kern
>Synopsis:       races in sys_lseek
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 27 02:15:00 +0000 2015
>Originator:     David A. Holland
>Release:        NetBSD 7.99.21 (20151118)
>Organization:
>Environment:
System: NetBSD valkyrie 7.99.1 NetBSD 7.99.1 (VALKYRIE) #17: Wed Oct 14 03:21:03 EDT 2015  dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE amd64
Architecture: x86_64
Machine: amd64
>Description:

sys_lseek has at least two races in it:

(1) it does not take fp->f_lock when updating fp->f_offset, so in
addition to being formally wrong, on 32-bit platforms another user of
the same file might see a torn write.

(2) when doing SEEK_END, it locks the vnode only immediately around
the call to VOP_GETATTR, so that a write from another process can
happen in between the getattr and the point at which the new offset is
installed. This results in nonsensical behavior in at least the
following case, supposing the file size starts at 128 and the seek
position starts at EOF, that is, 128:

        process 1                       process 2
        ----------                      ---------
        lseek(fd, 0, SEEK_END);
          vn_lock()
          VOP_GETATTR -> length 128
          vn_unlock()
                                        write(fd, buf, 128);
                                          vn_write()
                                            vn_lock()
                                            fetch offset (128)
                                            VOP_WRITE
                                            update offset (now 256)
                                            vn_unlock()
          update offset (now 128)

so afterwards the file size is 256 but the seek position is 128. This
should not be possible: if the lseek happens completely before or
completely after the write it has no effect and both the size and the
seek position will still be the same.

Note that since vn_write uses the vnode lock to protect the change of
offset, and doesn't apparently touch f_lock at all, lseek must also
use the vnode lock; holding f_lock is not sufficient.

(and therefore it is not necessary, and issue #1 becomes more or less
moot...)

Note that there appear to be other places where the locking for
f_offset is not consistent.

>How-To-Repeat:
code reading

>Fix:

Something like this (untested):

diff -r 61c7b392f54d sys/kern/vfs_syscalls.c
--- a/sys/kern/vfs_syscalls.c	Thu Nov 26 20:27:12 2015 -0500
+++ b/sys/kern/vfs_syscalls.c	Thu Nov 26 21:08:40 2015 -0500
@@ -2762,15 +2762,16 @@
 		goto out;
 	}
 
+	vn_lock(vp, LK_SHARED | LK_RETRY);
+
 	switch (SCARG(uap, whence)) {
 	case SEEK_CUR:
 		newoff = fp->f_offset + SCARG(uap, offset);
 		break;
 	case SEEK_END:
-		vn_lock(vp, LK_SHARED | LK_RETRY);
 		error = VOP_GETATTR(vp, &vattr, cred);
-		VOP_UNLOCK(vp);
 		if (error) {
+			VOP_UNLOCK(vp);
 			goto out;
 		}
 		newoff = SCARG(uap, offset) + vattr.va_size;
@@ -2780,8 +2781,10 @@
 		break;
 	default:
 		error = EINVAL;
+		VOP_UNLOCK(vp);
 		goto out;
 	}
+	VOP_UNLOCK(vp);
 	if ((error = VOP_SEEK(vp, fp->f_offset, newoff, cred)) == 0) {
 		*(off_t *)retval = fp->f_offset = newoff;
 	}

Prev by Date: re: xsrc/50429: Xserver fails to start on Ultra-5 andNetBSD/sparc647.0
Next by Date: Re: xsrc/50429: Xserver fails to start on Ultra-5 andNetBSD/sparc647.0
Previous by Thread: kern/50480: posix_spawn vs. fork code duplication
Next by Thread: kern/50479: Forgotten new flag for uvm_default_mapaddr function.
Indexes:

Home | Main Index | Thread Index | Old Index