Re: kern/58898: Crash when testing camera in chromium

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,kikadf.01%gmail.com@localhost
Subject: Re: kern/58898: Crash when testing camera in chromium
From: "Taylor R Campbell via gnats" <gnats-admin%NetBSD.org@localhost>
Date: Thu, 12 Dec 2024 19:25:01 +0000 (UTC)

The following reply was made to PR kern/58898; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Robert Bagdan <kikadf.01%gmail.com@localhost>
Cc: gnats-bugs%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/58898: Crash when testing camera in chromium
Date: Thu, 12 Dec 2024 19:21:55 +0000

 > Date: Thu, 12 Dec 2024 18:46:43 +0100
 > From: Robert Bagdan <kikadf.01%gmail.com@localhost>
 >
 > I don't have changes in NetBSD tree, I'm on the 10.0 branch, upgraded
 > with sysupgrade auto. Here is my /etc/release relevant part:
 > Build information:
 >          Build date   Fri Dec  6 20:38:43 UTC 2024
 >            Built by   builder%localhost.NetBSD.org@localhost
 >            Build ID   202412081740Z
 >
 > Because of the gdb output, I'm just clone the NetBSD src from github,
 > this is the git log output:
 > commit 313a9eb5b1b8b899e5611e78063c2949affc811d (HEAD -> netbsd-10,
 > origin/netbsd-10)
 > Author: snj <snj%NetBSD.org@localhost>
 > Date:   Fri Dec 6 20:38:43 2024 +0000
 >
 >    tickets 1024-1026
 
 Thanks, glad to confirm that!  It was suspicious because the stack
 trace in gdb doesn't make sense and gdb warned about source newer than
 object file.
 
 > Some dates and sizes:
 > /var/crash $ ls -l netbsd.4*
 > -rw-------  1 root  wheel    2428478 Dec  9 13:02 netbsd.4
 > -rw-------  1 root  wheel  583708184 Dec  9 13:02 netbsd.4.core
 > ls -l /netbsd
 > -rw-r--r--  1 root  wheel  29528376 Dec  9 12:48 /netbsd
 >
 > I share `dmesg -M netbsd.4.core -N netbsd.4 > dmesg.txt' in
 > https://pastebin.com/raw/jWLjzQbw . This is all that I got.
 >
 > I run `gdb /netbsd --eval-command="target kvm netbsd.4.core"', then
 > your suggested gdb commands, I share the output in
 > https://pastebin.com/raw/wQL0uN0z .
 
 Thanks!
 
 So it looks like it is here:
 
    2479 	/* dequeue all buffers */
    2480 	while (SIMPLEQ_FIRST(&vs->vs_ingress) != NULL)
 => 2481 		SIMPLEQ_REMOVE_HEAD(&vs->vs_ingress, entries);
    2482 	while (SIMPLEQ_FIRST(&vs->vs_egress) != NULL)
    2483 		SIMPLEQ_REMOVE_HEAD(&vs->vs_egress, entries);
 
 https://nxr.netbsd.org/xref/src/sys/dev/video.c?r=1.45#2481
 
 Here's the disassembly:
 
    0xffffffff80db14df <+166>:   call   0xffffffff8023fac0 <mutex_enter>
    0xffffffff80db14e4 <+171>:   mov    0xa0(%rbx),%rax
    0xffffffff80db14eb <+178>:   lea    0xa0(%rbx),%rdx
    0xffffffff80db14f2 <+185>:   jmp    0xffffffff80db1504 <videoclose+203>
 => 0xffffffff80db14f4 <+187>:   mov    0x8(%rax),%rax
    0xffffffff80db14f8 <+191>:   mov    %rax,0xa0(%rbx)
    0xffffffff80db14ff <+198>:   test   %rax,%rax
    0xffffffff80db1502 <+201>:   je     0xffffffff80db1578 <videoclose+319>
    0xffffffff80db1504 <+203>:   test   %rax,%rax
    0xffffffff80db1507 <+206>:   jne    0xffffffff80db14f4 <videoclose+187>
 ...
    0xffffffff80db1578 <+319>:   mov    %rdx,0xa8(%rbx)
    0xffffffff80db157f <+326>:   jmp    0xffffffff80db1509 <videoclose+208>
 
 (The DPRINTF has presumably been compiled away, so there's no `if
 (vs->vs_streaming)' branch in the code.)
 
 We have:
 
 (gdb) print &((struct video_softc *)0)->sc_stream_in.vs_ingress
 $2 = (struct sample_queue *) 0xa0
 (gdb) print &((struct video_buffer *)0)->entries->sqe_next
 $3 = (struct video_buffer **) 0x8
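
 To make the mapping between the disassembly and the source concrete,
 here is a minimal userspace sketch of the drain loop.  The names
 struct buf, struct buf_queue, queue_init, queue_insert_tail and
 queue_drain are stand-ins invented for illustration; the list handling
 is modeled on the SIMPLEQ macros in <sys/queue.h>.  The real struct
 video_buffer has a different layout, and only the position of the
 link pointer (one pointer into the element) is imitated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct video_buffer. */
struct buf {
	void *payload;			/* placeholder: puts the link at offset 8 on LP64 */
	struct {
		struct buf *sqe_next;	/* next element, as in SIMPLEQ_ENTRY */
	} entries;
};

/* Hypothetical stand-in for the queue head, as in SIMPLEQ_HEAD. */
struct buf_queue {
	struct buf *sqh_first;		/* 0xa0(%rbx) in the disassembly */
	struct buf **sqh_last;		/* 0xa8(%rbx) in the disassembly */
};

static void
queue_init(struct buf_queue *q)
{
	q->sqh_first = NULL;
	q->sqh_last = &q->sqh_first;
}

static void
queue_insert_tail(struct buf_queue *q, struct buf *b)
{
	b->entries.sqe_next = NULL;
	*q->sqh_last = b;
	q->sqh_last = &b->entries.sqe_next;
}

/* Equivalent of: while (SIMPLEQ_FIRST(q) != NULL) SIMPLEQ_REMOVE_HEAD(q, entries); */
static unsigned
queue_drain(struct buf_queue *q)
{
	unsigned n = 0;

	while (q->sqh_first != NULL) {
		/* Load head->sqe_next and store it as the new head.  This is
		 * the step that faults if the head pointer is garbage. */
		q->sqh_first = q->sqh_first->entries.sqe_next;
		if (q->sqh_first == NULL)
			q->sqh_last = &q->sqh_first;	/* queue is now empty */
		n++;
	}
	assert(q->sqh_last == &q->sqh_first);
	return n;
}
```

 The load of sqe_next through the current head is the only memory
 access in the loop that goes through a queue element, which is why a
 bad sqh_first shows up as a fault on exactly that instruction.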
 
 And the crash is:
 
 [   512.917593] trap type 4 code 0 rip 0xffffffff80db14f4 cs 0x8 rflags 0x10206 cr2 0x7dbd48c2e600 ilevel 0 rsp 0xffffc18242975d40
 [   512.917593] curlwp 0xffffe53d60a3ca00 pid 3297.3302 lowest kstack 0xffffc182429712c0
 
 So one of the vs_ingress buffers has been corrupted: rax holds the
 pointer to a struct video_buffer, and the CPU faulted at
 
 	mov	0x8(%rax),%rax
 
 with cr2=0x7dbd48c2e600 as the fault address.  That's not a kernel
 virtual address so the kernel has no business dereferencing it (except
 via copyout/copyin or similar).
 
 Now this cr2 value is interesting, because according to gdb:
 
 (gdb) print *(struct video_softc *)video_cd.cd_devs[0]->dv_private
 $1 = {... sc_stream_in = {...
       ... vs_ingress = {sqh_first = 0x5fffffe53d0009, ...}, ...}, ...}
 
 That is _also_ not a kva, but it's not the one in cr2!
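
 A quick way to sanity-check values like these is to classify them by
 their upper bits.  The sketch below assumes amd64 with 48-bit virtual
 addresses (4-level paging), where bits 63..47 must all be equal for an
 address to be canonical; the enum and function names are made up for
 illustration, and the exact kernel/user split used by NetBSD is not
 modeled.

```c
#include <assert.h>
#include <stdint.h>

enum va_class {
	VA_LOWER_HALF,		/* canonical, upper bits all zero: user range */
	VA_UPPER_HALF,		/* canonical, upper bits all one: kernel range */
	VA_NONCANONICAL		/* upper bits disagree: not a valid address */
};

/* Assumption: 48-bit virtual addresses, so bits 63..47 (17 bits) must agree. */
static enum va_class
classify_va(uint64_t va)
{
	uint64_t top = va >> 47;

	if (top == 0)
		return VA_LOWER_HALF;
	if (top == 0x1ffff)
		return VA_UPPER_HALF;
	return VA_NONCANONICAL;
}
```

 Under that assumption, classify_va(0x7dbd48c2e600) falls in the lower
 (user) half, while classify_va(0x5fffffe53d0009) is non-canonical.  On
 x86-64, dereferencing a non-canonical address raises a
 general-protection fault rather than a page fault, and that exception
 does not update cr2.  That could be one reason the two values differ,
 but it is a guess and not something the dump confirms.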
 
 
 Now I checked the ioctls VIDIOC_ENUM_FRAMESIZES and
 VIDIOC_ENUM_FRAMEINTERVALS, and they don't look like they do anything
 interesting.  But videoread looks racy:
 
    1826 	mutex_enter(&vs->vs_lock);
 ...
    1845 		vb = SIMPLEQ_FIRST(&vs->vs_egress);
    1846 	} else {
    1847 	        vb = SIMPLEQ_FIRST(&vs->vs_egress);
    1848 	}
 ...
    1858 	mutex_exit(&vs->vs_lock);
    1859
    1860 	len = uimin(uio->uio_resid, vb->vb_buf->bytesused - vs->vs_bytesread);
    1861 	offset = vb->vb_buf->m.offset + vs->vs_bytesread;
    1862
    1863 	if (scatter_io_init(&vs->vs_data, offset, len, &sio)) {
    1864 		err = scatter_io_uiomove(&sio, uio);
    1865 		if (err == EFAULT)
    1866 			return EFAULT;
    1867 		vs->vs_bytesread += (len - sio.sio_resid);
 ...
    1874 	if (vs->vs_bytesread >= vb->vb_buf->bytesused) {
    1875 		mutex_enter(&vs->vs_lock);
    1876 		vb = video_stream_dequeue(vs);
    1877 		video_stream_enqueue(vs, vb);
    1878 		mutex_exit(&vs->vs_lock);
 
 https://nxr.netbsd.org/xref/src/sys/dev/video.c?r=1.45#1826
 
 As far as I can tell, there is nothing preventing two threads from
 entering videoread concurrently and then handling the _same_ vb from
 the egress queue.  I don't see an obvious limit to how much damage
 could be caused by that, since it controls uiomove pointers and
 lengths.
 
 It is important to drop the lock across uiomove if that can trigger
 swapping, e.g. in copyin/copyout in case the user's pages are swapped
 out, which requires blocking on I/O which can be indefinite.  But it
 is also critical for access to each vb here to be serialized.  Maybe
 there should be a vs_iobusy `lock' taken and released by videoread,
 handled under vs_lock and signalled by vs_read_cv or something, so
 only one videoread can be running at a time.
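
 As a rough illustration of that idea, here is a userspace analogue
 using pthreads.  It is only a sketch under stated assumptions: struct
 stream, stream_init, stream_read and reader_loop are hypothetical
 names, the iobusy field stands in for the proposed vs_iobusy flag, and
 a dedicated condition variable is used instead of reusing an existing
 one.  In the kernel this would presumably be written with mutex(9) and
 condvar(9) instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct stream {
	pthread_mutex_t	lock;		/* plays the role of vs_lock */
	pthread_cond_t	iobusy_cv;	/* signalled when iobusy is cleared */
	bool		iobusy;		/* hypothetical vs_iobusy flag */
	int		in_copy;	/* readers currently in the copy phase */
	int		max_in_copy;	/* high-water mark, for checking only */
};

static void
stream_init(struct stream *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->iobusy_cv, NULL);
	s->iobusy = false;
	s->in_copy = 0;
	s->max_in_copy = 0;
}

static void
stream_read(struct stream *s)
{
	/* Take the iobusy "lock" under the mutex, sleeping while another reader holds it. */
	pthread_mutex_lock(&s->lock);
	while (s->iobusy)
		pthread_cond_wait(&s->iobusy_cv, &s->lock);
	s->iobusy = true;
	s->in_copy++;
	assert(s->in_copy == 1);
	if (s->in_copy > s->max_in_copy)
		s->max_in_copy = s->in_copy;
	pthread_mutex_unlock(&s->lock);

	/* The copy to the caller (uiomove in the kernel) would run here, with
	 * the mutex dropped but iobusy still held, so it may block safely. */

	/* Release iobusy and wake one waiting reader. */
	pthread_mutex_lock(&s->lock);
	s->in_copy--;
	s->iobusy = false;
	pthread_cond_signal(&s->iobusy_cv);
	pthread_mutex_unlock(&s->lock);
}

/* Thread body for trying it out: perform a batch of reads on one stream. */
static void *
reader_loop(void *arg)
{
	struct stream *s = arg;

	for (int i = 0; i < 1000; i++)
		stream_read(s);
	return NULL;
}
```

 With several threads running reader_loop on the same stream,
 max_in_copy should never exceed 1.  A real fix would also need to
 decide how a sleeping reader reacts to signals and to the device
 being closed or detached, which this sketch ignores.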
 
 This doesn't necessarily explain your crash -- I don't know whether
 Chromium tries to read from the same /dev/video instance in more than
 one thread -- but it's definitely a bug that we need to fix.
