NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/58898: Crash when testing camera in chromium
The following reply was made to PR kern/58898; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Robert Bagdan <kikadf.01%gmail.com@localhost>
Cc: gnats-bugs%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/58898: Crash when testing camera in chromium
Date: Thu, 12 Dec 2024 19:21:55 +0000
> Date: Thu, 12 Dec 2024 18:46:43 +0100
> From: Robert Bagdan <kikadf.01%gmail.com@localhost>
>=20
> I don't have changes in NetBSD tree, I'm on the 10.0 branch, upgraded
> with sysupgrade auto. Here is my /etc/release relevant part:
> Build information:
> Build date Fri Dec 6 20:38:43 UTC 2024
> Built by builder%localhost.NetBSD.org@localhost
> Build ID 202412081740Z
>=20
> Because of the gdb output, I'm just clone the NetBSD src from github,
> this is the git log output:
> commit 313a9eb5b1b8b899e5611e78063c2949affc811d (HEAD -> netbsd-10,
> origin/netbsd-10)
> Author: snj <snj%NetBSD.org@localhost>
> Date: Fri Dec 6 20:38:43 2024 +0000
>=20
> tickets 1024-1026
Thanks, glad to confirm that! It was suspicious because the stack
trace in gdb doesn't make sense and gdb warned about source newer than
object file.
> Some dates and sizes:
> /var/crash $ ls -l netbsd.4*
> -rw------- 1 root wheel 2428478 Dec 9 13:02 netbsd.4
> -rw------- 1 root wheel 583708184 Dec 9 13:02 netbsd.4.core
> ls -l /netbsd
> -rw-r--r-- 1 root wheel 29528376 Dec 9 12:48 /netbsd
>=20
> I share `dmesg -M netbsd.4.core -N netbsd.4 > dmesg.txt' in
> https://pastebin.com/raw/jWLjzQbw . This is all that I got.
>=20
> I run `gdb /netbsd --eval-command=3D"target kvm netbsd.4.core"', then
> your suggested gdb commands, I share the output in
> https://pastebin.com/raw/wQL0uN0z .
Thanks!
So it looks like it is here:
2479 /* dequeue all buffers */
2480 while (SIMPLEQ_FIRST(&vs->vs_ingress) !=3D NULL)
=3D> 2481 SIMPLEQ_REMOVE_HEAD(&vs->vs_ingress, entries);
2482 while (SIMPLEQ_FIRST(&vs->vs_egress) !=3D NULL)
2483 SIMPLEQ_REMOVE_HEAD(&vs->vs_egress, entries);
https://nxr.netbsd.org/xref/src/sys/dev/video.c?r=3D1.45#2481
Here's the disassembly:
0xffffffff80db14df <+166>: call 0xffffffff8023fac0 <mutex_enter>
0xffffffff80db14e4 <+171>: mov 0xa0(%rbx),%rax
0xffffffff80db14eb <+178>: lea 0xa0(%rbx),%rdx
0xffffffff80db14f2 <+185>: jmp 0xffffffff80db1504 <videoclose+203>
=3D> 0xffffffff80db14f4 <+187>: mov 0x8(%rax),%rax
0xffffffff80db14f8 <+191>: mov %rax,0xa0(%rbx)
0xffffffff80db14ff <+198>: test %rax,%rax
0xffffffff80db1502 <+201>: je 0xffffffff80db1578 <videoclose+319>
0xffffffff80db1504 <+203>: test %rax,%rax
0xffffffff80db1507 <+206>: jne 0xffffffff80db14f4 <videoclose+187>
...
0xffffffff80db1578 <+319>: mov %rdx,0xa8(%rbx)
0xffffffff80db157f <+326>: jmp 0xffffffff80db1509 <videoclose+208>
(The DPRINTF has presumably been compiled away, so there's no `if
(vs->vs_streaming)' branch in the code.)
We have:
(gdb) print &((struct video_softc *)0)->sc_stream_in.vs_ingress
$2 =3D (struct sample_queue *) 0xa0
(gdb) print &((struct video_buffer *)0)->entries->sqe_next
$3 =3D (struct video_buffer **) 0x8
And the crash is:
[ 512.917593] trap type 4 code 0 rip 0xffffffff80db14f4 cs 0x8 rflags 0x1=
0206 cr2 0x7dbd48c2e600 ilevel 0 rsp 0xffffc18242975d40
[ 512.917593] curlwp 0xffffe53d60a3ca00 pid 3297.3302 lowest kstack 0xfff=
fc182429712c0
So one of the vs_ingress buffers has been corrupted: rax hold the
pointer to a struct video_buffer, and the CPU faulted at
mov 0x8(%rax),%rax
with cr2=3D0x7dbd48c2e600 as the fault address. That's not a kernel
virtual address so the kernel has no business dereferencing it (except
via copyout/copyin or similar).
Now this cr2 value is interesting, because according to gdb:
(gdb) print *(struct video_softc *)video_cd.cd_devs[0]->dv_private
$1 =3D {... sc_stream_in =3D {...
... vs_ingress =3D {sqh_first =3D 0x5fffffe53d0009, ...}, ...}, ...}
That is _also_ not a kva, but it's not the one in cr2!
Now I checked the ioctls VIDIOC_ENUM_FRAMESIZES and
VIDIOC_ENUM_FRAMEINTERVALS, and they don't look like they do anything
interesting. But videoread looks racy:
1826 mutex_enter(&vs->vs_lock);
...
1845 vb =3D SIMPLEQ_FIRST(&vs->vs_egress);
1846 } else {
1847 vb =3D SIMPLEQ_FIRST(&vs->vs_egress);
1848 }
...
1858 mutex_exit(&vs->vs_lock);
1859=20
1860 len =3D uimin(uio->uio_resid, vb->vb_buf->bytesused - vs->vs_bytes=
read);
1861 offset =3D vb->vb_buf->m.offset + vs->vs_bytesread;
1862=20
1863 if (scatter_io_init(&vs->vs_data, offset, len, &sio)) {
1864 err =3D scatter_io_uiomove(&sio, uio);
1865 if (err =3D=3D EFAULT)
1866 return EFAULT;
1867 vs->vs_bytesread +=3D (len - sio.sio_resid);
...
1874 if (vs->vs_bytesread >=3D vb->vb_buf->bytesused) {
1875 mutex_enter(&vs->vs_lock);
1876 vb =3D video_stream_dequeue(vs);
1877 video_stream_enqueue(vs, vb);
1878 mutex_exit(&vs->vs_lock);
https://nxr.netbsd.org/xref/src/sys/dev/video.c?r=3D1.45#1826
As far as I can tell, there is nothing preventing two threads from
entering videoread concurrently and then handling the _same_ vb from
the egress queue. I don't see an obvious limit to how much damage
could be caused by that, since it controls uiomove pointers and
lengths.
It is important to drop the lock across uiomove if that can trigger
swapping, e.g. in copyin/copyout in case the user's pages are swapped
out, which requires blocking on I/O which can be indefinite. But it
is also critical for access to each vb here to be serialized. Maybe
there should be a vs_iobusy `lock' taken and released by videoread,
handled under vs_lock and signalled by vs_read_cv or something, so
only one videoread can be running at a time.
This doesn't necessarily explain your crash -- I don't know whether
Chromium tries to read from the same /dev/video instance in more than
one thread -- but it's definitely a bug that we need to fix.
Home |
Main Index |
Thread Index |
Old Index