NetBSD-Bugs archive
Re: install/49470
The following reply was made to PR port-amd64/49470; it has been noted by GNATS.
From: Andrius V <vezhlys%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: Taylor R Campbell <riastradh%netbsd.org@localhost>
Subject: Re: install/49470
Date: Fri, 16 Aug 2024 00:50:03 +0300
On Mon, Jul 22, 2024 at 9:45 AM Andrius V <vezhlys%gmail.com@localhost> wrote:
>
> The following reply was made to PR port-amd64/49470; it has been noted by GNATS.
>
> From: Andrius V <vezhlys%gmail.com@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc:
> Subject: Re: install/49470
> Date: Mon, 22 Jul 2024 09:40:20 +0300
>
> Hi,
>
>
> Initial analysis:
> The issue seems to have started with the switch to gcc 4.8
> (https://github.com/NetBSD/src/commit/ad33dd774c2fe8beb41c96d1d29aef4ebce3f5cb
> and https://github.com/NetBSD/src/commit/f8008b9438a836d85ee0b14cb56ee82966fd8216).
>
> It prints this log and reboots:
> booting hd0a:netbsd (howto 0xa0000)
> 22005524+593000+742296 [1003696+1091430+13604]=0x1846608
> [ 1.0000000] cpu_rng: via
> [ 1.0000000] pmap_kenter_pa: mapping already present
>
> It reboots on the memcpy() call, I assume because both the source and
> destination pointers are NULL (I printed them out for testing):
> https://github.com/NetBSD/src/blob/ee4113b4927055cea72b04191634f65fc3bf3580/sys/kern/subr_kcpuset.c#L351
> That call in turn comes from pmap_tlb_shootdown() ->
> kcpuset_copy(ci->ci_tlb_cpuset, kcpuset_running):
> https://github.com/NetBSD/src/blob/ee4113b4927055cea72b04191634f65fc3bf3580/sys/arch/x86/x86/x86_tlb.c#L289
> Finally, the shootdown is called from the kenter function,
> pmap_kenter_pa(), in the "/* This should not happen. */" path:
> https://github.com/NetBSD/src/blob/ee4113b4927055cea72b04191634f65fc3bf3580/sys/arch/x86/x86/pmap.c#L1056
>
> This typically happens if a kernel name is passed to the boot command
> (e.g. boot netbsd or boot netbsd -vx); it boots properly when no
> kernel name is passed (e.g. boot or boot -vx). This is not the case
> for all my setups, though: one setup, where a somewhat older release
> is installed on sd media, shows the opposite behaviour. When it does
> boot, it doesn't enter the unexpected path in the kenter function.
>
> It happens only if the ACPI version is set to v3.0 in the BIOS. It boots
> in all combinations I have tried if the ACPI version is set to v2.0 or v1.0.
>
> I think there are likely at least two issues:
> There are no checks before memcpy() or kcpuset_copy() that the values
> are NULL; I guess some additional asserts are needed as a band-aid
> before one of these calls?
>
> The other issue is the root cause of why this unintended path is taken,
> which is somehow related to the ACPI version and the boot command.
> Whether it is some kind of undesirable optimization, a CPU bug, or
> something else, I still need to understand.
>
> Regards,
> Andrius V
>
Hi,
After many additional debugging sessions and a lot of help from Taylor, I
narrowed the issue down to the bi_getmemmap() call, or more
specifically to getmementry(&i, buf)
(https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/biosmemx.S#87).
This routine invokes a BIOS call and fills buf[5] with the returned
values. For some reason, though, with ACPI 3.0 it actually seems to get
6 values, thus leading to stack corruption (overflow(?)). The sixth value
is always 1, and it is set after each and every call.
All other buf values [0-4] and the number of iterations match exactly
the ones seen when ACPI 2.0 is set.
A workaround can be to allocate more memory for buf, for example setting
the buf size to 6 (buf[6]):
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/bootinfo_memmap.c#40
In more detail, the flow is:
The bootloader executes exec_netbsd().
It calls common_load_kernel():
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#495
That loads the kernel file and sets the marks[MARK_MAX] (MARK_MAX=6) values
(mainly in the loadfile logic):
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#416
It calls bi_getmemmap():
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#436
That calls getmementry(&i, buf) multiple times, and each call writes
beyond the 20 bytes allocated.
After bi_getmemmap() returns, all marks[] values are corrupted in
common_load_kernel().
A wrong image_end value is set because of that:
https://nxr.netbsd.org/xref/src/sys/arch/i386/stand/lib/exec.c#455
After returning to exec_netbsd(), most marks[] values become OK again,
except sometimes marks[MARK_END], which can still be corrupted but is
closer to the original value.
The invalid image_end leads to a wrong atdevbase and PDPpaddr in the kernel,
leading to other failures if the value is high
(which finally leads to the instant reboot described in the previous emails).
(Sorry if some of the terminology is incorrect.)
What I don't know in this situation:
1. Why the BIOS call returns a "sixth" value and why it is always 1. Is it
a BIOS bug or something else? I can't tell.
2. I am not sure what the correct way to fix the issue would be:
should the getmementry() assembly change to somehow prevent exceeding the
allocated space, even if the BIOS returns more info than necessary?
some special kludge for this specific case?
or, as a workaround, give buf[] more memory "just in case"?
Some other option?
Advice is welcome.
Regards,
Andrius V