On 09.12.2017 19:15, Robert Elz wrote:
> Date: Sat, 9 Dec 2017 15:46:42 +0100
> From: Kamil Rytarowski <n54%gmx.com@localhost>
> Message-ID: <d69bba40-7068-096a-4333-863800c10fe6%gmx.com@localhost>
>
> | However there exist programs in the basesystem that shadow libc
> | symbol routines as well,
>
> There is nothing wrong with that, in fact it is almost unavoidable,
> as programs need names to use, and libraries need names for functions
> they add later, and it is inevitable that they will clash from time to
> time.
>
> | for example ps(1):
> |
> | bin/ps/extern.h:void uname(struct pinfo *, VARENT *, enum mode);
>
> I suspect that the BSD ps command has had a uname() function since long
> before the Sys V uname() (or Sys III, or wherever it originated) was added
> to the BSD libc - this is a perfect example.
>
> To handle this kind of issue, the libc functions only get to be defined
> when the relevant header file is included, in this case <sys/utsname.h>
> which ps does not do, hence, it is perfectly entitled to have a function
> called uname if it wants, or a "struct utsname" if it really wanted to
> be perverse.
>
> | I'm going to rename the clashing routines as I hit them.
>
> There is nothing inherently wrong with that - they are just names after
> all, but it is the wrong solution, and one that would have no end.
>
> There could easily be a "usrname()" function added to libc next week,
> and the sanitizers could learn about it the week after, and then you're
> back with the exact same problem.
>
> The right way is for the sanitizers to learn which headers define the
> symbols that they want to take over, and only do that when the appropriate
> header is included. (One way to do that would be to define shadow headers,
> so LLVM could define a sys/utsname.h and arrange for that one to be found
> ahead of /usr/include/sys/utsname.h when compiling. Then that header does
> the magic needed to get the LLVM version of uname(); otherwise it simply
> does nothing with a function called uname() if the program happens to have one.)
>
> And the same for all the other symbols that it feels the need to take over
> from libc (or other libraries.)
>
> Whether that's done with actual new header files, or simply by recognising
> when a system header is included and adding the appropriate magic only
> in those cases, is just an implementation detail.
>
> kre
>
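A minimal sketch of the shadow-header approach described above, in plain C. All names here (hypothetical_san_uname, hypothetical_redirects, SAN_MARK_INITIALIZED) are hypothetical, and to keep it a single runnable file the "runtime" part and the "shadow header" part share one translation unit. The only real sanitizer API used is __msan_unpoison() from <sanitizer/msan_interface.h>, and only when building with -fsanitize=memory. The point is that the runtime no longer needs to export a symbol called uname at all.

```c
#include <sys/utsname.h>

/* Use the real MSan API only when building with -fsanitize=memory. */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define SAN_MARK_INITIALIZED(p, n) __msan_unpoison((p), (n))
#  endif
#endif
#ifndef SAN_MARK_INITIALIZED
#  define SAN_MARK_INITIALIZED(p, n) ((void)(p), (void)(n))
#endif

/*
 * Runtime side (hypothetical): the wrapper is exported under its own
 * name instead of "uname", so it cannot collide at link time with a
 * program that defines an unrelated uname() of its own.
 */
static int hypothetical_redirects;   /* counts calls routed through here */

int hypothetical_san_uname(struct utsname *u)
{
    int res = uname(u);              /* macro not defined yet: real uname(3) */
    hypothetical_redirects++;
    if (res == 0)                    /* on success, mark the struct initialized */
        SAN_MARK_INITIALIZED(u, sizeof *u);
    return res;
}

/*
 * Shadow-header side (hypothetical): in a real setup this would live in
 * a sys/utsname.h found ahead of /usr/include/sys/utsname.h, so only
 * translation units that include the header are redirected.
 */
#define uname(u) hypothetical_san_uname(u)
```

Because the redirect is a preprocessor macro, a program such as ps(1) that never includes <sys/utsname.h> keeps its own uname() untouched. The trade-off is that only recompiled call sites are covered; calls made from inside already-built libraries would not be redirected, whereas symbol-level interception catches them too.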
The problem is not at the header-file (preprocessor) level but at the linker
level: we link prebuilt .a/.so runtime files into the target application.
    $ nm /usr/local/lib/clang/6.0.0/lib/netbsd/libclang_rt.msan-x86_64.a | grep uname
    0000000000000000 B _ZN14__interception10real_unameE
    0000000000000000 T __interceptor_uname
    0000000000000000 T uname
We intercept uname(3) because behind the scenes it is a syscall, whose
writes to the buffer MSan cannot observe, so we need to hardcode the
sanitizing rule (the length of the structure being initialized).
    INTERCEPTOR(int, uname, struct utsname *utsname) {
      ENSURE_MSAN_INITED();
      int res = REAL(uname)(utsname);  // call the real libc uname(3)
      if (!res)  // on success, mark the whole struct as initialized
        __msan_unpoison(utsname, __sanitizer::struct_utsname_sz);
      return res;
    }
In the MSan case, after a successful call we mark the struct utsname that
the pointer refers to (struct_utsname_sz bytes) as initialized.
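To make the linker-level point concrete, here is a simplified plain-C analogue of symbol interposition on ELF, assuming a system that provides dlsym(RTLD_NEXT, ...) (glibc and NetBSD both do). It is not compiler-rt's actual INTERCEPTOR expansion; real_uname_ptr and intercepted_calls are hypothetical stand-ins for the runtime's bookkeeping (compare __interception::real_uname in the nm output). Because the wrapper is exported as uname itself, a program that also defines a global uname(), as ps(1) does, ends up with two definitions of the same symbol in one link.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                  /* exposes RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <stddef.h>
#include <sys/utsname.h>

typedef int (*uname_fn)(struct utsname *);

/* Hypothetical stand-ins for the runtime's bookkeeping. */
static uname_fn real_uname_ptr;      /* cf. __interception::real_uname */
static int intercepted_calls;

/*
 * Exported under the libc name itself, so it interposes on libc's
 * uname(3). This is exactly what makes a second global "uname" in
 * the same link a clash.
 */
int uname(struct utsname *u)
{
    if (real_uname_ptr == NULL) {
        /* Look up the next definition in search order, i.e. libc's. */
        real_uname_ptr = (uname_fn)dlsym(RTLD_NEXT, "uname");
        if (real_uname_ptr == NULL)
            return -1;
    }
    intercepted_calls++;
    int res = real_uname_ptr(u);
    /* An MSan-style runtime would mark *u as initialized here on success. */
    return res;
}
```

On glibc older than 2.34, dlsym() lives in libdl, so link with -ldl.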
So far the impact on base-system utilities is low (sh(1) has no symbol
clashes and ksh(1) has one), and renaming appears to be the least
intrusive workaround.
I agree that this is not perfect, but I'm not aware of a better solution
that does not require a redesign and rewrite of the sanitizers.