tech-userlevel archive


Re: Basesystem programs redefine routine symbols from libc



On 09.12.2017 19:15, Robert Elz wrote:
>     Date:        Sat, 9 Dec 2017 15:46:42 +0100
>     From:        Kamil Rytarowski <n54%gmx.com@localhost>
>     Message-ID:  <d69bba40-7068-096a-4333-863800c10fe6%gmx.com@localhost>
> 
>   | However there exist programs in the basesystem that shadow libc
>   | symbol routines as well,
> 
> There is nothing wrong with that, in fact it is almost unavoidable,
> as programs need names to use, and libraries need names for functions
> they add later, and it is inevitable that they will clash from time to
> time.
> 
>   | for example ps(1):
>   | 
>   | bin/ps/extern.h:void    uname(struct pinfo *, VARENT *, enum mode);
> 
> I suspect that the BSD ps command has had a uname() function since long
> before the Sys V (or Sys III or wherever it originated) was added to the
> BSD libc - this is a perfect example.
> 
> To handle this kind of issue, the libc functions only get to be defined
> when the relevant header file is included, in this case <sys/utsname.h>
> which ps does not do, hence, it is perfectly entitled to have a function
> called uname if it wants, or a "struct utsname" if it really wanted to
> be perverse.
> 
>   | I'm going to rename the symbol routine names when I will hit them.
> 
> There is nothing inherently wrong with that - they are just names after
> all, but it is the wrong solution, and one that would have no end.
> 
> There could easily be a "usrname()" function added to libc next week,
> and the sanitizers could learn about it the week after, and then you're
> back with the exact same problem.
> 
> The right way is for the sanitizers to learn which headers define the
> symbols that they want to take over, and only do that when the appropriate
> header is included (one way to do that would be to define shadow headers,
> so LLVM could define a sys/utsname.h and arrange for that one to be found
> ahead of /usr/include/sys/utsname.h when compiling).  Then that header does
> the magic needed to get the LLVM version of uname() - otherwise it simply
> does nothing with a function called uname() if the program happens to have one.
> 
> And the same for all the other symbols that it feels the need to take over
> from libc (or other libraries.)
> 
> Whether that's done with actual new header files, or simply by recognising
> the system headers being included and then adding the appropriate magic
> only in those cases when it observes the system header being included is
> just an implementation detail.
> 
> kre
> 

The problem is not at the header-file (preprocessor) level, but at the
linker level.

We link the prebuilt sanitizer runtime (.a / .so files) into the target
application, and that runtime exports the intercepted names as global
symbols:

$ nm /usr/local/lib/clang/6.0.0/lib/netbsd/libclang_rt.msan-x86_64.a | grep uname
0000000000000000 B _ZN14__interception10real_unameE
0000000000000000 T __interceptor_uname
0000000000000000 T uname

We intercept uname(3) because behind the scenes it is a syscall: the
kernel fills in the buffer, the sanitizer cannot observe those writes,
and so we need to hardcode the sanitizing rules (the length of the
buffer that is being initialized).

INTERCEPTOR(int, uname, struct utsname *utsname) {
  ENSURE_MSAN_INITED();
  int res = REAL(uname)(utsname);
  if (!res)
    __msan_unpoison(utsname, __sanitizer::struct_utsname_sz);
  return res;
}

In the MSan case we mark the memory that the utsname pointer refers to
as initialized once the real call has succeeded.
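
To make the call / check / unpoison sequence concrete, here is a
minimal standalone sketch in plain C. It is not the sanitizer's code:
the one-byte-per-byte "shadow" array and the names toy_poison(),
toy_unpoison(), toy_poisoned_bytes(), toy_tracked() and sketch_uname()
are all made up for illustration, and real MSan shadow memory is
organized differently. The wrapper deliberately has its own name so
that it does not itself redefine uname.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/utsname.h>

/* Toy shadow state for ONE tracked object (hypothetical, not MSan's):
 * shadow[i] == 1 means byte i of `tracked` counts as uninitialized. */
static struct utsname tracked;
static uint8_t shadow[sizeof(struct utsname)];

/* Address of the single object this toy model knows about. */
static struct utsname *toy_tracked(void) { return &tracked; }

/* Mark the whole tracked object as uninitialized. */
static void toy_poison(void) { memset(shadow, 1, sizeof shadow); }

/* Mark n bytes starting at p (inside `tracked`) as initialized. */
static void toy_unpoison(const void *p, size_t n)
{
    size_t off = (size_t)((const uint8_t *)p - (const uint8_t *)&tracked);
    assert(off <= sizeof shadow && n <= sizeof shadow - off);
    memset(shadow + off, 0, n);
}

/* Number of bytes still considered uninitialized. */
static size_t toy_poisoned_bytes(void)
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof shadow; i++)
        count += shadow[i];
    return count;
}

/* Same shape as the interceptor above: call the real function, and only
 * on success declare the whole output structure initialized. The size is
 * hardcoded because the kernel's writes are invisible to the tool. */
static int sketch_uname(struct utsname *u)
{
    int res = uname(u);          /* the real libc function */
    if (!res)                    /* mirrors the check in the interceptor */
        toy_unpoison(u, sizeof *u);
    return res;
}
```

A caller would run toy_poison() first, then sketch_uname(toy_tracked()),
and afterwards toy_poisoned_bytes() reports how much of the structure is
still treated as uninitialized.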

The impact on basesystem utilities is rather low so far (in sh(1) there
are 0 symbol clashes, in ksh(1) there is 1 clash), and renaming the
clashing symbols appears to be the least intrusive workaround.

I agree that this is not perfect, but I'm not aware of a better
solution that does not require a redesign and rewrite of the sanitizers.



