tech-userlevel archive


Re: Patch to make <stdio.h> reentrant by default

On 27.04.2019 09:16, Martin Husemann wrote:
> On Fri, Apr 26, 2019 at 11:36:00PM +0200, Kamil Rytarowski wrote:
>> We keep detecting that more software is happy to just pick -lpthread
>> (like LLVM OpenMP) and prebuilt software works by accident.
> That would be easily catchable in pkgsrc wrappers (i.e. remove all "-lpthread"
> or similar args if no "-pthread" is there). We may even be able to come
> up with some magic to make initialization of libpthread fail in that case.

This is not that simple, as we need to pass -pthread in CFLAGS before
linking. The pkgsrc wrappers cannot catch the cases where that flag is
never passed to CFLAGS.

Most projects today use -lpthread for the link step only. Many C++
programs forget about -lpthread for threaded programs. Not many people
(even in toolchains!) find -D_REENTRANT relevant today. This led to the
reentrant variants being hardcoded in stdio.h for C++ a few years ago.
(But that could be mandated by the standard as well.)

The existing build-framework support for POSIX threads does not fit the
NetBSD case well. E.g. in CMake we must use something like:

find_package(Threads REQUIRED)


> Also forcing things to be multithreaded when compiling for sanitizers sounds
> ok, so I do not really see the big problem you are trying to solve here.
> Or am I overlooking something?

This is more than sanitizers and more than pkgsrc. Sanitizers (MSan in
particular) are just the most fragile here and explode early.

We keep patching external projects to use -pthread for both compiling
and linking, and it's getting worse over time.

As an example see:

LLVM is a good example as we must support building it out of the pkgsrc
context, using the default CMake framework. We use it on the NetBSD

I don't know how it looks in meson, bazel, gyp and other more modern
ones, but I don't expect them to be more aware of corner cases than
CMake, which has been in use for 19 years now.

> I would also be interested (as Thor said) in runtime differences, but not
> on amd64 where they are likely unnoticeable. What memory footprint / cache
> differences will this cause on tiny machines? Most tiny appliances still
> use very little (if at all) threaded code.

I didn't come up with tests in the beginning as the proposal is to make

1. Utility functions feof(3), ferror(3), fileno(3), clearerr(3) that are
used once/rarely at the end of a stream.

2. getc(3), fgetc(3), getchar(3), putchar(3) and the variants of the same
routines with the _unlocked() suffix, which perform only ONE character
read or write.

These functions are not bottlenecks in any performance-critical
circumstances. Once we want speed, it is better to go for
fgets(3)/fputs(3). If we want more speed we go for fread(3)/fwrite(3).
Such functions are not inlinable in our headers.
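
As a rough illustration of the block-oriented alternative, a copy loop
built on fread(3)/fwrite(3) could look like the sketch below. The
function name copy_blocks and the 4096-byte buffer size are assumptions
made for the example, not something measured in this thread:

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy everything from 'in' to 'out' in
 * fixed-size blocks. Returns the number of bytes copied, or -1 on a
 * read or write error.
 */
long
copy_blocks(FILE *in, FILE *out)
{
	char buf[4096];		/* assumed block size */
	size_t got;
	long total = 0;

	while ((got = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, got, out) != got)
			return -1;
		total += (long)got;
	}
	return ferror(in) ? -1 : total;
}
```

Here one library call moves up to a whole buffer, so the per-call cost
is paid once per block rather than once per character.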

The proposed change is only for the single-character routines, and they
are used mainly in src/games and related software, where we don't need
extra speed and prefer them for simplicity's sake.

I made a quick benchmark with two tests:

I. rw.c I/O copying 177MB from one file to another

 - original stdio.h 100% [real time]
 - new stdio.h 98.86% [real time]
 - new stdio.h and unlocked() functions used directly 96.23% [real time]

In my tests this made the new non-inlined functions... quicker.

This is probably the most real-life-like usage that can be benchmarked.
(But I doubt that anyone would perform 1-character reads for buffers
larger than the page size.)
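
The source of rw.c is not included in this mail. A single-character copy
loop of the kind described might look like the following sketch; the
function name copy_chars is hypothetical and this is not the actual
benchmark code:

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy 'in' to 'out' one character at a time
 * using getc(3) and putc(3). Returns the number of characters copied,
 * or -1 on a read or write error.
 */
long
copy_chars(FILE *in, FILE *out)
{
	long n = 0;
	int c;

	while ((c = getc(in)) != EOF) {
		if (putc(c, out) == EOF)
			return -1;
		n++;
	}
	return ferror(in) ? -1 : n;
}
```

Whether getc(3) and putc(3) expand inline or go through a libc call in
such a loop depends on which stdio.h variant is in use.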

Once a buffer is cached in the kernel:
 - original stdio.h 100% [real time]
 - new stdio.h 147% [real time]
 - new stdio.h explicit unlocked() 122.76% [real time]

There is the overhead of going through a libc call instead of executing
local inlined code.

II. no I/O involved, raw fileno() calls

This is an extreme example without I/O operations involved and the
least real-life-like usage.

 - original stdio.h 100% [real time]
 - new stdio.h 370.29% [real time]
 - new stdio.h explicit unlocked() 156.96% [real time]
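
The code for this second test is not shown here either. A loop that does
nothing but call fileno(3) could be sketched as below; the function name
sum_fileno and the accumulation into a sum (so that the calls are not
optimized away) are assumptions made for the example:

```c
#include <stdio.h>

/*
 * Hypothetical helper: call fileno(3) 'iters' times on the same
 * stream and return the sum of the results. No I/O is performed.
 */
long
sum_fileno(FILE *fp, long iters)
{
	volatile long sum = 0;	/* volatile keeps the loop from being elided */
	long i;

	for (i = 0; i < iters; i++)
		sum += fileno(fp);
	return sum;
}
```

Timing such a loop measures only the cost of the call itself, which is
why the difference between an inlined macro and a libc function shows up
so strongly.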

My summary from these quick tests:

 - when performance matters, relying on single-character reads/writes is
not a good idea, regardless of the underlying implementation

 - when it is merely convenient to peek or poke a single character, libc
calls are slower unless we perform [any] I/O

> Martin
