On 27.04.2019 09:16, Martin Husemann wrote:
> On Fri, Apr 26, 2019 at 11:36:00PM +0200, Kamil Rytarowski wrote:
>> We keep detecting that more software is happy to just pick -lpthread
>> (like LLVM OpenMP) and prebuilt software works by an accident.
>
> That would be easily catchable in pkgsrc wrappers (i.e. remove all "-lpthread"
> or similar args if no "-pthread" is there). We may even be able to come
> up with some magic to make initialization of libpthread fail in that case.
>

This is not that simple, as we need to pass -pthread in CFLAGS before linking. The pkgsrc wrappers cannot catch the cases where the flag is never passed to CFLAGS at all.

Most projects today use -lpthread for the linking step only. Many C++ programs forget about -lpthread for threaded programs. Not many people (in toolchains!) find -D_REENTRANT relevant today. This triggered hardcoding the reentrant variants in stdio.h for C++ a few years ago. (But it could be mandated by the standard as well.)

Existing build frameworks for POSIX threads are not that convenient to use in the NetBSD case. E.g. in CMake we must use something like:

set(CMAKE_THREAD_PREFER_PTHREAD TRUE)
set(THREADS_PREFER_PTHREAD_FLAG TRUE)
find_package(Threads REQUIRED)
set(C_FLAGS "${OpenMP_C_FLAGS} ${CMAKE_THREAD_LIBS_INIT}")
set(CXX_FLAGS "${OpenMP_CXX_FLAGS} ${CMAKE_THREAD_LIBS_INIT}")

> Also forcing things to be multithreaded when compiling for sanitizers sounds
> ok, so I do not really see the big problem you are trying to solve here.
>
> Or am I overlooking something?
>

This is more than sanitizers and more than pkgsrc. Sanitizers (MSan in particular) are just the most fragile here and explode early. We keep modifying external projects to use -pthread for compiling and linking, and it's getting worse over time.
As an example see:

https://github.com/llvm-mirror/openmp/commit/cf0276b9ab4dba5d77a131fcf4061e5e2172d8eb

LLVM is a good example, as we must support building it outside of the pkgsrc context, using the default CMake framework. We use it on the NetBSD buildbot.

I don't know how this looks in meson, bazel, gyp or other more modern ones, but I don't expect them to be more aware of corner cases than CMake, which has been in use for 19 years now.

> I would also be interested (as Thor said) in runtime differences, but not
> on amd64 where they are likely unnoticable. What memory footprint / cache
> differences will this cause on tiny machines? Most tiny appliances still
> use very little (if at all) threaded code.
>

I didn't come up with tests in the beginning, as the proposal is to make reentrant:

1. Utility functions feof(3), ferror(3), fileno(3), clearerr(3) that are used once/rarely at the end of a stream.

2. getc(3), fgetc(3), getchar(3), putchar(3) and variations of the same routines with the unlocked() suffix, which perform only ONE character read or write.

These functions are not bottlenecks in any performance-critical circumstances. Once we want speed, it is better to go for fgets(3)/fputs(3). If we want more speed, we go for fread(3)/fwrite(3). Such functions are not inlinable in our headers.

The proposed change is only for the single-character routines, and they are used mainly in src/games and related software, where we don't need extra speed, for simplicity's sake.

I made a quick benchmark with two tests:

http://netbsd.org/~kamil/reentrant_stdio_benchmarks/reentrant_benchmarks.ods

I. rw.c I/O copying 177MB from one file to another

 - original stdio.h 100% [real time]
 - new stdio.h 98,86% [real time]
 - new stdio.h and unlocked() functions used directly 96,23% [real time]

In my tests this made the new non-inlined functions.. quicker. This is probably the most real-life-like benchmarkable usage.
(But I doubt that someone would perform 1-character reads for buffers larger than the page size.)

Once a buffer is cached in the kernel:

 - original stdio.h 100% [real time]
 - new stdio.h 147% [real time]
 - new stdio.h explicit unlocked() 122,76% [real time]

There is the overhead of going through a libc call instead of executing local inlined code.

II. no I/O involved, raw fileno() calls

This is an extreme example without I/O operations involved and the least real-life usage.

 - original stdio.h 100% [real time]
 - new stdio.h 370,29% [real time]
 - new stdio.h explicit unlocked() 156,96% [real time]

My summary from the quick tests:

 - when performance matters, relying on single-character reads/writes is not a good idea, regardless of the underlying implementation
 - when it is convenient to peek or poke a single character, libc calls are slower unless we perform [any] I/O

> Martin
>
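
For readers who want to reproduce the shape of test I: the following is only a minimal sketch of a single-character copy loop, not the rw.c used for the numbers above (that source is not included in this mail). The function names copy_locked() and copy_unlocked() are made up for illustration. The first variant uses the ordinary getc(3)/putc(3); the second takes each stream lock once with flockfile(3) and then uses the _unlocked() variants directly.

```c
#define _POSIX_C_SOURCE 200809L /* flockfile(3), getc_unlocked(3) */
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy 'in' to 'out' one character at a time
 * with the ordinary (locking) routines.
 * Returns the number of bytes copied, or -1 on error.
 */
long
copy_locked(FILE *in, FILE *out)
{
	long n = 0;
	int c;

	assert(in != NULL && out != NULL);
	while ((c = getc(in)) != EOF) {
		if (putc(c, out) == EOF)
			return -1;
		n++;
	}
	return ferror(in) ? -1 : n;
}

/*
 * Hypothetical helper: same copy, but lock each stream once and
 * call the _unlocked() variants inside the loop.
 */
long
copy_unlocked(FILE *in, FILE *out)
{
	long n = 0;
	int c;

	assert(in != NULL && out != NULL);
	flockfile(in);
	flockfile(out);
	while ((c = getc_unlocked(in)) != EOF) {
		if (putc_unlocked(c, out) == EOF) {
			n = -1;
			break;
		}
		n++;
	}
	funlockfile(out);
	funlockfile(in);
	return (n < 0 || ferror(in)) ? -1 : n;
}
```

Timing both functions over the same large file (e.g. with time(1) around a small driver) should give a comparison of the same kind as the percentages above; the absolute numbers will of course depend on the libc and on whether the file is already cached.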