tech-userlevel archive


Re: NetBSD truss(1), coredumper(1) and performance bottlenecks



On 24.05.2019 17:09, Michael van Elst wrote:
> On Fri, May 24, 2019 at 10:17:54AM +0200, Kamil Rytarowski wrote:
> 
>> Shouldn't that be optimized with libc functions? It calls read(2) for
>> each character.
> 
> The input might be read by shell and programs launched by the shell.
> For files you can read-ahead and seek back, but for pipes you can
> only read single bytes.
> 

As far as I'm aware, read(2) and write(2) on pipes can transfer more
than 1 byte per call.
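
To make the two positions concrete, here is a minimal sketch (not taken
from the shell sources; the function names and the payload are
hypothetical). It counts read(2) calls when a line is pulled from a pipe
one byte at a time, then fetches the rest with a single larger read. It
also shows the trade-off raised above: the larger read consumes whatever
is in the pipe, so a reader that must leave the remaining input for a
child process cannot simply switch to it.

```python
import os


def read_line_bytewise(fd):
    """Read up to and including the first newline, one read(2) per byte."""
    line = bytearray()
    calls = 0
    while True:
        b = os.read(fd, 1)
        calls += 1
        if not b:          # EOF before a newline was seen
            break
        line += b
        if b == b"\n":
            break
    return bytes(line), calls


def read_chunk(fd, size=4096):
    """Fetch whatever is available (up to size bytes) with one read(2)."""
    return os.read(fd, size), 1


def demo():
    payload = b"first line\nsecond line\n"   # illustrative data only
    r, w = os.pipe()
    os.write(w, payload)
    os.close(w)
    line, calls = read_line_bytewise(r)      # stops exactly at the newline
    rest, rest_calls = read_chunk(r)         # takes everything that is left
    os.close(r)
    return line, calls, rest, rest_calls


if __name__ == "__main__":
    line, calls, rest, rest_calls = demo()
    print(f"bytewise: {len(line)} bytes in {calls} read(2) calls")
    print(f"chunked:  {len(rest)} bytes in {rest_calls} read(2) call")
```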

But the real question here is what is expensive in the build
infrastructure. The 5k single-byte transfers were just a potential
starting point.

> 
> 
>>>> 2. Firefox and Thunderbird and certainly other similar software calls
>>>> excessively gettimeofday() and clock_gettime(). At least around 100k
>>>> times per 1 minute, and the program spends around 30sec (cumulative time
>>>> from all LWPs in a process) in the kernel space prompting for the
>>>> current time.
>>>
>>> That's only a symptom. The real question is why it doesn't sleep.
>>
>> This is a symptom, but this is not specific to a single application. In
>> my checks other programs like top(1) are relatively hungry for checking
>> for the current time. More than 70% syscalls from top(1) are for
>> __gettimeofday50() (but of course top(1) doesn't emit so many syscalls
>> in so short periods).
> 
> Top caches data from several databases (e.g. passwd) and checks time for
> each lookup to find out whether the cache needs a refresh. Compared to
> everything else done by top it is negligible.
> 

My observation was a general one: this syscall is called frequently by
many programs. Optimizing it could improve the responsiveness of the
whole system.

> 
> 
>> In NetBSD truss(1) we prompt for the current time for each event like a
>> syscall entry/exit of a traced process.
>>
>> Jason Thorpe mentioned how to optimize it. As far as I understand, we
>> can create a page shared between userland and kernel, pass it through
>> AUXV vector and effectively replace all syscalls with memory reads.
> 
> Yes, that is an option, also for other calls like getuid() or getpid().
> 
> On the other hand, your measurement is probably a bit misleading,
> a modern system does 100k gettimeofday calls in about a millisecond.
> 


My computers are slower than that. Also, as long as ptrace(2) is racy, I
cannot guarantee accurate call counts with this tool (unless profiling a
single-threaded application).
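
A rough way to check the "100k calls in about a millisecond" figure on a
given machine is to time a loop of clock queries. The sketch below is
only an assumption-laden upper bound: the function names are
hypothetical, the loop includes interpreter overhead, and whether
time.time() actually enters the kernel depends on the platform and its
libc.

```python
import time


def time_clock_queries(n=100_000):
    """Return the wall-clock seconds spent issuing n clock queries."""
    start = time.perf_counter()
    for _ in range(n):
        time.time()        # may or may not be a real syscall on this platform
    return time.perf_counter() - start


def demo(n=100_000):
    elapsed = time_clock_queries(n)
    per_call_ns = elapsed / n * 1e9
    return n, elapsed, per_call_ns


if __name__ == "__main__":
    n, elapsed, per_call_ns = demo()
    print(f"{n} queries in {elapsed * 1e3:.3f} ms ({per_call_ns:.0f} ns each)")
```

A C loop around gettimeofday(2) would give a tighter number; this only
shows the shape of the measurement.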

I'm looking forward to some analysis with the right tool (DTrace is more
appropriate here). ptrace(2)-based syscall tracers can give only a rough
idea here. There are some websites (no need to market them here) that
present profiling with strace and show when it is efficient.

At some point Joyent brought bulk builds of pkgsrc down from 2 days to
3 h. There is certainly low-hanging fruit in build.sh as well.

> I'm not sure whether the additional complexity would be justified.
> Another argument against this optimization is that tracing these
> non-syscalls is even more complex.
> 

I'm not sure that missing gettimeofday calls in strace-like programs
would be a real concern. On the other hand, it would be helpful to be
able to filter out syscalls that are only moderately interesting.
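
As an illustration of such filtering, here is a minimal sketch that
summarizes a list of traced syscall names while skipping an ignore list.
It does not reflect how truss(1) is implemented; the ignore list, the
function name and the sample events are all hypothetical.

```python
from collections import Counter

# Hypothetical ignore list: time queries that tend to dominate a trace.
BORING = frozenset({"__gettimeofday50", "__clock_gettime50"})


def summarize(events, ignore=BORING):
    """Count syscalls by name, skipping ignored ones.

    Returns (Counter of kept names, number of skipped events).
    """
    kept = [name for name in events if name not in ignore]
    skipped = len(events) - len(kept)
    return Counter(kept), skipped


if __name__ == "__main__":
    # Made-up sample, not real trace output.
    sample = ["read", "__gettimeofday50", "write", "__gettimeofday50", "read"]
    counts, skipped = summarize(sample)
    for name, count in counts.most_common():
        print(f"{count:6d} {name}")
    print(f"{skipped:6d} (filtered out)")
```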

Tracing libc calls with ptrace(2) shouldn't be that difficult, but it
would need a tool with MD code.

> 
> 
> Greetings,
> 

Anyway, I have provided the tools; if someone is interested in
experimenting and sending patches back, feel free to do so. I will keep
using them to catch kernel stability problems in the ptrace(2) APIs.



