Subject: Re: ktr from freebsd
To: Bill Studenmund <wrstuden@netbsd.org>
From: Andrey Petrov <petrov@netbsd.org>
List: tech-kern
Date: 04/27/2004 14:46:01
We probably see purpose of kernel trace facility differently.
For me it's needed when you can't use printf because it's too
intrusive and using it leads to different system behavouir.
So lightwight and no side-effects would be priority 1.

Second would be reliablility, I don;t want losing information
so everything which is collected can be trusted.

I can't tell if complete log is needed, so far the tail of
it was enought. From my experience setting buffer size and trace mask
let me have sufficient amount of trace.

Saving to file seems nice but not necessary feature,
target user of kernel trace should be able to use ddb. Utility to
extract log from corefile would be useful though.

On Tue, Apr 27, 2004 at 01:00:57PM -0700, Bill Studenmund wrote:
> On Tue, Apr 27, 2004 at 11:03:38AM -0700, Andrey Petrov wrote:
> > 
> > Control over subsystem logging can be done by mask, while merging different
> > buffers when you need to see interactions between subsystems could be
> > somewhat difficult.
> 
> If you want things in the same log, just code them to use the same history 
> log.
> 
> If we use a bitmask, we force different systems to use the same log 
> regardless of what is really needed. As we go towards more & more 
> concurrency (yes, it is a ways off for the kernel), we will be limiting 
> potential concurrency.
> 

All rigth, supporting different buffers could be plus, but if you
think of mask as of DPRINTF then it gives similar per module control.
Multiple buffers causes more TLB traps.

> > 
> > Seems that priority is reversed here, it's more important to be efficient
> > when data is being collected, then when it's dumped.
> 
> Nope. For two reasons:
> 
> 1) If you're logging a very busy service, the act of dumping the log can 
> impact what you're doing. So you need to think about logging as a whole 
> process; a change that makes collecting a little faster is no good if it 
> makes dumping hideously worse.
> 
> It's much much more efficient to store the length than not. Think about 
> what it takes for libkvm to get data out of the log. If we have the 
> lengths, we extract the struct uvm_history_ent. We then extract fmtlen 
> bytes from fmt and fnlen bytes from fn. Three reads of known-sizes of data 
> from known addresses. And three system calls.
> 
> If we don't store the lengths, how are we going to read the strings? 
> kvm_read() reads a fixed-number of bytes. So we have to kvm_read() a byte 
> at a time and compare it with zero. So for the case above, we have one 
> system call for the overall entry and one per character in each string, 
> plus one each for the trailing NUL.
> 
> We could just snprintf() into a ring buffer in the kernel, so the logger
> only has to read characters. But then we're putting a call to printf() in
> the code we're trying to log, which will make logging impact things even
> more. It's one thing to call strlen() (which will probably be inlined!)
> and store about 12 ints worth of data (ILP32) each log entry, it's another
> to call printf().
> 
> Yes, I am assuming logging in a case where your logging has an impact on 
> what you're doing but that you can process the logs (in a userland program 
> for instance) fast enough to keep up.
> 
> 2) In uvmhist, half of the string lengths are of the format strings. gcc 
> will turn strlen("Foo") into a constant (3 in this case) at compile time. 
> So by storing lengths, our compiler is removing half the calls to 
> strlen().
> 
> The uvm history system is designed to handle logging a very busy kernel 
> subsystem to disk very efficiently. While ktr or something like it may be 
> simpler, I don't think it will perform as well under heavy load. Since we 
> will need to log such subsystems, why not focus on one logging system that 
> will work well for all (almost all?) our needs, rather than have two 
> subsystems, one of which is only good some of the time.
> 

Would be interesting to see how libkvm-based utility does the job.
vmstat seems to do only a snapshot.

> The one thing I wish we had with uvm hist is an ability to dump the raw 
> uvmhist_ent's to disk then we run a program to turn that binary dump into 
> a text dump. That way we don't even call printf() while we're actually 
> logging.
> 

All right, there is an objection, so I put this off for better time.

	Andrey