Source-Changes-D archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Leak Sanitizer - how to suppress leaks



    Date:        Fri, 13 Sep 2019 02:07:43 +0200
    From:        Kamil Rytarowski <n54%gmx.com@localhost>
    Message-ID:  <1ea33d0c-fb0f-ecdd-1706-b11841dc67ad%gmx.com@localhost>

  | Please note that in all arguments about leak sanitization, that it uses
  | internally the same mechanism as garbage collectors. lsan and libgc (and
  | others) also server similar purpose.

Noted.

But please note that all of that means nothing to me at all.   Nor do I
care how any of those things work.   What they do is of some interest,
how they do it is not.

  | LSan can detect indirect leaks:

That's nice.

  | No more leaks were reported, if there are any I will address them.

See my (was very recent when I typed this...) reply to wiz's message.

  | LSan/NetBSD still needs more work and it is not expected to be perfect.

I am not asking for it to be perfect.   What I am requesting is intelligent
interpretation of its results, not just "LSan says this is wrong, so I must
immediately fix it".

  | How about adding ifdef __NO_LEAKS like:
  |
  | #ifdef __NO_LEAKS
  | free(3)?
  | #endif
  |
  | And in lsan/asan/valgrind/etc checks use -D__NO_LEAKS.

It isn't so much that I think we need to save the cost of doing
the free() (though for ps it turns out to be harder than you'd expect
to actually get it right) but whether it is worth anyone time and
effort to actually work out what is needed (if anything at all).  Since
ps simply exits, we know there is no real leak, only the illusion of
one very briefly.

The problem more is that we need a way to tell LSan (and the others)
"yes, you told me this looks wrong, I investigated, it is OK, there
is no need to ever tell me about this again".   Even lint had that
ability from is beginning (30 years ago).

With that these tools can actually be useful (when used properly) - when
an issue is reported, if it is a bug, it gets fixed, if it isn't, then
the tool is told that, and we move on.

  | Feel free to experiment.

Right now, I can't - my build system is unable to build anything llvm
related (its compiler is too old) - which is why I haven't been making
any src changes recently.   Currently I can't build the TOOLS needed to
build everything else.   I need to update it so it can actually
build things again...   In progress, but very slowly.

  | I think we need to specify the definition. Leak is a memory object
  | without a pointer referencing it.

I can accept that as a definition.   But we also need to recognise that
there are no leaks after the program has finished.   And the program has
finished as soon as any of exit()/exec*()/_exit() is called (and succeeds)
or when main() returns.   After that there's nothing left of interest.
Leak detection needs to happen before one of those events occurs.

  | A special type of a leak is referencing memory that is no longer needed
  | and could be freed.

That one will be interesting to find...


  | From the compiler point of view it is only slightly different than any
  | regular "int foo(int, char**)".

From the compiler's viewpoint you're right, main() is just a function.
But from the overall system's viewpoint it isn't.

  | From sanitizer point of view it is not distinguishable and we cannot
  | (easily) make any conditions based on this.

This is a problem, as return from main, and a call to exit() are
equivalent.   Perhaps we should stop returning from main, and always
explicitly call exit() ?

  | Actually globals are not used in LSan and can be ignored.

I don't know how to interpret that.   In many programs the pointers
to allocated memory are stored in (or found from other data stored in)
global vars.

  | Dynamic linker execution time is a part of execution time of the whole
  | application.

Of course.

  | This implementation of __suppress_leak() takes 9-10 amd64 CPU
  | instructions for each __suppress_leak() call.

What does each of those calls accomplish?   That is, how many of
them are needed?

  | Before getting measurable performance impact of __suppress_leak() we
  | will can go out of memory..

It isn't the call itself which is probably the issue, but traversing
all the data structs (some of which may not have been used in ages, and
might be paged out) in order to find all the pointers to the memory that
is allocated.   Alternatively, we could do:

	void *
	xmalloc(size_t n)
	{
		void *p = malloc(n);

		if (p == NULL) {
			err(1, "....");	/* or whatever is appropriate *.
		}

		__suppress_leak(p);

		return p;
	}

but is that going to help anything at all, ever?

  | In the context of anything executed in libc/csu/application, even for a
  | hello world application this is negligible.

It isn't the trivial applications that matter, it is the big ones.   And it
isn't the comparison with time currently spent, but how much addional time
is consumed, and whether there is any benefit gained from that.

For the ps example, the actual cost of doing the needed free() calls to
return everything is probaly insignificant compared with the rest of what
ps does ... but it is also completely pointless.   There is no actual bug,
nothing is being lost, ps processes do not continually grow bigger because
of lost memory that can't be reused.   Any extra code is accomplishing
nothing - but still costs something, however small that might be.

  | Sanitizer does not distinguish problems from non-problems, it detects
  | that what is was designed for.

That is fine - but the humans interpreting the results need to distinguish.

Note this is not all on you - if you get an error report from LSan (or any
of the others) it is not your job to fix it (or even understand it) - unless
it is in your code of course.  In other cases, just file a PR, and let
someone more familiar with the code in question look into it.   Just be
prepared for a "no actual problem here" response.

  | In current POSIX we are lucky as this is defined 'Memory mappings that
  | were created in the process shall be unmapped before the process is
  | destroyed.'.

Yes, it has always been that way in unix (and any rational OS).

  | In a strict C code, in hosted environments this is not specified.

No, as there isn't necessarily an OS there at all.   Fortunately, for
our purposes, that is not us.

  | I might be wrong with my memory, but I recall something that early SunOS
  | had a problem, that unmapping the memory regions on exit was imperfect
  | and the resources were lost until reboot. (Don't ask me to find where I
  | read about it now).

I don't recall anything like that (but almost anything was possible of
SunOS 1, if you're going back that far).   Of course, me not remembering
something doesn't mean anything one way or the other...

  | With garbage collectors, certain implementations of Lisp computers were
  | so slow, that it was quicker to reboot a computer than wait for the
  | garbage collector to finish its work.

Yes, that one I understand, and which is why garbage collection is so
little used in real world applications.   It is a very nice technique
to dream about, even nicer to actually use, but a nightmare to implement
in a way that is practically useful.

  | exit(3)/process termination is defined to close file descriptors.

Yes, just as it is defined to free all memory.

  | LSan does not look look for unclosed file descriptors so we can skip
  | discussing the file descriptor leaks as offtopic here.

We can - but the issue is essentially the same.

  | > If not, your program, according to you, is leaking file descriptors.
  | No.

What is the difference?

  | Actually whenever I open a file descriptor I try to close it when no
  | longer needed.

That's fine, and if you were to do, in some function

	func()
	{
		char buf[BIG_ENOUGH];
		int fd;

		fd = open("myfile", 0);
		if (fd < 0)	/* whatever */;
		read(fd, buf, BIG_ENOUGH);

		return buf[0] == '#';
	}

I think we'd all agree that there's a fd leak in there.   Just the
same as if fd was a pointer, and the open() was malloc() (and some other
code replaced the read) it would be a memory leak.

The two situations are entirely parallel.

  | If we speak about philosophy, the same philospohy of not freeing the
  | resources is like to avoiding checks for the return value from
  | malloc(3), as it hardly ever fails today.

That's nonsense, they are nothing alike at all.

  | Yes, I'm for checking for the return value always, ideally using
  | emalloc(3) from util(3).

Unless you write your own error handler, you will have problems
with
	func() {
		void *p, *q;

		p = emalloc(SIZE1);
		q = emalloc(SIZE2);
		
		/*....*/

		free(q); free(p);
	}

as when the 2nd malloc fails, err() will normally be called, and if
that happens the memory that was allocated to p will have been "leaked"
in your view of the world.   So will any memory that has previously been
allocated by the program.

Even with an error handler, dealing with this is very hard, without
making (at least) p be global, and that is difficult to manage (not
impossible, just difficult) if func() needs to be recursive (directly
or indirectly).  It can be handled with setjmp()/longjmp() but is all of
that mechanism really necessary when what happens when err() is used
as the error handler is to write a message, and exit() ?

The exit() results in the "leaked" memory claimed by p being freed (along
with everything else in the program) - are you really suggesting that in
these cases the program needs to go free everything it had allocated before
it is allowed to exit, or it will be considered buggy?

If not, then what's the difference beween the exit(n) in err() and the
exit(0) that happens after main returns ?

  | As I stated, ATF tries to free the memory, but it is not freed
  | diligently and the job is done only partially.

I haven't looked inside ATF, and haven't investigated it, so I am
not going to speculate.   As I said last time, it all depends just
what is not freed.  That some attempt is made to free some of it
means nothing, one way or the other.

  | Please try to change the perspective [ ... ]
  | but it does not help to address the real problem.

What real problem?

  | As an example, please imagine that a vendor, a user, a hobbyist, a
  | company is doing a product, a project on top of the NetBSD
  | codebase/system.

I can imagine that - but I will also assume that said vendor/user/...
understands the environment in which the NetBSD (and all other unix)
code operates, and either duplicates it, or adapts the code to suit.

  | After the investigation we see that the RSS/MAXRSS keeps growing because
  | of the pinfo struct.

Yes, if everything happened the way you postulate (and the developer is
that incompetent) then that would happen.   But that's a trivial issue
compared with everything else that isn't being freed in ps.  Much more
will be being leaked than the trivial pinfo struct.

  | If we want to catch leaks in sh(1) (in our releng builds and machines in
  | a future MKSANITIZER mode) that affect long-running applications, we
  | first need to remove the noise from uninteresting leaks. And we are back
  | to the original problem.

Once again, at least to the best of my knowledge, there are no leaks as
you defined the term above in sh.

All the memory that is not freed when sh exits has pointers to it.

Even stricter, at least as far as sh is concerned, none of the data
it has allocated is not needed, and could have been freed.  Of course
the shell doesn't know that the script has done

	VAR='some long string'

and has finished using the variable, but hasn't bothered doing

	unset VAR

which would allow the shell to free the 'some long string' and the
associated data associated with it (including the name "VAR").

So, from that sense, it may have useless data allocated, but there's no
way for the shell to know that.

  |  - use free(3) optionally wrapped in #ifdef _NO_LEAKS
  |  - use __suppress_leak() as a generic approach for all leak detecting
  | software with a negligible cost on
  |  - use per-tool approach

     - understand that exit() is "free everything"

That one is my preferred choice.   Then do leak detection while the
program is running, rather that while it is exiting - that's too
late and you can't tell the difference any more between memory that
was actually leaked, and memory that was in use, but (obviously) now
that the program is done, is needed no more.

kre



Home | Main Index | Thread Index | Old Index