tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Leak Sanitizer - how to suppress leaks



    Date:        Thu, 12 Sep 2019 15:12:25 +0200
    From:        Kamil Rytarowski <n54%gmx.com@localhost>
    Message-ID:  <2a6e1fb2-cedc-4a57-750b-45f101be9fb9%gmx.com@localhost>


  | This proposal is practically equivalent of disabling leak detection at
  | all and removes the whole purpose.

No it isn't.   Or rather, it might be, the way you describe LSan to
work, but if that's what it is doing, then it is close to useless:

  | Leak detection works simply by scanning memory (TLS, stacks, heap) for
  | pointers to allocations. If there is no longer a reference, than there
  | is a leak.

So the following code

	struct list {
		struct list *prev;
		struct list *next;
		int age;
		/* other stuff not relevant here */
	};

	struct list *
	new_list(int age1, int age2)
	{
		strust list *l1, *l2;

		l1 = malloc(sizeof *l1);	/* check for NULL elided here */
		l2 = malloc(sizeof *l2); /* ditto */

		l1->prev = NULL; l1->next = l2; l1->age = age1;
		l2->prev = l1; l2->next = NULL; l2->age = age2;

		if (age1 > age2)
			return NULL;

		return l1;
	}

has no leak (in the case age1 > age2) ?   Scanning memory would see
a pointer to *l1 (in *l2), and a pointer to *l2 (in *l1).  Both blocks
of memory are referenced according to that trivial analysis.   Useless.

The "close to" just above came from there being cases that it can find.

Because of this, I also looked at the change you pade to "fix" ps.
You added

	free(pinfo);

But *pinfo (a struct pinfo) contains 3 pointers, each of which may
refer to other malloc'd blocks.   Further, pinfo points to an array
of those structs.   Because of the memory scan, those pointers (in each
element of the array) will still have been seen, but when you free(pinfo)
then those pointers can no longer be accessed (depending upon what the free()
in use while the sanatiser is running does, they may not still exist
in memory, or they might, just in a block that is now free).  Anyway
the struct kinfo_proc2 struct kinfo_lwp and char * (prefix) entries in
all the struct pinfo's are not being freed.  There was no leak before, but
your "fix" introduced one (or three * N).

I'd suggest reverting the change you made, and instead make pinfo a
global instead of a local in main - then its value will be seen still
existing by the memory scan (LSan will be happy) and ps won't be wasting
(a trivial amount of) time doing an incorrect (as it is now) free(), and
we'll all be happier.

Incidentally, what happens with code like:

	struct foo {
		int a, b, c;
	};

	int *
	make_foo(int a, int b, int c)
	{
		struct foo *p;

		p = malloc(sizeof *p);		/* again NULL check elided here */
		p->a = a; p->b = b; p->c = c;

		return (&p->b);
	}

	int * global_ptr;

	main()
	{
		global_ptr = make_foo(1,2,3);

		/* ... */

		{
			struct foo *p;

			p = (struct foo *)((intptr_t)global_ptr -
				 offsetof(struct foo, b));

			p->a++;
		}
		
		/* ... */
	}

Ugly (and stupid) but not illegal I think.   The more common case is
a list where the back pointers point at the forward pointer, rather
than at the struct itself, so one can do

	if ((*(p->back) = p->next) != NULL)
		p->next->back = p->back
	free(p);

to delete an element from the middle of a list easily - which works
even if the element to be deleted is first or last.   There if the
next field is the first in the struct, the back field will refer to
the malloc's memory block (by accident) but if it isn't, it won't.

  | I have not investigated where are the sh(1) leaks,

As far as I know it doesn't.   What I said is that it does not free
memory it allocates before it exits.  That is not a leak.

  | In sanitizers we cannot overload or do anything special with main()
  | (such as replace it with our own copy).

That wasn't the suggestion.   However, what was assumed a more rational
leak detection algorithm than appears to be in use - I agree, by simply
scanning memory you cannot do anything special with main() as when the
code runs, apparently, main's stack frame has already gone.

But were this done a different way -- the right way is to annotate, in the
code, all assignments to pointers , and add a ref counter to all allocated
memory blocks - when a pointer is assigned, and it points to something
allocated, then incr the ref count.  Whan the value of a pointer is
overwritten, and it used to point to allocated memory, decr the ref count.
If that decrease drops it to 0, then memory has been leaked.  In the
exit code for every function (except main, which can be recognised, we
know its name is "main", in this scenario) put code that checks every
local var that is a pointer, for each which points at allocated memory,
decrease the ref count.  If it reaches 0, that's a leak.

This will still give false errors, eg in

	main()
	{
		init();
		read_config();

		return do_the_work();
	}

assuming that none of init() read_config() or do_the_work() are
called anywhere else, then memory they allocate (and particularly
do_the_work()) which is not freed is not a leak.

This method still has problems with doubly linked lists, handling
those properly needs code to recognise what is going on, and when
analysing a list pointer drref, instead of checking against 0,
scan the list and find out how many internal pointers ref the
list elements, and ensure that (at least) one element on the list
(ahead of the one being deleted, for singly linked lists) has
a higher ref count than can be attributed to pointers inside the
list itself.   That's a lot of work for big lists (and trees, etc)
especially ones that are constantly being updated.

  | main() is the same entity as any regular fun(). Unless we decorate it
  | with some attributes in .c code, it is not distinguishable.

Decoration is not needed, its name is its decoration.

  | Globals are treated specially in sanitizers as sanitizers generate
  | constructors and destructors to keep the track of them.

Sorry, that (by itself) means nothing to me,   What do those constructors
and destructors do?

What we have here is a global pointer, which points to allocated memory,
and which is still valid when the program exits (the memory is still
allocated, and the global var still points to it).

  | I have benchmarked in the past that we execute for true(1) or similar
  | applications at least 1 million instructions on CPU.
  |
  |
  | $ ./singlestepper true
  | Total count: 1639915

True is a chell script.   You're measuring the startup (including
dynamic linking) of a fairly large program.

Try

	main() { _exit(0); }

and cc -static

and then measure that, and see how many instructions are executed
(if we wanted an executable "true" - that is, if that were ever
actually (significantly) used outside shell scripts, where it is
a builtin, that is how we would write it.)

[Aside: for some reason, probably related to crt0 or something, the
result of this is ridiculously large, and seems to have a good fraction
of libc linked into it.   That could be fixed...]

  | $ ./singlestepper echo
  | Total count: 1039124

Again, that will almost all be the dynamic linker.

  | Calling one 1 dummy function in libc in comparison to that is negligible.

Sure, if one function call is all it takes - but that would have to be
"turn off the memory scan" - ie: do no leak detection at all.   That isn't
what is desired, that would truly make the whole thing useless.

leak detection that finds bugs like the one I invented earlier in this
message is a good thing to have - it is very easy to forget to free
everything that has been allocated in a complicated function that is
bailing out in an error case.   Finding those bugs is definitely worthwhile.

But insisting that all memory that is ever allocated is explicitly freed,
just so the sanitiser doesn't claim it has "leaked" is absurd.

If it is possible to write code to run just before exit() which frees
memory, that memory, by definition, has not been leaked - only memory
that we no longer have a reference to, and so cannot possibly free()
can be rationally considered leaked.

  | We are talking here about simple programs that are expected to get some
  | measurable optimization by skipping free(3) calls. Such as sh(1) or ps(1).

I don't know the insides of ps, I don't know that there is much that needs
to be freed which isn't already.  None of the things that the pinfo points
to contain pointers themselves, so all that would be needed is 3*N + 1 free()
calls (assuming this is all there is).  Certainly I did not care about the one
free(psinfo) call that was added - that (if it were not creating a leak)
would be irrelevant to anyone.

It was the general principle of requiring memory to be freed - actually
writing code to do that, and then immediately calling exit() which frees
everything, always, that I was objecting to.   The only reason for ever
doing that extra work is to make it easier for a half baked sanitiser
implementation to not report non-problems as errors.

  | In all other cases I consider this style of programming as buggy.

You can consider it however you like, but it isn't.   The interface
specs for the OS guarantee that when a process exits, all of its
resources (memory, fds, ...) are released.   Programs (and programmers)
are entitled to rely upon this, and allow it to work for them.

All kinds of programs build all kinds of data structs as they work, and
then they're done, they exit() - tearing it all down for no reason is
lunacy.

  | If there are thousands of leaked memory objects than it is not just
  | simple optimization, but really wrong style of programming.

Not wrong, just not the style you (apparently) like.

Incidentally, do you close(0) close(1) close(2) in all your programs
before you exit()?   (fclose(stdin) etc ae as good when stdio is being
used).

If not, your program, according to you, is leaking file descriptors.

The kernel closes them for you during the exit processing of course,
but in your apparent philosophy that's not good enough...

  | One example
  | of that is ATF, it randomly picks one set of pointers to free before
  | exit, and skips others leaking the memory. It is bug.

If it is actually doing just what you say, then no, it is not a bug.
If it is losing memory while running, so as it runs more and more
tests (or subtests) it keeps getting bigger and bigger as (some of)
the memory for a previous (sub-)test hasn't been freed, but is no longer
needed, or accessible - then that would be a bug, and should be fixed.

But if it is just doing what you say, and not bothering to free memory
that it has been using (for example) for accumulating stats on the number
of tests run, how many failed, the names of those that failed, ... and
at the end it prints all that and exits without freeing that memory,
that is 100% perfectly OK.

  | As I have stated originally, we could remove free(3) from almost every
  | basesystem program and not many would notice.

I know what you're saying, but people would notice, as ...

  | There is even a C compiler
  | that performs only allocations and never frees the memory. It is still a
  | correct UNIX application, regardless of taking like 2GB for building a
  | hello world application.

people observe that kind of thing, programs that are huge when they
should not be.

Certainly with the huge (comparatively) VM space we have available today
it is easy to let many of these kinds of things slide as not being all
that important.

But that isn't the case we're talking about - rather memory that *cannot*
be freed until the program exits (the data will still be needed as long as
the program remains running) but which is clearly useless after the
program has finished.   exit() (inside the kernel, not the libc function)
frees all of that for us.   Using it is not incorrect, not even slightly.

kre

ps: an alternative shcnge for ps.c would be to alter

	return eval;

at the end of main() to be

	fflush(stdout);
	_exit(eval);

ps doesn't need atexit handlers (other than to flush its stdout, in
case it has been buffered), so we do the flush manually, and then _exit
sends back the exit code.   No atexit() handlers ever get run, and LSan
won't get a chance to object to the pinfo value not being freed.

I might start altering all programs I work on to use  that technique.















(Not really!)



Home | Main Index | Thread Index | Old Index