[patch] pthread(3) + malloc(3) init model

I propose moving the pthread_atfork(3) call out of pthread_tsd_init()
and into a separate function that performs late TSD initialization.

I also propose calling the late TSD initialization from pthread__init(),
right after the existing "pthread_atfork(NULL, NULL,
pthread__fork_callback);" call.
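
To make the proposed ordering concrete, here is a minimal sketch of the
split. It is not the actual libpthread code: every sketch_* name, the
step-recording array, and the empty handlers are hypothetical stand-ins,
and the idea that registering a fork handler is what can pull in the
allocator is my assumption about why the early call is a problem.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real libpthread internals. */
enum step { STEP_TSD_EARLY = 1, STEP_FORK_CALLBACK, STEP_TSD_LATE };

#define MAX_STEPS 8
static enum step steps[MAX_STEPS];	/* order in which init steps ran */
static size_t nsteps;

static void
record(enum step s)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = s;
}

/* Placeholder child-side fork handlers. */
static void sketch_fork_callback(void) { }
static void sketch_tsd_fork_child(void) { }

/* Early part: touches static state only, registers no fork handler. */
static void
sketch_tsd_init(void)
{
	record(STEP_TSD_EARLY);
}

/* Late part: the pthread_atfork(3) registration split out of the above. */
static int
sketch_tsd_late_init(void)
{
	record(STEP_TSD_LATE);
	return pthread_atfork(NULL, NULL, sketch_tsd_fork_child);
}

/* Stand-in for pthread__init(): the late TSD step runs last. */
int
sketch_init(void)
{
	int error;

	sketch_tsd_init();

	error = pthread_atfork(NULL, NULL, sketch_fork_callback);
	if (error != 0)
		return error;
	record(STEP_FORK_CALLBACK);

	return sketch_tsd_late_init();
}
```

A caller would invoke sketch_init() once. The point of the layout is
that the early function stays free of handler registration, so the
library decides when the first pthread_atfork(3) call happens.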

This change:

1. Stops initializing jemalloc prematurely and unintentionally.
2. Eliminates '#if 0' hacks in pthread_mutex.c.
3. Restores control over when a malloc implementation is initialized.

No regressions were observed.

