have you tested values other than 1 and 16? what about 4 or 8? can you post the size difference of the kernels? particularly for kernels without DIAGNOSTIC or DEBUG (since those are the ones where performance matters most). thanks. .mrg.