   I thought the 16-byte alignment would also benefit secondary caches
   that use such (or larger) cache line sizes?

So the theory goes.  In practice it has no significant effect, and worse,
the padding makes the code larger, which could cause the cache (especially
the small internal one on the 486) to fill up and start evicting lines
more quickly.
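
To put a rough number on the code-size side of that argument, here is a
minimal sketch that lays out a handful of code blocks with and without
16-byte alignment and compares the totals.  The block sizes are invented
for illustration, and `padding` and `layout` are hypothetical helper
names, not anything from a real toolchain.  The 8 KB figure is the size
of the on-chip cache of the original 486.

```python
CACHE_BYTES = 8 * 1024   # on-chip cache of the original 486 (8 KB)
ALIGN = 16               # the alignment under discussion


def padding(offset, align=ALIGN):
    """Filler bytes needed to move offset up to the next multiple of align."""
    return -offset % align


def layout(sizes, align):
    """Total bytes used when every block starts at the next aligned offset."""
    offset = 0
    for size in sizes:
        offset += padding(offset, align)  # filler in front of this block
        offset += size                    # the block itself
    return offset


# Hypothetical block sizes in bytes, made up for this example.
sizes = [23, 41, 9, 130, 57, 18, 76, 35]

packed = layout(sizes, 1)        # no alignment: blocks sit back to back
aligned = layout(sizes, ALIGN)   # every block starts on a 16-byte boundary
overhead = aligned - packed

print("packed:  %d bytes" % packed)
print("aligned: %d bytes (%d bytes of padding)" % (aligned, overhead))
print("padding as a share of the 8 KB cache: %.2f%%"
      % (100.0 * overhead / CACHE_BYTES))
```

With random starting offsets the expected filler is a bit under 8 bytes
per aligned block, so the more (and the smaller) the aligned blocks, the
larger the share of the cache that ends up holding padding instead of
code.  Whether that outweighs any gain from aligned fetches is something
only a measurement on the actual code can settle.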