The program is as fast as I know how to make it. The hot loops are tuned, the locks as fine-grained as the design allows, the slow paths read and re-read. It’s still slower than it should be, and slow to start. The profiler keeps pointing at a name I’ve been ignoring because it isn’t mine: malloc.
malloc holds a global lock, and in a multi-threaded program that lock is where the threads line up. You never wrote a line of it. The libraries you lean on allocate on paths you’ve never read. You can’t grep for it – the calls aren’t in your code.
dtrace can see them.
The restrictions on our zones mean I’ve only just gotten to try it, and the first thing I point it at is malloc. One line:
dtrace -n 'pid$target:libc:malloc:return { @[ustack()] = count(); }' -p `pgrep -n program`
That records the user stack at every return from malloc and counts how often each distinct stack occurs. Stop it with Ctrl-C and you get a ranked list of the paths into the allocator, the hottest at the bottom. Mine are not where I’d have guessed – most are library code, three and four frames deep, on paths I have never once looked at.
The known fix is mtmalloc, a multi-threaded allocator that trades memory for fewer collisions at the lock. I tried it. Startup went from thirty seconds to twenty. Memory went from twelve gigabytes to sixteen. Ten seconds for four gigabytes. Whether that’s a good trade depends entirely on which one you have to spare, and I’m not sure I do.
So before I commit to it I want to know what’s actually calling malloc, not just that something is. Premature optimization is the root of all evil – and swapping your allocator to fix a bottleneck you haven’t located is exactly that. The one-liner tells you where to look first.
A longer script logs every allocation and free, with the stack on the way in:
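#!/usr/sbin/dtrace -s
/* On entry to malloc, print the user stack of the caller. */
pid$target:libc:malloc:entry { ustack(); }
/* On return, arg1 holds the return value: the address malloc handed out. */
pid$target:libc:malloc:return { printf("%s: %x\n", probefunc, arg1); }
/* On entry to free, arg0 is the pointer being released. */
pid$target:libc:free:entry { printf("%s: %x\n", probefunc, arg0); }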
Run it against the process with -p, the same way as the one-liner, send stdout somewhere with room, and you have the raw material for a leak hunt – match the addresses that came back from malloc against the ones that reached free. The ones that never show up again are yours to explain. I found a few that way. A couple were real leaks. A couple were just sloppy.
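The matching can be done in a few lines of awk. What follows is a minimal sketch, not a finished tool: it assumes the log still contains the "malloc: <hex>" and "free: <hex>" pairs that the script prints, and the function name unfreed is made up for the example.

```shell
#!/bin/sh
# Sketch: list addresses that malloc returned and free never saw.
# Assumes log lines containing "malloc: <hex>" or "free: <hex>",
# as printed by the D script; "unfreed" is a hypothetical name.
unfreed() {
    awk '
    {
        for (i = 1; i < NF; i++) {
            addr = $(i + 1)
            if ($i == "malloc:" && addr != "0") {
                live[addr] = NR      # remember the log line that handed it out
            } else if ($i == "free:") {
                delete live[addr]    # released, so no longer outstanding
            }
        }
    }
    END {
        # Whatever is left was allocated and never freed while tracing.
        for (addr in live) print addr, live[addr]
    }' "$@" | sort -k2,2n
}
```

Called as unfreed malloc.log (the file name is only an example), it prints each outstanding address next to the log line where it was allocated, so you can scroll back to the stack printed just before it. An address that is freed and later handed out again is counted from its most recent allocation. Blocks that were simply still in use when tracing stopped show up as well, and anything allocated through calloc or realloc does not appear at all, since the script does not trace those.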
This is the smallest thing it does. I’m going to be here a while.
