You may remember that last year I wrote an MQTT broker using vibe.d in response to a challenge from a work colleague who had written one in Go, and that ended up becoming a contest between implementations in D, Go, C, C++, Java and Erlang. The D/vibe.d combo won the throughput benchmark by a mile but was middle-of-the-road in latency. That always bugged me, and I never knew why.

Last week at DConf I learned about the perf tool. It seemed pretty cool, but I needed something to try it on, so I ended up comparing the Java and D implementations with respect to latency. Since the Java version had the best score in that benchmark, it was unsurprising that perf revealed the CPU was idle a lot more for the D version than for Java.

I dug deeper and, according to perf, most of the time was being spent in pthread_mutex_lock, closely followed by __pthread_mutex_unlock_usercnt. The call graph points, in the former case, to calls to FreeListAlloc.elementSize, which in turn is called by AutoFreeListAllocator.free.

The Java version, in contrast, had sys_epoll_ctl at the top, and with a lower percentage for the top spot.

I think I just found the reason for the performance discrepancy, at least on Linux. Then again, I only ran the original benchmarks on Linux.

Atila