It almost looks like it never scales beyond a single core for some
reason. I'll have to start another profiling round to be sure, but it
could be related to the switch to std.experimental.allocator. Maybe
the GC is now suddenly the bottleneck.

BTW, thanks a lot for fixing the benchmark suite! This is something that
I always had in mind as an important issue, but could never find the
time for. I'll try to look into the performance issue within the next few days.

I didn't post it, but I also tested with profilegc, and the new vibe-core allocates insanely more than the old one in the same simple plaintext test, so the GC might well be the bottleneck.