I'd still wait a little. It looks like there might be more optimization
opportunities and I'll create a new pre-release version when those have
been investigated. But in any case, at least the latest alpha release
should be used, as that fixes the multi-core scaling issues.

I would like to share my thoughts about vibe.d performance.

I use Linux and git master with the latest multi-core fixes and improvements. I think the bottleneck in vibe.d is now in the libevent2 driver, specifically the Libevent2TCPConnection class. I am not an expert in libevent, but I am sure that Libevent2TCPConnection currently uses an expensive and inefficient call sequence. I wrote another implementation, libevent2_tcp.d.

It works only for small requests such as the hello-world one from WebFrameworkBenchmark/benchmarks/vibed, but it is about 2.5 times faster than the current version. The main idea is simple: read all data from one libevent2 chunk at once and do not use bufferevent_read in the read method. You can take a look at the peek() and read() methods in my implementation. I could not find a correct way to advance reading to the next libevent2 data chunk and integrate it into vibe.d.
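To make the "advance to the next chunk" problem concrete, here is a minimal standalone sketch in C. It does not use libevent and is not the code from libevent2_tcp.d; the types chunk_t and chunk_queue_t and the functions cq_peek and cq_consume are hypothetical names invented for illustration. It only models the idea of looking at the current chunk in place and then stepping forward once it is exhausted. In libevent itself, evbuffer_peek and evbuffer_drain look like the closest real counterparts, but whether they fit the vibe.d driver is an assumption that would need checking.

```c
#include <stddef.h>

/* Hypothetical model of a chunked input buffer; NOT libevent's real types.
 * Chunks are assumed to be non-empty. */
typedef struct {
    const unsigned char *data;
    size_t len;
} chunk_t;

typedef struct {
    const chunk_t *chunks; /* queued chunks in arrival order */
    size_t count;          /* number of queued chunks */
    size_t index;          /* index of the current chunk */
    size_t offset;         /* bytes already consumed in the current chunk */
} chunk_queue_t;

/* Return a view of the unread part of the current chunk without copying.
 * Returns NULL and sets *len_out to 0 when every chunk has been consumed. */
static const unsigned char *cq_peek(const chunk_queue_t *q, size_t *len_out)
{
    if (q->index >= q->count) {
        *len_out = 0;
        return NULL;
    }
    const chunk_t *c = &q->chunks[q->index];
    *len_out = c->len - q->offset;
    return c->data + q->offset;
}

/* Mark n bytes as consumed, stepping into following chunks as needed. */
static void cq_consume(chunk_queue_t *q, size_t n)
{
    while (n > 0 && q->index < q->count) {
        size_t avail = q->chunks[q->index].len - q->offset;
        if (n < avail) {
            q->offset += n; /* still inside the current chunk */
            return;
        }
        n -= avail;         /* current chunk is exhausted */
        q->index++;
        q->offset = 0;
    }
}
```

A caller would loop on cq_peek, parse as much as it can from the returned view, and call cq_consume with the number of bytes it actually used. A request that spans two chunks is the case this sketch does not solve; that still needs either a copy or a parser that can resume across views.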

I also suspect that the read method is a problem in itself. I do not think it is important right now, but it takes a ubyte[] argument, which makes a zero-copy approach impossible: I always have to copy data in this method. That may become a problem for high-speed processing with zero-copy solutions such as PFQ, DPDK, or Netmap.
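The difference between the two interface styles can be sketched in a few lines of standalone C. This is only a model under stated assumptions, not vibe.d's API: rx_buffer_t, rx_read_copy, rx_read_lend, and count_newlines are hypothetical names. rx_read_copy plays the role of a read that fills a caller-supplied buffer, so it always pays for one memcpy; rx_read_lend instead lends the transport's own memory to a consumer callback.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical receive buffer owned by the transport layer. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos; /* bytes already handed to the consumer */
} rx_buffer_t;

/* Copying interface, analogous to a read that fills a caller buffer:
 * every call moves the bytes with memcpy. */
static size_t rx_read_copy(rx_buffer_t *rx, unsigned char *dst, size_t dst_len)
{
    size_t avail = rx->len - rx->pos;
    size_t n = dst_len < avail ? dst_len : avail;
    memcpy(dst, rx->data + rx->pos, n);
    rx->pos += n;
    return n;
}

/* Lending interface: the consumer sees the transport's own memory and
 * returns how many bytes it used. No bytes are copied here. */
typedef size_t (*rx_consumer_fn)(const unsigned char *data, size_t len, void *ctx);

static size_t rx_read_lend(rx_buffer_t *rx, rx_consumer_fn consume, void *ctx)
{
    size_t avail = rx->len - rx->pos;
    size_t used = consume(rx->data + rx->pos, avail, ctx);
    if (used > avail)
        used = avail; /* clamp a misbehaving consumer */
    rx->pos += used;
    return used;
}

/* Example consumer: counts line feeds in place and consumes everything. */
static size_t count_newlines(const unsigned char *data, size_t len, void *ctx)
{
    size_t *count = ctx;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            (*count)++;
    return len;
}
```

The lending style has its own cost: the view is only valid for the duration of the callback, so a consumer that wants to keep the data still has to copy it. That trade-off is probably why a copying read is the simpler default.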

My test result for my version:

wrk -t 4 -d 2s "http://localhost:8081/"
Running 2s test @ http://localhost:8081/
4 threads and 10 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
Latency     2.10ms    5.24ms  48.73ms   94.95%
Req/Sec    56.81k    13.56k   78.04k    65.85%
463299 requests in 2.10s, 78.32MB read
Socket errors: connect 0, read 717, write 0, timeout 0
Non-2xx or 3xx responses: 717
Requests/sec: 220691.82
Transfer/sec:     37.31MB

Please note that my version has 717 errors even with small requests, and its average latency is worse, at just over 2 ms (2.10 ms versus 318.67 us on git master).

git master:

wrk -t 4 -d 2s "http://localhost:8081/"
Running 2s test @ http://localhost:8081/
4 threads and 10 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
Latency   318.67us    1.60ms  24.78ms   97.33%
Req/Sec    21.92k     2.33k   30.23k    73.49%
180981 requests in 2.10s, 30.20MB read
Requests/sec:  86188.69
Transfer/sec:     14.38MB