On 17.12.2015 at 04:32, Nikolay Tolstokulakov wrote:
I'd still wait a little. It looks like there might be more optimization
opportunities and I'll create a new pre-release version when those have
been investigated. But in any case, at least the latest alpha release
should be used, as that fixes the multi-core scaling issues.
I would like to share my thoughts about vibe.d performance.
I use Linux and git master with the latest multi-core fixes and improvements. I think vibe.d now has a bottleneck around the libevent2 library, in the Libevent2TCPConnection class. I am not an expert in the libevent library, but I am sure that the Libevent2TCPConnection class currently uses an expensive and inefficient call sequence. I wrote another implementation, libevent2_tcp.d. It only works for small requests such as the hello-world from WebFrameworkBenchmark/benchmarks/vibed, but it is about 2.5 times faster than the current version. The main idea is simple: read all data from one libevent2 chunk at once and do not call bufferevent_read in the read method. You can take a look at the peek() and read() methods in my implementation. I could not find a correct way to advance reading to the next libevent2 data chunk and integrate it into vibe.d.
This is great to know. I actually experimented a little with an
implementation that directly works on select/epoll and it also was much
faster. So it seems like the bufferevent API of libevent is inefficient
and we should simply ditch it in favor of our own read buffer.
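To make the idea concrete, here is a minimal sketch of a connection-owned read buffer. It is written in Python for brevity and is neither the code from libevent2_tcp.d nor anything in vibe.d. The class name ReadBuffer and the fill() method are made up for illustration; only peek() and read() mirror the method names mentioned above, and their exact semantics here are an assumption.

```python
class ReadBuffer:
    """Hypothetical connection-owned buffer: filled in bulk, consumed via peek()/read()."""

    def __init__(self) -> None:
        self._data = bytearray()  # bytes received but not yet discarded
        self._pos = 0             # offset of the first unread byte

    def fill(self, chunk: bytes) -> None:
        """Append everything the event loop delivered, in one go."""
        if self._pos:
            # Discard bytes that were already consumed before growing.
            del self._data[:self._pos]
            self._pos = 0
        self._data += chunk

    def peek(self) -> memoryview:
        """Return a view of all unread bytes without copying them."""
        return memoryview(self._data)[self._pos:]

    def read(self, dst: bytearray) -> None:
        """Fill dst completely by copying, similar in spirit to read(ubyte[])."""
        n = len(dst)
        if n > len(self._data) - self._pos:
            raise BlockingIOError("not enough buffered data")
        dst[:] = self._data[self._pos:self._pos + n]
        self._pos += n
```

One caveat of this sketch: a view returned by peek() has to be dropped before the next fill(), because Python refuses to resize a bytearray while a memoryview of it is still alive.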
I also suppose the read method itself is a problem. I do not think it is important right now, but it takes a ubyte[] argument, which makes a zero-copy approach impossible: I always have to copy data in this method. That may be a problem for high-speed processing with zero-copy solutions like PFQ, DPDK, or Netmap.
This is true regarding the current implementation; there is also a
discussion about adding a new read overload somewhere. But for the
HTTP request benchmark game with its ~20MB/s per thread it should indeed
not matter.
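As a rough illustration of the difference being discussed, the sketch below contrasts a copying read with a read that lends out a view of the internal buffer. It is in Python rather than D, and both function names (read_copy, read_view) are hypothetical; this is not the overload under discussion, just one possible shape of the idea.

```python
from typing import Callable


def read_copy(src: bytearray, pos: int, dst: bytearray) -> int:
    """Copying read: fills the caller's buffer, like read(ubyte[])."""
    n = len(dst)
    if pos + n > len(src):
        raise ValueError("not enough data")
    dst[:] = src[pos:pos + n]  # the data is duplicated here
    return pos + n


def read_view(src: bytearray, pos: int, n: int,
              consumer: Callable[[memoryview], None]) -> int:
    """Hypothetical zero-copy variant: lends a view of the internal buffer."""
    if pos + n > len(src):
        raise ValueError("not enough data")
    with memoryview(src)[pos:pos + n] as view:
        consumer(view)  # the view is only valid during this call
    return pos + n
```

The second form avoids the copy, at the price that the consumer must not keep the view around after the call returns.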
My test result for my version:
wrk -t 4 -d 2s "http://localhost:8081/"
Running 2s test @ http://localhost:8081/
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.10ms    5.24ms  48.73ms   94.95%
    Req/Sec    56.81k    13.56k   78.04k    65.85%
  463299 requests in 2.10s, 78.32MB read
  Socket errors: connect 0, read 717, write 0, timeout 0
  Non-2xx or 3xx responses: 717
Requests/sec: 220691.82
Transfer/sec:     37.31MB
Please note that my version has 717 errors even with small requests, and its average latency is worse, at just over 2 ms.
git master:
wrk -t 4 -d 2s "http://localhost:8081/"
Running 2s test @ http://localhost:8081/
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   318.67us    1.60ms  24.78ms   97.33%
    Req/Sec    21.92k     2.33k   30.23k    73.49%
  180981 requests in 2.10s, 30.20MB read
Requests/sec:  86188.69
Transfer/sec:     14.38MB
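For reference, the ratios between the two runs can be worked out directly from the wrk summaries above. The snippet only restates the posted figures; the variable names are arbitrary.

```python
# Figures copied from the two wrk summaries above.
patched = {"req_per_sec": 220691.82, "latency_us": 2100.0,
           "requests": 463299, "errors": 717}
master = {"req_per_sec": 86188.69, "latency_us": 318.67}

speedup = patched["req_per_sec"] / master["req_per_sec"]
latency_ratio = patched["latency_us"] / master["latency_us"]
error_rate = patched["errors"] / patched["requests"]

print(f"throughput: {speedup:.2f}x")
print(f"average latency: {latency_ratio:.1f}x")
print(f"failed requests: {error_rate:.2%}")
```

So the patched version delivers roughly 2.56 times the throughput, at roughly 6.6 times the average latency and with about 0.15% of requests failing.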
What CPU do you have? I'd be interested in how this roughly translates
to the system I tested on for the previous results.