RejectedSoftware Forums

Re: Benchmarks?

On 25.11.2015 at 15:57, Sönke Ludwig wrote:

I tested a bit and here are some results:

DMD debug, nogc, singlethread: 30kreq/s
DMD plain, nogc, singlethread: 32kreq/s
DMD release, nogc, singlethread: 61kreq/s
DMD release, nogc, singlethread, libasync: 62kreq/s
GDC release, nogc, singlethread: 63kreq/s
GDC release, nogc, multithread: 64kreq/s
GDC release, gc, singlethread: 58kreq/s
GDC release, gc, multithread: 59kreq/s

release: "dub -b release"
nogc: versions "VibeManualMemoryManagement"
multithread: HTTPServerOption.distribute

With a few optimizations (most importantly removing unnecessary locks),
I'm now getting 89kreq/s for DMD release, nogc, multithread. This is now
actually hitting the limits of the dual-core CPU. I'll test this on a
quad-core later.

Re: Benchmarks?

On Wed, 25 Nov 2015 15:57:51 +0100, Sönke Ludwig wrote:

On 24.11.2015 at 19:01, Adam Strzelecki wrote:

Hello,

I am the author of a more lightweight benchmark, WebFrameworkBenchmark. It is much simpler but also friendlier than TechEmpower's benchmark.

I wanted to ask you to have a look at my Vibe.d results, which are pretty disappointing, just below Ruby performance and far below C and Java frameworks. Vibe.d looks really nice, it has interesting features, but its performance really needs some love (aka optimizations).

I tested a bit and here are some results:

DMD debug, nogc, singlethread: 30kreq/s
DMD plain, nogc, singlethread: 32kreq/s
DMD release, nogc, singlethread: 61kreq/s
DMD release, nogc, singlethread, libasync: 62kreq/s
GDC release, nogc, singlethread: 63kreq/s
GDC release, nogc, multithread: 64kreq/s
GDC release, gc, singlethread: 58kreq/s
GDC release, gc, multithread: 59kreq/s

release: "dub -b release"
nogc: versions "VibeManualMemoryManagement"
multithread: HTTPServerOption.distribute

I went ahead and looked, and it appears he's using an outdated version of vibe. Based on your results, could we improve this benchmark just by submitting a PR for the dub.sdl file?

Re: Benchmarks?

On 03.12.2015 at 03:01, Charles wrote:

I went ahead and looked, and it appears he's using an outdated version of vibe. Based on your results, could we improve this benchmark just by submitting a PR for the dub.sdl file?

I'd still wait a little. It looks like there might be more optimization
opportunities and I'll create a new pre-release version when those have
been investigated. But in any case, at least the latest alpha release
should be used, as that fixes the multi-core scaling issues.

Re: Benchmarks?

I'd still wait a little. It looks like there might be more optimization
opportunities and I'll create a new pre-release version when those have
been investigated. But in any case, at least the latest alpha release
should be used, as that fixes the multi-core scaling issues.

I would like to share my thoughts about vibe.d performance.

I use Linux and git master with the latest multicore fixes and improvements. I think vibe.d now has a bottleneck around the libevent2 library: the Libevent2TCPConnection class. I am not an expert in the libevent library, but I am sure that the Libevent2TCPConnection class currently uses an expensive and inefficient call sequence. I wrote another implementation, libevent2_tcp.d.

It works only for small requests like the hello-world from WebFrameworkBenchmark/benchmarks/vibed, but it is about 2.5 times faster than the current version. The main idea is simple: read all data from one libevent2 chunk at once and do not use bufferevent_read in the read method. You can take a look at the peek() and read() methods in my implementation. I could not find the correct way to advance reading to the next libevent2 data chunk and integrate it into vibe.d.

Also, I suppose the read method is a problem in itself. I do not think it is important right now, but it takes a ubyte[] argument, which makes a zero-copy approach impossible: I always have to copy data in this method. That may be a problem for high-speed processing with zero-copy solutions like PFQ, DPDK, or Netmap.
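
To illustrate the copy concern, here is a minimal sketch, written in Python rather than D, so none of it is vibe.d's actual API. The names `ChunkReader`, `read_into`, `peek` and `consume` are hypothetical. `read_into` mirrors a read(ubyte[])-style call that has to copy into the caller's buffer, while `peek` hands out a view of data that has already been received, without copying it.

```python
class ChunkReader:
    """Holds one received chunk and tracks how much of it has been consumed."""

    def __init__(self, chunk: bytes):
        self._view = memoryview(chunk)  # view over the chunk, no copy
        self._pos = 0

    def read_into(self, dst: bytearray) -> int:
        """Copying read: fill dst from the chunk, like read(ubyte[] dst)."""
        n = min(len(dst), len(self._view) - self._pos)
        dst[:n] = self._view[self._pos:self._pos + n]  # bytes are copied here
        self._pos += n
        return n

    def peek(self) -> memoryview:
        """Zero-copy read: return a view of the unread part of the chunk."""
        return self._view[self._pos:]

    def consume(self, n: int) -> None:
        """Mark n bytes previously returned by peek() as processed."""
        if n < 0 or n > len(self._view) - self._pos:
            raise ValueError("cannot consume more than is buffered")
        self._pos += n


chunk = b"GET / HTTP/1.1\r\n\r\n"
reader = ChunkReader(chunk)

head = bytearray(3)
reader.read_into(head)       # copies the first three bytes into head
rest = reader.peek()         # a view into the original chunk, no copy made
reader.consume(len(rest))    # the caller is done with the viewed bytes
```

The trade-off in this kind of design is lifetime: a view returned by `peek` is only safe to use until the underlying chunk is released or reused, which is why a separate `consume` step is assumed here.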

My test result for my version:

wrk -t 4 -d 2s "http://localhost:8081/"
Running 2s test @ http://localhost:8081/
4 threads and 10 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
Latency     2.10ms    5.24ms  48.73ms   94.95%
Req/Sec    56.81k    13.56k   78.04k    65.85%
463299 requests in 2.10s, 78.32MB read
Socket errors: connect 0, read 717, write 0, timeout 0
Non-2xx or 3xx responses: 717
Requests/sec: 220691.82
Transfer/sec:     37.31MB

Please note that my version has 717 errors even with small requests, and its average latency is worse than 2ms.

git master:

wrk -t 4 -d 2s "http://localhost:8081/"
Running 2s test @ http://localhost:8081/
4 threads and 10 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
Latency   318.67us    1.60ms  24.78ms   97.33%
Req/Sec    21.92k     2.33k   30.23k    73.49%
180981 requests in 2.10s, 30.20MB read
Requests/sec:  86188.69
Transfer/sec:     14.38MB
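
For reference, the ratio between the two runs and the error rate of the modified version follow directly from the wrk figures above. A small sketch of that arithmetic (the variable names are only illustrative):

```python
# Figures copied from the two wrk runs above.
patched_rps = 220691.82      # Requests/sec, modified libevent2_tcp.d
master_rps = 86188.69        # Requests/sec, git master
patched_requests = 463299    # total requests, modified version
patched_errors = 717         # non-2xx/3xx responses, modified version

speedup = patched_rps / master_rps
error_rate = patched_errors / patched_requests

print(f"speedup: {speedup:.2f}x")          # roughly 2.5x, as stated above
print(f"error rate: {error_rate:.3%}")     # well under one percent
```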

Re: Benchmarks?

On 17.12.2015 at 04:32, Nikolay Tolstokulakov wrote:

I'd still wait a little. It looks like there might be more optimization
opportunities and I'll create a new pre-release version when those have
been investigated. But in any case, at least the latest alpha release
should be used, as that fixes the multi-core scaling issues.

I would like to share my thoughts about vibe.d performance.

I use Linux and git master with the latest multicore fixes and improvements. I think vibe.d now has a bottleneck around the libevent2 library: the Libevent2TCPConnection class. I am not an expert in the libevent library, but I am sure that the Libevent2TCPConnection class currently uses an expensive and inefficient call sequence. I wrote another implementation, libevent2_tcp.d.

It works only for small requests like the hello-world from WebFrameworkBenchmark/benchmarks/vibed, but it is about 2.5 times faster than the current version. The main idea is simple: read all data from one libevent2 chunk at once and do not use bufferevent_read in the read method. You can take a look at the peek() and read() methods in my implementation. I could not find the correct way to advance reading to the next libevent2 data chunk and integrate it into vibe.d.

This is great to know. I actually experimented a little with an
implementation that works directly on select/epoll and it was also much
faster. So it seems like the bufferevent API of libevent is inefficient
and we should simply ditch it in favor of our own read buffer.
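
A rough sketch of the "own read buffer" idea, written in Python for brevity rather than D, and not taken from vibe.d or libevent. It assumes a non-blocking socket and uses the standard `selectors` module (backed by epoll on Linux). The helper name `drain` and the 4096-byte receive size are illustrative choices. On a readiness event the socket is read until it would block, so everything available ends up in one buffer.

```python
import selectors
import socket


def drain(sock: socket.socket, buf: bytearray) -> bool:
    """Read everything currently available from a non-blocking socket into buf.

    Returns False once the peer has closed the connection, True otherwise.
    """
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            return True          # nothing more to read for now
        if not data:
            return False         # orderly shutdown by the peer
        buf += data


# Demo on a local socket pair standing in for a client connection.
client, server = socket.socketpair()
server.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(server, selectors.EVENT_READ)

client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")

read_buffer = bytearray()
for key, _events in sel.select(timeout=1.0):
    still_open = drain(key.fileobj, read_buffer)

sel.unregister(server)
sel.close()
client.close()
server.close()
```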

Also, I suppose the read method is a problem in itself. I do not think it is important right now, but it takes a ubyte[] argument, which makes a zero-copy approach impossible: I always have to copy data in this method. That may be a problem for high-speed processing with zero-copy solutions like PFQ, DPDK, or Netmap.

This is true of the current implementation. There is also a
discussion about adding a new read overload somewhere. But for the
HTTP request benchmark game with its ~20MB/s per thread it should indeed
not matter.

My test result for my version:

 wrk -t 4 -d 2s "http://localhost:8081/"
 Running 2s test @ http://localhost:8081/
 4 threads and 10 connections
 Thread Stats   Avg      Stdev     Max   +/- Stdev
 Latency     2.10ms    5.24ms  48.73ms   94.95%
 Req/Sec    56.81k    13.56k   78.04k    65.85%
 463299 requests in 2.10s, 78.32MB read
 Socket errors: connect 0, read 717, write 0, timeout 0
 Non-2xx or 3xx responses: 717
 Requests/sec: 220691.82
 Transfer/sec:     37.31MB

Please note that my version has 717 errors even with small requests, and its average latency is worse than 2ms.

git master:

 wrk -t 4 -d 2s "http://localhost:8081/"
 Running 2s test @ http://localhost:8081/
 4 threads and 10 connections
 Thread Stats   Avg      Stdev     Max   +/- Stdev
 Latency   318.67us    1.60ms  24.78ms   97.33%
 Req/Sec    21.92k     2.33k   30.23k    73.49%
 180981 requests in 2.10s, 30.20MB read
 Requests/sec:  86188.69
 Transfer/sec:     14.38MB

What CPU do you have? I'd be interested in how this roughly translates
to the system I tested on for the previous results.

Re: Benchmarks?

What CPU do you have? I'd be interested in how this roughly translates
to the system I tested on for the previous results.

Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz

PS
I think using epoll directly would be a good decision.

Re: Benchmarks?

I think using epoll directly would be a good decision.

FYI
I found an async framework on code.dlang.org: https://github.com/ikod/fio
It uses epoll and implements a custom event loop on top of it.

Re: Benchmarks?

At the moment the vibe.d benchmark is failing...

https://github.com/TechEmpower/TFB-Round-11/blob/master/peak/linux/2015-09-11-final/latest/logs/vibed/out.txt

Re: Benchmarks?

On Mon, 21 Dec 2015 13:21:36 GMT, Dejan Lekic wrote:

At the moment the vibe.d benchmark is failing...

https://github.com/TechEmpower/TFB-Round-11/blob/master/peak/linux/2015-09-11-final/latest/logs/vibed/out.txt

I've recently pushed some fixes: https://github.com/TechEmpower/FrameworkBenchmarks/pull/1771
