RejectedSoftware Forums

TechEmpower Round 14 preview 4 available

https://www.techempower.com/benchmarks/previews/round14

It finally seems to at least pass all the tests, and it now uses the new vibe-core beta.

But I would have expected a much higher position, at least in the plaintext test.

Re: TechEmpower Round 14 preview 4 available

On Thu, 20 Apr 2017 16:24:59 GMT, Tomáš Chaloupka wrote:

https://www.techempower.com/benchmarks/previews/round14

It finally seems to at least pass all the tests, and it now uses the new vibe-core beta.

But I would have expected a much higher position, at least in the plaintext test.

I've tried the plaintext test on my Broadwell notebook with an i5-5300U:

new vibe-core:

[tomas@E7450 wrk]$ ./wrk -c 10 -d 30 -t 4 http://localhost:8080/plaintext
Running 30s test @ http://localhost:8080/plaintext
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.86ms    2.16ms  83.77ms   99.19%
    Req/Sec     2.64k   291.25     3.40k    82.08%
  315459 requests in 30.02s, 50.24MB read
Requests/sec:  10507.38
Transfer/sec:      1.67MB

old:

[tomas@E7450 wrk]$ ./wrk -c 10 -d 30 -t 4 http://localhost:8080/plaintext
Running 30s test @ http://localhost:8080/plaintext
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   430.97us    0.86ms  20.36ms   96.28%
    Req/Sec     6.04k     1.24k    9.09k    68.58%
  721024 requests in 30.03s, 111.39MB read
Requests/sec:  24012.03
Transfer/sec:      3.71MB

Compiled with dmd 2.074.0 using dub -b release.

Compared to, for example, go-std:

[tomas@E7450 wrk]$ ./wrk -c 10 -d 30 -t 4 http://localhost:8080/plaintext
Running 30s test @ http://localhost:8080/plaintext
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   728.27us    2.72ms 107.30ms   96.02%
    Req/Sec     6.62k     1.14k   10.02k    74.67%
  790419 requests in 30.04s, 95.73MB read
Requests/sec:  26315.10
Transfer/sec:      3.19MB

Compiled with go1.7.5

Re: TechEmpower Round 14 preview 4 available

On 20.04.2017 at 21:33, Tomáš Chaloupka wrote:

On Thu, 20 Apr 2017 16:24:59 GMT, Tomáš Chaloupka wrote:

I've tried the plaintext test on my Broadwell notebook with an i5-5300U […] new vibe-core: ~10.5k requests/sec, old: ~24k, go-std (go1.7.5): ~26.3k.

It almost looks like it never scales beyond a single core for some reason. I'll have to start another profiling round to be sure, but it could be related to the switch to std.experimental.allocator. Maybe the GC has suddenly become the bottleneck.
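
If it is the GC, one quick sanity check is to sample core.memory.GC.stats() around a batch of work and watch how much the heap grows. Rough sketch below; GC.stats() should be available in the druntime shipped with dmd 2.074, and the delta is only approximate, since a collection between the two samples skews it:

import core.memory : GC;
import std.stdio : writefln;

void main()
{
    auto before = GC.stats();

    // Stand-in for a batch of request handling; in the real test this
    // would be the server processing wrk's requests.
    foreach (i; 0 .. 100_000)
    {
        auto buf = new ubyte[](256); // one GC allocation per iteration
    }

    auto after = GC.stats();
    writefln("GC used size: %s -> %s bytes", before.usedSize, after.usedSize);
}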

BTW, thanks a lot for fixing the benchmark suite! This is something that
I always had in mind as an important issue, but could never find the
time for. I'll try to look into the performance issue within the next few days.

Re: TechEmpower Round 14 preview 4 available

On 2017-04-23 13:53, Sönke Ludwig wrote:

It almost looks like it never scales beyond a single core for some
reason.

I've seen similar behavior when I was running performance tests on a
small vibe.d application last year. It had the same performance
regardless of whether it was running single- or multi-threaded.

BTW, how does vibe.d's multi-threading functionality work? Does it
spread the fibers across multiple threads, or does it use multiple event
loops?

/Jacob Carlborg

Re: TechEmpower Round 14 preview 4 available

On 23.04.2017 at 16:48, Jacob Carlborg wrote:

On 2017-04-23 13:53, Sönke Ludwig wrote:

It almost looks like it never scales beyond a single core for some
reason.

I've seen similar behavior when I was running performance tests on a
small vibe.d application last year. It had the same performance
regardless of whether it was running single- or multi-threaded.

BTW, how does vibe.d's multi-threading functionality work? Does it
spread the fibers across multiple threads, or does it use multiple event
loops?

It starts one event loop per thread and lets the OS distribute incoming
connections across the threads (using SO_REUSEPORT). However, the usually
better method is to start one process per CPU core, as that avoids
issues like the GC lock bringing everything to a crawl.
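
For illustration, a minimal sketch of that per-thread listener pattern, assuming the vibe.d 0.8 / vibe-core beta API (HTTPServerOption.reusePort, runWorkerTaskDist and runApplication; exact names can differ between versions):

import vibe.core.core : runApplication, runWorkerTaskDist;
import vibe.core.log : logError;
import vibe.http.server;

void hello(HTTPServerRequest req, HTTPServerResponse res)
{
    res.writeBody("Hello, World!", "text/plain");
}

// Runs once in every worker thread: each thread gets its own event loop
// and its own listening socket, and SO_REUSEPORT lets the kernel
// load-balance accepted connections across them.
void listenInThisThread() nothrow
{
    try {
        auto settings = new HTTPServerSettings;
        settings.port = 8080;
        settings.options |= HTTPServerOption.reusePort;
        listenHTTP(settings, &hello);
    } catch (Exception e)
        logError("Failed to listen: %s", e.msg);
}

void main()
{
    runWorkerTaskDist(&listenInThisThread);
    runApplication();
}

The process-per-core variant uses the same reusePort trick, just with independent processes instead of threads, so each one also gets its own GC.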

Re: TechEmpower Round 14 preview 4 available

On 2017-04-23 13:53, Sönke Ludwig wrote:

It almost looks like it never scales beyond a single core for some reason. I'll have to start another profiling round to be sure, but it could be related to the switch to std.experimental.allocator. Maybe the GC has suddenly become the bottleneck.


I didn't post it, but I also tested with profile-gc, and the new vibe-core allocates insanely more than the old one in the same simple plaintext test, so that might be the case.
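
For anyone who wants to reproduce this: dub's profile-gc build type compiles the GC allocation profiler in, and a cleanly stopped program then writes a profilegc.log listing allocated bytes per call site. Roughly like this (project name illustrative):

[tomas@E7450 app]$ dub build -b profile-gc
[tomas@E7450 app]$ ./app                # run wrk against it, then stop it with Ctrl+C
[tomas@E7450 app]$ head profilegc.log   # top of the allocation report

Comparing that log between the old and new vibe-core builds should show exactly where the extra allocations come from.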