RejectedSoftware Forums

My benchmarks

I've been working towards getting more requests per second.

I'm doing this on a not-so-good machine, so take this with a grain of salt (HP Elite 190a). E.g. my hard drives are quite old and worn, with noticeable speed issues. Even things like AV affect its performance; for these tests it is disabled.
To do this, I basically commented out the code at https://github.com/rejectedsoftware/vibe.d/blob/06adc36e73a588f4563d33396a95c3891b2bb373/source/vibe/http/server.d#L1431

I'm currently getting around 5.3k requests per second.
The other thing I have done differently is that I have implemented a router that generates its route-matching code at compile time via delegates (since the routes are known at CTFE).
There are around 18 routes currently being served, of which only one (index) is being tested. Its output is basically just that of a HEAD request.
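
To make the idea concrete, here is a minimal sketch of the compile-time approach (hypothetical routes and handler names; the real code lives in Cmsed). The foreach over a compile-time tuple is unrolled by the compiler, so route matching becomes a plain string switch with no per-request loop or regex:

import std.typetuple : TypeTuple;  // std.meta.AliasSeq in newer compilers
import vibe.d;

// Hypothetical route table, known at compile time.
alias routePaths = TypeTuple!("/", "/about", "/contact");

void index(HTTPServerRequest req, HTTPServerResponse res)   { res.writeBody("index"); }
void about(HTTPServerRequest req, HTTPServerResponse res)   { res.writeBody("about"); }
void contact(HTTPServerRequest req, HTTPServerResponse res) { res.writeBody("contact"); }
alias routeHandlers = TypeTuple!(index, about, contact);

// The foreach is unrolled at compile time, producing one case label per
// route; no runtime route list or regex is consulted per request.
void route(HTTPServerRequest req, HTTPServerResponse res)
{
    switch (req.path)
    {
        foreach (i, path; routePaths)
        {
            case path:
                routeHandlers[i](req, res);
                return;
        }
        default:
            res.statusCode = 404;
            res.writeBody("404 - Not Found");
    }
}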

Strangely manually specifying a HEAD request with ab is slower by around 100ms.

For routes that don't exist, I'm getting around 5.2k requests/s, and with -i on ab around 4.2k, which is quite a bit different. However, my router handles 404s itself and so should not be causing an exception.

I rewrote my router specifically to test this out. Beforehand, the most I could get was 1.3k. I don't know if it's an accumulation of factors or simply the regex matching hurting performance.

I'm a lot happier with this now than what I was a few days ago.

Re: My benchmarks

On Tue, 11 Feb 2014 11:31:43 GMT, Rikki Cattermole wrote:

I've been working towards getting more requests per second.

I'm doing this on a not-so-good machine, so take this with a grain of salt (HP Elite 190a). E.g. my hard drives are quite old and worn, with noticeable speed issues. Even things like AV affect its performance; for these tests it is disabled.
To do this, I basically commented out the code at https://github.com/rejectedsoftware/vibe.d/blob/06adc36e73a588f4563d33396a95c3891b2bb373/source/vibe/http/server.d#L1431

I'm currently getting around 5.3k requests per second.

How much was it before your changes? 1.3k?

The other thing I have done differently is that I have implemented a router that generates its route-matching code at compile time via delegates (since the routes are known at CTFE).

That's a great idea. I could use that for my projects too. I hope this can be merged into the main repo!

There are around 18 routes currently being served, of which only one (index) is being tested. Its output is basically just that of a HEAD request.

Strangely manually specifying a HEAD request with ab is slower by around 100ms.

For routes that don't exist, I'm getting around 5.2k requests/s, and with -i on ab around 4.2k, which is quite a bit different. However, my router handles 404s itself and so should not be causing an exception.

I rewrote my router specifically to test this out. Beforehand, the most I could get was 1.3k. I don't know if it's an accumulation of factors or simply the regex matching hurting performance.

I'm a lot happier with this now than what I was a few days ago.

Re: My benchmarks

Yes I was getting around 1.3k req/s.

There isn't per se anything that would stop it from going into vibe.d. https://github.com/rikkimax/Cmsed/blob/master/source/cmsed/base/internal/routing/defs.d#L127

But it wouldn't be as useful on its own, compared to using it together with Cmsed.

Re: My benchmarks

On Tue, 11 Feb 2014 11:31:43 GMT, Rikki Cattermole wrote:

I've been working towards getting more requests per second.

I'm doing this on a not-so-good machine, so take this with a grain of salt (HP Elite 190a). E.g. my hard drives are quite old and worn, with noticeable speed issues. Even things like AV affect its performance; for these tests it is disabled.
To do this, I basically commented out the code at https://github.com/rejectedsoftware/vibe.d/blob/06adc36e73a588f4563d33396a95c3891b2bb373/source/vibe/http/server.d#L1431

You mean the foreach (log; context.loggers)? If you don't have accessLogToConsole or accessLogFile set in the HTTPServerSettings, I don't see how that could possibly have a measurable influence on performance.
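
For reference, a minimal setup with neither access logger enabled, so that loop has nothing to iterate over (a sketch assuming VibeDefaultMain; the commented lines show what would turn logging on):

import vibe.d;

void hello(HTTPServerRequest req, HTTPServerResponse res)
{
    res.writeBody("hello");
}

shared static this()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    // settings.accessLogToConsole = true;     // off by default
    // settings.accessLogFile = "access.log";  // unset by default
    listenHTTP(settings, &hello);
}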

I'm currently getting around 5.3k requests per second.

That's still quite low actually, which OS is that on?

The other thing I have done differently is that I have implemented a router that generates its route-matching code at compile time via delegates (since the routes are known at CTFE).
There are around 18 routes currently being served, of which only one (index) is being tested. Its output is basically just that of a HEAD request.

My idea was rather to optimize the existing runtime router by creating a decision tree from all the routes, so that the selection time goes down from O(n) to O(log(n)), which should be plenty sufficient if the server serves any kind of non-trivial content.
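
As a rough sketch of that idea (not the existing vibe.d router, and using an associative array per level rather than a proper decision tree): each node branches on the next path segment, so a lookup walks one branch per segment instead of testing every registered route:

import std.algorithm : splitter;

alias Handler = void delegate(string path);

// One node per path segment; terminal nodes carry the handler.
final class RouteNode
{
    RouteNode[string] children;
    Handler handler;
}

void addRoute(RouteNode root, string path, Handler h)
{
    auto node = root;
    foreach (seg; path.splitter('/'))
    {
        if (seg.length == 0) continue;                  // skip empty segments
        if (auto next = seg in node.children) node = *next;
        else { auto n = new RouteNode; node.children[seg] = n; node = n; }
    }
    node.handler = h;
}

Handler match(RouteNode root, string path)
{
    auto node = root;
    foreach (seg; path.splitter('/'))
    {
        if (seg.length == 0) continue;
        auto next = seg in node.children;
        if (next is null) return null;                  // unknown route -> 404
        node = *next;
    }
    return node.handler;                                // null if no handler set
}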

Strangely manually specifying a HEAD request with ab is slower by around 100ms.

For routes that don't exist, I'm getting around 5.2k requests/s, and with -i on ab around 4.2k, which is quite a bit different. However, my router handles 404s itself and so should not be causing an exception.

I think I'll replace that exception in the HTTP server with a manual call to the error page handler. Even if exceptions get optimized (as they definitely need to be), there is no need to use an exception there, except to save a tiny bit of code duplication.
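
A sketch of what that could look like (handleNotFound and errorPage are made-up names, not the actual server.d code): write the 404 response directly instead of throwing and unwinding on every missing route:

import vibe.d;

// Hypothetical helper showing the non-throwing path; today the server does
// roughly: throw new HTTPStatusException(HTTPStatus.notFound);
void handleNotFound(HTTPServerRequest req, HTTPServerResponse res,
    void delegate(HTTPServerRequest, HTTPServerResponse) errorPage)
{
    res.statusCode = HTTPStatus.notFound;
    if (errorPage !is null)
        errorPage(req, res);                         // user-supplied error page
    else
        res.writeBody("404 - Not Found", "text/plain");
}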

I rewrote my router specifically to test this out. Beforehand, the most I could get was 1.3k. I don't know if it's an accumulation of factors or simply the regex matching hurting performance.

When I get the time, I'll start another profiler run in VTune. After my last optimization run some months (a year?) ago, almost all of the time was taken by the kernel for I/O related things. Since nothing has changed since then regarding the router, logging or similar, my guess would be that something like a GC allocation or some performance hungry debug assertion has slipped back in somewhere and causes the slowdown*.

I'm a lot happier with this now than what I was a few days ago.

What I can say is that I've got up to 80k requests/s on my AMD Phenom II quad-core on Linux (which should be considerably slower than your i7). However, it made a huge difference whether the benchmark app (ab) was run on the same machine or on a different one. The client and server processes can have surprising interactions when they run on the same machine and on the loopback device. There is also weighttp, which has a much lower overhead than ab. The only drawback is that it doesn't output statistics as detailed as ab's.

* it should be said that I've used HTTPServerOption.distribute and -version=VibeManualMemoryManagement for the benchmarks.
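
In case it helps for comparison, a weighttp invocation roughly equivalent to an ab run would look something like this (assuming the usual -n/-c/-t/-k options):

weighttp -n 100000 -c 100 -t 2 -k http://127.0.0.1:8080/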

Re: My benchmarks

On Tue, 11 Feb 2014 13:34:09 GMT, Sönke Ludwig wrote:

On Tue, 11 Feb 2014 11:31:43 GMT, Rikki Cattermole wrote:

I've been working towards getting more requests per second.

I'm doing this on a not-so-good machine, so take this with a grain of salt (HP Elite 190a). E.g. my hard drives are quite old and worn, with noticeable speed issues. Even things like AV affect its performance; for these tests it is disabled.
To do this, I basically commented out the code at https://github.com/rejectedsoftware/vibe.d/blob/06adc36e73a588f4563d33396a95c3891b2bb373/source/vibe/http/server.d#L1431

You mean the foreach (log; context.loggers)? If you don't have accessLogToConsole or accessLogFile set in the HTTPServerSettings, I don't see how that could possibly have a measurable influence on performance.

I'm currently getting around 5.3k requests per second.

That's still quite low actually, which OS is that on?

The other thing I have done differently is that I have implemented a router that generates its route-matching code at compile time via delegates (since the routes are known at CTFE).
There are around 18 routes currently being served, of which only one (index) is being tested. Its output is basically just that of a HEAD request.

My idea was rather to optimize the existing runtime router by creating a decision tree from all the routes, so that the selection time goes down from O(n) to O(log(n)), which should be plenty sufficient if the server serves any kind of non-trivial content.

Strangely manually specifying a HEAD request with ab is slower by around 100ms.

For routes that don't exist, I'm getting around 5.2k requests/s, and with -i on ab around 4.2k, which is quite a bit different. However, my router handles 404s itself and so should not be causing an exception.

I think I'll replace that exception in the HTTP server with a manual call to the error page handler. Even if exceptions get optimized (as they definitely need to be), there is no need to use an exception there, except to save a tiny bit of code duplication.

I rewrote my router specifically to test this out. Beforehand, the most I could get was 1.3k. I don't know if it's an accumulation of factors or simply the regex matching hurting performance.

When I get the time, I'll start another profiler run in VTune. After my last optimization run some months (a year?) ago, almost all of the time was taken by the kernel for I/O related things. Since nothing has changed since then regarding the router, logging or similar, my guess would be that something like a GC allocation or some performance hungry debug assertion has slipped back in somewhere and causes the slowdown*.

I'm a lot happier with this now than what I was a few days ago.

What I can say is that I've got up to 80k requests/s on my AMD Phenom II quad-core on Linux (which should be considerably slower than your i7). However, it made a huge difference whether the benchmark app (ab) was run on the same machine or on a different one. The client and server processes can have surprising interactions when they run on the same machine and on the loopback device. There is also weighttp, which has a much lower overhead than ab. The only drawback is that it doesn't output statistics as detailed as ab's.

* it should be said that I've used HTTPServerOption.distribute and -version=VibeManualMemoryManagement for the benchmarks.

What speaks against using -version=VibeManualMemoryManagement as a default anyway? Is this whole option documented somewhere?

Re: My benchmarks

On Tue, 11 Feb 2014 13:34:09 GMT, Sönke Ludwig wrote:

On Tue, 11 Feb 2014 11:31:43 GMT, Rikki Cattermole wrote:

I've been working towards getting more requests per second.

I'm doing this on a not-so-good machine, so take this with a grain of salt (HP Elite 190a). E.g. my hard drives are quite old and worn, with noticeable speed issues. Even things like AV affect its performance; for these tests it is disabled.
To do this, I basically commented out the code at https://github.com/rejectedsoftware/vibe.d/blob/06adc36e73a588f4563d33396a95c3891b2bb373/source/vibe/http/server.d#L1431

You mean the foreach (log; context.loggers)? If you don't have accessLogToConsole or accessLogFile set in the HTTPServerSettings, I don't see how that could possibly have a measurable influence on performance.

I would not set it, but Cmsed sort of assumes it's set in the config currently. And it can affect performance when the I/O speed of the HDDs isn't up to it.

I'm currently getting around 5.3k requests per second.

That's still quite low actually, which OS is that on?

Windows 7 x64 (although it's only a 32-bit app; blame vibe.d's lack of 64-bit libevent libs).

The other thing I have done differently is that I have implemented a router that generates its route-matching code at compile time via delegates (since the routes are known at CTFE).
There are around 18 routes currently being served, of which only one (index) is being tested. Its output is basically just that of a HEAD request.

My idea was rather to optimize the existing runtime router by creating a decision tree from all the routes, so that the selection time goes down from O(n) to O(log(n)), which should be plenty sufficient if the server serves any kind of non-trivial content.

I like that idea. This approach was just easier, and it's better than what is already there.

Strangely manually specifying a HEAD request with ab is slower by around 100ms.

For routes that don't exist, I'm getting around 5.2k requests/s, and with -i on ab around 4.2k, which is quite a bit different. However, my router handles 404s itself and so should not be causing an exception.

I think I'll replace that exception in the HTTP server with a manual call to the error page handler. Even if exceptions get optimized (as they definitely need to be), there is no need to use an exception there, except to save a tiny bit of code duplication.

I'm just making the point that exceptions weren't part of the chain of calls.

I rewrote my router specifically to test this out. Beforehand, the most I could get was 1.3k. I don't know if it's an accumulation of factors or simply the regex matching hurting performance.

When I get the time, I'll start another profiler run in VTune. After my last optimization run some months (a year?) ago, almost all of the time was taken by the kernel for I/O related things. Since nothing has changed since then regarding the router, logging or similar, my guess would be that something like a GC allocation or some performance hungry debug assertion has slipped back in somewhere and causes the slowdown*.

I'm a lot happier with this now than what I was a few days ago.

What I can say is that I've got up to 80k requests/s on my AMD Phenom II quad-core on Linux (which should be considerably slower than your i7). However, it made a huge difference whether the benchmark app (ab) was run on the same machine or on a different one. The client and server processes can have surprising interactions when they run on the same machine and on the loopback device. There is also weighttp, which has a much lower overhead than ab. The only drawback is that it doesn't output statistics as detailed as ab's.

* it should be said that I've used HTTPServerOption.distribute and -version=VibeManualMemoryManagement for the benchmarks.

This is where I say I did try on Linux, both in a VM and on actual hardware. In both cases the results weren't flattering (pre-rewrite of the router, please note). Neither machine was really suitable for testing on.

Yes, ab was running locally. I'm using this more as a comparison between the approaches, and to get an idea of where I'm going wrong.
I tried with and without VibeManualMemoryManagement and it hit 6.1k (I tuned the ab parameters a little more).
ab -n100000 -c500 -k
I did try again from my main machine to my 'server', which got me around 4.5k req/s, so quite a bit less.

Would it be possible for you to make a tutorial on this? It might be possible to nail down what is different in our setups.

All in all, honestly, as long as I'm above 2.3k I'm happy. That's pretty much the fastest of any PHP framework I've found to date. If it's possible to hit 10k+ I would be ecstatic.

Re: My benchmarks

On 11.02.2014 14:41, Stephan Dilly wrote:

On Tue, 11 Feb 2014 13:34:09 GMT, Sönke Ludwig wrote:

On Tue, 11 Feb 2014 11:31:43 GMT, Rikki Cattermole wrote:

I've been working towards getting more requests per second.

I'm doing this on a not-so-good machine, so take this with a grain of salt (HP Elite 190a). E.g. my hard drives are quite old and worn, with noticeable speed issues. Even things like AV affect its performance; for these tests it is disabled.
To do this, I basically commented out the code at https://github.com/rejectedsoftware/vibe.d/blob/06adc36e73a588f4563d33396a95c3891b2bb373/source/vibe/http/server.d#L1431

You mean the foreach (log; context.loggers)? If you don't have accessLogToConsole or accessLogFile set in the HTTPServerSettings, I don't see how that could possibly have a measurable influence on performance.

I'm currently getting around 5.3k requests per second.

That's still quite low actually, which OS is that on?

The other thing I have done differently is that I have implemented a router that generates its route-matching code at compile time via delegates (since the routes are known at CTFE).
There are around 18 routes currently being served, of which only one (index) is being tested. Its output is basically just that of a HEAD request.

My idea was rather to optimize the existing runtime router by creating a decision tree from all the routes, so that the selection time goes down from O(n) to O(log(n)), which should be plenty sufficient if the server serves any kind of non-trivial content.

Strangely manually specifying a HEAD request with ab is slower by around 100ms.

For routes that don't exist, I'm getting around 5.2k requests/s, and with -i on ab around 4.2k, which is quite a bit different. However, my router handles 404s itself and so should not be causing an exception.

I think I'll replace that exception in the HTTP server with a manual call to the error page handler. Even if exceptions get optimized (as they definitely need to be), there is no need to use an exception there, except to save a tiny bit of code duplication.

I rewrote my router specifically to test this out. Beforehand, the most I could get was 1.3k. I don't know if it's an accumulation of factors or simply the regex matching hurting performance.

When I get the time, I'll start another profiler run in VTune. After my last optimization run some months (a year?) ago, almost all of the time was taken by the kernel for I/O related things. Since nothing has changed since then regarding the router, logging or similar, my guess would be that something like a GC allocation or some performance hungry debug assertion has slipped back in somewhere and causes the slowdown*.

I'm a lot happier with this now than what I was a few days ago.

What I can say is that I've got up to 80k requests/s on my AMD Phenom II quad-core on Linux (which should be considerably slower than your i7). However, it made a huge difference whether the benchmark app (ab) was run on the same machine or on a different one. The client and server processes can have surprising interactions when they run on the same machine and on the loopback device. There is also weighttp, which has a much lower overhead than ab. The only drawback is that it doesn't output statistics as detailed as ab's.

* it should be said that I've used HTTPServerOption.distribute and -version=VibeManualMemoryManagement for the benchmarks.

What speaks against using -version=VibeManualMemoryManagement as a default anyway? Is this whole option documented somewhere?

It is not @safe so to speak. Especially the HTTP request handlers
would need to be changed from (HTTPServerRequest, HTTPServerResponse)
to (scope Scoped!HTTPServerRequest, scope Scoped!HTTPServerResponse)
(conceptually).
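
To illustrate the problem with a made-up example (not actual vibe.d code): with the current signature nothing stops a handler from keeping a reference past the request, which is fine with GC-managed memory but dangerous with manually managed request objects.

import vibe.d;

HTTPServerRequest leaked;   // escapes the request's lifetime

void handler(HTTPServerRequest req, HTTPServerResponse res)
{
    leaked = req;           // harmless with GC memory, but a dangling reference
                            // if the request is freed once the handler returns
    res.writeBody("hello");
}

// Conceptually the signature would have to become something like
//     void handler(scope Scoped!HTTPServerRequest req,
//                  scope Scoped!HTTPServerResponse res)
// so that escaping references like the one above could be rejected at compile time.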

Re: My benchmarks

On Tue, 11 Feb 2014 14:30:40 GMT, Rikki Cattermole wrote:

All in all, honestly, as long as I'm above 2.3k I'm happy. That's pretty much the fastest of any PHP framework I've found to date. If it's possible to hit 10k+ I would be ecstatic.

I'm waiting for LDC or GDC to become more mature before getting into benchmarking vibe.d more seriously; so far they've been 3x faster than DMD in other benchmarks. I have a nice little Xeon E5-2620 with 32 GB of DDR3 and Fedora 20 x64 sitting next to me waiting for the opportunity :) I plan on seeing how many concurrent WebSockets it can handle; the fibers are nearly weightless, which makes this the best use of vibe.d IMO. I could probably have 5M concurrent WebSocket connections on there before needing a little more RAM, but I can fit a lot of 16 GB bars in this fella: http://www.newegg.ca/Product/Product.aspx?Item=N82E16813182346

Re: My benchmarks

On 11.02.2014 15:30, Rikki Cattermole wrote:

Would it be possible for you to make a tutorial on this? It might be possible to nail down what is different in our setups.

What I'm using to test locally is the bench-http-server example in the
vibe.d package, built using dub -b release.

Windows is yielding similar results for me with this approach (around 5.9 kreq/s on my laptop). On Linux (Mint Linux, 64-bit, latest version), using the bench-http-request example to perform the requests performs quite poorly (around 3 kreq/s if I remember right), but using ab -n 100000 -c 100 -k http://127.0.0.1:8080/empty yields about 15 kreq/s.

I didn't test using a remote machine to generate the requests, but in
earlier tests with a powerful enough client machine and a GbE
connection, this was about four times faster than locally.

Unfortunately, my Linux tests were on a different machine than those
done earlier, so I can't say right now if the performance has improved
or declined since then. I'll look into why the bench-http-request
example is so slow on Linux, though.