RejectedSoftware Forums


Vibe.d as high performance server&client

Hi Sönke:

I'd like to investigate the possibility of using vibe.d as a high performance server & client.

Our typical usage pattern is this:

  1. A GET request of about 200-500 bytes comes in to the server.
  2. Some light parsing and a Redis lookup.
  3. Prepare a JSON/protobuf request, about 500 bytes in size.
  4. Send it to multiple upstream servers.
  5. Wait for their responses; if one of them takes too long, drop the request (or connection).
  6. Combine the results from the upstream servers.
  7. Respond to the browser.

The problem is we might have 5-10 billion requests per day, and we need latency as short as possible.

We are currently using Java, but the per-request memory cost is HUGE. So GC pressure, especially given the reputation of D's GC implementation, is a major concern.

So my question is, does vibe.d's design support this kind of scenario?

  1. Does vibe.d use a lot of classes instead of structs? Does vibe.d allocate much?
  2. Does the client support timers and wait/drop?
  3. How is vibe.d's JSON support in this scenario? Would it hinder the async mode, since serialization/deserialization takes a lot of CPU work?
  4. How do we utilize multiple cores in this setup? Multiple processes?
  5. How does vibe.d handle network-related errors (connection breaks, etc.)?

In general, what would you suggest if you were to design this system in vibe.d?

Re: Vibe.d as high performance server&client

I know you asked Sönke, but the question was too interesting, and I felt tempted to throw in some of my own perspective since I've had these same questions myself before.

On 2014-07-22 5:31 AM, zhaopuming wrote:

The problem is we might have 5-10 billion requests per day, and we need latency as short as possible.

That's really huge; I'm expecting something on that order as well. It's safer to make it possible to scale onto multiple servers with load balancers. I would suggest hash-based distribution across multiple Redis servers using the Redis key hashes, though you should calculate how many of them you'll need up front, to avoid iteratively migrating keys until you find a "load % sweet spot" (they're scaled by hash range).

  1. Does vibe.d use a lot of classes instead of structs? Does vibe.d allocate much?

The less you use the GC, the less it will have to collect, BUT collection is still very quick - I've benchmarked it extensively, and it's not even on the order of a hundred microseconds for 1-2 GB. If you compile with VibeManualMemoryManagement, most allocations are scoped and go on the stacked free lists in vibe.utils.memory; this is very light and avoids the GC for that part. But you should use the GC for smaller data like string appenders, because the GC's free lists grow by powers of 2, while the free lists in vibe.utils.memory grow by powers of 10 (which increases memory use). Packets won't produce allocations unless you want them to; they're put into a circular buffer for you to fetch through the InputStream.read(ubyte[]) base-class interface of TCPConnection.

The libevent engine handling connections is a VERY fast library (written in C) and I don't think it allocates.
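To illustrate the packet path described above, here is a minimal sketch of reading request bytes through TCPConnection into a fixed stack buffer, so that per-request packet data never touches the GC heap. The port and the echo behavior are arbitrary placeholders:

```d
import vibe.d;
import std.algorithm : min;

shared static this()
{
    listenTCP(8080, (TCPConnection conn) {
        ubyte[512] buf; // stack buffer: no per-request heap allocation
        while (conn.connected && conn.waitForData()) {
            // read() copies out of vibe.d's internal circular buffer
            auto len = cast(size_t) min(conn.leastSize, buf.length);
            conn.read(buf[0 .. len]);
            conn.write(buf[0 .. len]); // echo back, just for demonstration
        }
    });
}
```

Since the buffer lives on the task's fiber stack, the GC never sees it and nothing needs to be freed.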

  2. Does the client support timers and wait/drop?

The timers are carefully optimized, imo: they're kept in expiration-sorted arrays that use malloc with a doubling growth policy - the kind of place where compiler optimizations kick in. I haven't seen a connection-timeout feature, but I suppose it would be easy to implement by starting the connection in runTask and closing it by throwing from a timer.
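A hedged sketch of that timer-plus-task idea; the host, port, and payload are hypothetical, and the read simply throws once the timer has closed the connection:

```d
import vibe.d;

// Query one upstream server, dropping the connection on timeout.
void queryUpstream(string host, const(ubyte)[] payload, Duration timeout)
{
    auto conn = connectTCP(host, 9000);
    // Force-drop the connection if the upstream takes too long; the
    // blocked read below then throws and the task unwinds cleanly.
    auto timer = setTimer(timeout, { conn.close(); });
    scope (exit) timer.stop();
    try {
        conn.write(payload);
        ubyte[512] reply;
        conn.read(reply[]); // throws if the timer closed the connection
    } catch (Exception e) {
        logInfo("upstream %s dropped: %s", host, e.msg);
    }
}
```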

  3. How is vibe.d's JSON support in this scenario? Would it hinder the async mode, since serialization/deserialization takes a lot of CPU work?

There are some allocations in the JSON deserializer. The stack is used as much as possible, though, via recursion, and the CPU usage is amazingly low because of compile-time traits, which I think are 10-100x faster than run-time introspection in interpreted languages. You won't see any problems there, and this is where you'll see D at its most powerful.
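As a small illustration of the compile-time approach, using vibe.data.json's serializeToJson/deserializeJson; the struct is an arbitrary example:

```d
import vibe.data.json;

struct UpstreamRequest {
    string userId;
    int slot;
    string[] keywords;
}

void demo()
{
    auto req = UpstreamRequest("u42", 7, ["d", "vibe"]);
    // Both directions are driven by compile-time introspection of the
    // struct's fields; no runtime reflection is involved.
    string s = serializeToJson(req).toString();
    auto back = deserializeJson!UpstreamRequest(s);
    assert(back.userId == "u42" && back.keywords.length == 2);
}
```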

  4. How do we utilize multiple cores in this setup? Multiple processes?

The server automatically scales to all processors: multiple vibe.d worker tasks are started with libevent and listen on the same socket, which means your handlers will run on every available processor. D's druntime takes care of keeping variables thread-local, so multi-core optimization comes for free. However, the bulk of the improvement will come from how you use tasks, so don't be afraid to start new vibe.d tasks liberally if you never want to see blocking - they're much faster than threads, with the same benefits.
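A sketch of leaning on tasks for the fan-out step from the original question; fetchFrom is a hypothetical placeholder for one upstream query:

```d
import vibe.core.core : runTask, Task;

string fetchFrom(string host) { return host; } // hypothetical upstream query

string[] fanOut(string[] upstreams)
{
    auto results = new string[](upstreams.length);
    Task[] tasks;

    // The nested function gives each task its own copies of idx/host;
    // capturing the foreach variables directly in the closure would
    // share one variable across all iterations.
    void start(size_t idx, string host)
    {
        tasks ~= runTask({ results[idx] = fetchFrom(host); });
    }

    foreach (i, host; upstreams)
        start(i, host);
    foreach (t; tasks)
        t.join(); // yields to the event loop until that task finishes
    return results;
}
```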

  5. How does vibe.d handle network-related errors (connection breaks, etc.)?

The task throws with the error number (usually this happens in a libevent callback), drops the connection, and cleans up automatically. The other connections continue as if nothing happened. Every error possibility is covered up to your application's code, and you can log it or not, depending on the log settings chosen with setLogLevel.
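In application code this comes down to ordinary exception handling inside the task that owns the connection; a sketch:

```d
import vibe.d;

void handleConnection(TCPConnection conn)
{
    try {
        ubyte[256] buf;
        conn.read(buf[]);  // throws on connection break/reset
        conn.write(buf[]);
    } catch (Exception e) {
        // Only this task is affected; other connections carry on.
        logError("connection failed: %s", e.msg);
    }
    // vibe.d tears the connection down when the handler returns
}
```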

Finally, compared to Java's bytecode, I'd bet that machine-code and compile-time optimizations have put D in a much better place than even C, so there's definitely a head start. The rest depends, of course, on the quality of your code and your ability to keep the data near the computation with the given tools - and D has those tools ;)

Re: Vibe.d as high performance server&client

On Tue, 22 Jul 2014 17:45:48 GMT, Etienne Cimon wrote:

I know you asked Sönke, but the question was too interesting, and I felt tempted to throw in some of my own perspective since I've had these same questions myself before.

On 2014-07-22 5:31 AM, zhaopuming wrote:

The problem is we might have 5-10 billion requests per day, and we need latency as short as possible.

That's really huge; I'm expecting something on that order as well. It's safer to make it possible to scale onto multiple servers with load balancers. I would suggest hash-based distribution across multiple Redis servers using the Redis key hashes, though you should calculate how many of them you'll need up front, to avoid iteratively migrating keys until you find a "load % sweet spot" (they're scaled by hash range).

Yes, we are using a Redis cluster with a similar approach.

  1. Does vibe.d use a lot of classes instead of structs? Does vibe.d allocate much?

The less you use the GC, the less it will have to collect, BUT collection is still very quick - I've benchmarked it extensively, and it's not even on the order of a hundred microseconds for 1-2 GB.

That's what I was worrying about - that D's GC wouldn't be able to handle garbage produced in such volume and so quickly.

If you compile with VibeManualMemoryManagement, most allocations are scoped and go on the stacked free lists in vibe.utils.memory; this is very light and avoids the GC for that part.

This is what I'm looking for :-). WOW, so vibe.d already has got this memory switch. Definitely gonna look into that.

What tool do you use to find allocations on the heap?

But you should use the GC for smaller data like string appenders, because the GC's free lists grow by powers of 2, while the free lists in vibe.utils.memory grow by powers of 10 (which increases memory use).

Unless it is some application-wide status data, all request-related data comes so frequently that I feel the need to make it scoped, or pooled. If you have many, many small string appenders, what would you do to free them under the GC?

Packets won't produce allocations unless you want them to; they're put into a circular buffer for you to fetch through the InputStream.read(ubyte[]) base-class interface of TCPConnection.

This is nice :-) I see that vibe.d provides an easy API for parameter/form retrieval. Does it parse the parameters eagerly and thoroughly? (I don't need most of the parameters sent in; I'm just going to pass them through to the upstream server.) Is the parsed data a struct or a class?

The libevent engine handling connections is a VERY fast library (written in C) and I don't think it allocates.

I've heard that there is libev, which competes with libevent and claims to be more memory-friendly and faster. And there is libuv, which claims to be even better (libuv used libev on Linux, but recently switched to its own implementation due to limitations in libev). What is your opinion on this?

  2. Does the client support timers and wait/drop?

The timers are carefully optimized, imo: they're kept in expiration-sorted arrays that use malloc with a doubling growth policy - the kind of place where compiler optimizations kick in. I haven't seen a connection-timeout feature, but I suppose it would be easy to implement by starting the connection in runTask and closing it by throwing from a timer.

I was hoping for this kind of API:

client.send(message, timeout); // gets a connection from the pool and sends the message; on timeout, kills the connection and creates a new one for the pool if necessary

  3. How is vibe.d's JSON support in this scenario? Would it hinder the async mode, since serialization/deserialization takes a lot of CPU work?

There are some allocations in the JSON deserializer. The stack is used as much as possible, though, via recursion, and the CPU usage is amazingly low because of compile-time traits, which I think are 10-100x faster than run-time introspection in interpreted languages. You won't see any problems there, and this is where you'll see D at its most powerful.

Well, this is interesting; gonna check it out. I'm wondering why vibe.d's JSON lib hasn't been pushed into Phobos. std.json gave me the impression that D is currently not good at handling JSON.

  4. How do we utilize multiple cores in this setup? Multiple processes?

The server automatically scales to all processors: multiple vibe.d worker tasks are started with libevent and listen on the same socket, which means your handlers will run on every available processor. D's druntime takes care of keeping variables thread-local, so multi-core optimization comes for free. However, the bulk of the improvement will come from how you use tasks, so don't be afraid to start new vibe.d tasks liberally if you never want to see blocking - they're much faster than threads, with the same benefits.

So I only need one instance of vibe.d on a server and it will automatically use all the computing power? That is cool!

  5. How does vibe.d handle network-related errors (connection breaks, etc.)?

The task throws with the error number (usually this happens in a libevent callback), drops the connection, and cleans up automatically. The other connections continue as if nothing happened. Every error possibility is covered up to your application's code, and you can log it or not, depending on the log settings chosen with setLogLevel.

Finally, compared to Java's bytecode, I'd bet that machine-code and compile-time optimizations have put D in a much better place than even C, so there's definitely a head start. The rest depends, of course, on the quality of your code and your ability to keep the data near the computation with the given tools - and D has those tools ;)

Thanks for giving me this detailed answer; vibe.d looks promising. I will investigate it further, and hopefully it will make its way into our dev environment. :-)

Re: Vibe.d as high performance server&client

Picking out a few of the questions:

On Wed, 23 Jul 2014 08:22:59 GMT, zhaopuming wrote:

On Tue, 22 Jul 2014 17:45:48 GMT, Etienne Cimon wrote:

If you compile with VibeManualMemoryManagement, most allocations are scoped and go on the stacked free lists in vibe.utils.memory; this is very light and avoids the GC for that part.

This is what I'm looking for :-). WOW, so vibe.d already has got this memory switch. Definitely gonna look into that.

What tool do you use to find allocations on the heap?

What I usually do is either set a debugger breakpoint at __gc_malloc or similar, to make sure that some simple critical paths don't allocate at all (such as handling a simple HTTP request). In cases where there may be some allocations, though, using a profiler to check whether the GC actually takes a considerable amount of time is usually the best approach. A few GC allocations here and there are actually not that bad.

In the future, I'd also like to start using @nogc in as many parts of the library as possible to statically ensure that no allocations happen.

But you should use the GC for smaller data like string appenders, because the GC's free lists grow by powers of 2, while the free lists in vibe.utils.memory grow by powers of 10 (which increases memory use).

I think all free lists should be powers of two - where did you see powers of 10?

Unless it is some application-wide status data, all request-related data comes so frequently that I feel the need to make it scoped, or pooled. If you have many, many small string appenders, what would you do to free them under the GC?

One solution is to use AllocAppender in vibe.utils.array, which can be used with .reset(AppenderResetMode.freeData) to explicitly free the data when it isn't needed anymore.
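A rough sketch of that usage; the constructor details are from memory, so check vibe.utils.array and vibe.utils.memory for the exact API:

```d
import vibe.utils.array : AllocAppender, AppenderResetMode;
import vibe.utils.memory : defaultAllocator;

void buildResponse()
{
    auto app = AllocAppender!string(defaultAllocator());
    app.put("status=");
    app.put("ok");
    auto s = app.data; // use the assembled string...
    // ...then explicitly return the memory instead of waiting for the GC
    app.reset(AppenderResetMode.freeData);
}
```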

Packets won't produce allocations unless you want them to; they're put into a circular buffer for you to fetch through the InputStream.read(ubyte[]) base-class interface of TCPConnection.

This is nice :-) I see that vibe.d provides an easy API for parameter/form retrieval. Does it parse the parameters eagerly and thoroughly? (I don't need most of the parameters sent in; I'm just going to pass them through to the upstream server.) Is the parsed data a struct or a class?

Parameters are currently parsed eagerly into a special AA-like struct (vibe.utils.dictionarylist), which avoids allocations for up to 16 fields. However, query string and form body parsing can be avoided completely by removing the HTTPServerOption.parseFormBody and HTTPServerOption.parseQueryString flags. In that case, the query string can be passed on verbatim using the HTTPServerRequest.queryString field.
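Concretely, removing those flags might look like this; the port and the handler body are placeholders:

```d
import vibe.d;

void forward(HTTPServerRequest req, HTTPServerResponse res)
{
    // The unparsed query string is still available verbatim:
    res.writeBody("forwarding: " ~ req.queryString);
}

shared static this()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    // Skip eager query-string/form parsing; we only pass the raw query on.
    settings.options &= ~(HTTPServerOption.parseFormBody |
                          HTTPServerOption.parseQueryString);
    listenHTTP(settings, &forward);
}
```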

The libevent engine handling connections is a VERY fast library (written in C) and I don't think it allocates.

I've heard that there is libev, which competes with libevent and claims to be more memory-friendly and faster. And there is libuv, which claims to be even better (libuv used libev on Linux, but recently switched to its own implementation due to limitations in libev). What is your opinion on this?

I made some initial tests with libev in the early days of vibe.d, but for some reason the result was slower than libevent, so I've shelved that idea for the time being. The plan is to write a native D wrapper around the OS facilities instead of using an additional C library in between (as is already done for the win32 driver). Etienne has already started some work in this direction in his fork.

  2. Does the client support timers and wait/drop?

The timers are carefully optimized, imo: they're kept in expiration-sorted arrays that use malloc with a doubling growth policy - the kind of place where compiler optimizations kick in. I haven't seen a connection-timeout feature, but I suppose it would be easy to implement by starting the connection in runTask and closing it by throwing from a timer.

There is one unfortunate effect for timers currently: when stopTimer() is called, the timeout isn't always removed from the heap, which can cause the heap to grow very large when timers are repeatedly set and stopped. Maybe the code in fact needs to be switched to a red-black tree, so that timeouts can also be removed efficiently.

I was hoping for this kind of API:

client.send(message, timeout); // gets a connection from the pool and sends the message; on timeout, kills the connection and creates a new one for the pool if necessary

There is currently an open pull request by Etienne to add an HTTPClientSettings class to control the HTTP client's behavior. An API that would fit in well there would be:

auto settings = new HTTPClientSettings;
settings.requestTimeout = timeout;

requestHTTP(..., settings);

This would then properly open and close connections in the pool as needed. This would also be quick to add.

  3. How is vibe.d's JSON support in this scenario? Would it hinder the async mode, since serialization/deserialization takes a lot of CPU work?

There are some allocations in the JSON deserializer. The stack is used as much as possible, though, via recursion, and the CPU usage is amazingly low because of compile-time traits, which I think are 10-100x faster than run-time introspection in interpreted languages. You won't see any problems there, and this is where you'll see D at its most powerful.

Well, this is interesting; gonna check it out. I'm wondering why vibe.d's JSON lib hasn't been pushed into Phobos. std.json gave me the impression that D is currently not good at handling JSON.

In fact, Andrei already asked about that and is now working on a new std.json module. I've also mentioned to him that there are still some changes I'd like to make. Most importantly, instead of separate Json and Bson structs, my idea is to define a generic tagged union type that allows operations on, and conversion between, different types in a generic way. This would, for example, enable converting between Json and Bson without any dependencies between the two.
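As a toy illustration of the tagged-union idea, using Phobos' Algebraic with its This placeholder; this is just a sketch, not the actual planned design:

```d
import std.variant : Algebraic, This;

// One self-referential value type that a JSON and a BSON module could
// both produce and consume without depending on each other.
alias Value = Algebraic!(typeof(null), bool, long, double, string,
                         This[], This[string]);

void demo()
{
    Value v = 42L;
    assert(v.get!long == 42);
    v = "hello";              // same storage, different tag
    assert(v.get!string == "hello");
}
```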

  4. How do we utilize multiple cores in this setup? Multiple processes?

The server automatically scales to all processors: multiple vibe.d worker tasks are started with libevent and listen on the same socket, which means your handlers will run on every available processor. D's druntime takes care of keeping variables thread-local, so multi-core optimization comes for free. However, the bulk of the improvement will come from how you use tasks, so don't be afraid to start new vibe.d tasks liberally if you never want to see blocking - they're much faster than threads, with the same benefits.

So I only need one instance of vibe.d on a server and it will automatically use all the computing power? That is cool!

You also need to set the HTTPServerOption.distribute flag, but that's it. You currently have to be a bit careful, though, not to accidentally share unprotected data between threads:

class MyClass {
    private int m_someVar;

    this()
    {
        auto router = new URLRouter;
        router.get("/", &handler);
        auto settings = new HTTPServerSettings;
        settings.options |= HTTPServerOption.distribute;
        listenHTTP(settings, router);
    }

    // this could be called from any of the worker threads
    void handler(HTTPServerRequest req, HTTPServerResponse res)
    {
        // OOPS: race condition without mutex/atomic op
        m_someVar++;
    }
}

Re: Vibe.d as high performance server&client

On Wed, 23 Jul 2014 09:46:09 GMT, Sönke Ludwig wrote:

Picking out a few of the questions:

On Wed, 23 Jul 2014 08:22:59 GMT, zhaopuming wrote:

On Tue, 22 Jul 2014 17:45:48 GMT, Etienne Cimon wrote:

If you compile with VibeManualMemoryManagement, most allocations are scoped and go on the stacked free lists in vibe.utils.memory; this is very light and avoids the GC for that part.

This is what I'm looking for :-). WOW, so vibe.d already has got this memory switch. Definitely gonna look into that.

What tool do you use to find allocations on the heap?

What I usually do is either set a debugger breakpoint at __gc_malloc or similar, to make sure that some simple critical paths don't allocate at all (such as handling a simple HTTP request). In cases where there may be some allocations, though, using a profiler to check whether the GC actually takes a considerable amount of time is usually the best approach. A few GC allocations here and there are actually not that bad.

Thanks. I just want to ensure that there isn't much GC allocation per request.

In the future, I'd also like to start using @nogc in as many parts of the library as possible to statically ensure that no allocations happen.

@nogc helps here :-)

But you should use the GC for smaller data like string appenders, because the GC's free lists grow by powers of 2, while the free lists in vibe.utils.memory grow by powers of 10 (which increases memory use).

I think all free lists should be powers of two - where did you see powers of 10?

Unless it is some application-wide status data, all request-related data comes so frequently that I feel the need to make it scoped, or pooled. If you have many, many small string appenders, what would you do to free them under the GC?

One solution is to use AllocAppender in vibe.utils.array, which can be used with .reset(AppenderResetMode.freeData) to explicitly free the data when it isn't needed anymore.

There are many useful tools in vibe.d, we just need more docs :-)

Packets won't produce allocations unless you want them to; they're put into a circular buffer for you to fetch through the InputStream.read(ubyte[]) base-class interface of TCPConnection.

This is nice :-) I see that vibe.d provides an easy API for parameter/form retrieval. Does it parse the parameters eagerly and thoroughly? (I don't need most of the parameters sent in; I'm just going to pass them through to the upstream server.) Is the parsed data a struct or a class?

Parameters are currently parsed eagerly into a special AA-like struct (vibe.utils.dictionarylist), which avoids allocations for up to 16 fields. However, query string and form body parsing can be avoided completely by removing the HTTPServerOption.parseFormBody and HTTPServerOption.parseQueryString flags. In that case, the query string can be passed on verbatim using the HTTPServerRequest.queryString field.

Yes, that is exactly what I want.

The libevent engine handling connections is a VERY fast library (written in C) and I don't think it allocates.

I've heard that there is libev, which competes with libevent and claims to be more memory-friendly and faster. And there is libuv, which claims to be even better (libuv used libev on Linux, but recently switched to its own implementation due to limitations in libev). What is your opinion on this?

I made some initial tests with libev in the early days of vibe.d, but for some reason the result was slower than libevent, so I've shelved that idea for the time being. The plan is to write a native D wrapper around the OS facilities instead of using an additional C library in between (as is already done for the win32 driver). Etienne has already started some work in this direction in his fork.

Great to hear :-) It will be interesting to see how the native D lib compares to libevent/libev.

  2. Does the client support timers and wait/drop?

The timers are carefully optimized, imo: they're kept in expiration-sorted arrays that use malloc with a doubling growth policy - the kind of place where compiler optimizations kick in. I haven't seen a connection-timeout feature, but I suppose it would be easy to implement by starting the connection in runTask and closing it by throwing from a timer.

There is one unfortunate effect for timers currently: when stopTimer() is called, the timeout isn't always removed from the heap, which can cause the heap to grow very large when timers are repeatedly set and stopped. Maybe the code in fact needs to be switched to a red-black tree, so that timeouts can also be removed efficiently.

I was hoping for this kind of API:

client.send(message, timeout); // gets a connection from the pool and sends the message; on timeout, kills the connection and creates a new one for the pool if necessary

There is currently an open pull request by Etienne to add an HTTPClientSettings class to control the HTTP client's behavior. An API that would fit in well there would be:

auto settings = new HTTPClientSettings;
settings.requestTimeout = timeout;

requestHTTP(..., settings);

This would then properly open and close connections in the pool as needed. This would also be quick to add.

I like that :-)

  3. How is vibe.d's JSON support in this scenario? Would it hinder the async mode, since serialization/deserialization takes a lot of CPU work?

There are some allocations in the JSON deserializer. The stack is used as much as possible, though, via recursion, and the CPU usage is amazingly low because of compile-time traits, which I think are 10-100x faster than run-time introspection in interpreted languages. You won't see any problems there, and this is where you'll see D at its most powerful.

Well, this is interesting; gonna check it out. I'm wondering why vibe.d's JSON lib hasn't been pushed into Phobos. std.json gave me the impression that D is currently not good at handling JSON.

In fact, Andrei already asked about that and is now working on a new std.json module. I've also mentioned to him that there are still some changes I'd like to make. Most importantly, instead of separate Json and Bson structs, my idea is to define a generic tagged union type that allows operations on, and conversion between, different types in a generic way. This would, for example, enable converting between Json and Bson without any dependencies between the two.

Would you first split the JSON lib out into an independent project so that people can check it out? The same applies to the Task/Actor part.

  4. How do we utilize multiple cores in this setup? Multiple processes?

The server automatically scales to all processors: multiple vibe.d worker tasks are started with libevent and listen on the same socket, which means your handlers will run on every available processor. D's druntime takes care of keeping variables thread-local, so multi-core optimization comes for free. However, the bulk of the improvement will come from how you use tasks, so don't be afraid to start new vibe.d tasks liberally if you never want to see blocking - they're much faster than threads, with the same benefits.

So I only need one instance of vibe.d on a server and it will automatically use all the computing power? That is cool!

You also need to set the HTTPServerOption.distribute flag, but that's it. You currently have to be a bit careful, though, not to accidentally share unprotected data between threads:

class MyClass {
    private int m_someVar;

    this()
    {
        auto router = new URLRouter;
        router.get("/", &handler);
        auto settings = new HTTPServerSettings;
        settings.options |= HTTPServerOption.distribute;
        listenHTTP(settings, router);
    }

    // this could be called from any of the worker threads
    void handler(HTTPServerRequest req, HTTPServerResponse res)
    {
        // OOPS: race condition without mutex/atomic op
        m_someVar++;
    }
}

This also needs documentation. Hopefully there will be a "high-performance HTTP server tuning in D" tutorial.

Re: Vibe.d as high performance server&client

I think all free lists should be powers of two, where did you see powers of 10?

Sorry this is wrong, my memory failed me =)

The plan is to write a native D wrapper around the OS facilities instead of using an additional C library in between (as is already done for the win32 driver). Etienne has already started some work in this direction in his fork.

I have a few things in the works for vibe.d right now:

  • A native TCP/UDP driver that interacts directly with the kernel; the back-end also allows async DNS, file access, etc.
  • A native TLS library that uses Botan (a C++ library) for encryption, but uses a native, generic BER/DER serialization library, ports all certificate/X.509 objects to D structs, and can link statically if needed
  • An ASN.1-to-D compiler for building the serializable D structs mentioned above
  • Serializable session managers for Redis, the filesystem, or a native cache engine
  • A native cache/DB master-slave engine that works like Redis but serializes to the filesystem by appending and keeping tabs in file-based hashmaps (also excellent for SSDs); it keeps pointers alive locally for speed using the GC (while masters are @nogc), with an ACL & WebSocket API for NAT traversal
  • A web API library collection based on the native storage above, plus a dependency-injection library to fit library instances together on a per-domain basis, with JavaScript MVCs as public/admin front-ends

There's low-level work here but also high-level, with the possibility of statically linking everything. I'm hoping to develop more web API libraries after that, like an OAuth server API or a calendar server API, depending on what project I'm working on at the time. The high-level potential of vibe.d as a long-term platform is very promising.

Re: Vibe.d as high performance server&client

To minimize the impact of the GC on latency, I recommend doing something similar to our approach at Sociomantic: have task-local buffers that get reused over and over again without ever being freed. Such a buffer quickly grows to a size sufficient to handle the typical runtime load with no allocations at all, and the fact that you aren't constantly creating new GC roots greatly reduces scan time when a collection is necessary.

It may take some time to get used to writing code that way, but the results are good.
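A sketch of that pattern in plain D; all names are made up:

```d
// One thread-local scratch buffer per worker: it grows until it fits the
// typical request and is then reused forever, so steady-state request
// handling performs no allocations and adds no new GC roots.
ubyte[] g_scratch; // module-level variables are thread-local in D

ubyte[] scratch(size_t needed)
{
    if (g_scratch.length < needed)
        g_scratch.length = needed; // rare growth; never shrunk or freed
    return g_scratch[0 .. needed];
}

void handleRequest(const(ubyte)[] payload)
{
    auto buf = scratch(payload.length);
    buf[] = payload[]; // do all per-request work inside the reused buffer
}
```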

Re: Vibe.d as high performance server&client

On Thu, 24 Jul 2014 11:25:08 GMT, Dicebot wrote:

To minimize the impact of the GC on latency, I recommend doing something similar to our approach at Sociomantic: have task-local buffers that get reused over and over again without ever being freed. Such a buffer quickly grows to a size sufficient to handle the typical runtime load with no allocations at all, and the fact that you aren't constantly creating new GC roots greatly reduces scan time when a collection is necessary.

It may take some time to get used to writing code that way, but the results are good.

A fully reusable Task/Handler (with buffers) for each request is a great idea.

I was wondering when Sociomantic would opensource its infrastructure :-)

Are you considering using vibe.d, or will you continue using your own HTTP server?

Re: Vibe.d as high performance server&client

On Thu, 24 Jul 2014 11:25:08 GMT, Dicebot wrote:

To minimize the impact of the GC on latency, I recommend doing something similar to our approach at Sociomantic: have task-local buffers that get reused over and over again without ever being freed. Such a buffer quickly grows to a size sufficient to handle the typical runtime load with no allocations at all, and the fact that you aren't constantly creating new GC roots greatly reduces scan time when a collection is necessary.

It may take some time to get used to writing code that way, but the results are good.

Actually, I'm in a very similar business to yours: I'm developing an SSP server.

Re: Vibe.d as high performance server&client

On Fri, 25 Jul 2014 03:10:11 GMT, zhaopuming wrote:

I was wondering when Sociomantic would opensource its infrastructure :-)

Not before we are done with the D2 transition, for sure =/

Are you considering using vibe.d, or will you continue using your own HTTP server?

Our own epoll-based stack; we are still bound to D1. Though I am trying to advertise vibe.d among frontend developers as an option for new web services ;)

Actually, I'm in a very similar business to yours: I'm developing an SSP server.

Hope it will work better than most SSP services I have seen so far :P