Picking out a few of the questions:

On Wed, 23 Jul 2014 08:22:59 GMT, zhaopuming wrote:

On Tue, 22 Jul 2014 17:45:48 GMT, Etienne Cimon wrote:

If you compile with VibeManualMemoryManagement, most allocations are scoped and go through the stacked freelists in vibe.memory. It is very lightweight, and you avoid the GC for that part.

This is what I'm looking for :-). WOW, so vibe.d already has got this memory switch. Definitely gonna look into that.

What tool do you use to find allocations on the heap?

What I usually do is either set a debugger breakpoint at __gc_malloc (or a similar symbol) to make sure that simple critical paths, such as handling a basic HTTP request, don't allocate at all, or, in cases where some allocations are expected, use a profiler to check whether the GC actually takes up a considerable amount of time. A few GC allocations here and there are actually not that bad.

In the future, I'd also like to start using @nogc in as many parts of the library as possible to statically ensure that no allocations happen.
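
Roughly, code guarded that way would look like this (a minimal sketch, assuming a compiler with @nogc support; the function itself is made up):

@nogc int sumBytes(const(ubyte)[] data)
{
    int total = 0;
    foreach (b; data)
        total += b;
    // auto copy = data.dup; // would not compile: .dup allocates on the GC heap
    return total;
}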

But you should use the GC for smaller data like string appenders, because the free lists there grow in powers of 2, whereas the freelists in vibe.utils.memory grow in powers of 10 (which increases memory use).

I think all free lists should be powers of two; where did you see powers of 10?
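
Just to illustrate what the growth factor means (this is not vibe.d's actual code, only a sketch of the usual power-of-two bucketing):

size_t bucketSize(size_t n)
{
    size_t s = 16;          // smallest bucket
    while (s < n) s *= 2;   // grow by powers of two
    return s;               // e.g. 100 -> 128, 1000 -> 1024
}

With powers of ten, a 1001 byte request would end up in a 10000 byte bucket, which is where the extra memory use would come from.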

Unless it is some application-wide status data, all request-related data is allocated so frequently that I feel the need to make it scoped or pooled. If you have many, many small string appenders, how would you free them when using the GC?

One solution is to use AllocAppender in vibe.utils.array, which can be used with .reset(AppenderResetMode.freeData) to explicitly free the data when it isn't needed anymore.
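
The pattern looks roughly like this (only a sketch; the way the allocator is obtained here, defaultAllocator() from vibe.utils.memory, is an assumption and may differ between vibe.d versions):

import vibe.utils.array;
import vibe.utils.memory;

void buildBody()
{
    auto app = AllocAppender!string(defaultAllocator());
    app.put("Hello, ");
    app.put("world!");
    // ... use app.data ...
    app.reset(AppenderResetMode.freeData); // frees the backing memory right away
}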

Packets won't produce allocations unless you want them to; they're put into a circular buffer for you to fetch via the InputStream.read(ubyte[]) base class interface of TCPConnection.
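
For illustration, draining that buffer without extra allocations could look like this (a sketch, assuming the usual InputStream primitives empty, leastSize and read):

import std.algorithm : min;
import vibe.core.net;

void pumpConnection(TCPConnection conn)
{
    ubyte[4096] buffer; // stack buffer, no GC allocation
    while (!conn.empty)
    {
        auto n = cast(size_t)min(buffer.length, conn.leastSize);
        conn.read(buffer[0 .. n]); // copies out of the internal circular buffer
        // ... forward buffer[0 .. n] to the upstream server ...
    }
}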

This is nice :-) I see that vibe.d provides an easy API for parameter/form retrieval. Does it parse the parameters eagerly and thoroughly? (I don't need most of the parameters sent in; I'm just going to pass them on to the upstream server.) Is the parsed data a struct or a class?

Parameters are currently parsed eagerly and into a special AA-like struct (vibe.utils.dictionarylist). This struct avoids allocations for up to 16 fields. However, query string and form body parsing can be completely avoided by removing the HTTPServerOption.parseFormBody and HTTPServerOption.parseQueryString flags. In that case, the query string could be passed verbatim to the client using the HTTPServerRequest.queryString field.
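
A sketch of that setup (the flag arithmetic may need adjusting for your vibe.d version; the handler body is just a placeholder):

import vibe.http.server;

void setupProxyEndpoint()
{
    auto settings = new HTTPServerSettings;
    // drop the default form body/query string parsing
    settings.options &= ~(HTTPServerOption.parseFormBody | HTTPServerOption.parseQueryString);
    listenHTTP(settings, (HTTPServerRequest req, HTTPServerResponse res) {
        auto rawQuery = req.queryString; // never parsed, passed through verbatim
        // ... forward rawQuery to the upstream server ...
        res.writeBody("ok");
    });
}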

The libevent engine handling connections is a VERY fast library (written in C) and I don't think it allocates.

I heard that there is libev, which competes with libevent and claims to be faster and more memory-friendly. And there is libuv, which claims to be even better (libuv used libev on Linux, but recently switched to its own implementation due to limitations in libev). What is your opinion on this?

I made some initial tests with libev in the early days of vibe.d, but for some reason the result was slower than libevent, so I've suspended that idea for the time being. The plan is to write a native D wrapper around the OS facilities instead of using an additional C library in between (as is already done for the win32 driver). Etienne has already started some work in this direction in his fork.

  1. Does the client support timer and wait/drop?

The timers are carefully optimized, imo: they're kept in expiration-sorted arrays that use malloc with a doubling growth policy, which is the kind of place where compiler optimizations kick in. I haven't seen a connection timeout feature, but I suppose it would be easy to implement, with the connections started in runTask and closed by throwing in a timer.

There is one unfortunate effect with timers currently: when stopTimer() is called, the timeout isn't always removed from the heap, which can cause the heap to grow very large when timers are repeatedly set and stopped. Maybe the code in fact needs to be switched to a red-black tree, so that timeouts can also be removed efficiently.
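
Regarding the connection timeout idea mentioned above, a rough sketch of how it can be done with a timer today (the function name is made up):

import core.time : seconds;
import vibe.core.core : setTimer;
import vibe.core.net;

void handleWithTimeout(TCPConnection conn)
{
    // close the connection if the work below takes longer than 30 seconds;
    // closing makes any pending read/write fail with an exception
    auto watchdog = setTimer(30.seconds, { conn.close(); });
    scope (exit) watchdog.stop();
    // ... read from / write to conn ...
}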

I was hoping for this kind of API:

client.send(message, timeout); // gets a connection from the pool and sends the message; on timeout, it kills the connection and creates a new one for the pool if necessary.

There is currently an open pull request by Etienne to add an HTTPClientSettings class to control the HTTP client's behavior. An API that would fit well in there would be:

auto settings = new HTTPClientSettings;
settings.requestTimeout = timeout;

requestHTTP(..., settings);

This would then properly open and close connections in the pool as needed. This would also be quick to add.

  1. How is vibe.d's JSON support in this scenario? Would it hinder the async mode because there is too much CPU work during serialization/deserialization?

There are some allocations in the JSON deserializer. The stack is used as much as possible, though, via recursion, and the CPU usage is amazingly low because of compile-time traits, which I think are 10-100x faster than run-time introspection in interpreted languages. You won't see any problems there; this is where you'll see D at its most powerful.
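
For reference, the compile-time (de)serialization mentioned here looks roughly like this (a minimal sketch using vibe.data.json; the struct is made up):

import vibe.data.json;

struct Message {
    string user;
    int id;
}

void roundTrip()
{
    auto msg = Message("alice", 42);
    Json j = serializeToJson(msg);           // serializer generated at compile time
    auto back = deserializeJson!Message(j);  // no run-time reflection involved
    assert(back == msg);
}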

Well, this is interesting, gonna check it out. I'm wondering why vibe.d's JSON library isn't pushed into Phobos. std.json gave me the impression that D currently isn't good at handling JSON.

In fact, Andrei already asked about that and is now working on a new std.json module. I've also mentioned to him that there are still some changes I'd like to make. Most importantly, instead of separate Json and Bson structs, my idea is to define a generic tagged union type that allows operations on, and conversions between, different types in a generic way. This would, for example, enable converting between Json and Bson without any dependencies between the two.
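
Purely to illustrate the tagged union idea (this is not the proposed design, just a sketch using std.variant's recursive Algebraic):

import std.variant;

// a generic JSON-like value: null, bool, integer, float, string, array or object
alias Value = Algebraic!(typeof(null), bool, long, double, string,
                         This[], This[string]);

void example()
{
    auto v = Value("hello");          // holds a string
    v = 3.14;                         // now holds a double
    assert(v.type == typeid(double));
}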

  1. How do we utilize multicore in this setup? Multiple processes?

The server automatically scales to all processors: many vibe.d worker tasks are started with libevent and listen on the same socket, which means your handlers will run on every available processor. D's druntime takes care of keeping variables thread-local. So multi-core optimizations come for free. However, the bulk of the improvements will come from how you use tasks, so don't be afraid to liberally start new vibe.d tasks if you insist on never seeing any blocking; they're much faster than threads, with the same benefits.

So I only need one instance of vibe.d in a server and it will automatically consume all the computing power? That is cool!

You also need to set the HTTPServerOption.distribute flag, but that's it. You currently have to be a bit careful, though, not to accidentally share unprotected data between threads:

import vibe.http.router;
import vibe.http.server;

class MyClass {
    private int m_someVar;

    this()
    {
        auto router = new URLRouter;
        router.get("/", &handler);
        auto settings = new HTTPServerSettings;
        settings.options |= HTTPServerOption.distribute;
        listenHTTP(settings, router);
    }

    // this could be called from any of the worker threads
    void handler(HTTPServerRequest req, HTTPServerResponse res)
    {
        // OOPS: race condition without mutex/atomic op
        m_someVar++;
    }
}