On Tue, 22 Jul 2014 17:45:48 GMT, Etienne Cimon wrote:

I know you asked Sönke, but the question was too interesting not to chime in with some of my own perspective, since I've had these same questions myself.

On 2014-07-22 5:31 AM, zhaopuming wrote:

The problem is we might have 5 to 10 billion requests per day, and we need latency as low as possible.

That's really huge; I'm expecting something in that order as well. It's safer to make it possible to scale onto multiple servers behind load balancers. I would suggest hash-based distribution across multiple Redis servers using the Redis key hashes, though you should estimate up front how many servers you'll need, to avoid iteratively moving keys around until you find a load-percentage sweet spot (they scale by hash range).

Yes, we are using a Redis cluster with a similar approach.
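For illustration, key-to-shard selection can be sketched in a few lines of D. The CRC32 choice here is just an assumption for the example; Redis Cluster itself uses CRC16 over 16384 hash slots:

```d
import std.digest.crc : crc32Of;
import std.bitmanip : littleEndianToNative;

// Illustrative only: pick a shard for a key by hashing it.
// (Redis Cluster actually uses CRC16 mod 16384 hash slots.)
size_t shardFor(string key, size_t shardCount)
{
    ubyte[4] h = crc32Of(key);
    return littleEndianToNative!uint(h) % shardCount;
}
```

As long as the shard count stays fixed, the same key always maps to the same server; changing the count is what forces keys to move.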

  1. Does vibe.d use a lot of classes instead of structs? Does vibe.d allocate much?

The less you allocate through the GC, the less often it needs to collect. But collections are still very quick: I've benchmarked them extensively, and they're not even on the order of a hundred microseconds for 1-2 GB of heap.

That's what I was worried about: that D's GC wouldn't be able to keep up with garbage produced at that volume and rate.

If you compile with the VibeManualMemoryManagement version, most allocations are scoped and go through the free lists in vibe.utils.memory, which are stack-like. It's very lightweight, and you avoid the GC for that part.

This is what I'm looking for :-). Wow, so vibe.d already has this memory switch. Definitely going to look into that.
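For reference, enabling that switch is just a matter of defining the version identifier at build time; assuming a dub-based build, the dub.json fragment would look like:

```json
{
    "versions": ["VibeManualMemoryManagement"]
}
```

With other build systems the equivalent is passing `-version=VibeManualMemoryManagement` to the compiler.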

What tool do you use to find allocations on the heap?

However, you should still use the GC for smaller data such as string appenders, because the GC's free lists grow in powers of 2, whereas the free lists in vibe.utils.memory grow in powers of 10 (which increases memory use for small objects).

Unless it's some application-wide status data, request-related data is allocated so frequently that I feel the need to make it scoped or pooled. If you have many, many small string appenders, what would you do to free them under the GC?

Packets won't produce allocations unless you want them to; they're put into a circular buffer that you fetch through the InputStream.read(ubyte[]) interface that TCPConnection implements.

This is nice :-) I see that vibe.d provides an easy API for retrieving parameters/forms. Does it parse the parameters eagerly and thoroughly? (I don't need most of the parameters sent in; I'm just going to pass them on to the upstream server.) Is the parsed data a struct or a class?
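To illustrate the allocation-free read path described above, here is a sketch against vibe.d's TCPConnection as I understand it (the loop body and buffer size are assumptions for the example):

```d
import vibe.core.net : TCPConnection;

// Read into a preallocated buffer; the data comes out of vibe.d's
// internal circular buffer, so no per-packet heap allocation is needed.
void pump(TCPConnection conn)
{
    ubyte[4096] buf;          // stack buffer, reused for every read
    while (conn.connected)
    {
        conn.read(buf[]);     // InputStream.read(ubyte[]) fills the slice
        // ... process buf[] in place, without copying it anywhere ...
    }
}
```

Note that read(ubyte[]) blocks (yielding the task, not the thread) until the slice is filled; for variable-length reads you would size the slice with the connection's leastSize first.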

The libevent engine that handles connections is a VERY fast library (written in C), and I don't think it allocates.

I've heard there is libev, which competes with libevent and claims to be more memory-friendly and faster. And there is libuv, which claims to be better still (libuv used libev on Linux, but recently switched to its own implementation due to limitations in libev). What is your opinion on this?

  2. Does the client support timers and wait/drop?

The timers are carefully optimized, in my opinion: they're kept in expiration-sorted arrays that use malloc with a doubling growth policy, which is exactly the kind of place where compiler optimizations kick in. I haven't seen a connection-timeout feature, but I suppose it would be easy to implement, with connections started in runTask and closed by throwing from a timer.

I was hoping for this kind of API:

client.send(message, timeout); // gets a connection from a pool and sends the message; on timeout, kills the connection and creates a new one for the pool if necessary.
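Something close to that API could be built on vibe.d's primitives. This is only a sketch; the helper name is hypothetical, and the ConnectionPool/readTimeout combination is my assumption about how the pieces fit together, not an existing vibe.d call:

```d
import core.time : Duration;
import vibe.core.connectionpool : ConnectionPool;
import vibe.core.net : TCPConnection;

// Hypothetical helper: send over a pooled connection, dropping it on failure.
void sendWithTimeout(ConnectionPool!TCPConnection pool,
                     const(ubyte)[] message, Duration timeout)
{
    auto conn = pool.lockConnection();
    conn.readTimeout = timeout;   // per-connection timeout on subsequent reads
    try
        conn.write(message);
    catch (Exception e)
    {
        conn.close();             // drop the broken connection; the pool
                                  // will create a fresh one on the next lock
        throw e;
    }
}
```

The pool itself would be constructed with a factory delegate, e.g. `new ConnectionPool!TCPConnection(() => connectTCP(host, port))`, so replacement connections are created lazily on demand.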
  3. How's vibe.d's JSON support in this scenario? Would it hinder the async mode because there is too much CPU work during serialization/deserialization?

There's some allocation in the JSON deserializer. The stack is used as much as possible through recursion, though, and CPU usage is remarkably low thanks to compile-time traits, which I think are 10-100x faster than run-time introspection in interpreted languages. You won't see any problems there; this is where you'll see D at its most powerful.

Well, this is interesting; going to check it out. I'm wondering why vibe.d's JSON library hasn't been pushed into Phobos. std.json gave me the impression that D is currently not good at handling JSON.
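The compile-time approach mentioned above lives in vibe.data.json, which generates the (de)serialization code per type from the struct's fields. A minimal sketch (the struct is an invented example):

```d
import vibe.data.json : serializeToJson, deserializeJson, Json;

struct Request
{
    string user;
    int count;
}

void demo()
{
    // The serialization code is generated at compile time via
    // introspection of Request's fields; no run-time reflection.
    Json j = serializeToJson(Request("alice", 3));
    auto r = deserializeJson!Request(j);
    assert(r.user == "alice" && r.count == 3);
}
```

Because the field access is resolved at compile time, the optimizer sees straight-line code rather than dictionary lookups, which is where the speedup over interpreted-language introspection comes from.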

  4. How do we utilize multiple cores in this setup? Multiple processes?

The server automatically scales to all processors: many vibe.d worker tasks are started with libevent and listen on the same socket, so your handlers run on every available processor. D's druntime keeps variables thread-local by default, so multi-core scaling comes essentially for free. The bulk of the improvement, however, will come from how you use tasks, so don't be afraid to start new vibe.d tasks liberally if you insist on never blocking; they're much faster than threads while offering the same benefits.

So I only need one vibe.d instance per server and it will automatically use all the computing power? That's cool!
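As a sketch of what that looks like in code: HTTPServerOption.distribute was vibe.d's switch for letting all worker threads accept on the same socket (hedge: exact option names have varied across vibe.d versions):

```d
import vibe.d;

shared static this()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    // Let every worker thread accept on the same listening socket,
    // so request handlers run on all available cores.
    settings.options |= HTTPServerOption.distribute;
    listenHTTP(settings, (req, res) {
        res.writeBody("hello");
    });
}
```

Since handlers may then run on any thread, shared mutable state must be explicitly `shared` and synchronized; thread-local state works unchanged.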

  5. How does vibe.d handle network-related errors (connection breaks, etc.)?

The task throws with the error number (usually this happens in a libevent callback), the connection is dropped, and cleanup happens automatically. The other connections continue as if nothing had happened. Every error path is covered up to your application code, and you can log it or not depending on the log level set with setLogLevel.
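In practice that means per-task error handling looks like ordinary exception handling. A sketch (host and payload are invented for the example):

```d
import vibe.core.core : runTask;
import vibe.core.log : logError;
import vibe.core.net : connectTCP;

void talkToUpstream()
{
    runTask({
        try
        {
            auto conn = connectTCP("upstream.example", 6379); // hypothetical host
            conn.write(cast(const(ubyte)[]) "PING\r\n");
            // ... read and handle the reply ...
        }
        catch (Exception e)
        {
            // A broken connection surfaces here as an exception in this
            // task only; other tasks and connections are unaffected.
            logError("upstream error: %s", e.msg);
        }
    });
}
```

An uncaught exception terminates just the one task, so a catch block like this is only needed when you want to recover or log with context.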

Finally, comparing it to Java's bytecode, I'd bet that machine-code and compile-time optimizations put D in a much better place than even C, so there's definitely a head start. The rest depends, of course, on the quality of your code and your ability to keep the data near the computation with the given tools - and D has those tools ;)

Thanks for the detailed answer; vibe.d looks promising. I will investigate it further, and hopefully it will make its way into our dev environment. :-)