RejectedSoftware Forums


Does HttpClient support the use case of broadcasting messages to multiple remote servers?

Our use case scenario is like this:

  1. We have a server that receives browser requests.

  2. For each browser request, we have to send it to multiple remote servers (about 10~20 of them) using HttpClient.

  3. When all of the remote servers have sent a reply back, or a timeout occurs, we need to put together the replies we have so far and do some computation on them.

  4. A response is generated by the computation and sent back to the browser.

In step 3, if the timeout occurs before all replies have arrived, those that are still outstanding should be ignored, or better, aborted.

The remote servers are fairly stable, so we'd like to make an HttpClient for each of them.

For each browser request, all HttpClients are used to send a client request.

But looking at the HttpClient sample code:

auto res = requestHttp("http://google.com", (req){/* could e.g. add headers here before sending*/});
logInfo("Response: %s", res.bodyReader.readAllUtf8());

I have no idea how I could write the logic of 'whoever comes first will be served, and those arriving after the timeout will be aborted'.

Could you shed some light on that, Sönke?

Thanks :-)

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Fri, 04 Jan 2013 13:20:45 GMT, zhaopuming wrote:

Our use case scenario is like this:

  1. We have a server that receives browser requests.

  2. For each browser request, we have to send it to multiple remote servers (about 10~20 of them) using HttpClient.

  3. When all of the remote servers have sent a reply back, or a timeout occurs, we need to put together the replies we have so far and do some computation on them.

  4. A response is generated by the computation and sent back to the browser.

In step 3, if the timeout occurs before all replies have arrived, those that are still outstanding should be ignored, or better, aborted.

The remote servers are fairly stable, so we'd like to make an HttpClient for each of them.

For each browser request, all HttpClients are used to send a client request.

But looking at the HttpClient sample code:

auto res = requestHttp("http://google.com", (req){/* could e.g. add headers here before sending*/});
logInfo("Response: %s", res.bodyReader.readAllUtf8());

I have no idea how I could write the logic of 'whoever comes first will be served, and those arriving after the timeout will be aborted'.

Could you shed some light on that, Sönke?

Thanks :-)

One way to achieve this would be to use a timer and a signal. The requests are all run in parallel in different tasks/fibers using runTask, and the original task waits until either the timeout is reached or all requests have been answered. The only drawback is that all the late requests will continue to run after the timeout - it shouldn't have any serious impact on anything though; their connections will time out eventually, or the request simply finishes a bit later.

It's planned for later to have more facilities for controlling tasks (e.g. waiting for their end or terminating them) and also to have a generic broadcast class that could be made general enough to handle this situation. I think such a broadcast class is quite important, because generally it shouldn't be necessary to work with such low-level details as rawYield. But I cannot really say when I'll be able to get that done...

Btw. using requestHttp() will keep a connection open to each server automatically using a ConnectionPool internally. So there is no need to explicitly store HttpClient instances.

import vibe.d;

string[] servers;

void handleRequest(HttpServerRequest req, HttpServerResponse res)
{
	// if the body is JSON and JSON parsing is enabled
	// in the HttpServerSettings, this will need to get
	// req.json.toString() instead.
	auto body = req.bodyReader.readAll();

	// run all broadcast requests as separate tasks
	string[string] replies;
	auto tm = setTimer(dur!"seconds"(10), null);
	auto sig = createSignal();
	foreach( srv; servers )
		runTask({
			auto cres = requestHttp("http://"~srv~"/url", (creq){
				creq.method = req.method;
				creq.bodyWriter.write(body);
			});
			replies[srv] = cres.bodyReader.readAllUtf8();

			// wake up the original fiber
			sig.emit();
		});

	// yield until either the timeout is reached or all replies are collected.
	// the timer and the signal will both cause rawYield() to continue.
	tm.acquire();
	while( tm.pending && replies.length < servers.length )
		rawYield();
	tm.release();
	
	// save the current replies (other requests might still come in later)
	auto saved_replies = replies;
	replies = null;

	// do something with saved_replies and respond to the original request...
}

Please bear with me, I haven't tested the code, so it may very well contain some mistakes.

Regards

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Fri, 04 Jan 2013 17:39:54 GMT, Sönke Ludwig wrote:

One way to achieve this would be to use a timer and a signal. The requests are all run in parallel in different tasks/fibers using runTask, and the original task waits until either the timeout is reached or all requests have been answered. The only drawback is that all the late requests will continue to run after the timeout - it shouldn't have any serious impact on anything though; their connections will time out eventually, or the request simply finishes a bit later.

Thanks :-) The timer and signal mechanism is exactly what we were expecting.

But there is still a question: we have to use keep-alive connections, and even HTTP pipelining, for better performance. In this scenario, the replies still outstanding after the timeout will affect later requests, because on a keep-alive, pipelined connection a new request won't be sent until the last response has been received (otherwise the ordering of HTTP requests/responses gets mixed up).

So if a huge number of incoming requests fills up all connections in the connection pool, and some of them time out, then new requests would have to wait for those timed-out replies to come back. That is unnecessary and wastes their own precious time, because their timers are already running! Is there a way to abort an HttpClient request without closing the current connection and without affecting the connection pool's ability to immediately handle new incoming requests?

One way would be to make the connection pool more elastic to handle this: it could keep a timer on each connection, and when a connection times out, it would not receive new requests for the time being; the request would go to an 'available' connection instead. If there is no 'available' connection, which means the current connections can't handle the traffic, the pool could create new connections and/or abort the oldest ones (they have already timed out, so the current requests on those connections are useless).

I know this seems like a very strange requirement, but our first priority is throughput and quick response; all timed-out requests should be thrown away.

It's planned for later to have more facilities for controlling tasks (e.g. waiting for their end or terminating them) and also to have a generic broadcast class that could be made general enough to handle this situation. I think such a broadcast class is quite important, because generally it shouldn't be necessary to work with such low-level details as rawYield. But I cannot really say when I'll be able to get that done...

I don't quite understand emit() and rawYield() yet, I'll try the code later :-)

Btw. using requestHttp() will keep a connection open to each server automatically using a ConnectionPool internally. So there is no need to explicitly store HttpClient instances.

I have a question: each time requestHttp() is called for the same server, will it create a new ConnectionPool? Or does it somehow manage to reuse the ConnectionPool created the first time requestHttp() was called for this server?

import vibe.d;

string[] servers;

void handleRequest(HttpServerRequest req, HttpServerResponse res)
{
	// if the body is JSON and JSON parsing is enabled
	// in the HttpServerSettings, this will need to get
	// req.json.toString() instead.
	auto body = req.bodyReader.readAll();

	// run all broadcast requests as separate tasks
	string[string] replies;
	auto tm = setTimer(dur!"seconds"(10), null);
	auto sig = createSignal();
	foreach( srv; servers )
		runTask({
			auto cres = requestHttp("http://"~srv~"/url", (creq){
				creq.method = req.method;
				creq.bodyWriter.write(body);
			});
			replies[srv] = cres.bodyReader.readAllUtf8();

			// wake up the original fiber
			sig.emit();
		});

	// yield until either the timeout is reached or all replies are collected.
	// the timer and the signal will both cause rawYield() to continue.
	tm.acquire();
	while( tm.pending && replies.length < servers.length )
		rawYield();
	tm.release();
	
	// save the current replies (other requests might still come in later)
	auto saved_replies = replies;
	replies = null;

	// do something with saved_replies and respond to the original request...
}

Please bear with me, I haven't tested the code, so it may very well contain some mistakes.

Regards

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Fri, 04 Jan 2013 17:39:54 GMT, Sönke Ludwig wrote:

One way to achieve this would be to use a timer and a signal. The requests are all run in parallel in different tasks/fibers using runTask, and the original task waits until either the timeout is reached or all requests have been answered. The only drawback is that all the late requests will continue to run after the timeout - it shouldn't have any serious impact on anything though; their connections will time out eventually, or the request simply finishes a bit later.

A little background: we are currently using vert.x for our project, but after the first online traffic tests, we found that its client code does not satisfy our requirements.

We are considering writing a new HttpClient.

I personally find that vibe.d is more elegant (and theoretically more efficient) than vert.x/netty, so I'd like to evaluate it more and hopefully switch to vibe.d when it becomes more stable and we gain a better understanding of it.

I'll have to persuade my colleagues to switch to D though, which is not an easy task (because we are in the Java world).

Best Regards

Puming

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Sat, 05 Jan 2013 04:00:41 GMT, zhaopuming wrote:

On Fri, 04 Jan 2013 17:39:54 GMT, Sönke Ludwig wrote:

One way to achieve this would be to use a timer and a signal. The requests are all run in parallel in different tasks/fibers using runTask, and the original task waits until either the timeout is reached or all requests have been answered. The only drawback is that all the late requests will continue to run after the timeout - it shouldn't have any serious impact on anything though; their connections will time out eventually, or the request simply finishes a bit later.

Thanks :-) The timer and signal mechanism is exactly what we were expecting.

But there is still a question: we have to use keep-alive connections, and even HTTP pipelining, for better performance. In this scenario, the replies still outstanding after the timeout will affect later requests, because on a keep-alive, pipelined connection a new request won't be sent until the last response has been received (otherwise the ordering of HTTP requests/responses gets mixed up).

So if a huge number of incoming requests fills up all connections in the connection pool, and some of them time out, then new requests would have to wait for those timed-out replies to come back. That is unnecessary and wastes their own precious time, because their timers are already running! Is there a way to abort an HttpClient request without closing the current connection and without affecting the connection pool's ability to immediately handle new incoming requests?

One way would be to make the connection pool more elastic to handle this: it could keep a timer on each connection, and when a connection times out, it would not receive new requests for the time being; the request would go to an 'available' connection instead. If there is no 'available' connection, which means the current connections can't handle the traffic, the pool could create new connections and/or abort the oldest ones (they have already timed out, so the current requests on those connections are useless).

I know this seems like a very strange requirement, but our first priority is throughput and quick response; all timed-out requests should be thrown away.

The connection pool will automatically create another connection to the same server if there are no existing unused connections left. So without terminating them, the timed-out requests will stack up a number of connections, but they won't directly influence later requests. For termination, I'll first have to add a new function that provides access to TcpConnection.close in some way.

It's planned for later to have more facilities for controlling tasks (e.g. waiting for their end or terminating them) and also to have a generic broadcast class that could be made general enough to handle this situation. I think such a broadcast class is quite important, because generally it shouldn't be necessary to work with such low-level details as rawYield. But I cannot really say when I'll be able to get that done...

I don't quite understand emit() and rawYield() yet, I'll try the code later :-)

Now that you say it, there is a bug in that code: the signal also needs to be acquired/released around the while loop, just like the timer.

But basically, the Signal.emit() call will cause every fiber that has currently acquired the signal to continue execution if it is stuck in a rawYield() (which goes down to Fiber.yield()), so this is basically just a fiber-based condition variable. The same goes for the timer when it fires.
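
To make that concrete, here is how the corrected wait loop from my example would look - again untested, so take it as a sketch:

tm.acquire();
sig.acquire();
// loop until either the timer has fired or all replies are collected;
// both the timer and sig.emit() wake this fiber up from rawYield()
while( tm.pending && replies.length < servers.length )
	rawYield();
sig.release();
tm.release();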

Btw. using requestHttp() will keep a connection open to each server automatically using a ConnectionPool internally. So there is no need to explicitly store HttpClient instances.

I have a question: each time requestHttp() is called for the same server, will it create a new ConnectionPool? Or does it somehow manage to reuse the ConnectionPool created the first time requestHttp() was called for this server?

There is basically one static connection pool per thread. Each time a request is made, the pool is first searched for an existing (keep-alive) connection to that server. If one is found, it will be used for the request. Otherwise a new connection is made and put into the pool after the request. So effectively there will always be n active connections for n concurrent requests.
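
Schematically, the lookup works like this - a simplified sketch with made-up names, not the actual ConnectionPool code:

import std.string : format;

// stand-in for the real client type, just for this illustration
class HttpClient { void connect(string host, ushort port) { /* ... */ } }

// module-level variables are thread-local in D, so this automatically
// yields one pool per thread; it holds idle keep-alive clients by "host:port"
HttpClient[][string] s_freeClients;

HttpClient lockConnection(string host, ushort port)
{
	auto key = format("%s:%s", host, port);
	if( auto pcl = key in s_freeClients ){
		if( (*pcl).length > 0 ){
			auto cli = (*pcl)[$-1]; // reuse an existing keep-alive connection
			*pcl = (*pcl)[0 .. $-1];
			return cli;
		}
	}
	auto cli = new HttpClient; // no idle connection left -> open a new one
	cli.connect(host, port);
	return cli;
}

// called after a request has finished with the connection still alive
void releaseConnection(string host, ushort port, HttpClient cli)
{
	s_freeClients[format("%s:%s", host, port)] ~= cli;
}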

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Sat, 05 Jan 2013 04:06:56 GMT, zhaopuming wrote:

On Fri, 04 Jan 2013 17:39:54 GMT, Sönke Ludwig wrote:

One way to achieve this would be to use a timer and a signal. The requests are all run in parallel in different tasks/fibers using runTask, and the original task waits until either the timeout is reached or all requests have been answered. The only drawback is that all the late requests will continue to run after the timeout - it shouldn't have any serious impact on anything though; their connections will time out eventually, or the request simply finishes a bit later.

A little background: we are currently using vert.x for our project, but after the first online traffic tests, we found that its client code does not satisfy our requirements.

We are considering writing a new HttpClient.

I personally find that vibe.d is more elegant (and theoretically more efficient) than vert.x/netty, so I'd like to evaluate it more and hopefully switch to vibe.d when it becomes more stable and we gain a better understanding of it.

I'll have to persuade my colleagues to switch to D though, which is not an easy task (because we are in the Java world).

Best Regards

Puming

Sounds like, language-wise, it should be such a relief for everyone to suddenly have all those possibilities in D :D (I never could get over the fact that there is no operator overloading in Java). But I know that a lot of convincing is necessary to get most people to switch languages, especially since they may not even recognize the benefits, as they first have to learn a lot of new concepts. So naturally most people will first concentrate on every drawback they can find to defend the current language/process/situation. But regarding D, it's also still important to carefully evaluate the tooling/platform issues depending on the type of application (iOS/Android, Win64/Win8, GC, etc.).

I definitely agree in terms of the stability of vibe.d - although in general it's a very stable experience now, there are at least two issues that only crop up in very high-load scenarios and definitely have to be fixed before it can really be considered production quality (plus a thorough test and benchmark of the whole system has to be done at some point). Well, I'm going to use it for more serious things soon, but someone has to go first, and in case of an emergency I can commit a hot fix in a matter of minutes/hours, which will not necessarily be the case for someone else. Anyway, I try to gradually increase the risk as time goes by. The vibe.d site and everything around it has run pretty flawlessly for the last few months with increasing traffic, so I think it's ready to be taken to the next step now.

Regards,
Sönke

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

A little background: we are currently using vert.x for our project, but after the first online traffic tests, we found that its client code does not satisfy our requirements.

Btw. is it the case that for client code vert.x drops back to threads + synchronous I/O?

I just briefly read a bit about it, and it seems like it uses asynchronous I/O only for handling requests and provides thread pools with traditional I/O for everything on top of that.

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Sat, 05 Jan 2013 08:14:20 GMT, Sönke Ludwig wrote:

On Sat, 05 Jan 2013 04:00:41 GMT, zhaopuming wrote:

On Fri, 04 Jan 2013 17:39:54 GMT, Sönke Ludwig wrote:

One way to achieve this would be to use a timer and a signal. The requests are all run in parallel in different tasks/fibers using runTask, and the original task waits until either the timeout is reached or all requests have been answered. The only drawback is that all the late requests will continue to run after the timeout - it shouldn't have any serious impact on anything though; their connections will time out eventually, or the request simply finishes a bit later.

Thanks :-) The timer and signal mechanism is exactly what we were expecting.

But there is still a question: we have to use keep-alive connections, and even HTTP pipelining, for better performance. In this scenario, the replies still outstanding after the timeout will affect later requests, because on a keep-alive, pipelined connection a new request won't be sent until the last response has been received (otherwise the ordering of HTTP requests/responses gets mixed up).

So if a huge number of incoming requests fills up all connections in the connection pool, and some of them time out, then new requests would have to wait for those timed-out replies to come back. That is unnecessary and wastes their own precious time, because their timers are already running! Is there a way to abort an HttpClient request without closing the current connection and without affecting the connection pool's ability to immediately handle new incoming requests?

One way would be to make the connection pool more elastic to handle this: it could keep a timer on each connection, and when a connection times out, it would not receive new requests for the time being; the request would go to an 'available' connection instead. If there is no 'available' connection, which means the current connections can't handle the traffic, the pool could create new connections and/or abort the oldest ones (they have already timed out, so the current requests on those connections are useless).

I know this seems like a very strange requirement, but our first priority is throughput and quick response; all timed-out requests should be thrown away.

The connection pool will automatically create another connection to the same server if there are no existing unused connections left. So without terminating them, the timed-out requests will stack up a number of connections, but they won't directly influence later requests. For termination, I'll first have to add a new function that provides access to TcpConnection.close in some way.

What do you mean by 'unused connections'? How do you support HTTP pipelining?

In my understanding, and in vert.x, when using HTTP pipelining a connection will write out a request and immediately become available to receive new requests, without waiting for the response. In that case, a new request may be put onto a connection that has already timed out and will then have to wait until the earlier requests have been handled by the remote server. So there should be a 'request sent, awaiting reply' queue in each connection; in our current server this queue sometimes piles up, perhaps due to network congestion. We'd like the connection pool to handle that situation, i.e. when one connection is holding many awaiting requests, it should not receive new requests.

It's planned for later to have more facilities for controlling tasks (e.g. waiting for their end or terminating them) and also to have a generic broadcast class that could be made general enough to handle this situation. I think such a broadcast class is quite important, because generally it shouldn't be necessary to work with such low-level details as rawYield. But I cannot really say when I'll be able to get that done...

I don't quite understand emit() and rawYield() yet, I'll try the code later :-)

Now that you say it, there is a bug in that code: the signal also needs to be acquired/released around the while loop, just like the timer.

But basically, the Signal.emit() call will cause every fiber that has currently acquired the signal to continue execution if it is stuck in a rawYield() (which goes down to Fiber.yield()), so this is basically just a fiber-based condition variable. The same goes for the timer when it fires.

Btw. using requestHttp() will keep a connection open to each server automatically using a ConnectionPool internally. So there is no need to explicitly store HttpClient instances.

I have a question: each time requestHttp() is called for the same server, will it create a new ConnectionPool? Or does it somehow manage to reuse the ConnectionPool created the first time requestHttp() was called for this server?

There is basically one static connection pool per thread. Each time a request is made, the pool is first searched for an existing (keep-alive) connection to that server. If one is found, it will be used for the request. Otherwise a new connection is made and put into the pool after the request. So effectively there will always be n active connections for n concurrent requests.

OK. Now I understand it :-) Thanks

Best Regards

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Sat, 05 Jan 2013 08:42:09 GMT, Sönke Ludwig wrote:

On Sat, 05 Jan 2013 04:06:56 GMT, zhaopuming wrote:

On Fri, 04 Jan 2013 17:39:54 GMT, Sönke Ludwig wrote:

One way to achieve this would be to use a timer and a signal. The requests are all run in parallel in different tasks/fibers using runTask, and the original task waits until either the timeout is reached or all requests have been answered. The only drawback is that all the late requests will continue to run after the timeout - it shouldn't have any serious impact on anything though; their connections will time out eventually, or the request simply finishes a bit later.

A little background: we are currently using vert.x for our project, but after the first online traffic tests, we found that its client code does not satisfy our requirements.

We are considering writing a new HttpClient.

I personally find that vibe.d is more elegant (and theoretically more efficient) than vert.x/netty, so I'd like to evaluate it more and hopefully switch to vibe.d when it becomes more stable and we gain a better understanding of it.

I'll have to persuade my colleagues to switch to D though, which is not an easy task (because we are in the Java world).

Best Regards

Puming

Sounds like, language-wise, it should be such a relief for everyone to suddenly have all those possibilities in D :D (I never could get over the fact that there is no operator overloading in Java). But I know that a lot of convincing is necessary to get most people to switch languages, especially since they may not even recognize the benefits, as they first have to learn a lot of new concepts. So naturally most people will first concentrate on every drawback they can find to defend the current language/process/situation. But regarding D, it's also still important to carefully evaluate the tooling/platform issues depending on the type of application (iOS/Android, Win64/Win8, GC, etc.).

Yes, there are a lot of issues to be sorted out before I can actually sneak D and vibe.d into our company.

I'll have to learn a lot of the details in order to convince my colleagues; it'll take time.

Meanwhile I'll keep a close eye on your work and learn :-)

I definitely agree in terms of the stability of vibe.d - although in general it's a very stable experience now, there are at least two issues that only crop up in very high-load scenarios and definitely have to be fixed before it can really be considered production quality (plus a thorough test and benchmark of the whole system has to be done at some point). Well, I'm going to use it for more serious things soon, but someone has to go first, and in case of an emergency I can commit a hot fix in a matter of minutes/hours, which will not necessarily be the case for someone else. Anyway, I try to gradually increase the risk as time goes by. The vibe.d site and everything around it has run pretty flawlessly for the last few months with increasing traffic, so I think it's ready to be taken to the next step now.

Keep up the good work :-) You've got a good start.

Regards,
Sönke

Re: Does HttpClient support the use case of broadcasting messages to multiple remote servers?

On Sat, 05 Jan 2013 09:01:26 GMT, Sönke Ludwig wrote:

A little background: we are currently using vert.x for our project, but after the first online traffic tests, we found that its client code does not satisfy our requirements.

Btw. is it the case that for client code vert.x drops back to threads + synchronous I/O?

I just briefly read a bit about it, and it seems like it uses asynchronous I/O only for handling requests and provides thread pools with traditional I/O for everything on top of that.

The HttpClient in vert.x also uses asynchronous I/O.

Each HttpClient is associated with a connection pool, and each request will try to use an available connection. If none is available, it will try to create a new connection; if the max pool size has been reached, the request is stored in a waiter queue.

Each connection has a request queue (for pipelining) that stores requests that have already been written but whose responses have not yet arrived. Once a request is written, the connection is considered available and will accept new requests.

When a response is sent back, an asynchronous callback is invoked, and the request at the front of the connection's queue is removed.
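
To illustrate what I mean, here is a rough model in D (my own sketch of the behavior, not actual vert.x code):

// rough model of one pipelined vert.x client connection
alias ResponseHandler = void delegate(string responseBody);

struct PipelinedConnection
{
	// requests already written to the socket whose responses are still outstanding
	ResponseHandler[] inflight;

	void sendRequest(string request, ResponseHandler onResponse)
	{
		// ... write `request` to the socket here ...
		inflight ~= onResponse;
		// the connection counts as available again immediately,
		// without waiting for the response
	}

	// responses arrive in request order, so each one belongs to
	// the handler at the front of the queue
	void onResponseReceived(string responseBody)
	{
		auto handler = inflight[0];
		inflight = inflight[1 .. $];
		handler(responseBody);
	}

	// a pool could use this to keep new requests away from
	// congested connections
	bool available(size_t maxInflight) const
	{
		return inflight.length < maxInflight;
	}
}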
