Are the client request components of the framework suited to writing a web crawler/scraper?
On 28.03.2015 at 16:59, RASOLOFONIAINA Menjanahary Razafindranto wrote:
Are the client request components of the framework suited to writing a web crawler/scraper?
You could certainly do that. vibe.http.client.requestHTTP uses a connection pool internally, so existing keep-alive connections to a server are reused and this kind of access pattern should be efficient. Running multiple parallel requests using runTask is also not a problem. The only thing you'd potentially have to write on your own is throttling logic to avoid using up too many server resources.
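A minimal sketch of what such throttling could look like, assuming a vibe.d version in which LocalTaskSemaphore from vibe.core.sync is available. The names fetchPage and crawlThrottled, and the worker delegate, are made up for illustration and are not part of vibe.d.

```d
import vibe.core.core : runTask;
import vibe.core.log : logInfo, logWarn;
import vibe.core.sync : LocalTaskSemaphore;
import vibe.core.task : Task;
import vibe.http.client : requestHTTP;

/// Hypothetical worker: issues one GET request and logs the status code.
/// requestHTTP takes its connection from the internal pool.
void fetchPage(string url)
{
    requestHTTP(url,
        (scope req) { /* defaults to a plain GET request */ },
        (scope res) { logInfo("%s: status %s", url, res.statusCode); });
}

/// Hypothetical helper: runs `worker` once per URL, each in its own task,
/// with at most `maxParallel` tasks in flight at any time.
void crawlThrottled(string[] urls, uint maxParallel, void delegate(string) worker)
{
    auto slots = new LocalTaskSemaphore(maxParallel);
    Task[] tasks;
    foreach (url; urls) {
        slots.lock(); // waits here until one of the slots is free
        tasks ~= runTask((string u) {
            try {
                scope (exit) slots.unlock(); // always give the slot back
                worker(u);
            } catch (Exception e) {
                logWarn("%s failed: %s", u, e.msg);
            }
        }, url);
    }
    foreach (t; tasks) t.join(); // wait for the remaining tasks
}
```

Something like crawlThrottled(urls, 4, (string u) { fetchPage(u); }) would be called from inside a task while the event loop is running. A fixed number of slots is only one possible policy; a per-host delay might be kinder to the servers being crawled.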
On Sat, 28 Mar 2015 19:05:32 +0100, Sönke Ludwig wrote:
You could certainly do that. [...]
Wow, that's promising!
I have delved into the module documentation.
Vibe.d doesn't seem to provide anything for DOM manipulation and node selection. Do you have any suggestions, please?
On Sun, 05 Apr 2015 09:21:05 GMT, RASOLOFONIAINA Menjanahary Razafindranto wrote:
[...] Vibe.d doesn't seem to provide anything for DOM manipulation and node selection. Do you have any suggestions, please?
The options I know of are the DOM module in the arsd collection and the recently announced htmld package.
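For the arsd option, a rough sketch of node selection could look like this. It assumes the arsd dom module is available to the build as arsd.dom; extractLinks is a hypothetical helper name, and the HTML string would come from a response body read via requestHTTP.

```d
import arsd.dom : Document;

/// Hypothetical helper: returns the href value of every link in a page.
string[] extractLinks(string html)
{
    auto document = new Document();
    // parseGarbage is the lenient parser, meant for real-world markup
    // that is not necessarily well-formed.
    document.parseGarbage(html);

    string[] links;
    // CSS-style selector: all <a> elements that carry an href attribute
    foreach (a; document.querySelectorAll("a[href]"))
        links ~= a.getAttribute("href");
    return links;
}
```

The returned links are exactly as written in the page, so relative URLs would still need to be resolved against the page address before being queued for crawling.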
On Sat, 28 Mar 2015 19:05:32 +0100, Sönke Ludwig wrote:
[...] The only thing you'd potentially have to write on your own is throttling logic to avoid using up too many server resources.
I did this a few weeks ago, to the extent that I brought down the API server, so it's quite fast :)
Roughly, I used this pattern: https://gist.github.com/yannick/98c94cb6530d8aabd420
There might be a better approach, though.
All in all it was a smooth ride, with two main problems as I remember:
- I had to use requestHTTP with delegates, because under high concurrency I got weird errors from bodyReader when using the returned HTTPClientResponse (probably too late).
- A memory leak (which in the end I did not fix).
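To illustrate the first point, here is a small sketch of the delegate form of requestHTTP, where the body is read completely inside the response callback so that nothing touches bodyReader afterwards. This is not the code from the gist; Page and fetchBody are hypothetical names used only for this example.

```d
import vibe.http.client : requestHTTP;
import vibe.stream.operations : readAllUTF8;

/// Hypothetical result type: what the crawler keeps from one response.
struct Page
{
    int status;
    string html;
}

/// Hypothetical helper: fetches `url` and copies out everything needed
/// while the response object is still valid.
Page fetchBody(string url)
{
    Page page;
    requestHTTP(url,
        (scope req) { /* defaults to a plain GET request */ },
        (scope res) {
            // The response is only used inside this callback.
            page.status = res.statusCode;
            page.html = res.bodyReader.readAllUTF8();
        });
    return page;
}
```

The trade-off is that the whole body is buffered in memory as a string, which is fine for ordinary HTML pages but may not be what you want for very large responses.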
On Mon, 13 Apr 2015 20:35:17 GMT, yawniek wrote:
[...] Roughly, I used this pattern: https://gist.github.com/yannick/98c94cb6530d8aabd420 [...]
I appreciate your gist. Thanks for the input.