HTTP request question for an arango driver

YannLesage
Hello,


I am writing a driver for ArangoDB. As indicated in the ArangoDB
documentation, I use the HTTP API. The repo is
https://github.com/Valtena/Pharango


Now, the problem: Arango through ZnClient handles around 1,000 requests/second.


And the question: are there any recommended practices for getting better
performance out of ZnClient, or a better way to perform a lot of HTTP requests?


Thanks for your attention,

Yann Lesage




Re: HTTP request question for an arango driver

NorbertHartl
Hi,

> On 08.12.2017 at 15:32, Yann Lesage <[hidden email]> wrote:
>
> Hello,
>
> I am writing a driver for ArangoDB. As indicated in the ArangoDB documentation, I use the HTTP API. The repo is https://github.com/Valtena/Pharango
>
That is really wonderful. I have been planning to do that for a long time but have not found the time yet. Maybe I can find some time to help a little, if you are interested.

> Now, the problem: Arango through ZnClient handles around 1,000 requests/second.
>
> And the question: are there any recommended practices for getting better performance out of ZnClient, or a better way to perform a lot of HTTP requests?
>
I think the biggest benefit comes from caching the ZnClient. It is designed to be used like that. Next could be using keep-alive, to reduce the number of connections that have to be made. Finally, I could imagine that pooling the heaviest objects in Zinc could take some stress off the garbage collector.
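
To put numbers on the effect of caching, a comparison along these lines could help. It is only a sketch, not taken from Pharango: it assumes an ArangoDB server reachable without TLS or authentication on localhost:8529, and it uses /_api/version as a cheap request. The URL and the variable names are assumptions to adjust to your setup.

```smalltalk
"Sketch: compare one-client-per-request with a single cached client.
 Assumed setup: ArangoDB on localhost:8529, no TLS, no authentication."
| url oneShot cached client |
url := 'http://localhost:8529/_api/version'.

"A fresh client per request; beOneShot closes the connection after each one."
oneShot := [ ZnClient new beOneShot; get: url ] benchFor: 5 seconds.

"One cached client; the HTTP/1.1 connection is kept open and reused."
client := ZnClient new.
cached := [ client get: url ] benchFor: 5 seconds.
client close.

"Inspect or print this array to see both benchmark results."
{ oneShot. cached }
```

The gap between the two results gives a rough idea of what connection setup costs on your machine.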

Norbert


Re: HTTP request question for an arango driver

Stephane Ducasse-3
In reply to this post by YannLesage
Yann
I do not think that connecting to a database via an HTTP client can be as
fast as with FFI or other means.
This is mainly why databases like GemStone are superior for accessing a
lot of data.
Stef



Re: HTTP request question for an arango driver

YannLesage
On 08/12/2017 at 17:13, Norbert Hartl wrote:

> That is really wonderful. I have been planning to do that for a long time but have not found the time yet. Maybe I can find some time to help a little, if you are interested.
I saw that on the Pharo Discord. I have not yet finished a first
version or written the roadmap and contributor.md, so it may be a little
early for that. But I am not against a code review in the future, or other help.

>> Now, the problem: Arango through ZnClient handles around 1,000 requests/second.
>>
>> And the question: are there any recommended practices for getting better performance out of ZnClient, or a better way to perform a lot of HTTP requests?
>>
> I think the biggest benefit comes from caching the ZnClient. It is designed to be used like that. Next could be using keep-alive, to reduce the number of connections that have to be made. Finally, I could imagine that pooling the heaviest objects in Zinc could take some stress off the garbage collector.
I agree with this. I already cache the ZnClient; without that, I get no
more than 400 or 500 requests/s. And keep-alive is the default
behavior in both Arango and ZnClient.


On 08/12/2017 at 22:14, Stephane Ducasse wrote:
> Yann
> I do not think that connecting to a database via an HTTP client can be as
> fast as with FFI or other means.
> This is mainly why databases like GemStone are superior for accessing a
> lot of data.
> Stef

I know that, but I would like to get the best performance I can. Beyond
that, users can group requests to limit the cost of the HTTP protocol, so
it is not a lost cause. I am thinking of trying the arango shell in the
future. First, I will finish the driver with the HTTP client, because it is
easier to write and more flexible when used with a remote server.

Yann Lesage


Re: HTTP request question for an arango driver

Stephane Ducasse-3
Super Yann! Keep pushing.
Let us know. It is important to announce it, so that other people can
join forces.




Re: HTTP request question for an arango driver

NorbertHartl
In reply to this post by Stephane Ducasse-3


> On 08.12.2017 at 21:14, Stephane Ducasse <[hidden email]> wrote:
>
> Yann
> I do not think that connecting to a database via an HTTP client can be as
> fast as with FFI or other means.
> This is mainly why databases like GemStone are superior for accessing a
> lot of data.
> Stef
>
Do you have any numbers to back this up? We are not talking about a local database. The usual deployment case is that the database is remote and clustered, so you need to connect to one or more hosts at the same time. Via FFI you would trigger native code that makes network calls to the database. The main network handling part is in the SocketPlugin, so I'm not sure how much faster using FFI would be. IIRC an FFI call is still a stop-the-world action, right? That has some drawbacks, too. To get the maximum out of it you would need to use callbacks (do they work with UFFI?) and write a lot of stuff in C. And debugging is hell.
I think that one of the things that makes GemStone fast is that the in-memory format is the same as the disk format, so there is no need to serialize/materialize. That is one of the high costs of using a database. And this you would still do in Pharo, right? So I would be really interested in some numbers for a use case like this.

Norbert



Re: HTTP request question for an arango driver

Sven Van Caekenberghe-2
In reply to this post by YannLesage
Hi Yann,

Zinc HTTP Components can do 1000s of requests per second, to localhost (so excluding a real network) and using a single ZnClient instance with a reused connection (HTTP/1.1's default). Of course, data size is also a factor, I am talking about small requests/responses.

I browsed your code a bit on GitHub. You do reuse an instance, so that is good. But I think you are using HTTPS (TLS), which is a real slowdown (encryption is native, but costs real resources). Also, your data payload is using JSON which also adds a cost (parsing, generating).

So what you measured sounds about right. You might be able to optimise a bit, but that won't give you a factor 10 improvement, IMHO. The trick is usually to make as few requests as possible.

Sven




Re: HTTP request question for an arango driver

YannLesage


On 09/12/2017 at 15:52, Sven Van Caekenberghe wrote:
> Hi Yann,
>
> Zinc HTTP Components can do 1000s of requests per second, to localhost (so excluding a real network) and using a single ZnClient instance with a reused connection (HTTP/1.1's default). Of course, data size is also a factor, I am talking about small requests/responses.
>
> I browsed your code a bit on GitHub. You do reuse an instance, so that is good. But I think you are using HTTPS (TLS), which is a real slowdown (encryption is native, but costs real resources). Also, your data payload is using JSON which also adds a cost (parsing, generating).
Thanks for this review.
>
> So what you measured sounds about right. You might be able to optimise a bit, but that won't give you a factor 10 improvement, IMHO. The trick is usually to make as few requests as possible.
A factor of 10 at most? OK, that indication helps me.

As for the trick, I know it. But it has to be done by the user, doesn't it?




Re: HTTP request question for an arango driver

Sven Van Caekenberghe-2


> On 9 Dec 2017, at 18:34, Yann Lesage <[hidden email]> wrote:
>
>
>
> On 09/12/2017 at 15:52, Sven Van Caekenberghe wrote:
>> Hi Yann,
>>
>> Zinc HTTP Components can do 1000s of requests per second, to localhost (so excluding a real network) and using a single ZnClient instance with a reused connection (HTTP/1.1's default). Of course, data size is also a factor, I am talking about small requests/responses.
>>
>> I browsed your code a bit on GitHub. You do reuse an instance, so that is good. But I think you are using HTTPS (TLS), which is a real slowdown (encryption is native, but costs real resources). Also, your data payload is using JSON which also adds a cost (parsing, generating).
> Thanks for this review.
>>
>> So what you measured sounds about right. You might be able to optimise a bit, but that won't give you a factor 10 improvement, IMHO. The trick is usually to make as few requests as possible.
> A factor of 10 at most? OK, that indication helps me.

If you cache and reuse a single ZnClient instance, that is already good. Maybe try once without SSL to see how much difference that makes. You could also try to disable logging. As in

ZnClient new in: [ :client |
  [ client get: 'http://localhost:8080' ] benchFor: 5 seconds ].

vs.

ZnClient new in: [ :client |
  client loggingOff.
  [ client get: 'http://localhost:8080' ] benchFor: 5 seconds ].

You can also try to run in a Time profiler, but the server time is hard to abstract away from.

> As for the trick, I know it. But it has to be done by the user, doesn't it?

Yes, the more you do in a single network round trip, the better. Aggregation is good.




Re: HTTP request question for an arango driver

YannLesage



On 10/12/2017 at 19:18, Sven Van Caekenberghe wrote:

> If you cache and reuse a single ZnClient instance, that is already good. Maybe try once without SSL to see how much difference that makes. You could also try to disable logging. As in
>
> ZnClient new in: [ :client |
>   [ client get: 'http://localhost:8080' ] benchFor: 5 seconds ].
>
> vs.
>
> ZnClient new in: [ :client |
>   client loggingOff.
>   [ client get: 'http://localhost:8080' ] benchFor: 5 seconds ].
>
> You can also try to run in a Time profiler, but the server time is hard to abstract away from.

Whether SSL is used is decided by ZnClient from the URL (http or https), isn't it? If there is an option to turn SSL off completely, I don't think it is a good idea to use it. SSL is a real security improvement when we connect to a remote server.

As for the logs, I had not noticed that they were active by default.
loggingOff improves the performance, thanks.

I have already looked with a TimeProfiler, and yes, it is difficult to extract any information from it.

>> As for the trick, I know it. But it has to be done by the user, doesn't it?
> Yes, the more you do in a single network round trip, the better. Aggregation is good.

Aggregation is possible for the user with AQL (ArangoDB Query Language).
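
As an illustration of what that aggregation could look like from Pharo, here is a minimal sketch that sends one AQL query through ArangoDB's cursor endpoint (POST /_api/cursor) instead of fetching documents one request at a time. It is not code from Pharango. The collection name "users", the "age" attribute, the bind variable and the server URL are all hypothetical, and the sketch assumes a local server without TLS or authentication.

```smalltalk
"Sketch: one AQL query in a single round trip instead of one request per document.
 Hypothetical names: collection 'users', attribute 'age', bind variable 'minAge'.
 Assumed setup: ArangoDB on localhost:8529, no TLS, no authentication."
| client body result |
body := '{"query":"FOR u IN users FILTER u.age >= @minAge RETURN u","bindVars":{"minAge":18}}'.
client := ZnClient new.
client loggingOff.
client
	url: 'http://localhost:8529/_api/cursor';
	entity: (ZnEntity with: body type: ZnMimeType applicationJson);
	post.

"On success, keep the raw JSON text of the response; otherwise keep the HTTP status code."
result := client isSuccess
	ifTrue: [ client contents ]
	ifFalse: [ client response code ].
client close.
result
```

The JSON answer still has to be parsed on the Pharo side, so the saving comes from making fewer round trips, not from less parsing.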


      


Re: HTTP request question for an arango driver

Stephan Eggermont-3
On 10-12-2017 at 20:35, Yann Lesage wrote:
> If there is an option to turn SSL off completely, I don't think it is a
> good idea to use it. SSL is a real security improvement when we connect
> to a remote server.

Depending on what you need to secure against, using a single or a few
secure shared tunnel(s) might be good enough.

Stephan



Re: HTTP request question for an arango driver

NorbertHartl
In reply to this post by YannLesage


On 10.12.2017 at 20:35, Yann Lesage <[hidden email]> wrote:

> Whether SSL is used is decided by ZnClient from the URL (http or https), isn't it? If there is an option to turn SSL off completely, I don't think it is a good idea to use it. SSL is a real security improvement when we connect to a remote server.

It depends on what you need. If you are after maximum performance, you should try without SSL to see the difference, and tell us ;)
Being able to use SSL makes it easy to set up multiple hosts with a standalone application. If you need maximum performance, the application setup might be different. The OS can add encryption transparently and much faster.
What we use is several machines, each with a second network card attached to a private switch, so there is no need to add extra security.
