So I broke 8k req/s today

Philippe Marschall
Hi

To heighten everybody's mood I'll post some positive news.

After some optimizations in both Seaside and AJP I managed to break 8000
requests / sec with a single Pharo 1.3 image. Thanks to SystemProfiler I
knew where to look.

This is with a single request handler that just returns a two-byte
response. It doesn't involve any rendering, sessions, continuations or
anything like that, but it goes through the full Seaside request handling
machinery with a request context and everything.

I'm using WASmallRequestHandler from the Seaside-Benchmark package.

WASmallRequestHandler >> #handleFiltered: aRequestContext
        aRequestContext respond: [ :response |
                response
                        binary;
                        contentType: WAMimeType textHtml;
                        nextPutAll: 'OK' asByteArray ]

Apache 2.2.21 mpm_worker mod_proxy_ajp
CPU Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
SmalltalkImage current vmVersion 'Croquet Closure Cog VM [CoInterpreter
VMMaker.oscog-eem.138]'

Attached you'll find the output of ApacheBench.
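
For reference, a keep-alive run of roughly this shape was used; the URL, concurrency and request count below are illustrative assumptions, not the exact command behind the attachment. The -k (keep-alive), -c (concurrency) and -n (request count) flags are standard ApacheBench options, and keep-alive is the one this thread later calls out as making a big difference:

    ab -k -c 8 -n 100000 http://127.0.0.1/benchmark/small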

Cheers
Philippe

Attachment: 8k.txt (2K)

Re: So I broke 8k req/s today

Sven Van Caekenberghe
Philippe,

On 25 Feb 2012, at 14:35, Philippe Marschall wrote:

> After some optimizations in both Seaside and AJP I managed to break 8000 requests / sec with a single Pharo 1.3 image. Thanks to SystemProfiler I knew where to look.
>
> This is with a single request handler that just returns a two-byte response. [...]

Very nice, indeed.

What would you get with your setup if you increase the work a bit, from your lower-limit two-byte response to something like one of these pages?

  http://zn.stfx.eu/dw-bench  (dynamically generated by Zn)
  http://caretaker.wolf359.be:8080/DW-Bench  (dynamically generated by Seaside)
  http://stfx.eu/static.html  (static reference by Apache)

The response should be about 8 KB.

Sven




Re: So I broke 8k req/s today

Stéphane Ducasse
In reply to this post by Philippe Marschall

On Feb 25, 2012, at 2:35 PM, Philippe Marschall wrote:

> Hi
>
> To heighten everybody's mood I'll post some positive news.


Ok, now I understand why the earth stopped spinning today :)
Thanks for the mail

Stef




Re: So I broke 8k req/s today

Philippe Marschall
In reply to this post by Sven Van Caekenberghe
On 25.02.2012 16:02, Sven Van Caekenberghe wrote:

> Philippe,
>
>
> Very nice, indeed.
>
> What would you get with your setup if you increase the work a bit, from your lower-limit two-byte response to something like one of these pages?
>
>    http://zn.stfx.eu/dw-bench  (dynamically generated by Zn)
>    http://caretaker.wolf359.be:8080/DW-Bench  (dynamically generated by Seaside)
>    http://stfx.eu/static.html  (static reference by Apache)
It drops to about 6.5k requests / sec, but throughput goes up to about
50 MB/sec. That's with a statically allocated byte array, no rendering.

It's still doing too much copying, so there is still room for improvement.
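
To make "statically allocated byte array" concrete, a handler along these lines would behave that way. This is only a sketch, not the actual WADwBenchHandler source; CachedPage stands in for a hypothetical class variable holding the roughly 8 KB page, rendered once up front:

WADwBenchHandler >> #handleFiltered: aRequestContext
        "CachedPage is a hypothetical class variable holding the pre-rendered page bytes"
        aRequestContext respond: [ :response |
                response
                        binary;
                        contentType: WAMimeType textHtml;
                        nextPutAll: CachedPage ]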

Cheers
Philippe


Attachment: dw-bench.txt (2K)

Re: So I broke 8k req/s today

Sven Van Caekenberghe
Philippe,

That is incredibly fast. I just tried and I can't even get plain apache2 to serve static.html that fast over the local network!

When I have more time, I really have to try to reproduce your results with your code (as well as study the code ;-)

Can you please provide the main pointers again?

I remember you once explained how to set up AJP somewhere...

Sven

On 25 Feb 2012, at 17:31, Philippe Marschall wrote:

> It drops to about 6.5k requests / sec, but throughput goes up to about 50 MB/sec. That's with a statically allocated byte array, no rendering. [...]



Re: So I broke 8k req/s today

Philippe Marschall
On 25.02.2012 17:47, Sven Van Caekenberghe wrote:
> Philippe,
>
> That is incredibly fast. I just tried and I can't even get plain apache2 to serve static.html that fast over the local network!

It's not over the network, it's on the local machine. Keep-alive makes a
big difference.

> When I have more time, I really have to try to reproduce your results with your code (as well as study the code ;-)
>
> Can you please provide the main pointers again?

  - get the image from [1]
  - in Apache 2.2 you set up AJP the same way you set up an HTTP reverse proxy, the protocol is just ajp:// instead of http:// (a config sketch follows below the links)
  - load Seaside-Benchmark from [2]
  - see the class side of WASmallRequestHandler (2 byte response), WAFastRequestHandler (16k response, seaside.st homepage) and WADwBenchHandler (dw-bench) to register the request handlers (a loading and registration sketch also follows below)

  [1] http://jenkins.lukas-renggli.ch/job/Seaside%203.1/lastSuccessfulBuild/artifact/seaside31-ajp.zip
  [2] http://www.squeaksource.com/Seaside31Addons
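
A minimal Apache 2.2 fragment for the reverse-proxy point could look like this; the path and port are assumptions (whatever the AJP adaptor in the image actually listens on), not values taken from the thread:

    # mod_proxy and mod_proxy_ajp must be loaded; point ProxyPass at the image's AJP adaptor
    ProxyPass /benchmark ajp://127.0.0.1:8009/benchmark

And a rough loading/registration sketch for the last two points, assuming Gofer and WAAdmin register:at: are available in that image. The class-side methods mentioned above are the authoritative way to register the handlers; the path 'small' here is made up:

    Gofer new
        squeaksource: 'Seaside31Addons';
        package: 'Seaside-Benchmark';
        load.
    "register the two-byte handler at a hypothetical path"
    WAAdmin register: WASmallRequestHandler new at: 'small'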

Cheers
Philippe



Re: So I broke 8k req/s today

Sven Van Caekenberghe

On 25 Feb 2012, at 19:21, Philippe Marschall wrote:

> It's not over the network, it's on the local machine. Keep-alive makes a big difference.

Yes, I meant 127.0.0.1. And yes, keep-alive is necessary.


OK, Thanks!

Sven

Re: So I broke 8k req/s today

Philippe Marschall
In reply to this post by Sven Van Caekenberghe
On 02/25/2012 05:47 PM, Sven Van Caekenberghe wrote:
> Philippe,
>
> That is incredibly fast. I just tried and I can't even get plain apache2 to serve static.html that fast over the local network!
>
> When I have more time, I really have to try to reproduce your results with your code (as well as study the code ;-)

The biggest thing is probably the recycling of the response buffers.
Each worker thread has a response buffer that is reused. I found that
request handling is very sensitive to allocation. There is a direct
correlation between removing allocation and handling more requests at
higher throughput. The more allocation you can remove the better,
especially things like Stream >> #contents.

There is also some code to have buffers that can efficiently work on
both ByteArray and ByteString.
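
As a rough illustration of the recycling idea (a sketch with made-up variable names, not the actual Seaside-AJP code): one buffer per worker is allocated once, filled in place, and handed to the socket, so no per-request ByteArray and no Stream >> #contents copy is needed.

    | buffer position body |
    buffer := ByteArray new: 8192.     "allocated once per worker thread, reused for every request"
    position := 0.                     "reset before each request instead of allocating a new stream"
    body := 'OK' asByteArray.
    buffer replaceFrom: position + 1 to: position + body size with: body startingAt: 1.
    position := position + body size.
    "only the first position bytes are written to the socket; the buffer itself is never copied"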

Cheers
Philippe



Re: So I broke 8k req/s today

Göran Krampe
Hi Philippe!

Nice to see your AJP work giving results! I think Nginx has a module for
AJP; it would be interesting to see if that makes a difference. :)

Are you using stock SocketStream internally or something even more bare-bones?

regards, Göran


Re: So I broke 8k req/s today

Philippe Marschall
On 02/27/2012 10:11 AM, Göran Krampe wrote:
> Hi Philippe!
>
> Nice to see your AJP work giving results! I think Nginx has a module for
> AJP; it would be interesting to see if that makes a difference. :)

I don't see how that would help when the Pharo image is already at 100% CPU.
I don't see how event-driven IO is supposed to help with few, high-throughput
connections.

> Are you using stock SocketStream internally or something even more bare-bones?

No, I built my own buffer and go straight to Socket. AJP is packet-oriented
with 8k packets, so this is easy.
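
For context, a sketch of the framing this relies on, based on the AJP13 protocol as generally documented rather than on the adaptor's own code: a packet from the web server starts with the magic bytes 16r12 16r34 followed by a two-byte big-endian payload length, so a fixed 8k buffer always suffices and reading a packet reduces to reading the header, decoding the length, and filling the rest of the reused buffer.

    | header payloadSize |
    header := #[16r12 16r34 16r00 16r02].     "example header announcing a 2-byte payload"
    payloadSize := ((header at: 3) bitShift: 8) + (header at: 4).
    payloadSize     "==> 2; read this many more bytes from the socket into the reused buffer"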

Cheers
Philippe




Re: So I broke 8k req/s today

Göran Krampe
On 02/27/2012 10:27 AM, Philippe Marschall wrote:
> I don't see how that would help when the Pharo image is already at 100% CPU.
> I don't see how event-driven IO is supposed to help with few, high-throughput
> connections.

I agree that it sounds like it wouldn't help - but I'm still curious. I got
the feeling when I messed around with SCGI that it can also matter "how"
the frontend talks to the backend.

regards, Göran


Re: So I broke 8k req/s today

Janko Mivšek
In reply to this post by Philippe Marschall
Hi guys,

Philippe Marschall wrote:
> No, I built my own buffer and go straight to Socket. AJP is packet-oriented
> with 8k packets, so this is easy.

Reuse of the same buffer (same ByteArray) on a raw socket is also the
technique used in Swazoo, and the results are similar. I'm preparing a
similar benchmark including a comparison with VW, so that we can see
how Pharo is progressing in the networking field and also in general.

Best regards
Janko


--
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si


Re: So I broke 8k req/s today

Philippe Marschall
On 02/27/2012 12:53 PM, Janko Mivšek wrote:

> Reuse of the same buffer (same ByteArray) on a raw socket is also the
> technique used in Swazoo, and the results are similar. I'm preparing a
> similar benchmark including a comparison with VW, so that we can see
> how Pharo is progressing in the networking field and also in general.

One thing I noted when testing Swazoo is that it doesn't support
Keep-Alive with HTTP 1.0. Unfortunately, ApacheBench uses exactly this.

Cheers
Philippe



Re: So I broke 8k req/s today

Janko Mivšek
Philippe Marschall wrote:

> One thing I noted when testing Swazoo is that it doesn't support
> Keep-Alive with HTTP 1.0. Unfortunately, ApacheBench uses exactly this.

Correcting this was easy:

Swazoo patch to allow Keep Alive (ab -k) over HTTP 1.0:

HTTPConnection>>getAndDispatchMessages
  ...
  "close the connection only when an HTTP/1.0 client did not ask for keep-alive"
  (self task request isHttp10 and: [self task request isKeepAlive not])
        ifTrue: [self close].
  ...



Best regards
Janko

--
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si


Re: So I broke 8k req/s today

Stéphane Ducasse
In reply to this post by Janko Mivšek
>
> Reuse of the same buffer (same ByteArray) on a raw socket is also the
> technique used in Swazoo, and the results are similar. I'm preparing a
> similar benchmark including a comparison with VW, so that we can see
> how Pharo is progressing in the networking field and also in general.

Let us know, because this is nice to know.
I also want to know the slope of progress :)

Stef