real world pharo web application set ups


Re: real world pharo web application set ups

philippeback
I can try that: I have some 48-core, 500+ GB RAM boxes around at the moment.

If someone can make me a test script, I can run it.

Phil

On Wed, Dec 14, 2016 at 8:54 PM, Stephan Eggermont <[hidden email]> wrote:
On 14/12/16 19:41, Dimitris Chloupis wrote:
That 5% idle CPU consumption is not the only problem: creating a new
process adds additional overhead anyway, so it does not make sense to
have more Pharo processes than CPU cores.

You can probably run a few thousand images on a single high-end x86 server (2×22-core Xeon, 1 TB RAM). I just started 21 images on my dual-core i5 MBP. top tells me my system is 75-80% idle, with the Pharo processes taking 2-3% CPU and 1.6 GB MEM in total (33 MB-146 MB each).

Stephan
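
A minimal sketch of the kind of test script being asked for here, assuming the Zinc server that ships with Pharo (the one-liner mirrors the eval command Sven uses further down; the port is arbitrary):

    "testScript.st -- hypothetical per-image load-test setup"
    ZnServer startDefaultOn: 1701.    "each image should get its own port"
    1 hour wait.                      "keep the headless image alive"

Each image would be started headless with its own port, and ab (as in the benchmarks further down) then drives the load against all of them.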






Re: real world pharo web application set ups

Sven Van Caekenberghe-2
In reply to this post by Vitor Medina Cruz

> On 14 Dec 2016, at 23:29, Vitor Medina Cruz <[hidden email]> wrote:
>
> Pharo doesn't have non-blocking I/O?

It certainly does at the networking level, but some native code interfaces might not act so nicely.
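
A minimal sketch of what that means in practice (URLs hypothetical): each forked Pharo process blocks only itself on network I/O, not the whole VM.

    "Two concurrent fetches; neither delays the other or the rest of the image."
    [ ZnClient new get: 'http://example.com/a' ] fork.
    [ ZnClient new get: 'http://example.com/b' ] fork.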

> On Wed, Dec 14, 2016 at 6:59 PM, Ramon Leon <[hidden email]> wrote:
> On 12/14/2016 12:09 PM, Esteban A. Maringolo wrote:
> Can you extend on suspending the UI process? I never did that.
>
> I feed my images a start script on the command line
>
> pharo-vm-nox \
>     -vm-sound-null -vm-display-null \
>     /var/pharo/app.image \
>     /var/pharo/startScript
>
> startScript containing one line (among others) like so...
>
>         Project uiProcess suspend.
>
> I'm on an older Pharo, but I presume the newer ones are the same or similar. There is no sense in wasting CPU on a UI in a headless image.
>
> Won't the idle use add up?
>
> Sure, eventually, but you don't run more than 2 or so per core, so that'll never be a problem. You shouldn't be running 5 images on a single core, let alone more.
>
> In my case I served up to 20 concurrent users (out of ~100 total) with
> only 5 images. Plus another two images for the REST API. In a dual
> core server.
>
> That's barely a server; most laptops these days have more cores. Rent a virtual server with a dozen or more cores; then you can run a few images per core without the idle mattering at all, and run 2 dozen images in total per 12-core server.
>
> Scale by adding cores and RAM, allowing you to run more images per box, or scale by running more boxes; ultimately, you need to spread the load across many, many cores.
>
> --
> Ramon Leon
>
>
>
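
For reference, a fuller start script along these lines might combine the UI suspend with starting the application server. A sketch only: the ZnServer line is an assumption for illustration, and Project uiProcess is the older API quoted above.

    "startScript.st -- given to the VM on the command line"
    Project uiProcess suspend.        "stop the idle UI loop"
    ZnServer startDefaultOn: 8080.    "serve the app headlessly"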



Re: real world pharo web application set ups

jtuchel
In reply to this post by Vitor Medina Cruz
Vitor,

On 14.12.16 at 19:23, Vitor Medina Cruz wrote:
If I tell you that my current estimate is that a Smalltalk image with Seaside will not be able to handle more than 20 concurrent users, in many cases even less. 

Seriously? That is kind of a low number; I would expect more for each image. Certainly it depends on many things, but it is very low for a rough estimate. Why do you say that?

Seriously, I think 20 is very optimistic, for several reasons.

One, you want to be fast and responsive for every single user, so there is absolutely no point in going too close to any limit. It's easy to lose users by providing a bad experience.

Second, in a CRUD application, you work a lot with DB queries. And you connect to all kinds of stuff and do I/O. Some of these things simply block the VM. Even if that is only for 0.3 seconds, you postpone processing for every "unaffected" user by those 0.3 seconds, and this adds up to significant delays in response time. And if you do some heavy DB operations, 0.3 seconds is not a terribly bad estimate. Add to that the materialization and related work within the Smalltalk image.

Seaside adapters usually spawn green threads for each request. But there are things that need to be serialized (as in a critical block). So in reality, users block each other far more often than you'd like.
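
In Pharo terms, such a serialized section is typically a Mutex (or Monitor) guarding shared state; a minimal sketch, with all names assumed:

    | lock sharedCounter |
    sharedCounter := 0.
    lock := Mutex new.
    "Only one request-handling process runs the block at a time;
     every other process queues up here."
    lock critical: [ sharedCounter := sharedCounter + 1 ].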

So if you asked me to give a more realistic estimate, I'd correct myself down to a number between 5 and probably a maximum of 10 users. Everything beyond that means you must use all those fancy tricks and tools people mention in this thread.
So what you absolutely need to do is start with an estimate of 5 concurrent users per image and look for ways to distribute work among servers/images so that these blocking situations are kept to a minimum. If you find your software performs much better, congratulate yourself and stack up new machines more slowly than initially estimated.


Before you turn around and say Smalltalk is unsuitable for the web, let's take a brief look at what "concurrent users" really means. Concurrent users are users that request some processing from the server at the very same time (say, within an interval of 200-400 ms). This is not the same as 5 people being logged on to the server and requesting something now and then. 5 concurrent users can mean 20, 50, or 100 users who are logged in at the same time.
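
As a back-of-the-envelope illustration (numbers assumed): 100 logged-in users who each click twice a minute, with 0.3-second requests, average out to about one request in flight at any moment.

    | users clicksPerMinute requestSeconds |
    users := 100. clicksPerMinute := 2. requestSeconds := 0.3.
    users * clicksPerMinute * requestSeconds / 60.0   "=> 1.0 request in flight on average"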

Then there is this sad "share all vs. share nothing" argument. In Seaside you keep all your objects alive (read from the DB and materialized) between web requests. In share-nothing, you read everything back from disk/DB whenever a request comes in. This also takes time and resources (and possibly blocks the server for the blink of an eye or two). You exchange RAM for CPU cycles and I/O. It is extremely hard to predict which works better, and I guess nobody ever ran A/B tests. It's all just theoretical blah blah and guesses about what must surely be better in one's own world.

Why do I bring up this share-everything stuff? Because it usually means that each logged-on user holds onto a load of objects on the server side (session storage): their user account, shopping cart, settings, last purchases, account information, and whatnot. That's easily a list of a few thousand objects (be they only proxies) that take up space and want to be inspected by the garbage collector. So each connected user not only needs CPU cycles whenever they send a request to the server but also uses RAM. In our case, this can easily be 5-10 MB of objects per user. Add to that the shadow copies that your persistence mechanism needs for undo and such, and all the data Seaside needs for continuations etc., and each logged-on user needs 15, 20, or more MB of object space. Connect ten users and you have 150-200 MB. That is not a problem per se, but it also means there is some hard limit, especially in a 32-bit world. You don't want your server to slow down because it cannot allocate new memory, or can't find contiguous slots for things and GCs all the time.

To sum up, I think the number of influencing factors is far too high to really give a good estimate. Our experience (based on our mix of computation and I/O) says that 5 concurrent users per image is doable without negative impact on other users. Some operations take so much time that you really need to move them out of the front-facing image and distribute the work to backend servers. More than 5 is probably possible, but chances are that there are operations that will affect all users, and with every additional user there is a growing chance that 2 or more of them request the very same operation within a very short interval. This will make things worse and worse.

So I trust you guys have lots of cool tools around and know loads of tricks to wring much more power out of a single Smalltalk image, but you also need to look at your productivity and speed in creating new features and fixing bugs. Sometimes throwing hardware at a problem like growth, and starting with a clever architecture that scales on multiple layers, is just the perfect thing to do. To me, handling 7 instead of 5 concurrent users is not such a big win as long as we are not in a position where we have so many users that this really matters. For sites like Amazon, Google, Facebook etc., saving 40% in server cost by optimizing the software (investing a few man-years) is significant. I hope we'll soon have to change our minds about this question ;-)

So load balancing and services outsourced to backend servers are key to scalability. This, btw, is not Smalltalk-specific (some people seem to think you won't get these problems in Java or Ruby because they are made for the web...).

Joachim









Re: real world pharo web application set ups

Sven Van Caekenberghe-2
Joachim,

> On 15 Dec 2016, at 11:43, [hidden email] wrote:
>
> [...]

Everything you say, all your considerations, especially the last paragraph, is correct, and I agree.

But some people will only remember the very low number you seem to be suggesting (which is more of a worst-case scenario: Seaside plus blocking/slow connections to back-end systems).

On the other hand, plain HTTP access to a Pharo image can be quite fast. Here is a quick & dirty benchmark I just did on one of our modern/big machines (inside an LXD container, light load) using a single stock image on Linux.


$ pharo Pharo.image printVersion
[version] 4.0 #40626

$ pharo Pharo.image eval 'ZnServer startDefaultOn: 1701. 1 hour wait' &

$ ab -k -c 8 -n 10240 http://127.0.0.1:1701/bytes/32
This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests


Server Software:        Zinc
Server Hostname:        127.0.0.1
Server Port:            1701

Document Path:          /bytes/32
Document Length:        32 bytes

Concurrency Level:      8
Time taken for tests:   1.945 seconds
Complete requests:      10240
Failed requests:        0
Keep-Alive requests:    10240
Total transferred:      2109440 bytes
HTML transferred:       327680 bytes
Requests per second:    5265.17 [#/sec] (mean)
Time per request:       1.519 [ms] (mean)
Time per request:       0.190 [ms] (mean, across all concurrent requests)
Transfer rate:          1059.20 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       2
Processing:     0    2   8.0      2     309
Waiting:        0    1   8.0      1     309
Total:          0    2   8.0      2     309

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      2
  90%      2
  95%      3
  98%      3
  99%      3
 100%    309 (longest request)


More than 5K req/s (10K requests, 8 concurrent clients).

Granted, this is only a 32-byte payload over the loopback network interface. But it marks the other end of the interval: the maximum speed.

A more realistic payload (7 KB of HTML) gives the following:


$ ab -k -c 8 -n 10240 http://127.0.0.1:1701/dw-bench
This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests


Server Software:        Zinc
Server Hostname:        127.0.0.1
Server Port:            1701

Document Path:          /dw-bench
Document Length:        7734 bytes

Concurrency Level:      8
Time taken for tests:   7.874 seconds
Complete requests:      10240
Failed requests:        0
Keep-Alive requests:    10240
Total transferred:      80988160 bytes
HTML transferred:       79196160 bytes
Requests per second:    1300.46 [#/sec] (mean)
Time per request:       6.152 [ms] (mean)
Time per request:       0.769 [ms] (mean, across all concurrent requests)
Transfer rate:          10044.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    6 183.4      1    7874
Waiting:        1    6 183.4      1    7874
Total:          1    6 183.4      1    7874

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%   7874 (longest request)


That is more than 1K req/s.

In both cases we are talking about sub-1 ms request/response cycles!
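
To run the same measurement against your own code, Zinc's default delegate can map a custom handler; a sketch (handler name and response are illustrative):

    ZnServer startDefaultOn: 1701.
    ZnServer default delegate
        map: 'hello'
        to: [ :request | ZnResponse ok: (ZnEntity text: 'Hello World') ].

After that, pointing ab at /hello instead of /bytes/32 exercises your own handler.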

I think all commercial users of Pharo today know what is possible and what needs to be done to achieve their goals. Pure speed might not be the main consideration; ease, speed, and joy of development, and being capable of solving complex problems and offering compelling solutions to end users, are probably more important.

Sven




Re: real world pharo web application set ups

Vitor Medina Cruz
Joachim,

> Seriously, I think 20 is very optimistic, for several reasons. (...)

Whoa! Thanks for the careful and insightful response, I really appreciate that! :)

On Thu, Dec 15, 2016 at 12:00 PM, Sven Van Caekenberghe <[hidden email]> wrote:
[...]





Re: real world pharo web application set ups

Vitor Medina Cruz
In reply to this post by Sven Van Caekenberghe-2
>> On 14 Dec 2016, at 23:29, Vitor Medina Cruz <[hidden email]> wrote:
>>
>> Pharo doesn't have non-blocking I/O?
>
> It certainly does at the networking level, but some native code interfaces might not act so nicely.

Hmm, I asked because on the occasions when I experimented with slow I/O processing, the image seemed to freeze. There are some situations where not even a Ctrl+. can interrupt the work that is freezing it. As I understand it, I/O can be interleaved with other work, but not in all cases, because some native code keeps the thread blocked; is that correct? For high-CPU procedures, if the code does not explicitly yield execution, the image will be blocked until the end of its execution, right? Also, to take advantage of multiple cores, must one use multiple images, or the C++ library that Dimitris talked about? Is there another way to spawn OS threads or processes inside an image?
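
That matches how Pharo schedules its green threads: within one priority level, a compute-bound process keeps running until it yields or blocks, so other processes at that priority starve. A sketch (loop bounds illustrative):

    "Without the Processor yield, this busy loop would starve
     every other process at the same priority."
    [ 1 to: 1000000000 do: [ :i |
        i \\ 100000 = 0 ifTrue: [ Processor yield ] ] ] fork.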


On Wed, Dec 14, 2016 at 8:51 PM, Sven Van Caekenberghe <[hidden email]> wrote:

> [...]




Re: real world pharo web application set ups

Volkert
In reply to this post by Sven Van Caekenberghe-2
Sven,

Compare with an Erlang VM (Cowboy) on a standard PC, i5-4570 CPU @ 3.20 GHz × 4, on Linux ...

Concurrent requests: 8

$ ab -k -c 8 -n 10240 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests


Server Software:
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        7734 bytes

Concurrency Level:      8
Time taken for tests:   0.192 seconds
Complete requests:      10240
Failed requests:        0
Keep-Alive requests:    10143
Total transferred:      80658152 bytes
HTML transferred:       79196160 bytes
Requests per second:    53414.29 [#/sec] (mean)
Time per request:       0.150 [ms] (mean)
Time per request:       0.019 [ms] (mean, across all concurrent requests)
Transfer rate:          410871.30 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.2      0       3
Waiting:        0    0   0.2      0       3
Total:          0    0   0.2      0       3

Percentage of the requests served within a certain time (ms)
   50%      0
   66%      0
   75%      0
   80%      0
   90%      0
   95%      1
   98%      1
   99%      1
  100%      3 (longest request)


And here with 1000 concurrent requests ...

$ ab -k -c 1000 -n 10240 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests


Server Software:
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        7734 bytes

Concurrency Level:      1000
Time taken for tests:   0.225 seconds
Complete requests:      10240
Failed requests:        0
Keep-Alive requests:    10232
Total transferred:      80660288 bytes
HTML transferred:       79196160 bytes
Requests per second:    45583.23 [#/sec] (mean)
Time per request:       21.938 [ms] (mean)
Time per request:       0.022 [ms] (mean, across all concurrent requests)
Transfer rate:          350642.85 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        0    1   3.3      0      23
Processing:     0    6  16.1      0     198
Waiting:        0    6  16.1      0     198
Total:          0    7  18.0      0     211

Percentage of the requests served within a certain time (ms)
   50%      0
   66%      2
   75%      6
   80%     10
   90%     21
   95%     32
   98%     47
   99%    108
  100%    211 (longest request)



On 15.12.2016 at 15:00, Sven Van Caekenberghe wrote:

> [...]




Re: real world pharo web application set ups

Sven Van Caekenberghe-2
I did not say we are the fastest, far from it. I absolutely do not want to get into a contest; there is no point in doing so.

(The dw-bench page was meant to be generated dynamically on each request, without caching. Did you do that too?)

My point was: Pharo is good enough for most web applications. The rest of the challenge is standard software architecture, design, and development. I choose to do that in Pharo because I like it so much. It is perfectly fine by me that 99.xx% of the world makes other decisions, for whatever reason.
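
(A quick way to check, assuming the page embeds its generation time as dw-bench pages typically do: fetch it twice and compare; identical bodies would suggest caching.)

    | first second |
    first := ZnClient new get: 'http://127.0.0.1:1701/dw-bench'.
    second := ZnClient new get: 'http://127.0.0.1:1701/dw-bench'.
    first = second   "false when each response is freshly generated"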

> On 16 Dec 2016, at 09:57, volkert <[hidden email]> wrote:
>
> Sven,
>
> compare with an erlang vm (Cowboy) on a standard pc, i5-4570 CPU @ 3.20GHz × 4, on linux ...
>
> Conncurrent request: 8
>
> $ ab -k -c 8 -n 10240 http://127.0.0.1:8080/
> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 127.0.0.1 (be patient)
> Completed 1024 requests
> Completed 2048 requests
> Completed 3072 requests
> Completed 4096 requests
> Completed 5120 requests
> Completed 6144 requests
> Completed 7168 requests
> Completed 8192 requests
> Completed 9216 requests
> Completed 10240 requests
> Finished 10240 requests
>
>
> Server Software:
> Server Hostname:        127.0.0.1
> Server Port:            8080
>
> Document Path:          /
> Document Length:        7734 bytes
>
> Concurrency Level:      8
> Time taken for tests:   0.192 seconds
> Complete requests:      10240
> Failed requests:        0
> Keep-Alive requests:    10143
> Total transferred:      80658152 bytes
> HTML transferred:       79196160 bytes
> Requests per second:    53414.29 [#/sec] (mean)
> Time per request:       0.150 [ms] (mean)
> Time per request:       0.019 [ms] (mean, across all concurrent requests)
> Transfer rate:          410871.30 [Kbytes/sec] received
>
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:        0    0   0.0      0       0
> Processing:     0    0   0.2      0       3
> Waiting:        0    0   0.2      0       3
> Total:          0    0   0.2      0       3
>
> Percentage of the requests served within a certain time (ms)
>  50%      0
>  66%      0
>  75%      0
>  80%      0
>  90%      0
>  95%      1
>  98%      1
>  99%      1
> 100%      3 (longest request)
>
>
> And here with 1000 concurrent request ...
>
> $ab -k -c 1000 -n 10240 http://127.0.0.1:8080/
> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 127.0.0.1 (be patient)
> Completed 1024 requests
> Completed 2048 requests
> Completed 3072 requests
> Completed 4096 requests
> Completed 5120 requests
> Completed 6144 requests
> Completed 7168 requests
> Completed 8192 requests
> Completed 9216 requests
> Completed 10240 requests
> Finished 10240 requests
>
>
> Server Software:
> Server Hostname:        127.0.0.1
> Server Port:            8080
>
> Document Path:          /
> Document Length:        7734 bytes
>
> Concurrency Level:      1000
> Time taken for tests:   0.225 seconds
> Complete requests:      10240
> Failed requests:        0
> Keep-Alive requests:    10232
> Total transferred:      80660288 bytes
> HTML transferred:       79196160 bytes
> Requests per second:    45583.23 [#/sec] (mean)
> Time per request:       21.938 [ms] (mean)
> Time per request:       0.022 [ms] (mean, across all concurrent requests)
> Transfer rate:          350642.85 [Kbytes/sec] received
>
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:        0    1   3.3      0      23
> Processing:     0    6  16.1      0     198
> Waiting:        0    6  16.1      0     198
> Total:          0    7  18.0      0     211
>
> Percentage of the requests served within a certain time (ms)
>  50%      0
>  66%      2
>  75%      6
>  80%     10
>  90%     21
>  95%     32
>  98%     47
>  99%    108
> 100%    211 (longest request)
>
>
>
> Am 15.12.2016 um 15:00 schrieb Sven Van Caekenberghe:
>> Joachim,
>>
>>> On 15 Dec 2016, at 11:43, [hidden email] wrote:
>>>
>>> Victor,
>>>
>>> Am 14.12.16 um 19:23 schrieb Vitor Medina Cruz:
>>>> If I tell you that my current estimate is that a Smalltalk image with Seaside will not be able to handle more than 20 concurrent users, in many cases even less.
>>>>
>>>> Seriously? That is kinda a low number, I would expect more for each image. Certainly it depends much on many things, but it is certainly very low for a rough estimate, why you say that?
>>> seriously, I think 20 is very optimistic for several reasons.
>>>
>>> One, you want to be fast and responsive for every single user, so there is absolutely no point in going too close to any limit. It's easy to lose users by providing bad experience.
>>>
>>> Second, in a CRUD Application, you mostly work a lot with DB queries. And you connect to all kinds of stuff and do I/O. Some of these things simply block the VM. Even if that is only for 0.3 seconds, you postpone processing for each "unaffected" user by these 0.3 seconds, so this adds to significant delays in response time. And if you do some heavy db operations, 0.3 seconds is not a terribly bad estimate. Add to that the materialization and stuff within the Smalltalk image.
>>>
>>> Seaside adapters usually start off green threads for each request. But there are things that need to be serialized (like in a critical Block). So in reality, users block each other way more often than you'd like.
>>>
>>> So if you asked me to give a more realistic estimation, I'd correct myself down to a number between 5 and probably a maximum of 10 users. Everything else means you must use all those fancy tricks and tools people mention in this thread.
>>> So what you absolutely need to do is start with an estimate of 5 concurrent users per image and look for ways to distribute work among servers/images so that these blocking situations are down to a minimum. If you find your software works much better, congratulate yourself and stack up new machines more slowly than initially estimated.
>>>
>>>
>>> Before you turn around and say: Smalltalk is unsuitable for the web, let's take a brief look at what concurrent users really means. Concurrent users are users that request some processing from the server at they very same time (maybe within an interval of 200-400msec). This is not the same as 5 people being currently logged on to the server and requesting something sometimes. 5 concurrent users can be 20, 50, 100 users who are logged in at the same time.
>>>
>>> Then there is this sad "share all vs. share nothing" argument. In Seaside you keep all your objects alive (read from db and materialized) between web requests. IN share nothing, you read everything back from disc/db whenever a request comes in. This also takes time and ressources (and pssibly blocks the server for the blink of an eye or two). You exchange RAM with CPU cycles and I/O. It is extremely hard to predict what works better, and I guess nobody ever made A/B tests. It's all just theoretical bla bla and guesses of what definitely must be better in one's world.
>>>
>>> Why do I come up with this share everything stuff? Because it usually means that each user that is logged on holds onto a load of objects on the server side (session storage), like their user account, shopping card, settings, last purchases, account information and whatnot. That's easily a list of a few thousand objects (and be it only Proxies) that take up space and want to be inspected by the garbage collector. So each connected user not only needs CPU cycles whenever they send a request to the server, but also uses RAM. In our case, this can easily be 5-10 MB of objects per user. Add to that the shadow copies that your persistence mechanism needs for undo and stuff, and all the data Seaside needs for Continuations etc, and each logged on users needs 15, 20 or more MB of object space. Connect ten users and you have 150-200 MB. That is not a problem per se, but also means there is some hard limit, especially in a 32 bit world. You don't want your server to slow down because it cannot allocate new memory or can't find contiguous slots for stuff and GCs all the time.
>>>
>>> To sum up, I think the number of influencing factors is way too high to really give a good estimate. Our experience (based on our mix of computation and I/O) says that 5 concurrent users per image is doable without negative impact on other users. Some operations take so much time that you really need to move them out of the front-facing image and distribute work to backend servers. More than 5 is probably possible but chances are that there are operations that will affect all users and with every additional user there is a growing chance that you have 2 or more requesting the yery same operation within a very short interval. This will make things worse and worse.
>>>
>>> So I trust in you guys having lots of cool tools around and knowing loads of tricks to wrench out much more power of a single Smalltalk image, but you also need to take a look at your productivity and speed in creating new features and fixing bugs. Sometimes throwing hardware at a problem like growth and starting with a clever architecture to scale on multiple layers is just the perfect thing to do. To me, handling 7 instead of 5 concurrent users is not such a big win as long as we are not in a posotion where we have so many users that this really matters. For sites like Amazon, Google, Facebook etc. saving 40% in server cost by optimizing the software (investing a few man years) is significant. I hope we'll soon change our mind about this question ;-)
>>>
>>> So load balancing and services outsourced to backend servers are key to scalability. This, btw, is not smalltalk specific (some people seem to think you won't get these problems in Java or Ruby because they are made for the web...).
>>>
>>> Joachim
>> Everything you say, all your considerations, and especially the last paragraph, are correct, and I agree.
>>
>> But some people will only remember the very low number you seem to be suggesting (which is more of a worst-case scenario, with Seaside plus blocking/slow connections to back-end systems).
>>
>> On the other hand, plain HTTP access to a Pharo image can be quite fast. Here is a quick & dirty benchmark I just did on one of our modern/big machines (inside an LXD container, light load) using a single stock image on Linux.
>>
>>
>> $ pharo Pharo.image printVersion
>> [version] 4.0 #40626
>>
>> $ pharo Pharo.image eval 'ZnServer startDefaultOn: 1701. 1 hour wait' &
>>
>> $ ab -k -c 8 -n 10240 http://127.0.0.1:1701/bytes/32
>> This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>
>> Benchmarking 127.0.0.1 (be patient)
>> Completed 1024 requests
>> Completed 2048 requests
>> Completed 3072 requests
>> Completed 4096 requests
>> Completed 5120 requests
>> Completed 6144 requests
>> Completed 7168 requests
>> Completed 8192 requests
>> Completed 9216 requests
>> Completed 10240 requests
>> Finished 10240 requests
>>
>>
>> Server Software:        Zinc
>> Server Hostname:        127.0.0.1
>> Server Port:            1701
>>
>> Document Path:          /bytes/32
>> Document Length:        32 bytes
>>
>> Concurrency Level:      8
>> Time taken for tests:   1.945 seconds
>> Complete requests:      10240
>> Failed requests:        0
>> Keep-Alive requests:    10240
>> Total transferred:      2109440 bytes
>> HTML transferred:       327680 bytes
>> Requests per second:    5265.17 [#/sec] (mean)
>> Time per request:       1.519 [ms] (mean)
>> Time per request:       0.190 [ms] (mean, across all concurrent requests)
>> Transfer rate:          1059.20 [Kbytes/sec] received
>>
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0    0   0.0      0       2
>> Processing:     0    2   8.0      2     309
>> Waiting:        0    1   8.0      1     309
>> Total:          0    2   8.0      2     309
>>
>> Percentage of the requests served within a certain time (ms)
>>   50%      2
>>   66%      2
>>   75%      2
>>   80%      2
>>   90%      2
>>   95%      3
>>   98%      3
>>   99%      3
>>  100%    309 (longest request)
>>
>>
>> More than 5K req/s (10K requests, 8 concurrent clients).
>>
>> Granted, this is only for a 32-byte payload over the loopback network interface. But this is the other end of the interval, the maximum speed.
>>
>> A more realistic payload (7K HTML) gives the following:
>>
>>
>> $ ab -k -c 8 -n 10240 http://127.0.0.1:1701/dw-bench
>> This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>
>> Benchmarking 127.0.0.1 (be patient)
>> Completed 1024 requests
>> Completed 2048 requests
>> Completed 3072 requests
>> Completed 4096 requests
>> Completed 5120 requests
>> Completed 6144 requests
>> Completed 7168 requests
>> Completed 8192 requests
>> Completed 9216 requests
>> Completed 10240 requests
>> Finished 10240 requests
>>
>>
>> Server Software:        Zinc
>> Server Hostname:        127.0.0.1
>> Server Port:            1701
>>
>> Document Path:          /dw-bench
>> Document Length:        7734 bytes
>>
>> Concurrency Level:      8
>> Time taken for tests:   7.874 seconds
>> Complete requests:      10240
>> Failed requests:        0
>> Keep-Alive requests:    10240
>> Total transferred:      80988160 bytes
>> HTML transferred:       79196160 bytes
>> Requests per second:    1300.46 [#/sec] (mean)
>> Time per request:       6.152 [ms] (mean)
>> Time per request:       0.769 [ms] (mean, across all concurrent requests)
>> Transfer rate:          10044.25 [Kbytes/sec] received
>>
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0    0   0.0      0       0
>> Processing:     1    6 183.4      1    7874
>> Waiting:        1    6 183.4      1    7874
>> Total:          1    6 183.4      1    7874
>>
>> Percentage of the requests served within a certain time (ms)
>>   50%      1
>>   66%      1
>>   75%      1
>>   80%      1
>>   90%      1
>>   95%      1
>>   98%      1
>>   99%      1
>>  100%   7874 (longest request)
>>
>>
>> That is more than 1K req/s.
>>
>> In both cases we are talking about sub-1ms req/resp cycles!
>>
>> I think all commercial users of Pharo today know what is possible and what needs to be done to achieve their goals. Pure speed might not be the main consideration; ease/speed/joy of development and just being capable of solving complex problems and offering compelling solutions to end users are probably more important.
>>
>> Sven
>>
>>
>>
>
>
>



Re: real world pharo web application set ups

jtuchel
Sven,

On 16.12.16 at 10:05, Sven Van Caekenberghe wrote:
> I did not say we are the fastest, far from it. I absolutely do not want to go into a contest, there is no point in doing so.
Absolutely right.
>
> (The dw-bench page was meant to be generated dynamically on each request without caching; did you do that too?)
>
> My point was: Pharo is good enough for most web applications. The rest of the challenge is standard software architecture, design and development. I choose to do that in Pharo because I like it so much. It is perfectly fine by me that 99.xx % of the world makes other decisions, for whatever reason.
Exactly. Smalltalk and Seaside are perfectly suited for web applications
and are not per se extremely slow or anything.
Raw benchmarks are some indicator, but whether a web application is fast
or slow depends much more on your application's architecture than on the
underlying HTTP handling.

The important question is not "how fast can Smalltalk serve a number of
bytes?" but "how fast can your application do whatever is needed to put
those bytes together?".
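
You can get a first answer to that question right in the image. A
minimal sketch (MyApp and #renderPage are hypothetical stand-ins for
your real page generation code):

    [ MyApp new renderPage ] bench.
        "-> something like '850 per second', i.e. pages built per second"
    MessageTally spyOn: [ 100 timesRepeat: [ MyApp new renderPage ] ].
        "opens a profile showing where the rendering time actually goes"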

So your benchmarks show that Smalltalk can serve stuff more than fast
enough for almost all situations (let's be honest, most of us will never
have to serve thousands of concurrent users - of course I hope I am
wrong ;-) ). The rest is application architecture, infrastructure and
avoiding stupid errors. Nothing Smalltalk-specific.


Joachim



--

-----------------------------------------------------------------------
Objektfabrik Joachim Tuchel          mailto:[hidden email]
Fliederweg 1                         http://www.objektfabrik.de
D-71640 Ludwigsburg                  http://joachimtuchel.wordpress.com
Telefon: +49 7141 56 10 86 0         Fax: +49 7141 56 10 86 1



Re: real world pharo web application set ups

NorbertHartl
In reply to this post by Volkert
I'm still not sure what we are talking about. There are so many opinions regarding totally different things.

These benchmarks don't say much. As Sven and you ran them on different machines, they are hard to compare in numbers. That is not so important, because you cannot draw many conclusions from a micro-benchmark anyway. What Sven has shown is that there is no limit in Pharo per se that prevents it from handling 1K requests per second and more.

From these 1K req/s to Joachim's 5 req/s is a big difference. You can always assume there is something blocking the VM, or a synchronous I/O call that takes a lot of time. But that is not helpful either, because it is an edge case, just like Sven's test with an app that does nothing. I would even state that it is not that easy to produce a situation like the one Joachim describes. If you have that kind of problem, then I'm pretty sure the reasons are mostly not Pharo-related. Sure, when it comes to blocking I/O it is Pharo's fault, because it cannot do async I/O yet. But a slow database query is not the fault of Pharo, and you will experience the exact same thing in any other runtime.
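
To make the distinction concrete: Zinc runs every request in its own
green thread, so a wait inside one handler does not stall the others;
only a call that blocks the whole VM does. A small sketch against a
stock image (the two handler paths are made up for the example):

    ZnServer startDefaultOn: 1701.
    ZnServer default delegate
        map: 'slow' to: [ :request |
            (Delay forSeconds: 2) wait. "suspends only this handler's process"
            ZnResponse ok: (ZnEntity text: 'slow') ];
        map: 'fast' to: [ :request |
            ZnResponse ok: (ZnEntity text: 'fast') ].

While /slow sits in its delay, /fast should still answer immediately;
a blocking FFI or database call in /slow would freeze both until it
returns.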

Whatever it is, there is no other way than to measure your exact use case and find the bottlenecks that prevent your app from handling 1000 concurrent requests. While I agree with a lot of the points mentioned in this thread, I cannot share the general notion that you can simply reduce the number of requests per image and "just" use more images and more machines. That is not true.
The moment you cannot deal with all your requests in a single image, you are in trouble. As soon as there is a second image, you need to make sure there is no volatile shared state between those images. You need to take care then. Scaling up using more images and more machines shifts the problem to the database, because it is a central component that is not easy to scale. But again, that is not Pharo's fault either.
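
The classic example of such volatile shared state is an in-image
counter or cache. A sketch, purely as an illustration (and ignoring
the race on the counter for brevity):

    ZnServer startDefaultOn: 8080.
    Smalltalk at: #Hits put: 0.
    ZnServer default delegate map: 'hit' to: [ :request |
        | n |
        n := (Smalltalk at: #Hits) + 1. "fine in one image"
        Smalltalk at: #Hits put: n.
        ZnResponse ok: (ZnEntity text: n printString) ].

With one image the counter is exact; put a second image behind a
balancer and each image only counts the requests it happened to get.
State like this has to move to the database or another shared store,
which is exactly where the scaling pressure ends up.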

So I would state two things:

- We are talking about really high numbers of requests/s here. The odds of getting into this kind of scaling trouble are usually close to zero: you would need an application with really many users. Most projects we know end up using a single image for everything.
- Whenever you have performance problems in your application architecture, I'm pretty sure Pharo is not at the top of the list of bottlenecks.

So yes, you can handle pretty huge numbers using Pharo.

Norbert

> On 16.12.2016 at 09:57, volkert <[hidden email]> wrote:
>
> Sven,
>
> compare with an Erlang VM (Cowboy) on a standard PC, i5-4570 CPU @ 3.20GHz × 4, on Linux ...
>
> Concurrent requests: 8
>
> $ ab -k -c 8 -n 10240 http://127.0.0.1:8080/
> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 127.0.0.1 (be patient)
> Completed 1024 requests
> Completed 2048 requests
> Completed 3072 requests
> Completed 4096 requests
> Completed 5120 requests
> Completed 6144 requests
> Completed 7168 requests
> Completed 8192 requests
> Completed 9216 requests
> Completed 10240 requests
> Finished 10240 requests
>
>
> Server Software:
> Server Hostname:        127.0.0.1
> Server Port:            8080
>
> Document Path:          /
> Document Length:        7734 bytes
>
> Concurrency Level:      8
> Time taken for tests:   0.192 seconds
> Complete requests:      10240
> Failed requests:        0
> Keep-Alive requests:    10143
> Total transferred:      80658152 bytes
> HTML transferred:       79196160 bytes
> Requests per second:    53414.29 [#/sec] (mean)
> Time per request:       0.150 [ms] (mean)
> Time per request:       0.019 [ms] (mean, across all concurrent requests)
> Transfer rate:          410871.30 [Kbytes/sec] received
>
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:        0    0   0.0      0       0
> Processing:     0    0   0.2      0       3
> Waiting:        0    0   0.2      0       3
> Total:          0    0   0.2      0       3
>
> Percentage of the requests served within a certain time (ms)
>  50%      0
>  66%      0
>  75%      0
>  80%      0
>  90%      0
>  95%      1
>  98%      1
>  99%      1
> 100%      3 (longest request)
>
>
> And here with 1000 concurrent requests ...
>
> $ ab -k -c 1000 -n 10240 http://127.0.0.1:8080/
> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 127.0.0.1 (be patient)
> Completed 1024 requests
> Completed 2048 requests
> Completed 3072 requests
> Completed 4096 requests
> Completed 5120 requests
> Completed 6144 requests
> Completed 7168 requests
> Completed 8192 requests
> Completed 9216 requests
> Completed 10240 requests
> Finished 10240 requests
>
>
> Server Software:
> Server Hostname:        127.0.0.1
> Server Port:            8080
>
> Document Path:          /
> Document Length:        7734 bytes
>
> Concurrency Level:      1000
> Time taken for tests:   0.225 seconds
> Complete requests:      10240
> Failed requests:        0
> Keep-Alive requests:    10232
> Total transferred:      80660288 bytes
> HTML transferred:       79196160 bytes
> Requests per second:    45583.23 [#/sec] (mean)
> Time per request:       21.938 [ms] (mean)
> Time per request:       0.022 [ms] (mean, across all concurrent requests)
> Transfer rate:          350642.85 [Kbytes/sec] received
>
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:        0    1   3.3      0      23
> Processing:     0    6  16.1      0     198
> Waiting:        0    6  16.1      0     198
> Total:          0    7  18.0      0     211
>
> Percentage of the requests served within a certain time (ms)
>  50%      0
>  66%      2
>  75%      6
>  80%     10
>  90%     21
>  95%     32
>  98%     47
>  99%    108
> 100%    211 (longest request)
>
>
>
> On 15.12.2016 at 15:00, Sven Van Caekenberghe wrote:
>> [...]



Re: real world pharo web application set ups

Volkert
Come on, I am only interested in what setups Pharo is currently used in
(as mentioned in my initial question). This gives me a feeling for
whether my requirements are close to the requirements found in current
Pharo-based systems ... if I am completely outside the population of
current Pharo systems, that is for me a good indication not to bet on it ...

On 16.12.2016 10:41, Norbert Hartl wrote:

> [...]




Re: real world pharo web application set ups

Sven Van Caekenberghe-2

> On 16 Dec 2016, at 11:33, Volkert <[hidden email]> wrote:
>
> Come on, I am only interested in what setups Pharo is currently used in (as mentioned in my initial question).
> This gives me a feeling for whether my requirements are close to the requirements found in current Pharo-based systems ...
> if I am completely outside the population of current Pharo systems, that is for me a good indication not to bet on it ...

Well, yes.

Norbert's conclusion (last two points) was spot on.

You can do on the order of 1K req/s on a single image. If you want more, you need to scale horizontally. Either you don't share state, and you can do that easily; or you do share state, and you will have to build something custom (perfectly doable, but you will have to architect/design for that, preferably upfront).
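
In the share-nothing case a handler reconstructs everything it needs
from the request itself, so any image behind the balancer can answer
any request. A minimal sketch (UserStore and #userNamed: stand for a
hypothetical database lookup):

    ZnServer startDefaultOn: 8081.
    ZnServer default delegate map: 'profile' to: [ :request |
        | id user |
        id := request uri queryAt: 'id'.
        user := UserStore userNamed: id. "re-read from the db on every request"
        ZnResponse ok: (ZnEntity text: user printString) ].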

Note that the initial versions of all successful web apps that now serve millions of people on thousands of servers all started with very simple, inferior technology stacks. Make something great first, scale later.

> On 16.12.2016 10:41, Norbert Hartl wrote:
>> I'm still not sure about what we are talking. There are some many opinions regarding totally different things.
>>
>> These benchmark don't say much. As Sven and you did the benchmark on different machines they are hard to compare in numbers. It is not that important because you can not make many conclusions from a micro benchmark. So what Sven has proven is the fact that there is no limit for pharo per se to handle 1k requests per second and more.
>>
>> From these 1k req/s to Joachims 5 reqs/s is big difference. You can always assume there is something blocking the vm or a synchron I/O call takes a lot of time. But it is not helpful either because that is an edge case like Svens test with an app that does nothing. I would even state that it is not that easy to produce a situation like Joachim describes. If you have that kind of a problem than I'm pretty sure the reasons are mostly not pharo related. Sure if it comes to blocking I/O then it is pharo's fault because it cannot do async I/O, yet. But a slow database query is not the fault of pharo and you will experience the exact same thing in any other runtime.
>>
>> Whatever it will be there is no other way then to measure your exact use case and find the bottlenecks that prevent your app from being able to handle 1000 concurrent requests. While I agree with a lot of points mentioned in this thread I cannot share the general notion of saying that you reduce the number of requests per image and "just" use more images and more machines. That is not true.
>> The moment you cannot deal with all your requests in a single image you are in trouble. As soon as there is a second image you need to make sure there is no volatile shared state between those images. You need to take caution then. Scaling up using more images and more machines shifts problem to the database because it is a central component that is not easy to scale. But again it is not pharo's fault either.
>>
>> So I would state two things:
>>
>> - We are talking about really high numbers of requests/s. The odds you are getting in this kind of scaling trouble are usually close to zero. It means you need to generate an application that has really many users. Most projects we know end up using a single image for everything.
>> - Whenever you have performance problems in your application architecture I'm pretty sure pharo is not in the top of the list of bottlenecks.
>>
>> So yes, you can handle pretty huge numbers using pharo.
>>
>> Norbert
>>
>>> Am 16.12.2016 um 09:57 schrieb volkert <[hidden email]>:
>>>
>>> Sven,
>>>
>>> compare with an erlang vm (Cowboy) on a standard pc, i5-4570 CPU @ 3.20GHz × 4, on linux ...
>>>
>>> Conncurrent request: 8
>>>
>>> $ ab -k -c 8 -n 10240 http://127.0.0.1:8080/
>>> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>>
>>> Benchmarking 127.0.0.1 (be patient)
>>> Completed 1024 requests
>>> Completed 2048 requests
>>> Completed 3072 requests
>>> Completed 4096 requests
>>> Completed 5120 requests
>>> Completed 6144 requests
>>> Completed 7168 requests
>>> Completed 8192 requests
>>> Completed 9216 requests
>>> Completed 10240 requests
>>> Finished 10240 requests
>>>
>>>
>>> Server Software:
>>> Server Hostname:        127.0.0.1
>>> Server Port:            8080
>>>
>>> Document Path:          /
>>> Document Length:        7734 bytes
>>>
>>> Concurrency Level:      8
>>> Time taken for tests:   0.192 seconds
>>> Complete requests:      10240
>>> Failed requests:        0
>>> Keep-Alive requests:    10143
>>> Total transferred:      80658152 bytes
>>> HTML transferred:       79196160 bytes
>>> Requests per second:    53414.29 [#/sec] (mean)
>>> Time per request:       0.150 [ms] (mean)
>>> Time per request:       0.019 [ms] (mean, across all concurrent requests)
>>> Transfer rate:          410871.30 [Kbytes/sec] received
>>>
>>> Connection Times (ms)
>>>              min  mean[+/-sd] median   max
>>> Connect:        0    0   0.0      0       0
>>> Processing:     0    0   0.2      0       3
>>> Waiting:        0    0   0.2      0       3
>>> Total:          0    0   0.2      0       3
>>>
>>> Percentage of the requests served within a certain time (ms)
>>>  50%      0
>>>  66%      0
>>>  75%      0
>>>  80%      0
>>>  90%      0
>>>  95%      1
>>>  98%      1
>>>  99%      1
>>> 100%      3 (longest request)
>>>
>>>
>>> And here with 1000 concurrent request ...
>>>
>>> $ab -k -c 1000 -n 10240 http://127.0.0.1:8080/
>>> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>>
>>> Benchmarking 127.0.0.1 (be patient)
>>> Completed 1024 requests
>>> Completed 2048 requests
>>> Completed 3072 requests
>>> Completed 4096 requests
>>> Completed 5120 requests
>>> Completed 6144 requests
>>> Completed 7168 requests
>>> Completed 8192 requests
>>> Completed 9216 requests
>>> Completed 10240 requests
>>> Finished 10240 requests
>>>
>>>
>>> Server Software:
>>> Server Hostname:        127.0.0.1
>>> Server Port:            8080
>>>
>>> Document Path:          /
>>> Document Length:        7734 bytes
>>>
>>> Concurrency Level:      1000
>>> Time taken for tests:   0.225 seconds
>>> Complete requests:      10240
>>> Failed requests:        0
>>> Keep-Alive requests:    10232
>>> Total transferred:      80660288 bytes
>>> HTML transferred:       79196160 bytes
>>> Requests per second:    45583.23 [#/sec] (mean)
>>> Time per request:       21.938 [ms] (mean)
>>> Time per request:       0.022 [ms] (mean, across all concurrent requests)
>>> Transfer rate:          350642.85 [Kbytes/sec] received
>>>
>>> Connection Times (ms)
>>>              min  mean[+/-sd] median   max
>>> Connect:        0    1   3.3      0      23
>>> Processing:     0    6  16.1      0     198
>>> Waiting:        0    6  16.1      0     198
>>> Total:          0    7  18.0      0     211
>>>
>>> Percentage of the requests served within a certain time (ms)
>>>  50%      0
>>>  66%      2
>>>  75%      6
>>>  80%     10
>>>  90%     21
>>>  95%     32
>>>  98%     47
>>>  99%    108
>>> 100%    211 (longest request)
>>>
>>>
>>>
>>> Am 15.12.2016 um 15:00 schrieb Sven Van Caekenberghe:
>>>> Joachim,
>>>>
>>>>> On 15 Dec 2016, at 11:43, [hidden email] wrote:
>>>>>
>>>>> Victor,
>>>>>
>>>>> Am 14.12.16 um 19:23 schrieb Vitor Medina Cruz:
>>>>>> If I tell you that my current estimate is that a Smalltalk image with Seaside will not be able to handle more than 20 concurrent users, in many cases even less.
>>>>>>
>>>>>> Seriously? That is kinda a low number, I would expect more for each image. Certainly it depends much on many things, but it is certainly very low for a rough estimate, why you say that?
>>>>> seriously, I think 20 is very optimistic for several reasons.
>>>>>
>>>>> One, you want to be fast and responsive for every single user, so there is absolutely no point in going too close to any limit. It's easy to lose users by providing bad experience.
>>>>>
>>>>> Second, in a CRUD Application, you mostly work a lot with DB queries. And you connect to all kinds of stuff and do I/O. Some of these things simply block the VM. Even if that is only for 0.3 seconds, you postpone processing for each "unaffected" user by these 0.3 seconds, so this adds to significant delays in response time. And if you do some heavy db operations, 0.3 seconds is not a terribly bad estimate. Add to that the materialization and stuff within the Smalltalk image.
>>>>>
>>>>> Seaside adapters usually start off green threads for each request. But there are things that need to be serialized (like in a critical Block). So in reality, users block each other way more often than you'd like.
>>>>>
>>>>> So if you asked me to give a more realistic estimation, I'd correct myself down to a number between 5 and probably a maximum of 10 users. Everything else means you must use all those fancy tricks and tools people mention in this thread.
>>>>> So what you absolutely need to do is start with an estimate of 5 concurrent users per image and look for ways to distribute work among servers/images so that these blocking situations are down to a minimum. If you find your software works much better, congratulate yourself and stack up new machines more slowly than initially estimated.
>>>>>
>>>>>
>>>>> Before you turn around and say: Smalltalk is unsuitable for the web, let's take a brief look at what concurrent users really means. Concurrent users are users that request some processing from the server at they very same time (maybe within an interval of 200-400msec). This is not the same as 5 people being currently logged on to the server and requesting something sometimes. 5 concurrent users can be 20, 50, 100 users who are logged in at the same time.
>>>>>
>>>>> Then there is this sad "share all vs. share nothing" argument. In Seaside you keep all your objects alive (read from db and materialized) between web requests. IN share nothing, you read everything back from disc/db whenever a request comes in. This also takes time and ressources (and pssibly blocks the server for the blink of an eye or two). You exchange RAM with CPU cycles and I/O. It is extremely hard to predict what works better, and I guess nobody ever made A/B tests. It's all just theoretical bla bla and guesses of what definitely must be better in one's world.
>>>>>
>>>>> Why do I come up with this share everything stuff? Because it usually means that each user that is logged on holds onto a load of objects on the server side (session storage), like their user account, shopping card, settings, last purchases, account information and whatnot. That's easily a list of a few thousand objects (and be it only Proxies) that take up space and want to be inspected by the garbage collector. So each connected user not only needs CPU cycles whenever they send a request to the server, but also uses RAM. In our case, this can easily be 5-10 MB of objects per user. Add to that the shadow copies that your persistence mechanism needs for undo and stuff, and all the data Seaside needs for Continuations etc, and each logged on users needs 15, 20 or more MB of object space. Connect ten users and you have 150-200 MB. That is not a problem per se, but also means there is some hard limit, especially in a 32 bit world. You don't want your server to slow down because it cannot allocate new memory or can't find contiguous slots for stuff and GCs all the time.
>>>>>
>>>>> To sum up, I think the number of influencing factors is way too high to really give a good estimate. Our experience (based on our mix of computation and I/O) says that 5 concurrent users per image is doable without negative impact on other users. Some operations take so much time that you really need to move them out of the front-facing image and distribute work to backend servers. More than 5 is probably possible but chances are that there are operations that will affect all users and with every additional user there is a growing chance that you have 2 or more requesting the yery same operation within a very short interval. This will make things worse and worse.
>>>>>
>>>>> So I trust in you guys having lots of cool tools around and knowing loads of tricks to wrench out much more power of a single Smalltalk image, but you also need to take a look at your productivity and speed in creating new features and fixing bugs. Sometimes throwing hardware at a problem like growth and starting with a clever architecture to scale on multiple layers is just the perfect thing to do. To me, handling 7 instead of 5 concurrent users is not such a big win as long as we are not in a posotion where we have so many users that this really matters. For sites like Amazon, Google, Facebook etc. saving 40% in server cost by optimizing the software (investing a few man years) is significant. I hope we'll soon change our mind about this question ;-)
>>>>>
>>>>> So load balancing and services outsourced to backend servers are key to scalability. This, btw, is not smalltalk specific (some people seem to think you won't get these problems in Java or Ruby because they are made for the web...).
>>>>>
>>>>> Joachim
>>>> Everything you say, all your considerations, especially the last paragraph is/are correct and I agree.
>>>>
>>>> But some people will only remember the very low number you seem to be suggesting (which is more of a worse case scenario, with Seaside+blocking/slow connections to back end systems).
>>>>
>>>> One the other hand, plain HTTP access to a Pharo image can be quite fast. Here is quick & dirty benchmark I just did on one of our modern/big machines (inside an LXD container, light load) using a single stock image on Linux.
>>>>
>>>>
>>>> $ pharo Pharo.image printVersion
>>>> [version] 4.0 #40626
>>>>
>>>> $ pharo Pharo.image eval 'ZnServer startDefaultOn: 1701. 1 hour wait' &
>>>>
>>>> $ ab -k -c 8 -n 10240 http://127.0.0.1:1701/bytes/32
>>>> This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>>>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>>>
>>>> Benchmarking 127.0.0.1 (be patient)
>>>> Completed 1024 requests
>>>> Completed 2048 requests
>>>> Completed 3072 requests
>>>> Completed 4096 requests
>>>> Completed 5120 requests
>>>> Completed 6144 requests
>>>> Completed 7168 requests
>>>> Completed 8192 requests
>>>> Completed 9216 requests
>>>> Completed 10240 requests
>>>> Finished 10240 requests
>>>>
>>>>
>>>> Server Software:        Zinc
>>>> Server Hostname:        127.0.0.1
>>>> Server Port:            1701
>>>>
>>>> Document Path:          /bytes/32
>>>> Document Length:        32 bytes
>>>>
>>>> Concurrency Level:      8
>>>> Time taken for tests:   1.945 seconds
>>>> Complete requests:      10240
>>>> Failed requests:        0
>>>> Keep-Alive requests:    10240
>>>> Total transferred:      2109440 bytes
>>>> HTML transferred:       327680 bytes
>>>> Requests per second:    5265.17 [#/sec] (mean)
>>>> Time per request:       1.519 [ms] (mean)
>>>> Time per request:       0.190 [ms] (mean, across all concurrent requests)
>>>> Transfer rate:          1059.20 [Kbytes/sec] received
>>>>
>>>> Connection Times (ms)
>>>>               min  mean[+/-sd] median   max
>>>> Connect:        0    0   0.0      0       2
>>>> Processing:     0    2   8.0      2     309
>>>> Waiting:        0    1   8.0      1     309
>>>> Total:          0    2   8.0      2     309
>>>>
>>>> Percentage of the requests served within a certain time (ms)
>>>>   50%      2
>>>>   66%      2
>>>>   75%      2
>>>>   80%      2
>>>>   90%      2
>>>>   95%      3
>>>>   98%      3
>>>>   99%      3
>>>>  100%    309 (longest request)
>>>>
>>>>
>>>> More than 5K req/s (10K requests, 8 concurrent clients).
>>>>
>>>> Granted, this is just a 32-byte payload over the loopback network interface. But this is the other end of the interval, the maximum speed.
>>>>
>>>> A more realistic payload (7K HTML) gives the following:
>>>>
>>>>
>>>> $ ab -k -c 8 -n 10240 http://127.0.0.1:1701/dw-bench
>>>> This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>>>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>>>
>>>> Benchmarking 127.0.0.1 (be patient)
>>>> Completed 1024 requests
>>>> Completed 2048 requests
>>>> Completed 3072 requests
>>>> Completed 4096 requests
>>>> Completed 5120 requests
>>>> Completed 6144 requests
>>>> Completed 7168 requests
>>>> Completed 8192 requests
>>>> Completed 9216 requests
>>>> Completed 10240 requests
>>>> Finished 10240 requests
>>>>
>>>>
>>>> Server Software:        Zinc
>>>> Server Hostname:        127.0.0.1
>>>> Server Port:            1701
>>>>
>>>> Document Path:          /dw-bench
>>>> Document Length:        7734 bytes
>>>>
>>>> Concurrency Level:      8
>>>> Time taken for tests:   7.874 seconds
>>>> Complete requests:      10240
>>>> Failed requests:        0
>>>> Keep-Alive requests:    10240
>>>> Total transferred:      80988160 bytes
>>>> HTML transferred:       79196160 bytes
>>>> Requests per second:    1300.46 [#/sec] (mean)
>>>> Time per request:       6.152 [ms] (mean)
>>>> Time per request:       0.769 [ms] (mean, across all concurrent requests)
>>>> Transfer rate:          10044.25 [Kbytes/sec] received
>>>>
>>>> Connection Times (ms)
>>>>               min  mean[+/-sd] median   max
>>>> Connect:        0    0   0.0      0       0
>>>> Processing:     1    6 183.4      1    7874
>>>> Waiting:        1    6 183.4      1    7874
>>>> Total:          1    6 183.4      1    7874
>>>>
>>>> Percentage of the requests served within a certain time (ms)
>>>>   50%      1
>>>>   66%      1
>>>>   75%      1
>>>>   80%      1
>>>>   90%      1
>>>>   95%      1
>>>>   98%      1
>>>>   99%      1
>>>>  100%   7874 (longest request)
>>>>
>>>>
>>>> That is more than 1K req/s.
>>>>
>>>> In both cases we are talking about sub-1ms req/resp cycles!
>>>>
>>>> I think all commercial users of Pharo today know what is possible and what needs to be done to achieve their goals. Pure speed might not be the main consideration; ease/speed/joy of development and simply being capable of solving complex problems and offering compelling solutions to end users is probably more important.
>>>>
>>>> Sven
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

NorbertHartl

> On 16.12.2016 at 11:50, Sven Van Caekenberghe <[hidden email]> wrote:
>
>
>> On 16 Dec 2016, at 11:33, Volkert <[hidden email]> wrote:
>>
>> Come on, I am only interested in what setups Pharo is currently used in (as mentioned in my initial question).
>> This gives me a feeling for whether my requirements are near the requirements found in current Pharo-based
>> systems ... if I am completely outside the population of current Pharo systems, that is for me a good
>> indication not to bet on it ...
>
> Well, yes.
>
> Norbert's conclusion (last two points) was spot on.
>
> You can do in the order of 1K req/s on a single image. If you want more, you need to scale (horizontally). Either you don't share state and you can do that easily. Or you do share state and you will have to build something custom (perfectly doable, but you will have to architect/design for that, preferably upfront).
>
> Note that the initial versions of all successful web apps that now serve millions of people on thousands of servers all started with very simple, inferior technology stacks. Make something great first, scale later.

So true! Because there is nothing that just scales. The perfect solution for 100 concurrent requests is likely to be very different from the perfect solution for 1000 concurrent requests, etc. And expect to discover bottlenecks you couldn't really anticipate.

Norbert
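[A minimal sketch of the horizontal scaling Sven describes above, reusing the headless-image recipe shown elsewhere in this thread. The ports are arbitrary, and the '1 hour wait' keep-alive is only good enough for a demo:]

$ # start four identical stateless images, one per port
$ for PORT in 8081 8082 8083 8084; do pharo Pharo.image eval "ZnServer startDefaultOn: $PORT. 1 hour wait" & done

[Each image is identical; a front-end balancer then spreads requests across the ports.]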

>
>> On 16.12.2016 10:41, Norbert Hartl wrote:
>>> I'm still not sure what we are talking about. There are so many opinions regarding totally different things.
>>>
>>> These benchmarks don't say much. As Sven and you ran them on different machines, they are hard to compare in numbers. That is not so important, though, because you cannot draw many conclusions from a micro-benchmark. What Sven has proven is that there is no limit in Pharo per se that prevents handling 1k requests per second and more.
>>>
>>> From these 1k req/s to Joachim's 5 req/s is a big difference. You can always assume there is something blocking the VM, or that a synchronous I/O call takes a lot of time. But that is not helpful either, because it is an edge case, just like Sven's test with an app that does nothing. I would even state that it is not that easy to produce a situation like the one Joachim describes. If you have that kind of problem, then I'm pretty sure the reasons are mostly not Pharo-related. Sure, when it comes to blocking I/O it is Pharo's fault, because it cannot do async I/O yet. But a slow database query is not the fault of Pharo, and you will experience the exact same thing in any other runtime.
>>>
>>> Whatever it is, there is no other way than to measure your exact use case and find the bottlenecks that prevent your app from handling 1000 concurrent requests. While I agree with a lot of the points mentioned in this thread, I cannot share the general notion of saying you reduce the number of requests per image and "just" use more images and more machines. That is not true.
>>> The moment you cannot deal with all your requests in a single image, you are in trouble. As soon as there is a second image, you need to make sure there is no volatile shared state between those images. You need to take caution then. Scaling up using more images and more machines shifts the problem to the database, because it is a central component that is not easy to scale. But again, that is not Pharo's fault either.
>>>
>>> So I would state two things:
>>>
>>> - We are talking about really high numbers of requests/s. The odds of getting into this kind of scaling trouble are usually close to zero. It would mean you have built an application that has a really large number of users. Most projects we know end up using a single image for everything.
>>> - Whenever you have performance problems in your application architecture, I'm pretty sure Pharo is not at the top of the list of bottlenecks.
>>>
>>> So yes, you can handle pretty huge numbers using Pharo.
>>>
>>> Norbert
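[To make Norbert's "no volatile shared state" point concrete, here is a minimal sketch. The /hello endpoint and its parameter are made up, but the calls are the stock Zinc API used elsewhere in this thread. Each request carries everything the handler needs, so the same image can be cloned N times behind a balancer without coordination:]

(ZnServer startDefaultOn: 8080)
    onRequestRespond: [ :request | | name |
        "no session dictionary, no instance state: anything persistent
        would have to live in a shared backend such as a database"
        name := request uri queryAt: 'name' ifAbsent: [ 'world' ].
        ZnResponse ok: (ZnEntity text: 'hello, ' , name) ].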
>>>
>>>> On 16.12.2016 at 09:57, volkert <[hidden email]> wrote:
>>>>
>>>> Sven,
>>>>
>>>> Compare with an Erlang VM (Cowboy) on a standard PC, i5-4570 CPU @ 3.20GHz × 4, on Linux ...
>>>>
>>>> Concurrent requests: 8
>>>>
>>>> $ ab -k -c 8 -n 10240 http://127.0.0.1:8080/
>>>> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>>>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>>>
>>>> Benchmarking 127.0.0.1 (be patient)
>>>> Completed 1024 requests
>>>> Completed 2048 requests
>>>> Completed 3072 requests
>>>> Completed 4096 requests
>>>> Completed 5120 requests
>>>> Completed 6144 requests
>>>> Completed 7168 requests
>>>> Completed 8192 requests
>>>> Completed 9216 requests
>>>> Completed 10240 requests
>>>> Finished 10240 requests
>>>>
>>>>
>>>> Server Software:
>>>> Server Hostname:        127.0.0.1
>>>> Server Port:            8080
>>>>
>>>> Document Path:          /
>>>> Document Length:        7734 bytes
>>>>
>>>> Concurrency Level:      8
>>>> Time taken for tests:   0.192 seconds
>>>> Complete requests:      10240
>>>> Failed requests:        0
>>>> Keep-Alive requests:    10143
>>>> Total transferred:      80658152 bytes
>>>> HTML transferred:       79196160 bytes
>>>> Requests per second:    53414.29 [#/sec] (mean)
>>>> Time per request:       0.150 [ms] (mean)
>>>> Time per request:       0.019 [ms] (mean, across all concurrent requests)
>>>> Transfer rate:          410871.30 [Kbytes/sec] received
>>>>
>>>> Connection Times (ms)
>>>>             min  mean[+/-sd] median   max
>>>> Connect:        0    0   0.0      0       0
>>>> Processing:     0    0   0.2      0       3
>>>> Waiting:        0    0   0.2      0       3
>>>> Total:          0    0   0.2      0       3
>>>>
>>>> Percentage of the requests served within a certain time (ms)
>>>> 50%      0
>>>> 66%      0
>>>> 75%      0
>>>> 80%      0
>>>> 90%      0
>>>> 95%      1
>>>> 98%      1
>>>> 99%      1
>>>> 100%      3 (longest request)
>>>>
>>>>
>>>> And here with 1000 concurrent requests ...
>>>>
>>>> $ ab -k -c 1000 -n 10240 http://127.0.0.1:8080/
>>>> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>>>> Licensed to The Apache Software Foundation, http://www.apache.org/
>>>>
>>>> Benchmarking 127.0.0.1 (be patient)
>>>> Completed 1024 requests
>>>> Completed 2048 requests
>>>> Completed 3072 requests
>>>> Completed 4096 requests
>>>> Completed 5120 requests
>>>> Completed 6144 requests
>>>> Completed 7168 requests
>>>> Completed 8192 requests
>>>> Completed 9216 requests
>>>> Completed 10240 requests
>>>> Finished 10240 requests
>>>>
>>>>
>>>> Server Software:
>>>> Server Hostname:        127.0.0.1
>>>> Server Port:            8080
>>>>
>>>> Document Path:          /
>>>> Document Length:        7734 bytes
>>>>
>>>> Concurrency Level:      1000
>>>> Time taken for tests:   0.225 seconds
>>>> Complete requests:      10240
>>>> Failed requests:        0
>>>> Keep-Alive requests:    10232
>>>> Total transferred:      80660288 bytes
>>>> HTML transferred:       79196160 bytes
>>>> Requests per second:    45583.23 [#/sec] (mean)
>>>> Time per request:       21.938 [ms] (mean)
>>>> Time per request:       0.022 [ms] (mean, across all concurrent requests)
>>>> Transfer rate:          350642.85 [Kbytes/sec] received
>>>>
>>>> Connection Times (ms)
>>>>             min  mean[+/-sd] median   max
>>>> Connect:        0    1   3.3      0      23
>>>> Processing:     0    6  16.1      0     198
>>>> Waiting:        0    6  16.1      0     198
>>>> Total:          0    7  18.0      0     211
>>>>
>>>> Percentage of the requests served within a certain time (ms)
>>>> 50%      0
>>>> 66%      2
>>>> 75%      6
>>>> 80%     10
>>>> 90%     21
>>>> 95%     32
>>>> 98%     47
>>>> 99%    108
>>>> 100%    211 (longest request)
>>>>
>>>>
>>>>
>>>> On 15.12.2016 at 15:00, Sven Van Caekenberghe wrote:
>>>>> [...]


Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

philippeback
In reply to this post by Sven Van Caekenberghe-2
That, just that.
There is something in Pharo that I just do not experience elsewhere.

Phil

On Fri, Dec 16, 2016 at 10:05 AM, Sven Van Caekenberghe <[hidden email]> wrote:

I choose to do that in Pharo because I like it so much. It is perfectly fine by me that 99.xx % of the world makes other decisions, for whatever reason.

Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

EstebanLM
In reply to this post by NorbertHartl
Hi, 

On 16 Dec 2016, at 10:41, Norbert Hartl <[hidden email]> wrote:

We are talking about really high numbers of requests/s. The odds of getting into this kind of scaling trouble are usually close to zero. It would mean you have built an application that has a really large number of users. Most projects we know end up using a single image for everything.

amen to everything, but this in particular. 1000 /concurrent/ requests is a HUGE number that most applications will never need.

Remember, concurrent does not mean simultaneous but within the same lapse… which means that in any fraction of time you measure, you can count 1000 requests being processed (no matter whether that's 1ms, 1s or 1m)… When I was designing web applications all the time, the calculation I usually did was: take the number of users I expect to have, grouped by time peaks, then divide that by 50/s (this was an "obscure" heuristic I got from the even more obscure general observation that people spend much more time looking at a monitor than clicking a mouse).

For example: to serve an application to 1000 users,

- let's consider 80% are connected at peak times = 800 users whom I need to serve
- = roughly 40 requests per second…

so in general a couple of Tomcats would be OK (because at the time I was working in Java).
… or 4 Pharos.

now, as I always said at the time: these are estimations meant to calm the stress of customers (the ones who pay for the projects) or of my project managers (who didn't know much about systems anyway)…
and they just worked as "pain-killers", because since I really cannot know how long a request will take, I cannot measure anything.
Even worse: I'm assuming all requests take the same time, which is absolute nonsense.

But well, since people (both customers and managers) always asked that question, I made up that number based on my own observation (20 years of experience, not so bad) that "in general, a Tomcat can handle about 40 req/s and Seaside can handle something around 15… and you always need to calculate a bit more because of Murphy's law". Fun fact: the estimation was in general correct :P

In conclusion: if you *really* need to serve 1000 concurrent users, you’ll probably have the budget to make it right :)

Esteban
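[Esteban's estimate, spelled out as a workspace sketch. The per-image capacities are his rule-of-thumb numbers from above, not measurements, and "one request per user every ~20s" is just one reading of his heuristic that reproduces the 40 req/s figure:]

| peakUsers load seasideCapacity tomcatCapacity |
peakUsers := 1000 * 0.8. "80% of 1000 users connected at peak"
load := peakUsers / 20. "=> 40 req/s, assuming roughly one request per user every 20s"
seasideCapacity := 15. "rule-of-thumb req/s per Seaside image"
tomcatCapacity := 40. "rule-of-thumb req/s per Tomcat"
(load / seasideCapacity) ceiling. "=> 3 images; round up to 4 for Murphy's law"
(load / tomcatCapacity) ceiling "=> 1 Tomcat; 'a couple' for the same margin"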
Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

philippeback
I have been doing lots of Tomcat as well and helped some people at Orange scale some of their mobile provisioning stuff.
It scales. But one would scale Pharo just the same.


Lots of params in there, but it can help scale things and do session affinity.
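[For Pharo images, the same affinity idea can be sketched with a plain HTTP balancer in front; a hypothetical nginx fragment, with arbitrary ports. ip_hash is the simplest sticky scheme, roughly what Seaside's server-side sessions would require:]

upstream pharo_pool {
    ip_hash;                   # same client IP -> same image (session affinity)
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    server 127.0.0.1:8083;
    server 127.0.0.1:8084;
}
server {
    listen 80;
    location / {
        proxy_pass http://pharo_pool;
    }
}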


Phil


On Fri, Dec 16, 2016 at 1:50 PM, Esteban Lorenzano <[hidden email]> wrote:
[...]

Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

Michael J. Forster-2
In reply to this post by jtuchel
On 16 December 2016 at 03:15, [hidden email]
<[hidden email]> wrote:

> Sven,
>
> Am 16.12.16 um 10:05 schrieb Sven Van Caekenberghe:
>>
>> I did not say we are the fastest, far from it. I absolutely do not want to
>> go into a contest, there is no point in doing so.
>
> Absolutely right.
>>
>>
>> (The dw-bench page was meant to be generated dynamically on each request
>> without caching, did you do that too?).
>>
>> My point was: Pharo is good enough for most web applications. The rest of
>> the challenge is standard software architecture, design and development. I
>> choose to do that in Pharo because I like it so much. It is perfectly fine
>> by me that 99.xx % of the world makes other decisions, for whatever reason.
>
> Exactly. Smalltalk and Seaside are perfectly suited for web applications and
> are not per se extremely slow or anything.
>> Raw benchmarks are some indicator, but whether a web application is fast or
>> slow depends much more on your application's architecture than on the
>> underlying HTTP handling stuff.
>
> The important question is not "how fast can Smalltalk serve a number of
> bytes?" but "how fast can your application do whatever is needed to put
> those bytes together?".
>
>> So your benchmarks show that Smalltalk can serve stuff more than fast enough
>> for almost all situations (let's be honest, most of us will never have to
>> serve thousands of concurrent users - of course I hope I am wrong ;-) ).
> The rest is application architecture, infrastructure and avoiding stupid
> errors. Nothing Smalltalk specific.
>
>
> Joachim
>
[...]


In our benchmarking and production experience, Pharo--even with
Seaside--has fared well against Common Lisp, Java, Ruby, and Erlang
web applications in terms of _page delivery speed_. Erlang positively
embarrasses the others at handling concurrent requests, and it does so
extremely cost-effectively in terms of hardware. Pharo (and Seaside)
does likewise, at the other end of the spectrum, when developing
sophisticated application workflow.

And that's the inflection point--a painful one--for us: to experience
such effective time to market and maintenance, to grow, and then to
trade it all away just to scale to 500+ concurrent users on a single
t2.medium instance to keep hardware costs in check.


Mike

Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

Sven Van Caekenberghe-2

> On 17 Dec 2016, at 19:39, Michael J. Forster <[hidden email]> wrote:
>
> [...]
>
> In our benchmarking and production experience, Pharo--even with
> Seaside--has fared well against Common Lisp, Java, Ruby, and Erlang
> web applications in terms of _page delivery speed_. Erlang positively
> embarrasses the others at handling concurrent requests, and it does so
> extremely cost-effectively in terms of hardware. Pharo (and Seaside)
> does likewise, at the other end of the spectrum, when developing
> sophisticated application workflow.
>
> And that's the inflection point--a painful one--for us: to experience
> such effective time to market and maintenance, to grow, and then to
> trade it all away just to scale to 500+ concurrent users on a single
> t2.medium instance to keep hardware costs in check.

I think I understand your point, and in some specific situations that might be true. But if you can only afford to pay $35 a month for your hardware, how low must your income be? Are you in a commercially viable enterprise then?

For a couple of $1000s you can get the equivalent of tens, if not up to 100, of those instances. And that is still much less than office rent, let alone one employee.

The challenge today is not the cost of cloud hardware; it is simply building and operating your application. That is assuming you can sell enough of it to make a living from it.
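[Back of the envelope, in a workspace, taking the $35/month t2.medium figure above and a hypothetical $2,000/month budget:]

(2000 / 35) floor "=> 57 such instances: tens, if not close to 100"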

> Mike


Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

Michael J. Forster-2
On 17 December 2016 at 14:09, Sven Van Caekenberghe <[hidden email]> wrote:
[...]
>
> I think I understand your point, and in some specific situations that might be true. But if you can only afford to pay $35 a month for your hardware, how low must your income be? Are you in a commercially viable enterprise then?
>
> For a couple of $1000s you can get the equivalent of tens, if not up to 100, of those instances. And that is still much less than office rent, let alone one employee.
>
> The challenge today is not the cost of cloud hardware; it is simply building and operating your application. That is assuming you can sell enough of it to make a living from it.
>


Hi Sven,

No disagreement. I should clarify that, although I'm writing these
from my old email address, I'm not talking about a software
development company these days. I am now the analyst, programmer, and
system administrator for a long-time client, and software is at the
heart of what we do, but we don't make money from it directly. We have
2.2MLOC of Smalltalk, Common Lisp, Erlang, and PostgreSQL spread over
30 systems, 20 databases, and 160 data entry terminals.

So, for example, while we need to sell concert tickets at an opening
peak of 10,000 seats in the first hour 10 times per year, and while we
deploy a dozen hardware-constrained gate admission nodes in a
distributed cluster that must validate up to 1000 tickets per second,
we also have some really complicated personnel scheduling and customer
reporting applications. The former really can't be done in anything
but Erlang where not just hardware but sysadmin cost is a factor.  For
the latter, I wouldn't choose anything other than Pharo+Seaside--for
the same reasons you give.

In a nutshell, we have, at times, startup/enterprise-class computing
problems and costs funded by small business revenue.

Mike

Reply | Threaded
Open this post in threaded view
|

Re: real world pharo web application set ups

Sven Van Caekenberghe-2
Mike,

> On 17 Dec 2016, at 22:48, Michael J. Forster <[hidden email]> wrote:
>
> [...]
>
>
> Hi Sven,
>
> No disagreement. I should clarify that, although I'm writing these
> from my old email address, I'm not talking about a software
> development company these days. I am now the analyst, programmer, and
> system administrator for a long-time client, and software is at the
> heart of what we do, but we don't make money from it directly. We have
> 2.2MLOC of Smalltalk, Common Lisp, Erlang, and PostgreSQL spread over
> 30 systems, 20 databases, and 160 data entry terminals.

Sounds quite familiar, real world stuff.

> So, for example, while we need to sell concert tickets at an opening
> peak of 10,000 seats in the first hour 10 times per year, and while we
> deploy a dozen hardware-constrained gate admission nodes in a
> distributed cluster that must validate up to 1000 tickets per second,
> we also have some really complicated personnel scheduling and customer
> reporting applications. The former really can't be done in anything
> but Erlang where not just hardware but sysadmin cost is a factor.  For
> the latter, I wouldn't choose anything other than Pharo+Seaside--for
> the same reasons you give.

Yeah, such peaks are a special case indeed (having been on the buyer side as well); I would not use Pharo in the direct path there either.

> In a nutshell, we have, at times, startup/enterprise-class computing
> problems and costs funded by small business revenue.
>
> Mike

It would be interesting to hear more about your applications - please consider contributing a story to http://pharo.org/success - we need them.

Thanks for this discussion.

Sven

