about growing streams and allocating buffers


Philippe Marschall
Hi

I've been doing some performance work lately in Seaside. Long story
short, Seaside (and I guess AIDA too) spends much of its rendering time
in WriteStream (#nextPutAll:, #nextPut:).

The way WriteStream >> #pastEndPut: behaves is not really ideal for
Seaside. It grows the underlying collection by just enough to
accommodate the argument collection (or 20, whichever is bigger). Now
imagine the following not very unlikely scenario. You start with a 4k
buffer and put on average a 10-element collection (remember, all those
tags are put individually) until you have a 16k response. You allocate
more than a thousand intermediary collections to get there.
What would be better suited for Seaside is doubling the buffer size
instead. In the worst case that would mean wasting 50% of memory but it
would make the overhead of creating intermediary collections
logarithmic. In the given example that would take us only three
allocations to get there.
Now I do realize there are other applications for Pharo where this
strategy is not ideal, and this is not a killer for us. I just wanted to
shed some light on this and ask whether other projects are in a similar
situation.
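
For illustration, the pattern from the example above boils down to roughly
this workspace snippet (the sizes are just the ones from the example):

  | stream |
  stream := WriteStream on: (String new: 4096).           "4k initial buffer"
  1600 timesRepeat: [ stream nextPutAll: '0123456789' ].  "1600 * 10 = 16k response"
  stream contents size                                    "16000"

Every #nextPutAll: that runs past the current buffer triggers another
grow-and-copy of the underlying collection.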

To get a feel for how allocation-limited Seaside is: avoiding one
allocation of a 16k ByteArray per request can make the difference in
throughput between 10 Mbyte/s and 30 Mbyte/s (see "[Progress Report]
Zinc HTTP Components"). If anybody knows a way to make allocation of
large young-space objects faster (Set GC Bias to Grow? #vmParameterAt:put:?)
I'd like to hear it.

Cheers
Philippe



Re: about growing streams and allocating buffers

Stéphane Ducasse
> Hi
>
> I've been doing some performance work lately in Seaside. Long story
> short, Seaside (and I guess AIDA too) spends much of its rendering time
> in WriteStream (#nextPutAll:, #nextPut:).
>
> The way WriteStream >> #pastEndPut: behaves is not really ideal for
> Seaside. It grows the underlying collection by just enough to
> accommodate the argument collection (or 20, whichever is bigger). Now
> imagine the following not very unlikely scenario. You start with a 4k
> buffer and put on average a 10-element collection (remember, all those
> tags are put individually) until you have a 16k response. You allocate
> more than a thousand intermediary collections to get there.
> What would be better suited for Seaside is doubling the buffer size
> instead. In the worst case that would mean wasting 50% of memory but it
> would make the overhead of creating intermediary collections logarithmic.
> In the given example that would take us only three allocations to get there.
> Now I do realize there are other applications for Pharo where this
> strategy is not ideal, and this is not a killer for us. I just wanted to
> shed some light on this and ask whether other projects are in a similar
> situation.

Thanks for the info.
Do you have an idea in which cases allocating more would be a real problem?
Because in VW there were some patterns:
        turning #, into #nextPut: to avoid exactly that underlying string allocation.
        preallocating with OrderedCollection new: instead of relying on its growing behavior (both sketched below).
Final question:
maybe we should be able to plug in the stream behavior we want. That way Seaside
people can get the speed out of it. I think that having fast web apps is important.
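
For example, in Pharo terms the two patterns would look something like this
(just a sketch; names and sizes are made up):

  | bar stream tags |
  bar := 'bar'.
  "pattern 1: instead of 'foo' , bar , 'baz' (which allocates intermediate
   strings for each , send) write onto a stream"
  stream := WriteStream on: (String new: 100).
  stream nextPutAll: 'foo'; nextPutAll: bar; nextPutAll: 'baz'.
  "pattern 2: preallocate instead of relying on growth"
  tags := OrderedCollection new: 1000.   "instead of: OrderedCollection new"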

>
> To get a feel for how allocation-limited Seaside is: avoiding one
> allocation of a 16k ByteArray per request can make the difference in
> throughput between 10 Mbyte/s and 30 Mbyte/s (see "[Progress Report]
> Zinc HTTP Components"). If anybody knows a way to make allocation of
> large young-space objects faster (Set GC Bias to Grow? #vmParameterAt:put:?)
> I'd like to hear it.
>
> Cheers
> Philippe



Re: about growing streams and allocating buffers

Philippe Marschall
In reply to this post by Philippe Marschall


On 01.01.2011 14:39, Levente Uzonyi wrote:

> On Sat, 1 Jan 2011, Philippe Marschall wrote:
>
>> Hi
>>
>> I've been doing some performance work lately in Seaside. Long story
>> short, Seaside (and I guess AIDA too) spends much of its rendering time
>> in WriteStream (#nextPutAll:, #nextPut:).
>>
>> The way WriteStream >> #pastEndPut: behaves is not really ideal for
>> Seaside. It grows the underlying collection by just enough to
>> accommodate the argument collection (or 20, whichever is bigger). Now
>
> No it doesn't. The code in #pastEndPut: is
>
> collection := collection grownBy: ((collection size max: 20) min: 1000000).
>
> so the argument of #grownBy: is at least 20, at most 1000000, and it's the
> size of the internal buffer (collection) if that is between 20 and 1000000.
> So in most practical cases it's the value of [collection size].
> #grownBy: allocates a new collection whose size is the collection's size
> plus the argument, which is 2 * self size in the most common case. So the
> size grows exponentially till it reaches 1000000 and then linearly (with a
> relatively high constant of 1000000). This guarantees that the cost of
> #nextPut: is constant in most cases.

You're right, I misread the code; that's the behaviour for #nextPut:.
However, #nextPutAll: does:

    newEnd := position + aCollection size.
    newEnd > writeLimit
        ifTrue: [ self growTo: newEnd + 10 ].

and #growTo: then does:

    newSize := anInteger + (oldSize // 4 max: 20).
    grownCollection := collection class new: newSize.

aCollection is the argument collection, not the underlying collection.
So that's the required size plus ten, plus the max of a quarter of the
current size and 20.
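
To make that concrete with the 4k example (assumed values: the buffer is full
at position = 4096 and the argument has 10 elements):

  | newEnd newSize |
  newEnd := 4096 + 10.                              "4106"
  newSize := (4106 + 10) + ((4096 // 4) max: 20).   "4116 + 1024 = 5140"

So a full 4k buffer is grown to about 5k, i.e. by roughly a quarter of its
current size per overflow, rather than being doubled as in the #nextPut: case.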


>> imagine the following not very unlikely scenario. You start with a 4k
>> buffer and put on average a 10-element collection (remember, all those
>> tags are put individually) until you have a 16k response. You allocate
>> more than a thousand intermediary collections to get there.
>
> So just 3: one for 4k (the initial), one for 8k and one for 16k.

Yes.

>> What would be better suited for Seaside is doubling the buffer size instead.
>
> That's exactly what's happening in most cases.

In the #nextPut: case, not in the #nextPutAll: case.

>> In the worst case that would mean wasting 50% of memory but it would
>> make the overhead of creating intermediary collections logarithmic. In
>
> The overhead is constant (amortized) in most cases.
>
>> the given example that would take us only three allocations to get there.
>> Now I do realize there are other applications for Pharo where this
>> strategy is not ideal, and this is not a killer for us. I just wanted to
>> shed some light on this and ask whether other projects are in a similar
>> situation.
>>
>> To get a feel for how allocation-limited Seaside is: avoiding one
>> allocation of a 16k ByteArray per request can make the difference in
>> throughput between 10 Mbyte/s and 30 Mbyte/s (see "[Progress Report]
>> Zinc HTTP Components"). If anybody knows a way to make allocation of
>> large young-space objects faster (Set GC Bias to Grow? #vmParameterAt:put:?)
>> I'd like to hear it.
>
> IIRC CogVM's GC is different from SqueakVM's GC in the sense that it is
> activated when a predefined amount of data is allocated, not when the
> number of allocations since the last GC reaches a predefined limit. So
> you have to tune the GC differently on Cog and non-Cog VMs.

Do you know where I can find more information about this?

Cheers
Philippe



Re: about growing streams and allocating buffers

Philippe Marschall
In reply to this post by Stéphane Ducasse
On 01.01.2011 15:07, Stéphane Ducasse wrote:

>> Hi
>>
>> I've been doing some performance work lately in Seaside. Long story
>> short, Seaside (and I guess AIDA too) spends much of its rendering time
>> in WriteStream (#nextPutAll:, #nextPut:).
>>
>> The way WriteStream >> #pastEndPut: behaves is not really ideal for
>> Seaside. It grows the underlying collection by just enough to
>> accommodate the argument collection (or 20, whichever is bigger). Now
>> imagine the following not very unlikely scenario. You start with a 4k
>> buffer and put on average a 10-element collection (remember, all those
>> tags are put individually) until you have a 16k response. You allocate
>> more than a thousand intermediary collections to get there.
>> What would be better suited for Seaside is doubling the buffer size
>> instead. In the worst case that would mean wasting 50% of memory but it
>> would make the overhead of creating intermediary collections logarithmic.
>> In the given example that would take us only three allocations to get there.
>> Now I do realize there are other applications for Pharo where this
>> strategy is not ideal, and this is not a killer for us. I just wanted to
>> shed some light on this and ask whether other projects are in a similar
>> situation.
>
> Thanks for the info.
> Do you have an idea in which cases allocating more would be a real problem?

Whenever you're memory-constrained, I guess; when doubling would put you
past the size limit of the image.

> Because in VW there were some patterns:
> turning #, into #nextPut: to avoid exactly that underlying string allocation.

I don't get that.

> preallocating with OrderedCollection new: instead of relying on its growing behavior.

Right, but you never know how big the response is going to be before
actually rendering. Otherwise you could just do Array/String new:
beforehand.

> Final question:
> maybe we should be able to plug in the stream behavior we want. That way Seaside
> people can get the speed out of it. I think that having fast web apps is important.

Oh, it's already way faster than it was previously ;-) And let's not
forget, even when you run Seaside, you will have other code, like a
database driver, that needs streams as well. Whatever you plug in has to
work with whatever else you have in your image.

Cheers
Philippe



Re: about growing streams and allocating buffers

mkobetic
In reply to this post by Philippe Marschall
"Philippe Marschall"<[hidden email]> wrote:
> > preallocating with OrderedCollection new: instead of relying on its growing behavior.
>
> Right, but you never know how big the response is going to be before
> actually rendering. Otherwise you could just do Array/String new:
> beforehand.

I'm not quite clear about the context here so this may not be relevant, but one technique that paid off for us in terms of managing socket buffers was to start recycling them. The idea is that if you generally allocate, let's say, a 32K buffer for every socket you open, rather than throwing the buffer away when you're done with the socket, you just put it aside and let the next socket pick up and reuse the same instance. For us the main concern was a server being hammered by transient connection requests. Occasional buffer allocation could be handled by the new-space scavenger just fine, but a flood of such requests made the buffers spill over to the old space, and throwing them away got expensive quickly.

Anyway, I realize that your concern is different here, but the same technique might help avoid growing in the most common cases. If there's a reasonable upper bound on most responses (e.g. 32K), you could simply preallocate that size always. If the response fits, great, no growing; if not, it will grow. However, you'd probably have the same sort of problem as we had above if you simply threw each buffer away. Recycling should help make it manageable.

We use the same technique in Xtreams, where we need internal buffers quite frequently. Check out the RecyclingCenter class there if you want. It's a bit more generic because we need to accommodate different collection classes, so you'll probably want something different, but the basics should be similar. One aspect to note is that the recycling pool can be quite small. You don't need to recycle every instance created; you only need to be able to handle the "overflow" under load. The idea is that if you are under load and the number of concurrently used buffers fluctuates between 100 and 110, you don't need the recycling pool size to be 110, but only 10. It only needs to retain the temporarily unused buffers long enough for them to get picked up again. When the load subsides, it's fine to let the remaining 100 buffers be GCed; they can be allocated again quickly when the next wave hits. At the same time it keeps the memory load reasonable in the quiet periods.
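
The core of it is small. A minimal sketch in Smalltalk could look like this
(a hypothetical BufferPool class, not the actual Xtreams RecyclingCenter, and
the numbers are made up):

  Object subclass: #BufferPool
      instanceVariableNames: 'idle lock maxIdle bufferSize'
      classVariableNames: ''
      poolDictionaries: ''
      category: 'Sketch-Buffers'

  BufferPool >> initialize
      idle := OrderedCollection new.
      lock := Semaphore forMutualExclusion.
      maxIdle := 10.        "only needs to cover the overflow under load"
      bufferSize := 32768

  BufferPool >> take
      "Hand out an idle buffer if there is one, otherwise allocate a fresh one."
      ^ lock critical: [
          idle isEmpty
              ifTrue: [ ByteArray new: bufferSize ]
              ifFalse: [ idle removeLast ] ]

  BufferPool >> recycle: aBuffer
      "Keep the buffer only while the idle pool is small; anything beyond that
      is simply left for the GC once the load subsides."
      lock critical: [
          idle size < maxIdle ifTrue: [ idle addLast: aBuffer ] ]

A socket would then do buffer := pool take when it opens and pool recycle:
buffer when it closes.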

HTH,

Martin


Re: about growing streams and allocating buffers

Janko Mivšek
On 04. 01. 2011 04:25, [hidden email] wrote:
> "Philippe Marschall"<[hidden email]> wrote:
>>> preallocating with OrderedCollection new: instead of relying on its growing behavior.
>>
>> Right, but you never know how big the response is going to be before
>> actually rendering. Otherwise you could just do Array/String new:
>> beforehand.
>
> I'm not quite clear about the context here so this may not be relevant, but one technique that paid off for us in terms of managing socket buffers was to start recycling them. The idea is that if you generally allocate, let's say, a 32K buffer for every socket you open, rather than throwing the buffer away when you're done with the socket, you just put it aside and let the next socket pick up and reuse the same instance.

Yep, this is a technique used in Swazoo too. When you nextPutAll: to
Swazoo's HTTPResponse, you actually write directly to the SwazooBuffer,
which holds such a recyclable collection.

The main reason to introduce those buffers was exactly to reduce
garbage-collection activity to a minimum, which is very important when
serving (or uploading) large content.

The second reason is that, for network performance, data must be packaged
into reasonably big packets and not sent as a lot of small ones after each
nextPutAll:.
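
Roughly, the batching part works like this (made-up names, not the actual
SwazooBuffer code; it assumes instance variables buffer, fillIndex and socket,
and that a single argument always fits into an empty buffer):

  ResponseBuffer >> nextPutAll: aByteArray
      "Append into the reusable buffer; hand a whole packet to the socket only
      when the buffer is full, instead of doing one small write per call."
      fillIndex + aByteArray size > buffer size ifTrue: [ self flush ].
      buffer replaceFrom: fillIndex + 1
          to: fillIndex + aByteArray size
          with: aByteArray
          startingAt: 1.
      fillIndex := fillIndex + aByteArray size

  ResponseBuffer >> flush
      "Send the used part of the buffer in one go and start over. Sending a
      copy keeps the sketch simple; a real implementation would send straight
      from the buffer to avoid the extra allocation."
      fillIndex > 0 ifTrue: [
          socket sendData: (buffer copyFrom: 1 to: fillIndex).
          fillIndex := 0 ]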

Best regards
Janko


--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si


Re: about growing streams and allocating buffers

Philippe Marschall
In reply to this post by mkobetic
On 04.01.2011 04:25, [hidden email] wrote:
> "Philippe Marschall"<[hidden email]> wrote:
>>> preallocating with OrderedCollection new: instead of relying on its growing behavior.
>>
>> Right, but you never know how big the response is going to be before
>> actually rendering. Otherwise you could just do Array/String new:
>> beforehand.
>
> I'm not quite clear about the context here so this may not be relevant, but one technique that paid off for us in terms of managing socket buffers was to start recycling them. The idea is that if you generally allocate, let's say, a 32K buffer for every socket you open, rather than throwing the buffer away when you're done with the socket, you just put it aside and let the next socket pick up and reuse the same instance. For us the main concern was a server being hammered by transient connection requests. Occasional buffer allocation could be handled by the new-space scavenger just fine, but a flood of such requests made the buffers spill over to the old space, and throwing them away got expensive quickly.

Well, yes and no. There are several issues. The first is that the growing
policy of #nextPut: and #nextPutAll: is different. I believe #nextPut:
has it right. That shows up in general wherever you build a huge collection
using streams; one example is Seaside response generation, but there are
others.

Partially related to that is that allocating huge collections is
expensive. So Seaside/the server should reuse a buffer for response
rendering. The problem is that this needs server support. If you have a
64k buffer and use 90% of it, you don't want to send #contents of it
just because the server can only handle a ByteArray/String. And you
don't want the server to have to do ByteArray/String conversions.
AJP [1] has an experimental implementation of this and I'm very happy
with the results.

Once we have a streaming server, all of this is not really a problem
because we can/could render directly into the socket buffer. Issue 591
contains some clean-up to make this approach more general/portable.


> Anyway, I realize that your concern is different here, but the same technique might help avoid growing in the most common cases. If there's a reasonable upper bound on most responses (e.g. 32K), you could simply preallocate that size always. If the response fits, great, no growing; if not, it will grow. However, you'd probably have the same sort of problem as we had above if you simply threw each buffer away. Recycling should help make it manageable.
>
> We use the same technique in Xtreams, where we need internal buffers quite frequently. Check out the RecyclingCenter class there if you want. It's a bit more generic because we need to accommodate different collection classes, so you'll probably want something different, but the basics should be similar. One aspect to note is that the recycling pool can be quite small. You don't need to recycle every instance created; you only need to be able to handle the "overflow" under load. The idea is that if you are under load and the number of concurrently used buffers fluctuates between 100 and 110, you don't need the recycling pool size to be 110, but only 10. It only needs to retain the temporarily unused buffers long enough for them to get picked up again. When the load subsides, it's fine to let the remaining 100 buffers be GCed; they can be allocated again quickly when the next wave hits. At the same time it keeps the memory load reasonable in the quiet periods.

No offense, but such things quickly turn into your own memory manager.
Wasn't this GC stuff supposed to free (pun intended) me from this?

 [1] http://www.squeaksource.com/ajp

Cheers
Philippe