Q: incremental garbage collection overhead


Q: incremental garbage collection overhead

Alain rastoul
Hi

I'm trying to load data from a SQL Server database (hundreds of thousands of rows)
into Heaps with ODBC/FFI, and I noticed that most of the time is spent in
incremental garbage collection (about 80 to 90% of the running time of the
load process!).
I will look at the ODBCResultSet implementation to limit
IdentityDictionary/Row allocation by working with preallocated arrays, but
this will solve only one of my problems.

I was wondering if there is a way to limit incremental collections by
running them only after a certain amount of memory has been allocated. I found
setGCBiasToGrowGCLimit in SystemDictionary (Smalltalk
setGCBiasToGrowGCLimit: 16*1024*1024), but it doesn't work and pops up a
"primitive has failed" error. Is it the right method?

Another question about garbage collection is the overhead of the loaded data
(hundreds of MB of objects) for the VM: is there a way to tell whether the
incremental collector is being slowed down by repeatedly scanning that data,
or to know when it is tenured to old space?
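
For measuring this, here is a minimal sketch that reads the Squeak VM's GC
statistics before and after a load. It assumes the standard vmParameterAt:
slot numbering (10 = total ms spent in incremental GC, 11 = count of objects
tenured to old space) and a hypothetical loadRows workload:

| gcMsBefore tenuresBefore totalMs |
gcMsBefore := Smalltalk vmParameterAt: 10.
tenuresBefore := Smalltalk vmParameterAt: 11.
totalMs := Time millisecondsToRun: [self loadRows].  "loadRows is a placeholder for the actual load"
Transcript
    show: 'incremental GC: ',
        ((Smalltalk vmParameterAt: 10) - gcMsBefore) printString,
        ' ms of ', totalMs printString, ' ms'; cr;
    show: 'objects tenured to old space: ',
        ((Smalltalk vmParameterAt: 11) - tenuresBefore) printString; cr.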

Any pointers, ideas, or links are welcome.

Thanks

Regards,
Alain





Re: Q: incremental garbage collection overhead

johnmci

You need to use a VM that supports setGCBiasToGrowGCLimit.
Which VM are you using?

Also, to turn it on you need to do:

Smalltalk setGCBiasToGrow: 1.

Other GC tuning values are below. The values given have no particular
meaning for your application; they may make things better or worse.

Smalltalk vmParameterAt: 5 put: 8000.  "do an incremental GC after this many allocations"
Smalltalk vmParameterAt: 6 put: 4000.  "tenure when more than this many objects survive the GC"

Smalltalk vmParameterAt: 25 put: 24*1024*1024.  "grow headroom"
Smalltalk vmParameterAt: 24 put: 48*1024*1024.  "shrink threshold"
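
Taken together with the failing call from the original post, the full do-it
would look like this (a sketch; it assumes a VM whose primitives implement
both selectors, otherwise the same primitive failure occurs):

"Bias the VM to grow the heap instead of GCing, then cap the growth
(in bytes) after which a full GC is forced anyway."
Smalltalk setGCBiasToGrow: 1.
Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024.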


> setGCBiasToGrowGCLimit in SystemDictionary (Smalltalk
> setGCBiasToGrowGCLimit: 16*1024*1024), but it doesn't work and pops up a
> "primitive has failed" error. Is it the right method?

--
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================




Re: Q: incremental garbage collection overhead

Jason Johnson-5
In reply to this post by Alain rastoul
This will probably sound like a cop-out, but are you sure you need to
be loading hundreds of thousands of rows? If you are using an RDBMS
anyway, I would move as much processing as possible to the DB.

I don't know whether you are doing this, but in my professional
experience I see a lot of people pulling lots of rows like this and
then doing all kinds of post-processing on them. If one is going to
do that, then the overhead of having an RDBMS isn't worth it. There
are lots of ways to persist data.


Re: Q: incremental garbage collection overhead

Alain rastoul
In reply to this post by johnmci
Hi John, thank you very much for your answer.

Sorry for not responding earlier, but my internet connection was down until
now.
With 32 MB as parameter 5, 16 MB as parameter 6, and 32 MB as parameter 25,
the time spent in incremental GC was 50%, and I was able to load 500k rows in
220 sec.
The VM I'm using is a standard 3.9; I'll try a 3.10 soon.
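
Expressed as the tuning do-it from John's message, that corresponds roughly
to the following (a sketch; it assumes the "32 MB"/"16 MB" figures are the
raw values passed, even though slots 5 and 6 count allocations and surviving
objects rather than bytes):

"Hypothetical reconstruction of the settings described above."
Smalltalk vmParameterAt: 5 put: 32*1024*1024.   "allocations between incremental GCs"
Smalltalk vmParameterAt: 6 put: 16*1024*1024.   "tenure threshold: surviving objects"
Smalltalk vmParameterAt: 25 put: 32*1024*1024.  "grow headroom in bytes"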

Best regards
Alain

"John M McIntosh" <[hidden email]> a écrit dans le message
de news: [hidden email]...

>
> You need to use a VM that supports setGCBiasToGrowGCLimit
> Which VM are you using?
>
> Also to turn it on you need to do
> Smalltalk setGCBiasToGrow: 1.
>
> Other GC tuning values are below. The values given below have no  meaning
> for your application and may make it better, may make it worse.
>
> Smalltalk vmParameterAt: 5 put: 8000.  "do an  incremental GC after  this
> many allocations"
> Smalltalk vmParameterAt: 6 put: 4000.  "tenure when more  than this  many
> objects survive the GC"
>
> Smalltalk vmParameterAt: 25 put: 24*1024*1024. "grow headroom"
> Smalltalk vmParameterAt: 24 put: 48*1024*1024.  "shrink threshold"
>
>
>> setGCBiasToGrowGCLimit in SystemDictionnary (Smalltalk
>> setGCBiasToGrowGCLimit: 16*1024*1024). but it doesn't work and popup  a
>> "a
>> primitive has failed" error. Is it the right method ?
>
> --
> = = =
> ========================================================================
> John M. McIntosh <[hidden email]>
> Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
> = = =
> ========================================================================
>
>
>
>





Reply | Threaded
Open this post in threaded view
|

Re: Q: incremental garbage collection overhead

Alain rastoul
In reply to this post by Jason Johnson-5
Hi Jason

I've been working with SQL Server every day for years; part of my job
includes helping developers or consultants rewrite poorly performing queries
(query plans, etc.). For some customers we set up cubes with Analysis
Services and do not use the RDBMS directly; that would be too much load and
far too slow to be usable.

And about your question: yes, of course I'm sure I need to load a lot of
rows. In fact I hope to load not just hundreds of thousands but millions of
rows (one hundred million would be fine :) ). I don't know if it will be
possible with Squeak without tackling some issues, but today I find it good
for quick prototyping and exploration (in this case about hashing,
cardinalities and computations...); it's not at all about how to persist
data.

In any case, thank you for taking the time to answer.

Regards
Alain

"Jason Johnson" <[hidden email]> a écrit dans le message de
news: [hidden email]...

> This will probably sound like a cop-out, but are you sure you need to
> be loading hundreds of thousands of rows?  If you are using an RDBMS
> anyway, I would move as much processing as possible to the DB.
>
> I don't know that you are doing this, but in my professional
> experience I see a lot of people pulling lots of rows like this and
> then doing all kinds of post processing on them.  If one is going to
> do that then the overhead of having an RDBMS isn't worth it.  There
> are lots of ways to persist data.
>
> On Nov 21, 2007 9:23 PM, alain rastoul <[hidden email]> wrote:
>> Hi
>>
>> I'm trying to load data from a sql server database (hundred thousands
>> rows)
>> into Heaps with ODBC/FFI and I noticed that most of the time is spent in
>> incremental garbage collection (about 80 to 90% of the running time of
>> the
>> load process!).
>> I will look at the ODBCResultSet implementation to limit
>> IdentityDictionnary/Row allocation by working with preallocated arrays
>> but
>> this will solve only one of my problems.
>>
>> I was wondering if there is a way to limit incremental collections by
>> running them only when a certain amount of memory was allocated, I found
>> setGCBiasToGrowGCLimit in SystemDictionnary (Smalltalk
>> setGCBiasToGrowGCLimit: 16*1024*1024). but it doesn't work and popup a "a
>> primitive has failed" error. Is it the right method ?
>>
>> Another question about garbage collection is the overhead of loaded data
>> in
>> objects for the VM (hundred MB) : is there a way to know if incremental
>> collection is bloated by those data or to know when they are moved to old
>> space ?
>>
>> Any pointers, ideas or links are welcome
>>
>> Thanks
>>
>> Regards,
>> Alain
>>
>>
>>
>>
>>
>
>




Reply | Threaded
Open this post in threaded view
|

Re: Q: incremental garbage collection overhead

Jason Johnson-5
OK, it sounds like you know what you're doing, then. In that case, yes,
it would be good for prototyping and so on. I didn't mean my response to
be unhelpful, but I also didn't want to be person (a) from this
question:

http://weblogs.asp.net/alex_papadimoulis/archive/2005/05/25/408925.aspx
