Hi
I'm trying to load data from a SQL Server database (hundreds of thousands of rows) into Heaps with ODBC/FFI, and I noticed that most of the time is spent in incremental garbage collection (about 80 to 90% of the running time of the load process!).

I will look at the ODBCResultSet implementation to limit IdentityDictionary/Row allocation by working with preallocated arrays, but this will solve only one of my problems.

I was wondering if there is a way to limit incremental collections by running them only after a certain amount of memory has been allocated. I found setGCBiasToGrowGCLimit in SystemDictionary (Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024), but it doesn't work and pops up an "a primitive has failed" error. Is it the right method?

Another question about garbage collection is the overhead of the loaded data in objects for the VM (hundreds of MB): is there a way to know if incremental collection is bloated by that data, or to know when it is moved to old space?

Any pointers, ideas or links are welcome.

Thanks

Regards,
Alain
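[Editor's note: one way the preallocated-array idea mentioned above might look. All selectors here (#columnCount, #atEnd, #nextRowInto:, #processRow:) are hypothetical placeholders, not the stock ODBC package API; as Alain notes, the stock ODBCResultSet allocates a fresh dictionary/Row per fetch, which is exactly what this sketch avoids. resultSet is assumed to be an already-opened ODBCResultSet.]

  | buffer |
  buffer := Array new: resultSet columnCount.    "one slot per column, reused for every row"
  [resultSet atEnd] whileFalse:
      [resultSet nextRowInto: buffer.            "hypothetical: fill the buffer in place instead of allocating a new Row"
       self processRow: buffer]                  "consume the values before the next fetch overwrites them"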
You need to use a VM that supports setGCBiasToGrowGCLimit. Which VM are you using?

Also, to turn it on you need to do:

  Smalltalk setGCBiasToGrow: 1.

Other GC tuning values are below. The values given here have no particular meaning for your application; they may make it better, or may make it worse.

  Smalltalk vmParameterAt: 5 put: 8000.            "do an incremental GC after this many allocations"
  Smalltalk vmParameterAt: 6 put: 4000.            "tenure when more than this many objects survive the GC"
  Smalltalk vmParameterAt: 25 put: 24*1024*1024.   "grow headroom"
  Smalltalk vmParameterAt: 24 put: 48*1024*1024.   "shrink threshold"

> setGCBiasToGrowGCLimit in SystemDictionary (Smalltalk
> setGCBiasToGrowGCLimit: 16*1024*1024). but it doesn't work and pops up
> an "a primitive has failed" error. Is it the right method?

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
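[Editor's note: putting John's two points together, a minimal enabling sequence would look roughly like the following, assuming a VM recent enough to implement both primitives; on an older VM the second send fails with "a primitive has failed", which matches the error Alain saw.]

  Smalltalk setGCBiasToGrow: 1.                    "turn the bias-to-grow logic on first"
  Smalltalk setGCBiasToGrowGCLimit: 16*1024*1024.  "force a full GC once the heap has grown past roughly 16 MB"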
In reply to this post by Alain rastoul
This will probably sound like a cop-out, but are you sure you need to be loading hundreds of thousands of rows? If you are using an RDBMS anyway, I would move as much processing as possible to the DB.

I don't know that you are doing this, but in my professional experience I see a lot of people pulling lots of rows like this and then doing all kinds of post-processing on them. If one is going to do that, then the overhead of having an RDBMS isn't worth it. There are lots of ways to persist data.
In reply to this post by johnmci
Hi John, thank you very much for your answer.

Sorry for not responding earlier, but my internet connection was down until now.

With 32MB as parameter 5, 16MB as parameter 6 and 32MB as parameter 25, the time spent in incremental GC was 50%, and I was able to load 500k rows in 220 seconds.

The VM I'm using is a standard 3.9; I'll try a 3.10 soon.

Best regards
Alain
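[Editor's note: for anyone wanting to reproduce this kind of measurement, a sketch using the standard read-only VM statistics. Parameter 9 is the number of incremental GCs since startup and parameter 10 their total milliseconds; #loadRows is a placeholder standing in for the actual ODBC load loop, and the parameter values are the ones Alain reports above.]

  | gcMsBefore totalMs gcMsAfter |
  Smalltalk vmParameterAt: 5 put: 32*1024*1024.
  Smalltalk vmParameterAt: 6 put: 16*1024*1024.
  Smalltalk vmParameterAt: 25 put: 32*1024*1024.
  gcMsBefore := Smalltalk vmParameterAt: 10.            "ms spent in incremental GC so far"
  totalMs := Time millisecondsToRun: [self loadRows].   "placeholder for the real load"
  gcMsAfter := Smalltalk vmParameterAt: 10.
  Transcript show: 'incremental GC: ', (gcMsAfter - gcMsBefore) printString,
      ' ms of ', totalMs printString, ' ms total'; cr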
In reply to this post by Jason Johnson-5
Hi Jason
I've been working with SQL Server every day for years; part of my job includes helping developers or consultants rewrite badly performing queries (query plans, etc.). For some customers we set up cubes with Analysis Services, and for that we do not use the RDBMS directly; it would be too much load and far from usable.

And about your question: yes, of course I'm sure I need to load a lot of rows. In fact I hope to load not just hundreds of thousands but millions of rows (one hundred million would be fine :) ). I don't know if that will be possible with Squeak without tackling some issues, but today I find it good for quick prototyping and exploration (in this case about hashing, cardinalities and computations); it's not at all about how to persist data.

Whatever, thank you for taking the time to answer.

Regards
Alain
Ok, sounds like you know what you're doing then. In that case, yes, it would be good for prototyping and so on.

I didn't mean my response to be unhelpful, but I also didn't want to be person (a) from this question: http://weblogs.asp.net/alex_papadimoulis/archive/2005/05/25/408925.aspx