Hi!
Apparently, without configuring anything, the Pharo image cannot go over 500 MB. Can this limit be moved up to, let's say, 4 GB?

Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
|
It depends on the OS. The 500 MB limit should only apply to Windows; Unix and Mac should allow more (around 2 GB). Search the list archives for a thread between Tudor and Igor about a Windows VM with more than 500 MB for Moose.

cheers

On Wed, Nov 23, 2011 at 6:45 PM, Alexandre Bergel <[hidden email]> wrote:
> [...]

--
Mariano
http://marianopeck.wordpress.com
|
It is problematic, and requires different memory management than we currently have. I think if you need really big data sets, then use GemStone, which was developed to deal with that specifically.

On 24 November 2011 00:55, Mariano Martinez Peck <[hidden email]> wrote:
> [...]

--
Best regards,
Igor Stasenko.
|
I'm wondering: what does a dataset of more than 500 MB look like? I have no idea how big that really is.
Alex, what is your use case (in practice!) for more than 500 MB?

On 23/11/11 18:25, Igor Stasenko wrote:
> [...]
|
I guess Alex is talking about a Moose image. With Moose we easily get to quite large images with millions of objects.

For Windows, we indeed had a problem with the VM that Igor fixed. At that time, the problem appeared in images of over 200 MB that could not be reopened (they could be built, but once closed could not be reopened). I do not know what the status is with 500 MB.

But I also guess that Alex refers to the default memory values used when running the VM. Is that correct, Alex?

Cheers,
Doru

On 24 Nov 2011, at 07:27, Francois Stephany wrote:
> [...]

--
www.tudorgirba.com

"Speaking louder won't make the point worthier."
|
Francois Stephany wrote:
> I'm wondering: how big is a dataset > 500MB ? I've no idea how big it is.
> Alex, what is your use case (in practice!) for more than 500MB?

We are doing data conversion with Moose. The raw data is 740 MB. Having no 64-bit support means being forced to make decisions early, and therefore wrongly. It also forces us to do some batch processing. We're very lucky the customer has a good idea of what data to keep.

Stephan
|
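The batch workflow Stephan mentions could be sketched like this in Pharo. This is only an illustrative sketch: `MyImporter` and `MyExtractor` are hypothetical placeholder classes, not actual Moose API.

```smalltalk
"Hypothetical batch-conversion sketch: process one raw file at a time,
so only one partial model is ever live in the image."
| results |
results := OrderedCollection new.
'rawdata' asFileReference files do: [ :each |
    | model |
    model := MyImporter importFile: each.          "hypothetical importer"
    results add: (MyExtractor factsFrom: model).   "keep only what we need"
    "model is no longer referenced after this point,
     so the GC can reclaim it before the next file is loaded" ].
```

This keeps peak memory close to the size of one input file plus the accumulated results, at the cost of not being able to query across all files interactively.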
Having a reification in Moose of 100 versions of Mondrian, for example :-)

Just answering the question "Which classes and methods of Mondrian have changed more than 10 times since the day Mondrian was born?" cannot easily be done without a lot of memory.

Alexandre

On 24 Nov 2011, at 03:27, Francois Stephany wrote:
> [...]

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
|
> But, I also guess that Alex refers to the default values for memory when running the VM. Is that correct Alex?
Yes.

Alexandre

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
|
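For what it's worth, the Unix Squeak-derived VMs accept the maximum heap size on the command line. The exact flag names and limits vary per VM build and platform, so the invocation below is an assumption to check against your VM's own help output:

```shell
# Flags differ per VM build; check what your VM actually supports first.
squeak -help

# On the Unix Squeak VM, -memory sets the maximum heap size,
# e.g. a 1 GB heap for a large Moose image:
squeak -memory 1000m my-moose.image
```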
I think you are looking for a solution in the wrong direction. Just ask yourself how much of that data you need to keep in operating memory at a single moment in time to compute results efficiently. If today you need to deal with >500 MB data sets, tomorrow you may need to deal with multi-gigabyte datasets, which can easily surpass the amount of memory your computer has.

I know it is easier to find a cheap solution, without spending time implementing your own data caching scheme, but you are just delaying the inevitable. With things like Fuel, I think it won't take too much effort to do it.

On 24 November 2011 14:06, Alexandre Bergel <[hidden email]> wrote:
> [...]

--
Best regards,
Igor Stasenko.
|
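Fuel's convenience API makes the swapping scheme Igor describes quite short. A minimal sketch, assuming Fuel is loaded in the image; `FLSerializer` and `FLMaterializer` are Fuel's actual entry points, while the model object and file name here are just stand-ins:

```smalltalk
"Sketch: push a sub-model out to disk and bring it back on demand,
so the whole dataset never has to sit in the image at once."
| model path |
model := Object new.            "stand-in for one version's Moose model"
path := 'Mondrian-42.fuel'.

"Serialize the model once it has been built, then drop the reference..."
FLSerializer serialize: model toFileNamed: path.
model := nil.                   "the GC can now reclaim the in-image copy"

"...and materialize it again only when a query actually needs it."
model := FLMaterializer materializeFromFileNamed: path.
```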
As Stephan put it, the current system forces us to make early decisions, which are likely to be wrong on some point.

Since there is no other option, we are thinking very hard about what we really need. We are implementing ad-hoc caching as you suggested, but this clearly puts some strong constraints on what can be done.

Alexandre

On 24 Nov 2011, at 13:13, Igor Stasenko wrote:
> [...]

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
|
On 24/11/11 05:05, Alexandre Bergel wrote:
> [...]

Oh ok, I didn't have Moose in mind ;)
|
Much larger images can be supported. This may be of interest:
<http://lists.squeakfoundation.org/pipermail/vm-dev/2010-November/005731.html>

In practical use, the current garbage collector will probably be the limiting factor.

Dave

On Thu, Nov 24, 2011 at 01:29:26PM -0300, Alexandre Bergel wrote:
> [...]
|
Alex
Two remarks:
- What is the level of reification? We reify a lot, and some of it can be discarded.
- Did you check the Orion model? With Orion you only represent the delta, and that is a big difference.

Stef

On Nov 24, 2011, at 2:05 PM, Alexandre Bergel wrote:
> [...]
|
Igor Stasenko wrote:
> I think you looking for solution in a wrong direction.
> Just ask yourself, how much of that data you need to keep in operative
> memory at single moment of time
> to efficiently compute results.

All of it. And then of course the annotations and the resulting output model. I know practically nothing about the data at first (about 700 files, about 700 MB). When trying to understand the data, it is crucial that I can test hypotheses fast. Efficiency is about my time, not computer time.

> If today you need to deal with >500Mb data sets,
> tomorrow you may need to deal with multigigabyte datasets, which can
> easily surpass
> the amount of operative memory your computer has.

16 GB DIMMs are about 250 euro. For a commercial project it is easy to justify half a terabyte of RAM.

> I know, it is easier to find cheap solution, without spending time
> implementing own
> data caching scheme, but you just delaying inevitable.

Delaying is crucial. If I wait long enough, PCs will have enough memory.

Stephan
|
On 25 November 2011 01:28, Stephan Eggermont <[hidden email]> wrote:
> Igor Stasenko wrote:
>> [...]
>
> 16 GB DIMMS are about 250 Euro. For a commercial project it is
> easy to justify half a terabyte of ram.

Ok, then how about investing a bit to make sure VMs could conserve that much memory? :)

--
Best regards,
Igor Stasenko.
|
On 25.11.2011 at 00:45, Igor Stasenko wrote:
> [...]
> ok. then how about investing a bit to make sure VMs could conserve
> that much memory? :)

A very good reason to bring 64bit back into focus, don't you think?

Norbert
|
On 25 Nov 2011, at 10:31, Norbert Hartl wrote:
> A very good reason to bring 64bit back into focus, don't you think?

I too think that big images are important: as others have said, RAM is cheap, so there is no reason not to use it. One should not get sloppy and waste memory, so good engineering and algorithms are very important, but if someone wants to take huge amounts of data into RAM for whatever reason, that should be possible. Being able to use a lot of memory amplifies the power of Smalltalk.

At least a 32-bit VM should offer a usable memory space for the image as close as possible to its theoretical maximum. And indeed, next is a 64-bit VM/image combination.

Sven
|