experience with large images?


experience with large images?

abergel
Hi!

Apparently, without configuring anything, a Pharo image cannot grow beyond 500 MB.
Can this limit be raised to, let's say, 4 GB?

Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: experience with large images?

Mariano Martinez Peck
It depends on the OS. The 500 MB limit should be for Windows.
Unix and Mac should allow more (around 2 GB).
Search the list archives for a thread between Tudor and Igor about a Windows VM with more than 500 MB for Moose.

cheers


--
Mariano
http://marianopeck.wordpress.com


Re: experience with large images?

Igor Stasenko
It is problematic, and it requires different memory management than we
currently have.
I think if you really need big data sets, then use GemStone, which was
developed specifically to deal with that.


--
Best regards,
Igor Stasenko.


Re: experience with large images?

fstephany
I'm wondering: how big is a dataset > 500 MB? I have no idea how big that really is.
Alex, what is your use case (in practice!) for more than 500 MB?



Re: experience with large images?

Tudor Girba-2
I guess Alex is talking about a Moose image.

With Moose we can easily get quite large images with millions of objects.

For Windows, we indeed had a problem with the VM that Igor fixed. At that time, the problem appeared in images of over 200 MB: they could be built, but once closed they could not be reopened. I do not know what the status is with 500 MB.

But I also guess that Alex is referring to the default memory values used when running the VM. Is that correct, Alex?

Cheers,
Doru



--
www.tudorgirba.com

"Speaking louder won't make the point worthier."



Re: experience with large images?

Stephan Eggermont-3
In reply to this post by abergel
Francois Stephany wrote:
> I'm wondering: how big is a dataset > 500MB ? I've no idea how big it is.
> Alex, what is your use case (in practice!) for more than 500MB?

We are doing data conversion with Moose. The raw data is 740 MB.
Having no 64-bit support means being forced to make decisions early, and therefore wrongly.
It also forces us to do some batch processing.

We're very lucky the customer has a good idea of what data to keep.

Stephan


Re: experience with large images?

abergel
In reply to this post by fstephany
Having a reification in Moose of 100 versions of Mondrian, for example :-)

Just answering the question 'Which classes and methods of Mondrian have changed more than 10 times since the day Mondrian was born?' cannot easily be done without a lot of memory.

Alexandre



--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
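As an illustration of the kind of query Alexandre describes, here is a minimal Pharo sketch. It only uses the standard collection protocol; model, #allMethods and #numberOfChanges are hypothetical placeholders for whatever the loaded history model actually provides, not real Moose API:

    "model stands for the reified history (e.g. 100 Mondrian versions);
     #allMethods and #numberOfChanges are assumed accessors on that model."
    | frequentlyChanged |
    frequentlyChanged := model allMethods
        select: [ :each | each numberOfChanges > 10 ].
    frequentlyChanged collect: [ :each | each name ]

Keeping 100 reified versions reachable so that such a one-liner can be answered interactively is exactly what pushes the image toward the memory limit discussed here.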







Re: experience with large images?

abergel
In reply to this post by Tudor Girba-2
> But, I also guess that Alex refers to the default values for memory when running the VM. Is that correct Alex?

Yes.

Alexandre


--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: experience with large images?

Igor Stasenko
I think you are looking for a solution in the wrong direction.
Just ask yourself how much of that data you need to keep in operative
memory at a single moment in time to compute the results efficiently.
If today you need to deal with >500 MB data sets,
tomorrow you may need to deal with multi-gigabyte data sets, which can
easily surpass the amount of operative memory your computer has.

I know it is easier to find a cheap solution, without spending time
implementing your own data caching scheme, but you are just delaying the inevitable.

With things like Fuel, I think it won't take too much effort to do it.


--
Best regards,
Igor Stasenko.
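To make Igor's suggestion concrete, here is a minimal sketch of that kind of ad-hoc caching, assuming Fuel's class-side convenience methods serialize:toFileNamed: and materializeFromFileNamed:; partOfModel and the file name are just placeholders for whatever subgraph you can afford to push out of the image:

    "Push a subgraph of the model out to disk and drop the in-image reference."
    FLSerializer serialize: partOfModel toFileNamed: 'mondrian-versions-01-50.fuel'.
    partOfModel := nil.
    Smalltalk garbageCollect.

    "Later, bring it back into memory only when it is actually needed."
    partOfModel := FLMaterializer materializeFromFileNamed: 'mondrian-versions-01-50.fuel'.

The trade-off Igor points at is exactly this: the serialization and materialization cost buys you a working set that stays well below the image limit.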


Re: experience with large images?

abergel
As Stephan put it, the current system forces us to make early decisions, which are likely to be wrong at some point.

Since there is no other option, we are thinking very hard about what we really need.
We are implementing ad-hoc caching as you suggested, but this clearly puts some strong constraints on what can be done.

Alexandre



--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: experience with large images?

fstephany
In reply to this post by abergel


On 24/11/11 05:05, Alexandre Bergel wrote:
> Having a reification in Moose of 100 versions of Mondrian for example :-)
>
> Just answering the question 'Which classes and methods of Mondrian have changed more than 10 times since the day Mondrian was born?' cannot be easily done without a lot of memory
>
> Alexandre

Oh OK, I didn't have Moose in mind ;)



Re: experience with large images?

David T. Lewis
In reply to this post by abergel
Much larger images can be supported. This may be of interest:

  <http://lists.squeakfoundation.org/pipermail/vm-dev/2010-November/005731.html>

In practical use, the current garbage collector will probably be the
limiting factor.

Dave
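As a rough way to see how much headroom an image actually has, and what a full collection costs, one can evaluate something like the following in a workspace; #garbageCollect answers the number of free bytes after a full GC, and whether #timeToRun answers milliseconds or a Duration depends on the Pharo version:

    "Measure free space and the cost of a full garbage collection."
    | freeBytes elapsed |
    elapsed := [ freeBytes := Smalltalk garbageCollect ] timeToRun.
    Transcript
        show: 'Free space after full GC: ', freeBytes printString, ' bytes'; cr;
        show: 'Full GC took: ', elapsed printString; cr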



Re: experience with large images?

Stéphane Ducasse
In reply to this post by abergel
Alex

two remarks:
        - what is the level of reification? We reify a lot, and some of it can be discarded.
        - did you check the Orion model? With Orion you only represent the delta, and that is a big difference.

Stef





Re: experience with large images?

Stephan Eggermont-3
In reply to this post by abergel
Igor Stasenko wrote:

> I think you looking for solution in a wrong direction.
> Just ask yourself, how much of that data you need to keep in operative
> memory at single moment of time
> to efficiently compute results.

All of it. And then of course the annotations and resulting output model.
I know practically nothing about the data at first (about 700 files,
about 700 MB). When trying to understand the data, it is crucial that I can
test hypotheses fast. Efficiency is about my time, not computer time.

> If today you need to deal with >500Mb data sets,
> tomorrow you may need to deal with multigigabyte datasets, which can
> easily surpass
> the amount of operative memory your computer has.

16 GB DIMMs are about 250 euro. For a commercial project it is
easy to justify half a terabyte of RAM.

> I know, it is easier to find cheap solution, without spending time
> implementing own
> data caching scheme, but you just delaying inevitable.

Delaying is crucial. If I wait long enough, PCs will have enough memory.

Stephan



Re: experience with large images?

Igor Stasenko
On 25 November 2011 01:28, Stephan Eggermont <[hidden email]> wrote:

> Igor Stasenko wrote:
>
>> I think you looking for solution in a wrong direction.
>> Just ask yourself, how much of that data you need to keep in operative
>> memory at single moment of time
>> to efficiently compute results.
>
> All of it. And then of course the annotations and resulting output model.
> I know practically nothing about the data at first (about 700 files,
> about 700 MB). When trying to understand the data, it is crucial that I can
> test hypotheses fast. Efficiency is about my time, not computer time.
>
>> If today you need to deal with >500Mb data sets,
>> tomorrow you may need to deal with multigigabyte datasets, which can
>> easily surpass
>> the amount of operative memory your computer has.
>
> 16 GB DIMMS are about 250 Euro. For a commercial project it is
> easy to justify half a terabyte of ram.
>
OK, then how about investing a bit to make sure the VM can handle
that much memory? :)




--
Best regards,
Igor Stasenko.


Re: experience with large images?

NorbertHartl

On 25.11.2011, at 00:45, Igor Stasenko wrote:

> OK, then how about investing a bit to make sure the VM can handle
> that much memory? :)
>
A very good reason to bring 64bit back into focus, don't you think?

Norbert




Re: experience with large images?

Sven Van Caekenberghe

On 25 Nov 2011, at 10:31, Norbert Hartl wrote:

> A very good reason to bring 64bit back into focus, don't you think?

I too think that big images are important: as others have said, RAM is cheap, so there is no reason not to use it.

One should not get sloppy and waste memory, so good engineering and algorithms remain very important, but if someone wants to keep huge amounts of data in RAM for whatever reason, that should be possible.

Being able to use a lot of memory amplifies the power of Smalltalk.

At least a 32-bit VM should offer a usable memory space for the image that is as close as possible to its theoretical maximum.

And indeed, the next step is a 64-bit VM/image combination.

Sven