Nearly limitless Image: revisited.


Nearly limitless Image: revisited.

TedVanGaalen
"I first used image persistence but the image grew too large…"
Thanks for the reminder, Norbert.


Hi all,

Perhaps you'll find me an idiot; I don't care.
(I know that I am: (class SelfReflection :o))

Nevertheless, some brainstorming here.

I've brought this up before, but it seems nobody is interested in this
topic. Why?

It's about image size.

Currently, Pharo 1.3-13315 on my 4 GB Debian amd64 Linux machine
reports:

memory        75,392,160 bytes
     old      61,370,144 bytes (81.4%)
     young     2,169,692 bytes (2.9%)
     used     63,539,836 bytes (84.3%)
     free     11,852,324 bytes (15.7%)

(Nothing peculiar added, just the Seaside package and a few classes of
my own.)

So, if I understand this correctly, I have only 11.8 megabytes to play
with?

To begin with: why are only 75 megabytes available in Pharo (or Squeak,
for that matter) when there are 4 gigabytes of physical memory on my
machine?

Anyway, IMHO this very limited storage violates the
Smalltalk-all-in-one-image ideal, because with it one cannot avoid the
use of external data stores like databases (with one exception, of
course: there will always be import and export of objects/data).

External data storage is complex and unreliable, and should be a thing
of the past. I'd suggest: don't waste time on it; rather, improve
Smalltalk's internal data handling.

I wish to have a very large image so that, e.g., I can have an entire
company's administration (client and accounting data etc.) alive in it.
I want to use Collections and their descendants, not external databases.
So, everything within the image.

The beautiful (although opinions may vary) principle
of a Smalltalk environment is Image Persistence, isn't it?

IMHO this requires:
- virtually unlimited image memory size;
- no external databases like Mongo, DB2, or whatever;
- no persistent external storage at all.

- You should see the (currently still) faster database performance as a
challenge to improve the Collection classes and other related
data-handling classes.
- You would have to completely restructure the architecture, or just
extend the address width: 64 bits?

What about an image of virtually unlimited size, far beyond gigabytes?

(For now, if it grows beyond the physical memory size (say 32 GB?), one
could use a VM that works with virtual storage. Virtual storage was
first used successfully on IBM mainframes in the 1970s, and still is.)

On the other hand, I estimate that (holographic?) physical memory in
the terabyte range will be available 5-10 years from now: blindingly
fast compared with today's standards.

So, one has about 5 years to come up with such a system :o)
If one does, Smalltalk will be ahead of everything.

Don't laugh. One didn't even dream of 4 gigabytes of memory during the
Commodore 64 era (1982-1994).

In short: what would have to be changed to enable this?

Some thoughts and speculations?
Anyone for tennis?

Thanks
Ted






--
Ted F.A. van Gaalen
Danketsweiler 608
D-88263 Horgenzell
Germany
T: +49 750 491 48 38
M: +49 151 587 862 47



Re: Nearly limitless Image: revisited.

Sven Van Caekenberghe

On 28 Feb 2012, at 23:25, ted f.a. van gaalen wrote:

> it's about image size.
>
> Currently,  Pharo 1.3-13315 at my 4 Gigabyte Debian amd64 Linux machine reports
>
> memory            75,392,160 bytes
>    old            61,370,144 bytes (81.4%)
>    young        2,169,692 bytes (2.9%)
>    used        63,539,836 bytes (84.3%)
>    free        11,852,324 bytes (15.7%)
>
> (nothing peculiar added, just the Seaside package and a few classes of my own. )
>
> So, if i understand this well I have only 11,8 megabytes to play with?
>
> To begin with: why is only 75 megabyte available in Pharo (or Squeak as well)
> when there is 4 gigabyte physical memory available on my machine??

You are interpreting this wrongly: the total memory allocation of the heap has not yet grown beyond 75 MB because you still have 11 MB free, so there was no need for it.
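You can check this directly from a workspace. The following is a minimal sketch assuming the classic Squeak/Pharo 1.x selectors (`Smalltalk garbageCollect`, which answers the number of free bytes after a full collection, and `SmalltalkImage current vmStatisticsReportString`); exact selectors may vary between versions:

```smalltalk
"Force a full collection; garbageCollect answers the free bytes afterwards.
 The VM only grows the heap when this number gets too small."
| freeBytes |
freeBytes := Smalltalk garbageCollect.
Transcript
    show: 'free after full GC: ';
    show: freeBytes printString;
    show: ' bytes';
    cr.

"The old/young/used/free report quoted above comes from here:"
Transcript show: SmalltalkImage current vmStatisticsReportString.
```

So the 75 MB figure is not a ceiling, just the current high-water mark of on-demand allocation.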

See this thread: http://forum.world.st/Big-Image-Tests-td4188045.html#a4188548

Although you cannot allocate much more than 1 GB, that is already more than enough for a lot of purposes.

Of course, 64-bit VMs / Smalltalk images would help in growing even larger; there seems to be some progress there.

Sven

Re: Nearly limitless Image: revisited.

NorbertHartl

On 28 Feb 2012 at 23:25, ted f.a. van gaalen wrote:

> In short: what would have to be changed to enable this?

You can read the LOOM paper [1] or you can have a look at GemStone [2]

> Some thoughts and speculations?

Yes. Predictions about how long it will take for something technical to become reality always err on the short side. :)

I think the gap between volatile memory and persistent memory is only a necessity, nothing natural. The gap will be closed somewhere in the future. Having a LOOM architecture helps mitigate the effect of the gap, if not make it unnoticeable. From then on you just need to buy the newest things, like SSDs, to make it happen sooner.
 
> Anyone for tennis?

Sure. I haven't played for a long time, though, and over 500 km is a bit far off, don't you think? :)

Norbert

[1] http://dl.acm.org/citation.cfm?id=94112

Re: Nearly limitless Image: revisited.

David Graham
On 2/28/12 5:44 PM, Norbert Hartl wrote:


You can read the LOOM paper [1] or you can have a look at GemStone [2]

I was interested in this as well. After some googling for the LOOM paper, I discovered it's in "Smalltalk-80: Bits of History, Words of Advice" (the green book), pages 251-271.

http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/BitsOfHistory.pdf

Re: Nearly limitless Image: revisited.

Janko Mivšek
Hi Ted,

I also support the image-as-sole-persistence idea, and I have actually
been using it for years on VisualWorks.

In Pharo everyone seems to be avoiding this idea, mostly out of fear of
image corruption. Well, we need to improve the robustness of the Pharo
image and VM; then this fear will vanish. It is obviously doable: if VW
is reliable enough, why not Pharo one day?

Best regards
Janko



--
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si


Re: [Pharo-project] Nearly limitless Image: revisited.

Marcus Denker-4

On Feb 29, 2012, at 12:59 PM, Janko Mivšek wrote:

> Hi Ted,
>
> I also support the image as a sole persistence idea and I'm actually
> using it for years on VisualWorks.
>
> For Pharo there seems everyone is avoiding this idea, mostly because of
> image corruption fear. Well, we need to improve the robustness of Pharo
> image and VM then this fear will vanish. It is obviously doable, if VW
> is reliable enough, why not once Pharo?
>
Someone needs to start fixing and improving. Why not, for example, you?

Here is a link:

        http://code.google.com/p/pharo/issues/list

It's much better than talking.

        Marcus


--
Marcus Denker -- http://marcusdenker.de



Re: [Pharo-project] Nearly limitless Image: revisited.

drush66
On Wed, Feb 29, 2012 at 1:06 PM, Marcus Denker <[hidden email]> wrote:
> Someone needs to start fixing and improving. Why not, for example, you?

Funny, in the last few days I was also inclined to post about how Pharo
needs to stabilize its VM at least a bit, and the reason I did not was
that I expected an answer along these lines.

Now, I understand and am grateful for all the effort that goes into
improving Pharo, and I also understand that a helping hand is needed
much more than people instructing others on what should be done.

But I also do not think it is good that many observations about Pharo
get quickly dismissed that way. For instance, what would someone who
tries Pharo, gets a VM crash, complains, and gets the "go hack the VM
yourself" answer think about this whole Smalltalk business?

"Smalltalk - I am going to use that, yeah right."

So yes, complaints can be complete noise, but sometimes they can also
be a small contribution. Orders of magnitude smaller than committing
code, for sure, but still valuable.

Davorin Rusevljan
http://www.cloud208.com/


Re: [Pharo-project] Nearly limitless Image: revisited.

Igor Stasenko
Incidentally, we had a little chat with Marcus yesterday about that.

No, I don't think it is feasible to use a single image to store everything.
It is convenient and cheap, and of course way better than dealing with
communication with external databases, servers, or whatever.

But there's one thing you should know by now: the days of vertical
growth are over.

Running a service (under a VM or not) on a single machine is asking for
trouble:
 - limits on load
 - susceptibility to power outages and other reliability problems
 - etc.

Also, consider that the amount of data you need to process correlates
with the CPU horsepower available. Which means that yes, you can run a
huge image with 64 GB of data in it, but the responsiveness of your
service will then quite often fall beyond any usability limits.

If we look at the VM and pick just one thing, garbage collection, you
will see that there are certain limits beyond which performance drops
too much, so you will naturally start thinking about ways to split the
data into separate chunks and run them on different machines/VMs.

This is because the GC's mark algorithm is O(n), where n is the total
number of references between objects, and the GC's scavenge algorithm
is O(n), where n is at best the total number of objects in object
memory and at worst the total memory used by objects. However you turn
it, the time to run the GC depends linearly on the amount of data.
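To make the linearity concrete, here is a toy sketch of a mark phase over a hypothetical object graph (a Dictionary mapping symbols to the symbols they reference); the counter shows that the work done equals the number of references followed, so doubling the references doubles the marking work:

```smalltalk
"Toy mark phase: visits grows linearly with the number of references."
| graph marked stack visits |
graph := Dictionary new.
graph
    at: #root put: #(a b);
    at: #a put: #(b c);
    at: #b put: #(c);
    at: #c put: #().
marked := Set new.
stack := OrderedCollection with: #root.
visits := 0.
[stack isEmpty] whileFalse: [
    | node |
    node := stack removeLast.
    (marked includes: node) ifFalse: [
        marked add: node.
        (graph at: node) do: [:ref |
            visits := visits + 1.
            stack add: ref]]].
visits  "5: one step per reference (edge) in this 4-node graph"
```

A real collector is far more sophisticated, but the traversal at its core has exactly this shape.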

Yes, we might invest a lot of effort in making the GC more clever, more
complex, and more robust, but no matter what you do, you cannot change
the above facts. Any improvements will yield diminishing returns and
won't change the picture radically.

That means that sooner or later you will have to deal with the problem
of splitting data into multiple independent chunks and making your
service run on multiple machines, in order to use more CPU power and
more memory and to be more reliable. At that point, your main challenge
is to invent fast and robust interfaces for communication between
images, or between image(s) and database(s).

We should concentrate on inter-image communication and image-database
communication, because that is the only way to ensure that we can
answer the problems of the future. Relying on a single huge image is a
road to nowhere.

--
Best regards,
Igor Stasenko.


Re: Nearly limitless Image: revisited.

Janko Mivšek
A good compromise is a step-by-step approach, something like:

  Step 1: image-based persistence up to 1 GB, with hourly snapshots
  Step 2: parts migrated to Fuel and file-based persistence
  Step 3: GemStone, with "images" running in parallel (well, any DB
          with images in parallel; GemStone is certainly the easiest
          to scale to from image-based persistence)

The 1 GB limit is here just for simplicity; you can probably go further
with 64-bit images.
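Steps 1 and 2 can be sketched in a few lines of Pharo. This is an illustration, not production code: it assumes the standard `Smalltalk snapshot:andQuit:` API and Fuel's class-side convenience methods (`FLSerializer serialize:toFileNamed:`, `FLMaterializer materializeFromFileNamed:`), and `myAccountingRecords` is a hypothetical placeholder for your own data:

```smalltalk
"Step 1: a background process that snapshots the running image every
 hour, at a low priority so it does not disturb request handling."
[[ (Delay forSeconds: 60 * 60) wait.
   Smalltalk snapshot: true andQuit: false ] repeat]
    forkAt: Processor userBackgroundPriority.

"Step 2: move a big collection out of the image with Fuel, and
 materialize it again on demand (myAccountingRecords is hypothetical)."
FLSerializer serialize: myAccountingRecords toFileNamed: 'accounting.fuel'.
myAccountingRecords := FLMaterializer materializeFromFileNamed: 'accounting.fuel'.
```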

Advantages:

  - a very, very simple start
  - freedom of pure OO modeling
  - good enough for probably 90% of all projects
  - the fastest way from your dreams to reality
  - speed of development: no impedance mismatch, no ORM nightmare
  - you won't believe how much data you can put in a 1 GB image
  - speed, because of always-in-memory data processing
  - you can always scale further if you design from the start with the
    above steps in mind
  - reliability good enough; on reliable hardware probably even better
    than more complex solutions (the main reason: simplicity)

Disadvantages:

  - it is easy to forget to include later scalability requirements in
    the upfront design
  - such scaling is easy only toward an OO database, while migrating to
    a NoSQL (not to mention SQL) database later is very hard if not
    impossible
  - up to about 1 GB only, because of the GC problems Igor described
  - a limit on active users (number of requests/s)
  - a single point of failure
  - a corrupted image will lose all data (but a good backup approach
    helps)
  - fear of undetected image corruption (after many otherwise
    successful snapshots, causing a non-startable image)
  - lengthy snapshots of bigger images (can be improved with two-step
    snapshots: first in memory, then on disk)
  - loss of data between snapshots in case of a power or machine
    failure (but this is very rare these days)

Best regards
Janko


--
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si