"I first used image persistence but the image grow too large.."
thanks for reminding, Norbert. Hi all, perhaps you'll find me an idiot, I don't care. (i know that I am: (class SelfReflection :o) ) Nevertheless, some brainstorming here. I've brought this up before, but it seems nobody is interested in this topic.. why? it's about image size. Currently, Pharo 1.3-13315 at my 4 Gigabyte Debian amd64 Linux machine reports memory 75,392,160 bytes old 61,370,144 bytes (81.4%) young 2,169,692 bytes (2.9000000000000004%) used 63,539,836 bytes (84.30000000000001%) free 11,852,324 bytes (15.700000000000001%) (nothing peculiar added, just the Seaside package and a few classes of my own. ) So, if i understand this well I have only 11,8 megabytes to play with? To begin with: why is only 75 megabyte available in Pharo (or Squeak as well) when there is 4 gigabyte physical memory available on my machine?? Anyway, IMHO this very limited storage violates the Smalltalk-all-in-one image ideal, because with this limited storage one cannot avoid the use of external data stores like databases (of course, exception: there always will be import an export of objects/data) External data storage is complex, unreliable and should be something of the past. I'd suggest: don't waste time with this. rather improve the Smalltalk internal data handling. I wish to have a very large image so e.q. I can have an entire company's administration (client and accounting data etc.) alive in it. I want to use Collections and descendants, not external databases. So, everything within the image. The beautiful (although opinions may vary) principle of a Smalltalk environment is Image Persistence, isn't it? IMHO this requires: - virtually unlimited image memory size. -no external databases like e.g. Mongo or DB2 whatever. -no persistent external storage at all. -You should see the (currently still) faster database performance as a challenge to improve the Collection and other related data handling classes -You would have to completely restructure the architecture. 
or just extend the address width: 64 bits? what about an image of virtually unlimited size > gigabytes so (for now, if it goes beyond the physical memory size (say 32 GB?) one could use a VM that works with virtual storage based VM Virtual storage was first in use successfully with IBM mainframes in the 1970s (and still is) ) on the other hand, I estimate holographic? physical memory going in Terabytes is about to be available in 5-10 years from now. blindingly fast, compared with today's standards. So, one has about 5 years to come up with such a system :o) if one does, Smalltalk is ahead of everything then.. Don't laugh. One didn't even dream of 4 gigabytes of memory during the Commodore 64 era. 1982-1994. In short: what would have to be changed to enable this? Some thoughts and speculations? Anyone for tennis? Thanks Ted -- Ted F.A. van Gaalen Danketsweiler 608 D-88263 Horgenzell Germany T: +49 750 491 48 38 M: +49 151 587 862 47 |
On 28 Feb 2012, at 23:25, ted f.a. van gaalen wrote:

> it's about image size.
>
> Currently, Pharo 1.3-13315 at my 4 Gigabyte Debian amd64 Linux machine reports
>
> memory 75,392,160 bytes
> old    61,370,144 bytes (81.4%)
> young   2,169,692 bytes (2.9%)
> used   63,539,836 bytes (84.3%)
> free   11,852,324 bytes (15.7%)
>
> (nothing peculiar added, just the Seaside package and a few classes of my own.)
>
> So, if i understand this well I have only 11,8 megabytes to play with?
>
> To begin with: why is only 75 megabyte available in Pharo (or Squeak as well)
> when there is 4 gigabyte physical memory available on my machine??

You are interpreting this wrongly: the total memory allocation of the heap has not yet grown beyond 75 MB because you still have 11 MB free, so there was no need for it. See this thread:

http://forum.world.st/Big-Image-Tests-td4188045.html#a4188548

Although you cannot allocate much more than 1 GB, that is already more than enough for a lot of purposes. Of course, 64-bit VMs / Smalltalk images would be helpful in growing even larger; there seems to be some progress there.

Sven
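Sven's point, that the heap expands on demand rather than grabbing all physical memory up front, can be sketched as a toy model. This is an illustrative Python sketch, not Pharo's actual allocator; the class name, growth increment, and 1 GB ceiling are assumptions for demonstration only.

```python
# Toy model of a demand-grown heap: the arena only expands when an
# allocation would not fit in the current free space, which is why a
# mostly-idle image stays small even on a 4 GB machine.

class DemandGrownHeap:
    def __init__(self, initial=16, growth=16, limit=1024):
        self.capacity = initial   # MB currently reserved from the OS
        self.used = 0             # MB handed out to objects
        self.growth = growth      # MB added per expansion
        self.limit = limit        # hard ceiling (~1 GB on a 32-bit VM)

    @property
    def free(self):
        return self.capacity - self.used

    def allocate(self, mb):
        # Grow only while the request exceeds the free space.
        while self.free < mb:
            if self.capacity + self.growth > self.limit:
                raise MemoryError("heap limit reached")
            self.capacity += self.growth
        self.used += mb

heap = DemandGrownHeap()
heap.allocate(60)
print(heap.capacity, heap.free)  # grew just enough: 64 4
```

With 60 MB allocated, capacity stops at 64 MB: exactly Ted's situation, where "only 75 MB" simply means the image never needed more yet.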
Am 28.02.2012 um 23:25 schrieb ted f.a. van gaalen:

> In short: what would have to be changed to enable this?

You can read the LOOM paper [1] or you can have a look at GemStone [2].

> Some thoughts and speculations?

I think that the gap between volatile memory and persistent memory is only a necessity, nothing natural. The gap will be closed somewhere in the future. Having a LOOM architecture helps mitigate the effect of the gap, if not make it unnoticeable. From then on you just need to buy the newest things like SSDs to make it happen sooner.

> Anyone for tennis?

Sure. Didn't play for a long time, and over 500 km is a bit far off, don't you think? :)

Norbert
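The LOOM idea Norbert mentions (a large object store on disk, with a small resident object memory and proxies that fault objects in on first touch) can be caricatured in a few lines. This is a deliberately simplified Python sketch, not the actual LOOM design; the names `ObjectStore`, `Leaf`, and the integer "oop" keys are all invented for illustration.

```python
import pickle

class ObjectStore:
    """Backing store: pickled objects keyed by an object pointer (oop)."""
    def __init__(self):
        self._disk = {}
    def put(self, oop, obj):
        self._disk[oop] = pickle.dumps(obj)
    def get(self, oop):
        return pickle.loads(self._disk[oop])

class Leaf:
    """Proxy standing in for a non-resident object."""
    def __init__(self, oop, store):
        self._oop = oop
        self._store = store
        self._resident = None   # nothing faulted in yet

    def _fault(self):
        # Bring the real object into resident memory on first touch.
        if self._resident is None:
            self._resident = self._store.get(self._oop)
        return self._resident

    def __getattr__(self, name):
        # Only called for names the proxy itself lacks: forward them.
        return getattr(self._fault(), name)

    def __getitem__(self, key):
        return self._fault()[key]

store = ObjectStore()
store.put(1, {"name": "Acme", "balance": 1200})
customer = Leaf(1, store)          # cheap: no disk access yet
print(customer["name"])            # faults the object in: Acme
```

The point is that the image only pays memory for what it actually touches, which is how an "image" much larger than RAM becomes thinkable.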
On 2/28/12 5:44 PM, Norbert Hartl wrote:
I was interested in this as well. After some googling for the LOOM paper, I discovered it's in the "Bits of History" book:

http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/BitsOfHistory.pdf
Hi Ted,
I also support the image-as-sole-persistence idea, and I have actually been using it for years on VisualWorks.

In Pharo it seems everyone is avoiding this idea, mostly because of the fear of image corruption. Well, we need to improve the robustness of the Pharo image and VM; then this fear will vanish. It is obviously doable: if VW is reliable enough, why not Pharo one day?

Best regards
Janko

ted f.a. van gaalen wrote:
> [Ted's original message quoted in full - snipped]

--
Janko Mivšek
Aida/Web Smalltalk Web Application Server
http://www.aidaweb.si
On Feb 29, 2012, at 12:59 PM, Janko Mivšek wrote:

> Hi Ted,
>
> I also support the image as a sole persistence idea and I'm actually
> using it for years on VisualWorks.
>
> For Pharo there seems everyone is avoiding this idea, mostly because of
> image corruption fear. Well, we need to improve the robustness of Pharo
> image and VM then this fear will vanish. It is obviously doable, if VW
> is reliable enough, why not once Pharo?

Someone needs to start fixing and improving. Why not, for example, you? Here is a link:

http://code.google.com/p/pharo/issues/list

It's much better than talking.

Marcus

--
Marcus Denker -- http://marcusdenker.de
On Wed, Feb 29, 2012 at 1:06 PM, Marcus Denker <[hidden email]> wrote:
> Someone needs to start fixing and improving. Why not, for example, you?

Funny, in the last few days I was also inclined to post about how Pharo needs to stabilize the VM at least a bit, and the reason I did not was that I expected an answer along these lines.

Now, I understand and am grateful for all the effort that goes into improving Pharo, and also that a helping hand is much more needed than people instructing others on what should be done. But I do not think it is good that many observations about Pharo get quickly dismissed that way. For instance, what would someone who tries Pharo, gets a VM crash, complains, and gets the "go hack the VM yourself" answer think about this whole Smalltalk business? "Smalltalk - I am going to use that, yeah right."

So yes, complaints can be complete noise, but sometimes they can also be a small contribution. Orders of magnitude smaller than committing code, for sure, but still valuable.

Davorin Rusevljan
http://www.cloud208.com/
Incidentally, we had a little chat with Marcus yesterday about that.
No, I don't think it is feasible to use a single image to store everything. It is convenient, cheap, and of course way better than dealing with communicating with external DBs/servers and whatnot. But there's one thing you should know already: the days of vertical growth are over.

Running a service (under a VM or not) on a single machine is asking for trouble:
- limits on load
- susceptibility to power outages and other reliability problems
- etc.

Also, consider that the amount of data you need to process correlates with the CPU horsepower available. Which means that yes, you can run a huge image with 64 GB of data in it.. but that means that the responsiveness of your service will quite often fall beyond any usability limits.

If we look at the VM and pick only one thing - garbage collection - you will see that there are certain limits beyond which performance drops too much, so you will naturally start thinking about ways to split the data into separate chunks and run them on different machines/VMs. This is because the GC's mark algorithm is O(n) bound, where n is the total number of references between objects, and the GC's scavenge algorithm is at best O(n) bound where n is the total number of objects in object memory, and at worst O(n) where n is the total memory used by objects. No matter how you turn it, I just wanted to indicate that the time to run the GC depends linearly on the amount of data.

Yes, we might invest a lot of effort in making the GC more clever, more complex and more robust.. but no matter what you do, you cannot change the above facts. It means that any improvements will be about diminishing returns, and won't change the picture radically.

That means that sooner or later you will have to deal with it: the problem of splitting the data into multiple independent chunks, and making your service run on multiple machines, in order to use more CPU power, have more memory, be more reliable, etc.

At this point, your main challenge is to invent fast and robust interfaces to communicate between images, or between image(s) and databases. We should concentrate on things dealing with inter-image communication and image-database communication, because that is the only way to ensure that we will answer upcoming problems. Relying on a single huge image is a road to nowhere.

--
Best regards,
Igor Stasenko.
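Igor's O(n) argument can be illustrated with a minimal mark phase. This is a toy Python sketch, not the real Pharo/Squeak collector: objects are graph nodes, references are edges, and the mark loop visits every live reference once, so the work grows linearly with the live data.

```python
# Toy mark phase of a mark-sweep GC. Marking touches every reachable
# reference exactly once, so mark time scales linearly with the size
# of the live object graph - ten times the objects, ten times the work.

def mark(roots, references):
    """references: dict mapping object id -> list of referenced ids."""
    marked = set()
    visits = 0                      # units of work performed
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        for ref in references.get(obj, ()):
            visits += 1             # one unit of work per reference
            stack.append(ref)
    return marked, visits

def chain(n):
    # A chain of n objects: 0 -> 1 -> ... -> n-1.
    return {i: [i + 1] for i in range(n - 1)}

_, small = mark([0], chain(1_000))
_, big = mark([0], chain(10_000))
print(small, big)  # 999 9999
```

No cleverness in the traversal changes this: every live reference must be looked at, which is why a 64 GB image implies proportionally long GC pauses.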
A good compromise is a step-by-step approach, something like:

1st step: image-based persistence up to 1 GB, with hourly snapshots
2nd step: parts migrated to Fuel and file-based persistence
3rd step: GemStone, with "images" running in parallel (well, any DB with images in parallel; GemStone is certainly the easiest to scale to from image-based persistence)

The 1 GB limit is here just for simplicity; you can probably go further with 64-bit images.

Advantages:
- very, very simple start
- freedom of pure OO modeling
- good enough for probably 90% of all projects
- fastest way from your dreams to reality
- speed of development
- no impedance mismatch, no ORM nightmare
- you won't believe how much data you can put in a 1 GB image
- speed, because of always-in-memory data processing
- you can always scale further if you design from the start with the above steps in mind
- reliability good enough; on reliable hardware probably even better than more complex solutions. Main reason: simplicity.

Disadvantages:
- it is easy to forget to include later scalability requirements in the upfront design
- such scaling is easy only to an OO database, while migrating to a NoSQL (not to mention SQL) database later is very hard if not impossible
- up to about 1 GB only, because of the GC problems Igor described
- limit on active users (number of requests/s)
- single point of failure: a corrupted image will lose all data (but a good backup approach helps)
- fear of undetected image corruption (after many otherwise successful snapshots, causing a non-startable image)
- lengthy snapshots of bigger images (can be improved with two-step snapshots: first in memory, then to disk)
- loss of data between snapshots in case of power or machine failure (but this is very rare these days)

Best regards
Janko

Igor Stasenko wrote:
> [Igor's message quoted in full - snipped]

--
Janko Mivšek
Aida/Web Smalltalk Web Application Server
http://www.aidaweb.si
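Janko's step 1 (the whole model in memory, persisted by periodic snapshots) and his "two-step snapshot" mitigation can be sketched as follows. This is an illustrative Python sketch, not Pharo's image writer; the function names and the pickle format are assumptions for demonstration only.

```python
# Snapshot-based persistence: all state lives in ordinary in-memory
# collections; a periodic snapshot makes it durable. The two-step idea
# is mimicked here: serialize to memory first (fast), then write to a
# temporary file and atomically swap it in, so a crash mid-write never
# corrupts the last good snapshot.

import os
import pickle
import tempfile

def snapshot(state, path):
    blob = pickle.dumps(state)                  # step 1: in memory
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(blob)                       # step 2: to disk
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)                   # atomic swap
    except BaseException:
        os.unlink(tmp)
        raise

def restore(path):
    with open(path, "rb") as f:
        return pickle.loads(f.read())

# Usage: the "company administration" is plain collections.
accounts = {"acme": {"balance": 1200}, "initech": {"balance": -50}}
path = os.path.join(tempfile.gettempdir(), "image.snapshot")
snapshot(accounts, path)
assert restore(path) == accounts
```

The atomic-rename step is what addresses the "corrupted image loses all data" fear for the snapshot file itself, though it does nothing for the data-loss window between snapshots.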