Hello Squeak VM Guys,

My name is Louis LaBrunda. I use Instantiations VA Smalltalk but dabble with Squeak from time to time.

I have an outside-the-box way of implementing an object database for Smalltalk, and I would like to see if anyone here is interested in implementing it. I understand the theory behind Smalltalk VMs (at least I think I do) but would face a steep learning curve to actually modify one. This idea doesn't require inventing or improving any technology, but it does require changes to the VM.

For the purpose of describing this idea, I will deal with only one database and not go into binding to the database and other details like transaction processing. These things are of course important, but I think they can be handled in fairly standard ways that should not be changed by this means of implementing the object database.

The idea is that the VM would treat the database file much like a CPU chip treats RAM, and would treat its own (the VM's) memory like a CPU chip treats its internal (on-chip) cache. There would be a means of linking the data in memory to the data in the database similar to the link between a CPU chip's cache and RAM.

As I said, I'm not very knowledgeable about the internal workings of Smalltalk VMs, so much of what I am about to say is guesswork, but I think it is accurate. Objects represented in the memory of a Smalltalk VM probably take up about 12 bytes or so on 32-bit systems, more on 64-bit systems. Many of these bytes are bits that define the class. Some of the bytes might be the value of the object if it is, say, a small integer, a byte or a character. If the data (value) of the object is larger than will fit in a few bytes, there is a pointer to the data. If the object has instance variables, which are of course other objects, there are pointers to them.

A bit would be needed to indicate a persisted object and probably another bit to indicate the object is dirty (changed and therefore no longer matching the database file copy). Objects with the persisted bit off would otherwise look and be treated the same as they are now. Objects with the persisted bit on would have all their pointers replaced with offsets from the beginning of the database file (a single file containing all the persisted objects). All objects pointed to by a persisted object must also be persisted objects.

When the VM comes across a persisted object it would use the pointers (that are now offsets within the database file) as keys into a lookup table (hash table) to find the real pointer to the data in memory. If the item is found in the lookup table, the value is used as it would have been if it were in the object, and everything proceeds as usual. If the item is not found in the lookup table, the offset into the database file is used to read the object from the database. The lookup table would then be updated to include the new item.

As far as I can tell, the copies of the object in memory and in the database file can be identical (no object dumper/loader serialization). There may need to be a little bit of a wrapper in the database file, but I don't think much. This should make for very quick loading and saving of objects.

Probably some objects, like blocks of code, can't or shouldn't be saved to the database (I'm not sure if this is true for Squeak). But I don't think that is any different from systems that use object dumper/loader serialization.

I think a low-priority fork could run through the lookup table looking for objects with the dirty bit set and save them to the database file. A #persist (or some other good name) method could be added to Object to force the saving of an object to the database. This would probably be implemented with a primitive, but maybe not.

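The lookup scheme described in the paragraphs above can be sketched roughly as follows. This is an illustrative Python model, not VM code and not part of the original proposal: the record layout (a 4-byte slot count followed by 8-byte offsets), the `-1` value standing in for nil, and the names `PersistedObject`, `ObjectCache`, `resolve`, `store` and `flush` are all assumptions made for the example.

```python
import io
import struct

HEADER = struct.Struct("<I")  # hypothetical record header: number of slots
SLOT = struct.Struct("<q")    # each slot holds a file offset
NIL = -1                      # assumed marker for "no object"


class PersistedObject:
    """In-memory copy of a record; its slots hold file offsets, not pointers."""

    def __init__(self, offset, slots):
        self.offset = offset
        self.slots = list(slots)
        self.dirty = False    # stands in for the proposed dirty bit


class ObjectCache:
    """Maps database file offsets to objects that are already in memory."""

    def __init__(self, db_file):
        self.db = db_file
        self.table = {}       # the lookup table: file offset -> object
        self.faults = 0       # how many times the file had to be read

    def append(self, slots):
        """Write a new record at the end of the file and return its offset."""
        offset = self.db.seek(0, io.SEEK_END)
        self.db.write(HEADER.pack(len(slots)))
        for slot in slots:
            self.db.write(SLOT.pack(slot))
        return offset

    def resolve(self, offset):
        """Return the in-memory object for an offset, reading it on a miss."""
        obj = self.table.get(offset)
        if obj is None:
            self.faults += 1
            self.db.seek(offset)
            (count,) = HEADER.unpack(self.db.read(HEADER.size))
            slots = [SLOT.unpack(self.db.read(SLOT.size))[0]
                     for _ in range(count)]
            obj = PersistedObject(offset, slots)
            self.table[offset] = obj   # update the lookup table
        return obj

    def store(self, obj, index, target_offset):
        """Change a slot and mark the object as differing from the file."""
        obj.slots[index] = target_offset
        obj.dirty = True

    def flush(self):
        """Write dirty objects back; returns how many were written."""
        written = 0
        for obj in self.table.values():
            if obj.dirty:
                self.db.seek(obj.offset + HEADER.size)
                for slot in obj.slots:
                    self.db.write(SLOT.pack(slot))
                obj.dirty = False
                written += 1
        return written


# Usage: two records in an in-memory "database file".
db = io.BytesIO()
cache = ObjectCache(db)
leaf = cache.append([NIL])
root = cache.append([leaf, NIL])
root_obj = cache.resolve(root)               # miss: read from the file
leaf_obj = cache.resolve(root_obj.slots[0])  # miss: read from the file
same_obj = cache.resolve(root_obj.slots[0])  # hit: found in the lookup table
```

In this model `flush` plays the role of the low-priority process, and calling it directly corresponds to the proposed #persist. A real VM would also have to handle objects that change size, which this fixed-size write-back ignores.
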
There may be some changes needed for garbage collection to keep the lookup table up to date, but I don't think that will be a big deal. Hopefully garbage collection for the database file could be handled mostly by Smalltalk code with the help of a few primitives.

Well, that's it for now. I hope this has been an interesting read and not a waste of your time. If you think the idea has merit, let me know and we can discuss it further.

Thank you very much for your time.

Lou
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:[hidden email] http://www.Keystone-Software.com
|
Hello and welcome. I think you'll find the Squeak VM to be quite adaptable to experiments like this. All the code for the object memory is written in Smalltalk (actually a limited subset of Smalltalk), so it is quite accessible and relatively easy to modify.

If you have not already done so, try loading the VMMaker package from SqueakSource, and read the class comment of ObjectMemory for a description of the object memory organization and header formats (I'm not sure how familiar you are with Squeak at this point, so ask some questions if this is not clear). Also, read the "Back to the Future" paper for general background:

http://ftp.squeak.org/docs/OOPSLA.Squeak.html

Dave

On Tue, Dec 29, 2009 at 10:10:11AM -0500, Louis LaBrunda wrote:
|
In reply to this post by Louis LaBrunda
OOZE and LOOM by Ted Kaehler et al. did this kind of thing. Here's a link to the 1981 article on OOZE: http://www-cs-students.stanford.edu/~eswierk/misc/kaehler81/

It mentions LOOM, but doesn't go into detail. I think the more detailed LOOM paper(s) are in the ACM digital library.
- Stephen

On Tue, Dec 29, 2009 at 10:10 AM, Louis LaBrunda <[hidden email]> wrote:
|
Stephen Pair wrote:
> OOZE and LOOM by Ted Kaehler, et al did this kind of thing. Here's a link to the 1981 article on OOZE: http://www-cs-students.stanford.edu/~eswierk/misc/kaehler81/
> It mentions LOOM, but doesn't go into detail...I think the more detailed LOOM paper(s) are in the ACM digital library.

There are some papers, but chapter 14 of the "green book" is probably the best place to learn about LOOM. The book is available at

http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/

Though LOOM (and OOZE) are actually virtual memory systems rather than databases, if you don't define what you mean by "database" then they are probably good enough for most uses. Going more in the direction of industry standard databases, Gemstone is a great example of what can be done in Smalltalk. This paper (which I can't read right now) probably has some information about it:

http://portal.acm.org/citation.cfm?id=125223.125254

-- Jecel
|
In reply to this post by Stephen Pair
Hi Stephen,

Thanks for the reference. OOZE and maybe LOOM (I couldn't see much about LOOM) seem to be virtual memory for objects, a way to expand the size of memory. I'm talking about an object database built with virtual memory ideas. I know databases can also serve to expand the size of memory, but I'm interested in their persistence feature, not in making memory look bigger.

In my scheme, the lookup table is used to find persisted (database-only) objects in memory. Non-database objects are NOT in the lookup table. Other than the time it takes to test whether an object is persisted (a bit that indicates it is in the database), processing of non-database objects is normal. Database objects need a little more work. If they are in the lookup table, they are easily found in memory. If they are not in the lookup table, they can be read from the database and the lookup table updated.

Lou

> OOZE and LOOM by Ted Kaehler, et al did this kind of thing. Here's a link to the 1981 article on OOZE: http://www-cs-students.stanford.edu/~eswierk/misc/kaehler81/
> It mentions LOOM, but doesn't go into detail...I think the more detailed LOOM paper(s) are in the ACM digital library.
>
> - Stephen
|
On Tue, Dec 29, 2009 at 11:56 AM, Louis LaBrunda <[hidden email]> wrote:
> Hi Stephen,

Yes, I know, but you will find that you will face many of the same issues that OOZE and LOOM dealt with. I actually implemented a system much like what you are describing in Squeak a number of years ago. I used BerkeleyDB as my object storage. It was possible to connect multiple Squeak processes to a common database. There was a transactional system that let me track changes to disk-based objects and commit them. You could work with disk-based objects transparently.

Working with the Squeak VM was challenging in the sense that it is all very highly tuned for optimal memory use, fast GC, etc. I had to perform a lot of system tracing to transform Squeak images to my new object memory layout, etc. To fault in objects quickly, I had to implement a fast become operation. Since Squeak has no object table, I implemented a forwarder capability that would transform any object into a forwarder to another object by setting a header bit, then using the class pointer to point to the target object (which then necessitated doing away with the compact class header format). GC would sweep the forwarders away when it ran. IIRC, I managed to do this with something like a 10% performance and memory hit.

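The forwarder mechanism described above can be modelled in a few lines. This is only a Python illustration of the idea as stated in the post (a header bit plus reuse of the class pointer), not the actual implementation, which is not shown in the thread; the names `Obj`, `follow`, `become_forward` and `sweep_forwarders` are hypothetical.

```python
class Obj:
    """Toy heap object: a header flag, a class pointer, and reference slots."""

    def __init__(self, cls, slots=()):
        self.is_forwarder = False    # stands in for the header bit
        self.class_or_target = cls   # class pointer, or target when forwarding
        self.slots = list(slots)


def follow(ref):
    """Chase forwarders until a real (non-forwarding) value is reached."""
    while isinstance(ref, Obj) and ref.is_forwarder:
        ref = ref.class_or_target
    return ref


def become_forward(old, new):
    """Fast one-way become: turn `old` into a forwarder to `new` in place.

    No references are scanned or rewritten here, which is what makes it fast.
    """
    old.is_forwarder = True
    old.class_or_target = new
    old.slots = []


def sweep_forwarders(roots):
    """GC-time pass: rewrite reachable slots so they skip over forwarders."""
    seen = set()
    stack = [follow(root) for root in roots]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        for index, ref in enumerate(obj.slots):
            if isinstance(ref, Obj):
                target = follow(ref)
                obj.slots[index] = target
                stack.append(target)


# Usage: a stub standing in for a disk-based object gets faulted in.
stub = Obj("Stub")
holder = Obj("Holder", [stub, 42])
account = Obj("Account", [7])
become_forward(stub, account)        # holder still points at the stub...
loaded = follow(holder.slots[0])     # ...but following it reaches the account
sweep_forwarders([holder])           # afterwards the slot points directly
```

The cost in this model is the extra check in `follow` on every reference, which may be the kind of overhead behind the performance hit mentioned above; the sweep removes the indirection for objects that survive a collection.
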
I got to a point where I realized I also needed to be able to persist classes and move them among different Squeak images that might have different versions of like-named classes and so forth (so you get into namespace issues). I eventually ran out of steam and abandoned the project. Croquet was also just starting up at the time, so I felt they would eventually solve many of these issues.

With that experience, I now believe you really need a new language (one that deals with namespace and security issues à la Newspeak) and COLA-like (VPRI research) VM architectures that are easily customized to explore things like this. I'm hoping such things are not that far off.
- Stephen |
In reply to this post by Stephen Pair
Hi Jecel,

> Stephen Pair wrote:
>> OOZE and LOOM by Ted Kaehler, et al did this kind of thing. Here's a link to the 1981 article on OOZE: http://www-cs-students.stanford.edu/~eswierk/misc/kaehler81/
>> It mentions LOOM, but doesn't go into detail...I think the more detailed LOOM paper(s) are in the ACM digital library.
>
> There are some papers, but chapter 14 of the "green book" is probably
> the best place to learn about LOOM. The book is available at
>
> http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/
>
> Though LOOM (and OOZE) are actually virtual memory systems rather than
> databases, if you don't define what you mean by "database" then they are
> probably good enough for most uses. Going more in the direction of
> industry standard databases, Gemstone is a great example of what can be
> done in Smalltalk. This paper (which I can't read right now) probably
> has some information about it:
> http://portal.acm.org/citation.cfm?id=125223.125254
> -- Jecel

I don't know a lot about Gemstone or how it is implemented, but an object database is what I am trying to achieve. In VA Smalltalk there is Voss from Logicarts http://voss.logicarts.com/ and Tenacity from TotallyObjects http://www.totallyobjects.com/tenacity.htm. I think both are very good, especially Voss. Both are written in Smalltalk without modification of the VM. I think both use proxy objects to link the object in memory with the database. I believe they save/read objects to/from the database (made up of very many small files) with an object dumper/loader.

My idea (if it can work) uses one file (or at least very few) and doesn't use an object dumper/loader. I think this may make things faster and simpler. By simpler, I mean much less Smalltalk code, no proxy objects, and easier backup of the database (since it is just one file). Not using the object dumper/loader may require more work if an object definition changes.
Lou
|
On Tue, Dec 29, 2009 at 02:33:36PM -0500, Louis LaBrunda wrote:
>
> I don't know a lot about Gemstone or how it is implemented but an object
> database is what I am trying to achieve. In VA Smalltalk there is Voss
> from Logicarts http://voss.logicarts.com/ and Tenacity from TotallyObjects
> http://www.totallyobjects.com/tenacity.htm. I think both are very good
> especially Voss. Both are written is Smalltalk without modification of the
> VM. I think both use proxy objects to link the object in memory with the
> database. I believe they save/read objects to/from the database (made up
> of very many small files) with an object dumper/loader.

You will also want to have a look at Magma, which is written in Squeak:

http://wiki.squeak.org/squeak/2665

Dave
|