Documentation about Gemstone internal info :)


Mariano Martinez Peck
Hi. I am just curious about some details of the GemStone implementation. Please forgive me if the questions are not appropriate. I read the Programmer's Guide but I didn't find the answers.

So, basically I would like to know some details of the VM and Object Memory, for example:

1) I know that GemStone uses object tables (OTs). With them, I can think of several advantages:
   - a fast #become, since it just changes one pointer instead of doing a full scan
   - when using proxies/stubs you will probably need a #become right afterwards
   - GemStone uses stubs a lot (I guess)
   - having an OT may help when implementing a kind of virtual memory/swapping/paging system (like GemStone's)
   - split the OT into "regions" and place them there?

Of course not everything is rosy: performance can suffer considerably, since you pay, for example, for one extra access (the indirection through the OT), and probably for much more.

So, I wonder, what were GemStone's reasons for deciding to use an object table?

2) Is there one object table per Gem or per Stone?

3) Memory addresses. LOOM, for example, used different address sizes for disk (32 bits) and RAM (16 bits). I guess you could do the same with 32/64 bits, or use 64 bits for both. How do you do it? If they are not the same, how do you map from one to the other?

4) I guess GemStone uses stubs a lot when moving objects from primary memory to secondary memory. You presumably don't create a stub instance (with an object header and so on) because of the size, so you could mark stubs directly in the address/reference. So... can you mark an address as a stub? I guess yes. Then how do you know where on disk the stubbed object is? Can you encode that information in the same reference? (If you use 64 bits for both, I guess so.)

5) How do you track object usage? I mean, the shared object cache can move objects from disk to primary memory and back. How does it select which objects to move? How do you know which objects have been used frequently?

Thanks in advance,

Mariano

PS: don't worry, I won't create my own GemStone, hahaha. I just want to learn :)
Also, if the info is private I completely understand, or you can send me a private email.

Re: Documentation about Gemstone internal info :)

James Foster-8
Hi Mariano,

Last year I did a half-day tutorial in Buenos Aires that covered many of these questions. You can watch the videos by following links at http://programminggems.wordpress.com/2010/02/05/scaling-objects-videos/. In particular, see the second video on "Object format and pointers" that discusses exactly these issues. Other comments follow...

On Jan 6, 2011, at 3:34 AM, Mariano Martinez Peck wrote:

> Hi. I am just curious about some details of the Gemstone implementation. Please forgive me if the questions are not appropriate. I read the Programmer Guide but I didn't find the answers.
>
> So, basically I would like to know some details of the VM and Object Memory, for example:
>
> 1) I know that GemStone uses object tables (OTs). With them, I can think of several advantages:
>    - a fast #become, since it just changes one pointer instead of doing a full scan
>    - when using proxies/stubs you will probably need a #become right afterwards
>    - GemStone uses stubs a lot (I guess)
>    - having an OT may help when implementing a kind of virtual memory/swapping/paging system (like GemStone's)
>    - split the OT into "regions" and place them there?
>
> Of course not everything is rosy: performance can suffer considerably, since you pay, for example, for one extra access (the indirection through the OT), and probably for much more.
>
> So, I wonder, what were GemStone's reasons for deciding to use an object table?

I believe that there are two main reasons. The first is to support multiple database views (discussed below in #2). The second is to facilitate repository-wide garbage collection (GC). With an in-memory GC in a traditional Smalltalk VM (e.g., Squeak), when the VM moves an object it has to scan memory and fix all the pointers to it. With a disk-based object space of hundreds of GB, such a scan could take hours and would be impractical; with an OT, moving an object only requires updating its table entry.
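
To make the indirection argument concrete, here is a minimal Python sketch. It is not GemStone code: the `ObjectTable` class, its method names, and the (page, offset) locations are all hypothetical, chosen only to show why relocating an object is cheap when referrers hold an ID instead of an address.

```python
class ObjectTable:
    """Hypothetical model: maps stable object IDs to current storage locations."""

    def __init__(self):
        self._location = {}   # object ID -> (page, offset)
        self._next_id = 1

    def allocate(self, location):
        """Register a new object and return its stable ID."""
        oid = self._next_id
        self._next_id += 1
        self._location[oid] = location
        return oid

    def move(self, oid, new_location):
        """Relocate an object: exactly one table entry changes."""
        self._location[oid] = new_location

    def locate(self, oid):
        return self._location[oid]


table = ObjectTable()
leaf = table.allocate((10, 0))

# Three referrers store the ID, never the address.
referrers = [{"child": leaf} for _ in range(3)]

# A GC pass moves the object; no referrer has to be found or rewritten.
table.move(leaf, (99, 64))
resolved = [table.locate(r["child"]) for r in referrers]
```

Without the table, the move would require finding and patching every referrer, which is the full scan described above.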

> 2) Is there one object table per Gem or per Stone?

There is actually one (virtual) Object Table (OT) per committed transaction! (A number of optimizations exist to reduce the space taken by each additional OT.) This is necessary because two views of the database might need to see different versions of the same object. Since the object is the identical object in each view, it must have the same identifier, but if it was changed in a transaction, it must have different data. If it has different data, it must have different storage.

An OT needs to be preserved as long as there is a transaction view relying on the information in the OT. An old OT can only be released when there is no longer a session relying on its view (or an older view). This is the basis of the notorious "Commit Record Backlog" (CRB) problem that can plague a GemStone system in which a logged-in session sits on one view for a long time while other sessions are doing many commits.
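
One generic way to picture "one virtual OT per commit" is a chain of deltas, sketched below in Python. This is an assumption for illustration only: the thread does not say which optimizations GemStone actually uses, and `CommitRecord`, `locate`, and `commit` are hypothetical names. Each commit stores only the entries it changed and falls back to its parent for everything else.

```python
class CommitRecord:
    """Hypothetical virtual object table: a delta on top of a parent view."""

    def __init__(self, parent=None, changes=None):
        self.parent = parent
        # object ID -> storage location written by this commit only
        self.changes = dict(changes or {})

    def locate(self, oid):
        """Find where this view sees the object, searching newest delta first."""
        record = self
        while record is not None:
            if oid in record.changes:
                return record.changes[oid]
            record = record.parent
        raise KeyError(oid)

    def commit(self, changes):
        """Create the next view; unchanged entries are shared, not copied."""
        return CommitRecord(parent=self, changes=changes)


v1 = CommitRecord(changes={1: "page-10", 2: "page-11"})
v2 = v1.commit({2: "page-20"})   # object 2 was modified and rewritten elsewhere

old_view = v1.locate(2)   # a session still on v1 sees the old storage
new_view = v2.locate(2)   # a session on v2 sees the new storage
shared = v2.locate(1)     # object 1 is unchanged, so v2 reuses v1's entry
```

In this model, the same identifier resolves to different storage depending on the view, and a session that stays on `v1` forces the storage behind `v1` to be kept, which is the situation the commit record backlog describes.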

> 3) Memory addresses. LOOM, for example, used different address sizes for disk (32 bits) and RAM (16 bits). I guess you could do the same with 32/64 bits, or use 64 bits for both. How do you do it? If they are not the same, how do you map from one to the other?

Each Gem (VM) that references an object copies the object from shared memory (and into shared memory from disk if needed) into the local VM memory. Each Gem maintains a map of an object's ID (entry in the OT) to its local, in-memory address. When an in-memory object references another object, the Gem checks first to see if the object is already loaded and if so, it updates the reference from the ID to the pointer so subsequent lookups do not pay the cost of the indirection. This keeps in-memory references inexpensive. When a modified object is committed, its instance variables are converted from in-memory pointers back to Object Table Identifiers before being written to the shared memory (and then to disk) where other Gems can see the object's instance variables.

An Object Table Identifier (OOP) always has 001 as the low-order three bits, while in-memory pointers always have 000 as the low-order three bits. (Other values for the low-order three bits indicate special or immediate objects such as instances of SmallInteger, Character, Boolean, UndefinedObject, and SmallDouble).
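
The tag bits and the ID-to-pointer conversion can be sketched as follows. This is an illustrative Python model under stated assumptions, not GemStone's implementation: only the 001 and 000 tag values come from the description above, while the `Gem` class, its two dictionaries, and the `swizzle`/`unswizzle` names are hypothetical.

```python
TAG_MASK = 0b111
TAG_POINTER = 0b000   # in-memory pointer (from the description above)
TAG_OOP = 0b001       # Object Table Identifier (from the description above)


def classify(ref):
    """Classify a reference by its low-order three bits."""
    tag = ref & TAG_MASK
    if tag == TAG_POINTER:
        return "pointer"
    if tag == TAG_OOP:
        return "oop"
    return "special"   # immediates such as SmallInteger; encodings not modeled


class Gem:
    """Hypothetical per-Gem map between OOPs and local addresses."""

    def __init__(self):
        self.address_of = {}   # OOP -> local address
        self.oop_of = {}       # local address -> OOP

    def load(self, oop, address):
        """Record that the object with this OOP now lives at a local address."""
        assert classify(oop) == "oop" and classify(address) == "pointer"
        self.address_of[oop] = address
        self.oop_of[address] = oop

    def swizzle(self, slots):
        """Replace OOPs of already-loaded objects with direct pointers."""
        return [self.address_of.get(s, s) if classify(s) == "oop" else s
                for s in slots]

    def unswizzle(self, slots):
        """Convert pointers back to OOPs, as done before a commit."""
        return [self.oop_of[s] if classify(s) == "pointer" else s
                for s in slots]


gem = Gem()
gem.load(0x1001, 0x7F00)
slots = [0x1001, 0x2001, 0b1010]   # loaded OOP, unloaded OOP, a special value
in_memory = gem.swizzle(slots)
on_commit = gem.unswizzle(in_memory)
```

Only the reference to the loaded object changes form; the unloaded OOP and the special value pass through untouched, and the round trip restores the original slots.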

> 4) I guess GemStone uses stubs a lot when moving objects from primary memory to secondary memory. You presumably don't create a stub instance (with an object header and so on) because of the size, so you could mark stubs directly in the address/reference. So... can you mark an address as a stub? I guess yes. Then how do you know where on disk the stubbed object is? Can you encode that information in the same reference? (If you use 64 bits for both, I guess so.)

I'm not sure what you mean by 'stubs' in this context. Every object takes up space in-memory (if a Gem is actively referencing it) and on disk. If a Gem gets constrained on space, it can remove unmodified persistent objects from RAM by replacing in-memory references with the OOP.

> 5) How do you track object usage? I mean, the shared object cache can move objects from disk to primary memory and back. How does it select which objects to move? How do you know which objects have been used frequently?

Every object is associated with a "page", and the Shared Page Cache keeps a record of the requests that Gems make for a page. Recently referenced pages stay in the cache, while pages that have not been referenced recently are eligible for being flushed from the cache.
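
A least-recently-used (LRU) cache is one common way to implement "recently referenced pages stay". The Python sketch below assumes plain LRU, which the thread does not confirm as the Shared Page Cache's actual policy; `PageCache` and the `read_page` callback are hypothetical names.

```python
from collections import OrderedDict


class PageCache:
    """Hypothetical LRU page cache: the oldest-referenced page is evicted first."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page      # callback that fetches a page from disk
        self.pages = OrderedDict()      # page ID -> contents, oldest first

    def get(self, page_id):
        if page_id in self.pages:
            # A hit marks the page as most recently referenced.
            self.pages.move_to_end(page_id)
        else:
            if len(self.pages) >= self.capacity:
                # Evict the page that has gone longest without a request.
                self.pages.popitem(last=False)
            self.pages[page_id] = self.read_page(page_id)
        return self.pages[page_id]


disk_reads = []


def read_from_disk(page_id):
    disk_reads.append(page_id)
    return f"data-{page_id}"


cache = PageCache(capacity=2, read_page=read_from_disk)
for page_id in (1, 2, 1, 3):
    cache.get(page_id)
cached_ids = list(cache.pages)
```

Page 2 is evicted when page 3 arrives, because page 1 was requested again in between; tracking happens per page, not per object.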

> Thanks in advance,
>
> Mariano
>
> ps: don't worry, I won't create my own gemstone hahahahha I just want to learn :)
> Also, if the info is private I completely understand, or you can send me a private email.


I hope this helps,

James Foster


Re: Documentation about Gemstone internal info :)

Mariano Martinez Peck


On Thu, Jan 6, 2011 at 3:12 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

Last year I did a half-day tutorial in Buenos Aires that covered many of these questions. You can watch the videos by following links at http://programminggems.wordpress.com/2010/02/05/scaling-objects-videos/. In particular, see the second video on "Object format and pointers" that discusses exactly these issues. Other comments follow...


Hi James. Sorry for the late answer, but I had several videos to watch ;)
I wonder which talks were on at the same time that made me miss your GemStone lesson. I regret it, hahaha... fortunately there are videos :)
The videos answer pretty much all my questions.
 
On Jan 6, 2011, at 3:34 AM, Mariano Martinez Peck wrote:

> Hi. I am just curious about some details of the Gemstone implementation. Please forgive me if the questions are not appropriate. I read the Programmer Guide but I didn't find the answers.
>
> So, basically I would like to know some details of the VM and Object Memory, for example:
>
> 1) I know that Gemstone uses ObjectTables. With them, I can think different pros:
>    - a fast #become since it is just changing one pointer and not a full scan
>    - when using proxies/stubs you probably may need a #become just after
>    - gemstone uses stubs a lot (I guess)
>    - Having an OT may help when trying to implement kind of virtual memory/swapping/pagging system (like Gemstone).
>    - split the OT in "regions" and place them there?
>
> Of course not everything is pink, and you can have a big impact in performance since you have for example the cost of accessing once more (the indirection of the OT), and problable much more things.
>
> So, I wonder, which are the reasons of Gemstone behind this decision of using ObjectTable?

I believe that there are two main reasons. The first is to support multiple database views (discussed below in #2). The second is to facilitate repository-wide garbage-collection (GC). With an in-memory GC in a traditional Smalltalk VM (i.e., Squeak), when the VM moves an object it has to scan memory and fix all the pointers. With a disk-based object space of hundreds of GB, this could take hours and would be impractical.

OK... (off-topic now) after watching the videos, it is still not clear to me how the online GC of an individual Gem is implemented. Mark and sweep? Scavenging?
 

> 2) There is one ObjectTable per Gem or per Stone?

The is actually one (virtual) Object Table (OT) per committed transaction! (A number of optimizations exist to reduce the space taken by each additional OT.)

Wow. I wonder how you manage those optimizations without blowing up to terabytes of memory used by OTs, hahaha.
 
This is necessary because two views of the database might need to see different versions of the same object. Since the object is the identical object in each view, it must have the same identifier, but if it was changed in a transaction, it must have different data. If it has different data, it must have different storage.


:)
 
An OT needs to be preserved as long as there is a transaction view relying on the information in the OT. An old OT can only be released when there is no longer a session relying on its view (or an older view). This is the basis of the notorious "Commit Record Backlog" (CRB) problem that can plague a GemStone system in which a logged-in session sits on one view for a long time while other sessions are doing many commits.

ok
 

> 3) Memory addresses.  in LOOM for example, they used different memory address for disk (32) and ram (16). Now, I guess you can do the same with 32/64 or directly use 64bits in both. how do you do?  if it is not the same, how do you map from one to the other?

Each Gem (VM) that references an object copies the object from shared memory (and into shared memory from disk if needed) into the local VM memory. Each Gem maintains a map of an object's ID (entry in the OT) to its local, in-memory address. When an in-memory object references another object, the Gem checks first to see if the object is already loaded and if so, it updates the reference from the ID to the pointer so subsequent lookups do not pay the cost of the indirection. This keeps in-memory references inexpensive. When a modified object is committed, its instance variables are converted from in-memory pointers back to Object Table Identifiers before being written to the shared memory (and then to disk) where other Gems can see the object's instance variables.

An Object Table Identifier (OOP) always has 001 as the low-order three bits, while in-memory pointers always have 000 as the low-order three bits. (Other values for the low-order three bits indicate special or immediate objects such as instances of SmallInteger, Character, Boolean, UndefinedObject, and SmallDouble).


thanks. I got it from the videos.
 
> 4) I guess gemstone uses stubs a lot when moving objects from primary memory to secondary memory. Now, you don't create a stub instance (with object header and blah) because of the size, so you can mark them directly in the address/reference. So....you can mark an address as stub?  I guess yes. Then, how do you know where on disk is that stub for example?  you can encode such info also in the same reference ? (if you use 64bits in both I guess that yes)

I'm not sure what you mean by 'stubs' in this context. Every object takes up space in-memory (if a Gem is actively referencing it) and on disk. If a Gem gets constrained on space, it can remove unmodified persistent objects from RAM by replacing in-memory references with the OOP.

Yes, my question was whether GemStone could mark that an object is on disk with the OOP itself, or whether it needed a proxy object or something like that.
It is clear now.
 

> 5) How do you trace object usage? I mean the shared object cache can move objects from disk to primary memory and the other way around. How it selects which objects to move?  how do you know which objects has been used frequently ?

Every object is associated with a "page" and the Shared Page Cache keeps a record of the requests that Gems make for a page. Recently-referenced pages stay in the cache while not-recently referenced pages are eligible for being flushed from the cache.

Clear. Now I wonder HOW an object is associated with a page. Because of GC, and maybe because of object usage (when an object is used, its instance variables are probably used too), it would make sense to group an object and its instance variables on the same page. But at the same time, you have pages that you need to fill. So... is there a strategy for choosing the page for a given object?

Last question, totally off-topic (sorry, I don't have a GemStone handy): for hashed collections, is the #identityHash the object ID?
 

> Thanks in advance,
>
> Mariano
>
> ps: don't worry, I won't create my own gemstone hahahahha I just want to learn :)
> ahh if the info is private I can just undertand that or you can send me a private email.


I hope this helps,


yes, a lot!

Thanks James for all your answers.


Re: Documentation about Gemstone internal info :)

EstebanLM
In reply to this post by James Foster-8
Just saw them. Really cool videos, thanks James for sharing them :)



Re: Documentation about Gemstone internal info :)

James Foster-8
In reply to this post by Mariano Martinez Peck

On Jan 11, 2011, at 2:36 AM, Mariano Martinez Peck wrote:

On Thu, Jan 6, 2011 at 3:12 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

Last year I did a half-day tutorial in Buenos Aires that covered many of these questions. You can watch the videos by following links at http://programminggems.wordpress.com/2010/02/05/scaling-objects-videos/. In particular, see the second video on "Object format and pointers" that discusses exactly these issues. Other comments follow...

Hi James. Sorry for the late answer. But I had several videos to watch ;)
I wonder which talks were at the same time that I didn't go to your Gemstone lesson. I am regrated hahahaha...fortunatly there are videos :)

Actually, the videos were from the 2009 conference. I still haven't posted the videos from the 2010 conference.

The video explains pretty much all my questions.

On Jan 6, 2011, at 3:34 AM, Mariano Martinez Peck wrote:

> Hi. I am just curious about some details of the Gemstone implementation. Please forgive me if the questions are not appropriate. I read the Programmer Guide but I didn't find the answers.
>
> So, basically I would like to know some details of the VM and Object Memory, for example:
>
> 1) I know that Gemstone uses ObjectTables. With them, I can think different pros:
>    - a fast #become since it is just changing one pointer and not a full scan
>    - when using proxies/stubs you probably may need a #become just after
>    - gemstone uses stubs a lot (I guess)
>    - Having an OT may help when trying to implement kind of virtual memory/swapping/pagging system (like Gemstone).
>    - split the OT in "regions" and place them there?
>
> Of course not everything is pink, and you can have a big impact in performance since you have for example the cost of accessing once more (the indirection of the OT), and problable much more things.
>
> So, I wonder, which are the reasons of Gemstone behind this decision of using ObjectTable?

I believe that there are two main reasons. The first is to support multiple database views (discussed below in #2). The second is to facilitate repository-wide garbage-collection (GC). With an in-memory GC in a traditional Smalltalk VM (i.e., Squeak), when the VM moves an object it has to scan memory and fix all the pointers. With a disk-based object space of hundreds of GB, this could take hours and would be impractical.

Ok...(offtopic now) after watching the videos, it is not clear for me how is the implementation of the individual online gem GC. Mark and sweep? scavenging ?

I'm not as familiar with the in-Gem activity, but yes, there are elements of both (and statistics to monitor it as well).

 > 2) There is one ObjectTable per Gem or per Stone?

The is actually one (virtual) Object Table (OT) per committed transaction! (A number of optimizations exist to reduce the space taken by each additional OT.)

WOW. I wonder how you can do those optimizations and not blowup in terabytes of memory used by OTs hahhahahaha

Of the many impressive things about GemStone, this is indeed one of the more striking ones!

This is necessary because two views of the database might need to see different versions of the same object. Since the object is the identical object in each view, it must have the same identifier, but if it was changed in a transaction, it must have different data. If it has different data, it must have different storage.

:)

An OT needs to be preserved as long as there is a transaction view relying on the information in the OT. An old OT can only be released when there is no longer a session relying on its view (or an older view). This is the basis of the notorious "Commit Record Backlog" (CRB) problem that can plague a GemStone system in which a logged-in session sits on one view for a long time while other sessions are doing many commits.

ok


> 3) Memory addresses.  in LOOM for example, they used different memory address for disk (32) and ram (16). Now, I guess you can do the same with 32/64 or directly use 64bits in both. how do you do?  if it is not the same, how do you map from one to the other?

Each Gem (VM) that references an object copies the object from shared memory (and into shared memory from disk if needed) into the local VM memory. Each Gem maintains a map of an object's ID (entry in the OT) to its local, in-memory address. When an in-memory object references another object, the Gem checks first to see if the object is already loaded and if so, it updates the reference from the ID to the pointer so subsequent lookups do not pay the cost of the indirection. This keeps in-memory references inexpensive. When a modified object is committed, its instance variables are converted from in-memory pointers back to Object Table Identifiers before being written to the shared memory (and then to disk) where other Gems can see the object's instance variables.

An Object Table Identifier (OOP) always has 001 as the low-order three bits, while in-memory pointers always have 000 as the low-order three bits. (Other values for the low-order three bits indicate special or immediate objects such as instances of SmallInteger, Character, Boolean, UndefinedObject, and SmallDouble).


thanks. I got it from the videos.

> 4) I guess gemstone uses stubs a lot when moving objects from primary memory to secondary memory. Now, you don't create a stub instance (with object header and blah) because of the size, so you can mark them directly in the address/reference. So....you can mark an address as stub?  I guess yes. Then, how do you know where on disk is that stub for example?  you can encode such info also in the same reference ? (if you use 64bits in both I guess that yes)

I'm not sure what you mean by 'stubs' in this context. Every object takes up space in-memory (if a Gem is actively referencing it) and on disk. If a Gem gets constrained on space, it can remove unmodified persistent objects from RAM by replacing in-memory references with the OOP.

Yes, my question was if Gemstone was able to mark that an object was on disk with the OOP itself or if it needed a proxy object or something like that.
It is clear now.


> 5) How do you trace object usage? I mean the shared object cache can move objects from disk to primary memory and the other way around. How it selects which objects to move?  how do you know which objects has been used frequently ?

Every object is associated with a "page" and the Shared Page Cache keeps a record of the requests that Gems make for a page. Recently-referenced pages stay in the cache while not-recently referenced pages are eligible for being flushed from the cache.

Clear. Now, I wonder HOW an object is associated with a page. Because of GC and maybe because of object usage (when an object is used, maybe its instance varaibles are used too), it would make sense to group an object and its instVar in the same page. But at the same time, you have pages that you need to complete. So...is there a strategy in which page to choose for a certain object?

By default, a modified object is simply placed on the next available (otherwise empty) page. A GC will likewise simply move live objects from partially filled pages to a new page in an attempt to improve the packing. To keep objects that are used together close together, GemStone provides "Clustering", in which you can specify a virtual "Cluster Bucket" and associate an object with a bucket. The object will then be written to a page containing only objects also associated with that bucket. The GC process preserves this property by identifying a page with a bucket: when a page is partially filled, its live objects are moved to a new page with other objects from the same bucket. (Details are in the documentation and in the comments to the #'cluster' method and friends.)
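
The page-per-bucket rule can be sketched in a few lines of Python. This is a toy model, not GemStone's clustering API: `PageAllocator`, `place`, the fixed `objects_per_page` limit, and the use of `None` for "no bucket" are all hypothetical simplifications.

```python
class PageAllocator:
    """Hypothetical allocator: each page holds objects from a single bucket."""

    def __init__(self, objects_per_page):
        self.objects_per_page = objects_per_page
        self.pages = []         # each page: {"bucket": ..., "objects": [...]}
        self._open_page = {}    # bucket -> index of the page being filled

    def place(self, obj, bucket=None):
        """Write obj to the open page of its bucket, starting a new page if full."""
        index = self._open_page.get(bucket)
        if index is None or len(self.pages[index]["objects"]) >= self.objects_per_page:
            self.pages.append({"bucket": bucket, "objects": []})
            index = len(self.pages) - 1
            self._open_page[bucket] = index
        self.pages[index]["objects"].append(obj)
        return index


allocator = PageAllocator(objects_per_page=2)
placements = [
    allocator.place("order-1", bucket="orders"),
    allocator.place("log-entry"),                 # default placement, no bucket
    allocator.place("order-2", bucket="orders"),
    allocator.place("order-3", bucket="orders"),  # first orders page is full
]
```

Objects written at different times still end up together as long as they share a bucket, while unclustered objects never land on a bucket's page.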

Last question, totally offtopic, (sorry, I don't have a gemstone handly), for hashed collections, the #identityHash   is the Object ID ?

I'm not sure how it is implemented at the C level (in Smalltalk it just calls a primitive), but I'll ask around (or maybe someone else can jump in).

> Thanks in advance,
>
> Mariano
>
> ps: don't worry, I won't create my own gemstone hahahahha I just want to learn :)
> ahh if the info is private I can just undertand that or you can send me a private email.

I hope this helps,

yes, a lot!

Thanks James for all your answers.

Glad to help! GemStone is cool technology and I enjoy having it better understood.

James


Re: Documentation about Gemstone internal info :)

Mariano Martinez Peck


On Tue, Jan 11, 2011 at 3:30 PM, James Foster <[hidden email]> wrote:

On Jan 11, 2011, at 2:36 AM, Mariano Martinez Peck wrote:

On Thu, Jan 6, 2011 at 3:12 PM, James Foster <[hidden email]> wrote:
Hi Mariano,

Last year I did a half-day tutorial in Buenos Aires that covered many of these questions. You can watch the videos by following links at http://programminggems.wordpress.com/2010/02/05/scaling-objects-videos/. In particular, see the second video on "Object format and pointers" that discusses exactly these issues. Other comments follow...

Hi James. Sorry for the late answer. But I had several videos to watch ;)
I wonder which talks were at the same time that I didn't go to your Gemstone lesson. I am regrated hahahaha...fortunatly there are videos :)

Actually, the videos were from the 2009 conference. I still haven't posted the videos from the 2010 conference.


yes, I noticed. Anyway, my comment is still valid. I went to all Smalltalks (from 2007 to 2010) and I still regret missing this one ;)
 

  > 2) There is one ObjectTable per Gem or per Stone?

The is actually one (virtual) Object Table (OT) per committed transaction! (A number of optimizations exist to reduce the space taken by each additional OT.)

WOW. I wonder how you can do those optimizations and not blowup in terabytes of memory used by OTs hahhahahaha

Of the many impressive things about GemStone, this is indeed one of the more striking ones!


OK... you have a new topic to speak about at Smalltalks 2011! :)



> 5) How do you trace object usage? I mean the shared object cache can move objects from disk to primary memory and the other way around. How it selects which objects to move?  how do you know which objects has been used frequently ?

Every object is associated with a "page" and the Shared Page Cache keeps a record of the requests that Gems make for a page. Recently-referenced pages stay in the cache while not-recently referenced pages are eligible for being flushed from the cache.
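
As a rough illustration of the "recently referenced pages stay, others may be flushed" idea, here is a minimal Python sketch of a least-recently-used page cache. It is a toy model only: the `PageCache` class, its `request` method and the `load_page` callback are hypothetical names, and the thread does not say which replacement policy the Shared Page Cache really uses.

```python
from collections import OrderedDict


class PageCache:
    """Toy least-recently-used page cache (illustrative only, not GemStone code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> contents, least recently used first

    def request(self, page_id, load_page):
        """Return the page, loading it on a miss and recording the reference."""
        if page_id in self.pages:
            # Hit: mark the page as the most recently referenced one.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: load the page (assumed to come from disk) and cache it.
        page = load_page(page_id)
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            # Flush the page that has gone longest without a reference.
            self.pages.popitem(last=False)
        return page


def load(page_id):
    # Hypothetical stand-in for reading a page from disk.
    return f"contents of page {page_id}"


cache = PageCache(capacity=2)
cache.request(1, load)
cache.request(2, load)
cache.request(1, load)  # page 1 is referenced again, so page 2 is now the oldest
cache.request(3, load)  # cache is full: page 2 is flushed
resident = list(cache.pages)
```

In this sketch an object never appears directly; a Gem asks for the page that holds the object, which is why usage is tracked per page rather than per object.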

Clear. Now, I wonder HOW an object is associated with a page. Because of GC and maybe because of object usage (when an object is used, maybe its instance variables are used too), it would make sense to group an object and its instVars in the same page. But at the same time, you have pages that you need to fill. So... is there a strategy for choosing the page for a certain object?

By default, a modified object is simply placed on the next available (otherwise empty) page. A GC will, likewise, simply move live objects from partially-filled pages to a new page in an attempt to increase the packing. To address the issue of keeping objects that are used together close together, GemStone provides "Clustering", in which you can specify a virtual "Cluster Bucket" and associate an object with a bucket. In this case, the object will be written to a page containing only objects also associated with that bucket. The GC process preserves this property by identifying a page with a bucket; when a page is partially filled, live objects will be moved to a new page with other objects from the same bucket. (Details are in the documentation and in the comments to the #'cluster' method and friends.)
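
The placement and repacking rules described above can be sketched as a small Python model. Everything here is an assumption made for illustration: the `write` and `compact` functions, the dictionary layout of a page and the `PAGE_SIZE` value are invented, and the real GC only moves objects off partially-filled pages, whereas this toy version repacks every page.

```python
PAGE_SIZE = 2  # objects per page; an arbitrary number for the toy model


def write(pages, obj, bucket=None):
    """Put obj on the newest page if it has room and the same bucket, else start a page."""
    last = pages[-1] if pages else None
    if last and last["bucket"] == bucket and len(last["objects"]) < PAGE_SIZE:
        last["objects"].append(obj)
    else:
        pages.append({"bucket": bucket, "objects": [obj]})


def compact(pages, is_live):
    """Move live objects to fresh pages, never mixing buckets on one page."""
    by_bucket = {}
    for page in pages:
        for obj in page["objects"]:
            if is_live(obj):
                by_bucket.setdefault(page["bucket"], []).append(obj)
    new_pages = []
    for bucket, objs in by_bucket.items():
        for start in range(0, len(objs), PAGE_SIZE):
            new_pages.append({"bucket": bucket, "objects": objs[start:start + PAGE_SIZE]})
    return new_pages


pages = []
write(pages, "a1", bucket="A")
write(pages, "b1", bucket="B")
write(pages, "a2", bucket="A")
write(pages, "a3", bucket="A")

live = {"a1", "a3", "b1"}  # pretend a2 became garbage
packed = compact(pages, lambda obj: obj in live)
```

Before compaction the bucket "A" objects sit on two pages with a bucket "B" page between them; afterwards the two surviving "A" objects share one page and "b1" keeps a page of its own.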

Thanks James, this was interesting. I will also check in the documentation. What is important here, I guess, is that the decision is up to the user: the user says that a certain object should be associated with a certain bucket.
 

Last question, totally off topic (sorry, I don't have a GemStone handy): for hashed collections, is the #identityHash the Object ID?

I'm not sure how it is implemented at the C level (in Smalltalk it just calls a primitive), but I'll ask around (or maybe someone else can jump in).


Don't worry. Just curious ;)

 
> Thanks in advance,
>
> Mariano
>
> ps: don't worry, I won't create my own gemstone hahahahha I just want to learn :)
> ahh if the info is private I can just understand that or you can send me a private email.

I hope this helps,

yes, a lot!

Thanks James for all your answers.

Glad to help! GemStone is cool technology and I enjoy having it better understood.


Indeed, GemStone is not only an outstanding and impressive solution, but in addition it seems to be a company with a really nice culture (I told this several times to Dale) and with a group of cool cool cool people like you, Dale, Martin, etc....

Mariano

Re: Documentation about Gemstone internal info :)

James Foster-8

On Jan 11, 2011, at 12:29 PM, Mariano Martinez Peck wrote:

> Thanks James, this [discussion of clustering] was interesting. I will also check in the documentation. What is important here, I guess, is that the decision is up to the user: the user says that a certain object should be associated with a certain bucket.

Yes, it is up to the programmer to determine which objects go together, and it depends completely on the object model. For example, consider a Customer object with instance variables for address fields such as city. If you had a unique String for each Customer, then you would probably want the city (a String) clustered with the Customer. On the other hand, if you had a canonical list of City objects (to avoid having 4 million copies of the string 'Buenos Aires'), then you would _not_ want the City object clustered with the Customer. Instead, you would cluster all the City objects together so that if one was present the others would likely be available as well.
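
The Customer/City rule of thumb above could be written down as a small Python sketch. This is only a hedged illustration of the decision, not GemStone code: the `assign_buckets` function, the bucket names "customers" and "cities" and the sample data are all made up.

```python
def assign_buckets(customers, canonical_cities):
    """Return {object_name: bucket_name} for a toy Customer/City model.

    customers maps a customer name to that customer's city string;
    canonical_cities is the set of city names stored as shared City objects.
    """
    buckets = {}
    for name, city in customers.items():
        buckets[name] = "customers"
        if city in canonical_cities:
            # Shared City object: keep it with the other City objects.
            buckets[city] = "cities"
        else:
            # Private String owned by one Customer: keep it next to that Customer.
            buckets[f"{name}.city"] = "customers"
    return buckets


customers = {
    "alice": "Buenos Aires",
    "bob": "Buenos Aires",
    "carol": "a small village",
}
plan = assign_buckets(customers, canonical_cities={"Buenos Aires"})
```

Here the single shared 'Buenos Aires' object lands in the "cities" bucket once, while carol's private city string stays in the same bucket as carol.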

James


Re: Documentation about Gemstone internal info :)

Mariano Martinez Peck


On Tue, Jan 11, 2011 at 10:44 PM, James Foster <[hidden email]> wrote:

On Jan 11, 2011, at 12:29 PM, Mariano Martinez Peck wrote:

> Thanks James, this [discussion of clustering] was interesting. I will also check in the documentation. What is important here, I guess, is that the decision is up to the user: the user says that a certain object should be associated with a certain bucket.

Yes, it is up to the programmer to determine which objects go together, and it depends completely on the object model. For example, consider a Customer object with instance variables for address fields such as city. If you had a unique String for each Customer, then you would probably want the city (a String) clustered with the Customer. On the other hand, if you had a canonical list of City objects (to avoid having 4 million copies of the string 'Buenos Aires'), then you would _not_ want the City object clustered with the Customer. Instead, you would cluster all the City objects together so that if one was present the others would likely be available as well.


Thanks James. Very clear
