Hello,

I have been thinking about object formats and how to make them more flexible, and came to the following idea.

Today's Squeak and Cog VMs use 4 bits in the object header to denote the object format, so the VM supports 16 possible object formats:

    formatOf: oop
    "   0      no fields
        1      fixed fields only (all containing pointers)
        2      indexable fields only (all containing pointers)
        3      both fixed and indexable fields (all containing pointers)
        4      both fixed and indexable weak fields (all containing pointers).
        5      unused
        6      indexable word fields only (no pointers)
        7      indexable long (64-bit) fields (only in 64-bit images)
        8-11   indexable byte fields only (no pointers) (low 2 bits are low 2 bits of size)
        12-15  compiled methods:
               # of literal oops specified in method header,
               followed by indexable bytes (same interpretation of low 2 bits as above)
    "

The virtual machine (mostly the ObjectMemory class) knows the difference between those numbers and how to deal with objects depending on their format.

The bad side is that most of the semantics around formats is hardcoded and spread across various methods, like:

    lastPointerOf: oop
        "Return the byte offset of the last pointer field of the given object.
         Works with CompiledMethods, as well as ordinary objects.
         Can be used even when the type bits are not correct."
        | fmt sz methodHeader header contextSize |
        <inline: true>
        <asmLabel: false>
        header := self baseHeader: oop.
        fmt := self formatOfHeader: header.
        fmt <= 4 ifTrue:
            [(fmt = 3 and: [self isContextHeader: header]) ifTrue:
                ["contexts end at the stack pointer"
                contextSize := self fetchStackPointerOf: oop.
                ^ CtxtTempFrameStart + contextSize * BytesPerWord].
            sz := self sizeBitsOfSafe: oop.
            ^ sz - BaseHeaderSize  "all pointers"].
        fmt < 12 ifTrue: [^ 0].  "no pointers"

        "CompiledMethod: contains both pointers and bytes:"
        methodHeader := self longAt: oop + BaseHeaderSize.
        ^ (methodHeader >> 10 bitAnd: 255) * BytesPerWord + BaseHeaderSize

See how smelly this code is?

What I was thinking is: what if the number that denotes an object format, instead of being just a number, pointed to a table of functions which implement the behavior specific to that format? Then the above method could be turned into something as short and nice as the following:

    lastPointerOf: oop
        <inline: true>
        <asmLabel: false>
        | table |
        table := self formatTableOf: oop.
        ^ table performSomeOperation: oop param: param

which, when translated to C, will look like:

    sqInt formatNumber;
    formatNumber = fetchFormatOf(oop);
    return ObjectFormats[formatNumber].performSomeOperation(oop, param);

So the table is a simple list of function pointers (closely resembling a method dictionary in Smalltalk) that provides an implementation of certain behavior for a given object format:

    struct ObjectFormatFunctions {
        int (*firstFixedSlot)(sqInt oop);    // firstFixedSlot
        int (*numFixedSlots)(sqInt oop);     // numFixedSlots:
        void* (*lastPointerOf)(sqInt oop);   // lastPointerOf:
        // .. firstPointerOf: .. numIndexableSlots: .. indexableSlotSize: ... etc
    };

Then, instead of hardcoding the smelly code and spreading it around the VM, we can write something like:

    (self formatTableOf: oop) performSomeOperation: oop params: params

and the code no longer needs to know whether the format is 0 or 4 or whatever; it just dispatches to a function through the formats table. All the methods that deal with object formats could then be placed in a nicely organized class structure and translated to C in the form of tables.

The downside, of course, is that the additional level of indirection will cause a serious slowdown in the interpreter, because the C compiler does not directly see the implementation of an operation for a certain format and must call a function through a pointer held in the corresponding format table. Moreover, you cannot inline those functions, since dynamic dispatch means you don't know which one will be used at a concrete place.

How can we deal with it?
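
To make the format-table idea above a bit more concrete, here is a minimal, self-contained C sketch. It is only an illustration under stated assumptions, not VM code: `ToyObject`, its two-word layout, the helper functions, and the choice to return a byte offset as `sqInt` are all made up for the example. Only the names `ObjectFormats`, `ObjectFormatFunctions`, `fetchFormatOf` and `lastPointerOf` are borrowed from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t sqInt;            /* stand-in for the VM's sqInt */

enum { FormatCount = 16 };         /* today's 4-bit format field */

/* Toy object (hypothetical): word 0 holds the format number,
   word 1 the number of slots. A real header packs these differently. */
typedef struct ToyObject {
    sqInt format;
    sqInt slotCount;
} ToyObject;

/* One table entry per format: the operations the VM needs from it. */
typedef struct ObjectFormatFunctions {
    sqInt (*numFixedSlots)(sqInt oop);
    sqInt (*lastPointerOf)(sqInt oop);   /* byte offset of last pointer */
} ObjectFormatFunctions;

static sqInt fetchFormatOf(sqInt oop) { return ((ToyObject *)oop)->format; }

/* Format 1: fixed fields only, all containing pointers. */
static sqInt fixedNumFixedSlots(sqInt oop) { return ((ToyObject *)oop)->slotCount; }
static sqInt fixedLastPointerOf(sqInt oop) {
    return ((ToyObject *)oop)->slotCount * (sqInt)sizeof(sqInt);
}

/* Format 6: indexable words, no pointers at all. */
static sqInt wordsNumFixedSlots(sqInt oop) { (void)oop; return 0; }
static sqInt wordsLastPointerOf(sqInt oop) { (void)oop; return 0; }

/* Entries not listed here stay NULL in this sketch. */
static const ObjectFormatFunctions ObjectFormats[FormatCount] = {
    [1] = { fixedNumFixedSlots, fixedLastPointerOf },
    [6] = { wordsNumFixedSlots, wordsLastPointerOf },
};

/* The caller no longer tests "fmt <= 4" and so on; it just dispatches. */
sqInt lastPointerOf(sqInt oop) {
    sqInt formatNumber = fetchFormatOf(oop);
    assert(formatNumber >= 0 && formatNumber < FormatCount);
    assert(ObjectFormats[formatNumber].lastPointerOf != NULL);
    return ObjectFormats[formatNumber].lastPointerOf(oop);
}
```

Calling `lastPointerOf` on a `ToyObject` with format 1 and three slots goes through `fixedLastPointerOf`, while a format 6 object goes through `wordsLastPointerOf`. The indirect call is exactly the indirection the post worries about: the C compiler cannot inline through `ObjectFormats[formatNumber]`.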

Well, we can still hardcode things (granted that we do not modify the object format table(s) at run time), so we can write:

    fmt = 5 ifTrue: [ ^ self inlineFunctionOfFormat: 5 oop: oop params: params ]

This requires some trickery in the C code generator, but it is still doable. So overall I think the drawbacks could be cleverly mitigated, if not completely avoided.

So, let's talk about what opportunities it could give us.

The VM could support dynamically changeable object format(s) if we extend the number of possible formats beyond 16. For the first 16 numbers things would be hardcoded as today, but for higher numbers the VM would always dispatch through the format table. The cost of indirection can be amortized quite well, especially in the presence of a JIT (it can simply inline the code for the corresponding semantics to access various bits in an object when compiling the methods of a class with such a format). Then we could dynamically change formats (or create new ones) at run time.

But currently we have only 4 bits for the object format. Where can we get some more? The solution is extremely simple! We have 5 bits in the object header for the compact class index. If we merge them with the object format bits, we get 4 + 5 = 9 bits: 512 possible object formats!

The trick is that we can still pretty easily identify special objects by reserving a concrete format number for them (hey, we have plenty of numbers). So format numbers 0..15 will identify the hardcoded formats known by the VM from the beginning, formats 16..47 are for special objects, and the remaining 512 - 48 = 464 are dynamically defined.

Of course, we could change the object header to look different for a new VM, and then there may be more (or fewer) than 9 bits for the object format. It doesn't really matter, because the concept remains the same: the VM knows that the only field it can safely access by itself is the object header. For accessing other fields (if any) it should use the functions provided in the object format table.
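
The bit arithmetic above can be sketched as follows. This is a hedged illustration, not the real header code: it assumes the 4 format bits and the 5 compact-class bits are adjacent and start at bit 8 of a 32-bit header (`FormatShift` is an assumption to check against the actual layout), and it reads the special-object range as 16..47 so that everything from 48 up is dynamic, matching the "512-48" count. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FormatShift  = 8,                       /* assumed bit position */
    FormatBits   = 4 + 5,                   /* format + compact class index */
    FormatMask   = (1 << FormatBits) - 1,   /* 0x1FF */
    FormatCount  = 1 << FormatBits,         /* 512 possible formats */
    FirstSpecial = 16,                      /* 0..15: hardcoded, as today */
    FirstDynamic = 48                       /* 16..47: special objects */
};

typedef enum {
    HardcodedFormat,       /* known by the VM from the beginning */
    SpecialObjectFormat,   /* reserved numbers for special objects */
    DynamicFormat          /* always dispatched through the format table */
} FormatKind;

/* Extract the merged 9-bit format number from a header word. */
static unsigned formatNumberOfHeader(uint32_t header) {
    return (header >> FormatShift) & FormatMask;
}

/* Decide how the VM should treat a given format number. */
static FormatKind kindOfFormat(unsigned formatNumber) {
    assert(formatNumber < FormatCount);
    if (formatNumber < FirstSpecial) return HardcodedFormat;
    if (formatNumber < FirstDynamic) return SpecialObjectFormat;
    return DynamicFormat;
}
```

With this split, only `DynamicFormat` numbers need to pay for the table lookup; the first 16 can keep the inlined, hardcoded paths.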

And the last thing: why did I title the topic "format as a contract"? Because the next logical step is to imprint the formats in some kind of manifest and store it in the image file. Then the VM, when booting an image, will read the manifest and translate it into machine code before the first attempt to access any object(s) in object memory.

So, what do you think about it? It could be overkill, but I like the idea that with such an approach we could change the object format(s) dynamically at run time, without needing to change the VM. It also structures the VM code quite nicely, which serves clarity, and it opens a wide field for experiments with different object formats.

--
Best regards,
Igor Stasenko AKA sig.

hi igor,
i have in the past pondered this question many times, and have not yet come to satisfying conclusions. the big question (for me) is where exactly to draw the line. what are the assumptions that are fixed, and what not. i mean, e.g.

what would be the interface (contract) that the garbage collector would use?

is there a contract to the object memory itself (e.g. allocate new memory, make an object fixed in memory, etc)?

can we extend not only object representations, but also compose different object memory strategies (like, object table vs direct pointers)?

just my thoughts,
thanks
wolfgang

On 29 July 2011 11:41, Wolfgang Eder <[hidden email]> wrote:
> hi igor,
> i have in the past pondered this question many times,
> and have not yet come to satisfying conclusions.
> the big question (for me) is where exactly to draw the
> line. what are the assumptions that are fixed,
> and what not. i mean, e.g.
>
> what would be the interface (contract) that the garbage
> collector would use?
>
> is there a contract to the object memory itself
> (e.g. allocate new memory, make an object fixed in memory,
> etc)?
>
> can we extend not only object representations, but
> also compose different object memory strategies
> (like, object table vs direct pointers)?

There are plenty of options. What is hard is taking all the tradeoffs into account.

One of the extreme approaches I have thought about is to turn the object header into a pointer to a function of the form (selector, arguments ...), which then looks like a low-level message send. Then anything you can do to an object would be available only through a call to this function with the corresponding arguments, including accessing the object's state, sending a message, retrieving its class, GC, etc. Then there would be very few fixed points. The problem, of course, is how to apply optimizations in the presence of such an extremely late-bound interface.

--
Best regards,
Igor Stasenko AKA sig.
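
A rough C sketch of the "header is a pointer to a function" extreme might look like the following. Everything here is hypothetical and only meant to show the shape of the idea: the `Selector` values, the `Handler` signature with two fixed arguments, and the four-slot `Object` layout are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Object Object;

/* Hypothetical low-level selectors understood by a handler. */
typedef enum { SelSlotCount, SelSlotAt, SelSlotAtPut } Selector;

/* The header: a single function of the form (selector, arguments ...). */
typedef intptr_t (*Handler)(Object *self, Selector sel, intptr_t a, intptr_t b);

struct Object {
    Handler  handler;    /* the only fixed point the VM relies on */
    intptr_t slots[4];   /* everything below is private to the handler */
};

/* Handler for objects with four pointer-sized slots. */
static intptr_t fourSlotHandler(Object *self, Selector sel, intptr_t a, intptr_t b) {
    switch (sel) {
    case SelSlotCount:
        return 4;
    case SelSlotAt:
        assert(a >= 0 && a < 4);
        return self->slots[a];
    case SelSlotAtPut:
        assert(a >= 0 && a < 4);
        self->slots[a] = b;
        return b;
    }
    return 0;   /* unknown selector; a real VM would signal an error here */
}

/* The low-level "message send": every access goes through the header. */
static intptr_t sendTo(Object *obj, Selector sel, intptr_t a, intptr_t b) {
    return obj->handler(obj, sel, a, b);
}
```

Even reading a slot costs an indirect call plus a `switch` in this sketch, which is the optimization problem mentioned above.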
In reply to this post by Igor Stasenko
On Fri, Jul 29, 2011 at 2:47 AM, Igor Stasenko <[hidden email]> wrote:
That's polymorphism in C :)

Igor: that is exactly what OpenDBX does. The thing is like this: OpenDBX has a "core". That core is abstract and general, and does not depend at all on the backend. Then there is an openDBX-mySQL.c, an openDBX-postgresql.c, etc. Each of those files is the glue/mapping between the OpenDBX API and a database client library. So each of those files implements a number of functions (there are around 20). Imagine:

In openDBX-mysql.c:

    opendbx_init -> opendbx_init_mysql
    opendbx_connect -> opendbx_connect_mysql
    ....

In openDBX-postgreSQL.c:

    opendbx_init -> opendbx_init_postgresql
    opendbx_connect -> opendbx_connect_postgresql
    ....

The OpenDBX core needs to invoke some of those functions (such as opendbx_init, opendbx_connect, etc.) defined in the API between OpenDBX and the database client library. But the core always works the same way: there is a table in memory that maps function names to real functions. So OpenDBX just looks up the key in the table and invokes the function that is stored there, no matter what its name is :)
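
As a small illustration of a table that maps function names to real functions, here is a generic C sketch. It is not taken from the OpenDBX sources and does not reproduce its API; `BackendEntry`, `lookupFunction` and the `demo_*` functions are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*BackendFunction)(void);

/* One row of the table: a generic name and the backend's implementation. */
typedef struct BackendEntry {
    const char     *name;
    BackendFunction function;
} BackendEntry;

/* Hypothetical backend-specific implementations. */
static int demo_init_mysql(void)    { return 1; }
static int demo_connect_mysql(void) { return 2; }

/* The table a backend file would hand to the core. */
static const BackendEntry mysqlBackend[] = {
    { "init",    demo_init_mysql },
    { "connect", demo_connect_mysql },
};

/* The core only knows generic names; it searches for the key and
   gets back whatever function the backend registered under it. */
static BackendFunction lookupFunction(const BackendEntry *table, size_t count,
                                      const char *name) {
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].function;
    }
    return NULL;   /* no such entry */
}
```

A string lookup like this is far too slow for per-object dispatch in a VM; the format table discussed earlier in the thread would be indexed by number instead, but the principle is the same.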

--
Mariano
http://marianopeck.wordpress.com

On 29 July 2011 22:59, Mariano Martinez Peck <[hidden email]> wrote:
> That's polymorphism in C :)

Yes. The main question is whether it's possible to introduce it without a significant speed loss. While in OpenDBX you may do whatever you want (kind of ;), playing with the object format will have a huge impact on everything.

Also, there's the question of whether it's really worth doing it like that. The existing 16 object formats more or less cover most of our needs, so what would other (new) object formats serve for?

> Igor: exactly that is what OpenDBX does.
[snip]

--
Best regards,
Igor Stasenko AKA sig.

> The main question if its possible to introduce it without significant
> speed loss. While in OpenDBX you may do whatever you want (kind of ;)
> playing with object format will have huge impact on everything.
>
> Also, there's a question if its really worth doing like that.
> An existing 16 object formats is more or less covering most of our
> needs, so what other (new) object formats may serve for.

And of course, I want to use format 5 for something else. :)

-C

--
Craig Latta
www.netjam.org/resume
+31 6 2757 7177
+ 1 415 287 3547