The method trailer format


The method trailer format

Igor Stasenko
Here is my proposal for changing the method trailer format so that various kinds of data can be encoded in the trailer.

Any corrections or suggestions are welcome.

The kind of a compiled method's trailer is determined by the last byte of the compiled method.

The format of that byte is:
        "2rkkkkkkdd"

where the six 'k' bits stand for the trailer kind, allowing 2^6 = 64 different kinds of method trailer, and the two 'd' bits are kind-specific data.
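The sketch below shows how a reader might split that last byte into its kind and data fields. It is written in Python. The function name is hypothetical; only the bit layout comes from the proposal.

```python
def split_trailer_byte(last_byte: int) -> tuple[int, int]:
    """Split the final byte of a compiled method into (kind, data).

    Layout per the proposal: 2rkkkkkkdd, i.e. the high six bits are the
    trailer kind and the low two bits are kind-specific data.
    """
    if not 0 <= last_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    kind = last_byte >> 2        # top 6 bits: 0..63
    data = last_byte & 0b11      # bottom 2 bits: 0..3
    return kind, data


# Example: kind 000011 (GZIP temp names) with dd = 01
kind, data = split_trailer_byte(0b00001101)
print(f"{kind:06b} {data:02b}")  # 000011 01
```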

The defined trailer kinds are:

k = 000000, dd = 00
the method has no trailer; the total trailer size is 1 byte (just the last byte).

k = 000001
the method has a cleared trailer (it was set to something else, but then cleared).
dd+1 determines the number of bytes in the size field, and size is the number of remaining (cleared) trailer bytes.
So the total length of the trailer is: 1 + (dd + 1) + size
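Here is the length arithmetic for the size-prefixed kinds as a Python sketch. The proposal does not say where the size field sits or in which byte order. This sketch therefore makes a hypothetical assumption: the dd+1 size bytes sit immediately before the last byte, big-endian, with the payload before them.

```python
# Kinds from the proposal that carry a (dd + 1)-byte size field.
SIZE_PREFIXED_KINDS = {0b000001, 0b000010, 0b000011, 0b000101, 0b000110, 0b000111}


def size_prefixed_trailer_length(method_bytes: bytes) -> int:
    """Total trailer length, 1 + (dd + 1) + size, for a size-prefixed kind.

    ASSUMPTION (not stated in the proposal): layout is
    [payload][size field, big-endian][last byte].
    """
    last = method_bytes[-1]
    kind, dd = last >> 2, last & 0b11
    if kind not in SIZE_PREFIXED_KINDS:
        raise ValueError(f"kind {kind:06b} has no size field")
    n = dd + 1                                     # bytes in the size field
    size = int.from_bytes(method_bytes[-1 - n:-1], "big")
    return 1 + n + size


# Two bytecodes, a 3-byte payload, size = 3 (dd = 00), last byte 2r00000100
print(size_prefixed_trailer_length(b"\x10\x20abc\x03\x04"))  # 5
```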

k = 000010
the trailer contains a list of the method's temp names, compressed using the qCompress: method.
dd+1 determines the number of bytes in the size field, and size is the number of bytes in the compressed buffer.
So the total length of the trailer is: 1 + (dd + 1) + size

k = 000011
the trailer contains a list of the method's temp names, compressed using GZIP compression.
dd+1 determines the number of bytes in the size field, and size is the number of bytes in the compressed buffer.
So the total length of the trailer is: 1 + (dd + 1) + size
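For the GZIP variant, the encoding side could look like the Python sketch below. The proposal does not specify how the temp names are serialized before compression. Joining them with spaces as ASCII is a hypothetical choice, and so is the [payload][size][last byte] layout.

```python
import gzip

KIND_TEMPS_GZIP = 0b000011


def encode_size_prefixed(kind: int, payload: bytes) -> bytes:
    """Build [payload][size field][last byte] using the smallest dd that fits.

    ASSUMPTION: big-endian size field placed just before the last byte.
    """
    size = len(payload)
    n = max(1, (size.bit_length() + 7) // 8)       # bytes needed to hold size
    if n > 4:                                      # dd is 2 bits: at most 4 bytes
        raise ValueError("payload too large for a 2-bit dd field")
    last = (kind << 2) | (n - 1)                   # 2rkkkkkkdd
    return payload + size.to_bytes(n, "big") + bytes([last])


def temp_names_trailer(temp_names: list[str]) -> bytes:
    # HYPOTHETICAL serialization: names joined by single spaces, ASCII.
    packed = gzip.compress(" ".join(temp_names).encode("ascii"))
    return encode_size_prefixed(KIND_TEMPS_GZIP, packed)


trailer = temp_names_trailer(["aStream", "index", "each"])
print(f"{trailer[-1]:08b}")  # 00001100: kind 000011, dd = 00
```

Choosing the smallest dd keeps short trailers short. The 2-bit dd field caps the size field at 4 bytes.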

k = 000100
the trailer indicates that the method source is determined by the class and selector under which the method is installed.
Trailer size = 1.

k = 000101
the trailer indicates that the method source is determined by a class plus some ByteString identifier.
dd+1 determines the number of bytes in the size field, which holds the length of the ByteString identifier.
The total length of the trailer is: 1 + (dd + 1) + size

k = 000110
the trailer contains UTF-8 encoded method source code, compressed using the qCompress: method.
dd+1 determines the number of bytes in the size field, which holds the length of the compressed buffer.
The total length of the trailer is: 1 + (dd + 1) + size

k = 000111
the trailer contains UTF-8 encoded method source code, compressed using GZIP.
dd+1 determines the number of bytes in the size field, which holds the length of the compressed buffer.
The total length of the trailer is: 1 + (dd + 1) + size
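Reading the source back out of a kind-000111 trailer might look like this Python sketch. It again assumes, hypothetically, a [compressed buffer][big-endian size field][last byte] layout. The example method bytes are made up.

```python
import gzip

KIND_SOURCE_GZIP = 0b000111


def decode_gzip_source(method_bytes: bytes) -> str:
    """Recover UTF-8 source from a kind-000111 trailer.

    ASSUMPTION: layout is [compressed buffer][size field, big-endian][last byte].
    """
    last = method_bytes[-1]
    kind, dd = last >> 2, last & 0b11
    if kind != KIND_SOURCE_GZIP:
        raise ValueError(f"not a GZIP source trailer: kind {kind:06b}")
    n = dd + 1
    size = int.from_bytes(method_bytes[-1 - n:-1], "big")
    start = len(method_bytes) - 1 - n - size       # first byte of the buffer
    return gzip.decompress(method_bytes[start:start + size]).decode("utf-8")


source = "greeting\n\t^ 'héllo'"
buffer = gzip.compress(source.encode("utf-8"))
method = b"\x70\x7c" + buffer + len(buffer).to_bytes(1, "big") + bytes([KIND_SOURCE_GZIP << 2])
print(decode_gzip_source(method) == source)  # True
```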

k = 111111
the trailer is an encoded source pointer. Total trailer size is 4 bytes
(this is backwards compatible with most currently existing compiled
methods).

k = 111110
the trailer is an encoded source pointer. Total trailer size is 5 bytes.

k = 111101
the trailer is an encoded source pointer. Total trailer size is 6 bytes.

k = 111100
the trailer is an encoded source pointer. Total trailer size is 7 bytes.

k = 111011
the trailer is an encoded source pointer. Total trailer size is 8 bytes.
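The five fixed-size pointer kinds follow a simple pattern: counting down from 111111, each kind adds one byte. Here is a small Python sketch of that mapping. The function name is hypothetical.

```python
def source_pointer_trailer_size(kind: int) -> int:
    """Trailer size for the fixed-size source-pointer kinds in the proposal:
    111111 -> 4 bytes, 111110 -> 5, 111101 -> 6, 111100 -> 7, 111011 -> 8."""
    if not 0b111011 <= kind <= 0b111111:
        raise ValueError(f"kind {kind:06b} is not a fixed-size source pointer")
    return 4 + (0b111111 - kind)    # each step down in kind adds one byte


for k in range(0b111111, 0b111010, -1):
    print(f"{k:06b} -> {source_pointer_trailer_size(k)} bytes")
```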



--
Best regards,
Igor Stasenko AKA sig.


Re: The method trailer format

Andreas.Raab
Igor Stasenko wrote:

> k = 111110
> the trailer is encoded source pointer. Total trailer size is 5-bytes
>
> k = 111101
> the trailer is encoded source pointer. Total trailer size is 6-bytes
>
> k = 111100
> the trailer is encoded source pointer. Total trailer size is 7-bytes
>
> k = 111011
> the trailer is encoded source pointer. Total trailer size is 8-bytes

That's a bit wasteful. You can use a variable-length encoding at no
cost for the usual range of source pointers (i.e., use the high-order
bit of each byte to mean that another byte follows; this gives you
28 bits of range in 4 bytes). Keep the other patterns reserved; you don't
know what you might want them for later.
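Andreas's suggestion is the familiar 7-bits-per-byte variable-length integer. Below is a minimal Python sketch. It assumes the least significant 7-bit group comes first, which the thread does not fix. It also checks the 28-bits-in-4-bytes arithmetic.

```python
def encode_varint(value: int) -> bytes:
    """7 payload bits per byte; high bit 1 means another byte follows.
    ASSUMPTION: least significant 7-bit group first."""
    if value < 0:
        raise ValueError("source pointers are non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(0x80 | group)   # flag = 1: more bytes follow
        else:
            out.append(group)          # flag = 0: this is the last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return value
    raise ValueError("truncated varint")


# 4 bytes carry 4 * 7 = 28 payload bits, i.e. pointers up to 2**28 - 1
print(len(encode_varint(2**28 - 1)), len(encode_varint(2**28)))  # 4 5
```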

Cheers,
   - Andreas


Re: Re: The method trailer format

Igor Stasenko
2009/12/12 Andreas Raab <[hidden email]>:

> That's a bit wasteful. You can use a variable length encoding at no cost
> for the usual range of source pointers (i.e., use the high-level bit to mean
> that there's another byte to come; this gives you 28 bits range in 4 bytes).
> Keep the other patterns reserved; you don't know what you might want them
> for at a later point.
>

Thanks for your attention. Indeed, I could encode the integer as a
variable-length sequence of bytes, each of the form:
(flag << 7) + remainder
where flag = 1 means that another byte follows, and remainder (< 128)
holds the next 7 bits of the value.

So I will keep only:
k = 111110
the trailer is a variable-length encoded source pointer.
The dd bits are unused and must be 00.
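Here is a Python sketch of how a kind-111110 trailer might be built and read. Readers locate the trailer from the last byte. This sketch therefore assumes, hypothetically, that the 7-bit groups are stored in reverse, so a reader walking backwards from the end meets the least significant group first. The thread does not specify this.

```python
KIND_VARLEN_POINTER = 0b111110


def encode_pointer_trailer(pointer: int) -> bytes:
    """Build [7-bit groups, reversed][last byte 2r11111000] (dd = 00)."""
    if pointer < 0:
        raise ValueError("source pointers are non-negative")
    groups = bytearray()
    while True:
        group = pointer & 0x7F
        pointer >>= 7
        groups.append(group | (0x80 if pointer else 0))   # flag: more follow
        if not pointer:
            break
    # ASSUMPTION: reversed, so a backwards reader sees the lowest group first.
    return bytes(reversed(groups)) + bytes([KIND_VARLEN_POINTER << 2])


def decode_pointer_trailer(method_bytes: bytes) -> tuple[int, int]:
    """Return (source pointer, total trailer length in bytes)."""
    last = method_bytes[-1]
    if last >> 2 != KIND_VARLEN_POINTER or last & 0b11:
        raise ValueError("not a variable-length source pointer trailer")
    value, shift, pos = 0, 0, len(method_bytes) - 2
    while pos >= 0:
        byte = method_bytes[pos]
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:                 # flag = 0: last group
            return value, len(method_bytes) - pos
        shift += 7
        pos -= 1
    raise ValueError("truncated source pointer")


print(decode_pointer_trailer(b"\x70" + encode_pointer_trailer(300)))  # (300, 3)
```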


Re: Re: The method trailer format

Igor Stasenko
Oh, and as for reserving the patterns:
I thought about providing a single one (for when all 64 are almost used up)
to indicate an 'extended' kind in the next byte, and so on.

So I could put this in the docs to make sure we won't have any problems in the future:

k = 100000 (or some other value, it doesn't really matter)
- the byte which follows denotes an extended kind.
Reserved for future use, when at some point there are no free
patterns left.

I just doubt that we will ever need that many encodings :)
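A Python sketch of the escape pattern follows. The value 100000 is the tentative one from the post. The assumption that the extended kind sits in the byte just before the last one is hypothetical, and the "and so on" chaining is left out.

```python
KIND_EXTENDED = 0b100000   # tentative value from the post ("or some other value")


def trailer_kind(method_bytes: bytes) -> tuple[int, ...]:
    """Return (kind,) for ordinary trailers, or (KIND_EXTENDED, extended_kind).

    ASSUMPTION: the extended kind is the byte just before the last byte.
    """
    kind = method_bytes[-1] >> 2
    if kind == KIND_EXTENDED:
        if len(method_bytes) < 2:
            raise ValueError("missing extended kind byte")
        return (kind, method_bytes[-2])
    return (kind,)


print(trailer_kind(bytes([0b00001100])))        # (3,)
print(trailer_kind(bytes([7, 0b10000000])))      # (32, 7)
```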


Re: The method trailer format

Michael van der Gulik-2
In reply to this post by Igor Stasenko
On Sun, Dec 13, 2009 at 4:17 AM, Igor Stasenko <[hidden email]> wrote:

> The format is following:
>        "2rkkkkkkdd"

Er... yuck.

If I were doing this (which, coincidentally, I am at the moment), I
would completely separate source code management from CompiledMethod.

Trash the CompiledMethod trailer and ignore the temp names. Instead,
add a second dictionary to Class which stores the source code pointers
("sourceDictionary" or something). If you don't want source code for a
class, you can make it nil.
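The "second dictionary" idea can be sketched in Python as below. ClassSketch and its methods are hypothetical stand-ins for a Smalltalk class, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassSketch:
    """HYPOTHETICAL class object holding source pointers in a second
    dictionary beside its method dictionary."""
    name: str
    method_dict: dict = field(default_factory=dict)             # selector -> method
    # selector -> source pointer; None means "keep no source for this class"
    source_dict: Optional[dict] = field(default_factory=dict)

    def install(self, selector: str, method: object, source_pointer: int) -> None:
        self.method_dict[selector] = method
        if self.source_dict is not None:
            self.source_dict[selector] = source_pointer

    def source_pointer_for(self, selector: str) -> Optional[int]:
        if self.source_dict is None:
            return None
        return self.source_dict.get(selector)


foo = ClassSketch("Foo")
foo.install("bar", object(), 1234)
print(foo.source_pointer_for("bar"))  # 1234
```

Lookup goes through a class and a selector. A method object on its own, with neither, has no path back to its source in this design.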

What I'm currently working on is a bit more radical. I'm completely
separating source code from its compiled form. I have PackageSource,
NamespaceSource, ClassSource and MethodSource classes which store
source code (in the image, not using source files) and contain methods
for compiling code, managing code, etc. Then I have Package,
Namespace, Class and CompiledMethod classes which only contain what is
necessary to run the code and relink themselves into a new image.

Gulik.

--
http://gulik.pbwiki.com/


Re: The method trailer format

Nicolas Cellier
2009/12/15 Michael van der Gulik <[hidden email]>:

> Trash the CompiledMethod trailer and ignore the temp names. Instead,
> add a second dictionary to Class which stores the source code pointers
> ("sourceDictionary" or something). If you don't want source code for a
> class, you can make it nil.

It is not uncommon to have CompiledMethods that are not installed in any
methodDictionary:
     CompiledMethod allInstances reject: [:e | e isInstalled]
Accessing their associated source would become impossible in this scheme.

Nicolas


Re: The method trailer format

Michael van der Gulik-2
On Tue, Dec 15, 2009 at 9:16 PM, Nicolas Cellier
<[hidden email]> wrote:

> It is not uncommon to have some CompiledMethod not installed in any
> methodDictionary.
>     CompiledMethod allInstances reject: [:e | e isInstalled]
> Accessing associated source would become impossible in this scheme

Where would these exist? How would they be used?

Gulik.




Re: The method trailer format

Eliot Miranda-2


On Tue, Dec 15, 2009 at 12:55 PM, Michael van der Gulik <[hidden email]> wrote:
> On Tue, Dec 15, 2009 at 9:16 PM, Nicolas Cellier
> <[hidden email]> wrote:
>> It is not uncommon to have some CompiledMethod not installed in any
>> methodDictionary.
>>     CompiledMethod allInstances reject: [:e | e isInstalled]
>> Accessing associated source would become impossible in this scheme
>
> Where would these exist? How would they be used?

1. When one redefines a method in the browser, existing activations of the old method are still potentially visible in the debugger. If you try e.g. adding this to Object (category 'recompilation'):

    haltInRecompile
        (self class whichClassIncludesSelector: thisContext messageSelector) recompile: thisContext messageSelector.
        self halt

the halt will occur in the old version of haltInRecompile. The Debugger may show the method as e.g. Object>>unboundMethod.

2. If a package system supports overrides, it is conceivable that the overridden versions of methods are kept around in case unloading a package removes the overrides and needs to reinstall the overridden versions. At least that's what we did in VisualWorks.

3. If the system supports breakpoints in methods, it is conceivable that an unbreakpointed version of the method is squirrelled away while the breakpointed version (which may contain a hidden send of halt or breakpoint or whatever) is installed in its place. At least that's what Terry Raymond's Professional Debug package does for VisualWorks.

There may be other examples (John Brandt's MethodWrappers?).

All of the above examples are undermined by the fact that condenseSources/condenseChanges et al. work by enumerating over the class hierarchy, looking only at installed methods. So condense the changes and you'll screw up the source for these hidden versions. IMO the right approach is to provide some registry for hidden methods, or at least a visitor for methods in their various hiding places, and have the source-condensing code enumerate over hidden methods in addition to the installed ones.
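The registry suggestion could take roughly the shape below. In this Python sketch, each hiding place registers itself, and a condenser walks installed and hidden methods alike. All names are hypothetical; this is not an existing Squeak API.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator


class HiddenMethodRegistry:
    """HYPOTHETICAL registry of methods kept alive outside any method
    dictionary (old activations, overridden versions, unbreakpointed
    originals, ...). Each provider knows one hiding place."""

    def __init__(self) -> None:
        self._providers: list[Callable[[], Iterable[object]]] = []

    def register_provider(self, provider: Callable[[], Iterable[object]]) -> None:
        self._providers.append(provider)

    def hidden_methods(self) -> Iterator[object]:
        for provider in self._providers:
            yield from provider()


def methods_to_condense(method_dicts: Iterable[dict],
                        registry: HiddenMethodRegistry) -> Iterator[object]:
    """Installed methods first, then hidden ones, each method only once."""
    installed = (m for md in method_dicts for m in md.values())
    seen: dict[int, object] = {}   # keeps references so ids cannot be reused
    for method in chain(installed, registry.hidden_methods()):
        if id(method) not in seen:       # identity, not equality
            seen[id(method)] = method
            yield method
```

Deduplicating by identity matters because one compiled method can be both installed and registered as hidden. An example is a method that is still installed while an override package keeps a copy of it.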

IMO, it is also important to search for senders in more than the installed set. It is useful to be able to search for them in hidden methods, but more interestingly to be able to search for senders in class definitions and in package preamble and postamble scripts (and, in VisualWorks, where namespace global variables can have initializers, in global variable initializers). Abstracting code searching away from the class hierarchy is IMO a good thing. The class hierarchy is just one place to look, but others (such as the above) will crop up.

Further, abstracting away what list browsers operate on is a good idea. Instead of just MethodReference, one can also have ClassReference (which displays the definition of a class) and PreambleReference (which displays a preamble, the "class" being the name of the package). So now one can search and browse for senders in odd places. Of course, editing the definitions once you get there may not be possible, but a helpful "open Monticello and edit the preamble for the foo package there" is perfectly acceptable (IMO).
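A Python sketch of the idea: every reference kind exposes the same label/text protocol, so one senders search covers methods, class definitions and preambles. The class names mirror the ones mentioned, but these are hypothetical Python stand-ins, not the Squeak or VisualWorks classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeReference:
    """HYPOTHETICAL base for anything a list browser can show and search."""
    owner: str   # class name, or package name for scripts
    text: str    # source text displayed and searched

    def label(self) -> str:
        return self.owner


@dataclass(frozen=True)
class MethodReference(CodeReference):
    selector: str = ""

    def label(self) -> str:
        return f"{self.owner}>>{self.selector}"


@dataclass(frozen=True)
class ClassReference(CodeReference):
    def label(self) -> str:
        return f"{self.owner} (definition)"


@dataclass(frozen=True)
class PreambleReference(CodeReference):
    def label(self) -> str:
        return f"{self.owner} (preamble)"


def senders_of(selector: str, references) -> list[str]:
    # Naive substring match; a real tool would inspect parsed code instead.
    return [ref.label() for ref in references if selector in ref.text]


refs = [
    MethodReference("Object", "haltInRecompile\n\tself halt", selector="haltInRecompile"),
    ClassReference("Foo", "Object subclass: #Foo"),
    PreambleReference("FooPackage", "Smalltalk at: #Bar put: 3. self halt"),
]
print(senders_of("halt", refs))  # ['Object>>haltInRecompile', 'FooPackage (preamble)']
```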






Re: The method trailer format

Colin Putney

On 2009-12-15, at 1:22 PM, Eliot Miranda wrote:

>
> On Tue, Dec 15, 2009 at 12:55 PM, Michael van der Gulik <[hidden email]> wrote:
>>
>> It is not uncommon to have some CompiledMethod not installed in any
>> methodDictionary.
>>    CompiledMethod allInstances reject: [:e | e isInstalled]
>> Accessing associated source would become impossible in this scheme
>
> Where would these exist? How would they be used?
>
> 1.  When one redefines a method in the browser existing activations of the old method are still potentially visible in the debugger.  If you try e.g.
>
> Object methods for recompilation
> haltInRecompile
>    (self class whichClassIncludesSelector: thisContext messageSelector) recompile: thisContext messageSelector.
>    self halt
>
> the halt will occur in the old version of haltInRecompile.  The Debugger may show the method as e.g. Object>>unboundMethod.
>
> 2. if a package system supports overrides it is conceivable that the overridden versions of methods are kept around in the event that unloading a package will remove the overrides and need to reinstall the overridden ones. At least that's what we did in VisualWorks.
>
> 3.  if the system supports breakpoints in methods it is conceivable that an unbreak-pointed version of the method is squirrelled away while the break-pointed version (which may contain a hidden send of halt or breakpoint or whatever) is installed in its place.  At least that's what Terry Raymond's Professional Debug package does for VisualWorks.
>
> There may be other examples (John Brandt's MethodWrappers?).

SystemChangeNotifier. When a new version of a method is installed, SCN notifies all its clients, supplying both the old method and the new method. Only the new method is actually installed in the class's method dictionary.
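That observer pattern is shown below as a minimal Python sketch. ChangeNotifierSketch is hypothetical and is not Squeak's actual SystemChangeNotifier API. It shows how clients end up holding the old method after it leaves the method dictionary.

```python
from typing import Callable, Optional

# (selector, old method or None, new method) -> None
Client = Callable[[str, Optional[object], object], None]


class ChangeNotifierSketch:
    """HYPOTHETICAL stand-in for a SystemChangeNotifier-style observer list."""

    def __init__(self) -> None:
        self._clients: list[Client] = []

    def subscribe(self, client: Client) -> None:
        self._clients.append(client)

    def method_changed(self, method_dict: dict, selector: str, new_method: object) -> None:
        old_method = method_dict.get(selector)
        method_dict[selector] = new_method          # only the new one is installed
        for client in self._clients:
            client(selector, old_method, new_method)  # clients still see the old one


installed = {"foo": "foo v1"}
notifier = ChangeNotifierSketch()
notifier.subscribe(lambda sel, old, new: print(sel, old, "->", new))
notifier.method_changed(installed, "foo", "foo v2")   # prints: foo foo v1 -> foo v2
```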

Colin

Re: The method trailer format

Igor Stasenko
In reply to this post by Eliot Miranda-2
2009/12/15 Eliot Miranda <[hidden email]>:

> There may be other examples (John Brandt's MethodWrappers?).

Plus, you missed one huge case: metaprogramming :)
