A reengineering the CompiledMethod trailers

Igor Stasenko
Hello guys,

please take a look at the changeset I just uploaded:

http://bugs.squeak.org/view.php?id=7428

There is an initial implementation of CompiledMethodTrailer with some
test coverage.

My motivation behind this is simple:

1. no more poking with raw bytes, like this:

holdsTempNames
        "Are tempNames stored in trailer bytes"

        | flagByte |
        flagByte := self last.
        (flagByte = 0 or: [flagByte = 251 "some source-less methods have flag = 251, rest = 0"
                        and: [(1 to: 3) allSatisfy: [:i | (self at: self size - i) = 0]]])
                ifTrue: [^ false].  "No source pointer & no temp names"
        flagByte < 252 ifTrue: [^ true].  "temp names compressed"
        ^ false "Source pointer"

or this:

endPC
        "Answer the index of the last bytecode."
        | size flagByte |
        "Can't create a zero-sized CompiledMethod so no need to use last for the errorEmptyCollection check.
         We can reuse size."
        size := self size.
        flagByte := self at: size.
        flagByte = 0 ifTrue:
                ["If last byte = 0, may be either 0, 0, 0, 0 or just 0"
                1 to: 4 do: [:i | (self at: size - i) = 0 ifFalse: [^size - i]]].
        flagByte < 252 ifTrue:
                ["Magic sources (temp names encoded in last few bytes)"
                ^flagByte <= 127
                        ifTrue: [size - flagByte - 1]
                        ifFalse: [size - (flagByte - 128 * 128) - (self at: size - 1) - 2]].
        "Normal 4-byte source pointer"
        ^size - 4

2. A method's trailer can be used for storing a variety of stuff, not
just a source pointer or temp names. And it will be easy to add new
kinds of trailers.
    The most useful kinds, to me, which I've added initially:
  - being able to embed the source code in the trailer, so a compiled
method and its source live together in the image.
     Some of you have expressed this idea before, so here it is.
  - being able to retrieve the method's source in some other way than
through SourceFiles or embedded in the trailer.
     I added two kinds of trailers for that:
     - get the method's source by class+selector where it is installed.
     - get the method's source by class+some string identifier

  - yours. Please tell me what kind you want to have in addition.

3. A source pointer could surpass the 32MB limit. A CompiledMethod
can already encode a source pointer value of any size.
  It's now only a matter of fixing the source pointer encoding logic to
enable having .sources and .changes files above 32MB.
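
To illustrate point 1, here is a minimal workspace sketch that gives names to the cases hidden in the legacy flag-byte checks quoted above. It is not taken from the changeset: the block name trailerKindOf and the three symbols are hypothetical, and the rules are only those visible in holdsTempNames.

```smalltalk
"Workspace sketch (hypothetical helper, not part of the changeset):
 name the trailer kind implied by the legacy flag-byte rules."
| trailerKindOf |
trailerKindOf := [:bytes | | flag size |
    size := bytes size.
    flag := bytes at: size.
    "Flag 0, or flag 251 preceded by three zero bytes: nothing is stored."
    (flag = 0 or: [flag = 251
            and: [(1 to: 3) allSatisfy: [:i | (bytes at: size - i) = 0]]])
        ifTrue: [#noSourceNoTempNames]
        ifFalse: [flag < 252
            ifTrue: [#compressedTempNames]
            ifFalse: [#sourcePointer]]].

trailerKindOf value: #[112 0 0 0 0].      "#noSourceNoTempNames"
trailerKindOf value: #[112 65 66 2].      "#compressedTempNames"
trailerKindOf value: #[112 1 2 3 252].    "#sourcePointer"
```

With the kinds named in one place, callers such as holdsTempNames or endPC would no longer need to repeat the byte arithmetic themselves.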

Please review my code and send me your comments and wishes.
At the next stage I will create a changeset which will put this stuff to
use, as well as clean up lots of places in CompiledMethod and the
compiler/decompiler.

--
Best regards,
Igor Stasenko AKA sig.

Fwd: A reengineering the CompiledMethod trailers

Karl Ramberg


---------- Forwarded message ----------
From: Igor Stasenko <[hidden email]>
Date: Sun, Dec 13, 2009 at 10:14 PM
Subject: [squeak-dev] A reengineering the CompiledMethod trailers


> - yours. Please tell me, what kind you want to have in addition.

Just variable names, so decompiled code without full source becomes readable.
karl


Re: A reengineering the CompiledMethod trailers

Ken G. Brown
In reply to this post by Igor Stasenko

Not sure how this relates exactly but there has been quite a bit of work in the past re: Virtual Image 4.0 format.
See:
New Compiled Method Format
<http://wiki.squeak.org/squeak/750>

and:
VI4 project
<http://wiki.squeak.org/squeak/2119>

Here is an old related message from Tim Rowledge on Oct 20, 2005:
At 1:23 PM -0700 10/20/05, tim Rowledge apparently wrote:

>Delivered-To: [hidden email]
>From: tim Rowledge <[hidden email]>
>Date: Thu, 20 Oct 2005 13:23:20 -0700
>To: The general-purpose Squeak developers list
> <[hidden email]>
>Subject: Re: new image format (was: Actually doing something!)
>
>On 20-Oct-05, at 1:01 PM, Jecel Assumpcao Jr wrote:
>
>>Tim Rowledge wrote on Wed, 19 Oct 2005 15:41:31 -0700
>>
>>>Squeak is about 10 years old. Time to move from toddler to pre-
>>>schooler at least. Yes, it will mean a break in being able to run old
>>>images on new VMs. So what; old VMs will still be there. The
>>>sourcecode is still on SVN.
>>>
>>
>>I guess this is a good time to ask about Anthony Hannan's "image version
>>4 format" work (http://minnow.cc.gatech.edu/squeak/VI4). Bryce Kampjes
>>mentioned incompatibilities as the reason why this is no longer being
>>considered for inclusion in Squeak, but reading all the documentation
>>and scanning quickly through the sources the only thing I noticed was
>>that the needed debugger changes were not finished. Am I missing
>>something?
>
>I don't know. Some of his work got included anyway (some prims to support closures and I think
>continuations) other stuff is superseded and probably a lot is suffering from bitrot since I don't recall hearing from him in ages. Are you there AJH?
>
>I'd like to see the closures being properly supported and made standard, whether through his code or other code. It really is a bit bad to still be without them. The page http://minnow.cc.gatech.edu/squeak/3717 is terribly out of date as well and Dan seems to have drifted off of this path. http://minnow.cc.gatech.edu/squeak/3718 is at least a year behind the times.
>
>
>tim
>--
>tim Rowledge; [hidden email]; http://www.rowledge.org/tim

Ken G. Brown



Re: A reengineering the CompiledMethod trailers

Igor Stasenko
In reply to this post by Karl Ramberg
2009/12/14 karl ramberg <[hidden email]>:

>
>
> ---------- Forwarded message ----------
> From: Igor Stasenko <[hidden email]>
> Date: Sun, Dec 13, 2009 at 10:14 PM
> Subject: [squeak-dev] A reengineering the CompiledMethod trailers
>
>
>  - yours. Please tell me, what kind you want to have in addition.
>
It's already there, sorry I didn't mention that explicitly.
It encodes the temp names string using either qCompress or Zip
compression, checking which one is best.
Same for source code.

> just variable names so decompiled code without full source get readable
> karl
>
>
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: A reengineering the CompiledMethod trailers

Igor Stasenko
In reply to this post by Igor Stasenko
2009/12/14 Ken G. Brown <[hidden email]>:

>
> Not sure how this relates exactly but there has been quite a bit of work in the past re: Virtual Image 4.0 format.
> See:
> New Compiled Method Format
> <http://wiki.squeak.org/squeak/750>
>
> and:
> VI4 project
> <http://wiki.squeak.org/squeak/2119>
>
Oh, the changes in that project are much wider and require changing
the image format and VM.
My changes are much less significant and don't require rebuilding a whole image.
It is a gradual change which doesn't change the way the VM deals with
compiled methods,
but rather will change the way tools (compiler/decompiler etc.)
handle the encoding of compiled methods' metadata in the trailing bytes.
And it is backwards compatible with 99% of existing compiled methods.
Unless you start using new trailer kinds, it will remain the same.





--
Best regards,
Igor Stasenko AKA sig.

Re: A reengineering the CompiledMethod trailers

Andreas.Raab
In reply to this post by Ken G. Brown
Ken G. Brown wrote:
> At 12:14 AM +0200 12/14/09, Igor Stasenko apparently wrote:
> Not sure how this relates exactly but there has been quite a bit of work in the past re: Virtual Image 4.0 format.
> See:
> New Compiled Method Format
> <http://wiki.squeak.org/squeak/750>
>
> and:
> VI4 project
> <http://wiki.squeak.org/squeak/2119>

It doesn't relate. There may be some inspiration for how to deal with
source pointers but to be honest, I find Igor's approach vastly more
useful since it doesn't immediately add a megabyte or more to the image
size (which would happen if you go to an explicit source pointer
representation) but rather makes the encoding explicitly accessible.

Go Igor!

Cheers,
   - Andreas

PS. The only person who I'd like to explicitly comment (even if only to
say "that's fine") is Eliot, since he might have some additional
thoughts about some of this stuff which relate to Cog.

Re: A reengineering the CompiledMethod trailers

Karl Ramberg
In reply to this post by Igor Stasenko


On Mon, Dec 14, 2009 at 12:36 AM, Igor Stasenko <[hidden email]> wrote:
> 2009/12/14 karl ramberg <[hidden email]>:
>>  - yours. Please tell me, what kind you want to have in addition.
>>
>> just variable names so decompiled code without full source get readable
>
> its already there, sorry didn't mentioned that explicitly.
> It encodes the temp names string using either qCompress, or Zip
> compression by checking which one is best.
> Same for source code.

Ah, that's great.
No more t1, t2 etc :-)

Karl




Re: Re: A reengineering the CompiledMethod trailers

David T. Lewis
In reply to this post by Andreas.Raab
On Sun, Dec 13, 2009 at 05:01:17PM -0800, Andreas Raab wrote:

> Ken G. Brown wrote:
> >At 12:14 AM +0200 12/14/09, Igor Stasenko apparently wrote:
> >Not sure how this relates exactly but there has been quite a bit of work
> >in the past re: Virtual Image 4.0 format.
> >See:
> >New Compiled Method Format
> ><http://wiki.squeak.org/squeak/750>
> >
> >and:
> >VI4 project
> ><http://wiki.squeak.org/squeak/2119>
>
> It doesn't relate. There may be some inspiration for how to deal with
> source pointers but to be honest, I find Igors approach vastly more
> useful since it doesn't immediately add a megabyte or more to the image
> size (which would happen if you go to an explicit source pointer
> representation) but rather makes the encoding explicitly accessible.
>
> Go Igor!

+1

FYI, and just to make sure it does not get overlooked, there is also
this from Markus Denker:
 
  Mantis 4369: [Patch] Both source files now with 512MB capacity
  http://bugs.squeak.org/view.php?id=4369

Dave


Re: Re: A reengineering the CompiledMethod trailers

Igor Stasenko
2009/12/15 David T. Lewis <[hidden email]>:

> On Sun, Dec 13, 2009 at 05:01:17PM -0800, Andreas Raab wrote:
>> Ken G. Brown wrote:
>> >At 12:14 AM +0200 12/14/09, Igor Stasenko apparently wrote:
>> >Not sure how this relates exactly but there has been quite a bit of work
>> >in the past re: Virtual Image 4.0 format.
>> >See:
>> >New Compiled Method Format
>> ><http://wiki.squeak.org/squeak/750>
>> >
>> >and:
>> >VI4 project
>> ><http://wiki.squeak.org/squeak/2119>
>>
>> It doesn't relate. There may be some inspiration for how to deal with
>> source pointers but to be honest, I find Igors approach vastly more
>> useful since it doesn't immediately add a megabyte or more to the image
>> size (which would happen if you go to an explicit source pointer
>> representation) but rather makes the encoding explicitly accessible.
>>
>> Go Igor!
>
> +1
>
> FYI, and just to make sure it does not get overlooked, there is also
> this from Markus Denker:
>
>  Mantis 4369: [Patch] Both source files now with 512MB capacity
>  http://bugs.squeak.org/view.php?id=4369
>

It adds a sourcePointer ivar to MethodProperties.
But I wonder why it still limits the max pointer value, since once
you have an ivar for it, you can store virtually anything there.
My changes are less intrusive: they won't change the space required to
hold the sourcePointer, nor the place where it is held (4 bytes in the
trailer), unless you allow large .sources/.changes files and
therefore may need more than 4 bytes to encode the sourcePointer for
some of the methods.
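
For illustration only, one way a trailer could hold a source pointer of arbitrary size is a variable-length encoding. The base-128 scheme below is an assumption made for this sketch and is not necessarily what the changeset does; encode and decode are hypothetical workspace blocks.

```smalltalk
"Workspace sketch: variable-length (base-128) source pointer encoding.
 Assumed scheme for illustration; not taken from the changeset."
| encode decode pointer bytes |
encode := [:value | | rest stream |
    rest := value.
    stream := WriteStream on: (ByteArray new: 8).
    "Emit 7 bits per byte, least significant first;
     the high bit marks that more bytes follow."
    [rest > 127] whileTrue: [
        stream nextPut: (rest bitAnd: 127) + 128.
        rest := rest bitShift: -7].
    stream nextPut: rest.
    stream contents].
decode := [:encoded | | value |
    value := 0.
    "Walk from the most significant byte back to the first one."
    encoded reverseDo: [:byte |
        value := (value bitShift: 7) + (byte bitAnd: 127)].
    value].

pointer := 100000000.    "well above 32MB = 33554432"
bytes := encode value: pointer.
(decode value: bytes) = pointer    "true"
```

In a scheme like this, small pointers stay short while larger ones simply take more bytes, so no fixed upper limit is built into the encoding.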

> Dave
>
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: Re: A reengineering the CompiledMethod trailers

Eliot Miranda-2
In reply to this post by Andreas.Raab


On Sun, Dec 13, 2009 at 5:01 PM, Andreas Raab <[hidden email]> wrote:
> Ken G. Brown wrote:
>> At 12:14 AM +0200 12/14/09, Igor Stasenko apparently wrote:
>> Not sure how this relates exactly but there has been quite a bit of work in the past re: Virtual Image 4.0 format.
>> See:
>> New Compiled Method Format
>> <http://wiki.squeak.org/squeak/750>
>>
>> and:
>> VI4 project
>> <http://wiki.squeak.org/squeak/2119>
>
> It doesn't relate. There may be some inspiration for how to deal with source pointers but to be honest, I find Igors approach vastly more useful since it doesn't immediately add a megabyte or more to the image size (which would happen if you go to an explicit source pointer representation) but rather makes the encoding explicitly accessible.
>
> Go Igor!
>
> Cheers,
>  - Andreas
>
> PS. The only person who I'd like to explicitly comment (even if only to say "that's fine") is Eliot, since he might have some additional thoughts about some of this stuff which relate to Cog.

And I've been explicit on my blog that I don't think that the current compiled method format is bad.  I still think its compactness makes sense.  Providing some abstraction for accessing trailers is a good thing as long as it doesn't add significant overhead, and Igor's approach respects that constraint nicely.

<rant>I've been reading Coders at Work by Peter Seibel, and it's a quite brilliant book.  A couple of people in the book criticise the OO crowd for unnecessary abstraction.  KISS is really important, as in a networked world is compactness.  I think it is wise to avoid unnecessary decomposition (heavy emphasis on unnecessary here; if it is necessary then by all means decompose).   Arguably unnecessary decompositions are the fragments of SystemDictionary that have gone into SystemNavigation, SmalltalkImage et al (I like SystemNavigation, but SmalltalkImage seems pointless).  Much better to have a well thought-out distinction such as development vs deployment than an abstraction of a function into a class.  Related functions belong together in a single class. One would be insane to break out the arithmetic functions on Point (+,-,<<, >= et al) into a separate ArithmeticPoint from graphical operations such as e.g. isInRectangle:.  So for me breaking out the source pointer, having a separate byte array for bytecodes etc is all quiche.  Keep them compact.  Likewise, SourceFilesArray and SmalltalkImage are incoherent fragments.  Why not have a SystemFileManager that provides an interface to the sources files and the code to deal with renaming images etc?</rant>



Re: A reengineering the CompiledMethod trailers

Andreas.Raab
Eliot Miranda wrote:

> <rant>I've been reading Coders at Work by Peter Seibel, and it's a quite
> brilliant book.  A couple of people in the book criticise the OO crowd
> for unnecessary abstraction.  KISS is really important, as in a
> networked world is compactness.  I think it is wise to avoid unnecessary
> decomposition (heavy emphasis on unnecessary here; if it is necessary
> then by all means decompose).   Arguably unnecessary decomposition are
> the fragments of SystemDictionary that have gone into SystemNavigation,
> SmalltalkImage et al (I like SystemNavigation, but SmalltalkImage seems
> pointless).  Much better to have a well thought-out distinction such as
> development vs deployment than an abstraction of a function into a
> class.  Related functions belong together in a single class. One would
> be insane to break out the arithmetic functions on Point (+,-,<<, >= et
> al) into a separate ArithmeticPoint from graphical operations such as
> e.g. isInRectangle:.  So for me breaking out the source pointer, having
> a separate byte array for bytecodes etc is all quiche.  Keep them
> compact.  Likewise, SourceFilesArray and SmalltalkImage are incoherent
> fragments.  Why not have a SystemFileManager that provides an interface
> to the sources files and the code to deal with renaming images etc?</rant>

Amen.

Cheers,
   - Andreas


Re: Re: A reengineering the CompiledMethod trailers

Igor Stasenko
In reply to this post by Eliot Miranda-2
2009/12/15 Eliot Miranda <[hidden email]>:

>
>
> On Sun, Dec 13, 2009 at 5:01 PM, Andreas Raab <[hidden email]> wrote:
>>
>> Ken G. Brown wrote:
>>>
>>> At 12:14 AM +0200 12/14/09, Igor Stasenko apparently wrote:
>>> Not sure how this relates exactly but there has been quite a bit of work
>>> in the past re: Virtual Image 4.0 format.
>>> See:
>>> New Compiled Method Format
>>> <http://wiki.squeak.org/squeak/750>
>>>
>>> and:
>>> VI4 project
>>> <http://wiki.squeak.org/squeak/2119>
>>
>> It doesn't relate. There may be some inspiration for how to deal with
>> source pointers but to be honest, I find Igors approach vastly more useful
>> since it doesn't immediately add a megabyte or more to the image size (which
>> would happen if you go to an explicit source pointer representation) but
>> rather makes the encoding explicitly accessible.
>>
>> Go Igor!
>>
>> Cheers,
>>  - Andreas
>>
>> PS. The only person who I'd like to explicitly comment (even if only to
>> say "that's fine") is Eliot, since he might have some additional thoughts
>> about some of this stuff which relate to Cog.
>
> and I've been explicit on my blog that I don't think that the current
> compiled method format is bad.  I still think its compactness makes sense.
>  providing some abstraction for accessing trailers is a good thing as long
> as it doesn't add significant overhead, and Igor's approach respects that
> constraint nicely.
> <rant>I've been reading Coders at Work by Peter Seibel, and it's a quite
> brilliant book.  A couple of people in the book criticise the OO crowd for
> unnecessary abstraction.  KISS is really important, as in a networked world
> is compactness.  I think it is wise to avoid unnecessary decomposition
> (heavy emphasis on unnecessary here; if it is necessary then by all means
> decompose).   Arguably unnecessary decomposition are the fragments of
> SystemDictionary that have gone into SystemNavigation, SmalltalkImage et al
> (I like SystemNavigation, but SmalltalkImage seems pointless).  Much better
> to have a well thought-out distinction such as development vs deployment
> than an abstraction of a function into a class.  Related functions belong
> together in a single class. One would be insane to break out the arithmetic
> functions on Point (+,-,<<, >= et al) into a separate ArithmeticPoint from
> graphical operations such as e.g. isInRectangle:.  So for me breaking out
> the source pointer, having a separate byte array for bytecodes etc is all
> quiche.  Keep them compact.  Likewise, SourceFilesArray and SmalltalkImage
> are incoherent fragments.  Why not have a SystemFileManager that provides an
> interface to the sources files and the code to deal with renaming images
> etc?</rant>
>

Oh yeah.. SourceFilesArray. Quite a representative illustration of a
lack of abstraction.

I would call it a SourceCodeProvider and give it a protocol:
#sourceCodeAt: sourcePointer
and
#methodStampAt: sourcePointer
and no, no more RemoteString crap.

Also, I tried to offer something to those who want to store sources
differently than the system's default.
This is the reason why I added encoding of an arbitrary string identifier.
So, the scheme could be like this: first, the request comes to the class where
the method is installed; then the class may either
answer the source code or pass the request further.
At the end of the day, you could use UUIDs for each method and store
the source code in some database.
Then provide a layer which talks to the database to retrieve the source
code. Since UUIDs do not need to be changed (in contrast to a
sourcePointer, which needs to be changed if you compress the .changes
or .sources files), you can assign one once and keep it forever.

The class+selector case is added to support metaprogramming. When you
have an anonymous class (or something which behaves like a class), it
is pointless to log the source code in .changes when compiling the
methods, but you could still store it somewhere, so the user,
while debugging, will see the original source instead of the decompiled one.
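
The lookup scheme described above can be sketched in a workspace. Everything here is hypothetical: the two dictionaries stand in for a class-level source store and an external database, the identifiers are made up, and sourceFor is not a selector from the changeset.

```smalltalk
"Workspace sketch of the lookup chain: the class answers the source
 if it has it, otherwise the request is passed on to a database layer.
 All names and identifiers are hypothetical."
| classSources database sourceFor |
classSources := Dictionary new.
database := Dictionary new.
classSources at: 'id-0001' put: 'answer ^ 42'.
database at: 'id-0002' put: 'double: x ^ x * 2'.

sourceFor := [:identifier |
    classSources
        at: identifier
        ifAbsent: [database at: identifier ifAbsent: [nil]]].

sourceFor value: 'id-0001'.    "'answer ^ 42'"
sourceFor value: 'id-0002'.    "'double: x ^ x * 2'"
sourceFor value: 'id-9999'.    "nil"
```

Because the identifier never changes, condensing .changes or .sources would not require touching the method's trailer in such a setup.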

>
>
>



--
Best regards,
Igor Stasenko AKA sig.