squeak decompiler


Whiter Walt
 
Hi,

I am interested in the Squeak Decompiler class. Is there any technical
information on the net that explains how it works? Or does it follow
some common "rules" that I could find in a paper or book?

cheers
Walt

Re: squeak decompiler

Clément Béra
 
Hello,

I don't think there's much documentation available.

Basically it builds a method AST from a compiled method. It is composed of the Decompiler that walks over the bytecode and the AST constructor that builds the AST.

Most bytecodes map one-to-one to AST nodes and are easy to decompile. The main complexity comes from decompiling inlined control flow structures, which relies on heuristics:
- loops are decompiled to while loops; if a certain pattern is met (an iterator is incremented at each loop iteration), the while loop is replaced by #to:do: or #to:by:do:
- conditions are decompiled to #ifTrue:ifFalse:, except when the pattern matches #ifNil:ifNotNil: (there are == nil bytecodes), and except when the conditional jump targets a push true or push false bytecode, in which case it is decompiled to #and: or #or:
- dup bytecodes mixed with jumps are decompiled to #caseOf: or #caseOf:otherwise:
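
The conditional heuristic in the list above can be illustrated with a minimal Python sketch. It does not use Squeak's real Decompiler or its bytecode set: the instruction names (`push`, `send`, `jump_false`, `jump`), the tuple encoding, and the functions `format_send` and `decompile` are all assumptions made for this example. It symbolically executes the instructions on a stack of source fragments and picks #and:, #or:, or #ifTrue:ifFalse: depending on whether a branch is a lone push of false or true.

```python
def format_send(receiver, selector, args):
    """Render a message send as Smalltalk-like source text."""
    if not args:                      # unary message
        return f"{receiver} {selector}"
    if ":" not in selector:           # binary message
        return f"({receiver} {selector} {args[0]})"
    keywords = selector.split(":")[:-1]
    parts = " ".join(f"{k}: {a}" for k, a in zip(keywords, args))
    return f"({receiver} {parts})"


def decompile(code, start=0, end=None):
    """Symbolically execute code[start:end]; return the stack of source fragments.

    Hypothetical instruction set (not Squeak's):
      ("push", name), ("send", selector, nargs),
      ("jump_false", target), ("jump", target)
    """
    if end is None:
        end = len(code)
    stack = []
    pc = start
    while pc < end:
        op = code[pc]
        kind = op[0]
        if kind == "push":
            stack.append(op[1])
            pc += 1
        elif kind == "send":
            _, selector, nargs = op
            args = [stack.pop() for _ in range(nargs)][::-1]
            receiver = stack.pop()
            stack.append(format_send(receiver, selector, args))
            pc += 1
        elif kind == "jump_false":
            else_start = op[1]
            skip = code[else_start - 1]   # jump that ends the "then" branch
            if skip[0] != "jump":
                raise ValueError("unrecognized conditional pattern")
            join = skip[1]
            condition = stack.pop()
            then_code = code[pc + 1:else_start - 1]
            else_code = code[else_start:join]
            then_src = decompile(code, pc + 1, else_start - 1)[-1]
            else_src = decompile(code, else_start, join)[-1]
            if else_code == [("push", "false")]:
                stack.append(f"({condition} and: [{then_src}])")
            elif then_code == [("push", "true")]:
                stack.append(f"({condition} or: [{else_src}])")
            else:
                stack.append(
                    f"({condition} ifTrue: [{then_src}] ifFalse: [{else_src}])")
            pc = join
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack


AND_CODE = [("push", "a"), ("jump_false", 4), ("push", "b"),
            ("jump", 5), ("push", "false")]
IF_CODE = [("push", "x"), ("push", "0"), ("send", ">", 1),
           ("jump_false", 6), ("push", "x"), ("jump", 7), ("push", "y")]

print(decompile(AND_CODE)[0])   # (a and: [b])
print(decompile(IF_CODE)[0])    # ((x > 0) ifTrue: [x] ifFalse: [y])
```

The sketch also shows why this is a heuristic: a hand-written `a ifTrue: [b] ifFalse: [false]` produces the same toy instruction sequence as `a and: [b]`, so the original spelling cannot be recovered from the instructions alone.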

Cascades add a little complexity too. They are detected by a dup bytecode that is not mixed with jumps, in contrast to #caseOf:.
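
A rough Python sketch of the cascade case follows. It assumes a simplified instruction sequence of the form push receiver, dup, send, pop, ..., send (with no jumps in between); the instruction names and the `Cascade`, `format_message`, and `decompile_cascade` helpers are hypothetical and are not taken from Squeak's Decompiler.

```python
class Cascade:
    """A receiver plus the messages sent to it in one cascade."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.messages = []


def format_message(selector, args):
    """Render selector and arguments without the receiver."""
    if not args:                      # unary message
        return selector
    if ":" not in selector:           # binary message
        return f"{selector} {args[0]}"
    keywords = selector.split(":")[:-1]
    return " ".join(f"{k}: {a}" for k, a in zip(keywords, args))


def decompile_cascade(code):
    """Rebuild source text from straight-line code that may contain a cascade.

    Hypothetical instructions: ("push", name), ("dup",), ("pop",),
    ("send", selector, nargs).
    """
    stack = []
    for op in code:
        kind = op[0]
        if kind == "push":
            stack.append(op[1])
        elif kind == "dup":
            # A dup with no jump nearby marks the top of stack as a
            # cascade receiver; the copy shares the same Cascade object.
            if not isinstance(stack[-1], Cascade):
                stack[-1] = Cascade(stack[-1])
            stack.append(stack[-1])
        elif kind == "pop":
            stack.pop()               # discard an intermediate result
        elif kind == "send":
            _, selector, nargs = op
            args = [stack.pop() for _ in range(nargs)][::-1]
            receiver = stack.pop()
            message = format_message(selector, args)
            if isinstance(receiver, Cascade):
                receiver.messages.append(message)
                if stack and stack[-1] is receiver:
                    # Another copy of the receiver remains, so this was
                    # an intermediate message; its result gets popped.
                    stack.append(None)
                else:
                    # Last message of the cascade: emit the whole thing.
                    stack.append(f"{receiver.receiver} "
                                 + "; ".join(receiver.messages))
            else:
                stack.append(f"{receiver} {message}")
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack


CASCADE_CODE = [("push", "Transcript"), ("dup",), ("push", "'a'"),
                ("send", "show:", 1), ("pop",), ("send", "cr", 0)]
print(decompile_cascade(CASCADE_CODE)[0])   # Transcript show: 'a'; cr
```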

Note that the Decompiler is unreliable: recompiling the whole image from decompiled sources crashes, and seemingly it has not been possible to do that for the past decade (it was possible in Smalltalk-80). So you can assume that the decompiler handles roughly 99% of methods, but not all of them.

An interesting project would be to improve the decompiler so that the image no longer crashes when all the methods are recompiled from decompiled sources.
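
One way to find the methods that block such a recompilation is a round-trip check: decompile each method, recompile the result, and compare. The Python sketch below only shows the shape of such a harness. The `toy_compile` and `toy_decompile` functions are stand-ins invented for the example, and comparing bytecode for exact equality is an assumption; a real check in the image might need a looser notion of equivalence.

```python
def round_trip_failures(methods, compile_fn, decompile_fn):
    """Return the names of methods that do not survive decompile + recompile.

    methods maps a method name to its bytecode; compile_fn and decompile_fn
    are supplied by the caller (hypothetical stand-ins in this sketch).
    """
    failures = []
    for name, bytecode in methods.items():
        try:
            source = decompile_fn(bytecode)
            if compile_fn(source) != bytecode:
                failures.append(name)
        except Exception:
            # A decompiler or compiler error also counts as a failure.
            failures.append(name)
    return failures


def toy_compile(source):
    """Toy compiler: one whitespace-separated token per instruction."""
    return tuple(source.split())


def toy_decompile(bytecode):
    """Toy decompiler that gives up on an instruction it does not handle."""
    if "dup" in bytecode:
        raise ValueError("unsupported instruction: dup")
    return " ".join(bytecode)


METHODS = {
    "ok": ("push", "send"),
    "bad": ("push", "dup", "send"),
}
print(round_trip_failures(METHODS, toy_compile, toy_decompile))   # ['bad']
```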

Regards,

Clement


2014-10-20 10:51 GMT+02:00 Whiter Walt <[hidden email]>:

Hi,

I am interested in the Squeak Decompiler class. Is there any technical information on the net that explains how it works? Or does it follow some common "rules" that I could find in a paper or book?

cheers
Walt


Re: squeak decompiler

Casey Ransberger-2
In reply to this post by Whiter Walt

I don't know anything about how the Squeak decompiler works.

One approach to translating code in one language to another is to build a parser for the source language in order to obtain an abstract syntax tree (AST) and then use a pretty-printer for the target language on the AST. That's the short, short version of an explanation of how we translate the Squeak VM from Slang to C.

In the case of a decompiler for a byte-compiled system, the source language (using the above approach) would be the bytecodes themselves, and the target language could be any language with sufficient facilities to support the semantics of the source language (for a decompiler, the target language would be the language the bytecode was originally compiled from).
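
As a minimal Python sketch of that parse-then-pretty-print idea: the "source language" below is a list of postfix tokens standing in for bytecode, and the "target language" is Smalltalk-like infix text. The token format and the `Var`, `Send`, `parse`, and `pretty` names are assumptions made for the example, not part of Squeak, OMeta, or PetitParser.

```python
from dataclasses import dataclass


@dataclass
class Var:
    """AST leaf: a variable reference."""
    name: str


@dataclass
class Send:
    """AST node: a binary message send."""
    receiver: object
    selector: str
    argument: object


BINARY_SELECTORS = {"+", "-", "*"}


def parse(tokens):
    """Parse postfix tokens (a stand-in for bytecode) into an AST."""
    stack = []
    for token in tokens:
        if token in BINARY_SELECTORS:
            argument = stack.pop()
            receiver = stack.pop()
            stack.append(Send(receiver, token, argument))
        else:
            stack.append(Var(token))
    if len(stack) != 1:
        raise ValueError("tokens do not form a single expression")
    return stack[0]


def pretty(node):
    """Pretty-print the AST as fully parenthesized infix source."""
    if isinstance(node, Var):
        return node.name
    return f"({pretty(node.receiver)} {node.selector} {pretty(node.argument)})"


print(pretty(parse(["a", "b", "+", "c", "*"])))   # ((a + b) * c)
```

Keeping the AST separate from the printer is what makes the approach reusable: a second printer for a different target language could walk the same tree.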

I hesitated to respond, because I definitely don't know anything about Squeak's decompiler, and also because what I've described is in some ways a massive oversimplification (if the compiler is optimizing things, deoptimization may be necessary to approach the intent and style of the original programmer's code, and that's just one thing I can think of.)

We have available to us a couple of tools which make it (relatively) easy to play around with this idea and get a feel for it (OMeta/Squeak and PetitParser.)

I'd recommend checking out PetitParser if you're messing with stuff to do with compilation, as OMeta's development seems to mostly have shifted to JavaScript. PetitParser is actively maintained.

Decompilation hasn't been an area of interest for me, and I don't know much about it, so I may be sending you down the primrose path. In the immortal words of LeVar Burton, "don't take my word for it."

I know that several research papers have been written about OMeta, and I'd bet some have been written about PetitParser as well. The focus of these systems is on parsing expression grammars (PEGs,) but language to language translation is a thing, and it seems to me that a decompiler can be viewed as a special case of this thing. Following the references cited in such papers could prove valuable to whatever your cause is.

For historical perspective on OMeta and PetitParser, it might be helpful to look into the metalanguage called META II.

Arguably, this question would be as well addressed on squeak-dev, as both the compiler and the decompiler are image-residents, and not built into the VM (though both are tied to the VM's behavior wrt interpreting bytecode.)

HTH, sorry if it doesn't.

Casey

P.S.

People who actually know how Squeak's decompiler works should still probably chime in here :D


Re: squeak decompiler

Karl Ramberg
 
Free books here:
 
Early Squeak was pretty similar to the description in
 
 
 
Karl
 
 


Re: squeak decompiler

Whiter Walt
In reply to this post by Clément Béra
 
Hi,

Thanks for your reply!
It is a pity that there is no documentation about decompiling. Anyway, thanks for your summary of how Squeak bytecode is decompiled.

Regards,
Walt


Re: squeak decompiler

Whiter Walt
In reply to this post by Casey Ransberger-2
 
Hi,

Thanks for your reply!
I will check out the tools you recommended.

Regards,
Walt


Re: squeak decompiler

Whiter Walt
In reply to this post by Karl Ramberg
 
Hi,

Thanks for your reply!
Do you know for sure that one of these books covers decompiling? I could not find anything.

Regards,
Walt
