test crashing the cog vm

Re: test crashing the cog vm

Igor Stasenko
On 22 March 2011 19:43, Toon Verwaest <[hidden email]> wrote:

> It's a very simple idea really.
>
> Where your bytecode would previously look like
>
> pushTemp: 1
> pushTemp: 2
> add
>
> now it looks like:
>
> add 1, 2, 3
>
> where 3 is the start of your original stack, for example.
>
> So rather than pushing and popping etc, you just abstractly interpret the
> bytecodes and figure out the stack locations where they would be pushed. And
> then you use this number as arguments to a copy bytecode, rather than
> calling "push". This avoids having to maintain a stack where you up front
> already know what the target index will be. You know this up front since
> your code is static by itself.
>
> The advantage is that you don't need to do stack-tricks to keep data around,
> since all the operation are simple things such as copy, loadFromOuterScope,
> storeInOuterScope, loadFromField, storeInField, ...
>
> So conversely to what you might have thought beforehand, you don't have
> "real registers" around, you just consider a stackframe to be a set of
> registers. And you operate on them with easy bytecodes. The advantage, in
> addition to it being less work and thus slightly faster by itself, is that
> it will later on also map better on the real hardware. The operations
> already look like what normal assembler operations would look like.
>
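
The translation Toon describes can be sketched roughly as follows. This is only an illustration in Python, not code from Cog or any other VM: the tuple-based instruction format and the function name `to_register_code` are assumptions made up for this example. The function abstractly interprets stack bytecode, tracking only where each stack entry lives, and emits register-style instructions whose operands are frame slots.

```python
def to_register_code(stack_code, num_temps):
    """Translate stack bytecode into register-style code by abstract interpretation.

    Hypothetical instruction format (an assumption of this sketch):
      ("pushTemp", n)  pushes the value of temp slot n
      ("add",)         pops two operands and pushes their sum
    Slots 1..num_temps hold the temps; the original stack starts right after.
    """
    abstract_stack = []  # for each stack entry, the frame slot that holds its value
    out = []
    for instr in stack_code:
        op = instr[0]
        if op == "pushTemp":
            # No code is emitted: we only remember where the value lives.
            # (Valid only while that temp is not overwritten before its use.)
            abstract_stack.append(instr[1])
        elif op == "add":
            right = abstract_stack.pop()
            left = abstract_stack.pop()
            # The result goes where the stack machine would have pushed it.
            dest = num_temps + len(abstract_stack) + 1
            out.append(("add", left, right, dest))
            abstract_stack.append(dest)
        else:
            raise ValueError(f"unknown bytecode: {op!r}")
    return out


stack_code = [("pushTemp", 1), ("pushTemp", 2), ("add",)]
print(to_register_code(stack_code, num_temps=2))  # prints [('add', 1, 2, 3)]
```

With two temps, the three stack instructions collapse into the single `add 1, 2, 3` from the example above, because the target index is known statically.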

So, as I understand it, code like:

someMethod

   | temp1 temp2 |

   ^ self send: temp1+temp2 with: temp2.

could be represented by the following
(we need 5 slots on the stack):

1. temp1
2. temp2
3. receiver
4. - first arg for send
5. - second arg

bytecode:

1. storeSelf: 3
2. storeTemp: 1 to: 4
3. storeTemp: 2 to: 5
4. send #+ to: 4
5. storeTemp: 2 to: 5
6. send #send:with: to: 3
7. return: 3

(Note that this bytecode already contains an optimization: context
slot #5 is not overwritten by the #+ send, so
the same argument, temp2, is simply reused for the subsequent send
(#send:with:), which is quite nice.)


> cheers,
> Toon
>


--
Best regards,
Igor Stasenko AKA sig.


Re: test crashing the cog vm

Igor Stasenko
On 22 March 2011 23:20, Igor Stasenko <[hidden email]> wrote:

> On 22 March 2011 19:43, Toon Verwaest <[hidden email]> wrote:
>> It's a very simple idea really.
>>
>> Where your bytecode would previously look like
>>
>> pushTemp: 1
>> pushTemp: 2
>> add
>>
>> now it looks like:
>>
>> add 1, 2, 3
>>
>> where 3 is the start of your original stack, for example.
>>
>> So rather than pushing and popping etc, you just abstractly interpret the
>> bytecodes and figure out the stack locations where they would be pushed. And
>> then you use this number as arguments to a copy bytecode, rather than
>> calling "push". This avoids having to maintain a stack where you up front
>> already know what the target index will be. You know this up front since
>> your code is static by itself.
>>
>> The advantage is that you don't need to do stack-tricks to keep data around,
>> since all the operation are simple things such as copy, loadFromOuterScope,
>> storeInOuterScope, loadFromField, storeInField, ...
>>
>> So conversely to what you might have thought beforehand, you don't have
>> "real registers" around, you just consider a stackframe to be a set of
>> registers. And you operate on them with easy bytecodes. The advantage, in
>> addition to it being less work and thus slightly faster by itself, is that
>> it will later on also map better on the real hardware. The operations
>> already look like what normal assembler operations would look like.
>>
>
> So, as i understood , a code like:
>
> someMethod
>
>   | temp1 temp2 |
>
>   ^ self send: temp1+temp2 with: temp2.
>
> could be represented by following:
> (we need 5 temps on stack: )
>
> 1. temp1
> 2. temp2
> 3. receiver
> 4. - first arg for send
> 5. - second arg
>
> bytecode:
>
> 1.storeSelf: 3
> 2. storeTemp: 1 to: 4
> 3. storeTemp: 2 to: 5
> 4. send #+ to: 4
> 5. storeTemp: 2 to: 5
> 6. send #send:with: to: 3
> 7. return: 3
>
> (note here, that given bytecode already contains an optimization - a
> context slot #5 are not overridden by #+ send,
> the same argument temp2 is simply reused for subsequent send
> (#send:with:), which is quite nice).
>
Oh... a little mistake: I meant that instruction #5 could be optimized away,
because the contents of slot #5 (temp2) are unchanged.
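
The optimization mentioned here can be sketched as a small pass over the register-style code. This is a hedged illustration in Python, not an existing compiler pass: the tuple encoding of the instructions and the name `drop_redundant_copies` are assumptions for this example, and it relies on the assumption stated above that a send overwrites only its receiver slot (with the result) and leaves the argument slots untouched.

```python
def drop_redundant_copies(code):
    """Remove storeTemp instructions whose target slot already holds that temp.

    Hypothetical encoding (an assumption of this sketch):
      ("storeSelf", dst), ("storeTemp", src, dst),
      ("send", selector, base), ("return", slot)
    """
    known = {}  # slot -> description of the value it is known to hold
    out = []

    def overwrite(slot, value=None):
        # Anything recorded as a copy of `slot` is stale once `slot` changes.
        for stale in [s for s, v in known.items() if v == ("temp", slot)]:
            del known[stale]
        if value is None:
            known.pop(slot, None)
        else:
            known[slot] = value

    for instr in code:
        op = instr[0]
        if op == "storeTemp":
            _, src, dst = instr
            if known.get(dst) == ("temp", src):
                continue  # dst already holds temp src: drop the copy
            overwrite(dst, ("temp", src))
        elif op == "storeSelf":
            overwrite(instr[1], ("self",))
        elif op == "send":
            # Assumption: only the receiver slot is replaced (by the result).
            overwrite(instr[2])
        out.append(instr)
    return out


igor_code = [
    ("storeSelf", 3),
    ("storeTemp", 1, 4),
    ("storeTemp", 2, 5),
    ("send", "+", 4),
    ("storeTemp", 2, 5),  # instruction #5: slot 5 still holds temp2
    ("send", "send:with:", 3),
    ("return", 3),
]
optimized = drop_redundant_copies(igor_code)
```

On the seven-instruction example, the pass keeps six instructions and drops only the second `storeTemp: 2 to: 5`.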

>
>> cheers,
>> Toon
>>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>



--
Best regards,
Igor Stasenko AKA sig.


Re: test crashing the cog vm

Toon Verwaest-2

>> So, as i understood , a code like:
>>
>> someMethod
>>
>>    | temp1 temp2 |
>>
>>    ^ self send: temp1+temp2 with: temp2.
>>
>> could be represented by following:
>> (we need 5 temps on stack: )
>>
>> 1. temp1
>> 2. temp2
>> 3. receiver
>> 4. - first arg for send
>> 5. - second arg
>>
>> bytecode:
>>
>> 1.storeSelf: 3
>> 2. storeTemp: 1 to: 4
>> 3. storeTemp: 2 to: 5
>> 4. send #+ to: 4
>> 5. storeTemp: 2 to: 5
>> 6. send #send:with: to: 3
>> 7. return: 3
Exactly. You will probably need to say how many arguments you want to
pass to +, for validation purposes. And yes, as you automatically did,
you would get the return value back in the location of the first
argument. An advantage of this is that a "return self" does not need to
do anything at all; self was already the first "argument" of the send anyway.
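
To make the calling convention concrete, here is a small sketch of an interpreter for the bytecode above, written in Python. It is only an illustration under stated assumptions: the tuple encoding, the helper names `arg_count` and `run`, and the `dispatch` callback (a stand-in for real message lookup) are all made up for this example. The argument count is derived from the selector here; an explicit count in the instruction, as suggested above, would give a verifier something to check it against.

```python
def arg_count(selector):
    """Smalltalk arity: one argument per colon for keyword selectors,
    one for binary selectors such as +, none for unary selectors."""
    if ":" in selector:
        return selector.count(":")
    return 0 if selector[0].isalpha() or selector[0] == "_" else 1


def run(code, receiver, temps, frame_size, dispatch):
    """Run register-style code, treating the frame as a set of registers."""
    frame = [None] * (frame_size + 1)  # slot 0 is unused; slots are 1-based
    frame[1:len(temps) + 1] = temps
    for instr in code:
        op = instr[0]
        if op == "storeSelf":
            frame[instr[1]] = receiver
        elif op == "storeTemp":
            frame[instr[2]] = frame[instr[1]]
        elif op == "send":
            _, selector, base = instr
            n = arg_count(selector)
            args = frame[base + 1:base + 1 + n]
            # The result comes back in the receiver's (first argument's) slot.
            frame[base] = dispatch(frame[base], selector, args)
        elif op == "return":
            return frame[instr[1]]
    return receiver


def dispatch(rcvr, selector, args):
    # Hypothetical stand-in for method lookup, just enough for the example.
    if selector == "+":
        return rcvr + args[0]
    if selector == "send:with:":
        return (rcvr, args[0], args[1])
    raise LookupError(selector)


code = [
    ("storeSelf", 3),
    ("storeTemp", 1, 4),
    ("storeTemp", 2, 5),
    ("send", "+", 4),
    ("storeTemp", 2, 5),
    ("send", "send:with:", 3),
    ("return", 3),
]
result = run(code, "self", [10, 20], 5, dispatch)
```

With temp1 = 10 and temp2 = 20, `result` is the tuple `("self", 30, 20)`: the sum lands in slot 4, and the final result replaces the receiver in slot 3.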

There are even more optimizations that could be done, such as knowing
that the receiver is actually at location -1 when this runs in the context
of the method, so that no values need to be copied at all... You could
just apply the first send at 2, and the second one at -1 :)
But obviously that requires a lot of knowledge about what you are doing,
and it might not always be possible.

cheers,
Toon


Re: test crashing the cog vm

Stefan Marr-4
In reply to this post by Stéphane Ducasse

On 22 Mar 2011, at 19:00, Stéphane Ducasse wrote:

> Can you quickly explain register-based bytecode?

And for those who still wonder why that might be useful, there is an article titled 'Virtual Machine Showdown: Stack Versus Registers':


@article{1328197,
  address = {New York, NY, USA},
  author = {Yunhe Shi and Kevin Casey and M. Anton Ertl and David Gregg},
  interhash = {5fbfc20a2129ec1897d295cace8e3595},
  intrahash = {e329d8905cc5c93be9782963125d9644},
  journal = {ACM Trans. Archit. Code Optim.},
  number = 4,
  pages = {1--36},
  publisher = {ACM},
  title = {Virtual Machine Showdown: Stack Versus Registers},
  url = {http://portal.acm.org/citation.cfm?id=1328195.1328197},
  volume = 4,
  year = 2008,
  added-at = {2008-09-03T21:02:27.000+0200},
  issn = {1544-3566},
  description = {Virtual machine showdown},
  biburl = {http://www.bibsonomy.org/bibtex/2e329d8905cc5c93be9782963125d9644/gron},
  doi = {http://doi.acm.org/10.1145/1328195.1328197}
}




--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525



Re: test crashing the cog vm

Eliot Miranda-2
In reply to this post by Stéphane Ducasse


On Tue, Mar 22, 2011 at 5:34 AM, Stéphane Ducasse <[hidden email]> wrote:
> But why we could not have a bytecode validator at the image level that first make sure that byte code are in sync with the format of the objects.

Because it can be compromised.  An in-image verifier is subject to attack, and could be disabled by an attack that got past the in-image verifier before it got a chance to run.  An in-VM verifier cannot be side-stepped because it is the only way to execute code.  So an in-VM verifier can be secure, but an in-image one can't and so is pointless.

> Why this has to be done in the vm.
>
> Stef


Re: test crashing the cog vm

Igor Stasenko
On 24 March 2011 19:34, Eliot Miranda <[hidden email]> wrote:

>
>
> On Tue, Mar 22, 2011 at 5:34 AM, Stéphane Ducasse
> <[hidden email]> wrote:
>>
>> But why we could not have a bytecode validator at the image level that
>> first make sure that byte code are in sync with the format of the objects.
>
> Because it can be compromised.  An in-image verifier is subject to attack,
> and could be disabled by an attack that got past the in-image verifier
> before it got a chance to run.  An in-VM verifier is not possible to
> side-step because it is the only way to execute code.  So an in-VM verifier
> can be secure but an in-image one can't and so is pointless.
>
For a real hacker, nothing is impossible :)

Right now it's not possible to split the image into a layered onion (as an
operating system does, where you have a kernel level
and a user level), but I think that, at least in theory, such a composition
could be implemented, except that of course
we don't have the resources to invest in this direction.

It is actually a nice field for research (hello, guys from academia :)

--
Best regards,
Igor Stasenko AKA sig.


Re: test crashing the cog vm

Eliot Miranda-2


On Thu, Mar 24, 2011 at 11:42 AM, Igor Stasenko <[hidden email]> wrote:
> On 24 March 2011 19:34, Eliot Miranda <[hidden email]> wrote:
>>
>>
>> On Tue, Mar 22, 2011 at 5:34 AM, Stéphane Ducasse
>> <[hidden email]> wrote:
>>>
>>> But why we could not have a bytecode validator at the image level that
>>> first make sure that byte code are in sync with the format of the objects.
>>
>> Because it can be compromised.  An in-image verifier is subject to attack,
>> and could be disabled by an attack that got past the in-image verifier
>> before it got a chance to run.  An in-VM verifier is not possible to
>> side-step because it is the only way to execute code.  So an in-VM verifier
>> can be secure but an in-image one can't and so is pointless.
>>
> For real hacker there's nothing impossible :)

True, but it can be much harder.  How would you attempt to hack past the fact that the interpreter and JIT code, including the verifier, are in read-only memory?  If this is the only route to creating executable code, and it always verifies the bytecode before executing it, then it is secure, right?

[There are issues: one doesn't want to verify a method each time it is activated in the interpreter, but e.g. a bit in the header saying "this method has been verified" might be easy to compromise.  I'm not sure it is, though, because one big advantage of the current funky method format (half literals, half bytes) is that there's only one primitive to construct a method, and it has to have a valid header, and there's only one way to modify the header, objectAt:put:.  So as far as I can see the VM can in fact preserve the integrity of a header flag bit that says "this method has been verified".  Am I right or wrong?]
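
The header-flag idea in the bracketed note could be modeled roughly like this. It is a sketch in Python under loud assumptions, not how Cog works: the bit position `VERIFIED_BIT`, the class names `Method` and `VM`, the rule that literal writes clear the flag, and the `verify` callback are all hypothetical. Python itself does not enforce the boundary; the sketch only models a VM in which `object_at_put` (standing in for objectAt:put:) is the single way for image code to modify a method.

```python
VERIFIED_BIT = 1 << 0  # hypothetical position of the flag in the header word


class Method:
    def __init__(self, header, literals, bytecodes):
        self.header = header & ~VERIFIED_BIT  # new methods start unverified
        self.literals = list(literals)
        self.bytecodes = bytes(bytecodes)


class VM:
    def __init__(self, verify):
        self._verify = verify  # callable(method) -> bool, stand-in for the verifier
        self.verifier_runs = 0

    def object_at_put(self, method, index, value):
        """Model of the only primitive that may modify a method.

        Index 0 is the header. The flag is masked out of any header value
        supplied by image code, and any other write clears the flag
        (an assumption of this sketch).
        """
        if index == 0:
            method.header = value & ~VERIFIED_BIT
        else:
            method.literals[index - 1] = value
            method.header &= ~VERIFIED_BIT

    def activate(self, method):
        # Verify once; later activations trust the VM-maintained flag.
        if not method.header & VERIFIED_BIT:
            self.verifier_runs += 1
            if not self._verify(method):
                raise ValueError("method failed verification")
            method.header |= VERIFIED_BIT
        return "executed"


vm = VM(verify=lambda m: True)
m = Method(header=0xFF, literals=["a"], bytecodes=b"\x01")
vm.activate(m)
vm.activate(m)                 # flag already set: verifier is not run again
vm.object_at_put(m, 0, 0xFF)   # image code tries to write a header with the bit set
vm.activate(m)                 # flag was masked out, so the verifier runs again
```

In this model the verifier runs twice in total: once on first activation and once after the header write, since image code cannot set the flag itself.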
 

> Right now its not possible to split image to layered onion (like
> operating system does, where you have kernel level,
> and user level), but i think (at least in theory) such composition
> could be implemented, except that sure thing
> we don't have resources to invest in this direction.
>
> It is actually nice field for research (hello guys from academy :)
>
> --
> Best regards,
> Igor Stasenko AKA sig.



Re: test crashing the cog vm

Stéphane Ducasse
In reply to this post by Eliot Miranda-2
Of course, if we are talking about attacks. I was just talking about verification to avoid simple crashes.

Stef

On Mar 24, 2011, at 7:34 PM, Eliot Miranda wrote:

>
>
> On Tue, Mar 22, 2011 at 5:34 AM, Stéphane Ducasse <[hidden email]> wrote:
>> But why we could not have a bytecode validator at the image level that first make sure that byte code are in sync with the format of the objects.
>
> Because it can be compromised.  An in-image verifier is subject to attack, and could be disabled by an attack that got past the in-image verifier before it got a chance to run.  An in-VM verifier is not possible to side-step because it is the only way to execute code.  So an in-VM verifier can be secure but an in-image one can't and so is pointless.
>
>> Why this has to be done in the vm.
>>
>> Stef
>



Re: test crashing the cog vm

Stéphane Ducasse
In reply to this post by Igor Stasenko
>> <[hidden email]> wrote:
>>>
>>> But why we could not have a bytecode validator at the image level that
>>> first make sure that byte code are in sync with the format of the objects.
>>
>> Because it can be compromised.  An in-image verifier is subject to attack,
>> and could be disabled by an attack that got past the in-image verifier
>> before it got a chance to run.  An in-VM verifier is not possible to
>> side-step because it is the only way to execute code.  So an in-VM verifier
>> can be secure but an in-image one can't and so is pointless.
>>
> For real hacker there's nothing impossible :)
>
> Right now its not possible to split image to layered onion (like
> operating system does, where you have kernel level,
> and user level), but i think (at least in theory) such composition
> could be implemented, except that sure thing
> we don't have resources to invest in this direction.
>
> It is actually nice field for research (hello guys from academy :)

We have three PhDs on the topic, so we will see.

>
> --
> Best regards,
> Igor Stasenko AKA sig.
>



Re: test crashing the cog vm

Andres Valloud-4
In reply to this post by Igor Stasenko
oneWayBecome: may or may not pay attention to immutability.

You can verify the class format, only to have someone in the image
change the class of an object afterwards.  What should you do when you do a become:
of a compiled method?  What happens when you get the VM to verify and
JIT all the compiled methods for class X, and then you put your badly
shaped class Y in as a subclass of X?  If Y does not implement any messages itself,
then inheritance will find all the already verified methods in class X,
so the verifier doesn't run and there you go again.

Sooner or later, some decision will have to sacrifice either the
dynamism of the language, the extreme safety checks, or performance.
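
The inheritance scenario in this post can be illustrated with a toy model. This is a sketch in Python under stated assumptions, not real VM or ClassBuilder code: `ClassDesc`, `lookup`, `is_valid_subclass` and `safe_to_run` are hypothetical names, and each method is reduced to the highest instance variable index it touches. It shows why a per-method "verified" result for X says nothing about a badly shaped subclass Y, and what a layout check on the subclass would catch.

```python
class ClassDesc:
    """Toy class description: just a slot count and a method table."""

    def __init__(self, name, inst_size, superclass=None):
        self.name = name
        self.inst_size = inst_size   # number of instance variable slots
        self.superclass = superclass
        self.methods = {}            # selector -> highest inst var index touched


def lookup(cls, selector):
    """Walk the superclass chain, as ordinary method lookup would."""
    while cls is not None:
        if selector in cls.methods:
            return cls, cls.methods[selector]
        cls = cls.superclass
    raise LookupError(selector)


def is_valid_subclass(cls):
    """Layout check: a subclass must keep every slot its superclass has."""
    sup = cls.superclass
    return sup is None or cls.inst_size >= sup.inst_size


def safe_to_run(receiver_class, selector):
    """True if the method found by lookup stays within the receiver's slots."""
    _, max_index = lookup(receiver_class, selector)
    return max_index <= receiver_class.inst_size


X = ClassDesc("X", inst_size=3)
X.methods["third"] = 3               # verified against X's three slots
Y = ClassDesc("Y", inst_size=1, superclass=X)  # badly shaped subclass

x_ok = safe_to_run(X, "third")       # fine for instances of X
y_ok = safe_to_run(Y, "third")       # inherited method reads past Y's slots
y_valid = is_valid_subclass(Y)       # the layout check rejects Y
```

Here `x_ok` is true while `y_ok` and `y_valid` are false: the method was valid for X, but reusing it for Y without rechecking the layout is exactly the hole described above.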

On 3/22/11 6:17 , Igor Stasenko wrote:

> On 22 March 2011 14:02, Toon Verwaest<[hidden email]>  wrote:
>> The problem is exactly what you had at hand. Your bytecode WAS valid, but it
>> was used in combination with an incompatible class layout. So validation
>> here wouldn't solve anything. You always need to validate in a closed world
>> to ensure you don't accidentally break everything. Whenever you change
>> something, you need to ensure that you revalidate the relevant parts. In
>> this case you could for example just have validated that your new class is a
>> valid subclass of its superclass; which it was not.
>>
>> To make the system more secure obviously you would have to check those
>> things, but if you add part of the API that circumvents these checks, like
>> you were doing by calling "Class new" rather than using the ClassBuilder
>> which does do the checks, everything breaks. The only authority that can
>> actually ensure that you don't circumvent these checks is the VM.
>>
>> Obviously you can make sure already in your image that you have enough
>> checks everywhere. You just don't have a crashproof mechanism that will
>> chain your users down avoiding that they shoot themselves in the foot with
>> segfaults. If you put it inside of the VM you -can- provide such a
>> mechanism, because you don't execute anything unless you know it's safe. And
>> as I said, this piece of code could be a piece of prevalidated Smalltalk
>> code that's immutable from the rest of the image.
>>
>
> Yep.. and that is possible only after you introduce irreversible
> immutability mechanism into VM,
> which marking object(s) to be immutable for the rest of their existence.
>
>> cheers,
>> Toon
>>
>> On 03/22/2011 02:37 PM, Alexandre Bergel wrote:
>>>>
>>>> But why we could not have a bytecode validator at the image level that
>>>> first make sure that byte code are in sync with the format of the objects.
>>>> Why this has to be done in the vm.
>>>
>>> I agree with Stef. It is not obvious for me why it has to be done at the
>>> VM.
>>>
>>> Alexandre
>>
>>
>>
>
>
>
