tempVectors use case and current issues


tempVectors use case and current issues

Denis Kudriashov
 
Hi.

I found an interesting case where tempVectors can be used in remote scenarios: a store into a remote temp can be truly remote (not just a store into the outer context).
I played with the following example:

| temp | 
temp := 10.
remote evaluate: [temp := temp + 1].
temp.

For the moment, forget about the remote part and look at it as a normal local case.
The temp var here is managed indirectly through a tempVector. You can see this by evaluating the following expression after the first assignment:

thisContext at: 1 "=>#(10)"

So the value is in fact stored in an Array instance and read from it.
But because of an optimization, this happens outside the array's control: no #at: or #at:put: messages are sent while this code runs. The VM changes the state of the array directly (there are special bytecodes for this).

Now my remote use case. Imagine that the VM actually sent #at: and #at:put: messages to the tempVector. Then a remoting engine could transfer the temp vector (as part of the context) as a proxy, so on the remote side the block [temp := temp + 1] would actually ask the sender (the client) for the value and for the storage. Full block semantics would be preserved: the temp in the remote outer context would be modified. I think it would be super cool if such transparency were possible.
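To make the idea concrete, here is a hand-expanded sketch of the local example in which the temp vector is an explicit Array and every access is an ordinary message send. The name tv is illustrative only: the real vector is created by the compiler and has no source-level name. In this form, an object substituted for tv (such as a remoting proxy) would receive each #at: and #at:put:.

```smalltalk
"Hand-expanded equivalent of:
	| temp | temp := 10. [temp := temp + 1] value. temp
 with the temp vector made explicit and accessed only by message sends."
| tv |
tv := Array new: 1.                      "stands in for the compiler-created temp vector"
tv at: 1 put: 10.                        "temp := 10"
[tv at: 1 put: (tv at: 1) + 1] value.    "[temp := temp + 1] value"
tv at: 1.                                "temp, now 11"
```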

I played with this example using Seamless in Pharo. It already works the way I described, but due to the VM optimization it does not give the expected behavior. Worse, it actually corrupts the transferred proxy, because the proxy instance is materialized in place of the array.

This leads us to the issue of the safety of tempVector operations. The following example shows how we can affect the state of a tempVector using reflection:

| temp | 
temp := 10.
(thisContext at: 1) at: 1 put: 50.
[temp := temp + 1] value.
temp. "==>51"

It is cool that we can do this. But there is no safety check at the VM level on the tempVector object:

| temp | 
temp := 10.
thisContext at: 1 put: Object new.
[temp := temp + 1] value.
temp.

It breaks with a DNU: #+ is sent to nil. The temp became nil.

| temp | 
temp := 10.
thisContext at: 1 put: #() copy.
[temp := temp + 1] value.
temp.

Sometimes it breaks with the same error; sometimes it returns a random number.
I guess in these cases the VM goes past the memory bounds of the tempVector.

And two exotic cases: 

| temp | 
temp := 10.
(thisContext at: 1) beReadOnlyObject.
[temp := temp + 1] value.
temp.

It silently returns 11. It does not break the read-only protection, but no error is signalled either.

| temp | 
temp := 10.
(thisContext at: 1) become: #() copy.
[temp := temp + 1] value.
temp.

It returns #() (in Pharo, #() + 1 = #()).
I used become: to check how forwarding works in this case (it works fine when the array has the correct size).

How can we improve this behavior? How would it affect performance?
My proposal is to send real messages to the tempVector when it is not an Array instance, and let the image decide what to do.

Best regards,
Denis








Re: tempVectors use case and current issues

Nicolas Cellier
 
Hi Denis,
Special bytecodes don't have to be changed: just don't use them, and replace them with regular sends at bytecode generation time (with a special compiler or some IR translator). Everything can then be done on the image side. Or did I miss something?

On Thu, 28 Mar 2019 at 20:05, Denis Kudriashov <[hidden email]> wrote:
 

Re: tempVectors use case and current issues

Denis Kudriashov
 
Hi Nicolas.

On Thu, 28 Mar 2019 at 19:44, Nicolas Cellier <[hidden email]> wrote:
 
> Hi Denis,
> Special bytecodes don't have to be changed: just don't use them, and replace them with regular sends at bytecode generation time (with a special compiler or some IR translator).

Sure, a bytecode transformation would work. But it would be quite tricky to apply to a live execution context: it would require fixing up the context stack to take the updated method bytecode into account.
Note that I'm not looking for a global setting to recompile all methods in the image. I want this logic only for a specific method/block activation. In my scenario the block is serialized and transferred together with the current context, so on the remote side I need to do something with the materialized objects to maintain normal block semantics.
 
> Everything can then be done on the image side. Or did I miss something?

I think my examples show a security hole in the VM execution logic, which allows memory bounds to be violated from the image side. I did not get a segfault, but I would not be surprised if one happened in some complex real-life scenarios. It may look like a contrived case, but I think it is quite easy to run into when using or developing a low-level serialization library: as soon as you serialize context objects with some substitution logic, by mistake or intentionally.
And since this hole needs to be closed, it would be a good opportunity to add another hook to the execution engine that could be used as in my remote scenario. So, back to the proposal in my first mail.
 


Re: tempVectors use case and current issues

Levente Uzonyi
In reply to this post by Denis Kudriashov
 
Just because you have a hammer doesn't mean you have to use it to
solve this task. Instead of trying to mangle the way the VM handles this,
you should just change the compiler to emit #at: and #at:put: instead of
the temp vector bytecodes.
Implementing this idea could open up new possibilities. For example, you
could use custom selectors like #remoteTempAt:, which could be
implemented by your proxies as well as by Array. Or you could introduce an
Array-like class to act as temp vectors and keep Array clean of these
methods.
By the way, any kind of change to these primitives means that the
atomicity guarantees the VM currently provides are gone if the temp vector
is "remote".
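A minimal Pharo sketch of the Array-like class idea, assuming the Pharo 7-era class-definition message (other Pharo versions may need a different class-creation API). The class name RemotableTempVector and the selectors #remoteTempAt: and #remoteTempAt:put: are hypothetical, chosen only to illustrate the suggestion: a modified compiler would emit these sends in place of the temp vector bytecodes, and a remoting proxy could implement the same two selectors by forwarding them to the image that owns the vector.

```smalltalk
| cls tv |
"Hypothetical Array-like class that plays the role of a temp vector,
 so that Array itself stays free of the extra selectors."
cls := Array variableSubclass: #RemotableTempVector
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'TempVectorSketch'.

"Hypothetical accessors a modified compiler could emit instead of the
 indirect-temp bytecodes. A proxy would implement the same two selectors
 by forwarding them to the owning image."
cls compile: 'remoteTempAt: index
	^ self at: index'.
cls compile: 'remoteTempAt: index put: value
	^ self at: index put: value'.

"The sends such a compiler could generate for:
	temp := 10. [temp := temp + 1] value. temp"
tv := cls new: 1.
tv remoteTempAt: 1 put: 10.
[tv remoteTempAt: 1 put: (tv remoteTempAt: 1) + 1] value.
tv remoteTempAt: 1.
```

The class reference is taken from the return value of the definition message rather than written as a global, so the whole snippet can be evaluated as a single doit.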

Levente

Re: tempVectors use case and current issues

Eliot Miranda-2
In reply to this post by Denis Kudriashov
 
Hi Denis,

On Thu, Mar 28, 2019 at 2:36 PM Denis Kudriashov <[hidden email]> wrote:
 

> I think my examples show a security hole in the VM execution logic, which allows memory bounds to be violated from the image side.

It is no different from using an inst var access bytecode on an object that doesn't have enough inst vars. It is not a security hole so much as something the system must use correctly to avoid crashes. The same can be done with, e.g.,

    thisContext swapSender: Point basicNew

There are many such "security holes", and if you want the VM to plug them all, the VM will become very much slower.
 
> I did not get a segfault, but I would not be surprised if one happened in some complex real-life scenarios. It may look like a contrived case, but I think it is quite easy to run into when using or developing a low-level serialization library: as soon as you serialize context objects with some substitution logic, by mistake or intentionally.
> And since this hole needs to be closed, it would be a good opportunity to add another hook to the execution engine that could be used as in my remote scenario. So, back to the proposal in my first mail.

If you want to solve this, then build a transformation for the block method when you remote a block. As others have suggested (Levente), you can transform the bytecodes into normal sends (my blog post on the entire scheme starts by implementing it with at: and at:put:, before the special bytecodes are added). But making a change to all blocks breaks much of the Sista adaptive optimizer. We need the freedom to access indirect temp vectors via special-case bytecodes if we are to be able to optimize code aggressively. If indirect temp vectors are to be treated as general-purpose objects, we are prevented from making many significant optimizations.

So, as the doctor said, "don't do that".
 
 

--
_,,,^..^,,,_
best, Eliot

Re: tempVectors use case and current issues

Denis Kudriashov
 
Hi Eliot

On Thu, 28 Mar 2019 at 23:29, Eliot Miranda <[hidden email]> wrote:
 
> If you want to solve this, then build a transformation for the block method when you remote a block.  As others have suggested (Levente) you can transform the bytecodes into normal sends (my blog post on the entire scheme starts with implementing it using at: and at:put: before the special bytecodes are added).  But making a change to all blocks breaks much of the Sista adaptive optimizer.  We have to have the freedom to access indirect temp vectors via special case bytecodes if we are to be able to aggressively optimize code.  If indirect temp vectors are to be treated as general purpose objects, then we are prevented from making many significant optimizations.

Ok. I expected such answers :) but asked in case some cheap trick is possible, like my read-only example. It shows that there is at least a write-barrier check during this operation. If it signalled an error, it could be used to do the job.
Method transformation would be quite complex to use, because it needs to be applied dynamically to a live context, and it requires modifying the stack on the fly. Just compiling the method in advance is not appropriate for my goal. I don't want to change the compiler globally or force the user to do it for a specific method/class. It would not be a transparent solution.

Anyway, thanks all for the answers.



Re: tempVectors use case and current issues

Eliot Miranda-2
 
Hi Denis,

On Mar 28, 2019, at 5:10 PM, Denis Kudriashov <[hidden email]> wrote:


> Ok. I expected such answers :) but asked in case some cheap trick is possible, like my read-only example. It shows that there is at least a write-barrier check during this operation. If it signalled an error, it could be used to do the job.
> Method transformation would be quite complex to use, because it needs to be applied dynamically to a live context, and it requires modifying the stack on the fly. Just compiling the method in advance is not appropriate for my goal. I don't want to change the compiler globally or force the user to do it for a specific method/class. It would not be a transparent solution.

Well, maybe. But transforming a block and its activations is straightforward:

- It is easy to construct a transformation from tempVector bytecode blocks (TVBB) to tempVector message blocks (TVMB), because there are no suspension points in the bytecodes, and the stack heights at the start and end of the bytecodes are the same as for the message versions. So some form of
   store indirect temp bytecode =>
   dup (now value exists twice)
   dup (now value exists thrice)
   push indirect temp
   pop store into value location that was duped
   push index
   pop store into 2nd value
   send at:put:
will reimplement it. Then it's just a matter of remapping PCs from one to the other and lengthening jumps. A day's work or two at most.
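The stack effect of that sequence can be checked with a small simulation that models the operand stack as an OrderedCollection (bottom of the stack at index 1). This is only a model of the stack shuffling, not real bytecode: it reads "push indirect temp" as pushing the temp vector itself (our interpretation), uses illustrative names, and relies on at:put: answering the stored value, which is what lets the message version leave exactly one value on the stack, as the store bytecode does.

```smalltalk
| stack tv index top receiver arg1 arg2 |
tv := Array new: 1.                     "the temp vector"
index := 1.                             "slot of the indirect temp within tv"
stack := OrderedCollection with: 42.    "value left by the expression being stored"

stack addLast: stack last.              "dup: value exists twice"
stack addLast: stack last.              "dup: value exists thrice"
stack addLast: tv.                      "push indirect temp, read as the vector"
top := stack removeLast.
stack at: 1 put: top.                   "pop-store into the original value slot"
stack addLast: index.                   "push index"
top := stack removeLast.
stack at: 2 put: top.                   "pop-store into the 2nd value slot"

"send at:put:, which pops the receiver and two arguments and pushes the result"
arg2 := stack removeLast.
arg1 := stack removeLast.
receiver := stack receiver := nil.
```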

If the transformation is done in the marshaller that remotes objects, then it will be easy to substitute the transformed method and map any PCs in contexts and closures (the JIT does this kind of mapping routinely).

The only problem in transforming in the other direction (if you ever need to) is advancing computation past the message-send sequence for the TVMB access. That can be coded using Context's single-step facility along with the PC map, from which you can detect where the sequence ends.

_,,,^..^,,,_ (phone)

Re: tempVectors use case and current issues

Clément Béra
 
Hi,

I'm not sure it makes sense, but although the VM does not perform a read-only check when writing into a temp vector by default, such checks can be activated through a flag I introduced last year for the incremental compactor. The overhead seemed to be minimal.
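To make this suggestion concrete, here is a toy Python model (not VM code) of the special "store into indirect temp" bytecode with and without the optional read-only check. It replays the beReadOnlyObject example from the first post. The class, the check_read_only parameter and the traps list are hypothetical stand-ins: the real switch is a VM build option, and what exactly the image receives when a checked store is refused is an assumption of this sketch.

```python
from dataclasses import dataclass, field

# Toy model only: not the real OpenSmalltalk-VM object layout or API.
@dataclass
class TempVector:
    slots: list                     # indexable slots (Smalltalk index 1 is Python index 0)
    read_only: bool = False         # stands in for the object's read-only bit
    traps: list = field(default_factory=list)  # stores handed back to the image

def store_into_temp_vector(tv, index, value, check_read_only):
    """Model of the special 'store into indirect temp' bytecode.

    With check_read_only False (default VM build) the slot is written
    directly, with no message send, even if the vector is read-only.
    With check_read_only True (optional build flag) a store into a
    read-only vector is refused and recorded as a hypothetical image callback.
    """
    if check_read_only and tv.read_only:
        tv.traps.append((index, value))
        return
    tv.slots[index] = value

# Replays: temp := 10. (thisContext at: 1) beReadOnlyObject. [temp := temp + 1] value.
default_vm = TempVector(slots=[10], read_only=True)
store_into_temp_vector(default_vm, 0, default_vm.slots[0] + 1, check_read_only=False)

checked_vm = TempVector(slots=[10], read_only=True)
store_into_temp_vector(checked_vm, 0, checked_vm.slots[0] + 1, check_read_only=True)

print(default_vm.slots, default_vm.traps)   # [11] []
print(checked_vm.slots, checked_vm.traps)   # [10] [(0, 11)]
```

In this model the default build silently produces 11, matching the behavior reported in the first post, while the checked build refuses the write and leaves the decision to the image, which is the kind of hook a remoting engine would need.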

Best,


Re: tempVectors use case and current issues

Eliot Miranda-2
 
Hi Clément,

On Apr 5, 2019, at 4:42 AM, Clément Béra <[hidden email]> wrote:

Hi,

I'm not sure it makes sense, but although the VM does not perform a read-only check when writing into a temp vector by default, such checks can be activated through a flag I introduced last year for the incremental compactor. The overhead seemed to be minimal.

Good point.  That would be a good way to solve Denis’ problem.



Re: tempVectors use case and current issues

Denis Kudriashov
In reply to this post by Clément Béra
 
Hi Clement

On Fri, 5 Apr 2019 at 12:42, Clément Béra <[hidden email]> wrote:
 
Hi,

I'm not sure it makes sense, but although the VM does not perform a read-only check when writing into a temp vector by default, such checks can be activated through a flag I introduced last year for the incremental compactor. The overhead seemed to be minimal.

Is it a flag for compiling the VM, or an image-side setting?
And is it a requirement for the new compactor? So will it be enabled by default at some point?



Re: tempVectors use case and current issues

Eliot Miranda-2
 
Hi Denis,

On Apr 8, 2019, at 1:04 AM, Denis Kudriashov <[hidden email]> wrote:

Hi Clement

On Fri, 5 Apr 2019 at 12:42, Clément Béra <[hidden email]> wrote:
 
Hi,

I'm not sure it makes sense, but although the VM does not perform a read-only check when writing into a temp vector by default, such checks can be activated through a flag I introduced last year for the incremental compactor. The overhead seemed to be minimal.

Is it a flag for compiling the VM, or an image-side setting?

A flag when compiling the VM.

And is it a requirement for the new compactor?

Exactly. So that we can compact incrementally, we have to have a read barrier on temp vector access. Adding a read-only check to the read barrier doesn’t add much overhead.
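The cost argument can be illustrated with another toy Python model, assuming a hypothetical header layout in which "forwarded" and "read-only" are two bits of the same header word (the real Spur encoding differs). Because the read barrier already has to inspect the header to detect an object moved by the incremental compactor, the read-only test only widens the mask on the fast path, and both rare cases are handled together on the slow path. The names Obj, store_temp and on_slow_path are illustrative only.

```python
# Toy model only: bit positions and names are hypothetical, not Spur's encoding.
FORWARDED = 1 << 0   # object was moved; body[0] points to the new copy
READ_ONLY = 1 << 1   # object must not be written

class Obj:
    def __init__(self, header, body):
        self.header = header
        self.body = body     # a forwarder's body is [new_copy]; otherwise the slots

def store_temp(tv, index, value, on_slow_path):
    """Store into a temp vector behind a read barrier.

    Returns the object actually targeted (after following forwarders).
    The fast path is a single test of the header word; adding READ_ONLY
    to the mask is the only extra work the read-only check needs there.
    """
    if tv.header & (FORWARDED | READ_ONLY):
        while tv.header & FORWARDED:         # follow forwarders left by compaction
            tv = tv.body[0]
        if tv.header & READ_ONLY:
            on_slow_path(tv, index, value)   # hand the refused store to the image
            return tv
    tv.body[index] = value
    return tv

moved = Obj(0, [10])
stale = Obj(FORWARDED, [moved])              # old location after compaction
target = store_temp(stale, 0, 11, on_slow_path=lambda tv, i, v: None)
print(target is moved, moved.body)           # True [11]

frozen = Obj(READ_ONLY, [10])
refused = []
store_temp(frozen, 0, 11, on_slow_path=lambda tv, i, v: refused.append((i, v)))
print(frozen.body, refused)                  # [10] [(0, 11)]
```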

So will it be enabled by default at some point?

Yes.  Hopefully before the end of this year.

Best,

On Sat, Mar 30, 2019 at 3:50 AM Eliot Miranda <[hidden email]> wrote:
 
Hi Denis,

On Mar 28, 2019, at 5:10 PM, Denis Kudriashov <[hidden email]> wrote:

Hi Eliot

чт, 28 мар. 2019 г. в 23:29, Eliot Miranda <[hidden email]>:
 
Hi Denis,

On Thu, Mar 28, 2019 at 2:36 PM Denis Kudriashov <[hidden email]> wrote:
 
Hi Nicolas.

чт, 28 мар. 2019 г. в 19:44, Nicolas Cellier <[hidden email]>:
 
Hi Denis,
Special bytecodes don't have to be changed: just don't use them and replace by regular sends at bytecode generation (with a special compiler, or some IR translater).

Sure, bytecode transformation will work. But it would be quite tricky to apply in live execution context. It would require fixing context stack to take into account updated method bytecode.
Notice that I don't search for global setting to recompile all methods in image. I want this logic only for concrete method/block activation. In my scenario block is serialized and transferred together with current context. So on remote side I need to do something with materialized objects to maintain normal block semantics.
 
All can be done at image side then. Or did I miss something?

I think my examples shows a security hole in VM execution logic which allows to violate memory bounds from the image side.

It is no different than using an inst var access bytecode on an object which doesn't have enough net vars.  It is not a security hole, as much as it is something the system must use correctly to avoid crashes.  The same can be done by e.g.

    thisContext swapSender: Point basicNew

There are many such "security holes".  And if you want the VM to plug them all then the VM will become very much slower.
 
I did not got segfault but I would not be surprized if it would happens in some complex real live scenarios. Maybe it looks like a specially invented case but I think it is quite easy to get when using or developing low level serialization library - as soon as you by mistake or intentionally serialize context objects with some substitution logic.
And considering that this hole needs to be closed it would be good opportunity to have another hook in execution engine which can be used like in my remote scenario. So back to my proposal in first mail.

If you want to solve this, then build a transformation for the block method when you remote a block.  As others have suggested (Levente) you can transform the bytecodes into normal sends (my blog post on the entire scheme starts with implementing it using at: and at:put: before the special bytecodes are added).  But making a change to all blocks breaks much of the Sista adaptive optimizer.  We have to have the freedom to access indirect temp vectors via special case bytecodes if we are to be able to aggressively optimize code.  If indirect temp vectors are to be treated as general purpose objects, then we are prevented from making many significant optimizations.

Ok. I expected such answers :) but ask for the chance that some cheap trick is possible. Like my readOnly example. It shows that there is at least writebarrier check during this operation. If it would signal an error it could be used to do the job. 
Method transformation would be quite complex to use because It needs to be applied dynamically to live context, and it requires stack modifications on the fly. Just compiling method in advance is not appropriate for my goal. I don't want to change compiler globally or force user to do it for concrete method/class. It would be not transparent solution. 

Well maybe.  But transforming a block and its activations is straightforward:

- it is easy to construct a transformation from tempVector bytecode blocks (TVBB) to tempVector message blocks (TVMB) because there are no suspension points in the bytecodes and the stack heights at the start and end of the bytecodes are the same as for the message versions.  So some form of
   store indirect temp bytecode =>
   dup (now value exists twice)
   dup (now value exists thrice)
   push indirect temp
   pop store into value location that was duped
   push index
   pop store into 2nd value
   send at:put:
will reimplement.  And then it’s just a matter of remapping PCs from one to the other and lengthening jumps.  A day’s work or two at most

 If the transformation is done in the marshaller that remotes objects then it will be easy to substitute the transformed method and map any PCs in contexts and closures (the JIT does this kind of mapping routinely).

The only problem transforming in the other direction (if you ever need to) is in advancing computation past the message send sequence for the TVMB access.  That can be coding with Context’s single step facility along with the pc map from which you can detect where the end of the sequence is.

Anyway thanks all for answers.


So, as the doctor said, "don't do that".
 
 

Le jeu. 28 mars 2019 à 20:05, Denis Kudriashov <[hidden email]> a écrit :
 
_,,,^..^,,,_
best, Eliot
