image persistence

image persistence

David Paola
Hi lively kernel folks,

I've spent the past month or so digging around in several language VMs -- CPython, Rubinius, Topaz, PyPy, etc. -- in an attempt to add the equivalent of the original Smalltalk "snapshot" VM primitive. Obviously I have been naive.

I've learned a lot, above all else that I'm not giving up. I have a decent, academic understanding of compilers, interpreters, VMs (and a foggy understanding of JITs), and was curious if anyone could clarify how the Lively Kernel serializes the world into JSON. Was this hairy? What were the hardest parts?

I realize everyone has a full-time job and can't hand-hold a newbie, so any direction at all would be appreciated. I tried to pick apart the Squeak source code, but without a background in the Squeak architecture it was fruitless.

Thanks so much for your energy on the Lively Kernel. I'm looking forward to hearing more and possibly contributing in the future.

-dave

More info:

I realize that the "high level" idea of snapshotting a running VM basically involves serializing the object memory, bytecode, and instruction pointer, and then deserializing that on "resume". Most of the issues I'm encountering lead me to believe I have an incomplete understanding.
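
For a toy machine whose entire state is plain data, the high-level idea really is that simple. The following is a minimal sketch in JavaScript; the opcodes, the `vm` record layout, and the `snapshot` / `resume` helpers are all invented for illustration and are not taken from any of the VMs named above.

```javascript
// Toy stack machine, purely illustrative: opcodes and snapshot layout are
// assumptions made for this sketch.
function step(vm) {
  const [op, arg] = vm.code[vm.pc++];   // fetch, then advance the instruction pointer
  switch (op) {
    case "push":
      vm.stack.push(arg);
      break;
    case "add": {
      const b = vm.stack.pop();
      const a = vm.stack.pop();
      vm.stack.push(a + b);
      break;
    }
    case "mul": {
      const b = vm.stack.pop();
      const a = vm.stack.pop();
      vm.stack.push(a * b);
      break;
    }
    default:
      throw new Error(`unknown opcode: ${op}`);
  }
}

function run(vm) {
  while (vm.pc < vm.code.length) step(vm);
  return vm.stack[vm.stack.length - 1];
}

// Object memory (stack), bytecode (code) and instruction pointer (pc) are
// all plain data here, so JSON is enough to freeze and thaw the machine.
const snapshot = (vm) => JSON.stringify(vm);
const resume = (image) => JSON.parse(image);

const vm = {
  code: [["push", 2], ["push", 3], ["add"], ["push", 4], ["mul"]],
  pc: 0,
  stack: [],
};

step(vm);
step(vm);                           // run part of the program
const image = snapshot(vm);         // freeze mid-computation
const result = run(resume(image));  // continue from the image
console.log(result);
```

The sketch works only because nothing lives outside `vm`. A production VM usually also holds state in places a serializer cannot walk as data (native stacks, open files, compiled code), which is one likely source of the trouble.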

--
Dave Paola

_______________________________________________
lively-kernel mailing list
[hidden email]
http://lists.hpi.uni-potsdam.de/listinfo/lively-kernel

Re: image persistence

Robert Krahn-4
Hi, Dave --

Thanks for the question, this is actually a really fascinating topic :)

First, we wrote up some general information about it (that lets you
interactively try out things ;) here:
and here:

I guess the "hairy" part was/is how to deal with "native" objects. JS browser
environments introduce functions and state that are not implemented /
represented in the JS context but hidden. The DOM and DOM nodes are an example
of that -- you cannot get or modify all the state that would be necessary to
capture or reestablish a document / world.

The solution that we came up with and that works very well is to implement a
general JS serializer that walks an object graph starting from root objects.
When certain objects are encountered - e.g. DOM nodes - we make an exception
(this is what the serialization plugins that are mentioned in the worlds above
are for) and store not their full object representation but just "what we need
to know".

The creation of objects from a serialization works accordingly:
create/instantiate objects + run custom init code for the "exceptions".
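
As a rough illustration of the graph walk with plugin "exceptions" described above, here is a minimal sketch. It is not Lively's actual serializer: `serialize`, `deserialize`, the plugin shape (`handles` / `store` / `restore`), and the `FakeNode` stand-in for a DOM node are all hypothetical names chosen for this example.

```javascript
// Stand-in for a native object (e.g. a DOM node) whose full state
// cannot be captured by walking its properties.
class FakeNode {
  constructor(tag) {
    this.tag = tag;
    this.hiddenState = Symbol("native"); // not representable in JSON
  }
}

// A plugin stores only "what we need to know" and re-creates the object
// with custom init code on the way back in.
const nodePlugin = {
  handles: (obj) => obj instanceof FakeNode,
  store: (obj) => ({ tag: obj.tag }),
  restore: (data) => new FakeNode(data.tag),
};

function serialize(root, plugins = []) {
  const ids = new Map(); // object -> id, so shared and cyclic references survive
  const registry = {};   // id -> stored record

  function visit(value) {
    // Primitives pass through; functions and undefined are dropped by JSON.
    if (value === null || typeof value !== "object") return value;
    if (ids.has(value)) return { __ref: ids.get(value) };
    const id = ids.size;
    ids.set(value, id);
    const plugin = plugins.findIndex((p) => p.handles(value));
    if (plugin >= 0) {
      registry[id] = { plugin, data: plugins[plugin].store(value) };
    } else {
      const record = { isArray: Array.isArray(value), props: {} };
      registry[id] = record;
      for (const key of Object.keys(value)) record.props[key] = visit(value[key]);
    }
    return { __ref: id };
  }

  return JSON.stringify({ root: visit(root), registry });
}

function deserialize(json, plugins = []) {
  const { root, registry } = JSON.parse(json);
  const objects = {};
  // Pass 1: instantiate every object; plugins run their custom init code.
  for (const [id, record] of Object.entries(registry)) {
    objects[id] = "plugin" in record
      ? plugins[record.plugin].restore(record.data)
      : (record.isArray ? [] : {});
  }
  // Pass 2: fill in properties, turning references back into pointers.
  const resolve = (v) => (v !== null && typeof v === "object" ? objects[v.__ref] : v);
  for (const [id, record] of Object.entries(registry)) {
    if ("plugin" in record) continue;
    for (const [key, v] of Object.entries(record.props)) objects[id][key] = resolve(v);
  }
  return resolve(root);
}

// A tiny "world" with a cycle (morph.owner) and a native-like object.
const world = { name: "world", morphs: [] };
world.morphs.push({ owner: world, node: new FakeNode("div") });

const copy = deserialize(serialize(world, [nodePlugin]), [nodePlugin]);
console.log(copy.morphs[0].owner === copy, copy.morphs[0].node instanceof FakeNode);
```

Creating all objects first and wiring references second is what lets cycles round-trip. The sketch deliberately ignores prototypes of ordinary objects; a fuller version would need to record and restore those as well.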

The shortcomings of this approach are the following:
- On the application development level you still need to be a bit careful about
  which objects you reference. Direct pointers to DOM nodes, for example, won't
  break the serialization, but when you deserialize you need custom init logic
  to make things work as expected again.
- The stored representations become big (x-xxx MBs) really quickly.
  Implementing optimizations using the plugin approach is possible but
  requires additional work.

This deals with the "state" of a JS application / Lively world. Another point
that you mention is capturing running computations. From a certain level of
abstraction this is actually the same thing, but since JS has incomplete
metaprogramming capabilities (you are not able to reflect on closures, e.g.)
the "hidden state" problem comes up again. In practice this has little impact
for Lively, since in the "reactive" browser environment Lively doesn't have to
implement a "main" function. Anyway, we dealt with the problem and came up
with a solution. I will describe that in an upcoming post.
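
The closure limitation is easy to reproduce. The sketch below first shows captured state vanishing under `JSON.stringify`, then one generic workaround: keep the function as source text and its free variables in an explicit record. The `bind` helper and the `script` record shape are hypothetical and are not claimed to be the solution mentioned above.

```javascript
// Problem: `count` lives only in the closure's hidden scope.
function makeCounter() {
  let count = 0;
  return () => ++count;
}
const counter = makeCounter();
counter();
counter();
// Function-valued properties are omitted, and the captured count with them.
const lost = JSON.stringify({ counter });

// Workaround sketch: make the scope an ordinary object that travels with
// the function's source text.
function bind(script) {
  // Rebuild a callable from source; `vars` plays the role of the hidden scope.
  const fn = new Function("vars", `return (${script.source})(vars);`);
  return () => fn(script.vars);
}

const script = { source: "(vars) => ++vars.count", vars: { count: 0 } };
let tick = bind(script);
tick();
tick();

// The record is plain data, so the "captured" state survives a round trip.
const restored = JSON.parse(JSON.stringify(script));
tick = bind(restored);
const next = tick();
console.log(lost, restored.vars.count, next);
```

Rebuilding functions from source has the usual costs of `eval`-style code (no access to the original lexical scope, and it may be blocked by a content security policy), so this is a trade-off rather than a free fix.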

Please let me know if you have questions or want a more technical answer.

Best,
Robert




Re: image persistence

Robert Krahn-4
Btw., the most influential work for Lively's persistence mechanism comes from Self; see the excellent paper "Annotating Objects for Transport to Other Worlds".



Re: image persistence

David Paola
Thanks to both of you for your answers :-) That Self paper is *exactly* what I am looking for.

Happy hacking!

--
Dave Paola


Re: image persistence

Casey Ransberger-2
Actually that paper answered a lot of questions I had about stuff that I want to be able to do in Squeak eventually. I thought I'd read all of the Self papers, but it looks like I missed this one. Thanks for the question, Dave, and thanks for the answer, Robert!


Re: image persistence

David Paola
Indeed -- what other Self papers would you recommend?

--
Dave Paola
