Hi folks. I am really happy to announce that ESUG is sponsoring me for Fuel development through the ESUG SummerTalk. I am Martin Dias, a student in Buenos Aires, Argentina. The idea behind this SummerTalk is to implement Fuel, a fast, binary, general-purpose object graph serializer for Pharo. It is based on ideas from VisualWorks' Parcels.
Actually, the project has been under way for several months. Tristan Bourgois and I started it during an internship with RMoD, INRIA. A couple of months ago Mariano Martinez Peck joined the team, and he is now the official mentor for the SummerTalk.

ESUG website for the SummerTalk: http://www.esug.org/wiki/pier/Promotion/SummerTalk/SummerTalk2011

The website with all the necessary information is here: http://rmod.lille.inria.fr/web/pier/software/Fuel

It even includes slides explaining the algorithm. In addition, a paper is in progress.

For the moment, Fuel already provides the following features:

- Fast pickle format. It is much faster to materialize than to serialize.
- Correct support for class reshape (when the class of serialized objects has changed).
- Serialization of ANY kind of object. For the moment there is no object, to our knowledge, that we cannot serialize and materialize.
- Complete serialization of classes and traits (not just a global name).
- Support for cycles, and no duplicates in the graph.
- Integration with Moose, through an extension to export and import its models.
- Detection of globals: for example, if you serialize Transcript, it is not duplicated but managed as a global reference.
- Solutions to common problems like Set rehash.
- Buffered writing: we use a buffered write stream for the serialization part (thanks Sven!).
- No need for special support from the VM.
- An effort to have a good object-oriented design.
- Well tested (about 120 tests, for the moment).
- A large set of benchmarks (even benchmarks for the Moose extension).

And of course, there are a lot of features planned for the future. You can see some of them on the website and some in the issue tracker: http://code.google.com/p/fuel/issues/list

We really appreciate all kinds of feedback and comments. If you want to try it, check the website to see how. It is extremely easy.

Once again, I want to thank ESUG a lot for sponsoring the project.
I plan to create a "news" section on the website with an RSS feed. I will keep you informed.

Best regards,
Martin

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
Interesting project. There are so many uses for object serializers.
Personally I am looking at storing objects in NoSQL databases, and serializing them is an option that will work in a lot of cases. Using a serializer for copying objects is also useful.

Does Fuel support “cutting off” instance variables, like SIXX’s Object>>sixxIgnorableInstVarNames?

I always thought SIXX (http://www.mars.dti.ne.jp/~umejava/smalltalk/sixx/index.html) was state-of-the-art in this field. It seems to have many features, and it is supported on many dialects. Are your ideas so great that it really makes sense to create yet another serializer? Wouldn’t improving SIXX be better?

Runar
On Wed, May 25, 2011 at 9:55 AM, Runar Jordahl <[hidden email]> wrote:

> Interesting project. There are so many uses for object serializers.

Yes! We are trying to enumerate them for a paper, so if you can help ;)

> Personally I am looking at storing objects in NoSQL databases, and [...]

What would you need as the output of the serializer? A ByteArray?

> Using [...]

Yep, I think it was Colin who told us he was using his serializer as a deepCopy :)
Not so easily for the moment, but it is definitely something we will do in the future. In fact, we already have an issue for that: http://code.google.com/p/fuel/issues/detail?id=6 We call them "transient" instance variables. We also want to extend the idea to classes, so that all instances of a transient class are ignored as well.
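To make the "transient" idea concrete, here is a minimal sketch in Python rather than Smalltalk, so it is an analogy and not Fuel code. The class `Session`, the `transient_names` hook and the two helper functions are hypothetical names invented for this illustration; the only library used is Python's standard `pickle` module.

```python
import pickle


class Session:
    # Hypothetical hook: instance variables that must not be serialized.
    transient_names = ("cache",)

    def __init__(self, user, cache):
        self.user = user
        self.cache = cache


def serialize(obj):
    """Pickle only the non-transient instance variables of obj."""
    skipped = set(type(obj).transient_names)
    state = {name: value for name, value in vars(obj).items()
             if name not in skipped}
    return pickle.dumps(state)


def materialize(cls, data):
    """Rebuild an instance of cls; transient variables come back as None."""
    obj = cls.__new__(cls)
    for name in cls.transient_names:
        setattr(obj, name, None)
    obj.__dict__.update(pickle.loads(data))
    return obj


original = Session("runar", cache={"big": list(range(1000))})
restored = materialize(Session, serialize(original))
print(restored.user, restored.cache)
```

The large `cache` never reaches the byte stream, which is the point of cutting off an instance variable: the graph reachable only through it is skipped entirely.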
mmmmmmmm

> It seems to have many features, and it [...]

Most of the ideas are not ours: we based our work on VisualWorks Parcels. Of course, we changed several things, but the idea of the pickling format comes from there.

> Wouldn’t improving SIXX [...]

Because we DON'T want to use a text-based serializer. It is impossible to get real speed with a text-based serializer. We want to be really fast; that's our main goal. Try using SIXX to export a Moose model, for example. I recommend you take a chair. So... the purposes and goals are different. Our main goal is speed, and above all materialization speed. That's why we are using a particular pickle format.

Cheers

Mariano
http://marianopeck.wordpress.com
I see three main uses for serializers:
- Storing objects in databases and files.
- Deep copy without manually having to implement copy methods.
- Comparing objects.

The last use is complex: Given two (possibly complex) object graphs, you could serialize each to a separate byte array, and then compare the two byte arrays to implement #=. In theory this could be a lot easier than manually implementing the comparison. However, depending on the serializer and the contents being serialized, the byte arrays could represent two “equal” objects but still have different byte contents:

- “Equal” objects represented by unequal objects: one graph has Integer 0, the other one has Float 0.0.
- Objects stored in a different order: a Smalltalk Set that is equal in the two graphs can have its elements stored in a different order.
- Object IDs, which should not be part of the comparison, would make the two byte arrays different.

I am sure there are cases where a serializer could be used to implement #=, but I have so far not seen it used.

I am looking at using Riak ( http://wiki.basho.com/ ) with Pharo. When storing a “business object” you have two choices:

- Store a binary BLOB representing your object graph.
- Store the object graph as JSON data.

The last option essentially means you must do something similar to OR mapping, so I would prefer the first option. With Riak, you will soon be able to store additional indexed properties, which you will later be able to query. So you store your whole business object as a BLOB, together with those properties you need to query.

I never tested SIXX with anything large. Saving space and time is of course a valid reason to make a new serializer. If you use SIXX you might need to compress the XML before storing it, adding even more overhead.

I really look forward to improvements in Fuel. Good luck!

Runar
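The first two pitfalls of comparing serialized bytes are easy to reproduce. The sketch below uses Python's standard `pickle` module as a stand-in for a binary serializer, so it says nothing about Fuel's own format. A dictionary is used instead of a Set because Python dictionaries keep insertion order, which makes the ordering difference deterministic.

```python
import pickle

# Equal values, different representations: 0 == 0.0, but the bytes differ.
int_bytes = pickle.dumps(0)
float_bytes = pickle.dumps(0.0)
same_value = (0 == 0.0)
same_bytes = (int_bytes == float_bytes)

# Equal collections whose elements are stored in a different order.
first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}
dicts_equal = (first == second)
dict_bytes_equal = (pickle.dumps(first) == pickle.dumps(second))

print(same_value, same_bytes)         # equal values, unequal bytes
print(dicts_equal, dict_bytes_equal)  # equal dicts, unequal bytes
```

In both cases the objects compare equal while their byte arrays do not, so a byte-level comparison can only confirm equality, never rule it out.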
On Wed, May 25, 2011 at 10:51 AM, Runar Jordahl <[hidden email]> wrote:

> I see three main uses for serializers:

We took note, thanks :)

> The last use is complex: Given two (possibly complex) object graphs, [...]

yep
Ok, I understand. So we could provide an API like this:

    byteArray := FLSerializer serializeInMemory: myObject

and

    myRecoveredObject := FLMaterializer materializeFromByteArray: aByteArray

Would something like that help?
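For readers who want to see what such an in-memory round trip looks like in practice, here is a rough analogy using Python's standard `pickle` module (`dumps` and `loads` play the roles of the two proposed selectors). It is not Fuel code. The graph deliberately contains a shared object and a cycle, two of the cases the announcement says Fuel supports.

```python
import pickle

shared = ["shared"]
graph = {"left": shared, "right": shared}
graph["self"] = graph  # the graph refers to itself: a cycle

data = pickle.dumps(graph)     # serialize to an in-memory byte string
restored = pickle.loads(data)  # materialize from those bytes

# The shared list is materialized once, not duplicated.
print(restored["left"] is restored["right"])
# The cycle is preserved and points at the new graph, not the old one.
print(restored["self"] is restored, restored is graph)
```

A byte string like `data` is exactly the kind of value that can be stored as a BLOB in a key-value store such as Riak.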
Sorry, I didn't want my answer to sound rude. But both times we sent something about Fuel to the mailing list, the answers we received were "stop reinventing the wheel". And this is completely the opposite. Why am I doing Fuel? For fun? No. For fun I have other projects. I do it because I NEED it for my PhD. I would rather not spend time on Fuel and concentrate directly on my topic. But I need a fast serializer that can serialize ANY kind of object in the image: MethodContext, Continuations, BlockClosure, CompiledMethod (with all kinds of trailers and types), MethodDictionaries, full Classes and Traits, and Sets and Dictionaries, all without problems. It must also cope with global references, support class reshape, etc. In addition, I need to be able to understand the serializer, extend it and make sure it works. Fuel is really OOP: I can understand it and extend it for my own use, and it is very well tested and benchmarked.

> If you use SIXX you [...]

Thanks!
--
Mariano
http://marianopeck.wordpress.com
In reply to this post by tinchodias
Sounds awesome!
The streaming feature is appealing. I might take a look at it. It won't hurt as a redundant repo backup tactic.

On May 24, 2011, at 5:39 PM, Martin Dias wrote:

> Hi folks. I am really happy to announce that ESUG is sponsoring me for Fuel development through the ESUG SummerTalk. I am Martin Dias, a student at Buenos Aires, Argentina. The idea behind this SummerTalk is to implement Fuel, a binary, fast and general-purpose object graph serializer in Pharo. It is based on VisualWorks' Parcels ideas.
On Wed, May 25, 2011 at 2:18 PM, Sebastian Sastre <[hidden email]> wrote:
Yes. At the beginning we were using MultiByteFileStream directly. Then we used ByteArray new writeStream as a full in-memory buffer for the whole graph. That used a lot of memory, but speed was very good. It was still not ideal, because the array kept growing all the time. So Sven suggested the ZnBufferedWriteStream he was using for Zinc. That way we have a buffered write stream where we can set the buffer size, say 5000 elements, and the speed is almost the same as with a full buffer: we go to disk a little more often, but on the other hand the buffer collection never needs to grow. So, in conclusion, using a buffered write stream instead of MultiByteFileStream increased speed about 12x, with a small buffer of only 5000 elements.
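The trade-off described above (one OS write per element versus a fixed-size buffer that is flushed when full) can be sketched with Python's standard `io` module. Here `io.BufferedWriter` only stands in for ZnBufferedWriteStream, the helper `write_chunks` is a hypothetical name, and the 5000-byte buffer mirrors the figure in the message. The sketch checks that both paths produce the same file and makes no timing claim.

```python
import io
import os
import tempfile


def write_chunks(path, chunks, buffer_size):
    """Write chunks to path; buffer_size=0 means one OS write per chunk."""
    raw = io.FileIO(path, "w")
    if buffer_size == 0:
        stream = raw
    else:
        # Fixed-size buffer: flushed to disk when full, never grows.
        stream = io.BufferedWriter(raw, buffer_size=buffer_size)
    with stream:
        for chunk in chunks:
            stream.write(chunk)


# Many tiny writes, as a serializer emitting one object at a time would do.
chunks = [bytes([i % 256]) * 8 for i in range(20000)]

with tempfile.TemporaryDirectory() as folder:
    unbuffered = os.path.join(folder, "unbuffered.bin")
    buffered = os.path.join(folder, "buffered.bin")
    write_chunks(unbuffered, chunks, 0)
    write_chunks(buffered, chunks, 5000)
    with open(unbuffered, "rb") as a, open(buffered, "rb") as b:
        same_content = a.read() == b.read()
    total_bytes = os.path.getsize(buffered)

print(same_content, total_bytes)
```

Wrapping each call in a timer (for example `time.perf_counter`) is a simple way to measure the difference on a given machine.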
:)
--
Mariano
http://marianopeck.wordpress.com
In reply to this post by tinchodias
On 24/05/11 4:39 PM, Martin Dias wrote:
> We really appreciate all kind of feedback and comments. If you want to
> try it, check in the website how to do it. It is extremely easy.

I had a brief look and will look some more. I may try to use it to serialize a Pier kernel.

In another use case, I'd like to serialize from one image, and deserialize in another image - under end user control. The issue here is that "nasty" code could be introduced: e.g. capture the Fuel output, deserialize, add nasty code, re-serialize, then send onward for import to image.

Would it be possible to have some sort of "virus" filter? Maybe something like the Star Trek transporter that can filter out nasty stuff before re-materializing. :) For a start, maybe an inclusion list and/or an exclusion list of classes and globals would be useful.
On Wed, May 25, 2011 at 5:14 PM, Yanni Chiu <[hidden email]> wrote:
heheheheh. We would LOVE that. In fact, I told Martin a few months ago to do EXACTLY that. If you give it a try or need help, please let us know.

> In another use case, I'd like to serialize from one image, and deserialize in another image - under end user control. The issue here is that "nasty" code could be introduced: e.g. capture the Fuel output, deserialize, add nasty code, re-serialize, then send onward for import to image. Would it be possible to have some sort of "virus" filter? Maybe something like the Star Trek transporter that can filter out nasty stuff before re-materializing. :) For a start, maybe an inclusion list and/or an exclusion list of classes and globals would be useful.

I guess this should be easy to do. For the moment:

- Global objects are hardcoded in #globalNames.
- Global behaviors (classes and traits) are handled, by default, with a kind of "light" serialization: we only serialize the global name, which means that the class has to be present in Smalltalk globals in the image where you want to materialize. You can change the default behavior and completely serialize a class/trait, but this is much more complicated and still a work in progress (ClassBuilder is not your best friend).

Cheers

--
Mariano
http://marianopeck.wordpress.com
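Serializing a global by name instead of by value can be illustrated with the `persistent_id` / `persistent_load` hooks of Python's standard `pickle` module. This is an analogy only, not how Fuel implements #globalNames. The `GLOBALS` registry, the `TRANSCRIPT` object and the helper functions are hypothetical names chosen for the example.

```python
import io
import pickle

# Hypothetical registry of well-known globals, in the spirit of #globalNames.
TRANSCRIPT = io.StringIO()
GLOBALS = {"Transcript": TRANSCRIPT}


class GlobalAwarePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for registered globals, None for ordinary objects.
        for name, value in GLOBALS.items():
            if obj is value:
                return name
        return None


class GlobalAwareUnpickler(pickle.Unpickler):
    def persistent_load(self, name):
        # The global must already exist where the graph is materialized.
        try:
            return GLOBALS[name]
        except KeyError:
            raise pickle.UnpicklingError(f"unknown global: {name!r}") from None


def serialize(obj):
    buffer = io.BytesIO()
    GlobalAwarePickler(buffer).dump(obj)
    return buffer.getvalue()


def materialize(data):
    return GlobalAwareUnpickler(io.BytesIO(data)).load()


restored = materialize(serialize({"log": TRANSCRIPT, "count": 3}))
print(restored["log"] is TRANSCRIPT)
```

Only the name travels in the byte stream, so the materialized graph points at the receiving side's own global rather than at a copy of it.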
On Wed, May 25, 2011 at 5:31 PM, Yanni Chiu <[hidden email]> wrote:
Sorry Yanni, I didn't follow. Could you please explain a bit more? What do you want to serialize? Do you want to be able to choose some classes as light and some as non-light? Where do you want to materialize: in the same image or in another one? When you said discard... what would you do with the instances of those non-light classes, for example? You don't materialize them? And what happens to the objects that were pointing to them? What would the scenario be useful for? Security?

Thanks

--
Mariano
http://marianopeck.wordpress.com
On 25/05/11 11:53 AM, Mariano Martinez Peck wrote:
> Sorry Yanni, I didn't follow. Could you please explain a bit more? what
> do you want to serialize? do you want to be able to choose some classes
> as light and some as non-light? where do you want to materialize ? in
> the same image or in another one ? When you said discard....what would
> you do with the instances of those non-light classes for example? you
> don't materialize them? and what happens to the objects that were
> pointing to them ? why would be the scenario useful for ? security ?

====
Yes, security. Here's my first post again, with different formatting:

In another use case, I'd like to serialize from one image, and deserialize in another image - *under end user control*. [e.g. web app]

The issue here is that "nasty" code could be introduced:
- capture the Fuel output
- deserialize, add nasty code, re-serialize
- then send onward for import to image.

Would it be possible to have some sort of "virus" filter?
====

So a simple "safe-mode" option on de-serialization would probably be sufficient.
Hi Yanni,
On Wed, May 25, 2011 at 1:10 PM, Yanni Chiu <[hidden email]> wrote:
It is a good point. For the moment, when you deserialize a full class into the image, its methods are created and the bytecodes are copied from the stream without any validation. Still, you could deserialize the class, run your own validations, and only then install it (that is, add the class to Smalltalk globals, run the class initialization, and trigger announcements).
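The inclusion-list idea from earlier in the thread can be sketched with Python's standard `pickle` module, whose documentation describes restricting globals by overriding `Unpickler.find_class`. This shows the general "safe mode" technique and is not a Fuel feature. The allowlist contents and the name `safe_materialize` are assumptions made for the example.

```python
import builtins
import io
import pickle

# Assumed inclusion list: the only globals a stream may refer to.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a class or function by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is not on the inclusion list")


def safe_materialize(data):
    """Materialize data, rejecting any global outside the inclusion list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


harmless = pickle.dumps([1, 2, range(15)])
hostile = pickle.dumps(eval)  # a stream that refers to a dangerous global

print(safe_materialize(harmless))
try:
    safe_materialize(hostile)
    rejected = False
except pickle.UnpicklingError:
    rejected = True
print(rejected)
```

The check runs while the stream is being read, before anything is installed, which is the same ordering as "deserialize, validate, then install" but enforced by the materializer itself.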
In reply to this post by Mariano Martinez Peck
New Fuel version 1.4 supports this. Read the blog post: http://rmod.lille.inria.fr/web/pier/software/Fuel/FuelNews2/2011-06-01

Cheers

--
Mariano
http://marianopeck.wordpress.com
Excellent!
Runar

>> myRecoveredObject := FLMaterializer materializeFromByteArray: aByteArray
>>
>> would something like that help ?
>
> New Fuel version 1.4 supports this.
In reply to this post by Yanni Chiu
hehehehe I couldn't resist it. I was reading: http://www.parashift.com/c++-faq-lite/serialization.html and I read: "Like the Transporter on Star Trek, it's all about taking something complicated and turning it into a flat sequence of 1s and 0s, then taking that sequence of 1s and 0s (possibly at another place, possibly at another time) and reconstructing the original complicated "something." "
grr sorry. That was a response to a quote
"In another use case, I'd like to serialize from one image, and deserialize in another image - under end user control. The issue here is that "nasty" code could be introduced: e.g. capture the Fuel output, deserialize, add nasty code, re-serialize, then send onward for import to image. Would it be possible to have some sort of "virus" filter? Maybe something like the Star Trek transporter that can filter out nasty stuff before re-materializing. :) For a start, maybe an inclusion list and/or an exclusion list of classes and globals would be useful."

On Tue, Feb 28, 2012 at 10:45 PM, Mariano Martinez Peck <[hidden email]> wrote:
--
Mariano
http://marianopeck.wordpress.com