[ANN] StOMP - Yet another multi-dialect object serializer

[ANN] StOMP - Yet another multi-dialect object serializer

Masashi UMEZAWA-2
Hello all,

I have recently developed a new serialization library called
StOMP (Smalltalk Objects on MessagePack).
http://stomp.smalltalk-users.jp/

StOMP is a binary serializer for major Smalltalk dialects. For those
who know SIXX, StOMP can be seen as a binary SIXX. While SIXX
represents object data as XML, StOMP uses MessagePack. By combining
the flexibility of SIXX with the compactness of MessagePack, StOMP
aims to be a unique, next-generation portable serializer for
Smalltalk.

Features:
- Implementation is compact and portable
- Shared/circular references support
- "Class shape changes" support
- Data is interchangeable between Smalltalk dialects
- Good performance for small object graphs
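
To make the shared/circular references item concrete, here is a minimal workspace sketch of one common approach: keep an identity table of objects already visited and emit a back-reference on later encounters. This is a generic illustration and not StOMP's actual implementation; the record layout and the names seen, records, and visit are assumptions made up for the example.

```smalltalk
"Generic sketch, NOT StOMP's code: flatten a graph of Arrays into records,
 replacing every link to an Array by a #ref marker with a record index."
| seen records visit a b |
seen := IdentityDictionary new.          "object -> index of its record"
records := OrderedCollection new.
visit := [:obj |
    (obj isMemberOf: Array)
        ifFalse: [obj]                   "leaf: stored as-is"
        ifTrue: [
            (seen includesKey: obj)
                ifTrue: [Array with: #ref with: (seen at: obj)]
                ifFalse: [| index slots |
                    records add: nil.    "reserve the record slot first"
                    index := records size.
                    seen at: obj put: index.   "register before descending"
                    slots := obj collect: [:each | visit value: each].
                    records at: index put: slots.
                    Array with: #ref with: index]]].
a := Array new: 2.
b := Array with: a with: 42.
a at: 1 put: b; at: 2 put: a.            "cycle a -> b -> a, plus a -> a"
visit value: a.
records
```

Registering the object in the table before descending into its slots is what lets the traversal terminate on cycles. Here records ends up with two entries: the one for a refers to record 2 and to itself, and the one for b refers to record 1 and holds the literal 42.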

StOMP is now available for Squeak, Pharo, and VisualWorks.

There is ConfigurationOfStOMP, so the installation is easy.

Installer squeaksource
    project: 'MetacelloRepository';
    install: 'ConfigurationOfStOMP'.
(Smalltalk at: #ConfigurationOfStOMP) perform: #load.

Enjoy!
--
[:masashi | ^umezawa]


Re: [ANN] StOMP - Yet another multi-dialect object serializer

Janko Mivšek
Hi Masashi,

Now we have a competition, Fuel vs. StOMP :) A big advantage of StOMP is
that it is portable and already ported to VW. What are its other
advantages? Disadvantages?

Also a question for the Fuel developers: do you plan to port it to other
Smalltalks too? Portability is something that is very high on the
checklist for a serializer to be used in portable projects, as most
web projects are.

Best regards
Janko

Masashi UMEZAWA wrote:

> Hello all,
>
> I have recently developed a new serialization library called
> StOMP (Smalltalk Objects on MessagePack).
> http://stomp.smalltalk-users.jp/
>
> StOMP is a binary serializer for major Smalltalk dialects. For those
> who know SIXX, StOMP can be seen as a binary SIXX. While SIXX
> represents object data as XML, StOMP uses MessagePack. By combining
> the flexibility of SIXX with the compactness of MessagePack, StOMP
> aims to be a unique, next-generation portable serializer for
> Smalltalk.
>
> Features:
> - Implementation is compact and portable
> - Shared/circular references support
> - "Class shape changes" support
> - Data is interchangeable between Smalltalk dialects
> - Good performance for small object graphs
>
> StOMP is now available for Squeak, Pharo, and VisualWorks.
>
> There is ConfigurationOfStOMP, so the installation is easy.
>
> Gofer new
>   squeaksource: 'MetacelloRepository';
>   package: 'ConfigurationOfStOMP';
>   load.
> (Smalltalk at: #ConfigurationOfStOMP) perform: #load.
>
> Enjoy!

--
Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si


Re: [ANN] StOMP - Yet another multi-dialect object serializer

Mariano Martinez Peck


2011/6/20 Janko Mivšek <[hidden email]>
> Hi Masashi,
>
> Now we have a competition, Fuel vs. StOMP :) A big advantage of StOMP is
> that it is portable and already ported to VW. What are its other
> advantages? Disadvantages?
>
> Also a question for the Fuel developers: do you plan to port it to other
> Smalltalks too? Portability is something that is very high on the
> checklist for a serializer to be used in portable projects, as most
> web projects are.

Hi Janko. I think "portability" is too wide a term to discuss without details. For me, portability in this case means two things: a) being able to materialize in dialect XXX a stream that was serialized in dialect YYY; b) that the code of the serializer can also work in another dialect (not necessarily including a)).

Fuel will not support a) for sure. At least, we will not make an extra effort to support it. Regarding b), being portable to other dialects is not Fuel's first priority. But let me explain:
- We want to be able to serialize ANY kind of object, including BlockClosure, CompiledMethod, MethodContext, Class, Trait, etc. Finding an abstract and portable representation for those objects across dialects is complicated.
- We want to be as fast as possible. That means that if we find a way to be faster that only works in Pharo, we don't care. We will go ahead with it.

That being said, I have to say that Fuel's OO design is, from my point of view, quite nice, easy to understand, and not difficult to port. As an example, Eliot Miranda easily ported Fuel not just to another dialect but to Newspeak. What is more, he needed special handling for Newspeak data, and he was able to easily adapt Fuel to his needs. So in this case Fuel was portable (in the sense of b) and flexible.


Another difference is that we try to be a little faster in materialization than in serialization (which is not the case for StOMP). So in summary, the differences I can see are:

1) StOMP focuses on portability across dialects, including being able to materialize the same stream in different dialects. Fuel does not focus on portability, even if its code could be ported.
2) StOMP is faster at serializing small/medium graphs. Fuel is faster with large graphs.
3) StOMP is faster at serializing, while Fuel is faster at materializing.
4) StOMP can serialize some objects (right now it cannot serialize BlockClosures or things like that); Fuel can (or should) be able to serialize all of them.

That's all I can see for the moment. But don't worry, there is no fight. We have been sending each other several mails this week and last, and have tried to share knowledge :)

Cheers


 





--
Mariano
http://marianopeck.wordpress.com




Re: [ANN] StOMP - Yet another multi-dialect object serializer

Tony Garnock-Jones-3
In reply to this post by Masashi UMEZAWA-2
On 2011-06-19 10:16 PM, Masashi UMEZAWA wrote:
> I have recently developed a new serialization library called
> StOMP (Smalltalk Objects on MessagePack).

Hmm - could get confusing, given:

  - http://stomp.codehaus.org/
  - http://www.squeaksource.com/StompProtocol.html

So we might find ourselves using StOMP (encoding) over STOMP (transport)!

Regards,
   Tony


RE: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

Paul Baumann
In reply to this post by Mariano Martinez Peck

If you are going to compare object serialization tools, then State Replication Protocol (SRP) should be added to that list. SRP has not been promoted much, but after many years it is still a good cross-dialect and cross-platform binary serialization tool. It was originally ported to about seven Smalltalk dialects. Every aspect of SRP is context-configurable. SRP encoding is unique, simple, fast, and unlimited. The user base for SRP is not well known, but I hear from several people that use it for production applications, and I have personal experience with one deployment.

 

The default configuration for SRP is to use a portable mapping layer and to encode metastate into the data stream. Even with these costs, SRP is comparable in performance to serialization tools that do not do this. The (optional) portable mapping layer is used to represent common Smalltalk objects in a way that can be loaded into any Smalltalk dialect. Metastates describe the structure of the object state so that data load is data-driven rather than code-dependent. SRP can actually load state for which a class is not defined or has significantly changed. Metastates can be stored in metastate tables that can be reused and referenced to reduce data size and improve performance. When you use metastate tables, SRP stores more compactly than any other binary serialization tool is capable of. Whoever compares the performance of SRP with other binary serialization tools should keep in mind that they will have to disable SRP features like these to have a fair comparison.
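
As a rough illustration of data-driven loading (this is not SRP's format or API, just a sketch under made-up assumptions), suppose the saved data carries its own list of field names. A loader can then match saved values to the fields the class has today, even after the class shape changed. The field names and values below are hypothetical.

```smalltalk
"Illustration only, not SRP's format: the saved data carries its own field
 names (a 'metastate'), so loading is driven by the data, not by the class."
| metastate savedValues currentFields loaded |
metastate := #(#name #age #nickname).       "shape when the data was saved"
savedValues := #('Ada' 36 'Countess').
currentFields := #(#name #email #age).      "shape of the class today"
loaded := Dictionary new.
currentFields do: [:field | | index |
    index := metastate indexOf: field.      "0 when the field is unknown"
    loaded
        at: field
        put: (index = 0
            ifTrue: [nil]                    "field added after saving"
            ifFalse: [savedValues at: index])].
loaded
```

After evaluation, loaded maps #name to 'Ada', #age to 36, and #email to nil, while the removed #nickname field is skipped. Storing the field-name list once in a shared table and referring to it by index is the kind of reuse the metastate tables described above allow.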

 

SRP is maintained as a single code base that is designed to work for all Smalltalk dialects. SRP does this by directing less-portable behavior through a "portal" that is configured to accommodate the dialect the code is being used with.

 

I find it funny when I see some binary encodings that are still code-bound. If the data does not somehow indicate the data encoding and layout in some standard way, then something as simple as a class schema change can render encoded streams unreadable. They do that to save the cost of a data type code. SRP would never make a mistake like that, and the cost that SRP pays for this data type code is typically only one byte.

 

SRP encoding is fundamentally a sequence of unsigned integers of unbounded size. This is the most compact representation possible. An object type header is commonly only one byte and yet is still flexible enough to be unlimited and extended any way imaginable. SRP encoding supported four-byte character strings before they were invented and stores them as compactly as possible. SRP allows direct and data-width encodings for things like floats and embedded data. Even direct encoding of some data doesn't break the readability of the object graph. SRP also has features for object annotation, for example if you want to remember the oop of an object or dependents. The encoding is what is most special and portable about SRP. Financial markets now exchange data using encoding standards (Fast FIX) for some data types that had been pioneered by SRP, but none that I'm aware of are as consistent and pure as SRP.
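
For readers unfamiliar with unbounded unsigned integers on the wire, one well-known technique is base-128 variable-length encoding, where small values take a single byte. The sketch below shows that general technique only; the post does not specify SRP's byte layout, so this should not be read as SRP's actual format. The block names encode and decode are made up for the example.

```smalltalk
"Illustration only: base-128 variable-length unsigned integers. This is a
 common technique and is NOT claimed to be SRP's actual wire format."
| encode decode bytes |
encode := [:n | | out v |
    out := OrderedCollection new.
    v := n.
    [v >= 128] whileTrue: [
        out add: (v \\ 128) + 128.      "low 7 bits, high bit set: more follows"
        v := v // 128].
    out add: v.                         "last group, high bit clear"
    out asByteArray].
decode := [:bs | | value shift |
    value := 0.
    shift := 0.
    bs do: [:b |
        value := value + ((b bitAnd: 127) bitShift: shift).
        shift := shift + 7].
    value].
bytes := encode value: 300.             "two bytes: 172 then 2"
(decode value: bytes) = 300
```

Values below 128 fit in one byte, which matches the point about a type header usually costing a single byte, and because Smalltalk integers grow as needed the same blocks work for arbitrarily large values.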

 

SRP is a solid base of code that is intended to be tailored and configured to your needs. It is fast, but the main goal of SRP was portability. SRP provides a good configuration out of the box that you can easily tune and configure to meet your needs. The most recent tuning SRP has received was for the GS/S dialect, to use GS/S-specific optimizations. That GS/S-specific code can be found here:

 

http://techsupport.gemstone.com/entries/181657-srp-3-1-010-0

 

SRP can serialize objects like a ComplexBlock, but does not attempt to do so in a dialect-portable way. It is simply that I had not defined a portable representation of a complex block in the portability layer. A common way to do that would be to determine the source of the block (for all dialects) and compile that code on load. It gets tricky if you attempt to support more than simple blocks or if you want to translate bytecodes (which I'd also prototyped). If you really think you need to serialize blocks then SRP is flexible enough to let you define how you want it done.

 

Some Smalltalk dialects (like VA in particular) do not have an efficient two-way become. You'll find that most serialization tools expect there to be an efficient two-way become to substitute one object for another on load. SRP however has a unique way to fix-up references that is efficient for all dialects. SRP has a wide variety of object substitution hooks for both saving and loading that preserve graph relationship integrity without screwing up original objects. SRP also has support for proxy objects that can be managed by application code.
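
The post does not describe how SRP fixes up references, so the following is only a generic sketch of one way to avoid become: allocate every object first, then fill in the slots in a second pass. The record layout, with #(#ref n) standing for "the object built from record n", is an assumption made up for the example.

```smalltalk
"Illustration only, not SRP's mechanism: rebuild a cyclic graph in two passes,
 so no object ever has to be swapped for another with become:."
| records objects |
"Each record lists its slots; #(#ref n) means 'the object built from record n'."
records := Array
    with: (Array with: #(#ref 2) with: #(#ref 1))
    with: (Array with: #(#ref 1) with: 42).
"Pass 1: allocate every object with empty slots."
objects := records collect: [:slots | Array new: slots size].
"Pass 2: fill the slots, resolving references against the allocated objects."
1 to: records size do: [:i | | slots |
    slots := records at: i.
    1 to: slots size do: [:j | | slot |
        slot := slots at: j.
        (objects at: i)
            at: j
            put: (((slot isMemberOf: Array) and: [slot first == #ref])
                ifTrue: [objects at: slot last]
                ifFalse: [slot])]].
((objects at: 1) at: 2) == (objects at: 1)
```

The final expression answers true: the first object's second slot points back at the object itself, and the cycle was rebuilt without any object substitution.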

 

The main thing wrong with SRP is that it is not the framework that "you" created. SRP was the first binary serialization tool to focus on Smalltalk dialect portability. I'd argue that it is still the only one that truly accomplished that in a meaningful way. I created SRP by combining proven techniques from the best tools of the time and adding features for portability. SRP was superior to even the dialect-specific frameworks at the time. SRP is not something that I intend to maintain and promote. I released it open source some ten years ago in the hope that others would do that. A lot of effort and sacrifice was put into SRP "for the benefit of others". SRP taught me a painful lesson about human nature and the perception of value. Programmers (myself included) love to solve problems more than learn about existing solutions. Everyone wants to solve problems like this their own way and thinks they have a good reason that they must do it their way. "Yet another" was an excellent subject line.

 

Paul Baumann

 

 

 




Re: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

stephane ducasse-2
Paul

I asked Martin to start a pickle-format serializer for fast binary serialization, and proposed the idea to him. We want images without a compiler.
I was not aware that SRP was still available/maintained. I looked at it probably 8 years ago.
So please do not bash Mariano and Martin for my mistakes. Then again, maybe you should have advertised it a bit.
Having documents that describe solutions and their pros and cons is also important, and we often need more than a couple of wiki pages.
So where is the SRP web site where we can find docs, code, and so on?

Stef




Re: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

Mariano Martinez Peck
In reply to this post by Paul Baumann


On Mon, Jun 20, 2011 at 5:48 PM, Paul Baumann <[hidden email]> wrote:

> If you are going to compare object serialization tools, then State Replication Protocol (SRP) should be added to that list.


Well, this thread was about StOMP, but I will answer anyway about Fuel. We did take a look at SRP. In fact, I sent you an email asking a lot of questions, and you kindly answered all of them in detail.
 

> SRP has not been promoted much, but after many years it is still a good cross-dialect and cross-platform binary serialization tool. It was originally ported to about seven Smalltalk dialects. Every aspect of SRP is context-configurable.


That's one of the reasons that can make it a little bit slower than others.
 

> SRP encoding is unique, simple, fast, and unlimited. The user base for SRP is not well known, but I hear from several people that use it for production applications, and I have personal experience with one deployment.
>
> The default configuration for SRP is to use a portable mapping layer and to encode metastate into the data stream. Even with these costs, SRP is comparable in performance to serialization tools that do not do this. The (optional) portable mapping layer is used to represent common Smalltalk objects in a way that can be loaded into any Smalltalk dialect. Metastates describe the structure of the object state so that data load is data-driven rather than code-dependent. SRP can actually load state for which a class is not defined or has significantly changed. Metastates can be stored in metastate tables that can be reused and referenced to reduce data size and improve performance. When you use metastate tables, SRP stores more compactly than any other binary serialization tool is capable of. Whoever compares the performance of SRP with other binary serialization tools should keep in mind that they will have to disable SRP features like these to have a fair comparison.


How can I disable that portable mapping layer (exactly, in code)? Can I disable it and still support class shape changes?


 

SRP is maintained with a single code base that is designed to work for all smalltalk dialects. SRP does this by directing less-portable behavior through a "portal" that is configured to accommodate the dialect the code is being used with.

 

I find it funny when I see some binary encodings that are still code-bound. If the data does not somehow indicate the data encoding and layout in some standard way then you can render encode streams unreadable from something as simple as a class schema change. They do that to save the cost of a data type code. SRP would never make a mistake like that, and the cost that SRP experiences for this data type code is typically only one byte.


We also store the type in one byte. But in our case objects are grouped together in clusters, so it is only one byte per cluster.
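The difference between a tag per object and a tag per cluster can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not Fuel's real stream layout; `per_object` and `per_cluster` are hypothetical names, and the sketch ignores references between objects.

```python
def per_object(values):
    """Flat layout: a type tag precedes every value."""
    return [(type(v).__name__, v) for v in values]

def per_cluster(values):
    """Clustered layout: values are grouped by type, tag written once per group."""
    clusters = {}
    for v in values:
        clusters.setdefault(type(v).__name__, []).append(v)
    return list(clusters.items())

values = [1, "a", 2, "b", 3]
flat = per_object(values)        # five type tags
clustered = per_cluster(values)  # two type tags
```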
 

 

SRP encoding is fundamentally a sequence of unsigned integers of infinite size. This is the most compact representation possible. An object type header is commonly only one byte and yet is still flexible enough to be unlimited and extended any way imaginable. SRP encoding supported four byte character strings before they were invented and stores them as compactly as possible. SRP allows direct and data width encodings for things like floats and embedded data. Even direct encoding of some data doesn't break the readability of the object graph. SRP also has features for object annotation, for example if you want to remember the oop of an object or its dependents. The encoding is what is most special and portable about SRP. Financial markets now exchange data using encoding standards (Fast FIX) for some data types that had been pioneered by SRP, but none that I'm aware of are as consistent and pure as SRP.
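One common way to store unsigned integers of unbounded size compactly is a variable-length encoding with 7 payload bits per byte and a continuation bit. The Python sketch below shows that general technique; it is an assumption for illustration and not necessarily the byte layout SRP uses. The function names are made up.

```python
def encode_uint(n):
    """Encode a non-negative integer of any size, 7 bits per byte, low group first."""
    if n < 0:
        raise ValueError("only unsigned integers are supported")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)   # high bit set: more bytes follow
        else:
            out.append(group)          # high bit clear: last byte
            return bytes(out)

def decode_uint(data, pos=0):
    """Decode one integer starting at pos; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
```

With such a scheme, values below 128 (which covers most type codes and small sizes) take a single byte, while arbitrarily large values remain representable.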

 

SRP is a solid base of code that is intended to be tailored and configured to your needs. It is fast, but the main goal of SRP was portability. SRP provides a good configuration out of the box that you can easily tune and configure to meet your needs. The most recent tuning SRP has received was for the GS/S dialect to use GS/S specific optimizations. That GS/S specific code can be found here:

 

http://techsupport.gemstone.com/entries/181657-srp-3-1-010-0

 

SRP can serialize objects like a ComplexBlock, but does not attempt to do so in a dialect-portable way. It is simply that I had not defined a portable representation of a complex block in the portability layer. A common way to do that would be to determine the source of the block (for all dialects) and compile that code on load.


Yes, but that may not work, because closures point to another context, which can be a CompiledMethod for example. And a closure can have references to variables defined outside the closure.
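The problem with "store the source and compile on load" can be shown with a small Python analogue (Python closures stand in for Smalltalk blocks here; the helper names are hypothetical). Recompiling the source alone loses the captured variable, so the captured environment has to be serialized as well.

```python
def make_adder(base):
    # The returned function captures 'base' from the enclosing scope.
    return lambda x: x + base

SOURCE = "lambda x: x + base"   # the source text of the block alone

def rebuild_from_source(source, captured=None):
    """Compile the source on load, optionally restoring captured variables."""
    return eval(source, dict(captured or {}))

original = make_adder(10)

# Naive approach: only the source travels, the environment does not.
naive = rebuild_from_source(SOURCE)
try:
    naive(1)
    naive_worked = True
except NameError:
    naive_worked = False        # 'base' was never serialized

# Working approach: the captured values are stored next to the source.
captured = {"base": original.__closure__[0].cell_contents}
restored = rebuild_from_source(SOURCE, captured)
```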
 

It gets tricky if you attempt to support more than simple blocks or if you want to translate bytecodes (which I'd also prototyped). If you really think you need to serialize blocks then SRP is flexible enough to let you define how you want it done.

 


excellent.
 

Some Smalltalk dialects (like VA in particular) do not have an efficient two-way become. You'll find that most serialization tools expect there to be an efficient two-way become to substitute one object for another on load. SRP however has a unique way to fix-up references that is efficient for all dialects. SRP has a wide variety of object substitution hooks for both saving and loading that preserve graph relationship integrity without screwing up original objects. SRP also has support for proxy objects that can be managed by application code.
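The message does not say how SRP fixes up references, so the following Python sketch only shows one generic way to load a graph with cycles without ever swapping object identities: allocate every object first, then resolve references by index. `Node` and `load_graph` are made-up names for illustration.

```python
class Node:
    def __init__(self):
        self.name = None
        self.next = None

def load_graph(records):
    """records: list of (name, index_of_next_or_None)."""
    # Pass 1: allocate every object so each index already has an identity.
    nodes = [Node() for _ in records]
    # Pass 2: fill in state, resolving indexes to the preallocated objects.
    for node, (name, next_index) in zip(nodes, records):
        node.name = name
        node.next = None if next_index is None else nodes[next_index]
    return nodes

# Two nodes that point at each other (a circular reference).
nodes = load_graph([("a", 1), ("b", 0)])
```

Because every reference is resolved against an object that already exists, no become-style substitution is needed in this sketch.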


Where (classes/methods/tests) can I look to see how you manage those proxies? It sounds interesting. The same goes for the object substitution hooks.
 

 

The main thing wrong with SRP is that it is not the framework that "you" created. SRP was the first binary serialization tool to focus on Smalltalk dialect portability. I'd argue that it is still the only one that truly accomplished that in a meaningful way. I created SRP by combining proven techniques from the best tools of the time and adding features for portability. SRP was superior to even the dialect-specific frameworks at the time. SRP is not something that I intend to maintain and promote. I released it open source some ten years ago in the hope that others would do that. A lot of effort and sacrifice was put into SRP "for the benefit of others". SRP taught me a painful lesson about human nature and the perception of value. Programmers (myself included) love to solve problems more than learn about existing solutions. Everyone wants to solve problems like this their own way and thinks they have a good reason that they must do it their way. "Yet another" was an excellent subject line.


I will speak just for Fuel. I don't think this is really a problem. What you mention is so well known that it even has a name: trade-off. If you find a way to be really fast in serialization and materialization and be portable at the same time, then I am all ears. For me it is perfect to have different kinds of serializers. Do you want something portable that you can even edit with a text editor? Then use SIXX. Do you want a portable solution with reasonably good performance? Then use StOMP, SRP, etc. Do you want something really fast (mostly at materialization time) that is not focused on portability? Then use Fuel. Is that bad? Now in Pharo people are working on the Opal compiler, which is 3 times slower than the old one. Why are we not against that? Again, trade-off. The old Compiler is really difficult to understand and maintain. We want something more OO that is easy to maintain, understand, and experiment with.

Now, I don't know the reasons, but Colin ported SRP to Squeak and then he finally implemented his own S&M serializer. Masashi has now implemented StOMP, but he also took a look at SRP. In fact, check the commits in http://www.squeaksource.com/SRP: he fixed it, and I asked him a couple of questions to make it work. As of this week (a couple of days ago), SRP tests are green in Pharo. So these guys took a look at SRP, as did we.

In our case, we even created benchmarks (check package FuelBenchmarksSRP in Fuel repo) to compare Fuel against the rest. I can share the results with you if you want, but tell me first how to disable the mapping layer that makes it slower.

Cheers

 

 

Paul Baumann

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Mariano Martinez Peck
Sent: Monday, June 20, 2011 08:54


To: The general-purpose Squeak developers list
Cc: VWNC; [hidden email]
Subject: Re: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

 

 

2011/6/20 Janko Mivšek <[hidden email]>

Hi Masashi,

Now we have a competition, Fuel vs. StOMP :) Big advantage of StOMP is
that it is portable and already ported to VW. Which are other
advantages? Disadvantages?

Also question for Fuel developers, do you plan to port it to other
Smalltalks too? Portability is namelly something which is very high on
checking list for a serializer to use in portable projects, like most of
web ones are.


Hi Janko. I think "portability" is too wide a term to discuss without details. For me, portability in this case means two things: a) being able to materialize, in a dialect XXX, a stream that was serialized in a dialect YYY; b) that the code of the serializer can also work in another dialect (not necessarily including a)).

Fuel will not support a) for sure. At least, we will not make any extra effort to support that. Regarding b), being portable to other dialects is not Fuel's first priority. But let me explain:
- We want to be able to serialize ANY kind of object, which includes BlockClosure, CompiledMethod, MethodContext, Class, Trait, etc. Finding an abstract and portable representation for those objects across dialects is complicated.
- We want to be as fast as possible. That means that if we find a way to be faster that only works in Pharo, we don't care. We will go ahead with that.

That being said, I have to say that Fuel's OO design, from my point of view, is quite nice, easy to understand, and not difficult to port. As an example, Eliot Miranda easily ported Fuel, not even to another dialect but to Newspeak. Moreover, he needed special handling for Newspeak data, and he was able to easily adapt Fuel to his needs. So in this case Fuel was portable (in the sense of b) and flexible.


Another difference is that we try to be a little faster in materialization than in serialization (which is not the case for StOMP). So in summary, the differences I can see are:

1) StOMP is focused on portability across dialects and on being able to materialize the same stream in different dialects. Fuel is not focused on portability, even if it could be portable in the sense of the code.
2) StOMP is faster in serializing small/medium graphs. Fuel is faster with large graphs.
3) StOMP is faster in serializing, while Fuel is faster in materializing.
4) StOMP can serialize some objects (right now it cannot serialize BlockClosures or things like that); Fuel can (or should) be able to serialize all.

That's all I can see for the moment. But don't worry, there is no fight. We have been sending each other several mails this week and the previous one, and have tried to share knowledge :)

Cheers


 


Best regards
Janko

S, Masashi UMEZAWA piše:

> Hello all,
>
> I have recently developed a new serialization library called
> StOMP(Smalltalk Objects on MessagePack).
> http://stomp.smalltalk-users.jp/
>
> StOMP is a binary serializer for major Smalltalk dialects. For those
> who know SIXX, StOMP can be seen as a binary SIXX. While SIXX
> represents object data as XML, StOMP uses MessagePack. By combining
> the flexibility of SIXX with the compactness of MessagePack, StOMP
> aims to be a unique, next-generation portable serializer for
> Smalltalk.
>
> Features:
> - Implementation is compact and portable
> - Shared/circular references support
> - "Class shape changes" support
> - Data is interchangable between Smalltalk dialects
> - Good performance for small sized object graph
>
> StOMP is now available for Squeak, Pharo, and VisualWorks.
>
> There is ConfigurationOfStOMP, so the installation is easy.
>
> Gofer new
>   squeaksource: 'MetacelloRepository';
>   package: 'ConfigurationOfStOMP';
>   load.
> (Smalltalk at: #ConfigurationOfStOMP) perform: #load.
>
> Enjoy!

--

Janko Mivšek
Aida/Web
Smalltalk Web Application Server
http://www.aidaweb.si




--
Mariano
http://marianopeck.wordpress.com






--
Mariano
http://marianopeck.wordpress.com




Re: [Pharo-project] [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

Stéphane Ducasse
Thanks Mariano.
I like your smart attitude and answers :)

Stef




RE: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

Paul Baumann
In reply to this post by stephane ducasse-2
Stef,

There is a WIKI on SRP somewhere? Cool. The SourceForge.org site that hosted SRP never got much traction. You are right, it is my fault that few have heard of SRP. I stopped promoting it many years ago. The core hasn't needed much maintenance, but updated distribution forms (like newer VW parcel formats) are overdue and likely outdated. Overview documentation (outside of code and class comment methods) is also poor. Sorry if it sounded like I was bashing others, that was not my intent. Any tone detected is a reflection of my own frustration getting support for what others have since attempted to repeat. I do not intend to discourage others from trying a goal that I still believe in. I'd encourage them to look at SRP though because it uses techniques found nowhere else and that are ideal for portability.

SRP was intended to be one part of a solution that could translate compiled code to native bytecodes. Those features are not built into SRP though as they are harder to port than the rest of SRP. SRP was released without that functionality. I could probably dig up the code for a portable code scanner/interpreter that parses to nodes that SRP is able to serialize more space efficiently than the original source code. The parse nodes for a method were able to be mixed into a method dictionary with a performance hit due to message performs. The nodes can generate nicely formatted smalltalk and java-like source code. The nodes save (using an SRP metastate table) as space efficiently as compiled code. Since the nodes can take the place of a compiled method, the images can be used cross-dialect without a compiler. The parser recognized dialect specific syntax and was also able to generate code using syntax expected by the loading dialect (for syntax differences between dialects on things like namespaces). I never refined an SRP-based load translator to native bytecodes, but that is possible. One challenge you'll encounter is that some vendors don't want to tell you what their bytecodes are, and you'd violate their license by figuring them out yourself. At one time, some vendors wouldn't allow you to include their compiler in a runtime image either. I'm not sure if any still have that restriction.

Paul


-----Original Message-----
From: stephane ducasse [mailto:[hidden email]]
Sent: Monday, June 20, 2011 12:01
To: Paul Baumann
Cc: Mariano Martinez Peck; The general-purpose Squeak developers list; VWNC; [hidden email]
Subject: Re: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer
Importance: High

Paul

I asked Martin, and proposed to him, to start a pickle-format serializer for fast binary serialization. We want images without a compiler.
I was not aware that SRP was still available/maintained. I looked at it probably 8 years ago.
So please do not bash Mariano and Martin for my mistakes. Maybe you should have advertised it a bit.
Having documents that describe solutions and their pros and cons is also important, and we often need more than a couple of wiki pages.
So where is the SRP web site where we can find docs, code, and so on?

Stef

On Jun 20, 2011, at 5:48 PM, Paul Baumann wrote:

> If you are going to compare object serializing tools then State Replication Protocol (SRP) should be added to that list. SRP has not been promoted much but it is after many years still a good cross dialect and platform binary serialization tool. It was originally ported to about seven smalltalk dialects.  Every aspect of SRP is context-configurable. SRP encoding is unique, simple, fast, and unlimited. The user base for SRP is not well known, but I hear from several people that use it for production applications and I have personal experience with one deployment.
>
> The default configuration for SRP is to use a portable mapping layer and to encode metastate into the data stream. Even with these costs, SRP is comparable in performance to serialization tools that do not do this. The (optional) portable mapping layer is used to represent common smalltalk objects in a way that can be loaded into any smalltalk dialect. Metastates describe the structure of the object state so that data load is data driven rather than code dependent. SRP can actually load state for which a class is not defined or has significantly changed. Metastates can be stored in metastate tables that can be reused and referenced to reduce data size and improve performance. When you use metastate tables, SRP stores more compactly than any other binary serialization tool is capable of. Whoever compares performance of SRP with other binary serialization tools should keep in mind that they will have to disable SRP features like these to have a fair comparison.
>
> SRP is maintained with a single code base that is designed to work for all smalltalk dialects. SRP does this by directing less-portable behavior through a "portal" that is configured to accommodate the dialect the code is being used with.
>
> I find it funny when I see some binary encodings that are still code-bound. If the data does not somehow indicate the data encoding and layout in some standard way then you can render encode streams unreadable from something as simple as a class schema change. They do that to save the cost of a data type code. SRP would never make a mistake like that, and the cost that SRP experiences for this data type code is typically only one byte.
>
> SRP encoding is fundamentally a sequence of unsigned integers of infinite size. This is the most compact representation possible. An object type header is commonly only one byte and yet is still flexible enough to be unlimited and extended any way imaginable. SRP encoding supported four byte character strings before they were invented and stores them as compactly as possible. SRP allows direct and data width encodings for things like floats and embedded data. Even direct encoding of some data doesn't break the readability of the object graph. SRP also has features for object annotation, for example if you want to remember the oop of an object or its dependents. The encoding is what is most special and portable about SRP. Financial markets now exchange data using encoding standards (Fast FIX) for some data types that had been pioneered by SRP, but none that I'm aware of are as consistent and pure as SRP.
>
> SRP is a solid base of code that is intended to be tailored and configured to your needs. It is fast, but the main goal of SRP was portability. SRP provides a good configuration out of the box that you can easily tune and configure to meet your needs. The most recent tuning SRP has received was for the GS/S dialect to use GS/S specific optimizations. That GS/S specific code can be found here:
>
> http://techsupport.gemstone.com/entries/181657-srp-3-1-010-0
>
> SRP can serialize objects like a ComplexBlock, but does not attempt to do so in a dialect-portable way. It is simply that I had not defined a portable representation of a complex block in the portability layer. A common way to do that would be to determine the source of the block (for all dialects) and compile that code on load. It gets tricky if you attempt to support more than simple blocks or if you want to translate bytecodes (which I'd also prototyped). If you really think you need to serialize blocks then SRP is flexible enough to let you define how you want it done.
>
> Some Smalltalk dialects (like VA in particular) do not have an efficient two-way become. You'll find that most serialization tools expect there to be an efficient two-way become to substitute one object for another on load. SRP however has a unique way to fix-up references that is efficient for all dialects. SRP has a wide variety of object substitution hooks for both saving and loading that preserve graph relationship integrity without screwing up original objects. SRP also has support for proxy objects that can be managed by application code.
>
> The main thing wrong with SRP is that it is not the framework that "you" created. SRP was the first binary serialization tool to focus on Smalltalk dialect portability. I'd argue that it is still the only one that truly accomplished that in a meaningful way. I created SRP by combining proven techniques from the best tools of the time and adding features for portability. SRP was superior to even the dialect-specific frameworks at the time. SRP is not something that I intend to maintain and promote. I released it open source some ten years ago in the hope that others would do that. A lot of effort and sacrifice was put into SRP "for the benefit of others". SRP taught me a painful lesson about human nature and the perception of value. Programmers (myself included) love to solve problems more than learn about existing solutions. Everyone wants to solve problems like this their own way and thinks they have a good reason that they must do it their way. "Yet another" was an excellent subject line.
>
> Paul Baumann
>
>
>
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Mariano Martinez Peck
> Sent: Monday, June 20, 2011 08:54
> To: The general-purpose Squeak developers list
> Cc: VWNC; [hidden email]
> Subject: Re: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer
>
>
>
> 2011/6/20 Janko Mivšek <[hidden email]>
> Hi Masashi,
>
> Now we have a competition, Fuel vs. StOMP :) Big advantage of StOMP is
> that it is portable and already ported to VW. Which are other
> advantages? Disadvantages?
>
> Also question for Fuel developers, do you plan to port it to other
> Smalltalks too? Portability is namely something which is very high on
> checking list for a serializer to use in portable projects, like most of
> web ones are.
>
> Hi Janko. I think "portability" is too broad a term to discuss without details. For me, portability in this case means two things: a) being able to materialize, in a dialect XXX, a stream which was serialized in a dialect YYY; b) that the code of the serializer can also work in another dialect (not necessarily including a)).
>
> Fuel will not support a) for sure. At least, we will not make extra effort to support that. Regarding b), being portable to other dialects is not Fuel's first priority. But let me explain it:
> - We want to be able to serialize ANY kind of object, that includes BlockClosure, CompiledMethod, MethodContext, Class, Trait, etc.... Finding an abstract and portable representation for those objects across dialects is complicated.
> - We want to be as fast as possible. That means that if we find a way to be faster which only works in Pharo, we don't care. We will go ahead with that.
>
> That being said, I have to say that Fuel's OO design, from my point of view, is quite nice, easy to understand, and not difficult to port. As an example, Eliot Miranda easily ported Fuel not just to another dialect but to Newspeak. And even more, he needed special management for Newspeak data, and he was able to easily adapt Fuel for his needs. So in this case Fuel was portable (in the sense of b)) and flexible.
>
>
> Another difference is that we try to be a little faster in materialization than in serialization (which is not the case for StOMP). So in summary, the differences I can see are:
>
> 1) StOMP is focused on portability across dialects and on being able to materialize the same stream in different dialects. Fuel is not focused on portability, even if its code could be ported.
> 2) StOMP is faster in serializing small/medium graphs. Fuel is faster in large graphs.
> 3) StOMP is faster in serializing while Fuel is faster in materializing.
> 4) StOMP can serialize some objects (right now it cannot serialize BlockClosures or things like that); Fuel can (or should) be able to serialize all.
>
> That's all I can see for the moment. But don't worry, there is no fight. We have been sending each other several mails this week and the previous one and have tried to share knowledge with each other :)
>
> Cheers
>
>
>
>
> Best regards
> Janko
>
> S, Masashi UMEZAWA piše:
> > Hello all,
> >
> > I have recently developed a new serialization library called
> > StOMP(Smalltalk Objects on MessagePack).
> > http://stomp.smalltalk-users.jp/
> >
> > StOMP is a binary serializer for major Smalltalk dialects. For those
> > who know SIXX, StOMP can be seen as a binary SIXX. While SIXX
> > represents object data as XML, StOMP uses MessagePack. By combining
> > the flexibility of SIXX with the compactness of MessagePack, StOMP
> > aims to be a unique, next-generation portable serializer for
> > Smalltalk.
> >
> > Features:
> > - Implementation is compact and portable
> > - Shared/circular references support
> > - "Class shape changes" support
> > - Data is interchangeable between Smalltalk dialects
> > - Good performance for small sized object graph
> >
> > StOMP is now available for Squeak, Pharo, and VisualWorks.
> >
> > There is ConfigurationOfStOMP, so the installation is easy.
> >
> > Gofer new
> >   squeaksource: 'MetacelloRepository';
> >   package: 'ConfigurationOfStOMP';
> >   load.
> > (Smalltalk at: #ConfigurationOfStOMP) perform: #load.
> >
> > Enjoy!
>
> --
> Janko Mivšek
> Aida/Web
> Smalltalk Web Application Server
> http://www.aidaweb.si
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>
> This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc




RE: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

Paul Baumann
In reply to this post by Mariano Martinez Peck

Mariano,

 

My responses below are tagged by <plb> and </plb>...

 

Paul Baumann

 

 

From: Mariano Martinez Peck [mailto:[hidden email]]

On Mon, Jun 20, 2011 at 5:48 PM, Paul Baumann <[hidden email]> wrote:

If you are going to compare object serializing tools then State Replication Protocol (SRP) should be added to that list.


Well, this thread was about StOMP, but I will answer anyway about Fuel. We did take a look at SRP. In fact, I sent you an email asking a lot of questions and you kindly answered all of them in detail.
 

SRP has not been promoted much but it is after many years still a good cross dialect and platform binary serialization tool. It was originally ported to about seven smalltalk dialects.  Every aspect of SRP is context-configurable.


That's one of the reasons which can make it a little bit slower than others.

<plb> Yeah, but that can be customized by configuration too. A little bit slower out of the box for the sake of portability is usually a good trade-off so long as you can tune it to get the performance you need. If someone skips the tuning part then they'll leave with a bad impression. If SRP is compared with an immature solution, one may find that the other framework's performance advantage disappears once it evolves years later to do what SRP is already doing. </plb>

 

SRP encoding is unique, simple, fast, and unlimited. The user base for SRP is not well known, but I hear from several people that use it for production applications and I have personal experience with one deployment.

 

The default configuration for SRP is to use a portable mapping layer and to encode metastate into the data stream. Even with these costs, SRP is comparable in performance to serialization tools that do not do this. The (optional) portable mapping layer is used to represent common smalltalk objects in a way that can be loaded into any smalltalk dialect. Metastates describe the structure of the object state so that data load is data driven rather than code dependent. SRP can actually load state for which a class is not defined or has significantly changed. Metastates can be stored in metastate tables that can be reused and referenced to reduce data size and improve performance. When you use metastate tables, SRP stores more compactly than any other binary serialization tool is capable of. Whoever compares performance of SRP with other binary serialization tools should keep in mind that they will have to disable SRP features like these to have a fair comparison.
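
To make the metastate idea concrete, here is a minimal sketch in Python. It is not SRP code and does not use SRP's format: it assumes a toy JSON stream in which every record carries its own metastate (class name plus slot names), so that loading is driven by the data. The names `save`, `load`, `GenericState`, `Point`, and `Point3D` are all hypothetical; `GenericState` only loosely mirrors the role described for SrpState.

```python
import json

class GenericState:
    """Stand-in used when no class is registered for the saved data."""
    def __init__(self, class_name, slots):
        self.class_name = class_name
        self.slots = dict(slots)

def save(obj):
    # The metastate (class name + slot names) travels with the values.
    names = sorted(vars(obj))
    return json.dumps({
        "metastate": {"class": type(obj).__name__, "slots": names},
        "values": [getattr(obj, name) for name in names],
    })

def load(data, known_classes):
    record = json.loads(data)
    meta = record["metastate"]
    slots = dict(zip(meta["slots"], record["values"]))
    cls = known_classes.get(meta["class"])
    if cls is None:
        # No such class here: keep the state in a generic holder.
        return GenericState(meta["class"], slots)
    obj = cls()  # assumes a no-argument constructor that sets defaults
    for name, value in slots.items():
        if hasattr(obj, name):  # skip slots the class no longer has
            setattr(obj, name, value)
    return obj

class Point:
    def __init__(self):
        self.x, self.y = 0, 0

class Point3D:  # the "changed" class shape: y dropped, z added
    def __init__(self):
        self.x, self.z = 0, 0
```

Loading a saved `Point` with `{"Point": Point3D}` as the class table fills `x`, leaves `z` at its default, and drops `y`; loading it with an empty class table yields a `GenericState` that still holds both original slots.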


How can I disable such a portable mapping layer (exactly, in code)?  Can I disable that but at the same time support class shape changes?

<plb>

SrpConfiguration is the source of all the customizations and interactions.  Use SrpNonMappingConfiguration to avoid the portable mapping layer. The bigger cost in both performance and space efficiency is that SRP adds metastate information to the data if the metastates are not in a table. You need to use a metastate table. You need to define how loading code will resolve a metastate table that is being referenced. This is similar in purpose to an XML DTD file. You'd likely subclass SrpConfiguration and then refine methods like #saveMetastateTableNamed:containing:, #resolveMetastateTableNamed:, or #resolveMetastateTableReference:.

Metastates describe the data encoding. You must be able to resolve the metastate of the object to be read. The metastate is data in a predictable (yet extendable) format that describes how data in a less predictable format is encoded. SRP uses metastates so that class shape changes or behavior will not affect the ability to read data. If the class doesn't exist at all for loaded data then SRP is able to load an instance of SrpState that represents the structure and accessor behavior of the original object.

A portable mapping is different entirely; it says for example that "a dictionary is a collection of association instances" rather than whatever the native dictionary implementation is. When saving a Dictionary instance, it instead saves a PmrDictionary (of association instances) that is then able to load-map back to the native Dictionary implementation. Other serialization frameworks tend to put these portability rules in code. You could put them in code with SRP too (and even directly write objects to the stream yourself within SRP data), but SRP defaults to using class-based portability mapping. Your configuration declares which mappings you want to use. Encode with no portability mappings at all and you'll be able to read the data in the format it originated from. Don't like the class-based mapping rules? Then tell your configuration to #beforeSavingAnyNamed:doWithContext: or #afterLoadingAnyNamed:doWithContext:. If you still don't think that is fast enough, then override #writerClass and #readerClass to use your own marshaler subclass that, for example, implements #saveDictionary: to write your collection of associations. SRP gives more options for portable mapping. You can still map by marshaler if you prefer that approach.

</plb>
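
The "a dictionary is a collection of association instances" rule can be illustrated with a short Python sketch. This is not SRP's implementation; `to_portable` and `from_portable` are hypothetical names, and the sketch assumes dictionary keys remain hashable after the round trip (strings and numbers, for example). Tuples come back as lists.

```python
def to_portable(obj):
    """Replace native containers by a neutral, implementation-independent form."""
    if isinstance(obj, dict):
        # A dictionary is saved as a plain collection of key/value associations.
        return {"kind": "dictionary",
                "associations": [[to_portable(k), to_portable(v)]
                                 for k, v in obj.items()]}
    if isinstance(obj, (list, tuple)):
        return {"kind": "sequence", "items": [to_portable(x) for x in obj]}
    return obj  # numbers, strings, None pass through unchanged

def from_portable(data):
    """Load-map the neutral form back to the native containers."""
    if isinstance(data, dict) and data.get("kind") == "dictionary":
        return {from_portable(k): from_portable(v)
                for k, v in data["associations"]}
    if isinstance(data, dict) and data.get("kind") == "sequence":
        return [from_portable(x) for x in data["items"]]
    return data
```

The point of the design is that the stream never depends on how the native dictionary is laid out internally; only the load-mapping step knows about the native class.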

 

SRP is maintained with a single code base that is designed to work for all smalltalk dialects. SRP does this by directing less-portable behavior through a "portal" that is configured to accommodate the dialect the code is being used with.

 

I find it funny when I see some binary encodings that are still code-bound. If the data does not somehow indicate the data encoding and layout in some standard way then you can render encoded streams unreadable from something as simple as a class schema change. They do that to save the cost of a data type code. SRP would never make a mistake like that, and the cost that SRP experiences for this data type code is typically only one byte.


We do store the type in one byte as well. But in our case, objects are grouped together in clusters. So it is only one byte per cluster.

<plb>

SRP can store in clusters too. It is a common layout for serialization tools (depth first storage in silo collections followed by a relationship graph of pointers). I'd experimented with an SRP configuration that used that layout. It didn't provide any advantage at all. The cost of the pointer values outweighs the savings on class identifier. If you prefer that layout then SRP can accommodate it though.

</plb>

 

 

SRP encoding is fundamentally a sequence of unsigned integers of unlimited size. This is the most compact representation possible. An object type header is commonly only one byte and yet is still flexible enough to be unlimited and extended any way imaginable. SRP encoding supported four byte character strings before they were invented and stores them as compactly as possible. SRP allows direct and data width encodings for things like floats and embedded data. Even direct encoding of some data doesn't break the readability of the object graph. SRP also has features for object annotation, for example if you want to remember the oop of an object or its dependents. The encoding is what is most special and portable about SRP. Financial markets now exchange data using encoding standards (Fast FIX) for some data types that had been pioneered by SRP, but none that I'm aware of are as consistent and pure as SRP.
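
For readers unfamiliar with streams of unsigned integers that have no size limit, here is a minimal Python sketch of one well-known scheme, a base-128 variable-length integer in which small values take a single byte. This is an assumption chosen for illustration and is not claimed to be SRP's actual wire format; `encode_unsigned` and `decode_unsigned` are hypothetical names. The sketch deliberately uses arithmetic (`divmod`) rather than bit manipulation.

```python
def encode_unsigned(n, out):
    """Append n (a non-negative int of any size) to the bytearray out."""
    if n < 0:
        raise ValueError("only unsigned integers are supported")
    while n >= 128:
        n, low = divmod(n, 128)
        out.append(low + 128)  # high bit set: more bytes follow
    out.append(n)              # last byte has the high bit clear

def decode_unsigned(data, pos=0):
    """Return (value, next_position) decoded from data starting at pos."""
    value, scale = 0, 1
    while True:
        byte = data[pos]
        pos += 1
        if byte >= 128:
            value += (byte - 128) * scale
            scale *= 128
        else:
            return value + byte * scale, pos
```

Values below 128 cost one byte, and larger values grow by one byte per seven bits, so the encoding never imposes an upper bound on the integer.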

 

SRP is a solid base of code that is intended to be tailored and configured to your needs. It is fast, but the main goal of SRP was portability. SRP provides a good configuration out of the box that you can easily tune and configure to meet your needs. The most recent tuning SRP has received was for the GS/S dialect to use GS/S specific optimizations. That GS/S specific code can be found here:

 

http://techsupport.gemstone.com/entries/181657-srp-3-1-010-0

 

SRP can serialize objects like a ComplexBlock, but does not attempt to do so in a dialect-portable way. It is simply that I had not defined a portable representation of a complex block in the portability layer. A common way to do that would be to determine the source of the block (for all dialects) and compile that code on load.


Yes, but that may not work. Because closures point to another context, which can be a CompiledMethod for example. And a closure can have references to variables defined outside the closure....

<plb>

SRP placeholders, actionItems, proxies, and substitutions can all be part of a solution that would make that work. I don't see a need to do it anyway. To me it is an example of something that many people think they need to serialize but that rarely needs to be. An exception is the sort block of a sorted collection. What most people end up doing in that case is to have standardized substitutes. When serializing the block [:a :b | a < b], instead serialize an object that will load as the native compiled form of that block.

 

You can support features like porting complex blocks with external references if you want to, but SRP doesn't do that in a cross-dialect form. At some point you need to limit the depth of your traversal by use of proxies or else you'll end up saving far more junk than you anticipated or than is reasonable.

</plb>
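
The "standardized substitutes" approach for sort blocks can be sketched in a few lines of Python. This is only an illustration of the idea, not SRP code: the registry `STANDARD_BLOCKS`, the `SortedCollection` class, and the `save`/`load` functions are hypothetical, and JSON stands in for the real stream format. The stream stores the name of a well-known block, and the loader maps that name back to native code.

```python
import functools
import json

# Registry of well-known two-argument sort blocks, keyed by a portable name.
STANDARD_BLOCKS = {
    "ascending": lambda a, b: a < b,
    "descending": lambda a, b: a > b,
}

class SortedCollection:
    def __init__(self, block_name, items=()):
        self.block_name = block_name
        less = STANDARD_BLOCKS[block_name]
        # Adapt the boolean "comes before" block to a three-way comparison.
        compare = lambda a, b: -1 if less(a, b) else (1 if less(b, a) else 0)
        self.items = sorted(items, key=functools.cmp_to_key(compare))

def save(collection):
    # Only the block's name is written, never its code or its context.
    return json.dumps({"sortBlock": collection.block_name,
                       "items": collection.items})

def load(data):
    record = json.loads(data)
    return SortedCollection(record["sortBlock"], record["items"])
```

Because only a name crosses the stream, each dialect (or here, each program) is free to supply its own native implementation of the same ordering.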

 

It gets tricky if you attempt to support more than simple blocks or if you want to translate bytecodes (which I'd also prototyped). If you really think you need to serialize blocks then SRP is flexible enough to let you define how you want it done.

 


excellent.
 

Some Smalltalk dialects (like VA in particular) do not have an efficient two-way become. You'll find that most serialization tools expect there to be an efficient two-way become to substitute one object for another on load. SRP however has a unique way to fix-up references that is efficient for all dialects. SRP has a wide variety of object substitution hooks for both saving and loading that preserve graph relationship integrity without screwing up original objects. SRP also has support for proxy objects that can be managed by application code.


Where (classes/methods/tests) can I look to see how you manage those proxies? It sounds interesting. The same for the object substitution hooks.
 

 

<plb>

One of several ways is to tell the class to save *referenced* instances as a proxy so that it goes through #saveProxy: and #loadProxy methods with ways to customize. Direct saves of those kinds of objects are not proxies. You define the proxy representation. The context of both saving and loading is provided to you by SRP for the proxy.

 

SRP placeholders are temporary objects that are part of a graph being loaded. The placeholders are removed incrementally as the load of each object and any exchanges are completed. Placeholders are normally entirely gone by the time a load completes, but there are times when a few may be kept longer for post-load actions. You can for example declare a post-load action for an object which then gets wrapped with an action item that you control within the context of the full graph load.

</plb>
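
As a rough illustration of loading with placeholders instead of a two-way become, here is a Python sketch. It is not SRP's algorithm: it assumes a toy input where each record names the indices of the records it refers to, and `Node`, `Placeholder`, and `load_graph` are hypothetical names. A forward reference is filled with a temporary placeholder, and the slot is patched as soon as the target object has been loaded.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.links = {}

class Placeholder:
    """Temporary stand-in for an object that has not been loaded yet."""
    def __init__(self, index):
        self.index = index

def load_graph(records):
    loaded = {}    # record index -> finished Node
    pending = {}   # target index -> [(owner, slot), ...] waiting for it
    for index, record in enumerate(records):
        node = Node(record["name"])
        for slot, target in record["refs"].items():
            if target in loaded:
                node.links[slot] = loaded[target]
            else:
                node.links[slot] = Placeholder(target)
                pending.setdefault(target, []).append((node, slot))
        loaded[index] = node
        # Remove placeholders incrementally as soon as their target exists.
        for owner, slot in pending.pop(index, []):
            owner.links[slot] = node
    if pending:
        raise ValueError("unresolved references: %r" % sorted(pending))
    return [loaded[i] for i in range(len(records))]
```

Each reference is patched exactly once at the slot that holds it, so no object ever has to be swapped for another after the fact.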

 

 

The main thing wrong with SRP is that it is not the framework that "you" created. SRP was the first binary serialization tool to focus on Smalltalk dialect portability. I'd argue that it is still the only one that truly accomplished that in a meaningful way. I created SRP by combining proven techniques from the best tools of the time and adding features for portability. SRP was superior to even the dialect-specific frameworks at the time. SRP is not something that I intend to maintain and promote. I released it open source some ten years ago in the hope that others would do that. A lot of effort and sacrifice was put into SRP "for the benefit of others". SRP taught me a painful lesson about human nature and the perception of value. Programmers (myself included) love to solve problems more than learn about existing solutions. Everyone wants to solve problems like this their own way and thinks they have a good reason that they must do it their way. "Yet another" was an excellent subject line.


I will speak just for Fuel. I don't think this is really a problem. What you mention is so well known that it even has a name: trade-off. If you find a way to be really fast in serialization and materialization and be portable at the same time, then I am all ears. For me it is perfect to have different kinds of serializers. Do you want something portable that you can even edit with a text editor? Then use SIXX. Do you want a portable solution with more or less good performance? Then use StOMP, SRP, etc. Do you want something really fast (mostly at materialization time) which is not focused on portability? Then use Fuel. Is that bad? Now in Pharo people are working on the Opal compiler, which is 3 times slower than the old one. Why are we not against that? Again, trade-off. The old compiler is really difficult to understand and maintain. We want something more OO that is easy to maintain, understand, and experiment with.

 

<plb>

SRP uses the same data/nesting sequence as XML. It wouldn't be difficult to create an SRP load marshaler that efficiently loads/translates XML from SRP encoded data. That way the data is both compact and human readable/editable.

</plb>

 


Now, I don't know the reasons, but Colin ported SRP to Squeak and then he finally implemented his own S&M serializer. Masashi has now implemented StOMP, but he also took a look at SRP. In fact, check the commits in http://www.squeaksource.com/SRP. He fixed it, and I asked him a couple of questions to make it work. Since this week (a couple of days ago), SRP tests are green in Pharo. So these guys took a look at SRP, as did we.

In our case, we even created benchmarks (check package FuelBenchmarksSRP in Fuel repo) to compare Fuel against the rest. I can share the results with you if you want, but tell me first how to disable the mapping layer that makes it slower.

 

<plb>

Keep in mind that SRP was written a long time ago. It won't be everything for everybody, but is a good general base for customization to meet a set of needs. Nobody can look at code they wrote ten years ago and not see a way it could be improved. I'd have certainly done float marshaling differently (as you could customize yourself). I can no longer say it is the fastest option out there because I haven't compared SRP with the performance of anything that came after it. It would be interesting to see how it compares now, but I'd take any measurements with a grain of salt because frameworks do have different features and goals.  That said, SRP isn't bad considering the age and neglect. It is certainly a good starting point for anyone else that wants to do better.

 

The place where nearly all serialization tools have trouble performance-wise is the hash size limitation of most IdentityDictionary implementations. Serialization of a graph can easily touch thousands and thousands of objects. When VW IdentityDictionary performance degrades after 16K objects (32K for VA), that causes a problem for any serialization tool that relies on the IdentityDictionary to save objects. This is the first thing I focus on when I tune SRP (#newHitList).  I've implemented faster identity dictionaries that grow better, but they are not built into SRP. There is also some dialect-specific tuning that can be done in this area.  The tuned version of SRP for GS/S, for example, makes use of a special hidden map in GS/S for all objects that GS/S itself uses for its serialization.

 

SRP makes heavy use of #saveUnsigned: and #loadUnsigned. In some dialects using math functions on small integers is faster (and more portable) than bit manipulation. SRP would benefit from primitives to do the work of #saveUnsigned: and #loadUnsigned.

</plb>
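
The role an identity dictionary plays while saving can be shown with a small Python sketch. It is illustrative only and says nothing about how SRP's #newHitList is implemented; the `HitList` class and its `register` method are hypothetical. Each object is keyed by identity rather than equality, so a second encounter with the same object can be written as a back-reference.

```python
class HitList:
    """Maps each object, by identity rather than equality, to a stream index."""
    def __init__(self):
        self._index_by_id = {}
        self._objects = []  # keeps objects alive so id() values stay unique

    def register(self, obj):
        """Return (index, first_time) for obj."""
        key = id(obj)
        if key in self._index_by_id:
            return self._index_by_id[key], False
        index = len(self._objects)
        self._index_by_id[key] = index
        self._objects.append(obj)
        return index, True
```

Every object touched during a save goes through this lookup, which is why the growth behavior of the underlying table dominates the cost for large graphs.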




Re: [vwnc] [squeak-dev] Re: [ANN] StOMP - Yet another multi-dialect object serializer

Colin Putney-3
In reply to this post by Mariano Martinez Peck
On Mon, Jun 20, 2011 at 11:50 AM, Mariano Martinez Peck
<[hidden email]> wrote:

> Now, I don't know the reasons but Colin ported SRP to Squeak and then he
> finally implemented his own S&M serializer. Masashi now implemented StOMP
> but he also took a look at SRP. In fact, check the commits in
> http://www.squeaksource.com/SRP. He fixed it, and I asked him a couple of
> questions to make it work. Since this week (a couple of days ago), SRP tests
> are green in Pharo. So...these guys took a look at SRP, as well as us.

Well, to be fair, I didn't port SRP, I just used the existing Squeak
port. (Might have made a few tweaks to get it working on recent Squeak
releases.) I used SRP for early versions of Monticello2, but then
switched over to a custom serializer. There were a few reasons for
this:

- I wanted to optimize for encoding efficiency - ie. shorter byte
sequences for a given object graph
- I needed to ensure that a given object graph would always produce
exactly the same byte sequence
- I wanted to be able to turn a graph of objects into a sequence of
objects as a separate operation from encoding them as bytes
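
The last two requirements in the list above can be sketched in Python as two independent steps. This is an illustration under stated assumptions, not the Serialization&Materialization code: the graph is limited to dicts with string keys, lists, and scalars, JSON stands in for the byte encoding, and `linearize` and `encode` are hypothetical names.

```python
import json

def linearize(root):
    """Step 1: turn a graph of dicts/lists/scalars into a flat list of records."""
    records, index_by_id = [], {}

    def visit(obj):
        if not isinstance(obj, (dict, list)):
            return {"value": obj}
        if id(obj) in index_by_id:
            return {"ref": index_by_id[id(obj)]}  # shared or circular reference
        index = len(records)
        index_by_id[id(obj)] = index
        records.append(None)  # reserve the slot before descending
        if isinstance(obj, dict):
            # Visiting keys in sorted order makes the output independent
            # of insertion order.
            body = {"dict": [[key, visit(obj[key])] for key in sorted(obj)]}
        else:
            body = {"list": [visit(item) for item in obj]}
        records[index] = body
        return {"ref": index}

    visit(root)
    return records

def encode(records):
    """Step 2: encode the records as bytes, independently of step 1."""
    text = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")
```

Keeping the two steps apart means the record sequence can be inspected or compared on its own, and the fixed traversal order is what lets equal graphs produce identical bytes.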

I might have been able to bend SRP to my will, but I figured it would
be easier to start from scratch. Eventually, I started using this
serialization code in other projects, and separated it out into a
separate package, called Serialization&Materialization. It's very
focussed on my somewhat odd needs, though, and probably not useful for
the general case.

Colin