Magma object serializer

Magma object serializer

tinchodias
Hi,

I want to run some benchmarks with the Magma object serializer... Am I using it in the right way?

Thanks!!
Martin


    | serializer graphBuffer anObject classDefinitionsByteArray graphBufferByteArray loadedObject |

    anObject := Array with: 1 with: 'string'.

    serializer := MaObjectSerializer new.
    graphBuffer := serializer serializeGraph: anObject.
   
    classDefinitionsByteArray := serializer classDefinitionsByteArray.
    graphBufferByteArray := graphBuffer byteArray.

    "put these two bytearrays into a stream, and reload them..."

    loadedObject := MaObjectSerializer new
        classDefinitionsByteArray: classDefinitionsByteArray;
        materializeGraph: graphBufferByteArray

_______________________________________________
Magma mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/magma

Re: Magma object serializer

Chris Muller-3
Hi Martin and Mariano, thanks for asking.  The answer depends on what
you want to measure.  Benchmarking software as complex as a serializer
is tricky because, as you know, there are multiple functions to
measure which are used independently in various real-world use-cases.

To gain a meaningful understanding of the performance, you need to
bench the most-atomic level of operations a user of a serializer would
use individually.

  - Instantiation / initialization of a MaObjectSerializer.
  - Serialization of object graphs of various sizes.
  - Materialization of said graphs.
  - Also, if doing comparisons to other serializers (which, I'm
guessing you are) it is crucial to ensure each serializer is
configured to serialize the same number of objects (i.e., the same
depth, etc.) as the other serializers being compared to.
  - It's also important to discover whether any special
configuration-options / preferences which affect the performance can
be used.

So, given the one example object you've provided, here are some
starter scripts which could be used for benching useful serialization
operations with MaObjectSerializer.

"Initialization"
[ MaObjectSerializer new ] bench.

"Serialization"
| obj ser |
obj := Array with: 1 with: 'string'.
ser := MaObjectSerializer new.
[ ser serializeGraph: obj ] bench

"Materialization"
| obj ser ba |
obj := Array with: 1 with: 'string'.
ser := MaObjectSerializer new.
ba := (ser serializeGraph: obj) byteArray.
[ ser materializeGraph: ba ] bench

I have not researched any speed optimizations of MaObjectSerializer in
many years, so I'm sure it will not be as fast as Fuel if it was
designed for speed from the ground up.  I need to profile and revisit
performance aspects of MaObjectSerializer.

As I said, benching is tricky, and publishing comparisons even
trickier, and so I do appreciate your asking for my input and hope
whatever you publish to the world will be based on fair, responsible
measuring.  Toward that end, I support you and thank you for your work
on Fuel.

Regards,
  Chris



On Thu, Jun 2, 2011 at 12:14 AM, Martin Dias <[hidden email]> wrote:

> Hi,
>
> I want to run some benchmarks with the Magma object serializer... Am I using
> it in the right way?
>
> Thanks!!
> Martin
>
>
>     | serializer graphBuffer anObject classDefinitionsByteArray
> graphBufferByteArray loadedObject |
>
>     anObject := Array with: 1 with: 'string'.
>
>     serializer := MaObjectSerializer new.
>     graphBuffer := serializer serializeGraph: anObject.
>
>     classDefinitionsByteArray := serializer classDefinitionsByteArray.
>     graphBufferByteArray := graphBuffer byteArray.
>
>     "put these two bytearrays into a stream, and reload them..."
>
>     loadedObject := MaObjectSerializer new
>         classDefinitionsByteArray: classDefinitionsByteArray;
>         materializeGraph: graphBufferByteArray
>
> _______________________________________________
> Magma mailing list
> [hidden email]
> http://lists.squeakfoundation.org/mailman/listinfo/magma
>
>

Re: Magma object serializer

Mariano Martinez Peck


On Fri, Jun 3, 2011 at 3:23 AM, Chris Muller <[hidden email]> wrote:
> Hi Martin and Mariano, thanks for asking.  The answer depends on what
> you want to measure.  Benchmarking software as complex as a serializer
> is tricky because, as you know, there are multiple functions to
> measure which are used independently in various real-world use-cases.

Yes, exactly.

> To gain a meaningful understanding of the performance, you need to
> bench the most-atomic level of operations a user of a serializer would
> use individually.
>
>  - Instantiation / initialization of a MaObjectSerializer.
>  - Serialization of object graphs of various sizes.

If by chance you have some code snippets that generate graphs for testing (maybe you have that in Magma), let us know :)
We have a piece of code that generates binary trees...

>  - Materialization of said graphs.
>  - Also, if doing comparisons to other serializers (which, I'm
> guessing you are) it is crucial to ensure each serializer is
> configured to serialize the same number of objects (i.e., the same
> depth, etc.) as the other serializers being compared to.
>  - It's also important to discover whether any special
> configuration-options / preferences which affect the performance can
> be used.

Yes, exactly. This is the hardest part when you do not know the serializer very well. That's why we were asking :)
 

> So, given the one example object you've provided, here are some
> starter scripts which could be used for benching useful serialization
> operations with MaObjectSerializer.
>
> "Initialization"
> [ MaObjectSerializer new ] bench.
>
> "Serialization"
> | obj ser |
> obj := Array with: 1 with: 'string'.
> ser := MaObjectSerializer new.
> [ ser serializeGraph: obj ] bench

With the other serializers, what we do is serialize the graph into a file. How could we do this with the Magma serializer? serializeGraph: answers a MaSerializedGraphBuffer, so I guess I can ask it for its byteArray and do a nextPutAll: or something like that on our stream?
In Fuel we also have what we call "in memory serialization", which basically returns the byteArray, and then we can materialize from that. So this case would be similar to this usage of Magma, wouldn't it?

> "Materialization"
> | obj ser ba |
> obj := Array with: 1 with: 'string'.
> ser := MaObjectSerializer new.
> ba := (ser serializeGraph: obj) byteArray.
> [ ser materializeGraph: ba ] bench

The same question as for serialization.

 

> I have not researched any speed optimizations of MaObjectSerializer in
> many years, so I'm sure it will not be as fast as Fuel if it was
> designed for speed from the ground up.  I need to profile and revisit
> performance aspects of MaObjectSerializer.

No problem. Speed is only one measure among others, and it is only needed in certain scenarios. Speed also usually comes with trade-offs. In addition, you cannot really compare the serializer of a database to a general-purpose serializer, even if the Magma serializer can be used outside Magma. There are a lot of things the Magma serializer has to do that others may not need, or things it cannot do because of Magma. So, each serializer has its own goals.

> As I said, benching is tricky, and publishing comparisons even
> trickier

Yes, exactly. An additional problem is that the features a serializer supports impact the results of a benchmark: for example, support for class reshapes, initialization after materialization, transient instVars, etc. Supporting all those things usually costs time. So maybe you support all that and your results are slower in comparison with someone who does not support it. So yes, measuring only speed is not good; taking the rest of the properties into account is better.

> , and so I do appreciate your asking for my input and hope
> whatever you publish to the world will be based on fair, responsible
> measuring.  Toward that end, I support you and thank you for your work
> on Fuel.

Thanks Chris.

> Regards,
>  Chris






--
Mariano
http://marianopeck.wordpress.com



Re: Magma object serializer

Chris Muller-4
>>  - Instantiation / initialization of a MaObjectSerializer.
>>  - Serialization of object graphs of various sizes.
>
> If you have by chance some code snippets to generate graphs for testing
> (maybe you have that in Magma) let us know :)
> We have a piece of code that generates binary trees...

Yes, see MaFixtureFactory.

  MaFixtureFactory current samples

or

  MaFixtureFactory current knot

There was a discussion recently about comparing object graphs -- you
may be interested in MaObjectSerializerTester, and how it verifies
serialized-->rematerialized object graphs of any shape against the
original fixture graph.  Very useful for ensuring your serializer is
_working_.  :)  See MaObjectSerializerTestCase>>#testSamples which
sends #maEquivalentForSerializationTest: to determine that.

>> "Serialization"
>> | obj ser |
>> obj := Array with: 1 with: 'string'.
>> ser := MaObjectSerializer new.
>> [ ser serializeGraph: obj ] bench
>>
>
> With the rest of the serializers that we do is to serialize the graph into a
> file. How could we do this with Magma Serializer?
> because serializeGraph: answers a MaSerializedGraphBuffer. So, I guess I can
> ask the byteArray to it and do a nextPutAll: or something like that to our
> stream?

Yes.  I, of course, always appreciated the elegance of the notion that
MaObjectSerializer could operate directly on Streams, but the problem
is that I also want a secure client-server protocol which wraps the
serialized requests and responses.  So to, for example, calculate a
MAC, the full byteArray of the request is required in advance.  It's
ok though, serialized / materialized objects have to fit into memory
anyway, so a streaming API doesn't really offer any practical
advantage - just elegance.

> We also have in Fuel what we call "in memory serialization" that basically
> returns the byteArray and then we can materialize from that. So this case
> would be similar to this usage of Magma, wouldn't it ?

Yeah - similar to use of "Ma object serializer".  (IOW, you don't have
to load Magma to use MaObjectSerializer), you can just load MaBase.

>> As I said, benching is tricky, and publishing comparisons even
>> trickier
>
> yes, exactly. The problem is in addition that there are supported features
> of a serializer that impact the results of a benchmark. For example,
> supported class reshapes, initialization after materialization, support
> transient instVars...etc... to support all those things you usually spend
> more time. So maybe you support all that and in the results you are slower
> in comparison with someone that does not support that. So yes, measuring
> speed only is not good. But taking into account the rest of the properties
> is better.

Yip!  Glad you said that.

 - Chris

Re: Magma object serializer

NorbertHartl

Am 03.06.2011 um 18:30 schrieb Chris Muller:

> There was a discussion recently about comparing object graphs
Do you have a link to the thread? I somehow missed that and cannot find it.

Norbert

Re: Magma object serializer

Chris Muller-3
I shouldn't have said, "a discussion about".  It was only mentioned.
I was referring to "ESUG SummerTalk - Fuel, binary object serializer"
in the Pharo list.

On Fri, Jun 3, 2011 at 6:34 PM, Norbert Hartl <[hidden email]> wrote:

>
> Am 03.06.2011 um 18:30 schrieb Chris Muller:
>
>> There was a discussion recently about comparing object graphs
> Do you have a link to the thread? I somehow missed that and cannot find it.
>
> Norbert
> _______________________________________________
> Magma mailing list
> [hidden email]
> http://lists.squeakfoundation.org/mailman/listinfo/magma
>

Re: Magma object serializer

tinchodias
In reply to this post by Chris Muller-4
Hi Chris,

Sorry for the delay, and thank you very much for your answer and for the tips about benchmarking. I also agree that the two serializers have different purposes, and therefore different features and limitations, so looking only at which one is fastest is not the best comparison.

Another question: have you benchmarked memory use while serializing or materializing? I never have. I know there is something called SpaceTally, but I am not familiar with it.

Cheers,
Martin
 

Re: Magma object serializer

Mariano Martinez Peck
In reply to this post by Chris Muller-4


On Fri, Jun 3, 2011 at 6:30 PM, Chris Muller <[hidden email]> wrote:
>>>  - Instantiation / initialization of a MaObjectSerializer.
>>>  - Serialization of object graphs of various sizes.
>>
>> If you have by chance some code snippets to generate graphs for testing
>> (maybe you have that in Magma) let us know :)
>> We have a piece of code that generates binary trees...
>
> Yes, see MaFixtureFactory.
>
>  MaFixtureFactory current samples
>
> or
>
>  MaFixtureFactory current knot
>
> There was a discussion recently about comparing object graphs -- you
> may be interested in MaObjectSerializerTester, and how it verifies
> serialized-->rematerialized object graphs of any shape against the
> original fixture graph.  Very useful for ensuring your serializer is
> _working_.  :)  See MaObjectSerializerTestCase>>#testSamples which
> sends #maEquivalentForSerializationTest: to determine that.


Thanks Chris. We have looked at it (and still are).

 
>>> "Serialization"
>>> | obj ser |
>>> obj := Array with: 1 with: 'string'.
>>> ser := MaObjectSerializer new.
>>> [ ser serializeGraph: obj ] bench
>>>
>>
>> With the rest of the serializers that we do is to serialize the graph into a
>> file. How could we do this with Magma Serializer?
>> because serializeGraph: answers a MaSerializedGraphBuffer. So, I guess I can
>> ask the byteArray to it and do a nextPutAll: or something like that to our
>> stream?
>
> Yes.

OK. So if aStream is a file stream, for example, are the following methods correct?

>> serialize: anObject on: aStream

    | serializer graphBuffer classDefinitionsByteArray graphBufferByteArray |
    aStream binary.
    serializer := MaObjectSerializer new.
    graphBuffer := serializer serializeGraph: anObject.

    classDefinitionsByteArray := serializer classDefinitionsByteArray.
    graphBufferByteArray := graphBuffer byteArray.

    "Write the class definitions first, then the graph, each prefixed with its size."
    self nextByteArrayPut: classDefinitionsByteArray on: aStream.
    self nextByteArrayPut: graphBufferByteArray on: aStream.


and

>> materializeFrom: aStream

    | size classDefinitionsByteArray graphBufferByteArray |
    aStream binary.
    "Read the two byte arrays back in the order they were written."
    classDefinitionsByteArray := self nextByteArrayFrom: aStream.
    graphBufferByteArray := self nextByteArrayFrom: aStream.

    ^ MaObjectSerializer new
        classDefinitionsByteArray: classDefinitionsByteArray;
        materializeGraph: graphBufferByteArray


>> nextByteArrayPut: aByteArray on: aWriteStream

    "Store a 4-byte length prefix followed by the bytes themselves."
    aWriteStream
        nextNumber: 4 put: aByteArray size;
        nextPutAll: aByteArray


>> nextByteArrayFrom: aReadStream

    "Read the 4-byte length prefix, then that many bytes."
    ^ aReadStream next: (aReadStream nextNumber: 4)


 
> I, of course, always appreciated the elegance of the notion that
> MaObjectSerializer could operate directly on Streams,

What do you mean by "operate directly on Streams"?

> but the problem
> is that I also want a secure client-server protocol which wraps the
> serialized requests and responses.  So to, for example, calculate a
> MAC, the full byteArray of the request is required in advance.  It's
> ok though, serialized / materialized objects have to fit into memory
> anyway, so a streaming API doesn't really offer any practical
> advantage - just elegance.

OK, I understand.

>> We also have in Fuel what we call "in memory serialization" that basically
>> returns the byteArray and then we can materialize from that. So this case
>> would be similar to this usage of Magma, wouldn't it ?
>
> Yeah - similar to use of "Ma object serializer".  (IOW, you don't have
> to load Magma to use MaObjectSerializer), you can just load MaBase.

Good!! We didn't know. Martin has fixed that now :)
 


--
Mariano
http://marianopeck.wordpress.com



Re: Magma object serializer

Chris Muller-3
> Ok. So if aStream is a fileStream for example, then the following two
> methods are correct:
>
>>> serialize: anObject on: aStream
> ....
> and
>
>>> materializeFrom: aStream
>
>     | size classDefinitionsByteArray graphBufferByteArray |
> ...
>
> is correct?

Hmm, well, that might work, but you should just use the helper methods
that are already provided for this, and which essentially do exactly
the same thing.

To serialize an object to a file, you may use
MaObjectSerializer>>#fileOut:toFileNamed:in:, which calls
MaObjectSerializer>>#object:toStream: (operates on any binary
WriteStream).

For materialization, use MaObjectSerializer class>>#fileIn:, which
calls MaObjectSerializer class>>#objectFromStream: (operates on any
binary ReadStream).  BTW, I just noticed these two methods are
incorrectly categorized under 'debugging', they should be under their
own category called 'file' or something..

These are just convenience methods for saving / loading a user's work
to a single file.  If you need to load multiple files and performance
is a concern, you should try to instantiate only one serializer and
use it for all of them.

 - Chris
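
A minimal sketch of a round trip through the helper methods named above. It assumes that #object:toStream: takes the object and a binary WriteStream, and that the class-side #objectFromStream: answers the materialized object; only the selector names come from this thread, so check the method comments in MaBase before relying on it. The in-memory streams stand in for file streams and are purely illustrative.

```smalltalk
| original out bytes copy |
original := Array with: 1 with: 'string'.

"Serialize into an in-memory binary stream (a WriteStream over a ByteArray)."
out := WriteStream on: ByteArray new.
MaObjectSerializer new object: original toStream: out.
bytes := out contents.

"Materialize from a binary ReadStream over the same bytes.
 Assumption: the class-side helper answers the root object."
copy := MaObjectSerializer objectFromStream: (ReadStream on: bytes).

"If the assumptions hold, copy should be equal to, but not identical to, original."
copy = original
```

For a real file, the same two calls would be given binary file streams instead.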

Re: Magma object serializer

Mariano Martinez Peck


On Thu, Jun 9, 2011 at 10:43 PM, Chris Muller <[hidden email]> wrote:
>> Ok. So if aStream is a fileStream for example, then the following two
>> methods are correct:
>>
>>>> serialize: anObject on: aStream
>> ....
>> and
>>
>>>> materializeFrom: aStream
>>
>>     | size classDefinitionsByteArray graphBufferByteArray |
>> ...
>>
>> is correct?
>
> Hmm, well, that might work, but you should just use the helper methods
> that are already provided for this, and which essentially do exactly
> the same thing.
>
> To serialize an object to a file, you may use
> MaObjectSerializer>>#fileOut:toFileNamed:in:, which calls
> MaObjectSerializer>>#object:toStream: (operates on any binary
> WriteStream).
>
> For materialization, use MaObjectSerializer class>>#fileIn:, which
> calls MaObjectSerializer class>>#objectFromStream: (operates on any
> binary ReadStream).  BTW, I just noticed these two methods are
> incorrectly categorized under 'debugging', they should be under their
> own category called 'file' or something..

Thanks Chris. In fact, those methods were the kind of thing I was looking for :)

> These are just convenience methods for saving / loading users work to
> a single file.  If you would need to load multiple files where
> performance is concerned, you would want to try to instantiate only
> one serializer and use it for all of them.

I am not sure I understood. In our benchmarks, we have a list of samples, and each sample is in turn an array of objects that we serialize/materialize. For each sample we instantiate a serializer and a materializer.

Do I understand correctly now that we should reuse the same serializer/materializer instance for all samples? If so, why not use a Singleton? I mean... it is not clear to me when to instantiate a serializer.
 
Thanks Chris

--
Mariano
http://marianopeck.wordpress.com



Re: Magma object serializer

Chris Muller-3
> I am not sure I understood. In our benchmarks, we have a list of samples,
> and each sample is in turn an array of objects that we
> serialize/materialize. For each sample we instantiate a serializer
> and a materializer.
>
> Do I understand correctly now that we should reuse the same
> serializer/materializer instance for all samples?  If so, why not use a
> Singleton? I mean... it is not clear to me when to instantiate a serializer.

Users of MaObjectSerialization only instantiate one serializer to
handle a related "group" of objects - usually related by being of the
same set of classes.  In this case, if the samples of your benchmark
are small (e.g., < 100 objects), you should reuse the same serializer
for each sample being serialized/materialized.

A generic pattern for improving real-world performance is to off-load
work to an initialization step.  For example, many applications
pre-cache certain objects from a database at system startup (a.k.a.,
the "initialization step").  Since startup of the system is only done
once, it is ok if it takes, say, an additional 10 or 30 seconds to
pre-cache if it means that users will have sub-second response times
after the system comes up rather than response times of 5 seconds..

MaObjectSerialization uses this pattern - it is expensive to
instantiate a MaObjectSerializer (about 500 milliseconds) but, in
exchange, the performance of the serializer is improved.

So if the benchmark is going to include initialization of a new
MaObjectSerializer for each of _many tiny_ "samples", then that is not
a good measurement of how it would be used in actual practice.  The
repeated initialization time will account for 99% of the time
consumed, and the "benchmark" would favor the serializers which have
fast initialization times but, in fact, may be slower for
serialization and/or materialization.

This is why I stressed it is important to measure each operation -
initialization, serialization, and materialization - individually, so
that interpretation of the results can be made with respect to how it
would be used.

It's up to you, of course.  But to bring the benchmark into a
real-world usage pattern for MaObjectSerializer, I hope you will
consider:  1) measuring and reporting initialization, serialization
and materialization separately, 2) reusing the same serializer for all
of the tiny samples, or 3) using very large samples for the benchmark,
so that the initialization cost is not such a large factor.

Regards,
  Chris
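
A minimal workspace sketch of how the three measurements suggested above might be kept separate while reusing one serializer for many tiny samples. The sample data, the sample count and the variable names are hypothetical; only `MaObjectSerializer new`, `serializeGraph:`, `byteArray` and `materializeGraph:` come from this thread. The `copy` is a precaution based on the unverified assumption that the serializer may reuse its internal buffer between calls.

```smalltalk
| samples serializer initMs buffers serializeMs materializeMs |

"Hypothetical samples: many tiny graphs shaped like the one used earlier in the thread."
samples := (1 to: 1000) collect: [ :i | Array with: i with: 'string' ].

"1) Initialization, measured once and on its own."
initMs := Time millisecondsToRun: [ serializer := MaObjectSerializer new ].

"2) Serialization of every sample with the same serializer.
    Each byteArray is copied in case the buffer is reused (an assumption)."
serializeMs := Time millisecondsToRun: [
    buffers := samples collect: [ :each |
        (serializer serializeGraph: each) byteArray copy ] ].

"3) Materialization of every sample, again reusing the serializer."
materializeMs := Time millisecondsToRun: [
    buffers do: [ :each | serializer materializeGraph: each ] ].

"Report the three timings separately."
Transcript
    show: 'init: ', initMs printString, ' ms'; cr;
    show: 'serialize: ', serializeMs printString, ' ms'; cr;
    show: 'materialize: ', materializeMs printString, ' ms'; cr
```

Reporting the three numbers side by side lets a reader see how much of the total a per-sample `MaObjectSerializer new` would add.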