Fuel - a fast object deployment tool

Re: Fuel - a fast object deployment tool

Eliot Miranda-2
Hi Martin & Mariano,

    regarding filtering.  Yesterday my colleague Yaron and I finished our port of Fuel to Newspeak and are successfully using it to save and restore our data sets; thank you, it's a cool framework.  We had to implement two extensions.  The first is the ability to save and restore Newspeak classes, which is complex because these are instantiated classes inside instantiated Newspeak modules, not static Smalltalk classes in the Smalltalk dictionary.  The second is the ability to map specific objects to nil, to prune objects on the way out.  I want to discuss this latter extension.

In our data set we have a set of references to objects that are logically not persistent and hence not to be saved.  I'm sure this will be a common case.  The requirement is for the pickling system to prune certain objects, typically by arranging that when an object graph is pickled, references to the pruned objects are replaced by references to nil.  One way of doing this is as described below, by specifying per-class lists of instance variables whose referents should not be saved.  But this can be clumsy.  There may be references to the objects one wants to prune from more than one class, in which case one may have to provide multiple lists of the relevant inst vars; and there may be references to them from collections (e.g. sets and dictionaries), in which case the instance-variable-list approach just doesn't work.

Here are two more general schemes.  First, and most directly, Fuel could provide two filters, implemented in the default mapper or the core analyser.  One is a set of classes whose instances are not to be saved.  Any reference to an instance of a class in the toBePrunedClasses set is saved as nil.  The other is a set of instances that are not to be saved; any reference to an instance in the toBePruned set is likewise saved as nil.  Why have both?  Filtering by class can be convenient and efficient (in our case we had many instances of a specific class, all of which should be filtered, and finding them could be time-consuming), but it can also be too inflexible, because there may be specific instances to exclude.  Think, for example, of part of the object graph that functions as a cache: pruning the specific objects in the cache is the right thing to do, whereas pruning all instances of the classes whose instances appear in the cache may prune too much.

As an example here's how we implemented pruning.  Our system is called Glue, and we start with a mapper for Glue objects, FLGlueMapper:

FLMapper subclass: #FLGlueMapper
    instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Core-Mappers'

It accepts instantiated Newspeak classes and filters instances of classes in the prunedObjectClasses set, and as a side effect collects certain classes that we need in a manifest:

FLGlueMapper>>accepts: anObject
    "Tells if the received object is handled by this mapper.  We want to hand off
     instantiated Newspeak classes to the newspeakClassesCluster, and we want
     to record other model classes.  We want to filter out instances of any class
     in prunedObjectClasses."
    ^anObject isBehavior
        ifTrue:
            [(self isInstantiatedNewspeakClass: anObject)
                ifTrue: [true]
                ifFalse:
                    [(anObject inheritsFrom: GlueDataObject) ifTrue:
                        [modelClasses add: anObject].
                     false]]
        ifFalse:
            [prunedObjectClasses includes: anObject class]

It prunes by mapping instances of the prunedObjectClasses to a special cluster.  It can do this in visitObject:, since the Newspeak classes it accepts will be visited in its visitClassOrTrait: method (i.e. it's implicit that all arguments to visitObject: are instances of classes in the prunedObjectClasses set).

FLGlueMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

FLPrunedObjectsCluster is a specialization of the nil,true,false cluster that maps its objects to nil:

FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Core-Clusters'

FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
    "Write whatever was pruned exactly as nil would be written."
    super serialize: nil on: aWriteStream


So this would generalize to the analyser having, e.g., an FLPruningMapper as its first mapper, holding prunedObjects and prunedObjectClasses sets and going something like this:

FLPruningMapper>>accepts: anObject
    ^(prunedObjects includes: anObject)
        or: [prunedObjectClasses includes: anObject class]

FLPruningMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

and then one would provide accessors in FLSerializer and/or FLAnalyser to add objects and classes to the prunedObjects and prunedObjectClasses sets.

For efficiency, one could arrange for the FLPruningMapper not to be added to the sequence of mappers unless and until objects or classes were added to the prunedObjects and prunedObjectClasses sets.
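The two filters and the lazy installation described above can be sketched without depending on Fuel at all. In this minimal sketch, `PruningFilter` and its selectors (`addPrunedObject:`, `addPrunedClass:`, `shouldPrune:`, `installIn:`) are hypothetical names, not Fuel API, and a plain OrderedCollection stands in for the analyser's sequence of mappers. One design choice worth noting: the filter uses IdentitySets, so an object is pruned only if it is the very object that was registered; a plain Set compares with `=` and could prune an object that merely equals a registered one.

```smalltalk
"Hypothetical helper, not part of Fuel: holds both pruning filters."
Object subclass: #PruningFilter
    instanceVariableNames: 'prunedObjects prunedObjectClasses'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Sketches'

PruningFilter>>initialize
    super initialize.
    "Identity-based sets: only the registered object itself is pruned."
    prunedObjects := IdentitySet new.
    prunedObjectClasses := IdentitySet new

PruningFilter>>addPrunedObject: anObject
    prunedObjects add: anObject

PruningFilter>>addPrunedClass: aClass
    prunedObjectClasses add: aClass

PruningFilter>>isEmpty
    ^prunedObjects isEmpty and: [prunedObjectClasses isEmpty]

PruningFilter>>shouldPrune: anObject
    "The same test as FLPruningMapper>>accepts:."
    ^(prunedObjects includes: anObject)
        or: [prunedObjectClasses includes: anObject class]

PruningFilter>>installIn: aMapperSequence
    "Put the filter in front of the other mappers, but only if it can prune
     something, so an unfiltered serialization pays no per-object cost.
     Answers the (possibly extended) sequence."
    self isEmpty ifFalse: [aMapperSequence addFirst: self].
    ^aMapperSequence

"Workspace usage; #defaultMapper is a placeholder for a real mapper."
| filter cache |
filter := PruningFilter new.
(filter installIn: (OrderedCollection with: #defaultMapper)) size.  "1: empty filter not installed"
cache := OrderedCollection new.
filter addPrunedObject: cache; addPrunedClass: Semaphore.
(filter installIn: (OrderedCollection with: #defaultMapper)) size.  "2: filter installed first"
filter shouldPrune: Semaphore new.          "true: pruned by class"
filter shouldPrune: cache.                  "true: pruned by identity"
filter shouldPrune: OrderedCollection new.  "false: equal to cache, but not identical"
```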

I think both Yaron and I feel the Fuel framework is comprehensible and flexible.  We enjoyed using it, and while we took two passes at coming up with a pruning scheme we liked (our first was based on not serializing specific inst vars and was much more complex than our second, based on pruning instances of specific classes), we got there quickly and with very little frustration along the way.  Thank you very much.

Finally, a couple of things.  First, it may be more flexible to implement fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to override certain parts of the mapping framework an implementation can access the analyser to find existing clusters, e.g.

MyClass>>fuelClusterIn: anFLAnalyser
    ^self shouldBeInASpecialCluster
        ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
        ifFalse: [super fuelClusterIn: anFLAnalyser]

This makes it easier to find a specific unique cluster to handle a group of objects specially.

Lastly, the class-side cluster ids are a bit of a pain.  It would be nice to know a) are these byte values or general integer values, i.e. can there be more than 256 types of cluster? and b) is there any meaning to the ids?  For example, are clusters ordered by id, or is this just an integer tag?  Also, some class-side code to assign an unused id would be nice.

You might think of virtualizing the id scheme.  For example, if FLCluster maintained a weak array of all its subclasses, then the id of a cluster could be its index in the array, and the array could be cleaned up occasionally.  Then each Fuel serialization could start with the list of cluster class names and ids, so that id values are only meaningful within a particular serialization.
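The virtualized id scheme could look roughly like the following. This is a minimal sketch, not Fuel's actual format: `ClusterIdTable` and its selectors are hypothetical, ordinary classes stand in for cluster classes in the usage lines, and it assumes each cluster class can be found by its global name when reading. Ids are handed out in order of first use within one serialization, and the list of names written at the start of the stream is enough for the reader to rebuild the same mapping, so an id only has to distinguish the clusters that actually occur in that one stream.

```smalltalk
"Hypothetical sketch of per-serialization cluster ids; not Fuel's actual scheme."
Object subclass: #ClusterIdTable
    instanceVariableNames: 'idsByClass classesInOrder'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Sketches'

ClusterIdTable class>>fromHeaderNames: aCollectionOfSymbols
    "Reading side: rebuild the same table from the names stored in the header.
     Assumes each cluster class can be found by its global name."
    | table |
    table := self new.
    aCollectionOfSymbols do: [:each | table idFor: (Smalltalk at: each)].
    ^table

ClusterIdTable>>initialize
    super initialize.
    idsByClass := IdentityDictionary new.
    classesInOrder := OrderedCollection new

ClusterIdTable>>idFor: aClusterClass
    "Answer the id of aClusterClass in this serialization, assigning the
     next index the first time the class is seen."
    ^idsByClass
        at: aClusterClass
        ifAbsent: [
            classesInOrder add: aClusterClass.
            idsByClass at: aClusterClass put: classesInOrder size]

ClusterIdTable>>classForId: anInteger
    ^classesInOrder at: anInteger

ClusterIdTable>>headerNames
    "Names to write at the start of the stream; position i names the class with id i."
    ^classesInOrder collect: [:each | each name asSymbol]

"Workspace usage, with ordinary classes standing in for cluster classes."
| writer reader |
writer := ClusterIdTable new.
writer idFor: OrderedCollection.   "1"
writer idFor: Set.                 "2"
writer idFor: OrderedCollection.   "1 again"
reader := ClusterIdTable fromHeaderNames: writer headerNames.
reader classForId: 2.              "Set"
```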

Again, thanks for a great framework.

best,
Eliot

On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck <[hidden email]> wrote:


On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda <[hidden email]> wrote:
Hi Martin and Mariano,

    a couple of questions.  What's the right way to exclude certain objects from the serialization?  Is there a way of excluding certain inst vars from certain objects?



Eliot and the rest... Martin implemented this feature in Fuel-MartinDias.258. For the moment, we decided to put #fuelIgnoredInstanceVariableNames on the class side.

Behavior >> fuelIgnoredInstanceVariableNames
    "Indicates which variables have to be ignored during serialization."

    ^#()


MyClass class >> fuelIgnoredInstanceVariableNames
  ^ #('instVar1')


There is no impact on speed, so this is good. Now... we were wondering whether it is common for two different instances of the same class to need different instVars ignored. Is this common? Do you usually need this?  We checked SIXX, and there it is on the instance side. Java uses the 'transient' field modifier, so there it is on the class side...

thanks




Re: Fuel - a fast object deployment tool

Stéphane Ducasse

On Jun 15, 2011, at 8:29 PM, Eliot Miranda wrote:

> Hi Martin & Mariano,
>
>     regarding filtering.  Yesterday my colleague Yaron and I finished our port of Fuel to Newspeak and are successfully using it to save and restore our data sets; thank you, it's a cool framework.

I'm happy to see that we picked good projects and good people: my consulting money is well spent. I'm happy.

> We had to implement two extensions.  The first is the ability to save and restore Newspeak classes, which is complex because these are instantiated classes inside instantiated Newspeak modules, not static Smalltalk classes in the Smalltalk dictionary.  The second is the ability to map specific objects to nil, to prune objects on the way out.  I want to discuss this latter extension.
>
> In our data set we have a set of references to objects that are logically not persistent and hence not to be saved.  I'm sure this will be a common case.  The requirement is for the pickling system to prune certain objects, typically by arranging that when an object graph is pickled, references to the pruned objects are replaced by references to nil.  One way of doing this is as described below, by specifying per-class lists of instance variables whose referents should not be saved.  But this can be clumsy.  There may be references to the objects one wants to prune from more than one class, in which case one may have to provide multiple lists of the relevant inst vars;

Yes, I imagine that you can have different cases for the same class.

> and there may be references to them from collections (e.g. sets and dictionaries), in which case the instance-variable-list approach just doesn't work.
>
> Here are two more general schemes.  First, and most directly, Fuel could provide two filters, implemented in the default mapper or the core analyser.  One is a set of classes whose instances are not to be saved.

Yes, like TranscriptStream.

> Any reference to an instance of a class in the toBePrunedClasses set is saved as nil.  The other is a set of instances that are not to be saved; any reference to an instance in the toBePruned set is likewise saved as nil.  Why have both?  Filtering by class can be convenient and efficient (in our case we had many instances of a specific class, all of which should be filtered, and finding them could be time-consuming), but it can also be too inflexible, because there may be specific instances to exclude.  Think, for example, of part of the object graph that functions as a cache: pruning the specific objects in the cache is the right thing to do, whereas pruning all instances of the classes whose instances appear in the cache may prune too much.

Yes, I have the impression that we need both too.

> I think both Yaron and I feel the Fuel framework is comprehensible and flexible.  We enjoyed using it, and while we took two passes at coming up with a pruning scheme we liked (our first was based on not serializing specific inst vars and was much more complex than our second, based on pruning instances of specific classes), we got there quickly and with very little frustration along the way.  Thank you very much.

No, thank you for the feedback. We are writing two papers, and it will help Martin's master's thesis, and probably his PhD funding, if we can say that people really use his work.

> You might think of virtualizing the id scheme.  For example, if FLCluster maintained a weak array of all its subclasses, then the id of a cluster could be its index in the array, and the array could be cleaned up occasionally.  Then each Fuel serialization could start with the list of cluster class names and ids, so that id values are only meaningful within a particular serialization.

We will have to think about that.
What is important is that Fuel should support changes of class shape and class evolution.




Re: Fuel - a fast object deployment tool

Mariano Martinez Peck
In reply to this post by Eliot Miranda-2


On Wed, Jun 15, 2011 at 8:29 PM, Eliot Miranda <[hidden email]> wrote:
Hi Martin & Mariano,

    regarding filtering.  Yesterday my colleague Yaron and I finished our port of Fuel to Newspeak and are successfully using it to save and restore our data sets; thank you, it's a cool framework.  

Thanks Eliot. These are nice words. We are used to receiving criticism like "yet another serializer?", so it is good for us to know that you like it and that you could even port it to Newspeak. To some extent, that means Fuel's design is good. And this matters to us because we spent a lot of time on design, good class names, good method names, class comments, tests, benchmarks, etc.
 
We had to implement two extensions.  The first is the ability to save and restore Newspeak classes, which is complex because these are instantiated classes inside instantiated Newspeak modules, not static Smalltalk classes in the Smalltalk dictionary.  The second is the ability to map specific objects to nil, to prune objects on the way out.  I want to discuss this latter extension.

In our data set we have a set of references to objects that are logically not persistent and hence not to be saved.  I'm sure this will be a common case.  The requirement is for the pickling system to prune certain objects, typically by arranging that when an object graph is pickled, references to the pruned objects are replaced by references to nil.  One way of doing this is as described below, by specifying per-class lists of instance variables whose referents should not be saved.  

Exactly. At least this part was implemented this week in Fuel-MartinDias.259. Basically, you can define a class method #fuelIgnoredInstanceVariableNames, and all instances of that class will ignore those instance variables. Example:

MyClass class >> fuelIgnoredInstanceVariableNames
    ^ #(instVar1)


But this can be clumsy.  There may be references to the objects one wants to prune from more than one class, in which case one may have to provide multiple lists of the relevant inst vars; and there may be references to them from collections (e.g. sets and dictionaries), in which case the instance-variable-list approach just doesn't work.

+1  Do you have an example? For instance, maybe you don't want to serialize the classes in #specialObjectsArray?
 

Here are two more general schemes.  First, and most directly, Fuel could provide two filters, implemented in the default mapper or the core analyser.  One is a set of classes whose instances are not to be saved.  Any reference to an instance of a class in the toBePrunedClasses set is saved as nil.  

Yes, I really want that. I discussed it with Dale because this is what they have in GemStone. There, they have DbTransient, and they can do "aClass makeInstancesDbTransient". After that, all instances of aClass will be ignored.
In addition, they have TransientValue, a class whose instances are always ignored. So you can use an instance of TransientValue to wrap the actual value you want to be transient, a kind of value holder.

What do you think about it? It sounds really similar to what you suggest :)
 
The other is a set of instances that are not to be saved; any reference to an instance in the toBePruned set is likewise saved as nil.  Why have both?  Filtering by class can be convenient and efficient (in our case we had many instances of a specific class, all of which should be filtered, and finding them could be time-consuming), but it can also be too inflexible, because there may be specific instances to exclude.  Think, for example, of part of the object graph that functions as a cache: pruning the specific objects in the cache is the right thing to do, whereas pruning all instances of the classes whose instances appear in the cache may prune too much.


+999. GemStone has exactly that, so I am not surprised that you came to the same conclusion :)

So...to sum up, we want:

- classes whose instances are always ignored. Say #toBePrunedClasses  or #makeInstancesDbTransient
- particular references that we want to ignore. Say TransientValue or something like that.
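The TransientValue idea (a wrapper, rather than a list of instance variables, marks a value as transient) can be sketched in a few lines. `TransientValueHolder` and `ReportModel` are hypothetical classes invented for this illustration; they are not GemStone's or Fuel's classes. The assumption is that a class-based filter lists TransientValueHolder, so every holder is written as nil. The consequence for the owning object is that, after materialization, it must cope with finding nil and rebuild the value on demand, as `total` does here.

```smalltalk
"Hypothetical wrapper; a class-based pruning filter would always write its instances as nil."
Object subclass: #TransientValueHolder
    instanceVariableNames: 'value'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Sketches'

TransientValueHolder class>>on: anObject
    ^self new value: anObject; yourself

TransientValueHolder>>value
    ^value

TransientValueHolder>>value: anObject
    value := anObject

"Hypothetical model class with a cache that should not be persisted."
Object subclass: #ReportModel
    instanceVariableNames: 'rows totalHolder'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Sketches'

ReportModel>>initialize
    super initialize.
    rows := OrderedCollection new

ReportModel>>addRow: aNumber
    rows add: aNumber.
    totalHolder := nil    "invalidate the cached total"

ReportModel>>total
    "totalHolder is nil after materialization if the holder was pruned, so recompute lazily."
    totalHolder ifNil: [
        totalHolder := TransientValueHolder on: (rows inject: 0 into: [:sum :each | sum + each])].
    ^totalHolder value

"Workspace usage"
| model |
model := ReportModel new.
model addRow: 3; addRow: 4.
model total.                                 "7, now cached in a holder"
model instVarNamed: 'totalHolder' put: nil.  "simulate what pruning leaves behind"
model total.                                 "7 again, recomputed"
```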
 
FLPrunedObjectsCluster is a specialization of the nil,true,false cluster that maps its objects to nil:

FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Core-Clusters'

FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
    super serialize: nil on: aWriteStream


I understand. But I have a question. Imagine an object whose class is in #prunedObjectClasses: you will serialize it as nil. BUT, what happens to the references from that object? Are those references still analyzed? Because you probably don't want that. You have to take care of #referencesOf: anObject do: aBlock.
In this case you are lucky, because FLNilTrueFalseCluster uses the empty implementation from FLCluster. So... no problem, I think. Martin?

Check what we do in:

FLObjectCluster  >> referencesOf: anObject do: aBlock

    | ignoredInstanceVariableNames |
    ignoredInstanceVariableNames := theClass fuelIgnoredInstanceVariableNames.
   
    theClass instVarNamesAndOffsetsDo: [:name :index |
        (ignoredInstanceVariableNames includes: name)
            ifFalse: [ aBlock value: (anObject instVarAt: index) ]]
 
So, of course, we don't follow those instance variables.



and then one would provide accessors in FLSerializer and/or FLAnalyser to add objects and classes to the prunedObjects and prunedObjectClasses sets.


Yes, in fact, when you talk about #toBePrunedClasses, you mean something like this:

Behavior >> toBePrunedClasses
    FLGlueMapper addPrunedClass: self

Or something like that. Am I right? What I mean is that #toBePrunedClasses puts the class in the 'prunedObjectClasses' set of FLGlueMapper.
  
For efficiency, one could arrange for the FLPruningMapper not to be added to the sequence of mappers unless and until objects or classes were added to the prunedObjects and prunedObjectClasses sets.


That's a good idea, because those mappers are evaluated for EVERY single object. Even if 80% of the time it is FLDefaultMapper that handles the object, we still have to pay the cost of the #accepts: of all the other mappers.
 
I think both Yaron and I feel the Fuel framework is comprehensible and flexible.  

Again, thanks. This is what people don't see when using other serializers.
 
We enjoyed using it, and while we took two passes at coming up with a pruning scheme we liked (our first was based on not serializing specific inst vars and was much more complex than our second, based on pruning instances of specific classes), we got there quickly and with very little frustration along the way.  Thank you very much.

Thanks to you for the wonderful feedback.
 

Finally, a couple of things.  First, it may be more flexible to implement fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to override certain parts of the mapping framework an implementation can access the analyser to find existing clusters, e.g.

MyClass>>fuelClusterIn: anFLAnalyser
    ^self shouldBeInASpecialCluster
        ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
        ifFalse: [super fuelClusterIn: anFLAnalyser]

This makes it easier to find a specific unique cluster to handle a group of objects specially.

I understand. Do you have an example?
 

Lastly, the class-side cluster ids are a bit of a pain.

Yes, we know :(
 
 It would be nice to know a) are these byte values or general integer values, i.e. can there be more than 256 types of cluster? and b) is there any meaning to the ids?  For example, are clusters ordered by id, or is this just an integer tag?  

Good questions. I will let Martin answer ;)
 
Also, some class-side code to assign an unused id would be nice.


+999
 
You might think of virtualizing the id scheme.  For example, if FLCluster maintained a weak array of all its subclasses, then the id of a cluster could be its index in the array, and the array could be cleaned up occasionally.  Then each Fuel serialization could start with the list of cluster class names and ids, so that id values are only meaningful within a particular serialization.


Thanks, good idea.
 
Again, thanks for a great framework.


Thanks for the feedback. Now, there are a couple of things I would like to mention:

1) We are preparing a paper right now, so we are concentrating on that instead of the code. In addition, we have 5 failing tests that we should fix ;)  So... as soon as we finish with that, we will continue with all these things you are talking about.

2) I will open issues in our bug tracker for everything we discuss in this thread.

3) Is there anything we could do to ease your port? I mean... if we continue the development, do you plan to pick up new versions in the future? How are you going to do that?

Thanks a lot,

Mariano

 
--
Mariano
http://marianopeck.wordpress.com


Re: Fuel - a fast object deployment tool

tinchodias
In reply to this post by Eliot Miranda-2
Hi Eliot,
I am very happy to read your mail.

On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda <[hidden email]> wrote:
Hi Martin & Mariano,

    regarding filtering.  Yesterday my colleague Yaron and I finished our port of Fuel to Newspeak and are successfully using it to save and restore our data sets; thank you, it's a cool framework.  We had to implement two extensions.  The first is the ability to save and restore Newspeak classes, which is complex because these are instantiated classes inside instantiated Newspeak modules, not static Smalltalk classes in the Smalltalk dictionary.  The second is the ability to map specific objects to nil, to prune objects on the way out.  I want to discuss this latter extension.

In our data set we have a set of references to objects that are logically not persistent and hence not to be saved.  I'm sure this will be a common case.  The requirement is for the pickling system to prune certain objects, typically by arranging that when an object graph is pickled, references to the pruned objects are replaced by references to nil.  One way of doing this is as described below, by specifying per-class lists of instance variables whose referents should not be saved.  But this can be clumsy.  There may be references to the objects one wants to prune from more than one class, in which case one may have to provide multiple lists of the relevant inst vars; and there may be references to them from collections (e.g. sets and dictionaries), in which case the instance-variable-list approach just doesn't work.

Here are two more general schemes.  First, and most directly, Fuel could provide two filters, implemented in the default mapper or the core analyser.  One is a set of classes whose instances are not to be saved.  Any reference to an instance of a class in the toBePrunedClasses set is saved as nil.  The other is a set of instances that are not to be saved; any reference to an instance in the toBePruned set is likewise saved as nil.  Why have both?  Filtering by class can be convenient and efficient (in our case we had many instances of a specific class, all of which should be filtered, and finding them could be time-consuming), but it can also be too inflexible, because there may be specific instances to exclude.  Think, for example, of part of the object graph that functions as a cache: pruning the specific objects in the cache is the right thing to do, whereas pruning all instances of the classes whose instances appear in the cache may prune too much.

As an example here's how we implemented pruning.  Our system is called Glue, and we start with a mapper for Glue objects, FLGlueMapper:

FLMapper subclass: #FLGlueMapper
instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
classVariableNames: ''
poolDictionaries: ''
category: 'Fuel-Core-Mappers'

It accepts Newspeak objects and filters instances of classes in the prunedObjectClasses set, and as a side effect collects certain classes that we need in a manifest:

FLGlueMapper>>accepts: anObject
    "Tells if the received object is handled by this analyzer.  We want to hand-off
    instantiated Newspeak classes to the newspeakClassesCluster, and we want
    to record other model classes.  We want to filter-out instances of any class
    in prunedObjectClasses."
    ^anObject isBehavior
        ifTrue:
            [(self isInstantiatedNewspeakClass: anObject)
                ifTrue: [true]
                ifFalse:
                    [(anObject inheritsFrom: GlueDataObject) ifTrue:
                        [modelClasses add: anObject].
                    false]]
        ifFalse:
            [prunedObjectClasses includes: anObject class]

It prunes by mapping instances of the prunedObjectClasses to a special cluster.  It can do this in visitObject: since any Newspeak objects it is accepting will be visited in its visitClassOrTrait: method (i.e. it's implicit that all arguments to visitObject: are instances of classes in the prunedObjectClasses set).

FLGlueMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

FLPrunedObjectsCluster is a specialization of the nil/true/false cluster that maps its objects to nil:

FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'Fuel-Core-Clusters'

FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
    super serialize: nil on: aWriteStream
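In the same hedged spirit, here is a Python sketch of what a nil/true/false cluster and its pruning subclass do on the wire. The one-byte tag encoding and the class names are assumptions made for illustration; Fuel's actual stream format is not shown in this thread. The subclass ignores the object it is handed and delegates with `None`, just as the Smalltalk method passes `nil` to `super`:

```python
import io


class NilTrueFalseCluster:
    # Hypothetical encoding: one tag byte per singleton.
    TAGS = {None: 0, True: 1, False: 2}

    def serialize(self, obj, stream):
        stream.write(bytes([self.TAGS[obj]]))


class PrunedObjectsCluster(NilTrueFalseCluster):
    def serialize(self, pruned_object, stream):
        # Whatever the pruned object was, it is written exactly as None (nil).
        super().serialize(None, stream)


buf = io.BytesIO()
PrunedObjectsCluster().serialize(object(), buf)
```

Because the bytes written are identical to those for `None`, the reader in this sketch cannot tell a pruned reference from a genuine nil.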


So this would generalize by the analyser having e.g. an FLPruningMapper as the first mapper, and this having a prunedObjects and a prunedObjectClasses set and going something like this:

FLPruningMapper>>accepts: anObject
    ^(prunedObjects includes: anObject) or: [prunedObjectClasses includes: anObject class]

FLPruningMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

and then one would provide accessors in FLSerializer and/or FLAnalyser to add objects and classes to the prunedObjects and prunedObjectClasses sets.

For efficiency one could arrange that the FLPruningMapper was not added to the sequence of mappers unless and until objects or classes were added to the prunedObjects and prunedObjectClasses sets.
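The generalized mapper and the lazy-installation idea can be sketched as follows. This is Python, and `Analyser`, `PruningMapper`, `DefaultMapper` and the string cluster names are hypothetical stand-ins for Fuel's classes. The analyser asks each mapper in order whether it accepts an object. The pruning mapper is only put at the front of the list once something has been registered for pruning, so unpruned serializations never consult it:

```python
class DefaultMapper:
    """Fallback mapper: accepts everything."""

    def accepts(self, obj):
        return True

    def cluster_for(self, obj):
        return "default"


class PruningMapper:
    def __init__(self):
        self.pruned_objects = []  # keeps pruned objects (and their ids) alive
        self.pruned_ids = set()
        self.pruned_classes = set()

    def accepts(self, obj):
        return id(obj) in self.pruned_ids or type(obj) in self.pruned_classes

    def cluster_for(self, obj):
        return "pruned"  # stands in for a cluster that writes nil


class Analyser:
    def __init__(self):
        self.mappers = [DefaultMapper()]
        self._pruning = None

    def _pruning_mapper(self):
        # Install the pruning mapper lazily, at the front of the chain.
        if self._pruning is None:
            self._pruning = PruningMapper()
            self.mappers.insert(0, self._pruning)
        return self._pruning

    def prune_object(self, obj):
        mapper = self._pruning_mapper()
        mapper.pruned_objects.append(obj)
        mapper.pruned_ids.add(id(obj))

    def prune_class(self, cls):
        self._pruning_mapper().pruned_classes.add(cls)

    def cluster_for(self, obj):
        # The first mapper that accepts the object decides its cluster.
        for mapper in self.mappers:
            if mapper.accepts(obj):
                return mapper.cluster_for(obj)
        return None
```

Putting the pruning mapper first matters. Otherwise a more general mapper earlier in the chain could claim the object before it is checked for pruning.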


Excellent. I love the botanical metaphor of pruning! Of course we can include FLPruningMapper and FLPrunedObjectsCluster in Fuel.

We are also interested in pruning objects without necessarily replacing them by nil, but rather by other user-defined objects, for example proxies. We can extend the pruning stuff to do that.
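One way to read this generalization is that pruning becomes substitution. A per-class function decides what is written in place of an object, and having it return `None` reproduces plain pruning. The Python below is a sketch under that assumption. `SubstitutingPruner` and `Proxy` are hypothetical, and substitutes are not traversed further:

```python
class Proxy:
    """Hypothetical placeholder that records how to find the real object later."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return isinstance(other, Proxy) and other.key == self.key


class SubstitutingPruner:
    def __init__(self):
        self.by_class = {}  # class -> function(obj) -> substitute

    def substitute_class(self, cls, make_substitute):
        self.by_class[cls] = make_substitute

    def flatten(self, obj):
        make = self.by_class.get(type(obj))
        if make is not None:
            return make(obj)  # returning None here is ordinary pruning
        if isinstance(obj, list):
            return [self.flatten(item) for item in obj]
        if isinstance(obj, dict):
            return {key: self.flatten(value) for key, value in obj.items()}
        return obj
```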

 
I think both Yaron and I feel the Fuel framework is comprehensible and flexible.  We enjoyed using it and while we took two passes at coming up with the pruning scheme we liked (our first was based on not serializing specific inst vars and was much more complex than our second, based on pruning instances of specific classes) we got there quickly and with very little frustration along the way.  Thank you very much.

:-) thank you!


Finally, a couple of things.  First, it may be more flexible to implement fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to override certain parts of the mapping framework an implementation can access the analyser to find existing clusters, e.g.

MyClass>>fuelClusterIn: anFLAnalyser
    ^self shouldBeInASpecialCluster
        ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
        ifFalse: [super fuelClusterIn: anFLAnalyser]

This makes it easier to find a specific unique cluster to handle a group of objects specially.
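Here is a Python sketch of the double dispatch being proposed. The names are hypothetical throughout (`fuel_cluster_in`, `cluster_named`). Because the analyser is an argument, an object can ask it for an existing cluster (created on first request) and join it, rather than each class maintaining its own uniquing. Clusters are looked up by name instead of by id, which is an assumption of the sketch:

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.objects = []


class Analyser:
    def __init__(self):
        self._clusters = {}

    def cluster_named(self, name):
        # Get-or-create: everyone asking for the same name shares one cluster.
        if name not in self._clusters:
            self._clusters[name] = Cluster(name)
        return self._clusters[name]

    def add(self, obj):
        cluster = obj.fuel_cluster_in(self)  # double dispatch, analyser passed in
        cluster.objects.append(obj)
        return cluster


class Plain:
    def fuel_cluster_in(self, analyser):
        return analyser.cluster_named("fixed-object")


class MyClass(Plain):
    def __init__(self, special):
        self.special = special

    def fuel_cluster_in(self, analyser):
        if self.special:  # stands in for shouldBeInASpecialCluster
            return analyser.cluster_named("my-special")
        return super().fuel_cluster_in(analyser)
```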

I can't imagine a concrete example, but I see that it is more flexible... the cluster obtained via double dispatch can be anything polymorphic with MySpecialCluster... is that the point?



Lastly, the class-side cluster ids are a bit of a pain.  It would be nice to know a) are these byte values or general integer values, i.e. can there be more than 256 types of cluster?, and b) is there any meaning to the ids?  For example, are clusters ordered by id, or is this just an integer tag?  Also, some class-side code to assign an unused id would be nice.

You might think of virtualizing the id scheme.  For example, if FLCluster maintained a weak array of all its subclasses then the id of a cluster could be the index in the array, and the array could be cleaned up occasionally.  Then each fuel serialization could start with the list of cluster class names and ids, so that specific values of ids are specific to a particular serialization.
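Here is a minimal Python sketch of the virtualized scheme. It assumes ids are handed out per serialization in first-use order and that the stream begins with a table of cluster class names. `IdAssigner`, `encode_header` and `decode_header` are hypothetical. For brevity the name count and each name length are stored in single bytes, so this sketch still caps one stream at 255 distinct clusters. Even so, no id is fixed globally, so independently written extensions cannot clash over a hard-coded id:

```python
class IdAssigner:
    def __init__(self):
        self.ids = {}  # cluster class name -> id, in first-use order

    def id_for(self, cluster_name):
        # len(self.ids) is evaluated before insertion: first new name gets 0, then 1, ...
        return self.ids.setdefault(cluster_name, len(self.ids))


def encode_header(cluster_names):
    """Count byte, then each name as a length byte followed by its ASCII bytes."""
    out = bytearray([len(cluster_names)])  # raises ValueError above 255 names
    for name in cluster_names:
        data = name.encode("ascii")
        out.append(len(data))
        out += data
    return bytes(out)


def decode_header(data):
    """Rebuild the id -> class-name table; also return where the body starts."""
    count, pos, names = data[0], 1, []
    for _ in range(count):
        length = data[pos]
        names.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return dict(enumerate(names)), pos
```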

I do agree, these ids are a legacy of the first prototypes of Fuel and should be revised. a) Yes, each id is encoded in a single byte; b) it is just an integer tag. The only purpose of the id was fast decoding: read a byte and then look up the corresponding cluster instance in a dictionary. We could even store the cluster class name, but that's inefficient.

Virtualizing the id scheme is a good idea. Much more elegant and extensible. The current mechanism not only limits the number of possible clusters, but also "user defined" extensions can collide, for example if your Glue cluster id is the same as the Moose cluster id.

I added an issue in our tracker.

If it makes sense, maybe the weak array you suggest could also be used to avoid instantiating lots of FLObjectCluster instances, as we are doing in Object:

fuelCluster
    ^ self class isVariable
        ifTrue: [ FLVariableObjectCluster for: self class ]
        ifFalse: [ FLFixedObjectCluster for: self class ]
 
the second time you send fuelCluster to an object of the same class, it can reuse the cluster instance.
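Reusing cluster instances amounts to memoizing one cluster per class for the duration of a serialization. Here is a hedged Python sketch. `ClusterCache` is hypothetical, and "is a list, tuple or bytearray" stands in for Smalltalk's `isVariable`:

```python
class ObjectCluster:
    def __init__(self, cls):
        self.cls = cls


class FixedObjectCluster(ObjectCluster):
    pass


class VariableObjectCluster(ObjectCluster):
    pass


class ClusterCache:
    def __init__(self):
        self._clusters = {}  # class -> cluster, shared by all its instances

    def cluster_for(self, obj):
        cls = type(obj)
        if cls not in self._clusters:
            # Rough stand-in for "self class isVariable".
            variable = issubclass(cls, (list, tuple, bytearray))
            kind = VariableObjectCluster if variable else FixedObjectCluster
            self._clusters[cls] = kind(cls)
        return self._clusters[cls]
```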



again thanks for a great framework.

Thanks for your words and the feedback. Is Glue published somewhere?

regards
Martin
 


best,
Eliot


 
On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck <[hidden email]> wrote:


On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda <[hidden email]> wrote:
Hi Martin and Mariano,

    a couple of questions.  What's the right way to exclude certain objects from the serialization?  Is there a way of excluding certain inst vars from certain objects?



Eliot and the rest: Martin implemented this feature in Fuel-MartinDias.258. For the moment, we decided to put #fuelIgnoredInstanceVariableNames on the class side.

Behavior >> fuelIgnoredInstanceVariableNames
    "Indicates which variables have to be ignored during serialization."

    ^#()


MyClass class >> fuelIgnoredInstanceVariableNames
  ^ #('instVar1')


The impact on speed is negligible, so this is good. Now, we were wondering whether it is common for two different instances of the same class to need different instVars ignored. Is this common? Do you usually need this? We checked SIXX, and there it is on the instance side. Java uses the 'transient' modifier, so it is on the class side...
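To make the class-side versus instance-side question concrete, here is a Python sketch of the class-side variant. The list of ignored names lives on the class, so all instances share it, much like a Java `transient` field. `Serializable`, `instance_state` and the snake_case attribute names are hypothetical translations, not Fuel code. An instance-side variant would look the list up on the object itself instead:

```python
class Serializable:
    # Default, like Behavior>>fuelIgnoredInstanceVariableNames answering #().
    fuel_ignored_instance_variable_names = ()


def instance_state(obj):
    """Instance variables to serialize, minus the class's ignored names."""
    ignored = set(type(obj).fuel_ignored_instance_variable_names)
    return {name: value for name, value in vars(obj).items() if name not in ignored}


class MyClass(Serializable):
    # Class side: every instance of MyClass ignores the same variable.
    fuel_ignored_instance_variable_names = ("inst_var1",)

    def __init__(self):
        self.inst_var1 = "transient cache"
        self.inst_var2 = 42
```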

thanks





Re: Fuel - a fast object deployment tool

Eliot Miranda-2


On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <[hidden email]> wrote:
Hi Eliot,
I am very happy to read your mail.

On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda <[hidden email]> wrote:
Hi Martin & Mariano,

    regarding filtering.  Yesterday my colleague Yaron and I successfully finished our port of Fuel to Newspeak and are successfully using it to save and restore our data sets; thank you, it's a cool framework.  We had to implement two extensions, the first of which is the ability to save and restore Newspeak classes, which is complex because these are instantiated classes inside instantiated Newspeak modules, not static Smalltalk classes in the Smalltalk dictionary.  The second extension is the ability to map specific objects to nil, to prune objects on the way out.  I want to discuss this latter extension.

In our data set we have a set of references to objects that are logically not persistent and hence not to be saved.  I'm sure that this will be a common case.  The requirement is for the pickling system to prune certain objects, typically by arranging that when an object graph is pickled, references to the pruned objects are replaced by references to nil.  One way of doing this is as described below, by specifying per-class lists of instance variables whose referents should not be saved.  But this can be clumsy; there may be references to objects one wants to prune from e.g. more than one class, in which case one may have to provide multiple lists of the relevant inst vars; there may be references to objects one wants to prune from e.g. collections (e.g. sets and dictionaries) in which case the instance variable list approach just doesn't work.

Here are two more general schemes.  First, most directly, Fuel could provide two filters, implemented in the default mapper, or the core analyser.  One is a set of classes whose instances are not to be saved.  Any reference to an instance of a class in the toBePrunedClasses set is saved as nil.  The other is a set of instances that are not to be saved, and also any reference to an instance in the toBePruned set is saved as nil.  Why have both?  It can be convenient and efficient to filter by class (in our case we had many instances of a specific class, all of which should be filtered, and finding them could be time consuming), but filtering by class can be too inflexible; there may indeed be specific instances to exclude (think for example of part of the object graph that functions as a cache; pruning the specific objects in the cache is the right thing to do; pruning all instances of classes whose instances exist in the cache may prune too much).

As an example here's how we implemented pruning.  Our system is called Glue, and we start with a mapper for Glue objects, FLGlueMapper:

FLMapper subclass: #FLGlueMapper
instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
classVariableNames: ''
poolDictionaries: ''
category: 'Fuel-Core-Mappers'

It accepts Newspeak objects and filters instances of classes in the prunedObjectClasses set, and as a side effect collects certain classes that we need in a manifest:

FLGlueMapper>>accepts: anObject
    "Tells if the received object is handled by this analyzer.  We want to hand-off
    instantiated Newspeak classes to the newspeakClassesCluster, and we want
    to record other model classes.  We want to filter-out instances of any class
    in prunedObjectClasses."
    ^anObject isBehavior
        ifTrue:
            [(self isInstantiatedNewspeakClass: anObject)
                ifTrue: [true]
                ifFalse:
                    [(anObject inheritsFrom: GlueDataObject) ifTrue:
                        [modelClasses add: anObject].
                    false]]
        ifFalse:
            [prunedObjectClasses includes: anObject class]

It prunes by mapping instances of the prunedObjectClasses to a special cluster.  It can do this in visitObject: since any Newspeak objects it is accepting will be visited in its visitClassOrTrait: method (i.e. it's implicit that all arguments to visitObject: are instances of classes in the prunedObjectClasses set).

FLGlueMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

FLPrunedObjectsCluster is a specialization of the nil/true/false cluster that maps its objects to nil:

FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'Fuel-Core-Clusters'

FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
    super serialize: nil on: aWriteStream


So this would generalize by the analyser having e.g. an FLPruningMapper as the first mapper, and this having a prunedObjects and a prunedObjectClasses set and going something like this:

FLPruningMapper>>accepts: anObject
    ^(prunedObjects includes: anObject) or: [prunedObjectClasses includes: anObject class]

FLPruningMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

and then one would provide accessors in FLSerializer and/or FLAnalyser to add objects and classes to the prunedObjects and prunedObjectClasses sets.

For efficiency one could arrange that the FLPruningMapper was not added to the sequence of mappers unless and until objects or classes were added to the prunedObjects and prunedObjectClasses sets.


Excellent. I love the botanical metaphor of pruning! Of course we can include FLPruningMapper and FLPrunedObjectsCluster in Fuel.

We are also interested in pruning objects without necessarily replacing them by nil, but rather by other user-defined objects, for example proxies. We can extend the pruning stuff to do that.

That was an idea Yaron came up with: instead of using fuelIgnoredInstanceVariableNames, one uses e.g.

Object>>objectToSerialize
    ^self

and then if one wants to prune specific inst vars in MyClass one implements

MyClass>>objectToSerialize
    ^self shallowCopy prepareForSerialization

MyClass>>prepareForSerialization
    instVarIDontWantToSerialize := nil.
    ^self

and for objects one doesn't want to serialize one implements

MyNotToBeSerializedClass>>objectToSerialize
    ^nil

So it's more general.  But I would pass the analyser in as an argument, which would allow things like

MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
    ^(anFLAnalyser shouldPrune: self)
        ifFalse: [self]
        ifTrue: [nil]

which would of course be the default in Object:

Object>>objectToSerializeIn: anFLAnalyser
    "true ifFalse: [...] answers nil, so a pruned object is serialized as nil"
    ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
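Yaron's hook, combined with the analyser argument, can be sketched in Python as follows. All names are hypothetical translations (`object_to_serialize_in`, `should_prune`), and `copy.copy` plays the role of `shallowCopy`. The module-level function stands in for the default method on Object: it returns the object itself unless the analyser says to prune it, in which case it returns None:

```python
import copy


class Analyser:
    def __init__(self, pruned=()):
        self._pruned = list(pruned)

    def should_prune(self, obj):
        return any(obj is p for p in self._pruned)  # identity, not equality


def object_to_serialize_in(obj, analyser):
    # Default behaviour (Object>>objectToSerializeIn:) unless obj overrides it.
    hook = getattr(obj, "object_to_serialize_in", None)
    if hook is not None:
        return hook(analyser)
    return None if analyser.should_prune(obj) else obj


class MyClass:
    def __init__(self, keep, scratch):
        self.keep = keep
        self.scratch = scratch  # the inst var we don't want to serialize

    def object_to_serialize_in(self, analyser):
        if analyser.should_prune(self):
            return None
        clone = copy.copy(self)  # shallowCopy
        clone.scratch = None     # prepareForSerialization
        return clone
```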

 

 
I think both Yaron and I feel the Fuel framework is comprehensible and flexible.  We enjoyed using it and while we took two passes at coming up with the pruning scheme we liked (our first was based on not serializing specific inst vars and was much more complex than our second, based on pruning instances of specific classes) we got there quickly and with very little frustration along the way.  Thank you very much.

:-) thank you!


Finally, a couple of things.  First, it may be more flexible to implement fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to override certain parts of the mapping framework an implementation can access the analyser to find existing clusters, e.g.

MyClass>>fuelClusterIn: anFLAnalyser
    ^self shouldBeInASpecialCluster
        ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
        ifFalse: [super fuelClusterIn: anFLAnalyser]

This makes it easier to find a specific unique cluster to handle a group of objects specially.

I can't imagine a concrete example, but I see that it is more flexible... the cluster obtained via double dispatch can be anything polymorphic with MySpecialCluster... is that the point?

To be honest I'm not sure.  But I think passing the analyser in to things like fuelCluster or objectToSerialize is a good idea, as it provides a convenient communication path, which in turn provides considerable flexibility.




Lastly, the class-side cluster ids are a bit of a pain.  It would be nice to know a) are these byte values or general integer values, i.e. can there be more than 256 types of cluster?, and b) is there any meaning to the ids?  For example, are clusters ordered by id, or is this just an integer tag?  Also, some class-side code to assign an unused id would be nice.

You might think of virtualizing the id scheme.  For example, if FLCluster maintained a weak array of all its subclasses then the id of a cluster could be the index in the array, and the array could be cleaned up occasionally.  Then each fuel serialization could start with the list of cluster class names and ids, so that specific values of ids are specific to a particular serialization.

I do agree, these ids are a legacy of the first prototypes of Fuel and should be revised. a) Yes, each id is encoded in a single byte; b) it is just an integer tag. The only purpose of the id was fast decoding: read a byte and then look up the corresponding cluster instance in a dictionary. We could even store the cluster class name, but that's inefficient.

Yes, but how inefficient?  What's the size of all the cluster names?

    FLCluster allSubclasses inject: 0 into: [:t :c | t + c name size + 1]    "=> 670"
 
So you'd add less than a kilobyte to the size of each serialization and get complete freedom from ids.  Something to think about.
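The `inject:into:` expression above is a left fold. For each name it adds the name's length plus one byte (for a length prefix or separator). Here is a Python equivalent. It uses an illustrative pair of names rather than the real FLCluster hierarchy that produced the 670 figure:

```python
from functools import reduce


def names_overhead(names):
    # Same fold as: names inject: 0 into: [:t :n | t + n size + 1]
    return reduce(lambda total, name: total + len(name) + 1, names, 0)
```

A table listing only the clusters actually used in a given stream could be no larger than the total over every cluster class.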


Virtualizing the id scheme is a good idea. Much more elegant and extensible. The current mechanism not only limits the number of possible clusters, but also "user defined" extensions can collide, for example if your Glue cluster id is the same as the Moose cluster id.

I added an issue in our tracker.

If it makes sense, maybe the weak array you suggest could also be used to avoid instantiating lots of FLObjectCluster instances, as we are doing in Object:

fuelCluster
    ^ self class isVariable
        ifTrue: [ FLVariableObjectCluster for: self class ]
        ifFalse: [ FLFixedObjectCluster for: self class ]
 
the second time you send fuelCluster to an object of the same class, it can reuse the cluster instance.

Right.  I think that's important, and it is one reason why I think passing in the analyser matters: it allows certain objects to discover existing clusters in the analyzer and join them if they want to, instead of having to invent and maintain their own cluster uniquing solution.

again thanks for a great framework.

Thanks for your words and the feedback. Is Glue published somewhere?

No, and it's extremely proprietary :)  Newspeak, however, is available and we may end up maintaining a port of Fuel for Newspeak.

best regards,
Eliot
 

regards
Martin
 








Re: Fuel - a fast object deployment tool

Nicolas Cellier
2011/6/17 Eliot Miranda <[hidden email]>:

>
>
> On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <[hidden email]> wrote:
>>
>> Hi Eliot,
>> I am very happy to read your mail.
>>
>> On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda <[hidden email]>
>> wrote:
>>>
>>> Hi Martin & Mariano,
>>>     regarding filtering.  Yesterday my colleague Yaron and I successfully
>>> finished our port of Fuel to Newspeak and are successfully using it to save
>>> and restore our data sets; thank you, it's a cool framework.  We had to
>>> implement two extensions, the first of which is the ability to save and restore
>>> Newspeak classes, which is complex because these are instantiated classes
>>> inside instantiated Newspeak modules, not static Smalltalk classes in the
>>> Smalltalk dictionary.  The second extension is the ability to map specific
>>> objects to nil, to prune objects on the way out.  I want to discuss this
>>> latter extension.
>>> In our data set we have a set of references to objects that are logically
>>> not persistent and hence not to be saved.  I'm sure that this will be a
>>> common case.  The requirement is for the pickling system to prune certain
>>> objects, typically by arranging that when an object graph is pickled,
>>> references to the pruned objects are replaced by references to nil.  One way
>>> of doing this is as described below, by specifying per-class lists of
>>> instance variables whose referents should not be saved.  But this can be
>>> clumsy; there may be references to objects one wants to prune from e.g. more
>>> than one class, in which case one may have to provide multiple lists of the
>>> relevant inst vars; there may be references to objects one wants to prune
>>> from e.g. collections (e.g. sets and dictionaries) in which case the
>>> instance variable list approach just doesn't work.
>>> Here are two more general schemes.  First, most directly, Fuel could
>>> provide two filters, implemented in the default mapper, or the core
>>> analyser.  One is a set of classes whose instances are not to be saved.  Any
>>> reference to an instance of a class in the toBePrunedClasses set is saved as
>>> nil.  The other is a set of instances that are not to be saved, and also any
>>> reference to an instance in the toBePruned set is saved as nil.  Why have
>>> both?  It can be convenient and efficient to filter by class (in our case we
>>> had many instances of a specific class, all of which should be filtered, and
>>> finding them could be time consuming), but filtering by class can be too
>>> inflexible; there may indeed be specific instances to exclude (think for
>>> example of part of the object graph that functions as a cache; pruning the
>>> specific objects in the cache is the right thing to do; pruning all
>>> instances of classes whose instances exist in the cache may prune too much).
>>> As an example here's how we implemented pruning.  Our system is called
>>> Glue, and we start with a mapper for Glue objects, FLGlueMapper:
>>> FLMapper subclass: #FLGlueMapper
>>> instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster
>>> modelClasses'
>>> classVariableNames: ''
>>> poolDictionaries: ''
>>> category: 'Fuel-Core-Mappers'
>>> It accepts Newspeak objects and filters instances of classes in the
>>> prunedObjectClasses set, and as a side effect collects certain classes that
>>> we need in a manifest:
>>> FLGlueMapper>>accepts: anObject
>>> "Tells if the received object is handled by this analyzer.  We want to
>>> hand-off
>>> instantiated Newspeak classes to the newspeakClassesCluster, and we want
>>> to record other model classes.  We want to filter-out instances of any
>>> class
>>> in prunedObjectClasses."
>>> ^anObject isBehavior
>>> ifTrue:
>>> [(self isInstantiatedNewspeakClass: anObject)
>>> ifTrue: [true]
>>> ifFalse:
>>> [(anObject inheritsFrom: GlueDataObject) ifTrue:
>>> [modelClasses add: anObject].
>>> false]]
>>> ifFalse:
>>> [prunedObjectClasses includes: anObject class]
>>> It prunes by mapping instances of the prunedObjectClasses to a special
>>> cluster.  It can do this in visitObject: since any Newspeak objects it is
>>> accepting will be visited in its visitClassOrTrait: method (i.e. it's
>>> implicit that all arguments to visitObject: are instances of classes in the
>>> prunedObjectClasses set).
>>> FLGlueMapper>>visitObject: anObject
>>> analyzer
>>> mapAndTrace: anObject
>>> to: FLPrunedObjectsCluster instance
>>> into: analyzer clustersWithBaselevelObjects
>>> FLPrunedObjectsCluster is a specialization of the nil/true/false cluster
>>> that maps its objects to nil:
>>> FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
>>> instanceVariableNames: ''
>>> classVariableNames: ''
>>> poolDictionaries: ''
>>> category: 'Fuel-Core-Clusters'
>>> FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
>>> super serialize: nil on: aWriteStream
>>>
>>> So this would generalize by the analyser having e.g. an FLPruningMapper
>>> as the first mapper, and this having a prunedObjects and a
>>> prunedObjectClasses set and going something like this:
>>> FLPruningMapper>>accepts: anObject
>>> ^(prunedObjects includes: anObject) or: [prunedObjectClasses includes:
>>> anObject class]
>>> FLPruningMapper>>visitObject: anObject
>>> analyzer
>>> mapAndTrace: anObject
>>> to: FLPrunedObjectsCluster instance
>>> into: analyzer clustersWithBaselevelObjects
>>> and then one would provide accessors in FLSerializer and/or FLAnalyser to
>>> add objects and classes to the prunedObjects and prunedObjectClasses sets.
>>> For efficiency one could arrange that the FLPruningMapper was not added
>>> to the sequence of mappers unless and until objects or classes were added
>>> to the prunedObjects and prunedObjectClasses sets.
>>
>> Excellent. I love the botanical metaphor of pruning! Of course we can
>> include FLPruningMapper and FLPrunedObjectsCluster in Fuel.
>>
>> We are also interested in pruning objects without necessarily replacing
>> them by nil, but rather by other user-defined objects, for example proxies. We
>> can extend the pruning stuff to do that.
>
> That was an idea Yaron came up with: instead of
> using fuelIgnoredInstanceVariableNames, one uses e.g.
> Object>>objectToSerialize
>     ^self
> and then if one wants to prune specific inst vars in MyClass one implements
> MyClass>>objectToSerialize
>     ^self shallowCopy prepareForSerialization

Hi Eliot,

I'm not convinced by the shallowCopy solution, except for simple structures.
If the object graph is complex (has shared nodes, loops, ...), then you're
going to end up with a replication problem equivalent to the one Fuel is
trying to solve.
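This concern can be made concrete with a small Python sketch. The `Node` class and helper names are hypothetical, and `copy.copy` stands in for `shallowCopy`. If every visit to a shared node calls the hook afresh, each call returns a new copy. Two references that pointed to the same object then end up pointing to two distinct copies. One possible mitigation, an assumption here rather than something proposed in the thread, is to memoize the substitute per original object, by identity, for the duration of one serialization:

```python
import copy


class Node:
    def __init__(self, label, cache=None):
        self.label = label
        self.cache = cache

    def object_to_serialize(self):
        clone = copy.copy(self)  # shallowCopy
        clone.cache = None       # prepareForSerialization: drop the transient part
        return clone


def substitute_naive(obj):
    # Calls the hook on every visit: each call yields a fresh copy.
    return obj.object_to_serialize()


def make_memoized_substitute():
    seen = {}  # id(original) -> (original, substitute); original kept alive

    def substitute(obj):
        if id(obj) not in seen:
            seen[id(obj)] = (obj, obj.object_to_serialize())
        return seen[id(obj)][1]

    return substitute
```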

Nicolas

> MyClass>>prepareForSerialization
>     instVarIDontWantToSerialize := nil.
>     ^self
> and for objects one doesn't want to serialize one implements
> MyNotToBeSerializedClass>>objectToSerialize
>     ^nil
> So it's more general.  But I would pass the analyser in as an argument, which
> would allow things like
> MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
>     ^(anFLAnalyser shouldPrune: self)
>         ifFalse: [self]
>         ifTrue: [nil]
> which would of course be the default in Object:
> Object>>objectToSerializeIn: anFLAnalyser
>     ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
>
>>
>>
>>>
>>> I think both Yaron and I feel the Fuel framework is comprehensible and
>>> flexible.  We enjoyed using it and while we took two passes at coming up
>>> with the pruning scheme we liked (our first was based on not serializing
>>> specific inst vars and was much more complex than our second, based on
>>> pruning instances of specific classes) we got there quickly and with very
>>> little frustration along the way.  Thank you very much.
>>
>> :-) thank you!
>>
>>>
>>> Finally, a couple of things.  First, it may be more flexible to implement
>>> fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to
>>> override certain parts of the mapping framework an implementation can access
>>> the analyser to find existing clusters, e.g.
>>> MyClass>>fuelClusterIn: anFLAnalyser
>>> ^self shouldBeInASpecialCluster
>>> ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
>>> ifFalse: [super fuelClusterIn: anFLAnalyser]
>>> This makes it easier to find a specific unique cluster to handle a group
>>> of objects specially.
>>
>> I can't imagine a concrete example, but I see that it is more flexible...
>> the cluster obtained via double dispatch can be anything polymorphic with
>> MySpecialCluster... is that the point?
>
> To be honest I'm not sure.  But I think passing the analyser in to things
> like fuelCluster or objectToSerialize is a good idea, as it provides a
> convenient communication path, which in turn provides considerable
> flexibility.
>>
>>
>>>
>>> Lastly, the class-side cluster ids are a bit of a pain.  It would be nice
>>> to know a) are these byte values or general integer values, i.e. can there
>>> be more than 256 types of cluster?, and b) is there any meaning to the ids?
>>>  For example, are clusters ordered by id, or is this just an integer tag?
>>>  Also, some class-side code to assign an unused id would be nice.
>>> You might think of virtualizing the id scheme.  For example, if FLCluster
>>> maintained a weak array of all its subclasses then the id of a cluster could
>>> be the index in the array, and the array could be cleaned up occasionally.
>>>  Then each fuel serialization could start with the list of cluster class
>>> names and ids, so that specific values of ids are specific to a particular
>>> serialization.
>>
>> I do agree, these ids are a legacy of the first prototypes of Fuel and
>> should be revised. a) Yes, each id is encoded in a single byte; b) it is just
>> an integer tag. The only purpose of the id was fast decoding: read a byte
>> and then look up the corresponding cluster instance in a dictionary. We
>> could even store the cluster class name, but that's inefficient.
>
> Yes, but how inefficient?  What's the size of all the cluster names?
>     FLCluster allSubclasses inject: 0 into: [:t :c | t + c name size + 1]    "=> 670"
>
> So you'd add less than a kilobyte to the size of each serialization and get
> complete freedom from ids.  Something to think about.
>>
>> Virtualizing the id scheme is a good idea. Much more elegant and
>> extensible. The current mechanism not only limits the number of possible
>> clusters, but also "user defined" extensions can collide, for example if
>> your Glue cluster id is the same as the Moose cluster id.
>>
>> I added an issue in our tracker.
>>
>> If it makes sense, maybe the weak array you suggest could also be used to
>> avoid instantiating lots of FLObjectCluster instances, as we are doing in Object:
>>
>> fuelCluster
>>     ^ self class isVariable
>>         ifTrue: [ FLVariableObjectCluster for: self class ]
>>         ifFalse: [ FLFixedObjectCluster for: self class ]
>>
>> the second time you send fuelCluster to an object of the same class, it can
>> reuse the cluster instance.
>
> Right.  I think that's important, and it is one reason why I think passing in
> the analyser matters: it allows certain objects to discover existing clusters
> in the analyzer and join them if they want to, instead of having to invent and
> maintain their own cluster uniquing solution.
>>>
>>> again thanks for a great framework.
>>
>> Thanks for your words and the feedback. Is Glue published somewhere?
>
> No, and it's extremely proprietary :)  Newspeak, however, is available and we
> may end up maintaining a port of Fuel for Newspeak.
> best regards,
> Eliot
>
>>
>> regards
>> Martin
>>
>>


Re: Fuel - a fast object deployment tool

Eliot Miranda-2


On Fri, Jun 17, 2011 at 2:39 PM, Nicolas Cellier <[hidden email]> wrote:
2011/6/17 Eliot Miranda <[hidden email]>:
>
>
> On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <[hidden email]> wrote:
>>
>> Hi Eliot,
>> I am very happy to read your mail.
>>
>> Excellent. I love the botanical metaphor of pruning! Of course we can
>> include FLPruningMapper and FLPrunedObjectsCluster in Fuel.
>>
>> We are also interested in pruning objects by replacing them not necessarily
>> with nil, but with other user-defined objects, for example proxies. We
>> can extend the pruning stuff for doing that.
>
> That was an idea Yaron came up with: instead of
> using fuelIgnoredInstanceVariableNames, one uses e.g.
> Object>>objectToSerialize
>     ^self
> and then if one wants to prune specific inst vars in MyClass one implements
> MyClass>>objectToSerialize
>     ^self shallowCopy prepareForSerialization

Hi Eliot,

I'm not convinced by the shallowCopy solution, except for simple structures.
If the object graph is complex (has shared nodes, loops, ...) then you're
going to end up with a replication problem equivalent to the one Fuel is
trying to solve.

The assumption is that the analyser would create a maximum of one proxy per object in the graph (default, no proxy) and that it would map objects with proxies to their proxies.  So if proxies only nilled out inst vars I don't see a problem.  What's attractive about this is that it provides a general solution to a couple of problems: a) how to replace a class of objects by some substitute (e.g. nil), b) how to prune state that needn't be saved.  It is also conceptually simple: one just creates a proxy instance.  There is no metadata to define, such as inst var names, and hence the code is always up-to-date (e.g. with name-based metadata, renaming an inst var when redefining a class won't automatically flag the stale name in the serialization metadata).
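The "at most one proxy per object" assumption is what answers the shared-node and cycle concern. Every reference to an original is written as a reference to that original's single substitute, so the shape of the graph survives. Below is a minimal Python sketch of that memoization. The hook name `object_to_serialize` mirrors the objectToSerialize proposal; `Node` and `Substitutions` are hypothetical.

```python
from __future__ import annotations


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: object = None        # state that should not be saved
        self.next: Node | None = None

    def object_to_serialize(self) -> Node:
        # Hypothetical hook: a shallow copy with the unwanted state left as None.
        proxy = Node(self.name)
        proxy.next = self.next           # still refers to the ORIGINAL successor
        return proxy


class Substitutions:
    """Memoizes substitutes by identity: at most one proxy per original object."""

    def __init__(self) -> None:
        self._by_id: dict[int, tuple[object, object]] = {}

    def substitute(self, obj: object) -> object:
        entry = self._by_id.get(id(obj))
        if entry is None:
            hook = getattr(obj, "object_to_serialize", None)
            proxy = hook() if hook is not None else obj
            # Store obj too, so it stays alive and its id() cannot be reused.
            entry = (obj, proxy)
            self._by_id[id(obj)] = entry
        return entry[1]


# A two-node cycle whose nodes both carry unwanted cached state.
a, b = Node("a"), Node("b")
a.next, b.next = b, a
a.cache = b.cache = "expensive"

subs = Substitutions()
pa, pb = subs.substitute(a), subs.substitute(b)
# A serializer writes subs.substitute(ref) for every reference it meets,
# so the edges a->b and b->a become pa->pb and pb->pa: the cycle survives.
print(subs.substitute(pa.next) is pb, subs.substitute(pb.next) is pa)  # True True
```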


Nicolas

> MyClass>>prepareForSerialization
>     instVarIDontWantToSerialize := nil.
>     ^self
> and for objects one doesn't want to serialize one implements
> MyNotToBeSerializedClass>>objectToSerialize
>     ^nil
> So it's more general.  But I would pass the analyser in as an argument, which
> would allow things like
> MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
>     ^(anFLAnalyser shouldPrune: self)
>         ifFalse: [self]
>         ifTrue: [nil]
> which would of course be the default in Object:
> Object>>objectToSerializeIn: anFLAnalyser
>     ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
 
--
best,
Eliot
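The `objectToSerializeIn:` variant can be sketched by giving the analyser the two pruning sets discussed in this thread (specific instances and whole classes) and letting the default hook consult it. This Python sketch is illustrative only: `PruningAnalyser`, `should_prune`, and `object_to_serialize_in` are hypothetical names. Classes are matched exactly, as `prunedObjectClasses includes: anObject class` does. Instances are matched by identity, so pruning one cached object does not prune an equal but distinct one.

```python
from __future__ import annotations

from typing import Any


class PruningAnalyser:
    def __init__(self) -> None:
        self.pruned_ids: dict[int, object] = {}   # specific instances, keyed by identity
        self.pruned_classes: set[type] = set()    # every instance of exactly these classes

    def prune(self, obj: object) -> None:
        self.pruned_ids[id(obj)] = obj            # holding obj keeps its id() unique

    def prune_class(self, cls: type) -> None:
        self.pruned_classes.add(cls)

    def should_prune(self, obj: object) -> bool:
        return id(obj) in self.pruned_ids or type(obj) in self.pruned_classes


def object_to_serialize_in(obj: Any, analyser: PruningAnalyser) -> Any:
    # Default hook, as in Object>>objectToSerializeIn:. Answers the object itself,
    # or None (nil) if the analyser says to prune it. A class wanting a proxy
    # instead of None would supply its own version.
    return None if analyser.should_prune(obj) else obj


class Session:
    """Stand-in for a class whose instances are never persistent."""


analyser = PruningAnalyser()
analyser.prune_class(Session)
cache_entry = ["cached"]
analyser.prune(cache_entry)

data = [Session(), cache_entry, ["cached"], 42]
print([object_to_serialize_in(x, analyser) for x in data])
# [None, None, ['cached'], 42]
```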


Re: Fuel - a fast object deployment tool

Nicolas Cellier
2011/6/17 Eliot Miranda <[hidden email]>:

> The assumption is that the analyser would create a maximum of one proxy per
> object in the graph (default, no proxy) and that it would map objects with
> proxies to their proxies.  So if proxies only nilled out inst vars I don't
> see a problem.  What's attractive about this is that it provides a general
> solution to a couple of problems: a) how to replace a class of objects by
> some substitute (e.g. nil), b) how to prune state that needn't be saved.  It
> is also conceptually simple: one just creates a proxy instance.  There is no
> metadata to define, such as inst var names, and hence the code is always
> up-to-date (e.g. with name-based metadata, renaming an inst var when
> redefining a class won't automatically flag the stale name in the
> serialization metadata).

Ah, OK, it occurs after the graph analysis, which I did not catch at first read.
Now I understand better.

Nicolas



Re: Fuel - a fast object deployment tool

tinchodias
I think the substitution should be done during the graph trace. Following the example, if a proxy replaces an object, the proxy represents a subgraph that is appended, and so it should be traced.

For that we should keep track of the substitutions. I'm not sure how complex that is, but I think it's not so difficult.

It seems to be a great idea; we have to try it. I like that it avoids writing inst var names as strings. I have no idea whether, once *slots* are implemented, we will be able to return inst vars as first-class objects... but anyway this looks like a nice solution.

So, we have this as a pending issue, as well as the id virtualization. Thanks for the ideas and the discussion!

Martin
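Substituting while tracing and recording the substitutions can be sketched as a worklist traversal. It asks each reached object for its substitute once, records it, and then traces the substitute's references rather than the original's. Anything reachable only through pruned or replaced state is therefore never visited. This is a Python sketch with hypothetical names (`Obj`, `trace`, `substitute_for`), not Fuel's analyser.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


class Obj:
    def __init__(self, name: str, *refs: Any) -> None:
        self.name = name
        self.refs = list(refs)


def trace(root: Any,
          substitute_for: Callable[[Any], Any]) -> tuple[list[Any], dict[int, Any]]:
    """Walk the graph, substituting while tracing.

    Returns the objects to serialize (substitutes, in visit order) and the
    substitution table mapping id(original) to its substitute.
    """
    substitutions: dict[int, Any] = {}
    originals: list[Any] = []     # keeps originals alive so their ids stay unique
    seen: set[int] = set()        # ids of substitutes already traced
    visited: list[Any] = []
    stack = [root]
    while stack:
        original = stack.pop()
        if id(original) not in substitutions:
            originals.append(original)
            substitutions[id(original)] = substitute_for(original)
        obj = substitutions[id(original)]
        if obj is None or id(obj) in seen:
            continue              # pruned to nil, or already traced
        seen.add(id(obj))
        visited.append(obj)
        # Trace the SUBSTITUTE's references, which may differ from the original's.
        stack.extend(reversed(getattr(obj, "refs", [])))
    return visited, substitutions


secret = Obj("secret")
session = Obj("session", secret)       # not persistent
root = Obj("root", session, Obj("data"))


def substitute_for(o: Any) -> Any:
    if isinstance(o, Obj) and o.name == "session":
        return Obj("session-proxy")    # a proxy holding none of the session's state
    return o


visited, subs = trace(root, substitute_for)
print([o.name for o in visited])  # ['root', 'session-proxy', 'data']
```

When writing `root`, a serializer would emit `subs[id(session)]` in place of its reference to `session`. `secret` is never reached because only the proxy's references are traced.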

On Fri, Jun 17, 2011 at 7:09 PM, Nicolas Cellier <[hidden email]> wrote:
2011/6/17 Eliot Miranda <[hidden email]>:
>
>
> On Fri, Jun 17, 2011 at 2:39 PM, Nicolas Cellier
> <[hidden email]> wrote:
>>
>> 2011/6/17 Eliot Miranda <[hidden email]>:
>> >
>> >
>> > On Fri, Jun 17, 2011 at 1:26 AM, Martin Dias <[hidden email]>
>> > wrote:
>> >>
>> >> Hi Eliot,
>> >> I am very happy to read your mail.
>> >>
>> >> On Wed, Jun 15, 2011 at 3:29 PM, Eliot Miranda
>> >> <[hidden email]>
>> >> wrote:
>> >>>
>> >>> Hi Martin & Mariano,
>> >>>     regarding filtering.  Yesterday my colleague Yaron and I
>> >>> successfully
>> >>> finished our port of Fuel to Newspeak and are successfully using it to
>> >>> save
>> >>> and restore our data sets; thank you, it's a cool framework.  We had to
>> >>> implement two extensions, the first of which the ability to save and
>> >>> restore
>> >>> Newspeak classes, which is complex because these are instantiated
>> >>> classes
>> >>> inside instantiated Newspeak modules, not static Smalltalk classes in
>> >>> the
>> >>> Smalltalk dictionary.  The second extension is the ability to map
>> >>> specific
>> >>> objects to nil, to prune objects on the way out.  I want to discuss
>> >>> this
>> >>> latter extension.
>> >>> In our data set we have a set of references to objects that are
>> >>> logically
>> >>> not persistent and hence not to be saved.  I'm sure that this will be
>> >>> a
>> >>> common case.  The requirement is for the pickling system to prune
>> >>> certain
>> >>> objects, typically by arranging that when an object graph is pickled,
>> >>> references to the pruned objects are replaced by references to nil.
>> >>>  One way
>> >>> of doing this is as described below, by specifying per-class lists of
>> >>> instance variables whose referents should not be saved.  But this can
>> >>> be
>> >>> clumsy; there may be references to objects one wants to prune from
>> >>> e.g. more
>> >>> than one class, in which case one may have to provide multiple lists
>> >>> of the
>> >>> relevant inst vars; there may be references to objects one wants to
>> >>> prune
>> >>> from e.g. collections (e.g. sets and dictionaries) in which case the
>> >>> instance variable list approach just doesn't work.
>> >>> Here are two more general schemes.  First, most directly, Fuel could provide two filters, implemented in the default mapper or the core analyser.  One is a set of classes whose instances are not to be saved: any reference to an instance of a class in the toBePrunedClasses set is saved as nil.  The other is a set of instances that are not to be saved: any reference to an instance in the toBePruned set is also saved as nil.  Why have both?  It can be convenient and efficient to filter by class (in our case we had many instances of a specific class, all of which should be filtered, and finding them could be time consuming), but filtering by class can be too inflexible; there may indeed be specific instances to exclude (think for example of part of the object graph that functions as a cache: pruning the specific objects in the cache is the right thing to do; pruning all instances of classes whose instances exist in the cache may prune too much).
>> >>>
>> >>> As an example, here's how we implemented pruning.  Our system is called Glue, and we start with a mapper for Glue objects, FLGlueMapper:
>> >>>
>> >>> FLMapper subclass: #FLGlueMapper
>> >>>     instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
>> >>>     classVariableNames: ''
>> >>>     poolDictionaries: ''
>> >>>     category: 'Fuel-Core-Mappers'
>> >>>
>> >>> It accepts Newspeak objects and filters instances of the classes in the prunedObjectClasses set, and as a side-effect collects certain classes that we need in a manifest:
>> >>>
>> >>> FLGlueMapper>>accepts: anObject
>> >>>     "Tells if the received object is handled by this mapper.  We want to hand off instantiated Newspeak classes to the newspeakClassesCluster, and we want to record other model classes.  We want to filter out instances of any class in prunedObjectClasses."
>> >>>     ^anObject isBehavior
>> >>>         ifTrue:
>> >>>             [(self isInstantiatedNewspeakClass: anObject)
>> >>>                 ifTrue: [true]
>> >>>                 ifFalse:
>> >>>                     [(anObject inheritsFrom: GlueDataObject) ifTrue:
>> >>>                         [modelClasses add: anObject].
>> >>>                      false]]
>> >>>         ifFalse:
>> >>>             [prunedObjectClasses includes: anObject class]
>> >>>
>> >>> It prunes by mapping instances of the prunedObjectClasses to a special cluster.  It can do this in visitObject: since any Newspeak objects it accepts will be visited in its visitClassOrTrait: method (i.e. it's implicit that all arguments to visitObject: are instances of classes in the prunedObjectClasses set).
>> >>>
>> >>> FLGlueMapper>>visitObject: anObject
>> >>>     analyzer
>> >>>         mapAndTrace: anObject
>> >>>         to: FLPrunedObjectsCluster instance
>> >>>         into: analyzer clustersWithBaselevelObjects
>> >>>
>> >>> FLPrunedObjectsCluster is a specialization of the nil/true/false cluster that maps its objects to nil:
>> >>>
>> >>> FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
>> >>>     instanceVariableNames: ''
>> >>>     classVariableNames: ''
>> >>>     poolDictionaries: ''
>> >>>     category: 'Fuel-Core-Clusters'
>> >>>
>> >>> FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
>> >>>     super serialize: nil on: aWriteStream
>> >>>
>> >>> So this would generalize by the analyser having e.g. an FLPruningMapper as the first mapper, with this mapper holding prunedObjects and prunedObjectClasses sets and going something like this:
>> >>>
>> >>> FLPruningMapper>>accepts: anObject
>> >>>     ^(prunedObjects includes: anObject) or: [prunedObjectClasses includes: anObject class]
>> >>>
>> >>> FLPruningMapper>>visitObject: anObject
>> >>>     analyzer
>> >>>         mapAndTrace: anObject
>> >>>         to: FLPrunedObjectsCluster instance
>> >>>         into: analyzer clustersWithBaselevelObjects
>> >>>
>> >>> and then one would provide accessors in FLSerializer and/or FLAnalyser to add objects and classes to the prunedObjects and prunedObjectClasses sets.  For efficiency one could arrange that the FLPruningMapper was not added to the sequence of mappers unless and until objects or classes were added to the prunedObjects and prunedObjectClasses sets.
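The generalized FLPruningMapper proposal can be sketched outside Smalltalk. This is a minimal Python sketch under stated assumptions: `Node`, `PruningTracer`, `prune`, `prune_class`, and `Cache` are hypothetical names, not Fuel API, and a toy object's named references stand in for instance variables. Any reference to a registered instance, or to an instance of a registered class, is recorded as `None` and not traced further; nothing is filtered while both sets are empty.

```python
class Node:
    """Toy object whose named references stand in for instance variables."""

    def __init__(self, name, **refs):
        self.name = name
        self.refs = dict(refs)


class PruningTracer:
    """Hypothetical analogue of FLPruningMapper (not Fuel API)."""

    def __init__(self):
        self.pruned_ids = set()       # identities of specific instances to prune
        self.pruned_classes = set()   # classes whose instances are all pruned

    def prune(self, obj):
        self.pruned_ids.add(id(obj))

    def prune_class(self, cls):
        self.pruned_classes.add(cls)

    def is_active(self):
        # Only worth consulting once something has been registered for pruning.
        return bool(self.pruned_ids or self.pruned_classes)

    def accepts(self, obj):
        return id(obj) in self.pruned_ids or type(obj) in self.pruned_classes

    def trace(self, root):
        """Map id(node) -> {ref name: target, or None if pruned} for reachable Nodes."""
        active = self.is_active()
        result, stack = {}, [root]
        while stack:
            node = stack.pop()
            if id(node) in result:
                continue
            refs = {}
            for key, target in node.refs.items():
                if active and self.accepts(target):
                    refs[key] = None              # pruned reference is saved as nil
                else:
                    refs[key] = target
                    if isinstance(target, Node):
                        stack.append(target)      # unpruned nodes are traced further
            result[id(node)] = refs
        return result


class Cache:
    """Hypothetical class whose instances are never persisted."""


child, secret = Node("child"), Node("secret")
root = Node("root", cache=Cache(), secret=secret, child=child)
tracer = PruningTracer()
tracer.prune_class(Cache)    # filter by class
tracer.prune(secret)         # filter one specific instance
result = tracer.trace(root)  # root's 'cache' and 'secret' references become None
```

Filtering by class needs no search for the instances, while the identity set handles one-off cases such as cache entries, which is the trade-off described in the message.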
>> >>
>> >> Excellent. I love the botanical metaphor of pruning! Of course we can include FLPruningMapper and FLPrunedObjectsCluster in Fuel.
>> >>
>> >> We are also interested in pruning objects without necessarily replacing them by nil, but by other user-defined objects, for example proxies. We can extend the pruning stuff to do that.
>> >
>> > That was an idea Yaron came up with: instead of using fuelIgnoredInstanceVariableNames, one uses e.g.
>> > Object>>objectToSerialize
>> >     ^self
>> > and then if one wants to prune specific inst vars in MyClass one implements
>> > MyClass>>objectToSerialize
>> >     ^self shallowCopy prepareForSerialization
>>
>> Hi Eliot,
>>
>> I'm not convinced by the shallowCopy solution, except for simple structures.
>> If the object graph is complex (has shared nodes, loops, ...) then you are going to end up with a replication problem equivalent to the one Fuel is trying to solve.
>
> The assumption is that the analyser would create a maximum of one proxy per object in the graph (by default, no proxy) and that it would map objects with proxies to their proxies.  So if proxies only nilled out inst vars I don't see a problem.  What's attractive about this is that it provides a general solution to a couple of problems: a) how to replace a class of objects by some substitute (e.g. nil), and b) how to prune state that needn't be saved.  It is also conceptually simple; one just creates a proxy instance, with no defining metadata such as inst var names, and hence the code is always up-to-date (e.g. a class redefinition won't automatically uncover renamed inst vars in serialization metadata).

Ah, OK, it occurs after the graph analysis, which I did not catch at first read.
Now I understand better.

Nicolas
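A minimal Python sketch of the objectToSerializeIn: hook as Eliot describes it, assuming hypothetical names (`object_to_serialize_in`, `Analyser.should_prune`, `Analyser.substitute_for`, `Session`). The analyser asks each object at most once and remembers the answer, so every reference to the same original resolves to the same substitute; this is the "one proxy per object" assumption that answers the shared-node concern about shallowCopy.

```python
import copy


class Persistent:
    """Default hook, analogous to the proposed Object>>objectToSerializeIn:."""

    def object_to_serialize_in(self, analyser):
        return None if analyser.should_prune(self) else self


class Session(Persistent):
    """Per-class override: write a shallow copy with a transient field cleared."""

    def __init__(self, user, socket):
        self.user = user
        self.socket = socket

    def object_to_serialize_in(self, analyser):
        if analyser.should_prune(self):
            return None
        clone = copy.copy(self)   # shallowCopy
        clone.socket = None       # prepareForSerialization: nil out the inst var
        return clone


class Analyser:
    """Hypothetical analyser: asks each object once and remembers the answer."""

    def __init__(self, pruned=()):
        self._pruned = {id(obj) for obj in pruned}
        self._substitutes = {}    # id(original) -> its single substitute

    def should_prune(self, obj):
        return id(obj) in self._pruned

    def substitute_for(self, obj):
        key = id(obj)
        if key not in self._substitutes:
            hook = getattr(obj, "object_to_serialize_in", None)
            self._substitutes[key] = hook(self) if hook else obj
        return self._substitutes[key]


session = Session("alice", socket=object())
analyser = Analyser()
first = analyser.substitute_for(session)
second = analyser.substitute_for(session)  # a second reference reuses the same copy
```

Objects without the hook are kept as they are, and a pruned object simply answers `None`.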

>>
>> Nicolas
>>
>> > MyClass>>prepareForSerialization
>> >     instVarIDontWantToSerialize := nil.
>> >     ^self
>> > and for objects one doesn't want to serialize one implements
>> > MyNotToBeSerializedClass>>objectToSerialize
>> >     ^nil
>> > So it's more general.  But I would pass the analyser in as an argument, which would allow things like
>> > MyPerhapsNotToBeSerializedClass>>objectToSerializeIn: anFLAnalyser
>> >     ^(anFLAnalyser shouldPrune: self)
>> >         ifFalse: [self]
>> >         ifTrue: [nil]
>> > which would of course be the default in Object:
>> > Object>>objectToSerializeIn: anFLAnalyser
>> >     ^(anFLAnalyser shouldPrune: self) ifFalse: [self]
>> >
>> >>
>> >>
>> >>>
>> >>> I think both Yaron and I feel the Fuel framework is comprehensible and flexible.  We enjoyed using it, and while we took two passes at coming up with the pruning scheme we liked (our first was based on not serializing specific inst vars and was much more complex than our second, based on pruning instances of specific classes), we got there quickly and with very little frustration along the way.  Thank you very much.
>> >>
>> >> :-) thank you!
>> >>
>> >>>
>> >>> Finally, a couple of things.  First, it may be more flexible to implement fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to override certain parts of the mapping framework an implementation can access the analyser to find existing clusters, e.g.
>> >>>
>> >>> MyClass>>fuelClusterIn: anFLAnalyser
>> >>>     ^self shouldBeInASpecialCluster
>> >>>         ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
>> >>>         ifFalse: [super fuelClusterIn: anFLAnalyser]
>> >>>
>> >>> This makes it easier to find a specific unique cluster to handle a group of objects specially.
>> >>
>> >> I can't imagine a concrete example, but I see that it is more flexible... the cluster obtained via double dispatch can be anything polymorphic with MySpecialCluster... is that the point?
>> >
>> > To be honest I'm not sure.  But passing in the analyser in things like fuelCluster or objectToSerialize is I think a good idea, as it provides a convenient communication path which in turn provides considerable flexibility.
>> >>
>> >>
>> >>>
>> >>> Lastly, the class-side cluster ids are a bit of a pain.  It would be nice to know a) are these byte values or general integer values, i.e. can there be more than 256 types of cluster? and b) is there any meaning to the ids?  For example, are clusters ordered by id, or is this just an integer tag?  Also, some class-side code to assign an unused id would be nice.
>> >>>
>> >>> You might think of virtualizing the id scheme.  For example, if FLCluster maintained a weak array of all its subclasses then the id of a cluster could be the index in the array, and the array could be cleaned up occasionally.  Then each Fuel serialization could start with the list of cluster class names and ids, so that specific values of ids are specific to a particular serialization.
>> >>
>> >> I do agree; these ids are a heritage from the first prototypes of Fuel, and they should be revised. a) Yes, the id is encoded in only one byte; b) it is just an integer tag: the only purpose of the id was fast decoding, i.e. read a byte and then look in a dictionary for the corresponding cluster instance. We could even store the cluster class name, but that's inefficient.
>> >
>> > Yes, but how inefficient?  What's the size of all the cluster names?
>> >     FLCluster allSubclasses inject: 0 into: [:t :c| t + c name size + 1]
>> > 670
>> >
>> > So you'd add less than a kilobyte to the size of each serialization and get complete freedom from ids.  Something to think about.
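A minimal Python sketch of a virtualized id scheme, with hypothetical names (`ClusterTable`, `encode_header`, `decode_header`). As a variation on the weak-array idea, ids here are assigned in order of first use within a single serialization, and the stream starts with a table of cluster class names whose position is the id. Each entry costs one length byte plus the name, the same per-name size counted in the estimate above; the two-byte count is an arbitrary choice that allows far more than 256 cluster types.

```python
import struct


class ClusterTable:
    """Per-serialization table: cluster class name <-> small local id."""

    def __init__(self):
        self.ids = {}     # name -> local id
        self.names = []   # local id -> name

    def id_for(self, name):
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def encode_header(table):
    """uint16 name count, then for each name one length byte plus ASCII bytes."""
    parts = [struct.pack(">H", len(table.names))]
    for name in table.names:
        raw = name.encode("ascii")
        parts.append(struct.pack(">B", len(raw)) + raw)
    return b"".join(parts)


def decode_header(data):
    """Return (names indexed by local id, offset where the cluster data starts)."""
    (count,) = struct.unpack_from(">H", data, 0)
    offset, names = 2, []
    for _ in range(count):
        length = data[offset]
        names.append(data[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return names, offset


table = ClusterTable()
for cluster_name in ["FLFixedObjectCluster", "FLPrunedObjectsCluster", "FLFixedObjectCluster"]:
    table.id_for(cluster_name)
header = encode_header(table)
```

Because ids only have meaning inside one stream, two independently written extensions (say a Glue cluster and a Moose cluster) cannot collide.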
>> >>
>> >> Virtualizing the id scheme is a good idea, much more elegant and extensible. The current mechanism not only limits the number of possible clusters, but also lets "user defined" extensions collide, for example if your Glue cluster id is the same as the Moose cluster id.
>> >>
>> >> I added an issue in our tracker.
>> >>
>> >> If it makes sense, maybe the weak array you suggest could also be used to avoid instantiating lots of FLObjectClusters, as we are doing in Object:
>> >>
>> >> fuelCluster
>> >>     ^ self class isVariable
>> >>         ifTrue: [ FLVariableObjectCluster for: self class ]
>> >>         ifFalse: [ FLFixedObjectCluster for: self class ]
>> >>
>> >> The second time you send fuelCluster to an object, it could reuse the cluster instance.
>> >
>> > Right.  I think that's important, and is one reason why I think passing in the analyser is important: it allows certain objects to discover existing clusters in the analyzer and join them if they want to, instead of having to invent and maintain their own cluster uniquing solution.
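A minimal Python sketch of cluster uniquing through the analyser, with hypothetical names (`Analyser.cluster_for`, `FixedObjectCluster`, `VariableObjectCluster`, `Point`). Instead of each object building a fresh cluster in fuelCluster, the analyser keeps one cluster per (cluster kind, class) pair and hands the existing one back on later requests; Python lists and bytearrays are only a stand-in for Smalltalk's variable-sized classes.

```python
class Cluster:
    """Holds all objects of one class that share a serialization strategy."""

    def __init__(self, cls):
        self.cls = cls
        self.objects = []


class FixedObjectCluster(Cluster):
    pass


class VariableObjectCluster(Cluster):
    pass


class Analyser:
    """Hypothetical analyser that hands out one cluster per (kind, class)."""

    def __init__(self):
        self._clusters = {}   # (cluster kind, object class) -> cluster instance

    def cluster_for(self, kind, cls):
        key = (kind, cls)
        if key not in self._clusters:
            self._clusters[key] = kind(cls)   # created on first request only
        return self._clusters[key]

    def add(self, obj):
        # Stand-in for fuelClusterIn: anAnalyser; list/bytearray play the role
        # of variable-sized ("isVariable") classes.
        is_variable = isinstance(obj, (list, bytearray))
        kind = VariableObjectCluster if is_variable else FixedObjectCluster
        cluster = self.cluster_for(kind, type(obj))
        cluster.objects.append(obj)
        return cluster


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


analyser = Analyser()
first = analyser.add(Point(1, 2))
second = analyser.add(Point(3, 4))   # joins the cluster created for the first Point
```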
>> >>>
>> >>> again thanks for a great framework.
>> >>
>> >> Thanks for your words and the feedback. Is Glue published somewhere?
>> >
>> > No, and it's extremely proprietary :)  Newspeak however is available and we may end up maintaining a port of Fuel for Newspeak.
>> > best regards,
>> > Eliot
>> >
>> >>
>> >> regards
>> >> Martin
>> >>
>> >>
>> >>>
>> >>> best,
>> >>> Eliot
>> >>
>> >>
>> >>>
>> >>> On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck <[hidden email]> wrote:
>> >>>>
>> >>>>
>> >>>> On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda <[hidden email]> wrote:
>> >>>>>
>> >>>>> Hi Martin and Mariano,
>> >>>>>     a couple of questions.  What's the right way to exclude certain objects from the serialization?  Is there a way of excluding certain inst vars from certain objects?
>> >>>>
>> >>>>
>> >>>> Eliot and the rest... Martin implemented this feature in Fuel-MartinDias.258. For the moment, we decided to put #fuelIgnoredInstanceVariableNames on the class side.
>> >>>>
>> >>>> Behavior >> fuelIgnoredInstanceVariableNames
>> >>>>     "Indicates which variables have to be ignored during serialization."
>> >>>>
>> >>>>     ^#()
>> >>>>
>> >>>>
>> >>>> MyClass class >> fuelIgnoredInstanceVariableNames
>> >>>>   ^ #('instVar1')
>> >>>>
>> >>>>
>> >>>> There is no impact on speed, so this is good. Now... we were wondering whether it is common for 2 different instances of the same class to need different instVars ignored. Is this common? Do you usually need this?
>> >>>> We checked SIXX, and there it is on the instance side. Java uses the 'transient' modifier, so there it is on the class side...
>> >>>> thanks
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Mariano
>> >>>> http://marianopeck.wordpress.com
>> >>>>
>> >>>
>> >>
>> >
>> >
>
>
> --
> best,
> Eliot
>



Re: Fuel - a fast object deployment tool

tinchodias
I had some problems when I tried to write a package export tool with Fuel. I wanted to store the classes without the method extensions from other packages. Maybe with the objectToSerializeIn: idea I could write:

Class>>objectToSerializeIn: anFLAnalyser
    ^(anFLAnalyser shouldAvoidForeignProtocol: self)
        ifFalse: [self]
        ifTrue: [self copyWithoutForeignProtocol]

Cheers,
Martin

On Mon, Jun 20, 2011 at 1:48 AM, Martin Dias <[hidden email]> wrote:
I think the substitution should be done during the graph trace. Following the example, if a proxy replaces an object, the proxy represents a subgraph that is appended, and so it should be traced.

For that we should keep track of the substitutions. I'm not sure how complex that is, but I think it's not so difficult.

It seems to be a great idea; we have to try it. I like that it avoids writing inst var names as strings. I have no idea whether, once *slots* are implemented, we will be able to return inst vars as first-class objects... but anyway this looks like a nice solution.

So, we have this as a pending issue as well as the id virtualization. Thanks for the ideas and the discussion!

Martin
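A minimal Python sketch of substitution during the trace, under hypothetical names (`Obj`, `trace_with_substitutions`, a `substitute` callback). Substitutions are kept in a dictionary keyed by the original's identity, so each object is replaced at most once and shared references stay shared; the replacement's own references are then traced like any other part of the graph, so a proxy can bring in a new subgraph. The original objects are left unmodified.

```python
class Obj:
    """Toy object: a name plus an ordered list of references."""

    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)


def trace_with_substitutions(root, substitute):
    """Trace from root, replacing objects via substitute() as they are reached.

    substitute(obj) returns obj itself, a replacement, or None to prune it.
    Returns (objects in trace order, {id(obj): [its resolved references]}).
    """
    substitutions = {}   # id(original) -> replacement, computed at most once
    edges, order = {}, []

    def resolve(obj):
        if id(obj) not in substitutions:
            substitutions[id(obj)] = substitute(obj)
        return substitutions[id(obj)]

    stack = [resolve(root)]
    while stack:
        obj = stack.pop()
        if obj is None or id(obj) in edges:
            continue
        resolved = [resolve(ref) for ref in obj.refs]
        edges[id(obj)] = resolved
        order.append(obj)
        # The replacement's own references are traced like any other subgraph.
        stack.extend(ref for ref in reversed(resolved) if ref is not None)
    return order, edges


big = Obj("big", [Obj("large payload")])
root = Obj("root", [big, big])
handle = Obj("handle")


def substitute(obj):
    # Hypothetical policy: replace 'big' by a proxy that only refers to a handle.
    return Obj("proxy", [handle]) if obj is big else obj


order, edges = trace_with_substitutions(root, substitute)
names = [obj.name for obj in order]
```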


On Fri, Jun 17, 2011 at 7:09 PM, Nicolas Cellier <[hidden email]> wrote:
> [...]



Re: Fuel - a fast object deployment tool

Henrik Sperre Johansen
In reply to this post by tinchodias
On 08.12.2010 17:50, Martin Dias wrote:

> Hi all
>
> In recent months Tristan and I have been working on the Fuel project, a binary object serialization tool. The idea is that objects are loaded many more times than they are stored, so it is worth spending time while storing in order to get faster loading and a better user experience. We present an implementation of a pickle format that is based on clustering similar objects.
>
> There is a summary of the project below, but more complete information is available here: http://rmod.lille.inria.fr/web/pier/software/Fuel
>
> The implementation still needs a lot of work to be really useful, and optimizations should be done, but we'll be glad to get feedback from the community.
>
>
> = Pickle format =
>
> The main idea of the pickle format and the serialization algorithm is explained in these slides:
>
> http://www.slideshare.net/tinchodias/fuel-serialization-in-an-example
>
>
> = Current features =
>
> - Class shape changing (when a variable has been added, or removed, or its index changed)
> - Serialize most of the basic objects.
> - Serialize (almost) any CompiledMethod
> - Detection of global or class variables
> - Support for cyclic object graphs
> - Tests
>
>
> = Next steps =
>
> - Improve version checking.
> - Optimize performance.
> - Serialize more kinds of objects:
> -- Class with its complete description.
> -- Method contexts
> -- Active block closures
> -- Continuation
> - Some improvements for the user:
> -- pre and post actions to be executed.
> -- easily say 'this object is singleton'.
> - Partial loading of a stored graph.
> - Fast statistics/brief info extraction of a stored graph.
> - ConfigurationOfFuel.
> - Be able to deploy materialization behavior only (independent from the serialization behavior)
>
>
> = Download =
>
> In a Pharo 1.1 or 1.1.1 evaluate:
>
> Gofer new
>     squeaksource: 'Fuel';
>     version: 'Fuel-MartinDias.74';
>     version: 'FuelBenchmarks-MartinDias.4';
>     load.
>
>
> = Benchmarks =
>
> You can run benchmarks executing this line (results in Transcript):
>
> FLBenchmarks newBasic run.
>
>
> Thank you!
> Martin Dias

One thing I do not see mentioned, and which I feel could use some attention,
is thread safety (i.e. other threads altering the graph you are serializing).
The classic answer would of course be "always run FUEL at the highest
priority", but if we ever want to move to true multi-core, that's not
enough.

What would be neat is protecting all mutation of objects in the graph
with a Mutex/Monitor whose critical section covers both the analysis and
the serialization, i.e. blocking all other processes that want to mutate
objects in the graph until serialization is complete.
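A minimal sketch of that idea in Pharo Smalltalk, for illustration only: it is not Fuel API. `graphLock` and `node` are hypothetical names, and `node copy` merely stands in for the analysis and serialization phases. It assumes the snippet is evaluated from a workspace at the normal user priority, so the forked writer (at a higher priority) runs immediately and then blocks on the lock.

```smalltalk
"Sketch only, not Fuel API: graphLock and node are hypothetical names."
| graphLock node snapshot |
graphLock := Mutex new.
node := OrderedCollection with: 1 with: 2.
"The serializer holds the lock across BOTH phases (analysis and serialization)."
snapshot := graphLock critical: [
	"A writer forked at a higher priority preempts us, then blocks on the same lock."
	[graphLock critical: [node add: 3]] forkAt: Processor userInterruptPriority.
	"Stand-in for analysis + serialization: read the graph while holding the lock."
	node copy].
"Only after the critical section ends is the waiting writer released to add 3."
snapshot
```

The snapshot still has two elements. Note that this only works if every mutator voluntarily takes `graphLock`; forcing writers onto that path is what the immutability-based approach is for.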

Aside from the behaviour when a marked object is encountered, the
process is the same as for immutability, as discussed here:
http://forum.world.st/immutability-and-become-Was-Re-squeak-dev-immutability-td1597511.html

You could do this image-side as part of the analysis phase, as Eliot's
post suggests, but it's not entirely safe in the following case:
A child -> B
B parent -> A

1. Process1 protects A with the Mutex.
2. Process2 calls a method on B which does:
    a) B parent: C
    b) (B's tmpRef to A) child: somethingElse  *waits for the Mutex*
3. Process1 makes B and C immutable and serializes B with C as its parent.
4. Process2 (once the Mutex is released) changes A's child.
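To make the ordering concrete, here is a sequential replay of the steps above in Pharo Smalltalk. It is a sketch under stated assumptions: the objects are modelled as Dictionaries with hypothetical #child and #parent keys, no real Mutex or process is involved, and each step simply runs in the listed order.

```smalltalk
"Sequential replay of the scenario above; objects modelled as Dictionaries (hypothetical)."
| a b c somethingElse storedChildOfA storedParentOfB |
a := Dictionary new.
b := Dictionary new.
c := Dictionary new.
somethingElse := Dictionary new.
a at: #child put: b.
b at: #parent put: a.
"1. Process1 protects A and records A's child."
storedChildOfA := a at: #child.
"2a. B is not protected yet, so Process2 changes B's parent."
b at: #parent put: c.
"2b. Process2's next write, to A, would block on the Mutex; not simulated here."
"3. Process1 reaches B and records B's parent, which is already C."
storedParentOfB := b at: #parent.
"4. The Mutex is released and Process2 completes its update of A."
a at: #child put: somethingElse.
```

The stored pair (A's child is B, B's parent is C) matches neither the graph before Process2's method ran nor the graph after it finished, which is what makes this hard to debug.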

It could of course be argued that it is a programming/logic error to update
B's parent ref before A's child ref (you'd actually have to do extra
work to get that order); still, it's a hard one to debug if it does happen.
(Note to the thread: Following the same logic, I would also say it
should be considered an error to keep a mutable cache as part of an
immutable object :) )

In fact, you *could* implement it with immutability, injecting handler
contexts with behaviour as described above into existing/new processes
for any resulting immutability errors.

Cheers,
Henry






Re: Fuel - a fast object deployment tool

Mariano Martinez Peck
> One thing I do not see mentioned, and which I feel could use some attention, is thread safety (i.e. other threads altering the graph you are serializing).
> The classic answer would of course be "always run FUEL at the highest priority", but if we ever want to move to true multi-core, that's not enough.

Wow. Were you reading our minds?? hehehe. Yes, while reading some of the related-work papers for our own paper, we read about "atomicity": how do you stay atomic while analyzing and serializing?
Right now, if the graph changes between the first phase (analysis) and the second phase (serialization), you are screwed :)

I wonder if Eliot fixed that in Parcels?

> What would be neat is protecting all mutation of objects in the graph with a Mutex/Monitor whose critical section covers both the analysis and the serialization, i.e. blocking all other processes that want to mutate objects in the graph until serialization is complete.

Yes.

> Aside from the behaviour when a marked object is encountered, the process is the same as for immutability, as discussed here:
> http://forum.world.st/immutability-and-become-Was-Re-squeak-dev-immutability-td1597511.html
>
> You could do this image-side as part of the analysis phase, as Eliot's post suggests,

Ok, but in order to do that I need the immutable bit in the ObjectHeader, right?

> but it's not entirely safe in the following case:
> A child -> B
> B parent -> A
>
> 1. Process1 protects A with the Mutex.
> 2. Process2 calls a method on B which does:
>   a) B parent: C
>   b) (B's tmpRef to A) child: somethingElse  *waits for the Mutex*
> 3. Process1 makes B and C immutable and serializes B with C as its parent.
> 4. Process2 (once the Mutex is released) changes A's child.
>
> It could of course be argued that it is a programming/logic error to update B's parent ref before A's child ref (you'd actually have to do extra work to get that order); still, it's a hard one to debug if it does happen.
> (Note to the thread: Following the same logic, I would also say it should be considered an error to keep a mutable cache as part of an immutable object :) )
>
> In fact, you *could* implement it with immutability, injecting handler contexts with behaviour as described above into existing/new processes for any resulting immutability errors.



Wow. That seems like really complicated stuff to deal with :(
Immutability is not yet present in the default VMs.



--
Mariano
http://marianopeck.wordpress.com


Re: Fuel - a fast object deployment tool

Henrik Sperre Johansen


On 20 June 2011, at 15:59, Mariano Martinez Peck <[hidden email]> wrote:

>> One thing I do not see mentioned, and which I feel could use some attention, is thread safety (i.e. other threads altering the graph you are serializing).
>> The classic answer would of course be "always run FUEL at the highest priority", but if we ever want to move to true multi-core, that's not enough.

> Wow. Were you reading our minds?? hehehe. Yes, while reading some of the related-work papers for our own paper, we read about "atomicity": how do you stay atomic while analyzing and serializing?
> Right now, if the graph changes between the first phase (analysis) and the second phase (serialization), you are screwed :)

> I wonder if Eliot fixed that in Parcels?

The classical way? ;)
What I describe below is one way of achieving atomicity (in the strict sense, if using uninterruptible VM functionality).

>> What would be neat is protecting all mutation of objects in the graph with a Mutex/Monitor whose critical section covers both the analysis and the serialization, i.e. blocking all other processes that want to mutate objects in the graph until serialization is complete.

> Yes.

>> Aside from the behaviour when a marked object is encountered, the process is the same as for immutability, as discussed here:
>> http://forum.world.st/immutability-and-become-Was-Re-squeak-dev-immutability-td1597511.html
>>
>> You could do this image-side as part of the analysis phase, as Eliot's post suggests,

> Ok, but in order to do that I need the immutable bit in the ObjectHeader, right?

Immutability as discussed there? Yes.
Providing thread safety for serialization? No.
You could also use method wrappers (on all methods that mutate state), as other potential users of immutability currently do.
I'd imagine immutability would have much lower overhead in the common case where no other threads are accessing the graph.

 
>> but it's not entirely safe in the following case:
>> A child -> B
>> B parent -> A
>>
>> 1. Process1 protects A with the Mutex.
>> 2. Process2 calls a method on B which does:
>>   a) B parent: C
>>   b) (B's tmpRef to A) child: somethingElse  *waits for the Mutex*
>> 3. Process1 makes B and C immutable and serializes B with C as its parent.
>> 4. Process2 (once the Mutex is released) changes A's child.
>>
>> It could of course be argued that it is a programming/logic error to update B's parent ref before A's child ref (you'd actually have to do extra work to get that order); still, it's a hard one to debug if it does happen.
>> (Note to the thread: Following the same logic, I would also say it should be considered an error to keep a mutable cache as part of an immutable object :) )
>>
>> In fact, you *could* implement it with immutability, injecting handler contexts with behaviour as described above into existing/new processes for any resulting immutability errors.



> Wow. That seems like really complicated stuff to deal with :(

Not really: two threads performing two sequences of operations in a specific order is really the simplest kind of threading problem one can imagine.
I guess it comes down to the presenter :/

> Immutability is not yet present in the default VMs.

True. It has been done, though, and IIRC it came down to the Newspeak license (resolved?) and no one to review and actually integrate it.
So for a comparative test of approaches it might not be too much work.

Cheers,
Henry