Fuel - a fast object deployment tool


Re: Fuel - a fast object deployment tool

tinchodias
Hi Tudor,

Thank you.
Yes, to test the project on a real case, I created a package that
uses the Fame metadescriptions, as MSE does, to export and import Moose
models. On the main page I talk a bit about the extension, but I didn't
mention it in the summary mail. It still needs better testing... and
a progress bar, like MSE has!

In Moose 4.1:

Gofer new
        squeaksource: 'Fuel';
        version: 'Fuel-MartinDias.74';
        version: 'FuelFameExtension-MartinDias.10';
        load.

Then, when you open a new MoosePanel, it will offer the 'export to FL' and
'import from FL' options.
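
Independently of Moose, a plain Fuel round trip looks roughly like this (a
sketch only: FLSerializer / FLMaterializer and these convenience selectors
are taken from later Fuel releases and may not match this early package):

FLSerializer serialize: (0 @ 0 corner: 42 @ 42) toFileNamed: 'demo.fuel'.
FLMaterializer materializeFromFileNamed: 'demo.fuel'
"=> a Rectangle equal to the stored one, rebuilt from the pickle"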

I would like to have your feedback!

Martin


On Thu, Dec 9, 2010 at 10:02 AM, Tudor Girba <[hidden email]> wrote:

> Hi Martin,
>
> Nice project.
>
> I noticed that you have a package FuelFameExtension. Is this for the Fame meta engine? If so, I would be interested in testing it, especially since in the context of Moose we load objects significantly more often than we store them :).
>
> Cheers,
> Doru
>
>
> On 8 Dec 2010, at 17:50, Martin Dias wrote:
>
>> Hi all
>>
>> In the last months, Tristan and I have been working on the Fuel project, an
>> object binary serialization tool. The idea is that objects are loaded far
>> more often than they are stored, so it is worth spending time while
>> storing in order to get faster loading and a better user experience. We
>> present an implementation of a pickle format that is based on
>> clustering similar objects.
>>
>> There is a summary of the project below, but more complete information
>> is available here: http://rmod.lille.inria.fr/web/pier/software/Fuel
>>
>> The implementation still needs a lot of work to be really useful, and
>> optimizations remain to be done, but we'll be glad to get feedback from
>> the community.
>>
>>
>> = Pickle format =
>>
>> The main idea of the pickle format and the serialization algorithm is
>> explained in these slides:
>>
>> http://www.slideshare.net/tinchodias/fuel-serialization-in-an-example
>>
>>
>> = Current features =
>>
>> - Class shape changes (when a variable has been added or removed, or
>> its index has changed)
>> - Serialization of most of the basic objects
>> - Serialization of (almost) any CompiledMethod
>> - Detection of global or class variables
>> - Support for cyclic object graphs
>> - Tests
>>
>>
>> = Next steps =
>>
>> - Improve version checking.
>> - Optimize performance.
>> - Serialize more kinds of objects:
>> -- Classes with their complete description
>> -- Method contexts
>> -- Active block closures
>> -- Continuations
>> - Some improvements for the user:
>> -- pre and post actions to be executed
>> -- an easy way to say 'this object is a singleton'
>> - Partial loading of a stored graph.
>> - Fast extraction of statistics/brief info from a stored graph.
>> - ConfigurationOfFuel.
>> - The ability to deploy the materialization behavior only (independent of
>> the serialization behavior)
>>
>>
>> = Download =
>>
>> In a Pharo 1.1 or 1.1.1 image, evaluate:
>>
>> Gofer new
>>       squeaksource: 'Fuel';
>>       version: 'Fuel-MartinDias.74';
>>       version: 'FuelBenchmarks-MartinDias.4';
>>       load.
>>
>>
>> = Benchmarks =
>>
>> You can run the benchmarks by executing this line (results appear in the Transcript):
>>
>> FLBenchmarks newBasic run.
>>
>>
>> Thank you!
>> Martin Dias
>>
>
> --
> www.tudorgirba.com
>
> "Sometimes the best solution is not the best solution."
>
>
>


Re: Fuel - a fast object deployment tool

tinchodias
In reply to this post by Igor Stasenko
Hi all,

Thank you for the discussion, it is very interesting for me.

Thanks Adrian. I don't have benchmarks against ImageSegment, so I'd like
to see the numbers. How could I reproduce those benchmarks? It would be
useful for me.

I think ImageSegment is the best choice if it fits the needs. Maybe if in
the future we implement some parts with primitives, the comparison will be
closer. As some people said, Fuel is more similar to Parcels; in fact we
started working based on this paper:
http://scg.unibe.ch/archive/papers/Mira05aParcels.pdf

About whether "objects are loaded much more often than stored": yes... Fuel
is not a universal serialization solution, but I believe it could be
useful in some cases.

Besides speed, I think the "user experience" part of the purpose is also
important ("it is worth spending time while storing in order to have faster
loading and a better user experience"), in the sense of providing features
that make it easier for developers to share their object structures.

Maybe Fuel could give more flexibility than ImageSegment for selecting
which objects of the graph to store and which not. It could let the user
define custom rules.

For example, if FuelFameExtension is installed, when Fuel stores an
object it first looks at the Fame metadescription of that object and then
stores only the non-derived attributes of that object. That was
necessary to export and import Moose models properly.
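
As a sketch of that selection rule, using the public Fame/Moose meta API of
Moose 4.x (the hook on the Fuel side is hypothetical), the attributes worth
storing for an entity can be computed like this:

| entity storable |
entity := FAMIXClass new.
storable := entity mooseDescription allAttributes
        reject: [ :property | property isDerived ].
storable collect: [ :property | property name ]
"=> the names of the non-derived attributes, the only ones worth writing"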

Martin



On Thu, Dec 9, 2010 at 3:58 PM, Igor Stasenko <[hidden email]> wrote:

> 2010/12/9 Levente Uzonyi <[hidden email]>:
>> On Thu, 9 Dec 2010, Stéphane Ducasse wrote:
>>
>>> BTW
>>> when giving feedback, consider that the guy doing this is spending a lot of
>>> time and it will be his master's thesis,
>>> and that the code was not optimized and that there is no dedicated
>>> primitive in play.
>>>
>>> So we will see at the end. I was thinking that our little community
>>> would be much more positive, but
>>> we will continue because we believe that there is some value in this.
>>
>> Don't get me wrong, I'm not saying that Fuel is not useful. I'm saying that
>> improving code loading performance is not that important.
>>
>
> Sort of.
> But what is most important, I think, is that you can exchange objects
> between images.
> MC really allows you to exchange only source code,
> while with Fuel, I think, you could put any object/data into a binary
> package, and not bother with
> inventing elaborate ways to recreate complex (or big) data
> structures from array literals :)
>
> Another interesting aspect of a binary format is that you can give a
> binary to people without
> disclosing the source code... (waving to the corporate world ;)
>
>>
>> Levente
>>
>>>
>>> Stef
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>


Re: Fuel - a fast object deployment tool

Stéphane Ducasse
In reply to this post by Adrian Lienhard
Sure, I understood it like that.
I think that the use case for Fuel is fast loading, like VW parcels.

> Hi Martin,
>>>
>>> I took some application for which we use image segments to test Fuel
>>>
>>> - With Fuel serializing and writing to disk took 330s. File size is 16.1MB
>>> - With image segments saving takes 4s and the file size is 2.4MB
>>>
>>
>> but how are you using ImageSegment?  just the primitive?  because in order
>> to compare it to Fuel, you should write all objects, including
>> "outPointers". So you should use #writeForExportOn:  or similar...
>
> Yes, this includes the outPointers serialized with a ReferenceStream, and writing to disk.
>
> Adrian
>
> BTW, I was just providing the numbers that I gathered when looking at Fuel (to see whether it could be interesting for our use case, to replace image segments). This was not to say that Fuel is not on the right track or anything, but I thought the numbers would be interesting for Martin because they show a real-world use case with a large number of objects. I know that Fuel is at an early stage of development and it doesn't (yet?) have a primitive/plugin to speed things up.
>



Re: Fuel - a fast object deployment tool

Stéphane Ducasse
In reply to this post by Tudor Girba
> Hi Martin,
>
> Nice project.
>
> I noticed that you have a package FuelFameExtension. Is this for the Fame meta engine? If so, I would be interested in testing it, especially since in the context of Moose we load objects significantly more often than we store them :).

:)
We tried to save and load some large Moose models to see if it could be used for that :)

Re: Fuel - a fast object deployment tool

Stéphane Ducasse
In reply to this post by Levente Uzonyi-2
>> It does!
>> It seems that you did not work in VW 2.5 and 3.0; when parcels arrived, loading was realllllly a big difference.
>> I do not see why this would not be the same with Fuel.
>
> No I didn't, but the version number of VW is around 7.x now, so I guess the CPUs and VMs are now several times faster. Does it really matter if it takes 200ms or 20ms to load a package?

When I load a package with MC, so far I still notice it, and I should not even have to think about it.
It should load in an unnoticeable amount of time.

Stef



Re: Fuel - a fast object deployment tool

Stéphane Ducasse
In reply to this post by Levente Uzonyi-2
OK,
even if I disagree.
We could imagine a system where we do not need the compiler to be present in order to load code.

>> BTW
>> when giving feedback, consider that the guy doing this is spending a lot of time and it will be his master's thesis,
>> and that the code was not optimized and that there is no dedicated primitive in play.
>>
>> So we will see at the end. I was thinking that our little community would be much more positive, but
>> we will continue because we believe that there is some value in this.
>
> Don't get me wrong, I'm not saying that Fuel is not useful. I'm saying that improving code loading performance is not that important.
>
>
> Levente
>
>>
>> Stef
>>



Re: Fuel - a fast object deployment tool

Levente Uzonyi-2
In reply to this post by Stéphane Ducasse
On Thu, 9 Dec 2010, Stéphane Ducasse wrote:

>>> It does!
>>> It seems that you did not work in VW2.5 and 3.0 and when parcels arrived loading was realllllly a big difference
>>> I do not see why this would not the same with Fuel.
>>
>> No I didn't, but the version number of VW is around 7.x now, so I guess the CPUs and VMs are now several times faster. Does it really matter if it takes 200ms or 20ms to load a package?
>
> When I load a package with MC so far I still notice it and I would not even think about it.
> It should load in a unnoticeable amount of time.

Do you think MC is "slow" because the Compiler is "slow"?


Levente

>
> Stef
>
>
>

Re: Fuel - a fast object deployment tool

Stéphane Ducasse
>>>> It does!
>>>> It seems that you did not work in VW2.5 and 3.0 and when parcels arrived loading was realllllly a big difference
>>>> I do not see why this would not the same with Fuel.
>>>
>>> No I didn't, but the version number of VW is around 7.x now, so I guess the CPUs and VMs are now several times faster. Does it really matter if it takes 200ms or 20ms to load a package?
>>
>> When I load a package with MC so far I still notice it and I would not even think about it.
>> It should load in a unnoticeable amount of time.
>
> Do you think MC is "slow" because the Compiler is "slow"?

Not totally, but I do not understand why it is obligatory to compile everything all the time.
Then Opal will be slower than the old comcrapiler.

Re: Fuel - a fast object deployment tool

Levente Uzonyi-2
On Thu, 9 Dec 2010, Stéphane Ducasse wrote:

>>>>> It does!
>>>>> It seems that you did not work in VW2.5 and 3.0 and when parcels arrived loading was realllllly a big difference
>>>>> I do not see why this would not the same with Fuel.
>>>>
>>>> No I didn't, but the version number of VW is around 7.x now, so I guess the CPUs and VMs are now several times faster. Does it really matter if it takes 200ms or 20ms to load a package?
>>>
>>> When I load a package with MC so far I still notice it and I would not even think about it.
>>> It should load in a unnoticeable amount of time.
>>
>> Do you think MC is "slow" because the Compiler is "slow"?
>
> not totally but I do not understand why this is obligatory to compile everything all the time.
> Then Opal will be slower than the old comcrapiler.
>
I did a quick test where it took 4294ms to load RoelTyper + OCompletion
from disk (no socket creation, no network latency, etc). These packages
contain 790 methods. 5.1% of the total time was spent for compilation,
that's 219ms. The rest is used by other stuff like:
- loading the files from disk
- writing the source code to disk (sources/changes files)
- evaluating class side #initialize methods
- processing system change notifications

Another 51ms was spent in creating 63 classes and 107ms to install the
CompiledMethods to the classes. So that's 219+51+107 = 377ms for creating
all classes and their methods. The rest is administration which can not be
avoided by binary loading.

Even if binary loading is 10x faster than compiling the code (which I
doubt), then you save 339ms. So it would take only 3917ms to load these
packages. That would save you 7.9% of the total time.
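
For anyone who wants to reproduce this kind of breakdown: wrap the load in a
block, take the wall-clock time with timeToRun, and profile it with
MessageTally (the repository and package below are just examples, reusing
the Gofer snippets from earlier in this thread):

[ Gofer new
        squeaksource: 'Fuel';
        package: 'Fuel';
        load ] timeToRun.

MessageTally spyOn: [
        Gofer new
                squeaksource: 'Fuel';
                package: 'Fuel';
                load ]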


Levente

Re: Fuel - a fast object deployment tool

Stéphane Ducasse
Thanks for the analysis. So we will have to speed that up too.

>>>>>> It does!
>>>>>> It seems that you did not work in VW2.5 and 3.0 and when parcels arrived loading was realllllly a big difference
>>>>>> I do not see why this would not the same with Fuel.
>>>>>
>>>>> No I didn't, but the version number of VW is around 7.x now, so I guess the CPUs and VMs are now several times faster. Does it really matter if it takes 200ms or 20ms to load a package?
>>>>
>>>> When I load a package with MC so far I still notice it and I would not even think about it.
>>>> It should load in a unnoticeable amount of time.
>>>
>>> Do you think MC is "slow" because the Compiler is "slow"?
>>
>> not totally but I do not understand why this is obligatory to compile everything all the time.
>> Then Opal will be slower than the old comcrapiler.
>>
>
> I did a quick test where it took 4294ms to load RoelTyper + OCompletion from disk (no socket creation, no network latency, etc). These packages contain 790 methods. 5.1% of the total time was spent for compilation, that's 219ms. The rest is used by other stuff like:
> - loading the files from disk
> - writing the source code to disk (sources/changes files)
> - evaluating class side #initialize methods
> - processing system change notifications
>
> Another 51ms was spent in creating 63 classes and 107ms to install the CompiledMethods to the classes. So that's 219+51+107 = 377ms for creating all classes and their methods. The rest is administration which can not be avoided by binary loading.
>
> Even if binary loading is 10x faster than compiling the code (which I doubt), then you save 339ms. So it would take only 3917ms to load these packages. That would save you 7.9% of the total time.
>
>
> Levente



Re: Fuel - a fast object deployment tool

Adrian Lienhard
In reply to this post by tinchodias

On Dec 9, 2010, at 16:25 , Martin Dias wrote:

> Hi all,
>
> Thank you for the discussion, is very interesting for me.
>
> Thanks Adrian, I don't have benchmarks with ImageSegment, so I like to
> see the numbers, how could I reproduce that benchmarks? it would be
> useful for me.

This is the essential part of our code (in a subclass of ImageSegment) that writes out our model, starting from the root anObject:

basicSave: anObject
        | stream temp symbolHolder |
        "Keep a strong reference to all Symbols for the duration of the copy."
        symbolHolder := Symbol allSymbols.
        "Copy the object graph reachable from anObject into the segment."
        self
                copyFromRoots: (Array with: anObject)
                sizeHint: self fileSize // 2
                areUnique: true.
        state = #activeCopy
                ifFalse: [ ^ self logger error: 'wrong serializer state' ].
        "Temporarily clear endMarker while the segment is filed out, then restore it."
        temp := endMarker.
        endMarker := nil.
        stream := FileStream forceNewFileNamed: fileName.
        [ stream fileOutClass: nil andObject: self ]
                ensure: [ stream close ].
        endMarker := temp.

Loading is done as follows:

| stream streamContents |
stream := aDirectory oldFileOrNoneNamed: filename.
"Read the segment and its outPointers back in, then install it."
streamContents := stream fileInObjectAndCode.
^ streamContents install

I just timed how long it takes to run these methods.
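
For the record, such a wall-clock timing can be taken with timeToRun; here a
ReferenceStream dump of 10000 points stands in for the real ImageSegment
subclass and model, which are not shown in full above:

| graph stream ms |
graph := (1 to: 10000) collect: [ :i | i @ i ].
ms := [ stream := ReferenceStream fileNamed: 'points.obj'.
        stream nextPut: graph.
        stream close ] timeToRun.
Transcript show: 'Serialization took ', ms printString, ' ms'; cr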

HTH,
Adrian



Re: Fuel - a fast object deployment tool

Adrian Lienhard
In reply to this post by Levente Uzonyi-2
>>> Even if Fuel can be 10x faster, it doesn't really make a difference IMHO.
>>
>> It would be interesting to thoroughly profile MC to figure where it spends all its time (with large projects it gets very very slow, like several minutes to just show the merge diffs between two branches).
>
> I guess those days are over when MC spends minutes doing this, it's at most a few seconds for large packages. The 1.5MB Morphic package of Squeak can be compared to another really old version (changes) in 3 seconds. According to MessageTally 50% of the time is spend in getting the timeStamp for the methods.

Sure, that's for one package. But we don't just put everything in one package; we have something like 20 packages for a large project. 20 x 3s is already a minute. Large ancestries and HTTP repositories also have an influence. I've done several small optimizations (like adding caching) to speed things up. But still, MC speed in our case is measured in minutes rather than seconds :(

Adrian


Re: Fuel - a fast object deployment tool

Tristan Bourgois
In reply to this post by tinchodias


> Hi Martin,
>
> Looks very interesting. Reading the material you posted, the question
> that jumps out at me is this: from the benchmarks, it's clear that
> materializing is faster than serializing, but it's not clear why. How
> does separating the nodes from the edges of the object graph make
> materializing faster?
>
> Thanks,
>
> Colin

Hi!

Separating the nodes from the edges of the object graph permits passing
over the stored byte stream only once.
So first we pass over the nodes and recreate the objects/nodes, and second,
when we pass over the edges, we recreate the links between the objects.

I hope I answered your question :)

Tristan
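
A toy illustration of that two-pass idea (not Fuel's real reader; the node
and edge encodings here are made up): pass 1 allocates one empty instance
per node, pass 2 wires the references using the indexes from the edges
section.

| nodes edges objects |
nodes := { Point. Point. Rectangle }.        "one class per object to recreate"
edges := { 3 -> #(1 2) }.                    "object 3 references objects 1 and 2"

"Pass 1: allocate every object, without touching any reference."
objects := nodes collect: [ :cls | cls basicNew ].

"Pass 2: fill in the references by index."
edges do: [ :each |
        | obj |
        obj := objects at: each key.
        each value doWithIndex: [ :target :slot |
                obj instVarAt: slot put: (objects at: target) ] ].
objects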




Re: Fuel - a fast object deployment tool

tinchodias
In reply to this post by tinchodias
Hi Colin

Another thing to consider is the clusters. I think they are good
because they avoid a lot of overhead. Following the example used in
the slides, to serialize N rectangles Fuel writes:

- in vertexes section:
'Rectangle'
'origin'
'corner'
N

- in edges section: 2*N indexes (references to the origin and corner points)


Without clustering, I think each stored rectangle would need some
header saying that it is an instance of Rectangle.

Martin
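
One rough way to see the clustering payoff on disk is to serialize many
same-shaped objects and inspect the resulting file size (FLSerializer and
this convenience selector are assumed from later Fuel releases):

| rects file size |
rects := (1 to: 1000) collect: [ :i | 0 @ 0 corner: i @ i ].
FLSerializer serialize: rects toFileNamed: 'rects.fuel'.
file := FileStream readOnlyFileNamed: 'rects.fuel'.
size := file size.
file close.
size   "the class header is paid once per cluster, not once per rectangle"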


On Fri, Dec 10, 2010 at 11:03 AM, Tristan Bourgois
<[hidden email]> wrote:

>
>> Hi Martin,
>>
>> Looks very interesting. Reading the material you posted, the question
>> that jumps out at me is this: from the benchmarks, it's clear that
>> materializing is faster than serializing, but it's not clear why. How
>> does separating the nodes from the edges of the object graph make
>> materializing faster?
>>
>> Thanks,
>>
>> Colin
>
> Hi!
>
> Separate the nodes from the edges of the object permit during the
> serialization
> to pass only one time on the byte code.
> So first we pass on the node so we recreate the objects/nodes and second
> when
> we pass on the edges we recreate the links between the objects.
>
> I hope I answered your question :)
>
> Tristan
>


Re: Fuel - a fast object deployment tool

Eliot Miranda-2
In reply to this post by Levente Uzonyi-2
Hi Levente,

2010/12/9 Levente Uzonyi <[hidden email]>
On Thu, 9 Dec 2010, Stéphane Ducasse wrote:

It does!
It seems that you did not work in VW2.5 and 3.0 and when parcels arrived loading was realllllly a big difference
I do not see why this would not the same with Fuel.

No I didn't, but the version number of VW is around 7.x now, so I guess the CPUs and VMs are now several times faster. Does it really matter if it takes 200ms or 20ms to load a package?

When I load a package with MC so far I still notice it and I would not even think about it.
It should load in a unnoticeable amount of time.

Do you think MC is "slow" because the Compiler is "slow"?

not totally but I do not understand why this is obligatory to compile everything all the time.
Then Opal will be slower than the old comcrapiler.


I did a quick test where it took 4294ms to load RoelTyper + OCompletion from disk (no socket creation, no network latency, etc). These packages contain 790 methods. 5.1% of the total time was spent for compilation, that's 219ms. The rest is used by other stuff like:
- loading the files from disk
- writing the source code to disk (sources/changes files)
- evaluating class side #initialize methods
- processing system change notifications

Another 51ms was spent in creating 63 classes and 107ms to install the CompiledMethods to the classes. So that's 219+51+107 = 377ms for creating all classes and their methods. The rest is administration which can not be avoided by binary loading.

Even if binary loading is 10x faster than compiling the code (which I doubt), then you save 339ms. So it would take only 3917ms to load these packages. That would save you 7.9% of the total time.

One of the important features of the VW parcels work is that one does not write the source code to the changes file.  Instead the system has a SourceFileManager that works like a dictionary mapping file indices to source files, so instead of the two-element SourceFileArray one has an arbitrarily large collection of files.  When a parcel file is loaded, its source file is added to the SFM, which returns an index, and then all the file pointers in the methods in the parcel are swizzled to refer to their sources' positions under that file's index.  We changed the format of file pointers to something resembling four floating-point formats, so we can have lots of small files (more space for file indices) and a few large files (more space for file offsets) before one overflows into large integer file pointers.

Now Igor's trailer work makes this approach feasible, but one wouldn't need the funky floating-point format stuff because one could easily allocate, say, 5 bytes to the file pointer: two for the file index, for a maximum of 64k source files, and 3 for the file offset, for a maximum of 16MB of source per parcel source file (I think splitting the 40 bits as 14:26 might be better, but that's details).  One needs to predetermine the size of the file pointer so that the trailer can be modified in place, since changing the length of a trailer means allocating a new compiled method and that will be slow.
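
A toy sketch of that 5-byte layout (16 bits of file index, 24 bits of file
offset; purely illustrative, not the actual CompiledMethod trailer code):

| pack unpack pointer |
pack := [ :fileIndex :offset | (fileIndex bitShift: 24) bitOr: offset ].
unpack := [ :ptr | { ptr bitShift: -24. ptr bitAnd: 16rFFFFFF } ].
pointer := pack value: 42 value: 123456.
unpack value: pointer
"=> {42. 123456} -- source file 42, offset 123456 within that file"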

If this approach is taken how much does that change your analysis?

Note that not writing source to the changes file has ancillary benefits: change recovery is now not polluted with package loads, and the changes file does not grow as packages are added, only as one's own changes are made.  Unloading a package doesn't leave garbage in the changes files.

There are downsides.  Deploying a development image means deploying all the associated parcel source files as well, and for this a platform-independent Filename abstraction really helps.

best
Eliot



Levente


Re: Fuel - a fast object deployment tool

Levente Uzonyi-2
In reply to this post by Adrian Lienhard
On Fri, 10 Dec 2010, Adrian Lienhard wrote:

>>>> Even if Fuel can be 10x faster, it doesn't really make a difference IMHO.
>>>
>>> It would be interesting to thoroughly profile MC to figure where it spends all its time (with large projects it gets very very slow, like several minutes to just show the merge diffs between two branches).
>>
>> I guess those days are over when MC spends minutes doing this, it's at most a few seconds for large packages. The 1.5MB Morphic package of Squeak can be compared to another really old version (changes) in 3 seconds. According to MessageTally 50% of the time is spend in getting the timeStamp for the methods.
>
> Sure, that's for one package. But we don't just put everything in one package; we have like 20 packages for a large project. 20 x 3s is already a minute. Large ancestry and HTTP repository also has an influence. I've done several small optimizations (like adding caching) to speed things up. But still, MC speed in our case is rather measured in minutes than seconds :(

Note that this is not package loading, but viewing the changes. And the 3
seconds is for a 1.5MB package. I doubt you have 20 x 1.5MB packages.

Btw Keith's MC1.5/1.6 is really impressive. With Squeak 3.10.2 (no
closures, no cog, no buffered files, etc) it takes 6.8 seconds to load
RoelTyper + OCompletion. In a current Squeak image that takes 4.3 seconds
with Cog.


Levente

>
> Adrian
>
>


Re: Fuel - a fast object deployment tool

Levente Uzonyi-2
In reply to this post by Eliot Miranda-2
On Fri, 10 Dec 2010, Eliot Miranda wrote:

> Hi Levente,
>
> 2010/12/9 Levente Uzonyi <[hidden email]>
>
>> On Thu, 9 Dec 2010, Stéphane Ducasse wrote:
>>
>>  It does!
>>>>>>> It seems that you did not work in VW2.5 and 3.0 and when parcels
>>>>>>> arrived loading was realllllly a big difference
>>>>>>> I do not see why this would not the same with Fuel.
>>>>>>>
>>>>>>
>>>>>> No I didn't, but the version number of VW is around 7.x now, so I guess
>>>>>> the CPUs and VMs are now several times faster. Does it really matter if it
>>>>>> takes 200ms or 20ms to load a package?
>>>>>>
>>>>>
>>>>> When I load a package with MC so far I still notice it and I would not
>>>>> even think about it.
>>>>> It should load in a unnoticeable amount of time.
>>>>>
>>>>
>>>> Do you think MC is "slow" because the Compiler is "slow"?
>>>>
>>>
>>> not totally but I do not understand why this is obligatory to compile
>>> everything all the time.
>>> Then Opal will be slower than the old comcrapiler.
>>>
>>>
>> I did a quick test where it took 4294ms to load RoelTyper + OCompletion
>> from disk (no socket creation, no network latency, etc). These packages
>> contain 790 methods. 5.1% of the total time was spent for compilation,
>> that's 219ms. The rest is used by other stuff like:
>> - loading the files from disk
>> - writing the source code to disk (sources/changes files)
>> - evaluating class side #initialize methods
>> - processing system change notifications
>>
>> Another 51ms was spent in creating 63 classes and 107ms to install the
>> CompiledMethods to the classes. So that's 219+51+107 = 377ms for creating
>> all classes and their methods. The rest is administration which can not be
>> avoided by binary loading.
>>
>> Even if binary loading is 10x faster than compiling the code (which I
>> doubt), then you save 339ms. So it would take only 3917ms to load these
>> packages. That would save you 7.9% of the total time.
>>
>
> One of the important features of the VW parcels work is that one does not
> write the source code to the changes file.  Instead teh system has a
> SourceFileManager that works like a dictionary mapping file indices to
> source files, so instead of the two element SourceFileArray one has an
> arbitrarily large collection of files.  When a parcel file is loaded its
> source file is added to the SFM which returns an index and then all the file
> pointers in the methods in the parcel are swizzled to refer to their
> sources' position in their files' index.  We changed the format of file
> pointers so something resembling four floating-point formats so we can have
> lots of small files (more space for file indices) and a few large files
> (more space for file offsets) before one overflows into large integer file
> pointers.
>
> Now Igor's trailer work makes this approach feasible but one wouldn't need
> the funky floating-point format stuff because one could easily allocate,
> say, 5 bytes to the file pointer, two for the file index for a maximum of
> 64k source files, and 3 for the file offset for a maximum of 16m of source
> per parcel source file (I think splitting the 40 bits as 14:26 might be
> better but that's details).  One needs to predetermine the size of the file
> pointer so that the trailer can be modified in place since changing the
> length of a trailer means allocating a new comp[iled method and that will be
> slow.
>
> If this approach is taken how much does that change your analysis?

Writing the sources takes 1.6% of the total time according to MessageTally.
A significant amount of time is spent in finalization (12.8%) and GCs (15%).
But note that MessageTally is not reliable nowadays.

>
> Note that not writing source to the changes file has ancilliary benefits;
> change recovery is now not polluted with package loads and the changes file
> does not grow as packages are added, only as one's changes are made.
> Unloading a package doesn't leave garbage in the changes files.
>
> There are downsides.  Deploying a development image means deploying all the
> associated parcel source files as well, and for this a platform-independent
> Filename abstraction really helps.

Thanks for the explanation. I wonder how the previous versions of a method
can be found using parcels.


Levente

>
> best
> Eliot
>
>
>>
>> Levente
>

Re: Fuel - a fast object deployment tool

Mariano Martinez Peck

> If this approach is taken how much does that change your analysis?
>
> But note that MessageTally is not reliable nowadays.

Why? Since when?

> Note that not writing source to the changes file has ancillary benefits;
> change recovery is now not polluted with package loads and the changes file
> does not grow as packages are added, only as one's changes are made.
> Unloading a package doesn't leave garbage in the changes files.
>
> There are downsides.  Deploying a development image means deploying all the
> associated parcel source files as well, and for this a platform-independent
> Filename abstraction really helps.
>
> Thanks for the explanation. I wonder how the previous versions of a method
> can be found using parcels.
>
> Levente


Re: Fuel - a fast object deployment tool

Levente Uzonyi-2
On Fri, 10 Dec 2010, Mariano Martinez Peck wrote:

>> If this approach is taken how much does that change your analysis?
>>>
>>
>
>> But note that MessageTally is not reliable nowadays.
>>
>>
>>
> Why ?  since when ?

It has been like this for a long time, but it seems to be worse with Cog. See
http://bugs.squeak.org/view.php?id=7515 for details.


Levente

>
>
>
>
>
>>
>>> Note that not writing source to the changes file has ancilliary benefits;
>>> change recovery is now not polluted with package loads and the changes
>>> file
>>> does not grow as packages are added, only as one's changes are made.
>>> Unloading a package doesn't leave garbage in the changes files.
>>>
>>> There are downsides.  Deploying a development image means deploying all
>>> the
>>> associated parcel source files as well, and for this a
>>> platform-independent
>>> Filename abstraction really helps.
>>>
>>
>> Thanks for the explanation. I wonder how the previous versions of a method
>> can be found using parcels.
>>
>>
>> Levente
>>
>>
>>> best
>>> Eliot
>>>
>>>
>>>
>>>> Levente
>>>>
>>>
>


Re: Fuel - a fast object deployment tool

Eliot Miranda-2
In reply to this post by Levente Uzonyi-2


2010/12/10 Levente Uzonyi <[hidden email]>
On Fri, 10 Dec 2010, Eliot Miranda wrote:

Hi Levente,

2010/12/9 Levente Uzonyi <[hidden email]>

On Thu, 9 Dec 2010, Stéphane Ducasse wrote:

 It does!
It seems that you did not work in VW2.5 and 3.0 and when parcels
arrived loading was realllllly a big difference
I do not see why this would not the same with Fuel.


No I didn't, but the version number of VW is around 7.x now, so I guess
the CPUs and VMs are now several times faster. Does it really matter if it
takes 200ms or 20ms to load a package?


When I load a package with MC so far I still notice it and I would not
even think about it.
It should load in a unnoticeable amount of time.


Do you think MC is "slow" because the Compiler is "slow"?


not totally but I do not understand why this is obligatory to compile
everything all the time.
Then Opal will be slower than the old comcrapiler.


I did a quick test where it took 4294ms to load RoelTyper + OCompletion
from disk (no socket creation, no network latency, etc). These packages
contain 790 methods. 5.1% of the total time was spent for compilation,
that's 219ms. The rest is used by other stuff like:
- loading the files from disk
- writing the source code to disk (sources/changes files)
- evaluating class side #initialize methods
- processing system change notifications

Another 51ms was spent in creating 63 classes and 107ms to install the
CompiledMethods to the classes. So that's 219+51+107 = 377ms for creating
all classes and their methods. The rest is administration which can not be
avoided by binary loading.

Even if binary loading is 10x faster than compiling the code (which I
doubt), then you save 339ms. So it would take only 3917ms to load these
packages. That would save you 7.9% of the total time.


One of the important features of the VW parcels work is that one does not
write the source code to the changes file.  Instead teh system has a
SourceFileManager that works like a dictionary mapping file indices to
source files, so instead of the two element SourceFileArray one has an
arbitrarily large collection of files.  When a parcel file is loaded its
source file is added to the SFM which returns an index and then all the file
pointers in the methods in the parcel are swizzled to refer to their
sources' position in their files' index.  We changed the format of file
pointers so something resembling four floating-point formats so we can have
lots of small files (more space for file indices) and a few large files
(more space for file offsets) before one overflows into large integer file
pointers.

Now Igor's trailer work makes this approach feasible but one wouldn't need
the funky floating-point format stuff because one could easily allocate,
say, 5 bytes to the file pointer, two for the file index for a maximum of
64k source files, and 3 for the file offset for a maximum of 16m of source
per parcel source file (I think splitting the 40 bits as 14:26 might be
better but that's details).  One needs to predetermine the size of the file
pointer so that the trailer can be modified in place since changing the
length of a trailer means allocating a new comp[iled method and that will be
slow.

If this approach is taken how much does that change your analysis?

Writing the sources takes 1.6% of total time according to MessageTally. Significant amount of time is spent in finalization (12.8%) and GCs (15%). But note that MessageTally is not reliable nowadays.



Note that not writing source to the changes file has ancilliary benefits;
change recovery is now not polluted with package loads and the changes file
does not grow as packages are added, only as one's changes are made.
Unloading a package doesn't leave garbage in the changes files.

There are downsides.  Deploying a development image means deploying all the
associated parcel source files as well, and for this a platform-independent
Filename abstraction really helps.

Thanks for the explanation. I wonder how the previous versions of a method can be found using parcels.

I hacked a dreadful implementation of overrides in vw3.0 and I don't think things are much better now.  But in http://www.mail-archive.com/pharo-project@.../msg17714.html I sketched how I think it should be done:

- Maintain a global package load order (a stack of loaded packages, removing interior elements on unload).
- Maintain a dictionary from method reference to a set of package/method pairs for each method that is overridden.
- When a package is removed, search the overrides and compute the overridden methods to be reinstalled, taking the uppermost method according to the new package order.

So to answer your question, one finds the previous versions directly in the overrides dictionary, and sorts the results according to the current package load order.
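
A minimal sketch of that bookkeeping (all names are made up): a load-order
list plus an overrides dictionary keyed by method reference; on unload, the
surviving override whose package sits highest in the load order wins.

| loadOrder overrides survivorFor |
loadOrder := OrderedCollection withAll: #(Kernel PatchesA PatchesB).
overrides := Dictionary new.
overrides
        at: #'Object>>printString'
        put: { #PatchesA -> '<method from PatchesA>'. #PatchesB -> '<method from PatchesB>' }.

"On unload, reinstall the remaining override whose package was loaded last."
survivorFor := [ :methodRef :unloadedPackage |
        ((overrides at: methodRef ifAbsent: [ #() ])
                reject: [ :pair | pair key = unloadedPackage ])
                detectMax: [ :pair | loadOrder indexOf: pair key ] ].
survivorFor value: #'Object>>printString' value: #PatchesB
"=> #PatchesA -> '<method from PatchesA>'"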

best
Eliot


Levente


best
Eliot



Levente
