Hi all, I'm just wondering. |
Thierry Goubier wrote:
> Hi all,
>
> I'm just wondering.
>
> Would it work to have a package format based on Fuel?
>
> Would that make loading faster?
>
> Does it already exist?
>
> Thanks,
>
> Thierry

What is your use case for needing it faster?
* Just every day developer use?
* CI automation?
* Something in production?
* Other?

cheers -ben |
In reply to this post by Thierry Goubier
Hi Thierry,
On Thu, Dec 4, 2014 at 2:31 PM, Thierry Goubier <[hidden email]> wrote:
It doesn't have to be based on Fuel. Monticello .mcz's are just zips that contain any number of files:

    Archive:  trunkpackages/Kernel.spur-eem.866.mcz
      Length     Date   Time    Name
     --------    ----   ----    ----
           15  08-07-14 09:12   package
       294780  08-07-14 09:12   version
      1370995  08-07-14 09:12   snapshot/source.st
      1442870  08-07-14 09:12   snapshot.bin
     --------                   -------
      3108660                   4 files

Right now package and version are just textual, and you've been complaining about the parsing speed. It would be easy to extend the package to include e.g. version.fuel. It is also a huge advantage to keep both the .bin (a not-as-fast binary loading format, similar in use to Fuel but more primitive) alongside the source code, because the source can be extracted without a running system, and can be compiled if, for some reason, binary loading fails. So instead of changing the format you could simply augment it with a snapshot.fuel, modifying the writer to include it and the reader to use the .fuel if it exists.
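To make the "use the .fuel if it exists" idea concrete, here is a minimal sketch in Python (not Pharo code, and not how Monticello's reader is actually written). It only shows the fallback order over the members of the zip. The member name `snapshot.fuel` is the hypothetical addition discussed above, and `pick_snapshot` and `make_mcz` are made-up helper names.

```python
import io
import zipfile

# Preference order: fastest binary format first, plain source as the last resort.
# "snapshot.fuel" is the hypothetical extra member proposed in this thread; the
# other two names are the ones that appear in the archive listing.
SNAPSHOT_PREFERENCE = ("snapshot.fuel", "snapshot.bin", "snapshot/source.st")


def pick_snapshot(archive: zipfile.ZipFile) -> str:
    """Return the name of the best snapshot member present in the archive."""
    members = set(archive.namelist())
    for name in SNAPSHOT_PREFERENCE:
        if name in members:
            return name
    raise KeyError("no snapshot member found")


def make_mcz(member_names) -> zipfile.ZipFile:
    """Build a tiny in-memory stand-in for an .mcz (contents are dummy bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, b"placeholder")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


# An archive as written today, and one augmented with the extra member.
old_style = make_mcz(["package", "version", "snapshot/source.st", "snapshot.bin"])
augmented = make_mcz(
    ["package", "version", "snapshot/source.st", "snapshot.bin", "snapshot.fuel"]
)

chosen_old = pick_snapshot(old_style)   # falls back to the existing binary
chosen_new = pick_snapshot(augmented)   # prefers the new member when present
```

Because older readers would simply ignore a member they do not know about, augmenting the archive this way keeps existing .mcz files and tools working.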
Yes. But not hugely because the .bin already loads faster than the .st. You might do some profiling?
Much of what you need is there. Use the Pharo, Luke.
best,
Eliot |
2014-12-05 0:08 GMT+01:00 Eliot Miranda <[hidden email]>:
Yes, or use an entirely different format. Whatever works. FileTree demonstrates that this can be done, and GitFileTree demonstrated that the mcz view can be fully reproduced on top of another repository technology. I'm more annoyed by the consequences of loading the full version file into the image.
Good suggestion. This is what I wanted to know.
This is a part of Pharo where I know my way :) Thanks, Thierry
|
In reply to this post by Ben Coman
2014-12-04 23:43 GMT+01:00 Ben Coman <[hidden email]>:
The first three. Anything which improves loading speed / reduces memory footprint, really.
Maybe the point to target is the version metadata. Thierry |
In reply to this post by Thierry Goubier
> On 04.12.2014 at 23:31, Thierry Goubier <[hidden email]> wrote:
>
> Hi all,
>
> I'm just wondering.
>
> Would it work to have a package format based on Fuel?
>

I doubt it would work cross-platform. I don't know how Fuel serializes WideString, LargePositiveInteger, BoxedFloat64. These differ between Smalltalk platforms. The source as a string solves that: strings are written as Unicode strings, numbers in a defined number format, etc.

Norbert

> Would that make loading faster?
>
> Does it already exist?
>
> Thanks,
>
> Thierry |
> On 05 Dec 2014, at 08:02, Norbert Hartl <[hidden email]> wrote:
>
>> On 04.12.2014 at 23:31, Thierry Goubier <[hidden email]> wrote:
>>
>> Hi all,
>>
>> I'm just wondering.
>>
>> Would it work to have a package format based on Fuel?
>>
> I doubt it would work cross platform. I don't know how fuel serializes WideString, LargePositiveInteger, BoxedFloat64. These differ between smalltalk platforms. The source as string solves that. Strings are written as unicode string and numbers as certain number format etc.

True. Cross-dialect loading isn’t something we encourage with Fuel.

@Thierry You said something about partial loading: that will never be possible with Fuel because of its pickle format. To select any partial graph you first have to read the entire file and rebuild the graph (OK, you don’t strictly have to build the graph, but you still need to read the entire file).

>
> Norbert
>
>> Would that make loading faster?
>>
>> Does it already exist?
>>
>> Thanks,
>>
>> Thierry
> |
2014-12-05 8:35 GMT+01:00 Max Leske <[hidden email]>:
Hum. I was thinking of partial loading at the application level, not at the marshalling level. The process of cutting out the version ancestry of Monticello packages in Pharo releases comes to mind as a ... how should I say that ... We can/should do better (additionally, because if the mcz isn't available somewhere, having its ref is useless).

Thierry
|
I will summon Martin and Mariano here :).

Introducing Fuel in the loading infrastructure has so far had, AFAIK, two different experimental setups:
- Tanker: a package completely written in Fuel
- mixing Tanker/Fuel with Monticello

However, as I remember, the results were not so promising. Apparently, when loading a package, most of the time is spent not in deserialization/recompilation but in updating the system (system dictionary, categories, RPackages, the corresponding subclass relationships).

Just try the following: load a Monticello package with (a) no Nautilus open, and with (b) 10 Nautilus open.

I'll let Martin and Mariano give the details :).

On Fri, Dec 5, 2014 at 9:28 AM, Thierry Goubier <[hidden email]> wrote:
|
2014-12-05 10:55 GMT+01:00 Guillermo Polito <[hidden email]>:
Oh, I missed that one, yes. I remember the name.
Sort of what I'm thinking about.
I profiled AltBrowser for that use case... It's a bit more vulnerable to that than Nautilus because you may see more of the structure at a given time than in Nautilus (such as having a single browser watching the methods of two classes). Still, once the browser is a bit optimised, MC loading time dominates. But I haven't looked too closely at exactly what takes the time; I was focused on getting the browser part as small as possible.

If anybody has data about the various costs, I'd be interested.

Thierry |
In reply to this post by Guillermo Polito
On Fri, Dec 5, 2014 at 6:55 AM, Guillermo Polito <[hidden email]> wrote:
This one is the one I explain below.
I think here you mean an experiment where we serialized the little thing MC currently serializes in the data file... We found that that serialization/materialization is INSIGNIFICANT in the export/import process.

Ok... so... Indeed: Tanker. Fuel is able to serialize methods, classes, etc. (but not by default). So what Tanker does is kind of serialize a whole package (with internal classes, extension methods, etc.) as a Fuel file. Then, at import, it materializes and runs a lot of stuff besides the materialization itself.

The idea was NOT to use the Compiler at all at import time, because that is/was one of the bottlenecks when importing packages with Metacello. The compiler was not needed most of the time, so performance was much better. However, there were cases (like instVars changed in superclasses of the classes we are importing) where we needed to recompile the stored methods, because otherwise the instVar offsets were wrong. This COULD be solved with the intermediate representation of Opal, avoiding a normal compilation. However, I don't remember us doing that. I think we ended up using the compiler for that scenario. But I am not sure.
Exactly. The ones we found that took most of the time were:
1) Compiling (could be solved with Fuel, and even more so if we can fix some scenarios with Opal).
2) Notification of created methods/classes. One way to solve this is by having bulk notifications where we notify a list rather than a single object. But we should also adapt the observers.
3) #become: when updating/migrating existing instances.

But... just to show that there is light at the end of the tunnel, I showed at... mmm ESUG or PharoConf or... how to load the WHOLE of Seaside in 10 seconds... which on my machine takes like 20 minutes.

Tanker was originally called "FuelPackageLoader" and was then renamed.

Some useful links with details:
Exactly!!! Sometimes the slowdown of the notifications is not in the observable but in the observers ;)
|
On Fri, Dec 5, 2014 at 10:07 AM, Mariano Martinez Peck <[hidden email]> wrote:
I forgot to say... this one could be improved once the new VM Eliot is working on has the lazy become. So... adapting Tanker to use Opal to avoid compilation in special cases, having a bulk notification and updating a few observers (like the test runner or Nautilus), and a lazy become... that would bring an incredible change.
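As a rough illustration of the bulk notification point, here is a small Python sketch (not the Pharo announcement API). It contrasts one notification per created method with one notification carrying the whole list. `Announcer`, `Browser` and the method names are hypothetical stand-ins, and the "refresh" counter stands for whatever work an observer such as Nautilus or the test runner does on each notification.

```python
class Announcer:
    """Minimal stand-in for a system announcer; not the Pharo API."""

    def __init__(self):
        self.single_observers = []  # callbacks invoked once per added method
        self.bulk_observers = []    # callbacks invoked once per batch

    def method_added(self, method):
        for observer in self.single_observers:
            observer(method)

    def methods_added(self, methods):
        batch = list(methods)
        for observer in self.bulk_observers:
            observer(batch)


class Browser:
    """Hypothetical observer that counts how often it has to refresh."""

    def __init__(self):
        self.refreshes = 0

    def on_method_added(self, method):
        self.refreshes += 1  # one refresh for every single method

    def on_methods_added(self, methods):
        self.refreshes += 1  # one refresh for the whole batch


methods = [f"Foo>>bar{i}" for i in range(500)]
announcer = Announcer()

# Per-item notification: the observer does its work 500 times.
per_item = Browser()
announcer.single_observers.append(per_item.on_method_added)
for method in methods:
    announcer.method_added(method)

# Bulk notification: the observer is adapted to take a list and works once.
bulk = Browser()
announcer.bulk_observers.append(bulk.on_methods_added)
announcer.methods_added(methods)
```

The saving only materialises if the observers are adapted to handle a list in one pass, which is the "we should also adapt the observers" caveat above.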
|
> On 05 Dec 2014, at 14:09, Mariano Martinez Peck <[hidden email]> wrote:
>
> Mariano
> http://marianopeck.wordpress.com

Hey Mariano,

Welcome back! (Not that you left, but I read your latest blog entry.)

We miss(ed) your expertise and your work.

Regards,

Sven |
In reply to this post by Mariano Martinez Peck
2014-12-05 14:07 GMT+01:00 Mariano Martinez Peck <[hidden email]>:
Ok.
What is the state of that for, say, Pharo 4?
They can be adapted. You can also pile up notifications (well, you can't at the moment, but it would be nice) and have a way to merge them. Such optimisations are also useful for heavy code generators such as SmaCC.
Which is what I wanted to know :)
Thanks for the links!
When the observers appear in the profile, you can optimize that. One way is to have a single central point (i.e. whatever the number of Nautilus instances open, you send to a singleton), then memoize all the queries you make on RPackage and friends before firing notifications again for the views (relevant package, class, etc.).
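A minimal Python sketch of that arrangement, under stated assumptions: `NotificationHub` is a made-up name, `package_of` stands in for an expensive RPackage-style query, and the views are plain callbacks. It is not how Nautilus or AltBrowser is implemented.

```python
class NotificationHub:
    """Single central point: system events come here once, views hang off it."""

    def __init__(self, package_of):
        self._package_of = package_of  # expensive query (hypothetical stand-in)
        self._cache = {}               # memoized class name -> package
        self.views = []                # callbacks: (package, class_name, selector)
        self.lookups = 0               # how many times the expensive query ran

    def _package(self, class_name):
        if class_name not in self._cache:
            self.lookups += 1
            self._cache[class_name] = self._package_of(class_name)
        return self._cache[class_name]

    def method_changed(self, class_name, selector):
        # Resolve the package once, then fan out to every open view.
        package = self._package(class_name)
        for view in self.views:
            view(package, class_name, selector)

    def invalidate(self):
        """Drop memoized answers, e.g. after classes move between packages."""
        self._cache.clear()


def slow_package_lookup(class_name):
    # Pretend this walks all packages; here it is a fixed toy mapping.
    return {"Foo": "Pkg-Core", "FooTest": "Pkg-Tests"}[class_name]


hub = NotificationHub(slow_package_lookup)

received = [[] for _ in range(10)]  # ten open browsers
for inbox in received:
    hub.views.append(lambda pkg, cls, sel, inbox=inbox: inbox.append((pkg, cls, sel)))

for selector in ("bar", "baz", "qux"):
    hub.method_changed("Foo", selector)
```

With ten views and three changes to the same class, the expensive lookup runs once instead of thirty times; the price is having to invalidate the cache when the package structure itself changes.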
In short, you have the thing. Thierry |
In reply to this post by Max Leske
On Thu, Dec 4, 2014 at 11:35 PM, Max Leske <[hidden email]> wrote:
Hi Max, you misunderstand what partial loading means in this context. I designed and implemented partial loading for the Parcel system in VW, and Fuel is essentially a clean reimplementation of parcels. The idea is that one does indeed read the entire object graph, but then only the parts of the graph that mate with the current class hierarchy are installed, and the bits that don't fit are stored for later.

Why? So that a component can define extensions on classes in components that may not be present. Why? So that one can maintain a single logical component in a single package instead of decomposing it into independently loadable fragments.

Let's take an example like Fuel itself. This may have specialised marshalling and unmarshalling extensions defined on many classes, some of which may be to do with the GUI. If we have partial loading we can load Fuel into a headless image. The extensions on the GUI classes will simply not be installed, *until* we load the GUI. Hence Fuel does not have to be decomposed every time we factor the system into subcomponents.

Without partial loading, Fuel must be decomposed into a series of fragments so that it can load the part that fits into the headless base, and we have to manage the dependency to ensure the fragment that loads against the GUI is loaded when the GUI is loaded. Worse still, if we cut the GUI into two (e.g. development vs deployment tools) we have to visit the Fuel GUI component (and potentially many other components) and decompose it into two pieces. In practice this is extremely costly to maintain, verging on chaos. With partial loading things are simple. Perhaps I should have called it partial installation, but you get the idea.
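For readers who have not seen Parcels, the mechanism can be sketched in a few lines of Python. This is only an illustration of the idea, not the Parcel or Fuel implementation; `PartialInstaller` and the class and selector names used below are hypothetical. The whole graph is assumed to have been read already; only installation is partial.

```python
class PartialInstaller:
    """Install what fits the current class hierarchy, keep the rest for later."""

    def __init__(self, present_classes):
        self.present = set(present_classes)
        self.installed = []  # (class_name, selector) pairs now live in the image
        self.pending = {}    # class_name -> selectors waiting for that class

    def load(self, extensions):
        """extensions: (class_name, selector) pairs from a fully read package."""
        for class_name, selector in extensions:
            if class_name in self.present:
                self.installed.append((class_name, selector))
            else:
                # The target class is absent: store the extension, don't fail.
                self.pending.setdefault(class_name, []).append(selector)

    def class_added(self, class_name):
        """Called when another component (e.g. the GUI) is loaded later."""
        self.present.add(class_name)
        for selector in self.pending.pop(class_name, []):
            self.installed.append((class_name, selector))


# A headless image: no GUI classes yet.
image = PartialInstaller(["Object", "String"])

# One logical package with extensions on both a base class and a GUI class.
image.load([("String", "serializeOn:"), ("Morph", "serializeOn:")])
after_load_installed = list(image.installed)
after_load_pending = dict(image.pending)

# Later the GUI is loaded; the stored extension is installed at that point.
image.class_added("Morph")
```

The package author never has to split the package by target component; the loader decides at install time what fits.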
best,
Eliot |
On Sat, Dec 6, 2014 at 12:37 AM, Eliot Miranda <[hidden email]> wrote:
Eliot, I do agree with all you said. But let me add a few bits to reduce misunderstandings.

We consider Fuel itself completely isolated from whatever is exporting/importing a package. In fact, as you know, by default Fuel serializes classes and methods "globally". So... all this discussion should actually be at another level, say the Tanker level (or whatever tool we build on top of Fuel to manage packages). This is a difference from Parcels, which, even if it was also able to serialize regular object graphs, was more tightly tied to packages.

In addition, as you said, Max is right in the sense that at the FUEL level the whole graph has to be materialized. IN THE CASE OF TANKER, or whatever tool for packages, we may NOT install all parts, as you said. But they will indeed be materialized.

So... to sum up, I think the clearer way of saying it is:
- Fuel has to materialize the whole graph anyway (at least as it is now).
- Tanker, or whatever tool for managing packages, will also have to materialize everything, but should be able to "install" only a part of it.

The example you gave about Fuel is very clear. So... I think we all agree. It's just that we use different naming.

Best regards,
|
In reply to this post by Eliot Miranda-2
Makes sense. It’s just that I often get asked if it is possible to read / update only parts of a Fuel file, so I wanted to make it clear that that can’t work.
Thanks for the explanation.
|
In reply to this post by Thierry Goubier
There was a Tanker prototype.
Levente did an analysis a long time ago, and he mentioned that compilation was not that slow. Still, I would love to have a binary format (Fuel) as part of the mcz, because that was the goal when we started Fuel (I remember being shocked by the speed of Parcels: 30 min to load RB in VW25 and 3 seconds in VW 30).

Stef

> Hi all,
>
> I'm just wondering.
>
> Would it work to have a package format based on Fuel?
>
> Would that make loading faster?
>
> Does it already exist?
>
> Thanks,
>
> Thierry |
In reply to this post by Ben Coman
> What is your use case for needing it faster?
> * Just every day developer use?
> * CI automation?
> * Something in production?

Everything :)

> cheers -ben
>
> |
In reply to this post by Mariano Martinez Peck
Thanks Mariano.
If we could optimize the common case (no change in superclass shape) and get that one fast, it would be so great.

Stef |