A new version of Monticello was added to project The Inbox:
http://source.squeak.org/inbox/Monticello-mva.667.mcz

==================== Summary ====================

Name: Monticello-mva.667
Author: mva
Time: 6 April 2017, 8:41:09.494386 pm
UUID: 0075cba1-70ff-4e10-9be6-0c01f85dc85a
Ancestors: Monticello-eem.666

New style diffy version (*.mcd): Prune ancestors version infos in the info of the base of the diff when writing a diffy version. Graft them back from the diff's base version when reading a diffy version unless base info already has ancestors (old-style diffy version with complete version history info) in which case leave them alone.

=============== Diff against Monticello-eem.666 ===============

Item was added:
+ ----- Method: MCMcdReader>>loadVersionInfo (in category 'loading') -----
+ loadVersionInfo
+ 	| baseInfo |
+ 	super loadVersionInfo.
+ 	baseInfo := self baseInfo.
+ 	info graftAncestorsTo: baseInfo from:
+ 		(MCRepositoryGroup default versionWithInfo: baseInfo) info!

Item was added:
+ ----- Method: MCMcdWriter>>writeVersion: (in category 'visiting') -----
+ writeVersion: aVersion
+ 	self writeFormat.
+ 	self writePackage: aVersion package.
+ 	self writeVersionInfo:
+ 		(aVersion info veryDeepCopy
+ 			pruneAncestorsFrom: aVersion baseInfo).
+ 	self writeDefinitions: aVersion.
+ 	aVersion dependencies do: [:ea | self writeVersionDependency: ea]!

Item was added:
+ ----- Method: MCVersionInfo>>ancestors: (in category 'accessing') -----
+ ancestors: anObject
+ 	ancestors := anObject!

Item was added:
+ ----- Method: MCVersionInfo>>graftAncestorsTo:from: (in category 'copying') -----
+ graftAncestorsTo: aBaseVersionInfo from: aVersionInfo
+ 	(self allAncestors select: [:e | e = aBaseVersionInfo])
+ 		do: [:e | e ancestors isEmpty ifTrue: [e ancestors: aVersionInfo ancestors]]!

Item was added:
+ ----- Method: MCVersionInfo>>pruneAncestorsFrom: (in category 'copying') -----
+ pruneAncestorsFrom: aBaseVersionInfo
+ 	(self allAncestors select: [:e | e = aBaseVersionInfo])
+ 		do: [:e | e ancestors: #()]
+
+ !
Hi,
> On 06.04.2017, at 21:07, [hidden email] wrote:
>
> A new version of Monticello was added to project The Inbox:
> http://source.squeak.org/inbox/Monticello-mva.667.mcz
>
> ==================== Summary ====================
>
> Name: Monticello-mva.667
> Author: mva
> Time: 6 April 2017, 8:41:09.494386 pm
> UUID: 0075cba1-70ff-4e10-9be6-0c01f85dc85a
> Ancestors: Monticello-eem.666
>
> New style diffy version (*.mcd): Prune ancestors version infos in the info of the base of the diff when writing a diffy version. Graft them back from the diff's base version when reading a diffy version unless base info already has ancestors (old-style diffy version with complete version history info) in which case leave them alone.

Might the submitter want to explain their idea? :)

Best regards
	-Tobias
Hi Tobias,
Thanks for asking.

So what is this Monticello-mva.667.mcz good for?

The idea is...
... let's make mcd files smaller.
As small as they can be.

Only containing information relevant to the diff.

No more. No less.

And they can be. A lot smaller.

An order of magnitude smaller compared with the in-trunk version.
Think 8K instead of 80K for an mcd with one-line changes.

If you have followed the instructions in
http://lists.squeakfoundation.org/pipermail/squeak-dev/2017-April/194029.html
to get a current squeak6.0 alpha, you will have seen files like
Collections-eem.743(ul.742).mcd in your package-cache directory.

If you look at their sizes, you will notice that they are much smaller than
regular mcz files.

For example:
A standard snapshot mcz, Collections-ul.742.mcz, is 485K.
A diff mcd, Collections-eem.743(ul.742).mcd, is only 84K.

How do you create such files?

Select a version in a Repository Browser, click Diff, select the
version against which the diff should be made, and you get a 'diffy version'.
If you now click 'Copy' and copy it to a different directory repository,
an mcd file will be stored there, not an mcz file.

Or, if you yellow-click a directory repository in the Monticello Browser and
select 'store diffs', then whenever you select a version in the Repository
Browser and click 'Copy' to copy it to that directory repository, the version
will be stored there as an mcd file.

But what if I told you that this Collections-eem.743(ul.742).mcd could have
been even smaller? A lot smaller. An order of magnitude smaller.
Not 84K. Only 4.9K.
With no loss of information?

I have that converted version sitting on my disk right now.

It was written out with this modification to Monticello:
http://forum.world.st/The-Inbox-Monticello-mva-667-mcz-tt4941466.html
http://source.squeak.org/inbox/Monticello-mva.667.mcz

How could it be so small?

Well, by trimming the information stored in the 'version' file in the mcd
zip archive.

This information grows over time as new versions are added and commit
comments written. If not trimmed, it will gradually take up a significant
portion of the file's size. Especially for small changes. One-liners. And
there's no real need to store it all in each mcd file.

No information is lost, because the information that is trimmed is
readily available in the Monticello version against which the diff mcd was
made.

So the trick is to trim on writing. And attach back from the base version on
reading.

Disk space and network bandwidth are saved.

The version's history appears the same as before, when all that redundant
information was saved in the mcd archive.

So you can write out much smaller mcd files with this modification.

Read them back in and the system will not know the difference.

And if you have old mcd files sitting around with full version info history,
they are read in as before with no surprises.

Does it make sense, or which part needs better explanation?

Best Regards,

Milan Vavra
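The trim-on-write, graft-on-read round trip described above can be tried in a workspace. This is only a sketch under assumptions: it presumes Monticello-mva.667 is loaded (so that MCVersionInfo>>pruneAncestorsFrom: and MCVersionInfo>>graftAncestorsTo:from: exist), and the package name 'Demo', the author initials 'xyz', and the three hand-built version infos are made up for illustration. In the real reader the second argument to the graft comes from `(MCRepositoryGroup default versionWithInfo: baseInfo) info`; here the in-memory base info stands in for it.

```smalltalk
"Workspace sketch. Assumes Monticello-mva.667 is loaded.
 'Demo' and 'xyz' are hypothetical names used only for this example."
| mk v100 v101 v102 onDisk |
mk := [:name :parents |
	MCVersionInfo
		name: name
		id: UUID new
		message: 'demo'
		date: Date today
		time: Time now
		author: 'xyz'
		ancestors: parents].
v100 := mk value: 'Demo-xyz.100' value: #().
v101 := mk value: 'Demo-xyz.101' value: {v100}.
v102 := mk value: 'Demo-xyz.102' value: {v101}.

"Writing Demo-xyz.102(xyz.101).mcd: prune a deep copy, never the live info."
onDisk := v102 veryDeepCopy.
onDisk pruneAncestorsFrom: v101.

"Inspect these: the copy's 101 should have lost its ancestors,
 while the live v102 should still reach 100 through 101."
onDisk ancestors first ancestors.
v102 ancestors first ancestors.

"Reading the mcd back: graft 101's ancestors from the full base info."
onDisk graftAncestorsTo: v101 from: v101.
onDisk ancestors first ancestors.
```

The deep copy matters because the writer would otherwise empty the ancestry of the version info that is still in use in the image.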
I haven't looked at it, but would like to ask if you've tested when
you have multiple .mcd's in succession? Like, if you have,

  Kernel-cmm.100.mcz
  Kernel-cmm.101.mcd
  Kernel-cmm.102.mcd

Does 102 need to have ancestry at least back to 101 (or, 100?) still stored?
Chris Muller wrote:
> I haven't looked at it, but would like to ask if you've tested when
> you have multiple .mcd's in succession? Like, if you have,
>
>   Kernel-cmm.100.mcz
>   Kernel-cmm.101.mcd
>   Kernel-cmm.102.mcd
>
> Does 102 need to have ancestry at least back to 101 (or, 100?) still stored?

Assuming we have

  Kernel-cmm.100.mcz
  Kernel-cmm.101(cmm.100).mcd
  Kernel-cmm.102(cmm.101).mcd

then yes, 102 needs to have ancestry going back to 101. But no further.
No need to go beyond 101. Ancestry from 101 onward (that is, 100 and earlier) can be trimmed.

So when writing

  Kernel-cmm.102(cmm.101).mcd

the ancestry of Kernel-cmm.101 can be trimmed. In the file we are saving, not
in the system; that's why we need a #veryDeepCopy of the ancestry before
we trim it.

And when reading

  Kernel-cmm.102(cmm.101).mcd

the ancestry of Kernel-cmm.101 can be re-attached so that Kernel-cmm.102's
version info looks the same as it did when we were writing
Kernel-cmm.102(cmm.101).mcd, before the trimming.

Best Regards,

Milan Vavra
On Fri, Apr 7, 2017 at 2:13 PM, Milan Vavra via Squeak-dev <[hidden email]> wrote:
> Chris Muller wrote:
>> I haven't looked at it, but would like to ask if you've tested when
>> you have multiple .mcd's in succession? Like, if you have,
>>
>>   Kernel-cmm.100.mcz
>>   Kernel-cmm.101.mcd
>>   Kernel-cmm.102.mcd
>>
>> Does 102 need to have ancestry at least back to 101 (or, 100?) still
>> stored?
>>
> Assuming we have
>   Kernel-cmm.100.mcz
>   Kernel-cmm.101(cmm.100).mcd
>   Kernel-cmm.102(cmm.101).mcd
> then yes, 102 needs to have ancestry going back to 101. But no further.
> No need to go beyond 101. Ancestry from 101 onward can be trimmed.
> So when writing
>   Kernel-cmm.102(cmm.101).mcd
> ancestry of Kernel-cmm.101 can be trimmed. In the file we are saving, not
> in the system, that's why we need a #veryDeepCopy of the ancestry before
> we trim it.

So it reduces redundancy and disk utilization, with the trade-off
being that it must re-open the original .mcz in order to get that
ancestry back into memory.

That read should be done eagerly, otherwise the system would interpret
the empty ancestry as simply no ancestry.

> And when reading
>   Kernel-cmm.102(cmm.101).mcd
> ancestry of Kernel-cmm.101 can be re-attached so that Kernel-cmm.102's
> version info looks the same as it did when we were writting the
> Kernel-cmm.102(cmm.101).mcd before the trimming.

You said, "can be", but I think it should do it eagerly to avoid
unintended consequences. If we don't open the original .mcz eagerly,
then I think we would need to terminate the ancestry with some kind of
"reference stub" instead of an empty Array.
Chris Muller wrote:
>> So when writing
>>   Kernel-cmm.102(cmm.101).mcd
>> ancestry of Kernel-cmm.101 can be trimmed. In the file we are saving, not
>> in the system, that's why we need a #veryDeepCopy of the ancestry before
>> we trim it.
>
> So it reduces redundancy and disk utilization, with the trade-off
> being that it must re-open the original .mcz in order to get that
> ancestry back into memory.

That is correct. See below.

> That read should be done eagerly, otherwise the system would interpret
> the empty ancestry as simply no ancestry.

Good point. The original mcz is being opened. Albeit indirectly and behind
the scenes.

The code

	MCRepositoryGroup default versionWithInfo: baseInfo

basically does that. What it does is ask the system: 'in any of the
repositories known to you, look for a version with this UUID and return
it to me'.

We then attach its ancestors to our newly read version info at points where
the base version info is referenced.

>> And when reading
>>   Kernel-cmm.102(cmm.101).mcd
>> ancestry of Kernel-cmm.101 can be re-attached so that Kernel-cmm.102's
>> version info looks the same as it did when we were writting the
>> Kernel-cmm.102(cmm.101).mcd before the trimming.
>
> You said, "can be", but I think it should do it eagerly to avoid
> unintended consequences. If we don't open the original .mcz eagerly,
> then I think we would need to terminate the ancestry with some kind of
> "reference stub" instead of an empty Array.

I said "can be" but what I really meant is "is being re-attached".

Best Regards,

Milan Vavra
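The eager lookup discussed here depends on the base version being present in some known repository. One way to make a missing base fail with a clear message is sketched below. This is a hypothetical variant of MCMcdReader>>loadVersionInfo, not part of Monticello-mva.667, and it rests on the assumption that `versionWithInfo:` answers nil when no repository holds the requested version.

```smalltalk
loadVersionInfo
	"Hypothetical variant, NOT the code submitted in Monticello-mva.667.
	 Assumption: MCRepositoryGroup>>versionWithInfo: answers nil when
	 no known repository contains the base version."
	| baseInfo base |
	super loadVersionInfo.
	baseInfo := self baseInfo.
	base := MCRepositoryGroup default versionWithInfo: baseInfo.
	base
		ifNil: [self error: 'Base version ', baseInfo name, ' not found; ancestry cannot be restored']
		ifNotNil: [:found | info graftAncestorsTo: baseInfo from: found info]
```

Signalling an error here is a design choice: it keeps a pruned, empty ancestry from being silently taken for a genuine "no ancestors" history.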
Chris Muller wrote:
> I haven't looked at it, but would like to ask if you've tested when
> you have multiple .mcd's in succession? Like, if you have,

Yes, I have tested multiple .mcd's in succession.

That is where this modification really shines.

The size of each successive mcd is only proportional to the amount of changes
it contains.

One-liners are just a few KB. Each time. No matter how many versions came
before them.

Especially with big packages with a lot of previous versions, whose version
information would normally be saved in the 'version' file of the mcd
archives. Like the Kernel.

An mcz snapshot must include the complete version information.
An mcd is a different story.

The mcd files, being what they are, store only the code that has changed
against their base version.

They store a patch. A patch needs its base version to exist in order to
reconstruct the snapshot it represents.

This modification just changes the 'version' information to match that
behavior, so that only version information that has been added since the
base version is stored in an mcd file.

Best Regards,

Milan Vavra
On Fri, Apr 7, 2017 at 10:38 PM, Milan Vavra via Squeak-dev <[hidden email]> wrote:
Awesome! I haven't tried it, but this sounds exactly how it should have been from the beginning. Thank you!

I wonder what we should do about the MCDs auto-generated by the source.squeak server. When we update the server code to produce these new diffy versions, an older image won't get the correct history.

Basically we would need to detect whether the image requesting the MCD has the history restoration code or not. Maybe it should send a little argument in the URL? E.g.

	http://source.squeak.org/trunk/Foo-abc.123(120).mcd?prunehistory

but I'm not sure if that would throw off an older source server ...

- Bert -
On Tue, Apr 11, 2017 at 7:28 AM, Bert Freudenberg <[hidden email]> wrote:
Does the source.squeak server generate the MCD on each request, or does it cache and/or save the MCDs it generates so it doesn't have to regenerate them the next time? The latter seems dangerous in this context.
Bert Freudenberg wrote:
> Milan Vavra wrote:
>> An mcz snapshot must include the complete version information.
>> An mcd is a different story.
>>
>> The mcd files being what they are store only the code that has been changed
>> against its base version.
>
> Awesome! I haven't tried it, but this sounds exactly how it should have been from the beginning. Thank you!

Glad to hear that.

The surprisingly big mcds have been a personal pet peeve of mine for quite
some time.

I would really like this 'history trimming and restoration code' to become
part of Squeak so that people can use mcds to store their work and not waste
a whole lot of disk space.

And if this became part of the update process, the amount of data one needs
to download to update to the current alpha would be down to a trickle.

Best Regards,

Milan Vavra
Chris Cunningham wrote:
> Bert Freudenberg wrote:
>> Milan Vavra wrote:
>>>
>>> An mcz snapshot must include the complete version information.
>>> An mcd is a different story.
>>>
>>> The mcd files being what they are store only the code that has been changed
>>> against its base version.
>>
>> Awesome! I haven't tried it, but this sounds exactly how it should have been from the
>> beginning. Thank you!
>>
>> I wonder what we should do about the MCDs auto-generated by the source.squeak server.
>>
> Does source.squeak server generate the MCD on each request, or does it cache and/or save
> the MCD's generated so it doesn't have to the next time? The latter seems dangerous in this context.

There is a danger of reading an mcd without the history restoration code in
place, and so losing the history beyond the base info at that moment.

This could be avoided by replacing the pruned ancestors in the written-out
'version' member of the mcd zip archive with a string like

	'To use this mcd you need Monticello with history restoration code'

The history restoration code could be modified to look for this string and
remove it, so that the version info reading/restoring can continue as before.

The current in-trunk mcd reading code (read: any Monticello without history
restoration support) will choke on this and open a debugger.

When you poke around in the variables in the debugger, you will see the
string ('To use this mcd you need Monticello with history restoration code')
in the tokens instance variable at some level. This should get your attention.

Even better, if you don't care about having only partial version info history
this time (just read that fine thing, thank you very much!), you can replace
the array

	#('To use this mcd you need Monticello with history restoration code')

with an empty array {} in the debugger, then Restart and Proceed, and the
version will be read in with partial history.

Best Regards,

Milan Vavra