Hi all, I'm considering moving Epicea (+Ombu+Hiedra) to GitHub. I read that Tonel is the new format, and there is Peter's git-migration, which could be the tool to do it.

Is it a good idea to move now? Is it the right tool?

Bests,
Martín
Hi Martin,

several people successfully migrated with git-migration... so I guess it works.

> Is it a good idea to move now?

There is no risk in trying to use it. If it fails, you report a bug. :-)

> Is it the right tool?

Yes. As for Tonel... git-migration uses FileTree, but I am looking into making it configurable. (I have a filetree-tonel migration tool now... but that is a bit more hardcore and potentially destructive :) (unlike git-migration).)

Cheers,
Peter

On Mon, Nov 6, 2017 at 1:42 AM, Martin Dias <[hidden email]> wrote:
Hi:

1. Thanks for the tool and information.
2. I forgot to tell you that I tried the tool some weeks ago and it ran out of memory. I should try again with more memory or something different...

Cheers,
Martin

On Mon, Nov 6, 2017 at 5:49 AM, Peter Uhnák <[hidden email]> wrote:
That sounds like a bug. Is the repo you are trying to migrate public?

On Thu, Nov 30, 2017 at 6:01 PM, Martin Dias <[hidden email]> wrote:
In reply to this post by tinchodias
If you use Traits and want to have Epicea loadable from GitHub on Pharo 6.1, it will not work.
There is an issue when you try to load a Trait in Tonel format into the image.

---
Julien Delplanque
PhD student at Université de Lille 1
http://juliendelplanque.be/phd.html
RMoD team, Inria
Bâtiment B 40, avenue Halley
59650 Villeneuve d'Ascq
Phone: +333 59 35 86 40
Julien, I didn't know about that, but anyway my issue was just in the SmalltalkHub -> FileTree step using the migration tool.

Peter, yes, it's public. This is the code I used:

----------------------------------------
migration := GitMigration on: 'MartinDias/Epicea'.
migration cacheAllVersions.
migration authors: {
    'md' -> #('Martin Dias' '<[hidden email]>').
    'MD' -> #('Martin Dias' '<[hidden email]>').
    'MartinDias' -> #('Martin Dias' '<[hidden email]>').
    'PabloTesone' -> #('Pablo Tesone' '<[hidden email]>').
    'PavelKrivanek' -> #('Pavel Krivanek' '<[hidden email]>').
    'DamienCassou' -> #('Damien Cassou' '<[hidden email]>').
    'DenisKudryashov' -> #('Denis Kudryashov' '<[hidden email]>').
    'GustavoSantos' -> #('Gustavo Santos' '<[hidden email]>').
    'ThomasHeniart' -> #('Thomas Heniart' '<[hidden email]>').
    'LucasGodoy' -> #('Lucas Godoy' '<[hidden email]>').
    'StephaneDucasse' -> #('Stephane Ducasse' '<[hidden email]>').
    'VincentBlondeau' -> #('Vincent Blondeau' '<[hidden email]>').
    'AlbertoBacchelli' -> #('Alberto Bacchelli' '<[hidden email]>').
    'SkipLentz' -> #('Skip Lentz' '<[hidden email]>').
    'TheIntegrator' -> #('The Integrator' '<[hidden email]>').
}.
migration
    fastImportCodeToDirectory: 'src'
    initialCommit: '18a30d8d'
    to: './epicea-fast-import.txt'
----------------------------------------

Martín

On Thu, Nov 30, 2017 at 2:14 PM, Julien <[hidden email]> wrote:
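(For readers following along: the generated file is then replayed by git itself, by running "git fast-import" with the file on standard input from inside a freshly initialized repository, along the lines of "git init epicea; cd epicea; git fast-import < ../epicea-fast-import.txt". The paths here are illustrative, not taken from Martín's setup; this replay step is where the "fatal: Unsupported command" error below comes from.)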
I suspect it's related to the large number of commits in my repo. I made some tweaks and succeeded in creating the fast-import file. But I get:

fatal: Unsupported command: .
fast-import: dumping crash report to .git/fast_import_crash_10301

Do you recognize this error? I will check my tweaks to the git-migration tool to see if I modified some behavior by mistake...

cheers,
Martín

On Fri, Dec 1, 2017 at 1:42 PM, Martin Dias <[hidden email]> wrote:
On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
> I suspect it's related to the large number of commits in my repo. I made
> some tweaks and succeeded in creating the fast-import file. But I get:
>
> fatal: Unsupported command: .
> fast-import: dumping crash report to .git/fast_import_crash_10301
>
> Do you recognize this error?

I had the same error just last night. In my case, it turned out to be a non-UTF-8 encoded character in one of the commit messages.

I tracked it down by looking at the crash report and searching for a nearby command. I've deleted the crash reports now, but I think it was the number associated with a mark command that got me near the problem character in the fast-import file.

I also modified the code to halt whenever it found a non-UTF-8 character. I'm sure there are better ways to do this, but:

GitMigrationCommitInfo>>inlineDataFor: aString
    | theString |
    theString := aString.
    "Ensure the message has valid UTF-8 encoding (this will raise an error if it doesn't)"
    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString asByteArray ]
        on: Error
        do: [ :ex |
            self halt: 'Illegal string encoding'.
            theString := aString select: [ :each | each asciiValue between: 32 and: 128 ] ].
    ^ 'data ' , theString size asString , String cr
        , (theString ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])

Cheers,
Alistair
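(A side note on that last line: per git's fast-import documentation, the count in a "data <count>" command is a byte count, while #size on a String answers characters, so the two diverge as soon as a message contains multi-byte UTF-8 characters. The ASCII fallback above sidesteps this; a sketch of the byte-accurate variant, using the #utf8Encoded helper mentioned in the next message:

    | message |
    message := 'Martín: fix the café test'.
    message size.              "25 characters"
    message utf8Encoded size.  "27 bytes, which is what fast-import expects"
)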
> On 5 Dec 2017, at 08:34, Alistair Grant <[hidden email]> wrote:
>
> I also modified the code to halt whenever it found a non-UTF-8
> character. I'm sure there are better ways to do this, but:
>
> GitMigrationCommitInfo>>inlineDataFor: aString
> [...]

There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So you could write it shorter as:

    aString asByteArray utf8Decoded

Instead of Error you could use the more intention-revealing ZnCharacterEncodingError.

Apart from that, and I know you did not create this mess, encoding a String into a String, although it can be done, is so wrong. You encode a String (a collection of Characters, of Unicode code points) into a ByteArray, and you decode a ByteArray into a String.

Sending #asByteArray to a String is almost always wrong, as is sending #asString to a ByteArray. These are implicit conversions (null conversions, like ZnNullEncoder) that only work for pure ASCII or Latin-1 (iso-8859-1), but not for the much richer set of Characters that Pharo supports. So these will eventually fail, one day, in a country far away.
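(Putting those two suggestions together, Alistair's method might read as follows; a minimal sketch, not tested against the git-migration codebase, keeping his ASCII fallback purely for illustration:

    GitMigrationCommitInfo>>inlineDataFor: aString
        | theString |
        theString := aString.
        "Signals ZnCharacterEncodingError if the message is not valid UTF-8"
        [ aString asByteArray utf8Decoded ]
            on: ZnCharacterEncodingError
            do: [ :ex | theString := aString select: [ :each | each asciiValue between: 32 and: 128 ] ].
        ^ 'data ' , theString size asString , String cr
            , (theString ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
)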
In reply to this post by tinchodias
All files in a single commit are held temporarily in MemoryStore, so unless you have GBs of code in a _single_ commit it shouldn't be a problem. There's also a possibility that Pharo was going too fast and didn't manage to GC it in time, or that there's a memory leak.
Please look into the mentioned file; it should give you context for the error (and then you can look at the import file around the same context).

Peter
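(If a GC race is the suspect, one cheap experiment is to force a full collection between migration steps. This is standard Pharo; whether it actually helps in this case is untested:

    Smalltalk garbageCollect.
)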
> In my case, it turned out to be a non-UTF8 encoded character in one of the commit messages.

I've run into this problem in a sister project (tonel-migration) and do not have a proper resolution yet. I was forcing everything to be Unicode, so I need a better way to read and write encoded strings. :<

On Tue, Dec 5, 2017 at 8:56 AM, Sven Van Caekenberghe <[hidden email]> wrote:
> On 5 Dec 2017, at 08:59, Peter Uhnák <[hidden email]> wrote:
>
> I've run into this problem in a sister project (tonel-migration) and do not
> have a proper resolution yet. I was forcing everything to be Unicode, so I
> need a better way to read and write encoded strings. :<

Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it is far from perfect.
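(A sketch of how that might be used, assuming #detectEncoding: answers a ZnCharacterEncoder for a given ByteArray; that return type is an assumption here, not verified against a current image:

    | bytes encoder |
    bytes := #[77 97 114 116 237 110].   "Latin-1 bytes for 'Martín', not valid UTF-8"
    encoder := ZnCharacterEncoder detectEncoding: bytes.
    encoder decodeBytes: bytes            "should answer 'Martín'"
)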
On Tue, Dec 05, 2017 at 08:56:53AM +0100, Sven Van Caekenberghe wrote:
> There is also ByteArray>>#utf8Decoded (as well as
> String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So
> you could write it shorter as:
>
> aString asByteArray utf8Decoded
>
> Instead of Error you could use the more intention-revealing
> ZnCharacterEncodingError.

I agree, this was a hack to get me past the immediate issue and see if I could get the migration to work.

> Apart from that, and I know you did not create this mess, encoding a
> String into a String, although it can be done, is so wrong. [...]
> So these will eventually fail, one day, in a country far away.

I hate having to deal with string encoding, and know as little as possible. :-)

On Tue, Dec 05, 2017 at 08:59:34AM +0100, Peter Uhnák wrote:
> I've run into this problem in a sister project (tonel-migration), and do not
> have a proper resolution yet. I was forcing everything to be Unicode, so I
> need a better way to read and write encoded strings. :<

I think this is always going to be a tricky problem, since the file may simply be corrupt and the resolution won't always be the same. In my case I was happy to simply remove the offending characters.

I do think it would be nice if the migration tool at least stopped and warned the user that the input is invalid.

Would you be open to a PR that incorporates Sven's changes suggested above?

On Tue, Dec 05, 2017 at 09:05:24AM +0100, Sven Van Caekenberghe wrote:
> Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it
> is far from perfect.

In this particular case I think it is better to try decoding with UTF-8, since #detectEncoding: isn't completely reliable and we just want to know whether the input is valid UTF-8.

Cheers,
Alistair
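(That "just check validity" approach fits in a small predicate; a minimal sketch using only the Zinc selectors already discussed in this thread:

    isValidUtf8: aByteArray
        "Answer whether the bytes decode cleanly as UTF-8"
        ^ [ aByteArray utf8Decoded. true ]
            on: ZnCharacterEncodingError
            do: [ :ex | false ]
)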
> Would you be open to a PR that incorporates Sven's changes suggested above?

Certainly. One thing to keep in mind is that I had to modify MemoryStore/MemoryHandle so it can process Unicode at all, but it also forces everything to be Unicode.

> I hate having to deal with string encoding, and know as little as possible. :-)

Apparently me taking this approach resulted in the current situation. :-D

Peter

On Tue, Dec 5, 2017 at 10:27 AM, Alistair Grant <[hidden email]> wrote:
Hi Peter,
On 5 December 2017 at 11:55, Peter Uhnák <[hidden email]> wrote:
> Certainly. One thing to keep in mind is that I had to modify
> MemoryStore/MemoryHandle so it can process Unicode at all, but it also
> forces everything to be Unicode.

This is probably a good thing. The fast-import documentation says that it is strict about character encoding and only accepts UTF-8.

I'll let you know when the PRs are ready.

Thanks,
Alistair
On 05-12-17 08:59, Peter Uhnák wrote:
> > In my case, it turned out to be a non-UTF8 encoded character in one
> of the commit messages.
>
> I've run into this problem in a sister project (tonel-migration), and do
> not have a proper resolution yet. I was forcing everything to be
> Unicode, so I need a better way to read and write encoded strings. :<

To be exact, exactly none of the older commits will be UTF-8 encoded. For most it doesn't matter, as they are ASCII, but if we want to have a chance of converting older French or German code (or Japanese), we need support for what was done with WideString. That probably needs a look in the Squeak mailing list archives.

Stephan
2017-12-13 11:09 GMT+01:00 stephan <[hidden email]>:
> To be exact, exactly none of the older commits will be UTF-8 encoded.

The thread would be "MC should really write snapshot/source.st in UTF8", end of May 2013. I've made the switch in Squeak shortly after (June 2013). I've proposed a fix for Pharo (issues 10801 and 10811) that was integrated in 3.0 at the end of June 2013. So .mcz files created before this might be problematic...

As I've just reported here, I think there are problems in Tonel (as if it were reading and writing with different encodings...)
Stephan Eggermont-3 wrote:
> To be exact, exactly none of the older commits will be UTF-8 encoded. For
> most it doesn't matter, as they are ASCII, but if we want to have a
> chance of converting older French or German code (or Japanese), we need
> support for what was done with WideString.

The mcz reader used to import the .bin file (which contained correctly serialized WideStrings), only falling back to reading the .st file if .bin was not present; has this changed?

Or do these tools explicitly ignore the .bin file and try to read the .st file directly? If so, the MCDataStream class used to read the .bin format still seems to be in the image...

One could also create a tool to check/convert all mcz in a repo as a preprocessing step: if the .bin contents decode as a WideString, check that the .st file starts with the UTF-8 BOM, and if not, convert.

Cheers,
Henry
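(The BOM half of that check is straightforward to sketch; a hedged example using Pharo's FileReference API, where the path is illustrative and a real tool would first unzip the mcz to get at snapshot/source.st:

    | bom stream |
    bom := #[16rEF 16rBB 16rBF].
    stream := 'snapshot/source.st' asFileReference binaryReadStream.
    [ (stream next: 3) = bom
        ifFalse: [ "no UTF-8 BOM: re-encode the snapshot here" ] ]
        ensure: [ stream close ]
)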
It would be great to be able to sanitize the files.
We should get a test for the Tonel misbehavior.

Stef

On Thu, Dec 14, 2017 at 3:11 PM, Henrik Sperre Johansen <[hidden email]> wrote:
> One could also create a tool to check/convert all mcz in a repo as a
> preprocessing step: if the .bin contents decode as a WideString, check
> that the .st file starts with the UTF-8 BOM, and if not, convert.
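(The encoding expectation itself is easy to pin down in a test. A minimal SUnit sketch using only the Zinc selectors shown earlier in the thread; wiring in Tonel's actual reader/writer, whose API is not shown here, would make it a real regression test:

    testUtf8RoundTrip
        "Non-ASCII source must survive an encode/decode cycle"
        | source |
        source := 'método ^ ''café'''.
        self assert: source utf8Encoded utf8Decoded equals: source
)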
Hi, any news or experiences with the migration tools? I'll try again in the next few days.

Martín

On Sat, Dec 16, 2017 at 5:06 AM, Stephane Ducasse <[hidden email]> wrote:
> It would be great to be able to sanitize the files.