Moving from mc to tonel?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Moving from mc to tonel?

tinchodias
Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that tonel is the new format and there is Peter's git-migration, which could be the tool to do it. 

Is it good idea to move now?
Is it the right tool?

Bests,
Martín
Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Peter Uhnák
Hi Martin,

several people successfully migrated with git-migration... so I guess it works. 

> Is it good idea to move now?

There is no risk in trying to use it. If it fails, you report a bug. :-)

Is it the right tool?

Yes.

As for Tonel... git-migration uses FileTree, but I am looking into making it configurable.

(I have filetree-tonel migration tool now... but that is a bit more hardcore and potentially destructive :) (unlike git-migration)).

Cheers,
Peter





On Mon, Nov 6, 2017 at 1:42 AM, Martin Dias <[hidden email]> wrote:
Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that tonel is the new format and there is Peter's git-migration, which could be the tool to do it. 

Is it good idea to move now?
Is it the right tool?

Bests,
Martín

Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

tinchodias
Hi:

1. thanks for the tool and information.
2. I forgot to tell you that I tried the tool some weeks ago and it ran out of memory. I should try again with more memory or somehting different...

Cheers,
Martin

On Mon, Nov 6, 2017 at 5:49 AM, Peter Uhnák <[hidden email]> wrote:
Hi Martin,

several people successfully migrated with git-migration... so I guess it works. 

> Is it good idea to move now?

There is no risk in trying to use it. If it fails, you report a bug. :-)

Is it the right tool?

Yes.

As for Tonel... git-migration uses FileTree, but I am looking into making it configurable.

(I have filetree-tonel migration tool now... but that is a bit more hardcore and potentially destructive :) (unlike git-migration)).

Cheers,
Peter





On Mon, Nov 6, 2017 at 1:42 AM, Martin Dias <[hidden email]> wrote:
Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that tonel is the new format and there is Peter's git-migration, which could be the tool to do it. 

Is it good idea to move now?
Is it the right tool?

Bests,
Martín


Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Peter Uhnák
That sounds like a bug. Is the repo you are trying to migrate public?

On Thu, Nov 30, 2017 at 6:01 PM, Martin Dias <[hidden email]> wrote:
Hi:

1. thanks for the tool and information.
2. I forgot to tell you that I tried the tool some weeks ago and it ran out of memory. I should try again with more memory or somehting different...

Cheers,
Martin

On Mon, Nov 6, 2017 at 5:49 AM, Peter Uhnák <[hidden email]> wrote:
Hi Martin,

several people successfully migrated with git-migration... so I guess it works. 

> Is it good idea to move now?

There is no risk in trying to use it. If it fails, you report a bug. :-)

Is it the right tool?

Yes.

As for Tonel... git-migration uses FileTree, but I am looking into making it configurable.

(I have filetree-tonel migration tool now... but that is a bit more hardcore and potentially destructive :) (unlike git-migration)).

Cheers,
Peter





On Mon, Nov 6, 2017 at 1:42 AM, Martin Dias <[hidden email]> wrote:
Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that tonel is the new format and there is Peter's git-migration, which could be the tool to do it. 

Is it good idea to move now?
Is it the right tool?

Bests,
Martín



Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Julien
In reply to this post by tinchodias
If you use Traits and want to have Epicea loadable from GitHub on Pharo 6.1, it will not work.

There is an issue when you try to load a Trait in Tonel format in the image.

---
Julien Delplanque
Doctorant à l’Université de Lille 1
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
Numéro de téléphone: +333 59 35 86 40

Le 6 nov. 2017 à 01:42, Martin Dias <[hidden email]> a écrit :

Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that tonel is the new format and there is Peter's git-migration, which could be the tool to do it. 

Is it good idea to move now?
Is it the right tool?

Bests,
Martín

Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

tinchodias
Julien, I dodn't know it but anyway my issue was just in the step smalltalkhub -> filtree using the migration tool.

Peter, yes it's public. This is the code I used:

----------------------------------------

migration := GitMigration on: 'MartinDias/Epicea'.
migration cacheAllVersions.
migration authors: {
'md' -> #('Martin Dias' '<[hidden email]>').
'MD' -> #('Martin Dias' '<[hidden email]>').
'MartinDias' -> #('Martin Dias' '<[hidden email]>').
'PabloTesone' -> #('Pablo Tesone' '<[hidden email]>').
'PavelKrivanek' -> #('Pavel Krivanek' '<[hidden email]>').
'DamienCassou' -> #('Damien Cassou' '<[hidden email]>').
'DenisKudryashov' -> #('Denis Kudryashov' '<[hidden email]>').
'GustavoSantos' -> #('Gustavo Santos' '<[hidden email]>').
'ThomasHeniart' -> #('Thomas Heniart' '<[hidden email]>').
'LucasGodoy' -> #('Lucas Godoy' '<[hidden email]>').
'StephaneDucasse' -> #('Stephane Ducasse' '<[hidden email]>').
'VincentBlondeau' -> #('Vincent Blondeau' '<[hidden email]>').
'AlbertoBacchelli' -> #('Alberto Bacchelli' '<[hidden email]>').
'SkipLentz' -> #('Skip Lentz' '<[hidden email]>').
'TheIntegrator' -> #('The Integrator' '<[hidden email]>').
}.l
migration
fastImportCodeToDirectory: 'src'
initialCommit: '18a30d8d'
to: './epicea-fast-import.txt'

----------------------

Martín

On Thu, Nov 30, 2017 at 2:14 PM, Julien <[hidden email]> wrote:
If you use Traits and want to have Epicea loadable from GitHub on Pharo 6.1, it will not work.

There is an issue when you try to load a Trait in Tonel format in the image.

---
Julien Delplanque
Doctorant à l’Université de Lille 1
Numéro de téléphone: +333 59 35 86 40

Le 6 nov. 2017 à 01:42, Martin Dias <[hidden email]> a écrit :

Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that tonel is the new format and there is Peter's git-migration, which could be the tool to do it. 

Is it good idea to move now?
Is it the right tool?

Bests,
Martín


Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

tinchodias
I suspect it's related to the large number of commits in my repo. I made some tweaks and succeeded to create the fast-import file. But I get:

fatal: Unsupported command: .
fast-import: dumping crash report to .git/fast_import_crash_10301

Do you recognize this error? 
I will check my changes tweaking the git-migration tool to see if I modified some behavior my mistake...

cheers,
Martín

On Fri, Dec 1, 2017 at 1:42 PM, Martin Dias <[hidden email]> wrote:
Julien, I dodn't know it but anyway my issue was just in the step smalltalkhub -> filtree using the migration tool.

Peter, yes it's public. This is the code I used:

----------------------------------------

migration := GitMigration on: 'MartinDias/Epicea'.
migration cacheAllVersions.
migration authors: {
'md' -> #('Martin Dias' '<[hidden email]>').
'MD' -> #('Martin Dias' '<[hidden email]>').
'MartinDias' -> #('Martin Dias' '<[hidden email]>').
'PabloTesone' -> #('Pablo Tesone' '<[hidden email]>').
'PavelKrivanek' -> #('Pavel Krivanek' '<[hidden email]>').
'DamienCassou' -> #('Damien Cassou' '<[hidden email]>').
'DenisKudryashov' -> #('Denis Kudryashov' '<[hidden email]>').
'GustavoSantos' -> #('Gustavo Santos' '<[hidden email]>').
'ThomasHeniart' -> #('Thomas Heniart' '<[hidden email]>').
'LucasGodoy' -> #('Lucas Godoy' '<[hidden email]>').
'StephaneDucasse' -> #('Stephane Ducasse' '<[hidden email]>').
'VincentBlondeau' -> #('Vincent Blondeau' '<[hidden email]>').
'AlbertoBacchelli' -> #('Alberto Bacchelli' '<[hidden email]>').
'SkipLentz' -> #('Skip Lentz' '<[hidden email]>').
'TheIntegrator' -> #('The Integrator' '<[hidden email]>').
}.l
migration
fastImportCodeToDirectory: 'src'
initialCommit: '18a30d8d'
to: './epicea-fast-import.txt'

----------------------

Martín

On Thu, Nov 30, 2017 at 2:14 PM, Julien <[hidden email]> wrote:
If you use Traits and want to have Epicea loadable from GitHub on Pharo 6.1, it will not work.

There is an issue when you try to load a Trait in Tonel format in the image.

---
Julien Delplanque
Doctorant à l’Université de Lille 1
Numéro de téléphone: +333 59 35 86 40

Le 6 nov. 2017 à 01:42, Martin Dias <[hidden email]> a écrit :

Hi all, I'm considering to move Epicea (+Ombu+Hiedra) to github. I read that tonel is the new format and there is Peter's git-migration, which could be the tool to do it. 

Is it good idea to move now?
Is it the right tool?

Bests,
Martín



Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Alistair Grant
On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
> I suspect it's related to the large number of commits in my repo. I made
> some tweaks and succeeded to create the fast-import file. But I get:
>
> fatal: Unsupported command: .
> fast-import: dumping crash report to .git/fast_import_crash_10301
>
> Do you recognize this error?
> I will check my changes tweaking the git-migration tool to see if I modified
> some behavior my mistake...

I had the same error just last night.

In my case, it turned out to be a non-UTF8 encoded character in one of
the commit messages.

I tracked it down by looking at the crash report and searching for a
nearby command.  I've deleted the crash reports now, but I think it
was the number associated with a mark command that got me near the
problem character in the fast-import file.

I also modified the code to halt whenever it found a non-UTF8
character.  I'm sure there are better ways to do this, but:


GitMigrationCommitInfo>>inlineDataFor: aString

    | theString |
    theString := aString.
    "Ensure the message has valid UTF-8 encoding (this will raise an
error if it doesn't)"
    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
asByteArray ]
        on: Error
        do: [ :ex | self halt: 'Illegal string encoding'.
            theString := aString select: [ :each | each asciiValue
between: 32 and: 128 ] ].
    ^ 'data ' , theString size asString , String cr , (theString
ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])



Cheers,
Alistair

Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Sven Van Caekenberghe-2


> On 5 Dec 2017, at 08:34, Alistair Grant <[hidden email]> wrote:
>
> On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
>> I suspect it's related to the large number of commits in my repo. I made
>> some tweaks and succeeded to create the fast-import file. But I get:
>>
>> fatal: Unsupported command: .
>> fast-import: dumping crash report to .git/fast_import_crash_10301
>>
>> Do you recognize this error?
>> I will check my changes tweaking the git-migration tool to see if I modified
>> some behavior my mistake...
>
> I had the same error just last night.
>
> In my case, it turned out to be a non-UTF8 encoded character in one of
> the commit messages.
>
> I tracked it down by looking at the crash report and searching for a
> nearby command.  I've deleted the crash reports now, but I think it
> was the number associated with a mark command that got me near the
> problem character in the fast-import file.
>
> I also modified the code to halt whenever it found a non-UTF8
> character.  I'm sure there are better ways to do this, but:
>
>
> GitMigrationCommitInfo>>inlineDataFor: aString
>
>    | theString |
>    theString := aString.
>    "Ensure the message has valid UTF-8 encoding (this will raise an
> error if it doesn't)"
>    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> asByteArray ]
>        on: Error
>        do: [ :ex | self halt: 'Illegal string encoding'.
>            theString := aString select: [ :each | each asciiValue
> between: 32 and: 128 ] ].
>    ^ 'data ' , theString size asString , String cr , (theString
> ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])

There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So you could write it shorter as:

  aString asByteArray utf8Decoded

Instead of Error you could use the more intention revealing ZnCharacterEncodingError.

Apart from that, and I known you did not create this mess, encoding a String into a String, although it can be done, is so wrong. You encode a String (a collection of Characters, of Unicode code points) into a ByteArray and you decode a ByteArray into a String.

Sending #asByteArray to a String in almost always wrong, as is sending #asString to a ByteArray. These are implicit conversions (null conversions, like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1), but not for the much richer set of Characters that Pharo supports. So these will eventually fail, one day, in a country far away.

> Cheers,
> Alistair
>


Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Peter Uhnák
In reply to this post by tinchodias

I suspect it's related to the large number of commits in my repo. I made some tweaks and succeeded to create the fast-import file. But I get:

All files in a single commit will be held temporarily in MemoryStore, so unless you have GBs of code in _single_ commit it shouldn't be a problem.
There's also a possiblity that Pharo was going too fast and didn't manage to GC it in time or there's a memory leak.
 

fatal: Unsupported command: .
fast-import: dumping crash report to .git/fast_import_crash_10301

Do you recognize this error? 


Please look into the mentioned file, it should give you a context for the error (and then you can look into the import file around the same context).

Peter

Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Peter Uhnák
In reply to this post by Sven Van Caekenberghe-2
In my case, it turned out to be a non-UTF8 encoded character in one of the commit messages.

I've ran into this problem in a sister project (tonel-migration), and do not have a proper resolution yet. I was forcing everything to be unicode, so I need a better way to read and write encoded strings. :<

On Tue, Dec 5, 2017 at 8:56 AM, Sven Van Caekenberghe <[hidden email]> wrote:


> On 5 Dec 2017, at 08:34, Alistair Grant <[hidden email]> wrote:
>
> On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
>> I suspect it's related to the large number of commits in my repo. I made
>> some tweaks and succeeded to create the fast-import file. But I get:
>>
>> fatal: Unsupported command: .
>> fast-import: dumping crash report to .git/fast_import_crash_10301
>>
>> Do you recognize this error?
>> I will check my changes tweaking the git-migration tool to see if I modified
>> some behavior my mistake...
>
> I had the same error just last night.
>
> In my case, it turned out to be a non-UTF8 encoded character in one of
> the commit messages.
>
> I tracked it down by looking at the crash report and searching for a
> nearby command.  I've deleted the crash reports now, but I think it
> was the number associated with a mark command that got me near the
> problem character in the fast-import file.
>
> I also modified the code to halt whenever it found a non-UTF8
> character.  I'm sure there are better ways to do this, but:
>
>
> GitMigrationCommitInfo>>inlineDataFor: aString
>
>    | theString |
>    theString := aString.
>    "Ensure the message has valid UTF-8 encoding (this will raise an
> error if it doesn't)"
>    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> asByteArray ]
>        on: Error
>        do: [ :ex | self halt: 'Illegal string encoding'.
>            theString := aString select: [ :each | each asciiValue
> between: 32 and: 128 ] ].
>    ^ 'data ' , theString size asString , String cr , (theString
> ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])

There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So you could write it shorter as:

  aString asByteArray utf8Decoded

Instead of Error you could use the more intention revealing ZnCharacterEncodingError.

Apart from that, and I known you did not create this mess, encoding a String into a String, although it can be done, is so wrong. You encode a String (a collection of Characters, of Unicode code points) into a ByteArray and you decode a ByteArray into a String.

Sending #asByteArray to a String in almost always wrong, as is sending #asString to a ByteArray. These are implicit conversions (null conversions, like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1), but not for the much richer set of Characters that Pharo supports. So these will eventually fail, one day, in a country far away.

> Cheers,
> Alistair
>



Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Sven Van Caekenberghe-2


> On 5 Dec 2017, at 08:59, Peter Uhnák <[hidden email]> wrote:
>
> > In my case, it turned out to be a non-UTF8 encoded character in one of the commit messages.
>
> I've ran into this problem in a sister project (tonel-migration), and do not have a proper resolution yet. I was forcing everything to be unicode, so I need a better way to read and write encoded strings. :<

Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it is far from perfect.

> On Tue, Dec 5, 2017 at 8:56 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>
>
> > On 5 Dec 2017, at 08:34, Alistair Grant <[hidden email]> wrote:
> >
> > On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
> >> I suspect it's related to the large number of commits in my repo. I made
> >> some tweaks and succeeded to create the fast-import file. But I get:
> >>
> >> fatal: Unsupported command: .
> >> fast-import: dumping crash report to .git/fast_import_crash_10301
> >>
> >> Do you recognize this error?
> >> I will check my changes tweaking the git-migration tool to see if I modified
> >> some behavior my mistake...
> >
> > I had the same error just last night.
> >
> > In my case, it turned out to be a non-UTF8 encoded character in one of
> > the commit messages.
> >
> > I tracked it down by looking at the crash report and searching for a
> > nearby command.  I've deleted the crash reports now, but I think it
> > was the number associated with a mark command that got me near the
> > problem character in the fast-import file.
> >
> > I also modified the code to halt whenever it found a non-UTF8
> > character.  I'm sure there are better ways to do this, but:
> >
> >
> > GitMigrationCommitInfo>>inlineDataFor: aString
> >
> >    | theString |
> >    theString := aString.
> >    "Ensure the message has valid UTF-8 encoding (this will raise an
> > error if it doesn't)"
> >    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> > asByteArray ]
> >        on: Error
> >        do: [ :ex | self halt: 'Illegal string encoding'.
> >            theString := aString select: [ :each | each asciiValue
> > between: 32 and: 128 ] ].
> >    ^ 'data ' , theString size asString , String cr , (theString
> > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
>
> There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So you could write it shorter as:
>
>   aString asByteArray utf8Decoded
>
> Instead of Error you could use the more intention revealing ZnCharacterEncodingError.
>
> Apart from that, and I known you did not create this mess, encoding a String into a String, although it can be done, is so wrong. You encode a String (a collection of Characters, of Unicode code points) into a ByteArray and you decode a ByteArray into a String.
>
> Sending #asByteArray to a String in almost always wrong, as is sending #asString to a ByteArray. These are implicit conversions (null conversions, like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1), but not for the much richer set of Characters that Pharo supports. So these will eventually fail, one day, in a country far away.
>
> > Cheers,
> > Alistair
> >
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Alistair Grant
On Tue, Dec 05, 2017 at 08:56:53AM +0100, Sven Van Caekenberghe wrote:

>
> > On 5 Dec 2017, at 08:34, Alistair Grant <[hidden email]> wrote:
> >
> > On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
> >> I suspect it's related to the large number of commits in my repo. I made
> >> some tweaks and succeeded to create the fast-import file. But I get:
> >>
> >> fatal: Unsupported command: .
> >> fast-import: dumping crash report to .git/fast_import_crash_10301
> >>
> >> Do you recognize this error?
> >> I will check my changes tweaking the git-migration tool to see if I modified
> >> some behavior my mistake...
> >
> > I had the same error just last night.
> >
> > In my case, it turned out to be a non-UTF8 encoded character in one of
> > the commit messages.
> >
> > I tracked it down by looking at the crash report and searching for a
> > nearby command.  I've deleted the crash reports now, but I think it
> > was the number associated with a mark command that got me near the
> > problem character in the fast-import file.
> >
> > I also modified the code to halt whenever it found a non-UTF8
> > character.  I'm sure there are better ways to do this, but:
> >
> >
> > GitMigrationCommitInfo>>inlineDataFor: aString
> >
> >    | theString |
> >    theString := aString.
> >    "Ensure the message has valid UTF-8 encoding (this will raise an
> > error if it doesn't)"
> >    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> > asByteArray ]
> >        on: Error
> >        do: [ :ex | self halt: 'Illegal string encoding'.
> >            theString := aString select: [ :each | each asciiValue
> > between: 32 and: 128 ] ].
> >    ^ 'data ' , theString size asString , String cr , (theString
> > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
>
> There is also ByteArray>>#utf8Decoded (as well as
> String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So
> you could write it shorter as:
>
>   aString asByteArray utf8Decoded
>
> Instead of Error you could use the more intention revealing ZnCharacterEncodingError.

I agree, this was a hack to get me past the immediate issue and see if
I could get the migration to work.


> Apart from that, and I known you did not create this mess, encoding a
> String into a String, although it can be done, is so wrong. You encode
> a String (a collection of Characters, of Unicode code points) into a
> ByteArray and you decode a ByteArray into a String.
>
> Sending #asByteArray to a String in almost always wrong, as is sending
> #asString to a ByteArray. These are implicit conversions (null
> conversions, like ZnNullEncoder) that only work for pure ASCII or
> Latin1 (iso-8859-1), but not for the much richer set of Characters
> that Pharo supports. So these will eventually fail, one day, in a
> country far away.

I hate having to deal with string encoding, and know as little as
possible. :-)



On Tue, Dec 05, 2017 at 08:59:34AM +0100, Peter Uhn??k wrote:
> > In my case, it turned out to be a non-UTF8 encoded character in one of the
> commit messages.
>
> I've ran into this problem in a sister project (tonel-migration), and do not
> have a proper resolution yet. I was forcing everything to be unicode, so I need
> a better way to read and write encoded strings. :<

I think this is always going to be a tricky problem since the file may
simply be corrupt, and the resolution won't always be the same.  In my
case I was happy to simply remove the offending characters.

I do think it would be nice if the migration tool at least stopped and
warned the user that the input is invalid.

Would you be open to a PR that incorporates Sven's changes suggested
above?



On Tue, Dec 05, 2017 at 09:05:24AM +0100, Sven Van Caekenberghe wrote:
>
> Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it
> is far from perfect.

In this particular case I think it is better to try decoding with UTF8,
since #detectEncoding: isn't completely reliable and we just want to
know if it isn't valid UTF8.


Cheers,
Alistair

Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Peter Uhnák
Would you be open to a PR that incorporates Sven's changes suggested above?

Certainly. One thing to keep in mind is that I had to modify MemoryStore/MemoryHandle so it can process unicode at all, but it also forces everything to be unicode.

I hate having to deal with string encoding, and know as little as possible. :-)

Apparently me taking this approach resulted in the current situation. :-D

Peter

On Tue, Dec 5, 2017 at 10:27 AM, Alistair Grant <[hidden email]> wrote:
On Tue, Dec 05, 2017 at 08:56:53AM +0100, Sven Van Caekenberghe wrote:
>
> > On 5 Dec 2017, at 08:34, Alistair Grant <[hidden email]> wrote:
> >
> > On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
> >> I suspect it's related to the large number of commits in my repo. I made
> >> some tweaks and succeeded to create the fast-import file. But I get:
> >>
> >> fatal: Unsupported command: .
> >> fast-import: dumping crash report to .git/fast_import_crash_10301
> >>
> >> Do you recognize this error?
> >> I will check my changes tweaking the git-migration tool to see if I modified
> >> some behavior my mistake...
> >
> > I had the same error just last night.
> >
> > In my case, it turned out to be a non-UTF8 encoded character in one of
> > the commit messages.
> >
> > I tracked it down by looking at the crash report and searching for a
> > nearby command.  I've deleted the crash reports now, but I think it
> > was the number associated with a mark command that got me near the
> > problem character in the fast-import file.
> >
> > I also modified the code to halt whenever it found a non-UTF8
> > character.  I'm sure there are better ways to do this, but:
> >
> >
> > GitMigrationCommitInfo>>inlineDataFor: aString
> >
> >    | theString |
> >    theString := aString.
> >    "Ensure the message has valid UTF-8 encoding (this will raise an
> > error if it doesn't)"
> >    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
> > asByteArray ]
> >        on: Error
> >        do: [ :ex | self halt: 'Illegal string encoding'.
> >            theString := aString select: [ :each | each asciiValue
> > between: 32 and: 128 ] ].
> >    ^ 'data ' , theString size asString , String cr , (theString
> > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
>
> There is also ByteArray>>#utf8Decoded (as well as
> String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So
> you could write it shorter as:
>
>   aString asByteArray utf8Decoded
>
> Instead of Error you could use the more intention revealing ZnCharacterEncodingError.

I agree, this was a hack to get me past the immediate issue and see if
I could get the migration to work.


> Apart from that, and I known you did not create this mess, encoding a
> String into a String, although it can be done, is so wrong. You encode
> a String (a collection of Characters, of Unicode code points) into a
> ByteArray and you decode a ByteArray into a String.
>
> Sending #asByteArray to a String in almost always wrong, as is sending
> #asString to a ByteArray. These are implicit conversions (null
> conversions, like ZnNullEncoder) that only work for pure ASCII or
> Latin1 (iso-8859-1), but not for the much richer set of Characters
> that Pharo supports. So these will eventually fail, one day, in a
> country far away.

I hate having to deal with string encoding, and know as little as
possible. :-)



On Tue, Dec 05, 2017 at 08:59:34AM +0100, Peter Uhn??k wrote:
> > In my case, it turned out to be a non-UTF8 encoded character in one of the
> commit messages.
>
> I've ran into this problem in a sister project (tonel-migration), and do not
> have a proper resolution yet. I was forcing everything to be unicode, so I need
> a better way to read and write encoded strings. :<

I think this is always going to be a tricky problem since the file may
simply be corrupt, and the resolution won't always be the same.  In my
case I was happy to simply remove the offending characters.

I do think it would be nice if the migration tool at least stopped and
warned the user that the input is invalid.

Would you be open to a PR that incorporates Sven's changes suggested
above?



On Tue, Dec 05, 2017 at 09:05:24AM +0100, Sven Van Caekenberghe wrote:
>
> Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it
> is far from perfect.

In this particular case I think it is better to try decoding with UTF8,
since #detectEncoding: isn't completely reliable and we just want to
know if it isn't valid UTF8.


Cheers,
Alistair


Reply | Threaded
Open this post in threaded view
|

Re: Moving from mc to tonel?

Alistair Grant
Hi Peter,

On 5 December 2017 at 11:55, Peter Uhnák <[hidden email]> wrote:
>> Would you be open to a PR that incorporates Sven's changes suggested
>> above?
>
> Certainly. One thing to keep in mind is that I had to modify
> MemoryStore/MemoryHandle so it can process unicode at all, but it also
> forces everything to be unicode.

This is probably a good thing.  The fast-import documentation says
that it is strict about character encoding and only accepts UTF-8.

I'll let you know when the PRs are ready.

Thanks,
Alistair


>> I hate having to deal with string encoding, and know as little as
>> possible. :-)
>
> Apparently me taking this approach resulted in the current situation. :-D
>
> Peter
>
> On Tue, Dec 5, 2017 at 10:27 AM, Alistair Grant <[hidden email]>
> wrote:
>>
>> On Tue, Dec 05, 2017 at 08:56:53AM +0100, Sven Van Caekenberghe wrote:
>> >
>> > > On 5 Dec 2017, at 08:34, Alistair Grant <[hidden email]> wrote:
>> > >
>> > > On 5 December 2017 at 03:41, Martin Dias <[hidden email]> wrote:
>> > >> I suspect it's related to the large number of commits in my repo. I
>> > >> made
>> > >> some tweaks and succeeded to create the fast-import file. But I get:
>> > >>
>> > >> fatal: Unsupported command: .
>> > >> fast-import: dumping crash report to .git/fast_import_crash_10301
>> > >>
>> > >> Do you recognize this error?
>> > >> I will check my changes tweaking the git-migration tool to see if I
>> > >> modified
>> > >> some behavior my mistake...
>> > >
>> > > I had the same error just last night.
>> > >
>> > > In my case, it turned out to be a non-UTF8 encoded character in one of
>> > > the commit messages.
>> > >
>> > > I tracked it down by looking at the crash report and searching for a
>> > > nearby command.  I've deleted the crash reports now, but I think it
>> > > was the number associated with a mark command that got me near the
>> > > problem character in the fast-import file.
>> > >
>> > > I also modified the code to halt whenever it found a non-UTF8
>> > > character.  I'm sure there are better ways to do this, but:
>> > >
>> > >
>> > > GitMigrationCommitInfo>>inlineDataFor: aString
>> > >
>> > >    | theString |
>> > >    theString := aString.
>> > >    "Ensure the message has valid UTF-8 encoding (this will raise an
>> > > error if it doesn't)"
>> > >    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString
>> > > asByteArray ]
>> > >        on: Error
>> > >        do: [ :ex | self halt: 'Illegal string encoding'.
>> > >            theString := aString select: [ :each | each asciiValue
>> > > between: 32 and: 128 ] ].
>> > >    ^ 'data ' , theString size asString , String cr , (theString
>> > > ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
>> >
>> > There is also ByteArray>>#utf8Decoded (as well as
>> > String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So
>> > you could write it shorter as:
>> >
>> >   aString asByteArray utf8Decoded
>> >
>> > Instead of Error you could use the more intention revealing
>> > ZnCharacterEncodingError.
>>
>> I agree, this was a hack to get me past the immediate issue and see if
>> I could get the migration to work.
>>
>>
>> > Apart from that, and I known you did not create this mess, encoding a
>> > String into a String, although it can be done, is so wrong. You encode
>> > a String (a collection of Characters, of Unicode code points) into a
>> > ByteArray and you decode a ByteArray into a String.
>> >
>> > Sending #asByteArray to a String in almost always wrong, as is sending
>> > #asString to a ByteArray. These are implicit conversions (null
>> > conversions, like ZnNullEncoder) that only work for pure ASCII or
>> > Latin1 (iso-8859-1), but not for the much richer set of Characters
>> > that Pharo supports. So these will eventually fail, one day, in a
>> > country far away.
>>
>> I hate having to deal with string encoding, and know as little as
>> possible. :-)
>>
>>
>>
>> On Tue, Dec 05, 2017 at 08:59:34AM +0100, Peter Uhn??k wrote:
>> > > In my case, it turned out to be a non-UTF8 encoded character in one of
>> > > the
>> > commit messages.
>> >
>> > I've ran into this problem in a sister project (tonel-migration), and do
>> > not
>> > have a proper resolution yet. I was forcing everything to be unicode, so
>> > I need
>> > a better way to read and write encoded strings. :<
>>
>> I think this is always going to be a tricky problem since the file may
>> simply be corrupt, and the resolution won't always be the same.  In my
>> case I was happy to simply remove the offending characters.
>>
>> I do think it would be nice if the migration tool at least stopped and
>> warned the user that the input is invalid.
>>
>> Would you be open to a PR that incorporates Sven's changes suggested
>> above?
>>
>>
>>
>> On Tue, Dec 05, 2017 at 09:05:24AM +0100, Sven Van Caekenberghe wrote:
>> >
>> > Maybe ZnCharacterEncoder class>>#detectEncoding: can help, although it
>> > is far from perfect.
>>
>> In this particular case I think it is better to try decoding with UTF8,
>> since #detectEncoding: isn't completely reliable and we just want to
>> know if it isn't valid UTF8.
>>
>>
>> Cheers,
>> Alistair
>>
>