There are some pull requests regarding encoding. I finally have a bit more time, so I'll look into them in the coming days.
Peter
On Thu, Jan 25, 2018 at 3:57 AM, Martin Dias <[hidden email]> wrote:
Hi, any news or experiences with the migration tools? I'd like to try again in the next few days.
Martín
On Sat, Dec 16, 2017 at 5:06 AM, Stephane Ducasse <[hidden email]> wrote:
It would be great to be able to sanitize the files.
We should get a test covering the Tonel misbehavior.
Stef
On Thu, Dec 14, 2017 at 3:11 PM, Henrik Sperre Johansen
<[hidden email]> wrote:
> Stephan Eggermont-3 wrote
>> On 05-12-17 08:59, Peter Uhnák wrote:
>>> > In my case, it turned out to be a non-UTF8 encoded character in one
>>> > of the commit messages.
>>>
>>> I've run into this problem in a sister project (tonel-migration), and do
>>> not have a proper resolution yet. I was forcing everything to be
>>> Unicode, so I need a better way to read and write encoded strings. :<
>>
>> To be exact, none of the older commits will be UTF8 encoded. For
>> most it doesn't matter as they are ASCII, but if we want to have a
>> chance of converting older French or German code (or Japanese), we need
>> support for what was done with WideString. That probably needs a look in
>> the Squeak mailing list archives.
>>
>> Stephan
>
> The mcz reader used to import the .bin file (which contained correctly
> serialized WideStrings), only falling back to reading the .st file if .bin
> was not present. Has this changed?
>
> Or do these tools explicitly ignore the .bin file and try to read the .st
> file directly?
> If so, the MCDataStream class used to read .bin format still seems to be in
> the image...
>
> One could also create a tool to check/convert all mcz files in a repo as a
> preprocessing step:
> if .bin contents decode as WideString,
> check that .st starts with utf8 BOM,
> if not, convert.
>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>
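
For illustration, the check/convert step Henry outlines could look roughly like the sketch below. It is written in Python rather than Smalltalk and works only on the raw bytes of the .st file. Everything in it is an assumption made for the example: the function name sanitize_source is hypothetical, "contains non-ASCII bytes" is only a crude stand-in for "the .bin contents decode as WideString" (parsing the MCDataStream format is not attempted), and latin-1 is just a guess at the legacy encoding.

```python
import codecs


def sanitize_source(data: bytes, legacy_encoding: str = "latin-1") -> bytes:
    """Return the .st contents as UTF-8 with a BOM when a conversion is needed.

    Hypothetical helper. The legacy encoding is a guess, not something the
    mcz format records.
    """
    # Pure ASCII is already valid UTF-8, so leave it untouched.
    if data.isascii():
        return data
    # Already marked as UTF-8 by a leading BOM: nothing to do.
    if data.startswith(codecs.BOM_UTF8):
        return data
    try:
        # Valid UTF-8 that merely lacks the BOM.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else was written in the legacy encoding.
        text = data.decode(legacy_encoding)
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

A real tool would make this decision from the .bin contents, as suggested above, instead of guessing from the bytes of the .st file.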