[squeak-dev] Title bar bug on trunk


Re: [squeak-dev] Re: Title bar bug on trunk

Ian Trudel-2
The current font does not seem to be monospaced. Can't we have a default
monospaced font? I'm curious to know what others think about this. The
current font seems fine as far as looks are concerned, but it's quite
traditional to use monospaced fonts when programming.

Ian.
--
http://mecenia.blogspot.com/


Re: [squeak-dev] Re: Title bar bug on trunk

Stéphane Rollandin
I agree, but it seems that finding beautiful open-source proportional
fonts is not easy...

see http://hivelogic.com/articles/top-10-programming-fonts

Stef





Polymorph (was Re: [squeak-dev] Re: Title bar bug on trunk)

Karl Ramberg
In reply to this post by Andreas.Raab
On 2009-08-19 08:40, Andreas Raab wrote:

> Ronald Spengler wrote:
>> Almost everyone I know who Squeaks uses Polymorph, and I enjoyed the
>> heck out of hacking up the Vista style look to have titlebar buttons
>> arranged in a fashion after OS X :P
>>
>> One problem that I kept having was, it seemed that the load order and
>> versions of Polymorph & OmniBrowser had to be just right or I'd wind
>> up pretty broken. I saw lots of bugginess, although I *was* rolling
>> my own theme, so the bugs may have been my own.
>>
>> Do people feel that Polymorph is generally stable enough to go in a
>> base image?
>
> We have to find out. I was asking earlier but I'll ask again: Which
> version would one start with to try Polymorph? From which repository?
> I'm in particular interested in finding out more about
> extensions/overrides and how to eliminate them to make loading
> painless and conflict-free for people.
>
> Cheers,
>   - Andreas
>
>
This seems to be the repository:
http://www.squeaksource.com/UIEnhancements.html

I don't know which version to use; probably the latest?

Karl



Default code font (was Re: [squeak-dev] Re: Title bar bug on trunk)

Bert Freudenberg
In reply to this post by Ian Trudel-2

On 22.08.2009, at 10:42, Ian Trudel wrote:

> The current font does not seem to be monospaced. Can't we have a default
> monospaced font? I'm curious to know what others think about this. The
> current font seems fine as far as looks are concerned, but it's quite
> traditional to use monospaced fonts when programming.


Elsewhere yes, but not in the Smalltalk tradition. Others are still  
emulating character block generators, but Smalltalk relied on a  
bitmapped display pretty much forever. I find Smalltalk code displayed  
in a character-based terminal emulator style quite ugly.

- Bert -




Re: Default code font (was Re: [squeak-dev] Re: Title bar bug on trunk)

Ian Trudel-2
2009/8/22 Bert Freudenberg <[hidden email]>:
> Elsewhere yes, but not in the Smalltalk tradition. Others are still
> emulating character block generators, but Smalltalk relied on a bitmapped
> display pretty much forever. I find Smalltalk code displayed in a
> character-based terminal emulator style quite ugly.

Yes, usually, but wouldn't it be interesting anyhow, considering that we
can have anti-aliased monospaced fonts? For example, I did look quickly
into the list provided by Stéphane Rollandin and an anti-aliased
Monofur doesn't seem that bad.

Ian.
--
http://mecenia.blogspot.com/


Re: Default code font (was Re: [squeak-dev] Re: Title bar bug on trunk)

Bert Freudenberg
On 22.08.2009, at 21:39, Ian Trudel wrote:

> 2009/8/22 Bert Freudenberg <[hidden email]>:
>> Elsewhere yes, but not in the Smalltalk tradition. Others are still
>> emulating character block generators, but Smalltalk relied on a  
>> bitmapped
>> display pretty much forever. I find Smalltalk code displayed in a
>> character-based terminal emulator style quite ugly.
>
> Yes, usually, but wouldn't it be interesting anyhow, considering that we
> can have anti-aliased monospaced fonts? For example, I did look quickly
> into the list provided by Stéphane Rollandin and an anti-aliased
> Monofur doesn't seem that bad.


Sure, you can use it if you like; I'd just rather not make a
non-proportional font the default.

I use an anti-aliased monospaced font in my terminal every day. And in  
my C editor, too. Same for shell scripts or when I code Python. Even  
for plain-text emails. So it's not that I dislike them in general.

But not for Smalltalk :)

Smalltalk code looks a lot more like natural language text than most  
other programming languages, and the use of a proportional font  
emphasizes that likeness. Besides, if we had a monospaced font by
default then people would soon start aligning things with spaces,
which looks ugly to those using a proportional font.

- Bert -



Re: Default code font (was Re: [squeak-dev] Re: Title bar bug on trunk)

Ian Trudel-2
Well, Bert, I guess that I can survive with the current fonts anyway.
There have been a lot of improvements in this respect lately, which is
a good thing. =)

Ian.




--
http://mecenia.blogspot.com/


[squeak-dev] Re: Default code font

Colin Putney
In reply to this post by Bert Freudenberg

On 22-Aug-09, at 1:33 PM, Bert Freudenberg wrote:

> Smalltalk code looks a lot more like natural language text than most  
> other programming languages, and the use of a proportional font  
> emphasizes that likeness. Besides, if we had a monospaced font by
> default then people would soon start aligning things with spaces,  
> which looks ugly to those using a proportional font.

Also, in Smalltalk everyone uses the same text editor. Chances are  
good that if it looks good in my image, it'll look good in yours. With  
source code in files, everybody uses their favorite editor, and you  
get all sorts of holy wars over the width of tab stops, where to wrap  
lines, etc. Of course, the best way to resolve those is with monospaced
fonts and spaces for indenting...

Colin


Re: [squeak-dev] Re: Title bar bug on trunk

Ian Trudel-2
In reply to this post by Andreas.Raab
2009/8/15 Andreas Raab <[hidden email]>:
> Ian Trudel wrote:
>>
>> I have attached a screenshot on Squeak 3.11.3 (beta) VM with 3.10.2
>> trunk image and latest updates as of today. The font in the title bar
>> is too big, causing the window to overflow and leaving the resize border
>> controls inside the title bar (see red arrows pointing at the problem).
>
> Yeah, I've seen that too. I don't know what's causing it - dare to dig into
> it and try to find out more?

Well, I have changed the SystemWindow borderWidth from 4 to 3 and it seems
to fix the problem. This is unfortunately only a temporary fix, because I
suspect the underlying code is buggy somewhere.
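
A rough workspace snippet along these lines should apply the same workaround
to the windows already open (just a sketch of the workaround described above,
not the eventual fix; borderWidth: is the stock BorderedMorph accessor and
only affects existing windows, not the default for new ones):

        "Sketch: narrow the border of every open system window from 4 to 3 pixels.
         Newly opened windows are unaffected; this mirrors the workaround, not a real fix."
        SystemWindow allInstances do: [:window | window borderWidth: 3].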

Ian.

--
http://mecenia.blogspot.com/


Re: [squeak-dev] Re: Title bar bug on trunk

Stéphane Rollandin
Do people really like those grip morphs for resizing? They are ugly and
intrusive, restrict the dragging area way too much, and do not even allow
plainly extending the system window horizontally or vertically. I
hate them :)

What about a preference to get rid of them?


Stef




Re: [squeak-dev] Re: Title bar bug on trunk

Casey Ransberger
I like them because they're retro. I went out of my way to keep them in my Polymorph theme. You can get rid of them trivially with Polymorph, and I think we should talk some more about that.

 - Ron








Limited morphic resize grips (was Re: [squeak-dev] Re: Title bar bug on trunk)

Ken Causey-3
In reply to this post by Stéphane Rollandin
+1

I find myself often breathing a small sigh of relief when resizing
system morphs in older images and then finding it awkward when I return
to more recent versions.

Ken






Re: Default code font (was Re: [squeak-dev] Re: Title bar bug on trunk)

Michael van der Gulik-2
In reply to this post by Bert Freudenberg


On Sun, Aug 23, 2009 at 12:23 AM, Bert Freudenberg <[hidden email]> wrote:

On 22.08.2009, at 10:42, Ian Trudel wrote:

The current font does not seem to be monospaced. Can't we have a default
monospaced font? I'm curious to know what others think about this. The
current font seems fine as far as looks are concerned, but it's quite
traditional to use monospaced fonts when programming.


Elsewhere yes, but not in the Smalltalk tradition. Others are still emulating character block generators, but Smalltalk relied on a bitmapped display pretty much forever. I find Smalltalk code displayed in a character-based terminal emulator style quite ugly.


I agree with Bert.

Fixed-width fonts are an artifact dating back to typewriters, printers, and naive computer displays that weren't sophisticated enough to do proper typesetting. The main reason you prefer them is that you've been using them for so long.

I find nicely typeset Smalltalk code using variable-width fonts a pleasure to read. These days I use variable-width fonts for all the programming languages I use.

Gulik.

--
http://gulik.pbwiki.com/



[squeak-dev] Re: Title bar bug on trunk

Simon Michael
In reply to this post by Stéphane Rollandin
Indeed, they're nasty! :)

There must have been some reason to introduce them, but I can't think what it was.



Re: [squeak-dev] Zero bytes in Multilingual package

Bert Freudenberg
In reply to this post by Bert Freudenberg
On 16.08.2009, at 17:07, Bert Freudenberg wrote:

> On 16.08.2009, at 05:15, Andreas Raab wrote:
>
>> Ian Trudel wrote:
>>> Another issue but with the trunk. I have tried to update code from  
>>> the
>>> trunk into my image but there's a proxy error with source.squeak.org
>>> right at this minute, which causes Squeak to freeze for a minute or
>>> two trying to reach the server.
>>
>> It seems to be fine now. Probably just a temporary issue.
>
> There were three debuggers open in the squeaksource image when I  
> looked today. The problem comes from the source server trying to  
> parse Multilingual-ar.38 and Multilingual-sn.38. These contain
> sections of code where each character is stored as a long instead of
> a byte (that is, the char code plus three null bytes). I've copied
> the relevant portion out of the .mcz's source.st, see attachment. If  
> you try to open a changelist browser on that file, you get the same  
> parse error.
>
> I have no idea how these widened characters made it into the mcz's
> source.st file. In particular since this starts in the middle of a  
> method (and of a class comment). It extends over a few chunks, then  
> reverts back to a regular encoding. Strange.
>
> JapaneseEnvironment class>>isBreakableAt:in: looks suspicious though  
> I'm not sure if it is actually broken or not.
>
> I then looked into the trunk's changes file. It has this problem  
> too, though apparently only in the class comment of  
> LanguageEnvironment.
>
> "LanguageEnvironment comment string asByteArray" contains this:
>
> 116 104 114 101 101 32 99 97 110 32 104 97 118 101 32 40 97 110 100  
> 32 100 111 101 115 32 104 97 118 101 41 32 100 105 102 102 101 114  
> 101 110 116 32 101 110 99 111 100 105 110 103 115 46 32 32 83 0 0 0  
> 111 0 0 0 32 0 0 0 119 0 0 0 101 0 0 0 32 0 0 0 110 0 0 0 101 0 0 0  
> 101 0 0 0 100 0 0 0 32 0 0 0 116 0 0 0 111 0 0 0 32 0 0 0 109 0 0 0  
> 97 0 0 0 110 0 0 0 97 0 0 0 103 0 0 0 101 0 0 0 32 0 0 0 116 0 0 0  
> 104 0 0 0 101 0 0 0 109 0 0 0 32 0 0 0 115 0 0 0 101 0 0 0 112 0 0 0  
> 97 0 0 0 114 0 0 0 97 0 0 0 116 0 0 0 101 0 0 0 108 0 0 0 121 0 0 0  
> 46 0 0 0 32 0 0 0 32 0 0 0 78 0 0 0 111 0 0 0 116 0 0 0 101 0 0 0 32  
> 0 0 0 116 0 0 0 104 0 0 0 97 0 0 0 116 0 0 0 32 0 0 0 116 0 0 0 104  
> 0 0 0 101 0 0 0 32 0 0 0 101 0 0 0 110 0 0 0 99 0 0 0 111 0 0 0 100  
> 0 0 0 105 0 0 0 110 0 0 0 103 0 0 0 32 0 0 0 105 0 0 0 110 0 0 0 32  
> 0 0 0 97 0 0 0 32 0 0 0 102 0 0 0 105 0 0 0 108 0 0 0 101 0 0 0 32 0  
> 0 0 99 0 0 0 97 0 0 0 110 0 0 0 32 0 0 0 98 0 0 0
>
> Increasingly strange. So I removed the null bytes from the class  
> comment and published as Multilingual-bf.39. After updating they are  
> indeed gone from the comment. But looking at the source.st in that  
> mcz shows the encoding problem again. Bummer.
>
> Something very strange is going on. I'm out of ideas (short of  
> debugging into the MCZ save process).
>
> - Bert -

*ping*

The problem occurred again today. Will it likely happen every time someone
touches the Multilingual package?

- Bert -
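
As a sanity check on the dump quoted above, the first few byte values decode
straight back into the start of the damaged run of the comment; a small
workspace snippet using only the values shown above and core selectors:

        | bytes |
        bytes := #(83 0 0 0 111 0 0 0 32 0 0 0 119 0 0 0 101 0 0 0).
        String withAll:
                ((1 to: bytes size by: 4) collect: [:i | Character value: (bytes at: i)]).
        "=> 'So we' - one char code followed by three zero bytes per character"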




[Attachment: BuggyMultilingual-ar.38.st.zip (5K)]

[squeak-dev] Re: Zero bytes in Multilingual package

Andreas.Raab
Hi Bert -

I figured it out, but you won't like it. The problem comes from a
combination of things going wrong. First, you are right: there are
non-Latin characters in the source. This causes the MCWriter to silently
go WideString when it writes source.st. The resulting WideString gets
passed into ZipArchive, which compresses it in chunks of 4k. The funny
thing is that when you pull 4k chunks out of a WideString, the result is
reduced to ByteString again if it fits into Latin-1. Meaning that only
those definitions that happen to fall into the same 4k chunk as a
non-Latin character get screwed up (excuse me for a second while I walk
out and shoot myself).

Ah, feeling better now. This is why nobody ever noticed it: it doesn't
affect everything, and since MC is reasonably smart and doesn't need the
source too often, screw-ups of the source simply go unnoticed.

I think there is a solution though, namely having the writer check
whether the source is wide and, if so, use UTF-8 instead. The big
issue is backwards compatibility. I can see three approaches:

1) Write a BOM marker in front of any UTF-8-encoded source.st file. This
will work for any Monticello version that is aware of the BOM; for the
others YMMV (it depends on whether you're on 3.8 or later - it *should*
be okay for those, but I haven't tested).

2) Treat all source as UTF-8 all the time and allow conversion errors to
pass through, falling back to Latin-1 (see the sketch below). This will
work both ways (older Monticellos would get multiple characters in some
situations but be otherwise unaffected) at the cost of not detecting
possibly incorrect encodings in the file (which isn't a terrible choice
since the zip file has a CRC).

3) Write two versions of the source, one in snapshot/source and one in
snapshot.utf8/source. Works both ways, too, at the cost of doubling disk
space requirements.

One thing to keep in mind here is that MCDs may only work with #2 unless
the servers get updated. I think we should also consult with other MC
users to ensure future compatibility. FWIW, my vote is with option #2.
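
A minimal workspace sketch of what option #2 boils down to (an illustration
only, not an actual Monticello patch; it assumes the squeakToUtf8 /
utf8ToSqueak String conversions are available in the image, and the Latin-1
fallback on a failed decode is the behaviour being proposed, not something
MC does today):

        | encode decode |
        "Writing: always emit the source as UTF-8."
        encode := [:source | source squeakToUtf8].
        "Reading: try UTF-8 first; if the bytes are not valid UTF-8,
         keep them as-is and treat them as Latin-1 (the proposed fallback)."
        decode := [:bytes | [bytes utf8ToSqueak] on: Error do: [:e | bytes]].
        decode value: (encode value: 'Bonjour Stéphane')   "round-trips unchanged"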

Cheers,
   - Andreas




Re: [squeak-dev] Re: Zero bytes in Multilingual package

Bert Freudenberg



Yes, go UTF-8. This is precisely one of the backwards compatibility  
problems UTF-8 was designed to work around. In fact, I had thought we
did this already; it must be an omission in our MC version.

- Bert -



Re: [squeak-dev] Re: Zero bytes in Multilingual package

Bert Freudenberg



Looking closer into this I understand what you mean and why you didn't  
fix it right away. It's a mess.

I started by writing tests for MCStReader and MCStWriter but later
realized they test the wrong thing. The stream used to file out and in
is created in the test itself, and that stream class is actually what we
need to change.

So I tried to change

        RWBinaryOrTextStream on: String new.
to
        MultiByteBinaryOrTextStream on: String new encoding: 'utf-8'

in MCStWriterTest>>setUp, but it's not a drop-in replacement; I get 7
test failures from that change alone.

E.g.:
        (RWBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents
gives
        'Hi'
whereas
        (MultiByteBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents
answers
        ''

Giving up for now.

- Bert -




Re: [squeak-dev] Re: Zero bytes in Multilingual package

Nicolas Cellier
That reminds me http://bugs.squeak.org/view.php?id=5996


There are some other bugs sleeping there like this one:

http://lists.gforge.inria.fr/pipermail/pharo-project/2009-May/008994.html
http://code.google.com/p/pharo/issues/detail?id=830

SystemDictionary>>#condenseChanges uses StandardFileStream when it
really should not...

Nicolas



Re: [squeak-dev] Re: Zero bytes in Multilingual package

Bert Freudenberg
On 03.09.2009, at 22:41, Nicolas Cellier wrote:

> That reminds me http://bugs.squeak.org/view.php?id=5996

Ah, thanks! That's implementing Andreas' suggestion #1 (the BOM marker).

Does anyone know if this was integrated into any MC version? The ticket
doesn't say.

- Bert -




