The current font does not seem proportional. Can't we have a default
monospaced font? I'm curious to know what others think about this. The current font seems fine as far as looks are concerned, but it's quite traditional to use monospaced fonts when programming. Ian. -- http://mecenia.blogspot.com/ |
I agree, but it seems that finding beautiful open-source proportional
fonts is not easy... see http://hivelogic.com/articles/top-10-programming-fonts Stef |
In reply to this post by Andreas.Raab
On 2009-08-19 08:40, Andreas Raab wrote:
> Ronald Spengler wrote:
>> Almost everyone I know who Squeaks uses Polymorph, and I enjoyed the
>> heck out of hacking up the Vista style look to have titlebar buttons
>> arranged in a fashion after OS X :P
>>
>> One problem that I kept having was, it seemed that the load order and
>> versions of Polymorph & OmniBrowser had to be just right or I'd wind
>> up pretty broken. I saw lots of bugginess, although I *was* rolling
>> my own theme, so the bugs may have been my own.
>>
>> Do people feel that Polymorph is generally stable enough to go in a
>> base image?
>
> We have to find out. I was asking earlier but I'll ask again: Which
> version would one start with to try Polymorph? From which repository?
> I'm in particular interested in finding out more about
> extensions/overrides and how to eliminate them to make loading
> painless and conflict-free for people.
>
> Cheers,
> - Andreas

http://www.squeaksource.com/UIEnhancements.html

I don't know which version to use, probably the latest?

Karl |
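For anyone who wants to try it, a minimal workspace sketch for pointing Monticello at the repository Karl mentions; the package name 'Polymorph-Widgets' is a guess on my part, not something confirmed in this thread:

  "Add the UIEnhancements repository to the default group, then pick the
  newest Polymorph version by hand from the Monticello browser. No
  credentials are needed for read access."
  | repo |
  repo := MCHttpRepository
      location: 'http://www.squeaksource.com/UIEnhancements'
      user: ''
      password: ''.
  MCRepositoryGroup default addRepository: repo.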
In reply to this post by Ian Trudel-2
On 22.08.2009, at 10:42, Ian Trudel wrote:
> The current font does not seem proportional. Can't we have a default
> monospaced font? I'm curious to know what others think about this. The
> current font seems fine as far as looks are concerned but it's quite
> traditional to use monospaced fonts when programming.

Elsewhere yes, but not in the Smalltalk tradition. Others are still emulating character block generators, but Smalltalk has relied on a bitmapped display pretty much forever. I find Smalltalk code displayed in a character-based terminal emulator style quite ugly.

- Bert - |
2009/8/22 Bert Freudenberg <[hidden email]>:
> Elsewhere yes, but not in the Smalltalk tradition. Others are still
> emulating character block generators, but Smalltalk has relied on a bitmapped
> display pretty much forever. I find Smalltalk code displayed in a
> character-based terminal emulator style quite ugly.

Yes, usually, but wouldn't it be interesting anyhow, considering that we can now have anti-aliased monospaced fonts? For example, I had a quick look at the list provided by Stéphane Rollandin, and an anti-aliased Monofur doesn't seem that bad.

Ian.
-- http://mecenia.blogspot.com/ |
On 22.08.2009, at 21:39, Ian Trudel wrote:
> 2009/8/22 Bert Freudenberg <[hidden email]>:
>> Elsewhere yes, but not in the Smalltalk tradition. Others are still
>> emulating character block generators, but Smalltalk has relied on a
>> bitmapped display pretty much forever. I find Smalltalk code displayed
>> in a character-based terminal emulator style quite ugly.
>
> Yes, usually, but wouldn't it be interesting anyhow, considering that we
> can now have anti-aliased monospaced fonts? For example, I had a quick
> look at the list provided by Stéphane Rollandin, and an anti-aliased
> Monofur doesn't seem that bad.

Sure, you can use it if you like, I'd just not make a non-proportional font the default.

I use an anti-aliased monospaced font in my terminal every day. And in my C editor, too. Same for shell scripts or when I code Python. Even for plain-text emails. So it's not that I dislike them in general.

But not for Smalltalk :)

Smalltalk code looks a lot more like natural language text than most other programming languages, and the use of a proportional font emphasizes that likeness. Besides, if we had a monospaced font by default then people would soon start aligning things with spaces, which looks ugly to those using a proportional font.

- Bert - |
Well, Bert, I guess that I can survive with the current fonts anyway.
There have been a lot of improvements in this respect lately, which is a good thing. =)

Ian.

2009/8/22 Bert Freudenberg <[hidden email]>:
> Sure, you can use it if you like, I'd just not make a non-proportional font
> the default.
>
> I use an anti-aliased monospaced font in my terminal every day. And in my C
> editor, too. Same for shell scripts or when I code Python. Even for
> plain-text emails. So it's not that I dislike them in general.
>
> But not for Smalltalk :)
>
> Smalltalk code looks a lot more like natural language text than most other
> programming languages, and the use of a proportional font emphasizes that
> likeness. Besides, if we had a monospaced font by default then people
> would soon start aligning things with spaces, which looks ugly to those
> using a proportional font.
>
> - Bert -

-- http://mecenia.blogspot.com/ |
In reply to this post by Bert Freudenberg
On 22-Aug-09, at 1:33 PM, Bert Freudenberg wrote:
> Smalltalk code looks a lot more like natural language text than most
> other programming languages, and the use of a proportional font
> emphasizes that likeness. Besides, if we had a monospaced font by
> default then people would soon start aligning things with spaces,
> which looks ugly to those using a proportional font.

Also, in Smalltalk everyone uses the same text editor. Chances are good that if it looks good in my image, it'll look good in yours. With source code in files, everybody uses their favorite editor, and you get all sorts of holy wars over the width of tab stops, where to wrap lines, etc. Of course, the best way to resolve those is with monospaced fonts and spaces for indenting...

Colin |
In reply to this post by Andreas.Raab
2009/8/15 Andreas Raab <[hidden email]>:
> Ian Trudel wrote:
>>
>> I have attached a screenshot on Squeak 3.11.3 (beta) VM with 3.10.2
>> trunk image and latest updates as of today. The font in the title bar
>> is too big, causing the window to overflow and putting the resize border
>> controls inside the title bar (see red arrows pointing at the problem).
>
> Yeah, I've seen that too. I don't know what's causing it - dare to dig into
> it and try to find out more?

Well, I have changed the SystemWindow borderWidth from 4 to 3 and it seems to fix the problem. This is unfortunately only a temporary fix, because I suspect the underlying code is bogus somewhere.

Ian.
-- http://mecenia.blogspot.com/ |
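Ian doesn't say where exactly he changed the value, so the following is only a sketch of how one might try the same experiment from a workspace, tweaking already open windows rather than editing the class (it assumes SystemWindow inherits BorderedMorph's #borderWidth: setter, as it does in stock Morphic):

  "Shrink the border of every open SystemWindow to 3 pixels and watch
  whether the title bar still overflows. This only affects live windows;
  newly opened windows keep whatever the class-side default produces."
  SystemWindow allSubInstances do: [:window |
      window borderWidth: 3].

A permanent fix would still need to track down why the title bar layout overflows with a 4-pixel border in the first place, as Ian suspects.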
Do people really like those grip morphs for resizing? They are ugly and
intrusive, restrict the dragging area way too much, and do not even allow plainly extending the system window horizontally or vertically. I hate them :) What about a preference to get rid of them?

Stef |
I like them because they're retro. I went out of my way to keep them in my Polymorph theme. You can get rid of them trivially with Polymorph, and I think we should talk some more about that.
- Ron
2009/8/23 Stéphane Rollandin <[hidden email]>:
> Do people really like those grip morphs for resizing? They are ugly and
> intrusive, restrict the dragging area way too much, and do not even allow
> plainly extending the system window horizontally or vertically. I hate them :) |
In reply to this post by Stéphane Rollandin
+1
I find myself often breathing a small sigh of relief when resizing system morphs in older images, and then finding it awkward when I return to more recent versions.

Ken

On Sun, 2009-08-23 at 14:23 +0200, Stéphane Rollandin wrote:
> Do people really like those grip morphs for resizing? They are ugly and
> intrusive, restrict the dragging area way too much, and do not even allow
> plainly extending the system window horizontally or vertically. I
> hate them :)
>
> What about a preference to get rid of them?
>
> Stef |
In reply to this post by Bert Freudenberg
On Sun, Aug 23, 2009 at 12:23 AM, Bert Freudenberg <[hidden email]> wrote:
I agree with Bert. Fixed-width fonts are an artifact dating back to typewriters, printers and naive computer displays that weren't sophisticated enough to do proper typesetting. The main reason you prefer them is that you've been using them for so long. I find nicely typeset Smalltalk code in a variable-width font a pleasure to read. These days I use variable-width fonts for every programming language I work in.

Gulik.
-- http://gulik.pbwiki.com/ |
In reply to this post by Stéphane Rollandin
Indeed, they're nasty! :)
There must have been some reason to introduce them, but I can't think what it was. |
In reply to this post by Bert Freudenberg
On 16.08.2009, at 17:07, Bert Freudenberg wrote:
> On 16.08.2009, at 05:15, Andreas Raab wrote:
>
>> Ian Trudel wrote:
>>> Another issue but with the trunk. I have tried to update code from the
>>> trunk into my image but there's a proxy error with source.squeak.org
>>> right at this minute, which causes Squeak to freeze for a minute or
>>> two trying to reach the server.
>>
>> It seems to be fine now. Probably just a temporary issue.
>
> There were three debuggers open in the squeaksource image when I
> looked today. The problem comes from the source server trying to
> parse Multilingual-ar.38 and Multilingual-sn.38. It contains
> sections of code where each character is stored as a long instead of
> a byte (that is, three null bytes and the char code). I've copied
> the relevant portion out of the .mcz's source.st, see attachment. If
> you try to open a changelist browser on that file, you get the same
> parse error.
>
> I have no idea how these widened characters made it into the mcz's
> source.st file. In particular since this starts in the middle of a
> method (and of a class comment). It extends over a few chunks, then
> reverts back to a regular encoding. Strange.
>
> JapaneseEnvironment class>>isBreakableAt:in: looks suspicious though
> I'm not sure if it is actually broken or not.
>
> I then looked into the trunk's changes file. It has this problem
> too, though apparently only in the class comment of
> LanguageEnvironment.
>
> "LanguageEnvironment comment string asByteArray" contains this:
>
> 116 104 114 101 101 32 99 97 110 32 104 97 118 101 32 40 97 110 100
> 32 100 111 101 115 32 104 97 118 101 41 32 100 105 102 102 101 114
> 101 110 116 32 101 110 99 111 100 105 110 103 115 46 32 32 83 0 0 0
> 111 0 0 0 32 0 0 0 119 0 0 0 101 0 0 0 32 0 0 0 110 0 0 0 101 0 0 0
> 101 0 0 0 100 0 0 0 32 0 0 0 116 0 0 0 111 0 0 0 32 0 0 0 109 0 0 0
> 97 0 0 0 110 0 0 0 97 0 0 0 103 0 0 0 101 0 0 0 32 0 0 0 116 0 0 0
> 104 0 0 0 101 0 0 0 109 0 0 0 32 0 0 0 115 0 0 0 101 0 0 0 112 0 0 0
> 97 0 0 0 114 0 0 0 97 0 0 0 116 0 0 0 101 0 0 0 108 0 0 0 121 0 0 0
> 46 0 0 0 32 0 0 0 32 0 0 0 78 0 0 0 111 0 0 0 116 0 0 0 101 0 0 0 32
> 0 0 0 116 0 0 0 104 0 0 0 97 0 0 0 116 0 0 0 32 0 0 0 116 0 0 0 104
> 0 0 0 101 0 0 0 32 0 0 0 101 0 0 0 110 0 0 0 99 0 0 0 111 0 0 0 100
> 0 0 0 105 0 0 0 110 0 0 0 103 0 0 0 32 0 0 0 105 0 0 0 110 0 0 0 32
> 0 0 0 97 0 0 0 32 0 0 0 102 0 0 0 105 0 0 0 108 0 0 0 101 0 0 0 32 0
> 0 0 99 0 0 0 97 0 0 0 110 0 0 0 32 0 0 0 98 0 0 0
>
> Increasingly strange. So I removed the null bytes from the class
> comment and published as Multilingual-bf.39. After updating they are
> indeed gone from the comment. But looking at the source.st in that
> mcz shows the encoding problem again. Bummer.
>
> Something very strange is going on. I'm out of ideas (short of
> debugging into the MCZ save process).
>
> - Bert -

*ping*

Problem occurred again today. Will likely happen every time someone touches the Multilingual package?

- Bert - |
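A small workspace check in the spirit of Bert's "LanguageEnvironment comment string asByteArray" expression, for spotting the corruption he describes (a convenience doit, not code from the thread):

  "A widened character shows up as its char code plus null padding bytes,
  so any zero byte in the comment's byte form indicates the problem."
  (LanguageEnvironment comment string asByteArray) includes: 0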
Hi Bert -
I figured it out, but you won't like it. The problem comes from a combination of things going wrong. First, you are right, there are non-Latin characters in the source. This causes the MCWriter to silently go WideString when it writes source.st. The resulting WideString gets passed into ZipArchive, which compresses it in chunks of 4k. The funny thing is that when you pull 4k chunks out of a WideString, the result is reduced to ByteString again if it fits into Latin-1. Meaning that only those definitions that happen to fall into the same 4k chunk as a non-Latin character get screwed up (excuse me for a second while I walk out and shoot myself).

Ah, feeling better now. This is why nobody ever noticed it: it won't affect all of the stuff, and since MC is reasonably smart and doesn't need the source too often, screw-ups of the source do not get noticed.

I think there is a solution though, namely having the writer check whether the source is wide and if so use UTF-8 instead. The big issue is backwards compatibility though. I can see three approaches:

1) Write a BOM marker in front of any UTF-8 encoded source.st file. This will work for any Monticello version which is aware of the BOM; for the others YMMV (it depends on whether you're on 3.8 or later - it *should* be okay for those but I haven't tested).

2) Assume all source is UTF-8 all the time and allow conversion errors to pass through assuming Latin-1. This will work both ways (older Monticellos would get multiple characters in some situations but be otherwise unaffected) at the cost of not detecting possibly incorrect encodings in the file (which isn't a terrible choice since the zip file has a CRC).

3) Write two versions of the source, one in snapshot/source and one in snapshot.utf8/source. Works both ways too, at the cost of doubling disk space requirements.

One thing to keep in mind here is that MCDs may only work with #2 unless the servers get updated. I think we should also consult with other MC users to ensure future compatibility. FWIW, my vote is with option #2.

Cheers,
  - Andreas

Bert Freudenberg wrote:
> *ping*
>
> Problem occurred again today. Will likely happen every time someone
> touches the Multilingual package?
>
> - Bert - |
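To make option #2 a bit more concrete, here is a rough workspace sketch of the round trip it implies, assuming the String>>#squeakToUtf8 / #utf8ToSqueak converters available in 3.8-and-later images (the real change would of course go into the Monticello writer and reader, not a workspace):

  "Encode on the way out, decode with a Latin-1 fallback on the way in."
  | source encoded decoded |
  source := LanguageEnvironment comment string.     "the comment that keeps getting mangled"
  encoded := source squeakToUtf8.                   "always a ByteString, safe to zip in any chunk size"
  decoded := [encoded utf8ToSqueak] on: Error do: [:ex | encoded].  "assume Latin-1 if UTF-8 decoding fails"
  decoded = source                                  "true if the round trip preserved the text"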
On 02.09.2009, at 07:28, Andreas Raab wrote:
> One thing to keep in mind here is that MCDs may only work with #2
> unless the servers get updated. I think we should also consult with
> other MC users to ensure future compatibility. FWIW, my vote is
> with option #2.

Yes, go UTF-8. This is precisely one of the backwards compatibility problems UTF-8 was designed to work around. In fact I had thought we did this already; it must be an omission in our MC version.

- Bert - |
On 02.09.2009, at 13:45, Bert Freudenberg wrote:
> Yes, go UTF-8. This is precisely one of the backwards compatibility
> problems UTF-8 was designed to work around. In fact I had thought we
> did this already; it must be an omission in our MC version.
>
> - Bert -

Looking closer into this I understand what you mean and why you didn't fix it right away. It's a mess.

I started by writing tests for MCStReader and MCStWriter but later realized it's testing the wrong thing. The stream to file out and in is created in the test, and the stream class used is actually what we need to change.

So I tried to change

  RWBinaryOrTextStream on: String new.

to

  MultiByteBinaryOrTextStream on: String new encoding: 'utf-8'

in MCStWriterTest>>setUp but it's not a drop-in replacement; I get 7 test failures from that change alone.

E.g.:

  (RWBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents

gives

  'Hi'

whereas

  (MultiByteBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents

answers

  ''

Giving up for now.

- Bert - |
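Bert's two expressions translate almost directly into a unit test; a minimal sketch for whoever picks this up later (the test name is made up, and in the current image the second assertion is the one that fails):

  testContentsAfterNextPutAll
      "The first assertion passes; the second documents the bug in
      MultiByteBinaryOrTextStream's #contents after #nextPutAll:."
      self assert:
          ((RWBinaryOrTextStream on: String new)
              nextPutAll: 'Hi';
              contents) = 'Hi'.
      self assert:
          ((MultiByteBinaryOrTextStream on: String new)
              nextPutAll: 'Hi';
              contents) = 'Hi'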
That reminds me of http://bugs.squeak.org/view.php?id=5996

There are some other bugs sleeping there, like this one:

http://lists.gforge.inria.fr/pipermail/pharo-project/2009-May/008994.html
http://code.google.com/p/pharo/issues/detail?id=830

SystemDictionary>>#condenseChanges uses StandardFileStream when it really should not...

Nicolas

2009/9/3 Bert Freudenberg <[hidden email]>:
> So I tried to change
>
>   RWBinaryOrTextStream on: String new.
> to
>   MultiByteBinaryOrTextStream on: String new encoding: 'utf-8'
>
> in MCStWriterTest>>setUp but it's not a drop-in replacement; I get 7 test
> failures from that change alone.
>
> E.g.:
>   (RWBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents
> gives
>   'Hi'
> whereas
>   (MultiByteBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents
> answers
>   ''
>
> Giving up for now.
>
> - Bert - |
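For context, a hedged sketch of the direction Nicolas's report implies - I have not checked the actual #condenseChanges code; the point is only that the condensed changes file would need to be written through an encoding-aware stream rather than a raw StandardFileStream:

  "Open the target file via FileStream, which in 3.8-and-later images
  answers an encoding-aware MultiByteFileStream, and give it an explicit
  UTF-8 converter. The file name here is just an example."
  | newChanges |
  newChanges := FileStream newFileNamed: 'condensed.changes'.
  newChanges converter: UTF8TextConverter new.
  newChanges nextPutAll: 'a test chunk with non-Latin text'; close.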
On 03.09.2009, at 22:41, Nicolas Cellier wrote:
> That reminds me of http://bugs.squeak.org/view.php?id=5996

Ah, thanks! That's implementing Andreas' suggestion #1 below. Does someone know if this was integrated in any MC version? The ticket doesn't say.

- Bert -

> There are some other bugs sleeping there, like this one:
>
> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-May/008994.html
> http://code.google.com/p/pharo/issues/detail?id=830
>
> SystemDictionary>>#condenseChanges uses StandardFileStream when it
> really should not...
>
> Nicolas
>
> 2009/9/3 Bert Freudenberg <[hidden email]>:
>>>> 1) Write a BOM marker in front of any UTF-8 encoded source.st file. This
>>>> will work for any Monticello version which is aware of the BOM; for the
>>>> others YMMV (it depends on whether you're on 3.8 or later - it *should* be
>>>> okay for those but I haven't tested). |
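Whichever MC version (if any) picked up that fix, the BOM check from option #1 is small in itself; a sketch on a hand-made byte array, not the code from the Mantis ticket:

  "The UTF-8 byte order mark (16rEF 16rBB 16rBF), checked against the
  first three bytes of an example source.st-style byte array."
  | bom bytes |
  bom := #[239 187 191].
  bytes := bom , 'Object subclass: #Example' asByteArray.
  (bytes copyFrom: 1 to: 3) = bom
      ifTrue: ["decode the remainder as UTF-8"]
      ifFalse: ["fall back to the old Latin-1 behaviour"]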