The current font does not seem proportional. Can't we have a default
monospaced font? I'm curious to know what others think about this. The current font seems fine as far as looks are concerned, but it's quite traditional to use monospaced fonts when programming. Ian. -- http://mecenia.blogspot.com/ |
I agree, but it seems that finding beautiful open-source proportional
fonts is not easy... see http://hivelogic.com/articles/top-10-programming-fonts Stef |
In reply to this post by Andreas.Raab
On 2009-08-19 08:40, Andreas Raab wrote:
> Ronald Spengler wrote:
>> Almost everyone I know who Squeaks uses Polymorph, and I enjoyed the
>> heck out of hacking up the Vista style look to have titlebar buttons
>> arranged in a fashion after OS X :P
>>
>> One problem that I kept having was, it seemed that the load order and
>> versions of Polymorph & OmniBrowser had to be just right or I'd wind
>> up pretty broken. I saw lots of bugginess, although I *was* rolling
>> my own theme, so the bugs may have been my own.
>>
>> Do people feel that Polymorph is generally stable enough to go in a
>> base image?
>
> We have to find out. I was asking earlier but I'll ask again: Which
> version would one start with to try Polymorph? From which repository?
> I'm in particular interested in finding out more about
> extensions/overrides and how to eliminate them to make loading
> painless and conflict-free for people.
>
> Cheers,
> - Andreas

http://www.squeaksource.com/UIEnhancements.html

I don't know which version to use, probably the latest?

Karl |
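For anyone who wants to try it, a minimal workspace sketch for pointing Monticello at the repository Karl mentions; the package name 'Polymorph-Widgets' is a guess on my part, not something confirmed in this thread:

  "Add the UIEnhancements repository to the default group, then pick the
  newest Polymorph version by hand from the Monticello browser. No
  credentials are needed for read access."
  | repo |
  repo := MCHttpRepository
      location: 'http://www.squeaksource.com/UIEnhancements'
      user: ''
      password: ''.
  MCRepositoryGroup default addRepository: repo.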
In reply to this post by Ian Trudel-2
On 22.08.2009, at 10:42, Ian Trudel wrote:
> The current font does not seem proportional. Can't we have a default
> monospaced font? I'm curious to know what others think about this. The
> current font seems fine as far as looks are concerned but it's quite
> traditional to use monospaced fonts when programming.

Elsewhere yes, but not in the Smalltalk tradition. Others are still emulating character block generators, but Smalltalk has relied on a bitmapped display pretty much forever. I find Smalltalk code displayed in a character-based terminal emulator style quite ugly.

- Bert - |
2009/8/22 Bert Freudenberg <[hidden email]>:
> Elsewhere yes, but not in the Smalltalk tradition. Others are still
> emulating character block generators, but Smalltalk has relied on a bitmapped
> display pretty much forever. I find Smalltalk code displayed in a
> character-based terminal emulator style quite ugly.

Yes, usually, but wouldn't it be interesting anyhow, considering that we can now have anti-aliased monospaced fonts? For example, I had a quick look at the list provided by Stéphane Rollandin, and an anti-aliased Monofur doesn't seem that bad.

Ian.
-- http://mecenia.blogspot.com/ |
On 22.08.2009, at 21:39, Ian Trudel wrote:
> 2009/8/22 Bert Freudenberg <[hidden email]>:
>> Elsewhere yes, but not in the Smalltalk tradition. Others are still
>> emulating character block generators, but Smalltalk has relied on a
>> bitmapped display pretty much forever. I find Smalltalk code displayed
>> in a character-based terminal emulator style quite ugly.
>
> Yes, usually, but wouldn't it be interesting anyhow, considering that we
> can now have anti-aliased monospaced fonts? For example, I had a quick
> look at the list provided by Stéphane Rollandin, and an anti-aliased
> Monofur doesn't seem that bad.

Sure, you can use it if you like, I'd just not make a non-proportional font the default.

I use an anti-aliased monospaced font in my terminal every day. And in my C editor, too. Same for shell scripts or when I code Python. Even for plain-text emails. So it's not that I dislike them in general.

But not for Smalltalk :)

Smalltalk code looks a lot more like natural language text than most other programming languages, and the use of a proportional font emphasizes that likeness. Besides, if we had a monospaced font by default then people would soon start aligning things with spaces, which looks ugly to those using a proportional font.

- Bert - |
Well, Bert, I guess that I can survive with the current fonts anyway.
There have been a lot of improvements in this respect lately, which is a good thing. =)

Ian.

2009/8/22 Bert Freudenberg <[hidden email]>:
> Sure, you can use it if you like, I'd just not make a non-proportional font
> the default.
>
> I use an anti-aliased monospaced font in my terminal every day. And in my C
> editor, too. Same for shell scripts or when I code Python. Even for
> plain-text emails. So it's not that I dislike them in general.
>
> But not for Smalltalk :)
>
> Smalltalk code looks a lot more like natural language text than most other
> programming languages, and the use of a proportional font emphasizes that
> likeness. Besides, if we had a monospaced font by default then people
> would soon start aligning things with spaces, which looks ugly to those
> using a proportional font.
>
> - Bert -

-- http://mecenia.blogspot.com/ |
In reply to this post by Bert Freudenberg
On 22-Aug-09, at 1:33 PM, Bert Freudenberg wrote:
> Smalltalk code looks a lot more like natural language text than most
> other programming languages, and the use of a proportional font
> emphasizes that likeness. Besides, if we had a monospaced font by
> default then people would soon start aligning things with spaces,
> which looks ugly to those using a proportional font.

Also, in Smalltalk everyone uses the same text editor. Chances are good that if it looks good in my image, it'll look good in yours. With source code in files, everybody uses their favorite editor, and you get all sorts of holy wars over the width of tab stops, where to wrap lines, etc. Of course, the best way to resolve those is with monospaced fonts and spaces for indenting...

Colin |
In reply to this post by Andreas.Raab
2009/8/15 Andreas Raab <[hidden email]>:
> Ian Trudel wrote:
>>
>> I have attached a screenshot on Squeak 3.11.3 (beta) VM with 3.10.2
>> trunk image and latest updates as of today. The font in the title bar
>> is too big, causing the window to overflow and putting the resize border
>> controls inside the title bar (see red arrows pointing at the problem).
>
> Yeah, I've seen that too. I don't know what's causing it - dare to dig into
> it and try to find out more?

Well, I have changed the SystemWindow borderWidth from 4 to 3 and it seems to fix the problem. This is unfortunately only a temporary fix, because I suspect the underlying code is bogus somewhere.

Ian.
-- http://mecenia.blogspot.com/ |
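Ian doesn't say where exactly he changed the value, so the following is only a sketch of how one might try the same experiment from a workspace, tweaking already open windows rather than editing the class (it assumes SystemWindow inherits BorderedMorph's #borderWidth: setter, as it does in stock Morphic):

  "Shrink the border of every open SystemWindow to 3 pixels and watch
  whether the title bar still overflows. This only affects live windows;
  newly opened windows keep whatever the class-side default produces."
  SystemWindow allSubInstances do: [:window |
      window borderWidth: 3].

A permanent fix would still need to track down why the title bar layout overflows with a 4-pixel border in the first place, as Ian suspects.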
Do people really like those grip morphs for resizing? They are ugly and
intrusive, restrict the dragging area way too much, and do not even allow plainly extending the system window horizontally or vertically. I hate them :) What about a preference to get rid of them?

Stef |
I like them because they're retro. I went out of my way to keep them in my Polymorph theme. You can get rid of them trivially with Polymorph, and I think we should talk some more about that.
- Ron
2009/8/23 Stéphane Rollandin <[hidden email]>:
> Do people really like those grip morphs for resizing? They are ugly and
> intrusive, restrict the dragging area way too much, and do not even allow
> plainly extending the system window horizontally or vertically. I hate them :) |
In reply to this post by Stéphane Rollandin
+1
I find myself often breathing a small sigh of relief when resizing system morphs in older images, and then finding it awkward when I return to more recent versions.

Ken

On Sun, 2009-08-23 at 14:23 +0200, Stéphane Rollandin wrote:
> Do people really like those grip morphs for resizing? They are ugly and
> intrusive, restrict the dragging area way too much, and do not even allow
> plainly extending the system window horizontally or vertically. I
> hate them :)
>
> What about a preference to get rid of them?
>
> Stef |
In reply to this post by Bert Freudenberg
On Sun, Aug 23, 2009 at 12:23 AM, Bert Freudenberg <[hidden email]> wrote:
I agree with Bert. Fixed-width fonts are an artifact dating back to typewriters, printers and naive computer displays that weren't sophisticated enough to do proper typesetting. The main reason you prefer them is that you've been using them for so long. I find nicely typeset Smalltalk code in a variable-width font a pleasure to read. These days I use variable-width fonts for every programming language I work in.

Gulik.
-- http://gulik.pbwiki.com/ |
In reply to this post by Stéphane Rollandin
Indeed, they're nasty! :)
There must have been some reason to introduce them, but I can't think what it was. |
In reply to this post by Bert Freudenberg
On 16.08.2009, at 17:07, Bert Freudenberg wrote:
> On 16.08.2009, at 05:15, Andreas Raab wrote:
>
>> Ian Trudel wrote:
>>> Another issue but with the trunk. I have tried to update code from the
>>> trunk into my image but there's a proxy error with source.squeak.org
>>> right at this minute, which causes Squeak to freeze for a minute or
>>> two trying to reach the server.
>>
>> It seems to be fine now. Probably just a temporary issue.
>
> There were three debuggers open in the squeaksource image when I
> looked today. The problem comes from the source server trying to
> parse Multilingual-ar.38 and Multilingual-sn.38. It contains
> sections of code where each character is stored as a long instead of
> a byte (that is, three null bytes and the char code). I've copied
> the relevant portion out of the .mcz's source.st, see attachment. If
> you try to open a changelist browser on that file, you get the same
> parse error.
>
> I have no idea how these widened characters made it into the mcz's
> source.st file. In particular since this starts in the middle of a
> method (and of a class comment). It extends over a few chunks, then
> reverts back to a regular encoding. Strange.
>
> JapaneseEnvironment class>>isBreakableAt:in: looks suspicious though
> I'm not sure if it is actually broken or not.
>
> I then looked into the trunk's changes file. It has this problem
> too, though apparently only in the class comment of
> LanguageEnvironment.
>
> "LanguageEnvironment comment string asByteArray" contains this:
>
> 116 104 114 101 101 32 99 97 110 32 104 97 118 101 32 40 97 110 100
> 32 100 111 101 115 32 104 97 118 101 41 32 100 105 102 102 101 114
> 101 110 116 32 101 110 99 111 100 105 110 103 115 46 32 32 83 0 0 0
> 111 0 0 0 32 0 0 0 119 0 0 0 101 0 0 0 32 0 0 0 110 0 0 0 101 0 0 0
> 101 0 0 0 100 0 0 0 32 0 0 0 116 0 0 0 111 0 0 0 32 0 0 0 109 0 0 0
> 97 0 0 0 110 0 0 0 97 0 0 0 103 0 0 0 101 0 0 0 32 0 0 0 116 0 0 0
> 104 0 0 0 101 0 0 0 109 0 0 0 32 0 0 0 115 0 0 0 101 0 0 0 112 0 0 0
> 97 0 0 0 114 0 0 0 97 0 0 0 116 0 0 0 101 0 0 0 108 0 0 0 121 0 0 0
> 46 0 0 0 32 0 0 0 32 0 0 0 78 0 0 0 111 0 0 0 116 0 0 0 101 0 0 0 32
> 0 0 0 116 0 0 0 104 0 0 0 97 0 0 0 116 0 0 0 32 0 0 0 116 0 0 0 104
> 0 0 0 101 0 0 0 32 0 0 0 101 0 0 0 110 0 0 0 99 0 0 0 111 0 0 0 100
> 0 0 0 105 0 0 0 110 0 0 0 103 0 0 0 32 0 0 0 105 0 0 0 110 0 0 0 32
> 0 0 0 97 0 0 0 32 0 0 0 102 0 0 0 105 0 0 0 108 0 0 0 101 0 0 0 32 0
> 0 0 99 0 0 0 97 0 0 0 110 0 0 0 32 0 0 0 98 0 0 0
>
> Increasingly strange. So I removed the null bytes from the class
> comment and published as Multilingual-bf.39. After updating they are
> indeed gone from the comment. But looking at the source.st in that
> mcz shows the encoding problem again. Bummer.
>
> Something very strange is going on. I'm out of ideas (short of
> debugging into the MCZ save process).
>
> - Bert -

*ping*

Problem occurred again today. Will likely happen every time someone touches the Multilingual package?

- Bert - |
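A small workspace check in the spirit of Bert's "LanguageEnvironment comment string asByteArray" expression, for spotting the corruption he describes (a convenience doit, not code from the thread):

  "A widened character shows up as its char code plus null padding bytes,
  so any zero byte in the comment's byte form indicates the problem."
  (LanguageEnvironment comment string asByteArray) includes: 0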
Hi Bert -
I figured it out, but you won't like it. The problem comes from a combination of things going wrong. First, you are right, there are non-Latin characters in the source. This causes the MCWriter to silently go WideString when it writes source.st. The resulting WideString gets passed into ZipArchive, which compresses it in chunks of 4k. The funny thing is that when you pull 4k chunks out of a WideString, the result is reduced to ByteString again if it fits into Latin-1. Meaning that only those definitions that happen to fall into the same 4k chunk as a non-Latin character get screwed up (excuse me for a second while I walk out and shoot myself).

Ah, feeling better now. This is why nobody ever noticed it: it won't affect all of the stuff, and since MC is reasonably smart and doesn't need the source too often, screw-ups of the source do not get noticed.

I think there is a solution though, namely having the writer check whether the source is wide and if so use UTF-8 instead. The big issue is backwards compatibility though. I can see three approaches:

1) Write a BOM marker in front of any UTF-8 encoded source.st file. This will work for any Monticello version which is aware of the BOM; for the others YMMV (it depends on whether you're on 3.8 or later - it *should* be okay for those but I haven't tested).

2) Assume all source is UTF-8 all the time and allow conversion errors to pass through assuming Latin-1. This will work both ways (older Monticellos would get multiple characters in some situations but be otherwise unaffected) at the cost of not detecting possibly incorrect encodings in the file (which isn't a terrible choice since the zip file has a CRC).

3) Write two versions of the source, one in snapshot/source and one in snapshot.utf8/source. Works both ways too, at the cost of doubling disk space requirements.

One thing to keep in mind here is that MCDs may only work with #2 unless the servers get updated. I think we should also consult with other MC users to ensure future compatibility. FWIW, my vote is with option #2.

Cheers,
  - Andreas

Bert Freudenberg wrote:
> *ping*
>
> Problem occurred again today. Will likely happen every time someone
> touches the Multilingual package?
>
> - Bert - |
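To make option #2 a bit more concrete, here is a rough workspace sketch of the round trip it implies, assuming the String>>#squeakToUtf8 / #utf8ToSqueak converters available in 3.8-and-later images (the real change would of course go into the Monticello writer and reader, not a workspace):

  "Encode on the way out, decode with a Latin-1 fallback on the way in."
  | source encoded decoded |
  source := LanguageEnvironment comment string.     "the comment that keeps getting mangled"
  encoded := source squeakToUtf8.                   "always a ByteString, safe to zip in any chunk size"
  decoded := [encoded utf8ToSqueak] on: Error do: [:ex | encoded].  "assume Latin-1 if UTF-8 decoding fails"
  decoded = source                                  "true if the round trip preserved the text"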
On 02.09.2009, at 07:28, Andreas Raab wrote:
> One thing to keep in mind here is that MCDs may only work with #2
> unless the servers get updated. I think we should also consult with
> other MC users to ensure future compatibility. FWIW, my vote is
> with option #2.

Yes, go UTF-8. This is precisely one of the backwards compatibility problems UTF-8 was designed to work around. In fact I had thought we did this already; it must be an omission in our MC version.

- Bert - |
On 02.09.2009, at 13:45, Bert Freudenberg wrote:
> Yes, go UTF-8. This is precisely one of the backwards compatibility
> problems UTF-8 was designed to work around. In fact I had thought we
> did this already; it must be an omission in our MC version.
>
> - Bert -

Looking closer into this I understand what you mean and why you didn't fix it right away. It's a mess.

I started by writing tests for MCStReader and MCStWriter but later realized it's testing the wrong thing. The stream to file out and in is created in the test, and the stream class used is actually what we need to change.

So I tried to change

  RWBinaryOrTextStream on: String new.

to

  MultiByteBinaryOrTextStream on: String new encoding: 'utf-8'

in MCStWriterTest>>setUp but it's not a drop-in replacement; I get 7 test failures from that change alone.

E.g.:

  (RWBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents

gives

  'Hi'

whereas

  (MultiByteBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents

answers

  ''

Giving up for now.

- Bert - |
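Bert's two expressions translate almost directly into a unit test; a minimal sketch for whoever picks this up later (the test name is made up, and in the current image the second assertion is the one that fails):

  testContentsAfterNextPutAll
      "The first assertion passes; the second documents the bug in
      MultiByteBinaryOrTextStream's #contents after #nextPutAll:."
      self assert:
          ((RWBinaryOrTextStream on: String new)
              nextPutAll: 'Hi';
              contents) = 'Hi'.
      self assert:
          ((MultiByteBinaryOrTextStream on: String new)
              nextPutAll: 'Hi';
              contents) = 'Hi'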
That reminds me of http://bugs.squeak.org/view.php?id=5996

There are some other bugs sleeping there, like this one:

http://lists.gforge.inria.fr/pipermail/pharo-project/2009-May/008994.html
http://code.google.com/p/pharo/issues/detail?id=830

SystemDictionary>>#condenseChanges uses StandardFileStream when it really should not...

Nicolas

2009/9/3 Bert Freudenberg <[hidden email]>:
> So I tried to change
>
>   RWBinaryOrTextStream on: String new.
> to
>   MultiByteBinaryOrTextStream on: String new encoding: 'utf-8'
>
> in MCStWriterTest>>setUp but it's not a drop-in replacement; I get 7 test
> failures from that change alone.
>
> E.g.:
>   (RWBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents
> gives
>   'Hi'
> whereas
>   (MultiByteBinaryOrTextStream on: String new) nextPutAll: 'Hi'; contents
> answers
>   ''
>
> Giving up for now.
>
> - Bert - |
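For context, a hedged sketch of the direction Nicolas's report implies - I have not checked the actual #condenseChanges code; the point is only that the condensed changes file would need to be written through an encoding-aware stream rather than a raw StandardFileStream:

  "Open the target file via FileStream, which in 3.8-and-later images
  answers an encoding-aware MultiByteFileStream, and give it an explicit
  UTF-8 converter. The file name here is just an example."
  | newChanges |
  newChanges := FileStream newFileNamed: 'condensed.changes'.
  newChanges converter: UTF8TextConverter new.
  newChanges nextPutAll: 'a test chunk with non-Latin text'; close.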
On 03.09.2009, at 22:41, Nicolas Cellier wrote:
> That reminds me of http://bugs.squeak.org/view.php?id=5996

Ah, thanks! That's implementing Andreas' suggestion #1 below. Does someone know if this was integrated in any MC version? The ticket doesn't say.

- Bert -

> There are some other bugs sleeping there, like this one:
>
> http://lists.gforge.inria.fr/pipermail/pharo-project/2009-May/008994.html
> http://code.google.com/p/pharo/issues/detail?id=830
>
> SystemDictionary>>#condenseChanges uses StandardFileStream when it
> really should not...
>
> Nicolas
>
> 2009/9/3 Bert Freudenberg <[hidden email]>:
>>>> 1) Write a BOM marker in front of any UTF-8 encoded source.st file. This
>>>> will work for any Monticello version which is aware of the BOM; for the
>>>> others YMMV (it depends on whether you're on 3.8 or later - it *should* be
>>>> okay for those but I haven't tested). |
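Whichever MC version (if any) picked up that fix, the BOM check from option #1 is small in itself; a sketch on a hand-made byte array, not the code from the Mantis ticket:

  "The UTF-8 byte order mark (16rEF 16rBB 16rBF), checked against the
  first three bytes of an example source.st-style byte array."
  | bom bytes |
  bom := #[239 187 191].
  bytes := bom , 'Object subclass: #Example' asByteArray.
  (bytes copyFrom: 1 to: 3) = bom
      ifTrue: ["decode the remainder as UTF-8"]
      ifFalse: ["fall back to the old Latin-1 behaviour"]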