Ever since I started using Squeak with Squeak 3.6
(I only use Linux, currently Ubuntu 9.10)
I have always had trouble with line separators.
I am checking out Squeak 4.1 and things have
changed, though I am not sure if they are any better.

If I have FileStream>>concreteStream return MultiByteFileStream
(the default) then when I fileOut code the .st file consists of a single
line, so that I cannot use utilities such as wc and vi on these files.
If I modify concreteStream to return CrLfFileStream then the problem
goes away but a host of other problems occur.

1) It used to be that if you looked at the versions of a method
each version would be written on a single line (I believe linefeeds
were used instead of carriage returns).
This no longer happens. Instead every line of a version is separated
by a blank line. An improvement I suppose; it is more readable.
(I believe what is happening is that the end-of-line separator contains
a line feed and a carriage return and both are treated as line
separators.)

2) It used to be that if you wrote out a file (with concreteStream returning
CrLfFileStream) then when you filed in the file using:

    (FileStream oldFileNamed: 'filename.st') fileIn

you got carriage returns for line separators.
Now you get linefeeds.
This causes problems with Menu labels as in:

    aMenu labels:
    'find class... (f)
    recent classes... (r)
    browse all
    browse
    ...

because carriage returns are expected.
Consequently your Menu has a single entry. :-(
I expect there are other problems as well.
Now, even if you set concreteStream back to the default the same
problem occurs. This works in 3.10.2 so we have gone backwards here.

3) If I cut and paste from a different 4.1 image I lose
my line separators altogether. This is unchanged from before.

4?) Finally, I used to have problems loading .mcz files. So far in 4.1 I
haven't had a problem but I have only loaded 2 .mcz files.
In prior versions of Squeak loading a .mcz file would succeed only sometimes.

Are other Linux users having similar experiences?

Frankly, I think only one of Cr and Lf should be accepted in
Smalltalk code, the other generating a syntax error except inside
strings, and inside strings it should have to be escaped somehow.

If Cr is the character chosen for line separators then it should be
impossible to write:

    returnAString
        ^'a two line string where
    the line separator is a linefeed'

The fact that the above code is legal leads to subtle errors such
as those above. A blatant compiler error is preferred.

One final curiosity: Why is the following method written as it is
(in both 4.1 and 3.10.2)?

    Method CrLfFileStream>>new

        ^ (MultiByteFileStream new) wantsLineEndConversion: true; yourself.

I presume it is correct but a comment explaining why wouldn't hurt.

Regards,

Ralph Boland

--
Quantum Theory cannot save us from the tyranny of a deterministic universe
but it does give God something to do
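The single-entry menu described in 2) can be reproduced in a workspace without touching any file. What follows is only a sketch: it assumes, as the post suggests, that label strings are split on carriage returns, and it uses String>>withSqueakLineEndings to normalise the string first (other dialects may use a different selector, so treat it as an assumption outside Squeak). The label text is cut down from the example above and the variable names are made up.

```smalltalk
"Workspace sketch: a label string whose lines are separated by LF, not CR."
| lf cr labels entriesBefore entriesAfter |
lf := String with: Character lf.
cr := String with: Character cr.
labels := 'find class... (f)', lf, 'recent classes... (r)', lf, 'browse all'.

"Splitting on CR alone finds no separator, so the whole string is one entry."
entriesBefore := labels subStrings: cr.

"Normalising CR-LF and LF to CR first gives one entry per line."
entriesAfter := labels withSqueakLineEndings subStrings: cr.

entriesBefore size.   "1"
entriesAfter size.    "3"
```

Normalising at the point of use hides the symptom only; it does not explain why the linefeeds reached the image in the first place.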
Hi Ralph,
Here is a little highlight on the CR/LF strategy.

1) CR+LF (Windows) or CR only (Mac) or LF only (Unix) exist in the
external world whether we like it or not.

2) Given 1), the main Smalltalk strategy - and in particular Squeak's - has
always been this one:
2.a) convert every input from the external world to CR,
2.b) convert every output to the external world to the platform-specific preference.

Historically, this was implemented in CrLfFileStream.
But it has been superseded by MultiByteFileStream.
If you inspect its API, you'll see it provides both an automated,
platform-guessed lineEndConvention and a programmable one.

3) If all tools used to develop applications
(Smalltalk/ruby/python/perl/javascript/.Net/etc...)
provided APIs making these applications insensitive to line-end
conventions, then we would be in a better world and would have to care
less about line-end conventions.
This is easier than imposing a uniform line-end convention on the
world, because it enables a smooth transition.

4) Until strategy 2) is perfect and absolutely no LF-in-image or
CR-out-of-image leakage occurs, Squeak/Pharo are bad citizens.
They are sensitive to line-end conventions and break chains made of
multiple heterogeneous tools.

5) I observed, you observed, everyone observed recurrent deficiencies
in either 2.a) or 2.b) or both...

6) So my logical conclusion is to propose a complementary strategy:
6.a) Let Smalltalk algorithms work across all line-ending conventions.

Observe how any decent file editor (notepad, vim, etc...) works
transparently whatever the line-end convention.
IMO, it's a shame that the so-called reference Object-Oriented
language cannot deal with mixed line-end conventions.

7) So I started to implement 6.a) in Squeak 4.1 and Pharo 1.1 in order
to reach goal 3). This is two-fold:
7-a) let CR-LF or LF or CR display as a single line break (changes in
CharacterScanner and co)
7-b) let Stream and String handle CR-LF or LF or CR delimited lines.

Note that I took care to provide decently optimized implementations (often
more optimized than the previous CR-only algorithms).

8) Of course, in order to profit from the new 7-b) facilities, there's a
little change of API.
We need to replace some old-fashioned idioms (myStream upTo: Character
cr) with their modernized, line-ending-agnostic equivalents (myStream nextLine).

9) I did not apply these changes very deeply to Squeak or Pharo, only
at a few places here and there...
So there is still a bit of work to do to reach goal 3)
(parsing the menu specs is just one example of it).

10) This 6.a) strategy could eventually replace 2.a), but it does not
have to, and we didn't go this way...
So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
has been in this respect.

11) Strategy 6.a) DOES NOT replace 2.b). If our down-chain
applications are line-ending sensitive, then WE must take care of producing
the expected convention.

Conclusions

So my opinion is that 6.a) did not make our life worse.
On the contrary, Squeak and Pharo are moving toward what I would call
a better behaved I.T. world citizen.
They now offer an API to handle line endings transparently inside the image.
This comes at the price of not so much complexity, and no noticeable slowdown.
But now we have to learn new idioms (and I don't see nextLine as more
complex than upTo: Character cr)...
... and apply them where due (like parsing menu specs) to obtain a
homogeneous behaviour - goal 3)

We still have to take care of 2.b), and a bit less of 2.a) once 4.b) is
achieved.
And maybe in the future we will be able to get rid of 2.b) too, when
all applications are line-ending-insensitive.
In the meantime, nothing prevents us from improving 2.a) and 2.b) to
avoid LF leaking into or CR leaking out of the image.
But until strategy 2) is perfect, we just act as one of the bad
world citizens perpetuating line-ending problems.
IMO reaching goal 3) is easier than reaching goal 2).
That's only my personal opinion, but it's based on years of pragmatic
experience using apps that handle line endings badly and trying to
program slightly better ones.

There are alternative strategies, like in CUIS: display a boxed
[LF] explicitly in text editors so as to give programmers visual
control...

Not sure I sold my POV. It's quite opposite to your proposition.
You don't have to adhere, but at least you have some rationale.

Cheers

Nicolas
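The idiom change in point 8) can be tried in a workspace. This is a minimal sketch, assuming an image in which nextLine already accepts CR, LF and CR-LF as described in 7-b); the sample text and variable names are invented for illustration.

```smalltalk
"Workspace sketch. The sample text deliberately mixes CR, LF and CR-LF."
| text stream oldStyle newStyle |
text := 'first', (String with: Character cr),
        'second', (String with: Character lf),
        'third', (String with: Character cr with: Character lf),
        'fourth'.

"Old idiom: only CR counts as a line break, so LF stays inside the lines."
oldStyle := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [oldStyle add: (stream upTo: Character cr)].

"New idiom: nextLine treats CR, LF and CR-LF alike."
newStyle := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [newStyle add: stream nextLine].
```

With the old idiom the second element still contains a linefeed and the last one starts with one; with nextLine each of the four words should come back as its own line.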
2010/6/11 Nicolas Cellier <[hidden email]>:
> We still have to care of 2.b), and a bit less of 2.a) once 4.b) will
> be achieved.

Oops, once 3) will be achieved.
So, I just committed a few trunk changes toward the goal of making
Squeak immune to line endings.
There might be a few more idioms to fix, but that's certainly not that difficult.

See also a shorter manifest at
http://code.google.com/p/pharo/issues/detail?id=2538.

That does not prevent us from continuing to improve the conversion strategy.

I'd better stop speaking alone now ;)

Nicolas
In reply to this post by Ralph Boland
...
> >> 10) This 6.a) strategy could eventually replace 2.a), but it does not
> >> have to, and we didn't went this way...
> >> So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
> >> has been with this respect.
> >
> > Except that now the conversion of Lf in Linux files to Cr in Squeak no longer
> > occurs and this breaks things such as Menu labels. Thus things that used
> > to work now don't.
>
> I don't see what change could cause this problem...

I checked loading a .st file in both Squeak 3.10.2 and Squeak 4.1,
filing in the following file:

    'From Squeak4.1 of 17 April 2010 [latest update: #9957] on 7 June 2010 at 9:30:22 pm'!
    Object subclass: #Junk
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Objects'!

    !Junk methodsFor: 'as yet unclassified' stamp: 'rpb 6/7/2010 21:27'!
    junk

        | a |
        a := 'abc
    def
    ghi'.
        self halt.
        a := a.! !

In Squeak 3.10.2

a) If concreteStream returns CrLfFileStream:
ClassCategoryReader eventually calls scanFrom: aStream, where aStream
is a MultiByteFileStream and the next chunk of text is:

    junk

        | a |
        a := 'abc
    def
    ghi'.
        self halt.
        a := a.! !

At this point 'aStream nextChunkText' is called, which does:

    string := self nextChunk.

nextChunk then does a 'self skipSeparators' and then calls 'self
next' in a loop.

The 'self next' reads the next character and does a 'self doConversion' test,
which returns true, so if the character read is a Lf character it is
converted into a Cr character.

b) If concreteStream returns MultiByteFileStream:
The same thing happens except doConversion returns false and so
Lf characters are NOT converted into Cr characters.

In Squeak 4.1 everything is the same up to the point where 'aStream
nextChunkText' is called.
nextChunkText calls:

    ^converter nextChunkTextFromStream: self

where converter is a MacRomanTextConverter.

The trail from here looks completely different from the
Squeak 3.10.2 code.
In particular I could not find where an attempt to convert Lf
characters to Cr characters was supposed to occur, let alone why it
failed if concreteStream returns CrLfFileStream.

Note that in 3.10.2, if concreteStream returns MultiByteFileStream,
then Lf characters are NOT converted into Cr characters. I would have
expected Lf characters and Cr,Lf character pairs to be converted to Cr
characters regardless of what concreteStream returns. We do at this
point know we are reading Squeak code, so Lfs are inappropriate.
From my point of view there is no need for the 'doConversion' test at
all, except in strings, where the user may intentionally want Lf or
Cr,Lf for some odd reason and we shouldn't break his/her code. In that
case no conversion should be done under any circumstances, so the code
is wrong both ways: it fails to convert when it should and converts
when it shouldn't.

Since I couldn't figure out how 4.1 handles things I can't say if it
does any better.

Hope this explains a few things.

> The recent commit should solve the menu problem in presence of LF leakage.

How do I install the version with this fix rather than the version of 4.1 found
on the Squeak page?

Regards,

Ralph Boland
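The conversion Ralph expected (every Lf and every Cr,Lf pair becoming a single Cr) can be written as a plain per-character loop. The sketch below is a hypothetical workspace illustration of that behaviour only. It is not the code of MultiByteFileStream>>next in either 3.10.2 or 4.1, and it makes no attempt to leave string literals alone.

```smalltalk
"Workspace sketch: fold LF and CR-LF into CR, one character at a time."
| in out char |
in := ReadStream on:
    'abc', (String with: Character cr with: Character lf),
    'def', (String with: Character lf),
    'ghi'.
out := WriteStream on: String new.
[in atEnd] whileFalse: [
    char := in next.
    char = Character cr
        ifTrue: [
            "Swallow the LF half of a CR-LF pair, then emit one CR."
            in peek = Character lf ifTrue: [in next].
            out nextPut: Character cr]
        ifFalse: [
            char = Character lf
                ifTrue: [out nextPut: Character cr]
                ifFalse: [out nextPut: char]]].
out contents
```

The result should be the three words separated by single carriage returns, with no linefeed left in the string.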
2010/6/12 Ralph Boland <[hidden email]>:
ABOUT THE IMPLEMENTATION:
------------------------------------------------

The two inst vars of interest in MultiByteFileStream are
- wantsLineEndConversion
- lineEndConvention

There are two ways to set the line ending convention.

1) myStream wantsLineEndConversion: true.
If you follow the code, you will discover that it does:

    lineEndConvention ifNil: [ self detectLineEndConvention ].

For an input file, detectLineEndConvention scans the file to guess the
line ending convention at the first line break (this won't work if you
have mixed conventions in your file...).
For an output file (if empty), detectLineEndConvention sets the
lineEndConvention to LineEndDefault.
This default is guessed at image startup based on the underlying OS
(see guessDefaultLineEndConvention).

2) myStream lineEndConvention: aSymbol (#cr #lf #crlf or nil).
The handling of line endings does not happen entirely here though...
It sends:

    self installLineEndConventionInConverter.

which leads you to TextConverter>>installLineEndConvention:
This is an optimization. Since TextConverter already converts
characters with the proper encoding, it also handles line endings at no
extra cost.
For output, you can see part of the job is performed in
MultiByteFileStream>>nextPut: too...
This handling assumes the Smalltalk String is made of CR only...
(so it is not immune to LF !!!).

Anyway, MultiByteFileStream is an ugly part of the system, so I
don't encourage you to lose time in its tricky implementation.

ABOUT THE DEFAULT BEHAVIOUR:
-----------------------------------------------------

The first method is invoked when you use CrLfFileStream to create your stream.
The second method is not sent.
So by default, NO conversion is performed, unless you EXPLICITLY use
- CrLfFileStream
- or the #wantsLineEndConversion: message after stream creation
- or the #lineEndConvention: message

Not sure it was much different in 3.10.2, no time to inquire...

Hope this helps

Nicolas
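The two messages described above can be combined into a small round trip. This is a sketch only: it assumes that FileStream answers a MultiByteFileStream (the default mentioned at the start of the thread), 'junk-lf.st' is a hypothetical scratch file name, and the behaviour may differ between image versions.

```smalltalk
"Workspace sketch; 'junk-lf.st' is a hypothetical scratch file name."
| out in converted |

"Output: force LF endings regardless of the platform default."
out := FileStream forceNewFileNamed: 'junk-lf.st'.
[out lineEndConvention: #lf.
 out nextPutAll: 'line one'; cr; nextPutAll: 'line two'; cr]
    ensure: [out close].

"Input: request conversion so that the image sees CR only."
in := FileStream readOnlyFileNamed: 'junk-lf.st'.
converted := [in wantsLineEndConversion: true.
              in upToEnd]
    ensure: [in close].
```

If the conversion behaves as described, the file on disk can be read with wc or vi, and converted holds carriage returns only. Leaving out the wantsLineEndConversion: line should reproduce the LF leakage discussed in this thread.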