Hi,
I am having some problems with personal names. I have a database of persons that I want first to sort by name, and second to correlate with some old files that were generated outside Smalltalk. Mostly these operations proceed smoothly, but there are a couple of hurdles.

First, some Norwegian names contain the combination 'aa', an obsolete way of writing 'å' that is retained in personal names. I'd like these to test > 'z' (and even > 'ø'), and not < 'b'. I have written a Norwegian collating method that tests on copied strings with 'aa' changed to 'å', and I have entered it into the Norwegian part of the symbol dictionary for StringCollatingPolicy, but when I run installNOlocale, it loads only the English method. Is there some other way to accomplish this? I think I'm going to use a custom method for comparing names instead of <, but I would still like to know how to handle this kind of thing through locales.

I have a few Celtic names in my database too, and a related problem is to expand 'Mc' so that it tests > Mabon and < Madison, for example. This can be done easily with a custom method, testing on expanded copies of the strings, I guess.

But quite another problem is the Irish O names. I have always written them with the single quote, because it's so easily available. But this character also happens to be the string delimiter in Smalltalk, just as in many other languages, and while Smalltalk books claim that it can be used in strings by doubling it, just as in many other languages, it doesn't seem to work that way in VisualWorks. Strings containing single quotes can indeed be stored in variables by doubling them. But when you ask for a printout, you get a string printed out with doubled single quotes. And when such strings are tested against strings read from a file containing single single quotes, they test as unequal even if everything else is the same.

Moreover, all attempts to remove the doubled single quotes seem to fail.
For example:

    'O''Neill' copyReplaceAll: '''' with: (String with: $')

and

    p:= $$. s:= ''. 'O''Neill' do: [:c | (c = $') & (p = $') ifFalse: [s:= s, (String with: c)]].

give 'O''Neill'.

In fact, 'O''Neill' asByteString seems to indicate that the double single quote is represented internally by a single occurrence of the code 39.

I think this is odd, and a little funny. I guess the Smalltalk answer is to avoid embedding single quotes within strings. And I suppose the way to go is to use another character than the single quote, which perhaps isn't the correct one to use anyway, as the quotes traditionally used by computer systems aren't really quotes, but inch/feet or minutes/seconds signs.

LEF
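The custom comparison mentioned above (testing on expanded copies of the names) could look roughly like the following workspace sketch. It does not go through locales or StringCollatingPolicy at all; it ranks characters against an explicit Norwegian alphabet instead. The names upper, lower, normalize, rank and before are purely illustrative and not part of VisualWorks, and the sketch assumes that every 'aa' really stands for 'å', which will be wrong for the occasional compound or foreign name.

```smalltalk
"Workspace sketch: sort names so that 'aa' collates as 'å' (after 'ø')
 and a leading 'Mc' collates as 'Mac'. All block names are hypothetical."
| upper lower normalize rank before names |
upper := 'ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ'.
lower := 'abcdefghijklmnopqrstuvwxyzæøå'.

"Answer an expanded copy of the name; the original is left untouched."
normalize := [:name | | s |
    s := name.
    (s size > 2 and: [(s copyFrom: 1 to: 2) = 'Mc'])
        ifTrue: [s := 'Mac', (s copyFrom: 3 to: s size)].
    (s copyReplaceAll: 'Aa' with: 'Å') copyReplaceAll: 'aa' with: 'å'].

"Position in the Norwegian alphabet, ignoring case; unknown characters rank 0."
rank := [:char | (lower indexOf: char) max: (upper indexOf: char)].

"Sort block: answer true if a should come before (or equal) b."
before := [:a :b | | x y i result |
    x := normalize value: a.
    y := normalize value: b.
    i := 1.
    result := nil.
    [result isNil and: [i <= x size and: [i <= y size]]] whileTrue: [
        (rank value: (x at: i)) = (rank value: (y at: i))
            ifFalse: [result := (rank value: (x at: i)) < (rank value: (y at: i))].
        i := i + 1].
    result isNil ifTrue: [x size <= y size] ifFalse: [result]].

names := #('Madison' 'Aasen' 'McNeil' 'Østby' 'Mabon' 'Zachariassen').
(names asSortedCollection: before) asArray
```

With these sample names the intended order is Mabon, McNeil, Madison, Zachariassen, Østby, Aasen. Because the comparison works on copies, the stored names keep their original spelling.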
> But quite another problem is the Irish O names. I have always
> written them with the single quote, because it's so easily
> available. But this character happens also to be the string
> delimiter in Smalltalk, just as in many other languages, and while
> Smalltalk books claim that they can be used in strings by doubling
> them, just as in many other languages, it doesn't seem to work that
> way in VisualWorks. Strings containing single quotes can indeed be
> stored in variables by doubling them. But when you ask for a
> printout, you will get a string printed out with doubled single
> quotes. And when they are tested against strings read from a file
> containing single single quotes they test as unequal even if
> everything else is the same.
>
> Moreover, all attempts to remove the doubled single quotes seem to
> fail. For example:
>
> 'O''Neill' copyReplaceAll: '''' with: (String with: $')
>
> and
>
> p:= $$. s:= ''. 'O''Neill' do: [:c | (c = $') & (p = $') ifFalse:
> [s:= s, (String with: c)]].
>
> give 'O''Neill'.
>
> In fact, 'O''Neill' asByteString seems to indicate that the double
> single quote is represented internally by a single occurrence of
> the code 39.
>
> I think this is odd, and a little funny. I guess the Smalltalk
> answer is to avoid embedding single quotes within strings. And I
> suppose the way to go is to use another character than the single
> quote, which perhaps isn't the correct one to use anyway, as the
> quotes traditionally used by computer systems aren't really quotes,
> but inch/feet or minutes/seconds signs.

VW supports three printing protocols. Each is meant to be used in a different realm, although that philosophy is not always followed.

#displayString should return what the object shows in the UI, for example in list boxes.
#printString should return useful information about the object for developers using the VisualWorks IDE.
#storeString should return Smalltalk code that can be evaluated to recreate the object.
Simple objects that have a literal representation in the Smalltalk syntax typically return that literal representation as their #printString.

So if you do
    Transcript print: 'O''Neill'
you see
    'O''Neill'

But if you do
    Transcript nextPutAll: 'O''Neill'
you see
    O'Neill

Moreover, if you do
    'O''Neill' inspect
you should see O'Neill under the 'text' tab and $' as the second element under the 'elements' tab. There you can also see that the hex value of that character is 27, which is 39 in decimal, as you can see under the 'basics' tab: a normal ASCII single quote character, no magic here :-)

HTH,
Reinout
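To see the three protocols side by side, something like the following can be evaluated in a workspace. This is only a small sketch; it assumes, as described above, that a String answers its bare contents to #displayString and its literal form to #printString and #storeString.

```smalltalk
"Compare the three printing protocols on the same seven-character string."
| name |
name := 'O''Neill'.
Transcript
    cr; show: 'display: ', name displayString;
    cr; show: 'print:   ', name printString;
    cr; show: 'store:   ', name storeString;
    cr; show: 'size:    ', name size printString.
```

Only the first line written to the Transcript should show the name as a user would expect to read it; the print and store forms carry the surrounding and doubled quotes.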
In reply to this post by Lars Finsen
Lars Finsen wrote:
> But quite another problem is the Irish O names. I have always written
> them with the single quote, because it's so easily available. But this
> character happens also to be the string delimiter in Smalltalk, just
> as in many other languages, and while Smalltalk books claim that they
> can be used in strings by doubling them, just as in many other
> languages, it doesn't seem to work that way in VisualWorks. Strings
> containing single quotes can indeed be stored in variables by doubling
> them. But when you ask for a printout, you will get a string printed
> out with doubled single quotes. And when they are tested against
> strings read from a file containing single single quotes they test as
> unequal even if everything else is the same.
>
> Moreover, all attempts to remove the doubled single quotes seem to
> fail. For example:
>
> 'O''Neill' copyReplaceAll: '''' with: (String with: $')
>
> and
>
> p:= $$. s:= ''. 'O''Neill' do: [:c | (c = $') & (p = $') ifFalse: [s:=
> s, (String with: c)]].
>
> give 'O''Neill'.
>
> In fact, 'O''Neill' asByteString seems to indicate that the double
> single quote is represented internally by a single occurrence of the
> code 39.
>
> I think this is odd, and a little funny. I guess the Smalltalk answer
> is to avoid embedding single quotes within strings. And I suppose the
> way to go is to use another character than the single quote, which
> perhaps isn't the correct one to use anyway, as the quotes
> traditionally used by computer systems aren't really quotes, but
> inch/feet or minutes/seconds signs.

The doubling of the quotes happens when you send that string the message #'printString' (which is what happens when you select the [Smalltalk]/[Print It] menu item in a workspace). You are confusing how a string prints (the result of the #'printString' message) with its internal representation.

Select the following in a workspace and [Do It]:

    Transcript
        cr; nextPutAll: 'O''Neill' asString;
        cr; nextPutAll: 'O''Neill' printString;
        yourself.

    (Filename named: 'test.txt') writeStream nextPutAll: 'O''Neill'; close.
    FileBrowser openOnFileNamed: 'test.txt'.

These two should show you that there is a way to see the string without the extra quotes.

James
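The original complaint was that names read from a file compare unequal to the ones held in the image. A round trip along the following lines is one way to check that directly. It is a sketch only: it reuses the test.txt file from the example above, the temporaries stream and fromFile are illustrative, and it assumes the file is written and read with the same default encoding.

```smalltalk
"Write the contents with nextPutAll:, read them back, and compare with the literal."
| stream fromFile |
(Filename named: 'test.txt') writeStream nextPutAll: 'O''Neill'; close.

stream := (Filename named: 'test.txt') readStream.
fromFile := stream upToEnd.
stream close.

"Both strings hold seven characters, one of which is a single quote."
fromFile = 'O''Neill'
```

If the comparison answers false, the file most likely holds the printed form (with surrounding and doubled quotes) rather than the contents.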
In reply to this post by Lars Finsen
Hi Lars,
I can't comment on the Norwegian names, but the embedded quote issue is, I think, straightforward. It would be a problem if you were using the print string to display strings with embedded quotes, since embedded quotes get doubled. But you don't have to do that. E.g.

    String withAll: #($O $' $N $e $i $l $l)
        => 'O''Neill'
    Transcript nextPutAll: (String withAll: #($O $' $N $e $i $l $l)); flush
        => O'Neill
    'O''Neill' size
        => 7

are all consistent. The string contains seven characters, but its printString prints out as 10 characters, because embedded single quotes need to be doubled; otherwise they would wrongly terminate the string.

If you were using C in an interactive environment you'd soon see the same thing. E.g.

    #include <stdio.h>
    #include <string.h>

    /* the character $", written single quote, double quote, single quote,
       followed by a terminator so that it is a valid C string */
    char string[] = { '"', '\0' };

    int main(void)
    {
        printf("%s\n", "\"");
        printf("%s\n", string);
        printf("%zu %zu\n", strlen("\""), strlen(string));
        return 0;
    }

If string were printed so as to yield the literal that would reconstruct it, it would have to be printed as "\"".

Literal strings in any language are going to require some mechanism to escape the string's delimiters. You always need to distinguish between the string's contents and its literal denotation. These are always different, e.g. 'foo' takes 5 characters to write but contains only 3.

The answer absolutely _isn't_ to avoid embedding single quotes in strings; strings handle embedded single quotes just fine. The solution is to avoid using the printString (the string's literal denotation) when you mean to display its contents.

HTH

On 8/2/07, Lars Finsen <[hidden email]> wrote:
> Hi,
In reply to this post by Reinout Heeck
Hi, and thanks for all your input on this. It seems my problems came from the way I first stored my data, with simple #store: messages using the variables containing the names and other stuff as arguments. Now I have stored it all again, using #nextPutAll: for the strings instead, and the doubled single quotes are gone.

It seems we don't have many amateur questions here. The other questions asked are on a different level than mine. It was quite another matter on the c-prog list. I guess the reason is kind of obvious. After 30 years we are still small, we Smalltalkers. But we are growing, or what?

LEF
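The difference between the two ways of storing can be seen without touching a file at all, by writing to in-memory streams. The following is a minimal sketch; viaStore and viaPut are just illustrative names.

```smalltalk
"store: writes the literal denotation; nextPutAll: writes the contents."
| viaStore viaPut |
viaStore := WriteStream on: String new.
viaStore store: 'O''Neill'.

viaPut := WriteStream on: String new.
viaPut nextPutAll: 'O''Neill'.

Transcript
    cr; show: viaStore contents; show: ' has size '; show: viaStore contents size printString;
    cr; show: viaPut contents; show: ' has size '; show: viaPut contents size printString.
```

The stream written with #store: should hold the ten-character literal form, quotes included, while the one written with #nextPutAll: holds the seven characters of the name itself, which is what files generated outside Smalltalk will contain.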