Some problems with names.

Lars Finsen
Hi,
I am having some problems with personal names. I have a database of  
persons that I want (firstly) to sort by name, and (secondly) to  
correlate to some old files that have been generated outside  
Smalltalk. Mostly these operations proceed smoothly, but there are a  
couple of hurdles.

Firstly, some Norwegian names have the combination 'aa' which is an  
obsolete way of writing 'å' but is retained in personal names. I'd  
like to have these test > 'z' (and even > 'ø'), and not < 'b'. I have  
written a Norwegian collating method testing on copied strings that  
have 'aa' changed to 'å', and I have entered it into the Norwegian  
part of the symbol dictionary for StringCollatingPolicy, but when I  
run installNOlocale, it loads only the English method. Is there some  
other way to accomplish this? I think I'm going to use a custom  
method for comparing names instead of <, but anyway I would like to  
know how to handle this kind of thing through locales.
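
Here is a minimal sketch of the custom comparison Lars mentions. It is written in C, the other language used in this thread, and it does not use StringCollatingPolicy or installNOlocale. It assumes that names are UTF-8, that only ASCII letters plus æ, ø and å need special treatment, and that every 'aa' reads as 'å', as in Lars's copied strings. The names `next_weight` and `no_name_compare` are hypothetical.

```c
/* Hypothetical collation weights. ASCII non-letters keep their own code
   (< 128), so spaces, hyphens and apostrophes sort before letters.
   Letters follow the Norwegian alphabet: a..z, then æ, ø, å. */
enum { W_A = 200, W_AE = 226, W_OE = 227, W_AA = 228 };

/* Read one collation element from the UTF-8 text at *p, advance *p past it,
   and return its weight. "aa" (any case) and "å"/"Å" both weigh W_AA. */
static int next_weight(const unsigned char **p)
{
    const unsigned char *s = *p;
    unsigned char c = s[0];

    /* Obsolete spelling kept in names: "aa", "Aa", "AA" collate as å. */
    if ((c == 'a' || c == 'A') && (s[1] == 'a' || s[1] == 'A')) {
        *p = s + 2;
        return W_AA;
    }
    /* Two-byte UTF-8 sequences with lead byte 0xC3 cover æ, ø, å. */
    if (c == 0xC3 && s[1] != '\0') {
        unsigned char d = s[1];
        *p = s + 2;
        if (d == 0xA6 || d == 0x86) return W_AE;   /* æ Æ */
        if (d == 0xB8 || d == 0x98) return W_OE;   /* ø Ø */
        if (d == 0xA5 || d == 0x85) return W_AA;   /* å Å */
        return 300 + d;                            /* other accented letters: after å */
    }
    *p = s + 1;
    if (c >= 'a' && c <= 'z') return W_A + (c - 'a');
    if (c >= 'A' && c <= 'Z') return W_A + (c - 'A');
    return c < 128 ? c : 600 + c;                  /* punctuation first, stray bytes last */
}

/* Hypothetical comparator: negative, zero or positive, like strcmp. */
int no_name_compare(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p != '\0' && *q != '\0') {
        int wa = next_weight(&p);
        int wb = next_weight(&q);
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    if (*p == '\0' && *q == '\0')
        return 0;
    return *p == '\0' ? -1 : 1;                    /* a proper prefix sorts first */
}
```

With this, "Aasen" sorts after "Zahl" and "Øverland" and compares equal to "Åsen". Sorting an array of `char *` with `qsort` would need a small wrapper that passes the dereferenced elements to `no_name_compare`.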

I have a few Celtic names in my database too, and a related problem
is to expand 'Mc' to 'Mac' so that such names test > Mabon and
< Madison, for example. This can be done easily with a custom method,
testing on expanded copies of the strings, I guess.
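
The "expanded copies" approach for Mc names can be sketched the same way, again in C. This sketch assumes ASCII names, expands only a leading "Mc" (exact case) to "Mac", and compares case-insensitively. `expand_mc`, `ascii_casecmp` and `celtic_name_compare` are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy NAME into OUT, expanding a leading "Mc" to "Mac"
   so that e.g. "McDonald" is compared as "MacDonald". */
static void expand_mc(const char *name, char *out, size_t outsz)
{
    if (strncmp(name, "Mc", 2) == 0)
        snprintf(out, outsz, "Mac%s", name + 2);
    else
        snprintf(out, outsz, "%s", name);
}

/* Case-insensitive ASCII comparison of two NUL-terminated strings. */
static int ascii_casecmp(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        a++;
        b++;
    }
    if (*a == *b)
        return 0;                      /* both strings ended together */
    return *a == '\0' ? -1 : 1;        /* the shorter one sorts first */
}

/* Compare two names with "Mc" treated as "Mac". Names longer than the
   buffers are truncated, which is acceptable for a sketch. */
int celtic_name_compare(const char *a, const char *b)
{
    char ka[256], kb[256];
    expand_mc(a, ka, sizeof ka);
    expand_mc(b, kb, sizeof kb);
    return ascii_casecmp(ka, kb);
}
```

So "McDonald" compares as "macdonald": after "mabon" and before "madison".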

But quite another problem is the Irish O names. I have always written  
them with the single quote, because it's so easily available. But  
this character happens also to be the string delimiter in Smalltalk,  
just as in many other languages, and while Smalltalk books claim that  
they can be used in strings by doubling them, just as in many other  
languages, it doesn't seem to work that way in VisualWorks. Strings  
containing single quotes can indeed be stored in variables by  
doubling them. But when you ask for a printout, you will get a string  
printed out with doubled single quotes. And when they are tested  
against strings read from a file containing single single quotes they  
test as unequal even if everything else is the same.

Moreover, all attempts to remove the doubled single quotes seem to  
fail. For example:

'O''Neill' copyReplaceAll: '''' with: (String with: $')

and

p:= $$. s:= ''. 'O''Neill' do: [:c | (c = $') & (p = $') ifFalse:
[s:= s, (String with: c)]. p:= c].

give 'O''Neill'.

In fact, 'O''Neill' asByteString seems to indicate that the double  
single quote is represented internally by a single occurrence of the  
code 39.

I think this is odd, and a little funny. I guess the Smalltalk answer
is to avoid embedding single quotes within strings. And I suppose the
way to go is to use a character other than the single quote, which
perhaps isn't the correct one to use anyway, as the quotes
traditionally used by computer systems aren't really quotes, but
inch/feet or minutes/seconds signs.

LEF

Re: Some problems with names.

Reinout Heeck
>
> But quite another problem is the Irish O names. I have always  
> written them with the single quote, because it's so easily  
> available. But this character happens also to be the string  
> delimiter in Smalltalk, just as in many other languages, and while  
> Smalltalk books claim that they can be used in strings by doubling  
> them, just as in many other languages, it doesn't seem to work that  
> way in VisualWorks. Strings containing single quotes can indeed be  
> stored in variables by doubling them. But when you ask for a  
> printout, you will get a string printed out with doubled single  
> quotes. And when they are tested against strings read from a file  
> containing single single quotes they test as unequal even if  
> everything else is the same.
>
> Moreover, all attempts to remove the doubled single quotes seem to  
> fail. For example:
>
> 'O''Neill' copyReplaceAll: '''' with: (String with: $')
>
> and
>
> p:= $$. s:= ''. 'O''Neill' do: [:c | (c = $') & (p = $') ifFalse:
> [s:= s, (String with: c)]. p:= c].
>
> give 'O''Neill'.
>
> In fact, 'O''Neill' asByteString seems to indicate that the double  
> single quote is represented internally by a single occurrence of  
> the code 39.
>
> I think this is odd, and a little funny. I guess the Smalltalk
> answer is to avoid embedding single quotes within strings. And I
> suppose the way to go is to use a character other than the single
> quote, which perhaps isn't the correct one to use anyway, as the
> quotes traditionally used by computer systems aren't really quotes,
> but inch/feet or minutes/seconds signs.

VW supports three printing protocols. Each is supposed to be used in
a different realm, but that philosophy is not always followed.

#displayString should return what the object shows in the UI, for  
example in list boxes.
#printString should return useful information about the object for  
developers using the VisualWorks IDE.
#storeString should return Smalltalk code that can be evaluated to  
recreate the object.


Simple objects that have a literal representation in the Smalltalk  
syntax typically return their Smalltalk literal representation as  
their #printString.

So if you do
   Transcript print: 'O''Neill'
you see 'O''Neill'

But if you do
   Transcript nextPutAll: 'O''Neill'
you see
   O'Neill
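
To make the "literal representation" concrete, here is a minimal C sketch of the quoting rule a String's printString applies: wrap the contents in single quotes and double every embedded quote. It illustrates the rule only and is not VisualWorks's implementation; `smalltalk_literal` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the Smalltalk literal denotation of CONTENTS,
   i.e. the contents wrapped in single quotes with every embedded single
   quote doubled. The result is malloc'd (caller frees), or NULL on failure. */
char *smalltalk_literal(const char *contents)
{
    size_t len = strlen(contents);
    size_t quotes = 0;
    for (size_t i = 0; i < len; i++)
        if (contents[i] == '\'')
            quotes++;

    /* 2 delimiters + one extra quote per embedded quote + terminating NUL */
    char *out = malloc(len + quotes + 3);
    if (out == NULL)
        return NULL;

    char *w = out;
    *w++ = '\'';                       /* opening delimiter */
    for (size_t i = 0; i < len; i++) {
        if (contents[i] == '\'')
            *w++ = '\'';               /* double the embedded quote */
        *w++ = contents[i];
    }
    *w++ = '\'';                       /* closing delimiter */
    *w = '\0';
    return out;
}
```

For O'Neill (7 characters) this yields 'O''Neill' (10 characters), the same counts Eliot gives later in the thread. The string itself never contains the doubled quote; only its printed denotation does.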


Moreover, if you do
   'O''Neill' inspect
you should see
   O'Neill
under the 'text' tab

and $' as the second element under the 'elements' tab. There you can
also see that the hex value of that character is 27, which is 39 in
decimal, as the 'basics' tab shows: a normal ASCII single quote
character, no magic here :-)


HTH,

Reinout

Re: Some problems with names.

jgfoster
In reply to this post by Lars Finsen
Lars Finsen wrote:

> But quite another problem is the Irish O names. I have always written
> them with the single quote, because it's so easily available. But this
> character happens also to be the string delimiter in Smalltalk, just
> as in many other languages, and while Smalltalk books claim that they
> can be used in strings by doubling them, just as in many other
> languages, it doesn't seem to work that way in VisualWorks. Strings
> containing single quotes can indeed be stored in variables by doubling
> them. But when you ask for a printout, you will get a string printed
> out with doubled single quotes. And when they are tested against
> strings read from a file containing single single quotes they test as
> unequal even if everything else is the same.
>
> Moreover, all attempts to remove the doubled single quotes seem to
> fail. For example:
>
> 'O''Neill' copyReplaceAll: '''' with: (String with: $')
>
> and
>
> p:= $$. s:= ''. 'O''Neill' do: [:c | (c = $') & (p = $') ifFalse:
> [s:= s, (String with: c)]. p:= c].
>
> give 'O''Neill'.
>
> In fact, 'O''Neill' asByteString seems to indicate that the double
> single quote is represented internally by a single occurrence of the
> code 39.
>
> I think this is odd, and a little funny. I guess the Smalltalk answer
> is to avoid embedding single quotes within strings. And I suppose the
> way to go is to use a character other than the single quote, which
> perhaps isn't the correct one to use anyway, as the quotes
> traditionally used by computer systems aren't really quotes, but
> inch/feet or minutes/seconds signs.

In Smalltalk you can embed single quotes within a string. The confusion
happens when you send that string the message #'printString' (which
happens when you select the [Smalltalk]/[Print It] menu item in a workspace).
You are confusing how a string prints (the result of the #'printString'
message) with its internal representation. Select the following in a
workspace and [Do It]:

    Transcript
        cr; nextPutAll: 'O''Neill' asString;
        cr; nextPutAll: 'O''Neill' printString;
        yourself.
    (Filename named: 'test.txt') writeStream
        nextPutAll: 'O''Neill';
        close.
    FileBrowser openOnFileNamed: 'test.txt'.

These two should show you that there is a way to see the string without
the extra quotes.

James

Re: Some problems with names.

Eliot Miranda-2
In reply to this post by Lars Finsen
Hi Lars,

    I can't comment on the Norwegian names, but the embedded quote issue is, I think, straightforward.  It would be a problem if you used the print string to display strings with embedded quotes, since embedded quotes get doubled.  But you don't have to do that.

e.g.
    String withAll: #($O $' $N $e $i $l $l) => 'O''Neill'
    Transcript nextPutAll: (String withAll: #($O $' $N $e $i $l $l)); flush => O'Neill
    'O''Neill' size => 7
are all consistent.  The string contains seven characters, but its printString prints out as 10 characters because embedded single quotes need to be doubled otherwise they would wrongly terminate a string.

If you were using C in an interactive environment you'd soon see the same things.  e.g.
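    #include <stdio.h>
    #include <string.h>

    /* the character $", written single quote, double quote, single quote,
       followed by the NUL that terminates a C string */
    char string[] = { '"', '\0' };

    int
    main(void)
    {
        printf("%s\n", "\"");                   /* prints " */
        printf("%s\n", string);                 /* prints " */
        printf("%zu %zu\n", strlen("\""), strlen(string));  /* prints 1 1 */
        return 0;
    }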
If string were printed so as to yield the literal that would reconstruct it, it would have to be printed as "\"".

Literal strings in any language are going to require some mechanism to escape the string's delimiters.  You always need to distinguish between the string's contents and its literal denotation.  These are always different.  e.g. 'foo' takes 5 characters to write but contains only 3.
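
The reverse direction, turning a literal's denotation back into its contents, is what the compiler does when it reads 'O''Neill'. Below is a minimal C sketch that assumes only the doubled-quote rule; it does not reproduce the real Smalltalk scanner, and `parse_smalltalk_literal` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical sketch: SRC must start with a single quote. Inside, a doubled
   quote '' stands for one quote character and a lone quote ends the literal.
   Writes the contents to OUT (NUL-terminated) and returns their length,
   or -1 if the literal is malformed, unterminated, or OUT is too small. */
long parse_smalltalk_literal(const char *src, char *out, size_t outsz)
{
    size_t n = 0;
    if (outsz == 0 || *src != '\'')
        return -1;
    src++;                                  /* skip the opening delimiter */
    for (;;) {
        char c = *src++;
        if (c == '\0')
            return -1;                      /* no closing quote */
        if (c == '\'') {
            if (*src != '\'')
                break;                      /* lone quote: end of literal */
            src++;                          /* '' : keep a single quote */
        }
        if (n + 1 >= outsz)
            return -1;                      /* leave room for the NUL */
        out[n++] = c;
    }
    out[n] = '\0';
    return (long)n;
}
```

Parsing 'foo' (5 characters of source) gives 3 characters of contents, and 'O''Neill' (10 characters) gives O'Neill (7 characters).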

The answer absolutely _isn't_ to avoid embedding single quotes in strings; strings handle embedded single quotes just fine.  The solution is to avoid using the printString (its literal denotation) when you mean to display its contents.

HTH
   

On 8/2/07, Lars Finsen wrote:

Re: Some problems with names.

Lars Finsen
In reply to this post by Reinout Heeck
Hi and thanks for all your input on this. It seems my problems came  
from the way I first stored my data, with simple #store: messages  
using the variables containing the names and other stuff as  
arguments. Now I have stored it all again with #nextPutAll: instead  
for the strings, and the double single quotes are gone.
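
A minimal C sketch of why the stored names tested unequal: a name written in literal form (as #store: did here) does not read back equal to the same name written raw (as #nextPutAll: does). `write_name` and `round_trip_matches` are hypothetical names, and the temporary file stands in for the real data file.

```c
#include <stdio.h>
#include <string.h>

/* Write NAME to FP on its own line: as a quoted Smalltalk literal with
   embedded quotes doubled if AS_LITERAL is nonzero (like #store:),
   otherwise raw (like #nextPutAll:). */
static void write_name(FILE *fp, const char *name, int as_literal)
{
    if (!as_literal) {
        fprintf(fp, "%s\n", name);
        return;
    }
    fputc('\'', fp);
    for (const char *p = name; *p != '\0'; p++) {
        if (*p == '\'')
            fputc('\'', fp);           /* doubled, as in a literal */
        fputc(*p, fp);
    }
    fputs("'\n", fp);
}

/* Round trip through a temporary file: returns 1 if the line read back
   equals NAME exactly, 0 if it differs, -1 on I/O error. */
int round_trip_matches(const char *name, int as_literal)
{
    char line[256];
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    write_name(fp, name, as_literal);
    rewind(fp);
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    line[strcspn(line, "\n")] = '\0';  /* strip the newline */
    return strcmp(line, name) == 0;
}
```

Written raw, O'Neill reads back as O'Neill; written in literal form, it reads back as 'O''Neill' and no longer matches a name taken from a plain text file.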

It seems we don't get many amateur questions here; the other
questions asked are on a different level from mine. It was quite
another matter on the c-prog list. I guess the reason is fairly
obvious: after 30 years, we Smalltalkers are still a small community.
But we are growing, aren't we?

LEF