Literal arrays parsing

Literal arrays parsing

Noury Bouraqadi
Hi,

A stupid question: why does evaluating
#("comment") lead to an empty array instead of an array with a single
element #'"comment"'?

I guess that this is somehow related to the fact that
#"comment"
evaluates to a symbol with a single hidden character (ASCII 30).

BTW, I tested this on a 3.9-7032 image.

Noury
--------------------------------------------------------------
Dr. Noury Bouraqadi - Lecturer/Researcher
Ecole des Mines de Douai - Dept. G.I.P
http://csl.ensm-douai.fr/noury

European Smalltalk Users Group Board
http://www.esug.org

Squeak: an Open Source Smalltalk
http://www.squeak.org
--------------------------------------------------------------




Re: Literal arrays parsing

Bert Freudenberg-3
On 23.05.2006 at 16:42, Noury Bouraqadi wrote:

> Hi,
>
> A stupid question, why evaluating
> #("comment") leads to an empty array instead of an array with a  
> single element #'"comment"'?

Because a comment is parsed as whitespace, not as a token.
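
To make the "comment as whitespace" point concrete, here is a small workspace sketch. It assumes a Squeak image that behaves as described in this thread; evaluate each line on its own with "print it". The values in the trailing comments are what that behaviour implies, not output captured from a particular image.

```smalltalk
#() size                  "0, an empty literal array"
#("comment") size         "0, the comment is skipped like a blank"
#(1 "two" 3) size         "2, only the numbers are tokens"
#('"comment"') first size "9, inside a string the quote marks are ordinary characters"
```

The last line shows one way to get the double quotes into an element: put them inside a string literal, where the scanner no longer treats them as comment delimiters.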

> I guess that this is somehow bound to the fact that
> #"comment"
> evaluates to a symbol with a single hidden character (Ascii 30)

No, that's because Ascii 30 signifies the end of input (see  
Scanner>>step). It's the same as if you just evaluate a single #  
character.
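
The end-of-input marker can be sketched in a workspace as follows. This is only an illustration of the idea, not the actual source of Scanner>>step; the variable names source and nextChar are made up for this example.

```smalltalk
| source nextChar |
source := ReadStream on: '#'.

"Hand out the next character, or ASCII 30 once the input is exhausted."
nextChar := [source atEnd
	ifTrue: [Character value: 30]
	ifFalse: [source next]].

nextChar value.             "$#, the only real character in the input"
nextChar value asciiValue   "30, the stand-in for end of input"
```

A scanner built this way never has to test for the end of the stream while reading a token, but the marker becomes visible if it is ever accepted as part of a token, which is what happens with a lone #.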

- Bert -



Re: Literal arrays parsing

Nicolas Cellier-3
In reply to this post by Noury Bouraqadi
On Tuesday 23 May 2006 16:42, Noury Bouraqadi wrote:
> Hi,
>
> A stupid question, why evaluating
> #("comment") leads to an empty array instead of an array with a single
> element #'"comment"'?
>

Isn't this ability to insert comments inside long literal arrays a good
thing?

stupidExample := #(
    "keys are stored in first sub array"
        #( 'one' 'two' 'three')
    "values are stored in second sub array"
        #(1 2 3)
).

> I guess that this is somehow bound to the fact that
> #"comment"
> evaluates to a symbol with a single hidden character (Ascii 30)
>
> BTW, I test this on a 3.9-7032 image
>

This one is bad behaviour indeed, a side effect of the Scanner/Parser internal
implementation... (ASCII 30 being used to mean "end of input").

After #, I would expect a letter [a-zA-Z], a string quote ', or an opening
parenthesis (. Maybe a second #, as in the Dolphin Smalltalk extension...

What else makes sense according to the formal definition of Smalltalk?

Nicolas




Re: Literal arrays parsing

Wolfgang Helbig-2
In reply to this post by Noury Bouraqadi
Nicolas,
you asked:

>On Tuesday 23 May 2006 16:42, Noury Bouraqadi wrote:
>> Hi,
>>
>> A stupid question, why evaluating
>> #("comment") leads to an empty array instead of an array with a single
>> element #'"comment"'?
>>

>
>This one is a bad behaviour indeed, a side effect of Scanner/Parser internal
>implementation... (Ascii 30 being used with meaning "end of input").

As long as it is "internal", I can't see anything wrong with it.

>
>Behind #, i would expect a letter [a-z][A-Z], a string quote ', or an opening
>parenthesis (. Maybe a second # in Dolphin Smalltalk extension...
>
>What else does make sense according to Smalltalk formal definition?

According to the syntax diagrams in the Book (choose the book's color from
blue, yellow or purple), the sharp character may occur as the first character of
an array constant or a symbol constant. In these positions it is followed by a
left parenthesis, if it marks an array constant, otherwise it marks a symbol
constant and is followed by a letter, a special character or a minus character.
Remember, special characters are the ones that make a binary selector.

Inside a string or a comment, the sharp character may be followed by any of the
95 graphic characters.

And finally, inside a character constant, the sharp character may be followed by
any character.
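
The positions listed above can be summed up with one literal per case. This is a sketch to be evaluated line by line in a workspace; it only restates the cases from the syntax diagrams as described here and assumes nothing beyond standard literal syntax.

```smalltalk
#(1 2 3)    "array constant: # followed by a left parenthesis"
#size       "symbol constant: # followed by a letter"
#+          "symbol constant: # followed by a special character"
#-          "symbol constant: # followed by the minus character"
'a # b'     "inside a string, # is just one of the graphic characters"
$#          "character constant whose value is the sharp character"
```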

This holds for the language as defined formally by the syntax diagrams, but not
for the Smalltalk programming language as described informally by the Blue Book,
where "any character" may occur inside comments, strings and character
constants, that is not only the graphic characters but ASCII control characters
as well, like carriage return, horizontal tabulator or record separator which is
ASCII 30.

And this again differs from the language as accepted by the compiler in the V2
image of Smalltalk-80. For example, the ASCII 0 character inside a character
constant gets you an index error. But this is another thread :-)

Greetings,
Wolfgang

--
Weniger, aber besser.



Re: Literal arrays parsing

Noury Bouraqadi
In reply to this post by Nicolas Cellier-3

On 23 May 06, at 19:51, nicolas cellier wrote:

> On Tuesday 23 May 2006 16:42, Noury Bouraqadi wrote:
>> Hi,
>>
>> A stupid question, why evaluating
>> #("comment") leads to an empty array instead of an array with a single
>> element #'"comment"'?
>>
>
> Isn't it a good thing, this ability to insert comment inside long
> literal
> arrays ?
>
> stupidExample := #(
>     "keys are stored in first sub array"
>         #( 'one' 'two' 'three')
>     "values are stored in second sub array"
>         #(1 2 3)
> ).
>
Yes. You're right. This is a good reason.



>  Maybe a second # in Dolphin Smalltalk extension...
>
>
What is this?

I read somewhere in the Squeak Scanner code that ##() may be given the
ANSI Smalltalk semantics, but without further explanation. Any hints?


Noury




Re: Literal arrays parsing

Nicolas Cellier-3
In reply to this post by Wolfgang Helbig-2
On Wednesday 24 May 2006 01:16, Wolfgang Helbig wrote:

> Nicolas,
>
> you asked:
> >On Tuesday 23 May 2006 16:42, Noury Bouraqadi wrote:
> >> Hi,
> >>
> >> A stupid question, why evaluating
> >> #("comment") leads to an empty array instead of an array with a single
> >> element #'"comment"'?
> >
> >This one is a bad behaviour indeed, a side effect of Scanner/Parser
> > internal implementation... (Ascii 30 being used with meaning "end of
> > input").
>
> As long as it is "internal", I can't see anything wrong with it.
>

Hi Wolfgang,
Just try:
    (Compiler evaluate: '#') inspect.
and you will see this ASCII 30 dangerously leaking out of the internals...

If # alone were really valid syntax, then:
    (Compiler evaluate: '# inspect').
should inspect it...

It does not, because the space is simply ignored:
    (Compiler evaluate: '# inspect') inspect.

The same goes for extra sharp signs:
    (Compiler evaluate: '# # # # inspect') inspect.

Do you agree with such behavior?

> >Behind #, i would expect a letter [a-z][A-Z], a string quote ', or an
> > opening parenthesis (. Maybe a second # in Dolphin Smalltalk extension...
> >
> >What else does make sense according to Smalltalk formal definition?
>
> According to the syntax diagrams in the Book (choose the book's color from
> blue, yellow or purple), the sharp character may occur as the first
> character of an array constant or a symbol constant. In these positions it
> is followed by a left parenthesis, if it marks an array constant, otherwise
> it marks a symbol constant and is followed by a letter, a special character
> or a minus character. Remember, special characters are the ones that make a
> binary selector.
>

Oh yes, I should not have forgotten... #* #-
In the latest Squeak, this also works with any number of special characters, like #***.
In VW you can have a ByteArray with #[ 0 0 ]

> Inside a string or a comment, the sharp character may be followed by any of
> the 95 graphic characters.
>
> And finally, inside a character constant, the sharp character may be
> followed by any character.
>

I do not understand this sentence. Isn't it the dollar sign that is used in
character constants?
Or do you mean inside a literal array like #( ^x:=y@z ), in which case each
character is interpreted as a single-character symbol...

For fun, note that Squeak does not complain when you write
# $a

> This holds for the language as defined formally by the syntax diagrams, but
> not for the Smalltalk programming language as described informally by the
> Blue Book, where "any character" may occur inside comments, strings and
> character constants, that is not only the graphic characters but ASCII
> control characters as well, like carriage return, horizontal tabulator or
> record separator which is ASCII 30.
>
> And this again differs from the language as accepted by the compiler in the
> V2 image of Smalltalk-80. For example, the ASCII 0 character inside a
> character constant gets you an index error. But this is another thread :-)
>

You mean using the ASCII value as an index into the scanner character table?
I started with ST-80 v2.3 but just don't remember such details...

Nicolas



Re: Literal arrays parsing

Nicolas Cellier-3
In reply to this post by Noury Bouraqadi
On Wednesday 24 May 2006 01:56, Noury Bouraqadi wrote:
> >  Maybe a second # in Dolphin Smalltalk extension...
>
> What is this?
>
> I read somewhere in the Squeak Scanner code that ##() may be given the
> Ansi Smalltalk semantics without further explanation. Any hint?

AFAIR, in Dolphin, it means that the literal can be any valid Smalltalk
expression.

It is evaluated once at compile time (when you accept in the browser, or load
from a file).

For example:
    ##(1@2)
    ##(Dictionary with: $a -> 0)
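
For contrast, here is how the same text behaves in Squeak, where ##( ) is not available. This is a workspace sketch to be evaluated line by line; it assumes Squeak's literal-array scanning as discussed in this thread and Squeak's brace-array syntax, which is evaluated at run time rather than at compile time.

```smalltalk
#(1@2) size       "3, nothing is evaluated: the elements are 1, #@ and 2"
{1@2} size        "1, the brace array evaluates its expression"
{1@2} first class "Point, built each time the code runs"
```

So a plain literal array never sends messages, a brace array sends them on every execution, and the Dolphin form described above sits in between by evaluating once when the method is compiled.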

Nicolas



Re: Literal arrays parsing

Wolfgang Helbig-2
In reply to this post by Noury Bouraqadi
Nicolas,

You want me to try something out:
>
>Hi Wolfgang,
>Just try:
>    (Compiler evaluate: '#') inspect.
>and you will see this ascii 30 dangerously leaking from internal...

And I did. And I see. And this is wrong. A symbol constant is the sharp
character followed by an identifier or binary selector, both of which are
nonempty strings. Looks like the compiler in this case accepts an empty string,
with the record separator, that is ASCII 30, internally marking the end of the
string (Not the end of the file). Anyway, the compiler must not accept '#'.

>
>If # alone were really a valid syntax, then:
>    (Compiler evaluate: '# inspect').
>should inspect it...

But it isn't, so it should not be accepted. In this case, the compiler is
right in not accepting it.

>
>It does not, because space is just ignored:
>    (Compiler evaluate: '# inspect') inspect.

>
>So as extra sharp signs:
>    (Compiler evaluate: '# # # # inspect') inspect.
>
>Do you agree with such behavior ?

No.
There seems to be an error in the way the compiler handles empty strings.
They are represented internally by the record separator. Inspecting nonempty
strings reveals that the record separator does not mark the end of the string,
as I originally thought.

Squeak inherited this error from Smalltalk-80 V2, which I tested using the
Hobbes emulator.

Aside: Since zero is not a natural number, you have to handle empty strings
as special cases, making the program unnatural and inviting errors.
Couldn't resist :-)
End of Aside

>
>> >Behind #, i would expect a letter [a-z][A-Z], a string quote ', or an
>> > opening parenthesis (. Maybe a second # in Dolphin Smalltalk extension...
>> >
>> >What else does make sense according to Smalltalk formal definition?
>>
>> According to the syntax diagrams in the Book (choose the book's color from
>> blue, yellow or purple), the sharp character may occur as the first
>> character of an array constant or a symbol constant. In these positions it
>> is followed by a left parenthesis, if it marks an array constant, otherwise
>> it marks a symbol constant and is followed by a letter, a special character
>> or a minus character. Remember, special characters are the ones that make a
>> binary selector.
>>
>
>Oh yes, i should not have forgotten... #* #-
>In latest squeak, also work with any number of special characters like #***.
>In VW you can have a ByteArray with #[ 0 0 ]
>
>> Inside a string or a comment, the sharp character may be followed by any of
>> the 95 graphic characters.
>>
>> And finally, inside a character constant, the sharp character may be
>> followed by any character.
>>
>
>I do not understand this sentence. Isn't it the dollar that is used in
>character constants ?

Yes, it is. And $# can be followed by any character.

>Or is it inside a literal array like #( ^x:=y@z ), in which case each
>character is interpreted as a single character symbol...
>
>For fun, note that Squeak does not complain when you write
># $a

This is not a valid expression.

>
>> This holds for the language as defined formally by the syntax diagrams, but
>> not for the Smalltalk programming language as described informally by the
>> Blue Book, where "any character" may occur inside comments, strings and
>> character constants, that is not only the graphic characters but ASCII
>> control characters as well, like carriage return, horizontal tabulator or
>> record separator which is ASCII 30.
>>
>> And this again differs from the language as accepted by the compiler in the
>> V2 image of Smalltalk-80. For example, the ASCII 0 character inside a
>> character constant gets you an index error. But this is another thread :-)
>>
>
>You mean using ascii value as an index in the scanner character table?
>I started with st-80 v2.3 but just don't remember such details...

Months ago, I stumbled across this error in ST-80 V2 when I tried to read back
a form from the output of
        aForm storeOn: aStream.
The stream then contained ASCII control characters, like ASCII 0, which
triggered the error on reading back. Here is the report I sent Dan on April 5th:

Report:

| f s n|
f _ StandardFileStream oldFileNamed: 'DefaultTextStyle.so' .
s _ String new: f size.
n _ 0.
f do: [ :v | n _ n + 1. s at: n put: v].
TextConstants at: #ST80DefaultTextStyle put: (Compiler evaluate: s)

The above gives me a "subscript is out of bounds: 0".
Debug shows:

xBinary

        tokenType _ #binary.
        token _ Symbol internCharacter: self step.
        ((typeTable at: hereChar asciiValue) = #xBinary and: [hereChar ~= $-])
                ifTrue: [token _ (token , (String with: self step)) asSymbol]

And, of course, "hereChar asciiValue" is zero.

(End of Report)
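
The failure in the report comes down to Smalltalk collections being indexed from 1, so a table indexed directly by ASCII value has no slot for ASCII 0. A minimal workspace sketch follows; the table size of 256 and the returned symbol #outOfBounds are assumptions made for this illustration, not details taken from the ST-80 scanner.

```smalltalk
| typeTable |
typeTable := Array new: 256.   "valid indices are 1 to 256"

"Looking up ASCII 0 asks for index 0, which does not exist."
[typeTable at: (Character value: 0) asciiValue]
	on: Error
	do: [:ex | ex return: #outOfBounds]
```

Evaluating the last expression answers #outOfBounds instead of opening a debugger, which shows that the lookup itself signals the error, before any scanning logic runs.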

By the way, I still don't have those venerable ST-80 fonts in a Squeak image.

Greetings,
Wolfgang

--
Weniger, aber besser.