NewCompiler weird ANSI

Nicolas Cellier-3

NewCompiler weird ANSI

As stated in NewCompiler's code, the ANSI syntax:

ClosureCompiler evaluate: '- 1'

is WEIRD!
It answers -1 as a literal negative number, the space not being significant.
BEWARE: a tab or a cr is significant in the current implementation (ANSI?)

This is more confusing than useful.
It makes people think of a prefix operator, as in other languages.
Also, as already said, inside a literal array such as #(- 1) the space is significant.

And what about the sign of exponent?
ClosureCompiler evaluate: '-1.0e- 1'. Message not understood e
((-1.0) e) - (1), so space is significant here.

Besides, as NewCompiler accepts minus as the last character of a
multi-character binary selector, this causes further ambiguity.

ClosureCompiler evaluate: '0--1'. answers 1, i.e. 0-(-1): the last minus is
attached to the digit because there is no space.
ClosureCompiler evaluate: '0-- 1'. Message not understood --
There are two contradictory rules:
- either space is significant, thus the selector is #--
- or space is not significant (as in '- 1')
Apparently the first rule wins.

Weak weak ANSI. What was in their mind?

Nicolas

Mathieu SUEN

Re: NewCompiler weird ANSI

Yes I agree we should not allow the space between the minus and the one.
By the way you should post this to the NewCompiler mailing list.

http://lists.squeakfoundation.org/mailman/listinfo/newcompiler

Mth

On May 24, 2007, at 12:12 AM, nicolas cellier wrote:

>
> As stated in NewCompiler's code, the ANSI syntax:
>
> ClosureCompiler evaluate: '- 1'
>
> is WEIRD!
> It answers -1 as a literal negative number, space not being
> significant. BEWARE a tab or cr are significant in current
> implementation (ANSI?)
>
> This is more confusing than useful.
> It makes people think of a prefix operator, as in other languages.
> Also, as already said, inside literal array #(- 1) space is
> significant.
>
> And what about the sign of exponent?
> ClosureCompiler evaluate: '-1.0e- 1'. Message not understood e
> ((-1.0) e) - (1), so space is significant here.
>
> Besides, as NewCompiler accepts minus as the last character of a
> multi-character binary selector, this causes further ambiguity.
>
> ClosureCompiler evaluate: '0--1'. is 1 (0-(-1)) last minus is
> attached to digit because there is no space.
> ClosureCompiler evaluate: '0-- 1'. Message not understood --
> 2 contradictory rules
> - either space is significant thus selector is #--
> - or space is not significant (like '- 1')
> The first rule wins apparently
>
> Weak weak ANSI. What was in their mind?
>
> Nicolas
>
>

stephane ducasse

Re: NewCompiler weird ANSI

In reply to this post by Nicolas Cellier-3

nicolas

Thanks. Please note that Marcus is not reading the squeak-dev
mailing list anymore.
Please cross-post to the NewCompiler mailing list.

Stef

On 24 May 07, at 00:12, nicolas cellier wrote:

Lex Spoon-3

Re: NewCompiler weird ANSI

In reply to this post by Nicolas Cellier-3

nicolas cellier <[hidden email]> writes:
> As stated in NewCompiler's code, the ANSI syntax:

These are fun challenges, Nicolas! They will help us get a compiler
that does precisely what we want from it.

In deciding the behavior we want, I would propose two principles, the
first being stronger than the second:

1. Be compatible on the trivial stuff. Save Squeak-isms for places
where there is a real advantage.

2. Lean towards accepting more rather than less. Especially we
should try to accept things that other implementations accept.

Here are my attempts to figure out what ANSI actually wants in these
cases. Be aware it is not always what the compiler is currently
doing.

> ClosureCompiler evaluate: '- 1'
>
> is WEIRD!
> It answers -1 as a literal negative number, space not being
> significant. BEWARE a tab or cr are significant in current
> implementation (ANSI?)
>
> This is more confusing than useful.
> It makes people think of a prefix operator, as in other languages.

The standard is clear. A unary - is allowed to dangle way ahead of
the number literal that it modifies.

Section 3.4.6.1 includes the appropriate rule. Note the last
sentence.

<number literal> ::= ['-'] <number>
<number> ::= integer | float | scaledDecimal

If the preceding '-' is not present the value of the numeric
object is a positive number. If the '-' is present the value of
the numeric object is the negative number that is the negation of
the positive number defined by the <number> clause. White space is
allowed between the '-' and the <number>.

As best as I can tell, the standard is factored this way so that you
can divide your parser in the standard way into a tokenizer and a
parser. When the tokenizer sees "-1", it should divide it into two
tokens, "-" and "1". Then, the parser is free to interpret this as
either a literal, as in "x := -1", or a subtraction of 1, as in
"y := x-1".

Since it's handled at the level of parsing, white space is allowed
for consistency. You can even put comments in there if you like.

That's my rationalization, anyway. :) The rationale doesn't say, but
the spec is clear that spaces are allowed there.
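
To make the tokenizer/parser split concrete, here is a minimal sketch in Python (not Smalltalk, and not NewCompiler's code). It assumes a toy language with only unsigned integers and the one-character operators + and -; the names tokenize, parse_operand and evaluate are hypothetical. The tokenizer never produces a negative number; the parser folds a "-" token into the literal wherever an operand is expected, which is why white space between them costs nothing.

```python
import re

# Toy token pattern (an assumption for illustration): unsigned integers
# and the one-character operators + and -. White space is skipped.
TOKEN = re.compile(r"\s*(\d+|[-+])")

def tokenize(src):
    """Split src into tokens; '-' is always a token of its own."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_operand(tokens, i):
    """Parse <number literal> ::= ['-'] <number> starting at tokens[i]."""
    sign = 1
    if i < len(tokens) and tokens[i] == "-":
        sign = -1          # the '-' belongs to the literal, spaces or not
        i += 1
    if i < len(tokens) and tokens[i].isdigit():
        return sign * int(tokens[i]), i + 1
    raise SyntaxError("number literal expected")

def evaluate(src):
    """Evaluate a left-to-right chain of binary + and - over literals."""
    tokens = tokenize(src)
    value, i = parse_operand(tokens, 0)
    while i < len(tokens):
        op = tokens[i]
        if op not in ("+", "-"):
            raise SyntaxError("binary operator expected")
        rhs, i = parse_operand(tokens, i + 1)
        value = value + rhs if op == "+" else value - rhs
    return value
```

With this split, evaluate("- 1") and evaluate("-1") both give -1, while evaluate("3-1") treats the same "-" token as a subtraction, because the parser meets it where an operator is expected rather than an operand.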

> Also, as already said, inside literal array #(- 1) space is
> significant.

In this case, it should probably be the same as #(-1). The standard is
generally too quiet about array literals, but in this case it does
define a parse, so I guess we should use the standard parse.

If you want to parse a two-element array out of the above, you can
write it as: #(#- 1) .

> And what about the sign of exponent?
> ClosureCompiler evaluate: '-1.0e- 1'. Message not understood e
> ((-1.0) e) - (1), so space is significant here.

According to the standard, the only place you can insert a space
inside a number literal is if the whole literal starts with a "-".
This (ugly) exception only applies at the beginning, and not in the
example you give.

> Besides, as NewCompiler accepts minus as the last character of a
> multi-character binary selector, this causes further ambiguity.
>
> ClosureCompiler evaluate: '0--1'. is 1 (0-(-1)) last minus is attached
> to digit because there is no space.

This code is incorrect by the standard, and I'd be happy with
rejecting it. ANSI uses normal old longest match. Section 3.5 says:

Unless otherwise specified, white space or another separator must
appear between any two tokens if the initial characters of the
second token would be a valid extension of the first token.

Thus, 0--1 should tokenize as "0", "--", "1", which then does not
parse.
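
A small Python sketch of that longest-match behaviour, for illustration only. The set of binary-selector characters in the pattern is an assumed subset, and tokenize is a hypothetical helper, not anything from NewCompiler.

```python
import re

# Assumed subset of binary-selector characters; a run of them forms one
# token, and the greedy '+' takes the longest possible run.
TOKEN = re.compile(r"\s*(\d+|[-+*/<>=~,@%&?|]+)")

def tokenize(src):
    """Longest-match tokenizer over unsigned integers and operator runs."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens
```

Here tokenize("0--1") yields "0", "--", "1", whereas tokenize("0- -1") yields "0", "-", "-", "1": the separator is what lets the second minus start a new token.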

Lex Spoon

Nicolas Cellier-3

Re: NewCompiler weird ANSI

Lex Spoon wrote:

> nicolas cellier <[hidden email]> writes:
>> As stated in NewCompiler's code, the ANSI syntax:
>
> These are fun challenges, Nicolas! They will help us get a compiler
> that does precisely what we want from it.
>
> In deciding the behavior we want, I would propose two principles, the
> first being stronger than the last:
>
> 1. Be compatible on the trivial stuff. Save Squeak-isms for places
> where there is a real advantage.
>
> 2. Lean towards accepting more rather than less. Especially we
> should try to accept things that other implementations accept.
>

Very reasonable.
But which dialect interprets '- 1' the ANSI way?
Not VW, nor gst, nor stx... (I didn't check Dolphin or VA)

>
> Here are my attempts to figure out what ANSI actually wants in these
> cases. Be aware it is not always what the compiler is currently
> doing.
>
>
>> ClosureCompiler evaluate: '- 1'
>>
>> is WEIRD!
>> It answers -1 as a literal negative number, space not being
>> significant. BEWARE a tab or cr are significant in current
>> implementation (ANSI?)
>>
>> This is more confusing than useful.
>> It makes people think of a prefix operator, as in other languages.
>
> The standard is clear. A unary - is allowed to dangle way ahead of
> the number literal that it modifies.
>
>
> Section 3.4.6.1 includes the appropriate rule. Note the last
> sentence.
>
>
> <number literal> ::= ['-'] <number>
> <number> ::= integer | float | scaledDecimal
>
> If the preceding '-' is not present the value of the numeric
> object is a positive number. If the '-' is present the value of
> the numeric object is the negative number that is the negation of
> the positive number defined by the <number> clause. White space is
> allowed between the '-' and the <number>.
>
>
> As best as I can tell, the standard is factored this way so that you
> can divide your parser in the standard way into a tokenizer and a
> parser. When the tokenizer sees "-1", it should divide it into two
> tokens, "-" and "1". Then, the parser is free to interpret this as
> either a literal, as in "x := -1", or a subtraction of 1, as in
> "y := x-1".
>
> Since it's handled at the level of parsing, white space is allowed
> for consistency. You can even put comments in there if you like.
>
> That's my rationalization, anyway. :) The rationale doesn't say, but
> the spec is clear that spaces are allowed there.
>

This makes some sense, though it is not implemented that way in NewCompiler
(only space characters are allowed, not arbitrary white space).

It is rather the grammar rule itself that is questionable. It is not based on
any dialect's customs, nor on historical roots. Maybe people coming from
other languages would appreciate it...

>
>
>> Also, as already said, inside literal array #(- 1) space is
>> significant.
>
> In this case, it should probably be the same as #(-1). The standard is
> generally too quiet about array literals, but in this case it does
> define a parse, so I guess we should use the standard parse.
>
> If you want to parse a two-element array out of the above, you can
> write it as: #(#- 1) .
>
>

I would not like it. I prefer to understand the rule as:
1) a space separates two tokens
2) -1 is a single token not two, while - 1 is two tokens
3) a literal array is an array of literal tokens

My guess is that this was the rule in the mind of Smalltalk gods.
Only a guess...

Anyway, it's the actual behaviour of most Smalltalks.
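
The three rules above can be sketched in Python as follows. This is only an illustration of the rule as I read it, restricted to the contents of a literal array; the token pattern and the helper name literal_array are hypothetical, and characters outside the pattern are simply skipped.

```python
import re

# Rule 2: a '-' immediately followed by digits is part of the number
# token; a '-' followed by a space is a token of its own.
TOKEN = re.compile(r"-?\d+|[-+*/]")

def literal_array(body):
    """Rule 3: a literal array is the list of its literal tokens.

    body is the text between '#(' and ')'. Numbers become ints,
    anything else stays a string standing in for a symbol.
    """
    elements = []
    for token in TOKEN.findall(body):   # rule 1: spaces separate tokens
        if token.lstrip("-").isdigit():
            elements.append(int(token))
        else:
            elements.append(token)
    return elements
```

Under this reading literal_array("-1") has one element, -1, while literal_array("- 1") has two, the symbol "-" and the number 1.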

>
>
>> And what about the sign of exponent?
>> ClosureCompiler evaluate: '-1.0e- 1'. Message not understood e
>> ((-1.0) e) - (1), so space is significant here.
>
> According to the standard, the only place you can insert a space
> inside a number literal is if the whole literal starts with a "-".
> This (ugly) exception only applies at the beginning, and not in the
> example you give.
>
>

Not a brilliant example anyway, forget it.

>
>
>> Besides, as NewCompiler accepts minus as the last character of a
>> multi-character binary selector, this causes further ambiguity.
>>
>> ClosureCompiler evaluate: '0--1'. is 1 (0-(-1)) last minus is attached
>> to digit because there is no space.
>
> This code is incorrect by the standard, and I'd be happy with
> rejecting it. ANSI uses normal old longest match. Section 3.5 says:
>
> Unless otherwise specified, white space or another separator must
> appear between any two tokens if the initial characters of the
> second token would be a valid extension of the first token.
>
> Thus, 0--1 should tokenize as "0", "--", "1", which then does not
> parse.
>
>
>
> Lex Spoon
>
>
>

Yes, just like #||, #-- is not standard. It is an extension.
If I understood correctly, it was left out precisely to avoid the ambiguity
that arises when such selectors are mixed with negative literal numbers.

It would be easy to have a pedantic interactive compiler that forces the user
to disambiguate (at least with a warning, or a menu proposing the various
interpretations and an action that automatically inserts a space).

See:
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-May/thread.html#103895
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-May/104088.html
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-May/103907.html
etc... (binary selectors ambiguity and space)

http://bugs.squeak.org/view.php?id=3616