Smalltalk › Pharo › Pharo Smalltalk Developers

Binary selector and special characters


Nicolai Hess-3-2

Binary selector and special characters

Hi,

where can I find a good reference about which characters are allowed in binary selectors (according to the old syntax definitions) and which ones the implementations accept nowadays? I would also like to know whether the current set of allowed binary selector characters includes some additions on purpose, or whether this is just a bug in the parser.

From what I found out (Blue Book and some other Smalltalk syntax definitions), the allowed characters are the so-called "special characters":
$! $% $& $* $+ $, $- $/ $< $= $> $? $@ $\ $| $~

(Some implementations do not allow $@, and some do not call $- a special character but still allow it as a binary selector character.)

This is also what String>>#numArgs uses. Therefore:

'-' numArgs "->1".
'!' numArgs "->1".

And, for example:
'§' numArgs "-> -1 (the -1 indicates 'not even a valid selector')"
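Here is a minimal sketch, not part of Pharo, that checks a string against the classic special-character set listed above. The block name isClassicBinary is hypothetical. It only looks at which characters are used. It ignores selector length and the special handling some dialects give to $- and $|.

```smalltalk
"Hypothetical helper, not part of Pharo: answer true if aString is non-empty
 and made only of the classic special characters listed above."
| specialCharacters isClassicBinary |
specialCharacters := '!%&*+,-/<=>?@\|~'.
isClassicBinary := [ :aString |
	aString notEmpty and: [
		aString allSatisfy: [ :each | specialCharacters includes: each ] ] ].
isClassicBinary value: '->'.	"true"
isClassicBinary value: '§'.	"false"
```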

But I am interested in the characters that are not "special characters" and not even in the range 0-126.

The scanner allows many more characters to be used in a selector name (from the scanner's typeTable):

{Character value: 1 . Character value: 2 . Character value: 3 . Character value: 4 . Character value: 5 . Character value: 6 . Character value: 7 . Character backspace . Character value: 11 . Character value: 14 . Character value: 15 . Character value: 16 . Character value: 17 . Character value: 18 . Character value: 19 . Character value: 20 . Character value: 21 . Character value: 22 . Character value: 23 . Character value: 24 . Character value: 25 . Character value: 26 . Character escape . Character value: 28 . Character value: 29 . Character value: 30 . Character value: 31 . $! . $% . $& . $* . $+ . $, . $- . $/ . $< . $= . $> . $? . $@ . $\ . $` . $~ . Character delete . $€ . $ . $‚ . $ƒ . $„ . $… . $† . $‡ . $ˆ . $‰ . $Š . $‹ . $Œ . $ . $Ž . $ . $ . $‘ . $’ . $“ . $” . $• . $– . $— . $˜ . $™ . $š . $› . $œ . $ . $ž . $Ÿ . $ . $¡ . $¢ . $£ . $¤ . $¥ . $¦ . $§ . $¨ . $© . $« . $¬ . $ . $® . $¯ . $° . $± . $² . $³ . $´ . $¶ . $· . $¸ . $¹ . $» . $¼ . $½ . $¾ . $¿ . $× . $÷}

This means you can, for example, define a method named "÷".

So here is my question: what do we want to allow as binary selector characters? Everything that is currently "parseable" as a binary selector, only the set of "special characters", or something in between? And where should we put this information, i.e. "this is an allowed binary selector character"?

Thanks

Nicolai

monty-3

Re: Binary selector and special characters

See RBParserTest>>#testBinarySelectors

It's based on the draft ANSI Smalltalk standard. You integrated it. It tests RBParser's parsing of binary method definitions and message sends for all binary selectors from one up to three characters long. (The Blue Book is more restrictive than ANSI, limiting them to two characters max, IIRC.)

I wrote the test because of issues I had with the OldCompiler's handling of selectors containing "|" and issues on other platforms like GemStone, so the behavior I need and think is correct won't get broken without warning.

stepharo

Re: Binary selector and special characters

In reply to this post by Nicolai Hess-3-2

This is a really nice and important question.
I would really like to have a clear answer, because it would make the system more
stable.

If you could do an analysis and let us know, that would be really great.

Something related, though not on the same topic: I would love to have
a syntax for nested comments.

It is really annoying to have to uncomment parts when we want to
comment out a large section. We discussed this back in 2007-2009 but we never
did it.

Stef

On 28/8/16 at 12:17, Nicolai Hess wrote:

> Hi,
>
> where can I find a good reference about what characters are allowed as
> binary selectors (from old syntax definition) and what is nowadays allowed
> by the implementations.
> [...]

Thierry Goubier

Re: Binary selector and special characters

Hi Stef,

2016-08-29 8:35 GMT+02:00 stepharo <[hidden email]>:

> [...]
> Something related but not on the same topic is that I would love to have a syntax for nested comments.
>
> This is really annoying to have to uncomment parts when we have to comment a large part. We discussed this back in 2007-2009 but we never did it.

If that is what you need (uncommenting while commenting, and the reverse), then the answer is not a syntax change but a better comment/uncomment command in the editor.

Now that you mention it, I have the same issue. I'll try to get the behavior you'd like in AltBrowser.

Thierry


Nicolai Hess-3-2

Re: Binary selector and special characters

In reply to this post by monty-3

2016-08-28 13:41 GMT+02:00 monty <[hidden email]>:

> See RBParserTest>>#testBinarySelectors
> [...]

Hi Monty,

Yes, but I am wondering why the scanner classifies some characters as binary selector tokens even though they are not allowed in binary selectors.

In the old scanner, the type table initialization simply makes "binary token" the default for every character and then explicitly overrides some of them, for example:

($0 asciiValue to: $9 asciiValue) -> digit tokens

#(9 10 12 13 32 ) -> delimiter token
...
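Here is a minimal illustration of the "binary by default" idea. It is not the actual Scanner source. The 256-entry Array, the symbols #xBinary, #xDigit, #xDelimiter and #xLetter, and the +1 index offset are all assumptions made for the sketch. The real table also overrides more entries, the "..." above.

```smalltalk
"Illustration only, not the real Scanner code: every character starts out as
 #xBinary, then digits, letters and delimiters are overridden.
 Index = character value + 1, because Smalltalk arrays are 1-based."
| typeTable typeOf |
typeTable := Array new: 256 withAll: #xBinary.
($0 asciiValue to: $9 asciiValue) do: [ :code | typeTable at: code + 1 put: #xDigit ].
($a asciiValue to: $z asciiValue) do: [ :code | typeTable at: code + 1 put: #xLetter ].
($A asciiValue to: $Z asciiValue) do: [ :code | typeTable at: code + 1 put: #xLetter ].
#(9 10 12 13 32) do: [ :code | typeTable at: code + 1 put: #xDelimiter ].
typeOf := [ :aCharacter | typeTable at: aCharacter asciiValue + 1 ].
typeOf value: $7.	"#xDigit"
typeOf value: $÷.	"#xBinary, because nothing overrode the default"
```

With a default like this, any character that is never explicitly reclassified ends up scanned as a binary selector character. That includes ÷ and the control characters in the typeTable list from the first post, which would be consistent with that list.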

RBScanner, on the other hand, explicitly classifies some non-ASCII characters as binary tokens:

    classificationTable at: 177 put: #binary.    "plus-or-minus"
    classificationTable at: 183 put: #binary.    "centered dot"
    classificationTable at: 215 put: #binary.    "times"
    classificationTable at: 247 put: #binary.    "divide"

It looks as if someone intended (or some dialect allows) these characters to be used in binary selectors:
"#($± $· $× $÷)"

However, at the parsing step these tokens are then rejected as binary message selectors.

I think I will exclude these characters as binary selector tokens.
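For reference, the four table indices in the RBScanner snippet decode to exactly the characters named in its comments. Judging from those comments, the index is the character value itself. Here is a quick check:

```smalltalk
"Decode the indices that RBScanner sets to #binary into Characters."
| codes characters |
codes := #(177 183 215 247).
characters := codes collect: [ :each | Character value: each ].
characters	"equal to #($± $· $× $÷)"
```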

Thierry Goubier

Re: Binary selector and special characters

In reply to this post by Thierry Goubier

2016-08-29 10:58 GMT+02:00 Thierry Goubier <[hidden email]>:

> [...]
> Now that you say that, I also have the issue. I'll have a try in AltBrowser if I can get the behavior you'd like.

That works, but:

- Single-click selection doesn't work very well (it stops at the next double quote instead of the closing one).

- The formatter tends to split doubled quotes apart (adding line breaks).

- Backporting this to a standard editor is a mess, because the #enclose: method has to be changed.

Syntax-wise, one could consider "" to be part of a comment (i.e. not split the comment in two when "" is encountered inside it, as is done now).

Thierry


stepharo

Re: Binary selector and special characters

Thierry

If you have better editor control, even better :)

> Syntax wise, one could consider "" to be inside a comment (i.e. do not split into two comments if encountered inside a comment, as it is done now).

This one could be nice too :)

Peter Uhnak

Re: Binary selector and special characters

On Mon, Aug 29, 2016 at 11:42 AM, stepharo <[hidden email]> wrote:

> If you have a better editor control even better :)
>
> > Syntax wise, one could consider "" to be inside a comment (i.e. do not split into two comments if encountered inside a comment, as it is done now).
>
> This one could be nice too :)

This would also be consistent with how single quotes are handled in a string, which would make it doublenice. :)
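For comparison, here is the existing string rule Peter refers to. Inside a string literal, two consecutive single quotes stand for one quote character, so the literal below has four characters:

```smalltalk
"A doubled single quote inside a string literal denotes one quote character."
| literal |
literal := 'it''s'.
literal size.	"4"
literal at: 3	"$'"
```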

Peter

Thierry Goubier

Re: Binary selector and special characters

In reply to this post by stepharo

2016-08-29 11:42 GMT+02:00 stepharo <[hidden email]>:

> If you have a better editor control even better :)

What I did was pull that sort of behavior out into independent commands that are bound to an editor through keymaps. That makes it a lot easier to customize the handling of " independently of ', ( and [.

> > Syntax wise, one could consider "" to be inside a comment (i.e. do not split into two comments if encountered inside a comment, as it is done now).
>
> This one could be nice too :)

And simple: probably just a line or two inside RBScanner.

Thierry

Thierry Goubier

Re: Binary selector and special characters

In reply to this post by stepharo

Hi Stef,

2016-08-29 11:42 GMT+02:00 stepharo <[hidden email]>:

> If you have a better editor control even better :)
>
> > Syntax wise, one could consider "" to be inside a comment (i.e. do not split into two comments if encountered inside a comment, as it is done now).
>
> This one could be nice too :)

https://pharo.fogbugz.com/f/cases/19011/Integrate-two-double-quotes-inside-comments

I'll have the slice ready soon. Any comments on what this would mean for the commonly accepted Smalltalk syntax if the feature is integrated?

Thierry

Thierry Goubier

Re: Binary selector and special characters

In reply to this post by Peter Uhnak

2016-08-29 13:24 GMT+02:00 Peter Uhnák <[hidden email]>:

> This would be also consistent with how single quotes are handled in a string, which would make it doublenice. :)

Implementing it means looking at how strings are scanned :)

Thierry


stepharo

Re: Binary selector and special characters

In reply to this post by Thierry Goubier

On 29/8/16 at 17:45, Thierry Goubier wrote:

> [...]
> https://pharo.fogbugz.com/f/cases/19011/Integrate-two-double-quotes-inside-comments
>
> I'll have the slice ready soon. Any comments on what that would mean regarding the Smalltalk commonly accepted syntax if that feature is integrated?

It will break compatibility for people using it now, so we should raise the topic and give people a chance to discuss it. We could check, before publishing, whether code contains nested comments.
I think I would use them only while developing.

Stef


Thierry Goubier

Re: Binary selector and special characters

On 29/08/2016 at 21:28, stepharo wrote:

> It will break compatibility for people using it now we should raise the
> topic and lets a chance to people to discuss about it. We could check
> before publishing if code contain nested comments.

Hmm. The slice should parse any legal Smalltalk; it may just report fewer comment intervals (because it coalesces adjacent comments).

For example, the standard parser says that:

'"this ""test"' is a token with two comments, intervals 1 to: 7 and 8 to: 13.

The slice makes that a single comment:

'"this ""test"' is a token with one comment, interval 1 to: 13.
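Here is a minimal sketch of the two rules, assuming only what is described above. It is not the RBScanner code or Thierry's slice. The block commentIntervals and its mergeDoubled flag are hypothetical. With the flag off, each doubled quote closes one comment and opens the next. With it on, the doubled quote stays inside the comment, as in the slice.

```smalltalk
"Illustrative sketch only, not the RBScanner implementation.
 Answer the intervals of the comments found in aString. When mergeDoubled is true,
 a doubled quote inside a comment is kept inside the comment (the proposed rule);
 when false, it closes one comment and opens the next (the current rule)."
| commentIntervals |
commentIntervals := [ :aString :mergeDoubled |
	| intervals index start closed |
	intervals := OrderedCollection new.
	index := 1.
	[ index <= aString size ] whileTrue: [
		(aString at: index) = $"
			ifFalse: [ index := index + 1 ]
			ifTrue: [
				start := index.
				closed := false.
				index := index + 1.
				[ closed or: [ index > aString size ] ] whileFalse: [
					(aString at: index) = $"
						ifTrue: [
							(mergeDoubled and: [ index < aString size
									and: [ (aString at: index + 1) = $" ] ])
								ifTrue: [ index := index + 2 ]	"skip the doubled quote"
								ifFalse: [ closed := true ] ]
						ifFalse: [ index := index + 1 ] ].
				intervals add: (start to: (index min: aString size)).
				index := index + 1 ] ].
	intervals asArray ].
commentIntervals value: '"this ""test"' value: false.	"two comments: 1 to: 7 and 8 to: 13"
commentIntervals value: '"this ""test"' value: true.	"one comment: 1 to: 13"
```

On the example string, the sketch reports the intervals 1 to: 7 and 8 to: 13 under the current rule and the single interval 1 to: 13 under the proposed one. This matches the two parses described above.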

This probably has no impact on parsing Smalltalk code, but it does slightly change the language definition, which is why I'd like comments on it.

> I think that I would use them only when developing.

Up to you :)

The most interesting part is having the correct comment/uncomment behavior
in an editor... that one works independently and is quite cool.

Thierry


Nicolai Hess-3-2

Re: Binary selector and special characters

2016-08-29 21:38 GMT+02:00 Thierry Goubier <[hidden email]>:

> Hum. The slice should parse anything legal Smalltalk; just that it may show less comments intervals (because in fact it will coalesce adjacent comments).

Yes, I think the change to RBScanner is fine: it does not change which comments are accepted, only how they are assigned to the AST nodes (one vs. multiple comments).

(BTW, do we have a function that coalesces intervals, e.g.

(1 to: 99) (100 to: 199) -> (1 to: 199)

?)


Eliot Miranda-2

Re: Binary selector and special characters

On Tue, Aug 30, 2016 at 8:14 AM, Nicolai Hess <[hidden email]> wrote:

> (BTW. do we have a function that would do the coalescing of intervals:
>
> (1 to:99) (100 to: 199) -> (1 to:199)
>
> ? )

Specialize #, in Interval to check. Right now #, will answer an Array, but it would be easy to special-case.
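Here is a minimal sketch of the special case Eliot describes. It is written as a standalone block so that Interval itself is left untouched. The name join is hypothetical, and the block only handles non-empty Intervals with an increment of 1.

```smalltalk
"Hypothetical helper, not an existing Interval method: concatenate two Intervals.
 If both have increment 1 and b starts right after a ends, answer one Interval;
 otherwise fall back to the standard #, message, which answers an Array."
| join |
join := [ :a :b |
	(a increment = 1 and: [ b increment = 1 and: [ a last + 1 = b first ] ])
		ifTrue: [ a first to: b last ]
		ifFalse: [ a , b ] ].
join value: (1 to: 99) value: (100 to: 199).	"an Interval from 1 to 199"
join value: (1 to: 3) value: (5 to: 6).	"a plain Array #(1 2 3 5 6)"
```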


_,,,^..^,,,_

best, Eliot

Eliot Miranda-2

Re: Binary selector and special characters

In reply to this post by Nicolai Hess-3-2

On Tue, Aug 30, 2016 at 8:14 AM, Nicolai Hess <[hidden email]> wrote:

> (BTW. do we have a function that would do the coalescing of intervals:
>
> (1 to:99) (100 to: 199) -> (1 to:199)
>
> ? )

Find attached something that works in Squeak 5


_,,,^..^,,,_

best, Eliot

Attachment: Interval-methods.st (788 bytes)

Eliot Miranda-2

Re: Binary selector and special characters

Oops. No need to add a step method; the increment method already exists:

On Wed, Aug 31, 2016 at 9:12 AM, Eliot Miranda <[hidden email]> wrote:

> Find attached something that works in Squeak 5
_,,,^..^,,,_

best, Eliot

Attachment: Interval-methods.st (618 bytes)

Nicolai Hess-3-2

Re: Binary selector and special characters

2016-08-31 10:14 GMT+02:00 Eliot Miranda <[hidden email]>:

> Find attached something that works in Squeak 5

Nice, but actually I wasn't clear about the requirements :-)

The purpose was to merge source code intervals after parsing code comments. The comments may be adjacent and could then be merged into one comment. For this, I would like to reduce a collection of intervals to a smaller number of intervals, with adjacent intervals merged into one:

{ (30 to: 35) . (36 to:40) . (50 to:100) }
-> { (30 to:40) . (50 to:100) }
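Here is a minimal sketch of that requirement. It assumes the intervals are non-empty, have an increment of 1, and are sorted by their first element. The block name coalesce is hypothetical. It also merges overlapping intervals, which the example above does not need.

```smalltalk
"Hypothetical helper: merge a sorted collection of step-1 Intervals, joining each
 interval that starts right after (or inside) the previous merged one."
| coalesce |
coalesce := [ :intervals |
	| result |
	result := OrderedCollection new.
	intervals do: [ :each |
		(result notEmpty and: [ each first <= (result last last + 1) ])
			ifTrue: [ result at: result size put: (result last first to: (result last last max: each last)) ]
			ifFalse: [ result add: each ] ].
	result asArray ].
coalesce value: { 30 to: 35. 36 to: 40. 50 to: 100 }.
"two intervals: 30 to: 40 and 50 to: 100"
```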

But Thierry already changed the scanner to produce this smaller set of intervals/comments :-)


Nicolai Hess-3-2

Re: Binary selector and special characters

In reply to this post by Nicolai Hess-3-2

2016-08-29 11:23 GMT+02:00 Nicolai Hess <[hidden email]>:

> [...]
> I think I will exclude these characters as binary selector tokens.

Now back on topic :)

Does anyone know why the Refactoring Browser's Smalltalk scanner (RBScanner) explicitly allows
"#($± $· $× $÷)" as binary selector characters?

Is there any Smalltalk dialect that uses these characters?

I think I'll remove support for this in Pharo. It isn't really supported anyway: although the scanner scans these characters as binary selector tokens, the parser ultimately does not allow them in binary selector symbols.

John Brant-2

Re: Binary selector and special characters

On 08/31/2016 08:46 AM, Nicolai Hess wrote:

> Anyone knows why RefactoringBrowsers smalltalk scanner (RBScanner)
> explicit allowes
> "#($± $· $× $÷)" to be binary selector characters ?

I do -- I added them about 20 years ago :)...

> Is there any smalltalk dialect that uses these characters ?

VW (VisualWorks) allows them. When possible, we made the scanner/parser accept a superset
of the VW and VA (VisualAge) syntax.

John Brant