Encoding method source code in Tonel

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Encoding method source code in Tonel

Sven Van Caekenberghe-2
Hi,

Right now, TonelParser>>#methodBody looks a bit dangerous (too complex), and Nicolas already found some breaking cases as well a one possible solution. I was thinking a bit along the same lines, so here is a proposal.

Overall I think we should keep the current readability of Tonel as much as possible (it is very important for diffs), while still being correct - i.e. no method source, even convoluted ones, possibly in different languages should break the encoding.

The problem is how to quickly scan a stream for the closing $] (which occurs as a single character on a line by its own). Especially in Pharo source code, that situation (a conflict between the proper marker and actual source) is very rare, since code is usually intended. But is can occur and must be dealt with.

The proposal is to just double the $] but only on lines that could conflict (which are lines consisting of only $]'s). The code itself becomes quite simple and remains fast enough.

Here is some code to play with (EOL fixed to #cr for demo purposes):

writer := [ :stream :methodSourceString |
        stream nextPut: $[; cr.
        methodSourceString linesDo: [ :line |
                (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
                        ifTrue: [ stream << $] ].
                stream << line; cr ].
        stream nextPut: $]; cr ].

reader := [ :stream | | line |
        (stream peekFor: $[) ifTrue: [ stream nextLine ] ifFalse: [ self error ].
        String streamContents: [ :out |
                [ (line := stream nextLine) = ']' ] whileFalse: [
                        (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
                                ifTrue: [ out << line allButFirst ]
                                ifFalse: [ out << line ].
                        out cr ] ] ].

encoded := String streamContents: [ :out | writer value: out value: (Collection>>#inject:into:) sourceCode ].

(reader value: encoded readStream) = ((Collection>>#inject:into:) sourceCode, String cr).

escapedEncoded := String streamContents: [ :out | writer value: out value: 'test
"Make sure there is no whitespace at line start to force an escape"
self isFoo ifTrue: [ #foo
]
ifFalse: [ [ "block" #bar
]]
' ].

(reader value: escapedEncoded readStream).

What do you think ?
Did I miss any special cases ?
Is the price (very occasional insertion of an extra #] in the source code) worth the advantage (a simple, safe scheme).

Sven

PS: Note that several tools line syntax highlighting and double-click next to brackets have problems with weird delimiters such as in the above code ;-)


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Henrik Sperre Johansen
Can a line with a single ] ever at the "right" indentation level?
Forcing autoformat at commit is a bit heavy hand, but maybe inserting a tab
(or 2,3,4 spaces?) would be an ok workaround that doesn't require special
decoding when loading the code?

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html

Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Sven Van Caekenberghe-2


> On 10 Nov 2017, at 11:49, Henrik Sperre Johansen <[hidden email]> wrote:
>
> Can a line with a single ] ever at the "right" indentation level?
> Forcing autoformat at commit is a bit heavy hand, but maybe inserting a tab
> (or 2,3,4 spaces?) would be an ok workaround that doesn't require special
> decoding when loading the code?

The problem with inserting some whitespace is that you cannot know if you have to remove it (unless you insert whitespace everywhere, which was I believe Nicolas' proposal).

Like I said, this is all rare and convoluted. But inside a multiline string constant you can have everything, right ?

Foo>#myMethod
  "Return a help string"
  ^ 'This is for example help:

Never use
]
as the single character on a line!'

I think the formatter is not good enough to do an auto reformat on every commit (especially for data driven, spec style code). Changing the formatting algorithm (which might be a personal preference) would change all source code.

> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Nicolas Cellier
In reply to this post by Henrik Sperre Johansen
My proposition was to force methodBody closing to be on first column,
and disambiguate potential method conflicts by forcing a more transparent white space.

Maybe we can force the white space only on conflicting lines that would begin with a ].
Most methods are indented (and even formatted) so the conflict should be extremely rare.
It could happen though with multiple line literal string

    ^'
[
something went wrong here
]
'

If first column trick is not acceptable, then we will indeed come up with more visible escape sequences.

2017-11-10 11:49 GMT+01:00 Henrik Sperre Johansen <[hidden email]>:
Can a line with a single ] ever at the "right" indentation level?
Forcing autoformat at commit is a bit heavy hand, but maybe inserting a tab
(or 2,3,4 spaces?) would be an ok workaround that doesn't require special
decoding when loading the code?

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Sven Van Caekenberghe-2


> On 10 Nov 2017, at 11:59, Nicolas Cellier <[hidden email]> wrote:
>
> My proposition was to force methodBody closing to be on first column,
> and disambiguate potential method conflicts by forcing a more transparent white space.
>
> Maybe we can force the white space only on conflicting lines that would begin with a ].
> Most methods are indented (and even formatted) so the conflict should be extremely rare.
> It could happen though with multiple line literal string
>
>     ^'
> [
> something went wrong here
> ]
> '
>
> If first column trick is not acceptable, then we will indeed come up with more visible escape sequences.

But don't you think that it would be weird to insert extra whitespace and not remove it (under the assumption that it does not matter), because is string constants like in your example, you would actually change code !

And yes, we are discussing a very rare case.

But as you showed, simplifying #methodBody is really important.

> 2017-11-10 11:49 GMT+01:00 Henrik Sperre Johansen <[hidden email]>:
> Can a line with a single ] ever at the "right" indentation level?
> Forcing autoformat at commit is a bit heavy hand, but maybe inserting a tab
> (or 2,3,4 spaces?) would be an ok workaround that doesn't require special
> decoding when loading the code?
>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

EstebanLM
Well… I’m working on make it more robust :)
but in general, I disagree with all solutions style “force some character to appear in one place”. I prefer to rely in a good (better) parsing than to force such kind of conventions that are also very easy to break (as soon as people can modify things outside the tools we propose). 

Esteban

On 10 Nov 2017, at 08:08, Sven Van Caekenberghe <[hidden email]> wrote:



On 10 Nov 2017, at 11:59, Nicolas Cellier <[hidden email]> wrote:

My proposition was to force methodBody closing to be on first column,
and disambiguate potential method conflicts by forcing a more transparent white space.

Maybe we can force the white space only on conflicting lines that would begin with a ].
Most methods are indented (and even formatted) so the conflict should be extremely rare.
It could happen though with multiple line literal string

   ^'
[
something went wrong here
]
'

If first column trick is not acceptable, then we will indeed come up with more visible escape sequences.

But don't you think that it would be weird to insert extra whitespace and not remove it (under the assumption that it does not matter), because is string constants like in your example, you would actually change code !

And yes, we are discussing a very rare case.

But as you showed, simplifying #methodBody is really important.

2017-11-10 11:49 GMT+01:00 Henrik Sperre Johansen <[hidden email]>:
Can a line with a single ] ever at the "right" indentation level?
Forcing autoformat at commit is a bit heavy hand, but maybe inserting a tab
(or 2,3,4 spaces?) would be an ok workaround that doesn't require special
decoding when loading the code?

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html

Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Nicolas Cellier
In reply to this post by Sven Van Caekenberghe-2


2017-11-10 12:08 GMT+01:00 Sven Van Caekenberghe <[hidden email]>:


> On 10 Nov 2017, at 11:59, Nicolas Cellier <[hidden email]> wrote:
>
> My proposition was to force methodBody closing to be on first column,
> and disambiguate potential method conflicts by forcing a more transparent white space.
>
> Maybe we can force the white space only on conflicting lines that would begin with a ].
> Most methods are indented (and even formatted) so the conflict should be extremely rare.
> It could happen though with multiple line literal string
>
>     ^'
> [
> something went wrong here
> ]
> '
>
> If first column trick is not acceptable, then we will indeed come up with more visible escape sequences.

But don't you think that it would be weird to insert extra whitespace and not remove it (under the assumption that it does not matter), because is string constants like in your example, you would actually change code !

And yes, we are discussing a very rare case.

But as you showed, simplifying #methodBody is really important.


Indeed, in some well known case, a literal would be modified bewteen Pharo and Tonel which would be rare but weird...
Currently, Tonel does not have to change the syntax at all.

The first goals are:
 - we should prefer a syntax agnostic format for two reasons
    * not insulting the future and supporting alternate syntax extensions (compilerClass)
    * making the parsing both robust and efficient (not necessary to use a RB parser for a quick scan)
 - we don't want too many decorations that would make the Tonel syntax look different from Pharo syntax

If we also want to support human edited Tonel files:
- we want something resilient (not be too much picky about a missing leading space for example)
- we want the most simple rules and least surprising rules

The first two goals look really antagonist...

There are other possibilities: if we don't want to change the methodBody at all, then let's change the opening/closing sequence

- at write, choose a closing sequence that has no exact match in source code, then use the inverse sequence as the opening sequence
- at read, just scan for the inverse of opening sequence

opening sequence could be any combination of [{<(
or maybe just a single [ repeated as many times as necessary



> 2017-11-10 11:49 GMT+01:00 Henrik Sperre Johansen <[hidden email]>:
> Can a line with a single ] ever at the "right" indentation level?
> Forcing autoformat at commit is a bit heavy hand, but maybe inserting a tab
> (or 2,3,4 spaces?) would be an ok workaround that doesn't require special
> decoding when loading the code?
>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Martin McClure-2
In reply to this post by EstebanLM
On 11/10/2017 07:10 AM, Esteban Lorenzano wrote:
> Well… I’m working on make it more robust :)
> but in general, I disagree with all solutions style “force some
> character to appear in one place”. I prefer to rely in a good (better)
> parsing than to force such kind of conventions that are also very easy
> to break (as soon as people can modify things outside the tools we
> propose).
>
Thanks, Esteban. This is approximately what I was just starting to write.

Regards,

-Martin


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Sven Van Caekenberghe-2
In reply to this post by Nicolas Cellier


> On 10 Nov 2017, at 16:11, Nicolas Cellier <[hidden email]> wrote:
>
> There are other possibilities: if we don't want to change the methodBody at all, then let's change the opening/closing sequence
>
> - at write, choose a closing sequence that has no exact match in source code, then use the inverse sequence as the opening sequence
> - at read, just scan for the inverse of opening sequence
>
> opening sequence could be any combination of [{<(
> or maybe just a single [ repeated as many times as necessary

That is a cool idea (used by some scripting languages, stdin handling, IIRC).

Check for conflicts of 1 $] closing sequence for $[ and keep on increasing the count until there is no conflict. Simple & effective. Supports any syntax.

BTW, I held off from commenting about the basic Tonel principle of a single file, but if 'we want to allow people to edit this file manually' becomes a goal, then I would be VERY VERY disappointed.

Sven


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Nicolas Cellier


2017-11-10 16:21 GMT+01:00 Sven Van Caekenberghe <[hidden email]>:


> On 10 Nov 2017, at 16:11, Nicolas Cellier <[hidden email]> wrote:
>
> There are other possibilities: if we don't want to change the methodBody at all, then let's change the opening/closing sequence
>
> - at write, choose a closing sequence that has no exact match in source code, then use the inverse sequence as the opening sequence
> - at read, just scan for the inverse of opening sequence
>
> opening sequence could be any combination of [{<(
> or maybe just a single [ repeated as many times as necessary

That is a cool idea (used by some scripting languages, stdin handling, IIRC).

Check for conflicts of 1 $] closing sequence for $[ and keep on increasing the count until there is no conflict. Simple & effective. Supports any syntax.

BTW, I held off from commenting about the basic Tonel principle of a single file, but if 'we want to allow people to edit this file manually' becomes a goal, then I would be VERY VERY disappointed.

Sven


Bah, pragmatism rules, for patching a typo directly thru github API, it's OK...
If it's about replacing IDE with vi, then it's no more Pharo.

Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Martin McClure-2
In reply to this post by Nicolas Cellier
On 11/10/2017 07:11 AM, Nicolas Cellier wrote:
The first goals are:
 - we should prefer a syntax agnostic format for two reasons
    * not insulting the future and supporting alternate syntax extensions (compilerClass)
    * making the parsing both robust and efficient (not necessary to use a RB parser for a quick scan)
 - we don't want too many decorations that would make the Tonel syntax look different from Pharo syntax

If we also want to support human edited Tonel files:
- we want something resilient (not be too much picky about a missing leading space for example)
- we want the most simple rules and least surprising rules

The first two goals look really antagonist...

Tonel is intended to be, I believe:
* For Smalltalk primarily (although there are other languages)
* Be for any Smalltalk dialect (although the first implementation is for Pharo, it is being ported to GemStone, and we intend to port it to VW and VA)
* Be human-readable (some people want to be able to use tools like GitHub to look at diffs and such)
* Be human-editable using normal text editors (important for porting -- sometimes you just have to edit the source before it will load, for instance)

I think that Tonel using [] to enclose the source is a pretty good idea. A proper parser should be able to correctly parse Smalltalk code from any dialect, and it makes it human-readable and human-editable.

However, as has been noted in this thread, it would also be nice to handle non-Smalltalk code. One way to do this (one that was in an earlier proposal but is not currently in Tonel) is to use pure STON for any source code that does not have standard Smalltalk syntax.  A method like this would look something like:

{ 'category' : 'public',
   'source' : 'whatever the source code is, with STON escapes as necessary' }

Regards,
-Martin
Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Denis Kudriashov

2017-11-10 16:30 GMT+01:00 Martin McClure <[hidden email]>:
Tonel is intended to be, I believe:
* For Smalltalk primarily (although there are other languages)
* Be for any Smalltalk dialect (although the first implementation is for Pharo, it is being ported to GemStone, and we intend to port it to VW and VA)

Interesting how it will be handled when Pharo will use slots abstraction exclusively. And more simple, when there will be only package and tags instead of class categories.
 
* Be human-readable (some people want to be able to use tools like GitHub to look at diffs and such)
* Be human-editable using normal text editors (important for porting -- sometimes you just have to edit the source before it will load, for instance)

I think that Tonel using [] to enclose the source is a pretty good idea. A proper parser should be able to correctly parse Smalltalk code from any dialect, and it makes it human-readable and human-editable.

However, as has been noted in this thread, it would also be nice to handle non-Smalltalk code. One way to do this (one that was in an earlier proposal but is not currently in Tonel) is to use pure STON for any source code that does not have standard Smalltalk syntax.  A method like this would look something like:

{ 'category' : 'public',
   'source' : 'whatever the source code is, with STON escapes as necessary' }

Regards,
-Martin

Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Denis Kudriashov
2017-11-10 17:03 GMT+01:00 Denis Kudriashov <[hidden email]>:

2017-11-10 16:30 GMT+01:00 Martin McClure <[hidden email]>:
Tonel is intended to be, I believe:
* For Smalltalk primarily (although there are other languages)
* Be for any Smalltalk dialect (although the first implementation is for Pharo, it is being ported to GemStone, and we intend to port it to VW and VA)

Interesting how it will be handled when Pharo will use slots abstraction exclusively. And more simple, when there will be only package and tags instead of class categories.

To be more clear. Imaging that instead of #instanceVariables field Pharo will use #slots. And instead of #category it will be #package and #tags
 
 
* Be human-readable (some people want to be able to use tools like GitHub to look at diffs and such)
* Be human-editable using normal text editors (important for porting -- sometimes you just have to edit the source before it will load, for instance)

I think that Tonel using [] to enclose the source is a pretty good idea. A proper parser should be able to correctly parse Smalltalk code from any dialect, and it makes it human-readable and human-editable.

However, as has been noted in this thread, it would also be nice to handle non-Smalltalk code. One way to do this (one that was in an earlier proposal but is not currently in Tonel) is to use pure STON for any source code that does not have standard Smalltalk syntax.  A method like this would look something like:

{ 'category' : 'public',
   'source' : 'whatever the source code is, with STON escapes as necessary' }

Regards,
-Martin


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

EstebanLM


On 10 Nov 2017, at 13:05, Denis Kudriashov <[hidden email]> wrote:

2017-11-10 17:03 GMT+01:00 Denis Kudriashov <[hidden email]>:

2017-11-10 16:30 GMT+01:00 Martin McClure <[hidden email]>:
Tonel is intended to be, I believe:
* For Smalltalk primarily (although there are other languages)
* Be for any Smalltalk dialect (although the first implementation is for Pharo, it is being ported to GemStone, and we intend to port it to VW and VA)

Interesting how it will be handled when Pharo will use slots abstraction exclusively. And more simple, when there will be only package and tags instead of class categories.

To be more clear. Imaging that instead of #instanceVariables field Pharo will use #slots. And instead of #category it will be #package and #tags

what does it has to do with this thread?
handling that is trivial: once we actually have slots (bah, we have them, but once we *use* them) and once we move to package+tags, we just adapt the descriptions. That’s the advantage of using STON to keep them.

Esteban

 
 
* Be human-readable (some people want to be able to use tools like GitHub to look at diffs and such)
* Be human-editable using normal text editors (important for porting -- sometimes you just have to edit the source before it will load, for instance)

I think that Tonel using [] to enclose the source is a pretty good idea. A proper parser should be able to correctly parse Smalltalk code from any dialect, and it makes it human-readable and human-editable.

However, as has been noted in this thread, it would also be nice to handle non-Smalltalk code. One way to do this (one that was in an earlier proposal but is not currently in Tonel) is to use pure STON for any source code that does not have standard Smalltalk syntax.  A method like this would look something like:

{ 'category' : 'public',
   'source' : 'whatever the source code is, with STON escapes as necessary' }

Regards,
-Martin



Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

EstebanLM
In reply to this post by Nicolas Cellier
Hi Nico, 

I have a fix for both cases you provided. 
now, I need to refactor it because is ugly how I solve it :)

right now this is super agnostic: the only thing we ensure is to count correctly the brackets to be sure we have a complete method. All the cases you mention it are possible ways where a brackets can appear without being part of a block construction… of which I think we can delimit.
 
Esteban

On 10 Nov 2017, at 12:11, Nicolas Cellier <[hidden email]> wrote:



2017-11-10 12:08 GMT+01:00 Sven Van Caekenberghe <[hidden email]>:


> On 10 Nov 2017, at 11:59, Nicolas Cellier <[hidden email]> wrote:
>
> My proposition was to force methodBody closing to be on first column,
> and disambiguate potential method conflicts by forcing a more transparent white space.
>
> Maybe we can force the white space only on conflicting lines that would begin with a ].
> Most methods are indented (and even formatted) so the conflict should be extremely rare.
> It could happen though with multiple line literal string
>
>     ^'
> [
> something went wrong here
> ]
> '
>
> If first column trick is not acceptable, then we will indeed come up with more visible escape sequences.

But don't you think that it would be weird to insert extra whitespace and not remove it (under the assumption that it does not matter), because is string constants like in your example, you would actually change code !

And yes, we are discussing a very rare case.

But as you showed, simplifying #methodBody is really important.


Indeed, in some well known case, a literal would be modified bewteen Pharo and Tonel which would be rare but weird...
Currently, Tonel does not have to change the syntax at all.

The first goals are:
 - we should prefer a syntax agnostic format for two reasons
    * not insulting the future and supporting alternate syntax extensions (compilerClass)
    * making the parsing both robust and efficient (not necessary to use a RB parser for a quick scan)
 - we don't want too many decorations that would make the Tonel syntax look different from Pharo syntax

If we also want to support human edited Tonel files:
- we want something resilient (not be too much picky about a missing leading space for example)
- we want the most simple rules and least surprising rules

The first two goals look really antagonist...

There are other possibilities: if we don't want to change the methodBody at all, then let's change the opening/closing sequence

- at write, choose a closing sequence that has no exact match in source code, then use the inverse sequence as the opening sequence
- at read, just scan for the inverse of opening sequence

opening sequence could be any combination of [{<( 
or maybe just a single [ repeated as many times as necessary



> 2017-11-10 11:49 GMT+01:00 Henrik Sperre Johansen <[hidden email]>:

> Can a line with a single ] ever at the "right" indentation level?
> Forcing autoformat at commit is a bit heavy hand, but maybe inserting a tab
> (or 2,3,4 spaces?) would be an ok workaround that doesn't require special
> decoding when loading the code?
>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Stephane Ducasse-3
In reply to this post by Sven Van Caekenberghe-2
Hi sven

Do you have an example because I do not understand your doubling.

stef


On Fri, Nov 10, 2017 at 11:40 AM, Sven Van Caekenberghe <[hidden email]> wrote:

> Hi,
>
> Right now, TonelParser>>#methodBody looks a bit dangerous (too complex), and Nicolas already found some breaking cases as well a one possible solution. I was thinking a bit along the same lines, so here is a proposal.
>
> Overall I think we should keep the current readability of Tonel as much as possible (it is very important for diffs), while still being correct - i.e. no method source, even convoluted ones, possibly in different languages should break the encoding.
>
> The problem is how to quickly scan a stream for the closing $] (which occurs as a single character on a line by its own). Especially in Pharo source code, that situation (a conflict between the proper marker and actual source) is very rare, since code is usually intended. But is can occur and must be dealt with.
>
> The proposal is to just double the $] but only on lines that could conflict (which are lines consisting of only $]'s). The code itself becomes quite simple and remains fast enough.
>
> Here is some code to play with (EOL fixed to #cr for demo purposes):
>
> writer := [ :stream :methodSourceString |
>         stream nextPut: $[; cr.
>         methodSourceString linesDo: [ :line |
>                 (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>                         ifTrue: [ stream << $] ].
>                 stream << line; cr ].
>         stream nextPut: $]; cr ].
>
> reader := [ :stream | | line |
>         (stream peekFor: $[) ifTrue: [ stream nextLine ] ifFalse: [ self error ].
>         String streamContents: [ :out |
>                 [ (line := stream nextLine) = ']' ] whileFalse: [
>                         (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>                                 ifTrue: [ out << line allButFirst ]
>                                 ifFalse: [ out << line ].
>                         out cr ] ] ].
>
> encoded := String streamContents: [ :out | writer value: out value: (Collection>>#inject:into:) sourceCode ].
>
> (reader value: encoded readStream) = ((Collection>>#inject:into:) sourceCode, String cr).
>
> escapedEncoded := String streamContents: [ :out | writer value: out value: 'test
> "Make sure there is no whitespace at line start to force an escape"
> self isFoo ifTrue: [ #foo
> ]
> ifFalse: [ [ "block" #bar
> ]]
> ' ].
>
> (reader value: escapedEncoded readStream).
>
> What do you think ?
> Did I miss any special cases ?
> Is the price (very occasional insertion of an extra #] in the source code) worth the advantage (a simple, safe scheme).
>
> Sven
>
> PS: Note that several tools line syntax highlighting and double-click next to brackets have problems with weird delimiters such as in the above code ;-)
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Sven Van Caekenberghe-2


> On 10 Nov 2017, at 19:12, Stephane Ducasse <[hidden email]> wrote:
>
> Hi sven
>
> Do you have an example because I do not understand your doubling.

Sure,

Bla { bla:bla } [
self isFoo ifTrue: [ #foo
]
ifFalse: [ [ "block" #bar
]]
]

would have to become

Bla { bla:bla } [
self isFoo ifTrue: [ #foo
]]
ifFalse: [ [ "block" #bar
]]]
]

but this very rare (because usually code is intended).

Nicolas' latest idea is cool too (use as much ['s as needed to avoid conflict). So the example would then become

Bla { bla:bla } [[[
self isFoo ifTrue: [ #foo
]
ifFalse: [ [ "block" #bar
]]
]]]

> stef
>
>
> On Fri, Nov 10, 2017 at 11:40 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>> Hi,
>>
>> Right now, TonelParser>>#methodBody looks a bit dangerous (too complex), and Nicolas already found some breaking cases as well a one possible solution. I was thinking a bit along the same lines, so here is a proposal.
>>
>> Overall I think we should keep the current readability of Tonel as much as possible (it is very important for diffs), while still being correct - i.e. no method source, even convoluted ones, possibly in different languages should break the encoding.
>>
>> The problem is how to quickly scan a stream for the closing $] (which occurs as a single character on a line by its own). Especially in Pharo source code, that situation (a conflict between the proper marker and actual source) is very rare, since code is usually intended. But is can occur and must be dealt with.
>>
>> The proposal is to just double the $] but only on lines that could conflict (which are lines consisting of only $]'s). The code itself becomes quite simple and remains fast enough.
>>
>> Here is some code to play with (EOL fixed to #cr for demo purposes):
>>
>> writer := [ :stream :methodSourceString |
>>        stream nextPut: $[; cr.
>>        methodSourceString linesDo: [ :line |
>>                (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>                        ifTrue: [ stream << $] ].
>>                stream << line; cr ].
>>        stream nextPut: $]; cr ].
>>
>> reader := [ :stream | | line |
>>        (stream peekFor: $[) ifTrue: [ stream nextLine ] ifFalse: [ self error ].
>>        String streamContents: [ :out |
>>                [ (line := stream nextLine) = ']' ] whileFalse: [
>>                        (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>                                ifTrue: [ out << line allButFirst ]
>>                                ifFalse: [ out << line ].
>>                        out cr ] ] ].
>>
>> encoded := String streamContents: [ :out | writer value: out value: (Collection>>#inject:into:) sourceCode ].
>>
>> (reader value: encoded readStream) = ((Collection>>#inject:into:) sourceCode, String cr).
>>
>> escapedEncoded := String streamContents: [ :out | writer value: out value: 'test
>> "Make sure there is no whitespace at line start to force an escape"
>> self isFoo ifTrue: [ #foo
>> ]
>> ifFalse: [ [ "block" #bar
>> ]]
>> ' ].
>>
>> (reader value: escapedEncoded readStream).
>>
>> What do you think ?
>> Did I miss any special cases ?
>> Is the price (very occasional insertion of an extra #] in the source code) worth the advantage (a simple, safe scheme).
>>
>> Sven
>>
>> PS: Note that several tools line syntax highlighting and double-click next to brackets have problems with weird delimiters such as in the above code ;-)
>>
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Stephane Ducasse-3
I would prefer the last one.
Regularity over exception.
Now the parser should just be good :)

Stef

On Fri, Nov 10, 2017 at 7:20 PM, Sven Van Caekenberghe <[hidden email]> wrote:

>
>
>> On 10 Nov 2017, at 19:12, Stephane Ducasse <[hidden email]> wrote:
>>
>> Hi sven
>>
>> Do you have an example because I do not understand your doubling.
>
> Sure,
>
> Bla { bla:bla } [
> self isFoo ifTrue: [ #foo
> ]
> ifFalse: [ [ "block" #bar
> ]]
> ]
>
> would have to become
>
> Bla { bla:bla } [
> self isFoo ifTrue: [ #foo
> ]]
> ifFalse: [ [ "block" #bar
> ]]]
> ]
>
> but this very rare (because usually code is intended).
>
> Nicolas' latest idea is cool too (use as much ['s as needed to avoid conflict). So the example would then become
>
> Bla { bla:bla } [[[
> self isFoo ifTrue: [ #foo
> ]
> ifFalse: [ [ "block" #bar
> ]]
> ]]]
>
>> stef
>>
>>
>> On Fri, Nov 10, 2017 at 11:40 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>>> Hi,
>>>
>>> Right now, TonelParser>>#methodBody looks a bit dangerous (too complex), and Nicolas already found some breaking cases as well a one possible solution. I was thinking a bit along the same lines, so here is a proposal.
>>>
>>> Overall I think we should keep the current readability of Tonel as much as possible (it is very important for diffs), while still being correct - i.e. no method source, even convoluted ones, possibly in different languages should break the encoding.
>>>
>>> The problem is how to quickly scan a stream for the closing $] (which occurs as a single character on a line by its own). Especially in Pharo source code, that situation (a conflict between the proper marker and actual source) is very rare, since code is usually intended. But is can occur and must be dealt with.
>>>
>>> The proposal is to just double the $] but only on lines that could conflict (which are lines consisting of only $]'s). The code itself becomes quite simple and remains fast enough.
>>>
>>> Here is some code to play with (EOL fixed to #cr for demo purposes):
>>>
>>> writer := [ :stream :methodSourceString |
>>>        stream nextPut: $[; cr.
>>>        methodSourceString linesDo: [ :line |
>>>                (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>>                        ifTrue: [ stream << $] ].
>>>                stream << line; cr ].
>>>        stream nextPut: $]; cr ].
>>>
>>> reader := [ :stream | | line |
>>>        (stream peekFor: $[) ifTrue: [ stream nextLine ] ifFalse: [ self error ].
>>>        String streamContents: [ :out |
>>>                [ (line := stream nextLine) = ']' ] whileFalse: [
>>>                        (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>>                                ifTrue: [ out << line allButFirst ]
>>>                                ifFalse: [ out << line ].
>>>                        out cr ] ] ].
>>>
>>> encoded := String streamContents: [ :out | writer value: out value: (Collection>>#inject:into:) sourceCode ].
>>>
>>> (reader value: encoded readStream) = ((Collection>>#inject:into:) sourceCode, String cr).
>>>
>>> escapedEncoded := String streamContents: [ :out | writer value: out value: 'test
>>> "Make sure there is no whitespace at line start to force an escape"
>>> self isFoo ifTrue: [ #foo
>>> ]
>>> ifFalse: [ [ "block" #bar
>>> ]]
>>> ' ].
>>>
>>> (reader value: escapedEncoded readStream).
>>>
>>> What do you think ?
>>> Did I miss any special cases ?
>>> Is the price (very occasional insertion of an extra #] in the source code) worth the advantage (a simple, safe scheme).
>>>
>>> Sven
>>>
>>> PS: Note that several tools line syntax highlighting and double-click next to brackets have problems with weird delimiters such as in the above code ;-)
>>>
>>>
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

EstebanLM
none of them, I’m sorry :)
I think we can do a lot better with just a bit of effort.

Esteban

> On 10 Nov 2017, at 15:26, Stephane Ducasse <[hidden email]> wrote:
>
> I would prefer the last one.
> Regularity over exception.
> Now the parser should just be good :)
>
> Stef
>
> On Fri, Nov 10, 2017 at 7:20 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>
>>
>>> On 10 Nov 2017, at 19:12, Stephane Ducasse <[hidden email]> wrote:
>>>
>>> Hi sven
>>>
>>> Do you have an example because I do not understand your doubling.
>>
>> Sure,
>>
>> Bla { bla:bla } [
>> self isFoo ifTrue: [ #foo
>> ]
>> ifFalse: [ [ "block" #bar
>> ]]
>> ]
>>
>> would have to become
>>
>> Bla { bla:bla } [
>> self isFoo ifTrue: [ #foo
>> ]]
>> ifFalse: [ [ "block" #bar
>> ]]]
>> ]
>>
>> but this very rare (because usually code is intended).
>>
>> Nicolas' latest idea is cool too (use as much ['s as needed to avoid conflict). So the example would then become
>>
>> Bla { bla:bla } [[[
>> self isFoo ifTrue: [ #foo
>> ]
>> ifFalse: [ [ "block" #bar
>> ]]
>> ]]]
>>
>>> stef
>>>
>>>
>>> On Fri, Nov 10, 2017 at 11:40 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>>>> Hi,
>>>>
>>>> Right now, TonelParser>>#methodBody looks a bit dangerous (too complex), and Nicolas already found some breaking cases as well a one possible solution. I was thinking a bit along the same lines, so here is a proposal.
>>>>
>>>> Overall I think we should keep the current readability of Tonel as much as possible (it is very important for diffs), while still being correct - i.e. no method source, even convoluted ones, possibly in different languages should break the encoding.
>>>>
>>>> The problem is how to quickly scan a stream for the closing $] (which occurs as a single character on a line by its own). Especially in Pharo source code, that situation (a conflict between the proper marker and actual source) is very rare, since code is usually intended. But is can occur and must be dealt with.
>>>>
>>>> The proposal is to just double the $] but only on lines that could conflict (which are lines consisting of only $]'s). The code itself becomes quite simple and remains fast enough.
>>>>
>>>> Here is some code to play with (EOL fixed to #cr for demo purposes):
>>>>
>>>> writer := [ :stream :methodSourceString |
>>>>       stream nextPut: $[; cr.
>>>>       methodSourceString linesDo: [ :line |
>>>>               (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>>>                       ifTrue: [ stream << $] ].
>>>>               stream << line; cr ].
>>>>       stream nextPut: $]; cr ].
>>>>
>>>> reader := [ :stream | | line |
>>>>       (stream peekFor: $[) ifTrue: [ stream nextLine ] ifFalse: [ self error ].
>>>>       String streamContents: [ :out |
>>>>               [ (line := stream nextLine) = ']' ] whileFalse: [
>>>>                       (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>>>                               ifTrue: [ out << line allButFirst ]
>>>>                               ifFalse: [ out << line ].
>>>>                       out cr ] ] ].
>>>>
>>>> encoded := String streamContents: [ :out | writer value: out value: (Collection>>#inject:into:) sourceCode ].
>>>>
>>>> (reader value: encoded readStream) = ((Collection>>#inject:into:) sourceCode, String cr).
>>>>
>>>> escapedEncoded := String streamContents: [ :out | writer value: out value: 'test
>>>> "Make sure there is no whitespace at line start to force an escape"
>>>> self isFoo ifTrue: [ #foo
>>>> ]
>>>> ifFalse: [ [ "block" #bar
>>>> ]]
>>>> ' ].
>>>>
>>>> (reader value: escapedEncoded readStream).
>>>>
>>>> What do you think ?
>>>> Did I miss any special cases ?
>>>> Is the price (very occasional insertion of an extra #] in the source code) worth the advantage (a simple, safe scheme).
>>>>
>>>> Sven
>>>>
>>>> PS: Note that several tools line syntax highlighting and double-click next to brackets have problems with weird delimiters such as in the above code ;-)
>>>>
>>>>
>>>
>>
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: Encoding method source code in Tonel

Stephane Ducasse-3
Tx esteban I agree :)


On Fri, Nov 10, 2017 at 7:30 PM, Esteban Lorenzano <[hidden email]> wrote:

> none of them, I’m sorry :)
> I think we can do a lot better with just a bit of effort.
>
> Esteban
>
>> On 10 Nov 2017, at 15:26, Stephane Ducasse <[hidden email]> wrote:
>>
>> I would prefer the last one.
>> Regularity over exception.
>> Now the parser should just be good :)
>>
>> Stef
>>
>> On Fri, Nov 10, 2017 at 7:20 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>>
>>>
>>>> On 10 Nov 2017, at 19:12, Stephane Ducasse <[hidden email]> wrote:
>>>>
>>>> Hi sven
>>>>
>>>> Do you have an example because I do not understand your doubling.
>>>
>>> Sure,
>>>
>>> Bla { bla:bla } [
>>> self isFoo ifTrue: [ #foo
>>> ]
>>> ifFalse: [ [ "block" #bar
>>> ]]
>>> ]
>>>
>>> would have to become
>>>
>>> Bla { bla:bla } [
>>> self isFoo ifTrue: [ #foo
>>> ]]
>>> ifFalse: [ [ "block" #bar
>>> ]]]
>>> ]
>>>
>>> but this very rare (because usually code is intended).
>>>
>>> Nicolas' latest idea is cool too (use as much ['s as needed to avoid conflict). So the example would then become
>>>
>>> Bla { bla:bla } [[[
>>> self isFoo ifTrue: [ #foo
>>> ]
>>> ifFalse: [ [ "block" #bar
>>> ]]
>>> ]]]
>>>
>>>> stef
>>>>
>>>>
>>>> On Fri, Nov 10, 2017 at 11:40 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>>>>> Hi,
>>>>>
>>>>> Right now, TonelParser>>#methodBody looks a bit dangerous (too complex), and Nicolas already found some breaking cases as well a one possible solution. I was thinking a bit along the same lines, so here is a proposal.
>>>>>
>>>>> Overall I think we should keep the current readability of Tonel as much as possible (it is very important for diffs), while still being correct - i.e. no method source, even convoluted ones, possibly in different languages should break the encoding.
>>>>>
>>>>> The problem is how to quickly scan a stream for the closing $] (which occurs as a single character on a line by its own). Especially in Pharo source code, that situation (a conflict between the proper marker and actual source) is very rare, since code is usually intended. But is can occur and must be dealt with.
>>>>>
>>>>> The proposal is to just double the $] but only on lines that could conflict (which are lines consisting of only $]'s). The code itself becomes quite simple and remains fast enough.
>>>>>
>>>>> Here is some code to play with (EOL fixed to #cr for demo purposes):
>>>>>
>>>>> writer := [ :stream :methodSourceString |
>>>>>       stream nextPut: $[; cr.
>>>>>       methodSourceString linesDo: [ :line |
>>>>>               (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>>>>                       ifTrue: [ stream << $] ].
>>>>>               stream << line; cr ].
>>>>>       stream nextPut: $]; cr ].
>>>>>
>>>>> reader := [ :stream | | line |
>>>>>       (stream peekFor: $[) ifTrue: [ stream nextLine ] ifFalse: [ self error ].
>>>>>       String streamContents: [ :out |
>>>>>               [ (line := stream nextLine) = ']' ] whileFalse: [
>>>>>                       (line notEmpty and: [ line allSatisfy: [ :ch | ch = $] ] ])
>>>>>                               ifTrue: [ out << line allButFirst ]
>>>>>                               ifFalse: [ out << line ].
>>>>>                       out cr ] ] ].
>>>>>
>>>>> encoded := String streamContents: [ :out | writer value: out value: (Collection>>#inject:into:) sourceCode ].
>>>>>
>>>>> (reader value: encoded readStream) = ((Collection>>#inject:into:) sourceCode, String cr).
>>>>>
>>>>> escapedEncoded := String streamContents: [ :out | writer value: out value: 'test
>>>>> "Make sure there is no whitespace at line start to force an escape"
>>>>> self isFoo ifTrue: [ #foo
>>>>> ]
>>>>> ifFalse: [ [ "block" #bar
>>>>> ]]
>>>>> ' ].
>>>>>
>>>>> (reader value: escapedEncoded readStream).
>>>>>
>>>>> What do you think ?
>>>>> Did I miss any special cases ?
>>>>> Is the price (very occasional insertion of an extra #] in the source code) worth the advantage (a simple, safe scheme).
>>>>>
>>>>> Sven
>>>>>
>>>>> PS: Note that several tools line syntax highlighting and double-click next to brackets have problems with weird delimiters such as in the above code ;-)
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>