AST tokens question

Mark Rizun
Hi all,

I'm trying to understand how tokens are used in the AST.
So far I can't see any consistent pattern in how they are used.
For instance, why doesn't RBValueNode have a token? Is that how it's supposed to be?

Cheers,
Mark

Re: AST tokens question

Nicolai Hess
2014-10-27 19:36 GMT+01:00 Mark Rizun <[hidden email]>:
> For instance, why doesn't RBValueNode have a token? Is that how it's supposed to be?

RBValueNode is an abstract class.


Re: AST tokens question

Mark Rizun
Thanks. I see that; however, RBBlockNode and RBArrayNode don't have tokens either.
These classes only have methods in the accessing-token protocol.
I think it would be better to have a token object for those classes, because it makes more sense to hold such information in a token object.

Mark




Re: AST tokens question

Marcus Denker-4

> On 28 Oct 2014, at 11:23, Mark Rizun <[hidden email]> wrote:
>
> Thanks. I see that, however RBBlockNode or RBArrayNode doesn't have tokens.
> These classes have only methods in accessing-token protocol.
> I think it would be better if we have token object for those classes, because it makes more sense to hold such information in token object.
>

No, actually we should get rid of tokens. They are only used for parsing: the scanner produces tokens, and there is no token for a block, because a block consists of many tokens. So conceptually, a block cannot have a token.

Tokens expose a very low-level implementation artefact of the parser to the AST model, which is not good.

There is already a case for this:
https://pharo.fogbugz.com/f/cases/11992/Remove-tokens-from-the-AST-Core-Node-classes

We should have a look at it and integrate it. This should simplify lots of things.

        Marcus
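
The point about blocks can be illustrated with a small sketch. It is written in Python as a language-neutral model, not as the RB scanner or any Pharo API; `scan`, `Token` and `BlockNode` are hypothetical names chosen for illustration. A scanner yields one token per lexical unit, so a block spans several tokens, and a node for it only needs the positions of its first and last characters.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    start: int  # 1-based, inclusive (Smalltalk-style positions)
    stop: int


def scan(source):
    """Toy scanner: names (optionally prefixed with ':') and single punctuation marks."""
    pattern = re.compile(r":?[A-Za-z_]\w*|\S")
    return [Token(m.group(), m.start() + 1, m.end()) for m in pattern.finditer(source)]


@dataclass(frozen=True)
class BlockNode:
    left: int   # position of '['
    right: int  # position of ']'


source = "[ :x | x foo ]"
tokens = scan(source)

# The block is built from many tokens but keeps only two positions.
block = BlockNode(left=tokens[0].start, right=tokens[-1].stop)
```

Here the single block corresponds to six tokens, which is why "the token of a block" has no meaningful definition, while a pair of positions does.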




Re: AST tokens question

Thierry Goubier
In reply to this post by Mark Rizun
On 28/10/2014 11:23, Mark Rizun wrote:
> Thanks. I see that, however RBBlockNode or RBArrayNode doesn't have tokens.
> These classes have only methods in accessing-token protocol.
> I think it would be better if we have token object for those classes,
> because it makes more sense to hold such information in token object.

Well, not really.

Technically, tokens are used to drive a parser from a scanner.

If an AST node knows how to relate itself to its original source code
chunk and is able to print itself correctly, then tokens are redundant.

In short, if you work with parsers, you'd better know what tokens are.
If you're only working with the AST, tokens are redundant and noise
(i.e. they often have a type (or more than one) which is only understood
by the parser).

An example of how it is done: in RBPragmaNode, the accessing-tokens
protocol gives access to left and right, which are positions, not tokens.

Thierry


Re: AST tokens question

Mark Rizun
> If an AST node knows how to relate itself to its original source code chunk and is able to print itself correctly, then tokens are redundant.

I'm working with the AST's sourceInterval, trying to recalculate it after calling replaceWith:.
You see, my problem was that AST nodes don't all hold their start and stop positions in the same place. I thought the token was such a place; however, I eventually understood that RBValueNodes don't have tokens :)

> Example of how it is done:
>
> RBPragmaNode
>         accessing-tokens gives access to left and right, which are positions, not tokens.

Yes, I know.
 


Re: AST tokens question

Thierry Goubier
On 28/10/2014 12:12, Mark Rizun wrote:

> I'm working with ASTs sourceInterval. Trying to calculate it after
> method replaceWith:.
> You see, my problem was that each node of AST doesn't hold its start and
> stop position in same place. So I thought that token is such a place,
> however, eventually I understood that RBValueNodes don't have tokens:)

Do you mean you're trying to do a replace and update the positions of
all the nodes ?

Thierry


Re: AST tokens question

Mark Rizun

2014-10-28 13:32 GMT+02:00 Thierry Goubier <[hidden email]>:
> Do you mean you're trying to do a replace and update the positions of all the nodes ?

Yes, because they are wrong. Here is an issue:
https://pharo.fogbugz.com/f/cases/14254/AST-method-replaceWith-does-not-change-source-interval



Re: AST tokens question

Thierry Goubier
On 28/10/2014 12:45, Mark Rizun wrote:
> Yes, because they are wrong. Here is an issue:
> https://pharo.fogbugz.com/f/cases/14254/AST-method-replaceWith-does-not-change-source-interval

I would say that they are correct.

When I write source-to-source compilers, I accept that anything I
change in the AST (via a replaceWith: equivalent) has no valid source
interval, since it does not exist in the original source. However, all
unmodified nodes should keep their unmodified source interval, since
I may need it to fetch the relevant text from the source.

If I want my modifications to the AST to have valid source intervals,
then I need to regenerate the source from the modified AST. Only
then are they valid.

You may want to update the source intervals when you do a replaceWith:,
but the only thing we would gain from that is that, after a
replaceWith:, no source interval can be trusted, since it may end up past
the end of the original source string.

Thierry
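
A minimal sketch of this convention, in Python as a language-neutral model rather than the RB classes (`Node` and `text_of` are hypothetical names): unmodified nodes keep intervals that index the original source, and a node introduced by a replacement carries no interval at all.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    # 1-based inclusive positions in the original source; None if the node
    # was never part of that source.
    interval: Optional[Tuple[int, int]]


def text_of(node, source):
    """Fetch the node's text from the original source, or None for injected nodes."""
    if node.interval is None:
        return None
    start, stop = node.interval
    return source[start - 1:stop]


original = "obj1 foo + obj2 bar"
receiver = Node("obj1", (1, 4))
argument = Node("obj2 bar", (12, 19))

# The replacement never existed in `original`, so it has no interval.
replacement = Node("myObject", None)
```

With this model, `text_of(argument, original)` still returns the right chunk after the replacement, precisely because its interval was left alone.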


Re: AST tokens question

Mark Rizun
First, why I'm doing this: I work on the Rewrite Tool, and its main functionality is based on replacing nodes in the AST. It also works with the sourceIntervals of nodes. Until now my solution was:
if I replace a node, I reparse the tree to get the intervals updated.
However, this solution brought new problems.
The second reason is that I think it makes sense to update the intervals of the whole AST when you replace one node.
For example, we have:
obj1 foo + obj2 bar
and we replace obj1 with myObject.
The interval of the AST was 1 to: 19, and now it should be 1 to: 23.

Mark
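
The arithmetic behind this example can be sketched as follows. This is Python used as a language-neutral model, not the behaviour of replaceWith:; `shift_after_replace` is a hypothetical helper, and it assumes intervals come from a tree, so each one is either disjoint from the replaced interval or encloses it.

```python
def shift_after_replace(intervals, replaced, new_length):
    """Shift 1-based inclusive intervals after `replaced` is swapped for new text.

    `intervals` maps a label to (start, stop); `replaced` is the old (start, stop)
    of the node being replaced; `new_length` is the length of the new text.
    """
    old_start, old_stop = replaced
    delta = new_length - (old_stop - old_start + 1)
    shifted = {}
    for label, (start, stop) in intervals.items():
        if start > old_stop:
            # Entirely after the replaced text: move both ends.
            shifted[label] = (start + delta, stop + delta)
        elif start <= old_start and stop >= old_stop:
            # Encloses (or is) the replaced text: only the end moves.
            shifted[label] = (start, stop + delta)
        else:
            # Entirely before the replaced text: unchanged.
            shifted[label] = (start, stop)
    return shifted


intervals = {
    "whole expression": (1, 19),
    "obj1": (1, 4),
    "obj1 foo": (1, 8),
    "obj2 bar": (12, 19),
}
updated = shift_after_replace(intervals, replaced=(1, 4), new_length=len("myObject"))
```

The whole expression goes from 1..19 to 1..23, as stated above. Note that the shifted numbers describe the regenerated text, not the original source string.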


Re: AST tokens question

Thierry Goubier
On 28/10/2014 14:33, Mark Rizun wrote:
> In the first place why I'm doing this. I work on Rewrite Tool and it's
> main functionality bases on replacing nodes in AST. Plus it works with
> sourceIntervals of nodes. Untill now my solution was:
> if I replace node, I reparse tree to get intervals updated

I print the modified tree and parse :)

> However this solution brought new problems.

Which ones?

> Second reason is, that I think it makes sense to update interval of all
> AST if you replace one node.
> For example, we have:
> obj1 foo + obj2 bar
> and we replace obj1 with myObject.
> The interval of ast was 1 to: 19, and now it should be 1 to: 23.

No, it shouldn't. If the source has not been regenerated from the
modified AST, then

'source copyFrom: theBarASTNode start to: theBarASTNode stop'

would run past the end of it (20 to 23 with source ending at 19).

If you regenerate the source, then you can parse it and you'll have
correct intervals.

Thierry
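
The failure mode can be reproduced in a small sketch. It is Python used as a language-neutral model; `copy_from_to` is a hypothetical bounds-checked helper (plain Python slices would silently truncate instead), and the interval 21 to 23 is where the selector bar sits in the regenerated text under the assumption that the text is printed with the same spacing.

```python
def copy_from_to(source, start, stop):
    """Bounds-checked copy using 1-based inclusive positions."""
    if start < 1 or stop > len(source):
        raise IndexError(f"{start} to {stop} is outside 1 to {len(source)}")
    return source[start - 1:stop]


old_source = "obj1 foo + obj2 bar"      # 19 characters
new_source = "myObject foo + obj2 bar"  # 23 characters, regenerated from the modified AST
bar_interval = (21, 23)                 # position of `bar` after the replacement

try:
    stale = copy_from_to(old_source, *bar_interval)
except IndexError:
    stale = None  # the shifted interval does not fit the old source

fresh = copy_from_to(new_source, *bar_interval)
```

Only the lookup against the regenerated source succeeds, which is the argument for regenerating and reparsing rather than shifting intervals in place.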


Re: AST tokens question

Mark Rizun

> Which ones?

In my tool each node has a property oldNodes, which holds a collection of (obviously) AST nodes :)
When I replace a node, I have to update the source intervals in some way.
1) If I update them by reparsing, I lose all the oldNodes data for each node of my AST.
So I have to save the old AST with all its oldNodes, somehow detect which nodes were not changed, and reassign their lost oldNodes.
But sometimes it's difficult to detect what to assign where, as the AST may change dramatically.
2) But if the source intervals are updated automatically, I don't have to worry about losing data for the whole AST.
I just have to update oldNodes for the node that was replaced.

That is why I'd like to have automatically updated source intervals.

> If you regenerate the source, then you can parse it and you'll have correct intervals.

Marcus, do I have to revert everything, or can you somehow remove that slice from the newest version?

Mark

 

Re: AST tokens question

Thierry Goubier


2014-10-29 8:32 GMT+01:00 Mark Rizun <[hidden email]>:

> 1) If I update it with reparsing, I lose all data about oldNodes for each node of my AST.
> So I have to save old AST with all oldNodes, and somehow detect which nodes were not changed and reassign their lost oldNodes.
> But sometimes it's difficult to detect where and what you have to assign, as sometimes AST may be changed in dramatic way.

Ok; yes, I can relate to that.

But, knowing who has designed the RB ast, I'm sure it has a proper equality property where:

oldNodeFromAST = sameNodeFromASTreparsed

holds true.

So you can reparse and rematch old node to new node.

But I would only regenerate and reparse when I need to update the source intervals (or display the modified source).
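
A rough sketch of the reparse-and-rematch idea, in Python as a language-neutral model rather than the RB node classes. `Node`, `walk` and `carry_over` are hypothetical names, and `old_nodes` stands in for the tool's oldNodes property. Equality here is structural and deliberately ignores positions and tool data, which is the property the approach relies on.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    kind: str
    value: str
    children: List["Node"] = field(default_factory=list)
    # Bookkeeping excluded from equality: positions and tool-specific data.
    interval: Optional[Tuple[int, int]] = field(default=None, compare=False)
    old_nodes: list = field(default_factory=list, compare=False)


def walk(node):
    """Yield the node and all its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def carry_over(old_tree, new_tree):
    """Copy old_nodes onto each structurally equal node of the reparsed tree."""
    old = list(walk(old_tree))
    for new in walk(new_tree):
        match = next((o for o in old if o == new), None)
        if match is not None:
            new.old_nodes = match.old_nodes


# Modified tree (stale or missing intervals) that carries the tool's data.
old_tree = Node("message", "+", [
    Node("message", "foo", [Node("variable", "myObject", old_nodes=["obj1"])]),
    Node("message", "bar", [Node("variable", "obj2")]),
])

# Tree obtained by printing and reparsing: fresh intervals, no tool data yet.
new_tree = Node("message", "+", [
    Node("message", "foo", [Node("variable", "myObject", interval=(1, 8))], interval=(1, 12)),
    Node("message", "bar", [Node("variable", "obj2", interval=(16, 19))], interval=(16, 23)),
], interval=(1, 23))

carry_over(old_tree, new_tree)
```

One caveat of this simple version: if the tree contains several structurally equal nodes, the first match wins, so a real tool would also need to compare the position in the tree.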
 
> 2) But if source interval is updated automatically I don't bother with losing data for all AST.
> I just have to update oldNodes for node that was replaced.

Well, not really.

If, while regenerating your source, you change the way the code is formatted (more tabs here, removing a return there, etc.), then the regenerated source intervals are different from the modified AST source intervals. So your 'modify source intervals' approach looks very fragile to me.

The only ways out of that problem that I can see:

Inserted or replaced nodes have source intervals in a separate source; unmodified nodes keep their source interval. This is the way I do it in my source-to-source compilers when I want to inject code (and tag in the generated file where that code comes from: useful for debugging).

Or the match of old node to new node as seen above.
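
The "separate source" option can be sketched like this. It is Python used as a language-neutral model; `Span`, `sources` and `text_of` are hypothetical names, not part of any existing compiler or of the RB classes. Each interval is tagged with the text it refers to, so injected nodes and untouched nodes can both be resolved without rewriting anyone's positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    source_id: str  # which text the positions refer to
    start: int      # 1-based inclusive
    stop: int


sources = {
    "original": "obj1 foo + obj2 bar",
    "injected": "myObject",
}


def text_of(span):
    """Resolve a span against the source it was tagged with."""
    return sources[span.source_id][span.start - 1:span.stop]


receiver = Span("injected", 1, 8)        # replacement node: positions in its own source
first_selector = Span("original", 6, 8)  # untouched node: original positions still valid
argument = Span("original", 12, 19)
```

The tag also records where a piece of generated code came from, which matches the debugging use mentioned above.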
 

> That is why I'd like to have automatically updated source interval.

I'm still not entirely sure why. Source intervals are only there to help relating the ast to the source, not much else, really.

Thierry
 
 


Re: AST tokens question

Mark Rizun
> I'm still not entirely sure why. Source intervals are only there to help relating the ast to the source, not much else, really.

I use source intervals to detect which node is selected, so that the right-click menu shows the user only the options that are relevant to the selected node, as is also done in SmartSuggestions.
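
A minimal sketch of that lookup, in Python as a language-neutral model (`smallest_enclosing` is a hypothetical helper, not the SmartSuggestions implementation): given the selection and the nodes' intervals, pick the smallest node whose interval encloses the selection.

```python
def smallest_enclosing(selection, nodes):
    """Return the label of the smallest node whose interval encloses the selection.

    `nodes` maps a label to a 1-based inclusive (start, stop).
    Returns None when no node encloses the selection.
    """
    sel_start, sel_stop = selection
    enclosing = [
        (stop - start, label)
        for label, (start, stop) in nodes.items()
        if start <= sel_start and sel_stop <= stop
    ]
    return min(enclosing)[1] if enclosing else None


nodes = {
    "obj1 foo + obj2 bar": (1, 19),
    "obj1 foo": (1, 8),
    "obj1": (1, 4),
    "obj2 bar": (12, 19),
    "obj2": (12, 15),
}
```

This only gives sensible answers when the intervals match the text the user is actually looking at, which is why stale intervals after a replace are a problem for this use case.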

Re: AST tokens question

Mark Rizun
P.S. I have a solution, but I don't know if it's appropriate: I remove the updating of source intervals from the replaceWith: method, and my tool does all the interval calculations on its own.


Re: AST tokens question

Thierry Goubier
In reply to this post by Mark Rizun


2014-10-29 9:59 GMT+01:00 Mark Rizun <[hidden email]>:

> I use source intervals to detect which node is selected and than in the right-click menu user can see only options that are relevant to selected node, as it is also made in SmartSuggestions.

I know that use case ;)

OK, then this means you are either (1) regenerating the code, or (2) replacing a node and inserting the new source at the right place. Which one is it?

Thinking a bit about it, I'd try this: reparse, get the node from the selection index, then find the equal old node in the old (modified) AST, or replace the old (modified) AST with the new one.

Thierry

Re: AST tokens question

Mark Rizun
The second one; I replace nodes all the time.

> Thinking a bit about it, I'd try reparse, get node from selection index, find equal old node in old (modified) ast, or replace old (modified) ast with new one.

Can you explain this? Sorry, I didn't get the point.



Re: AST tokens question

Thierry Goubier
In reply to this post by Mark Rizun


2014-10-29 10:09 GMT+01:00 Mark Rizun <[hidden email]>:
> P.S. I have a solution, but don't know if it's appropriate: I remove updating of source interval from replaceWith: method, but my tool will do all the calculations of interval on it's own.

This is a possibility: have a transform. When you get selection intervals, check whether they are inside or outside the "replaced areas", and increase or decrease the indexes to compensate for added or removed code. But this is more complex than it looks.

But I would try the equality over an AST first. A lot more robust for me.

Thierry
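
The transform described above could look roughly like this. It is a Python sketch used as a language-neutral model; `to_original` and the shape of `edits` are assumptions made for illustration. It maps a position in the displayed (modified) text back to the original text and reports positions that fall inside a replaced area.

```python
def to_original(position, edits):
    """Map a 1-based position in the displayed text back to the original text.

    `edits` is a list of (start, old_length, new_length) in original coordinates,
    sorted by start. Returns None when the position falls inside replaced text.
    """
    delta = 0  # displayed position minus original position, accumulated so far
    for start, old_length, new_length in edits:
        shown_start = start + delta
        if position < shown_start:
            break
        if position < shown_start + new_length:
            return None  # inside a replaced area: no original counterpart
        delta += new_length - old_length
    return position - delta


# obj1 (4 characters at position 1) was replaced by myObject (8 characters).
edits = [(1, 4, 8)]
```

Even this small version has to track the accumulated offset across edits, which hints at why the approach gets complicated once replacements nest or overlap.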
 


Re: AST tokens question

Thierry Goubier
In reply to this post by Mark Rizun


2014-10-29 10:22 GMT+01:00 Mark Rizun <[hidden email]>:
> Second one, I do replace node all the time.

You insert new code inside the text view?

For me, if you replace nodes and display the result, then you are replacing nodes "slowly". Anything that has some display to a user in the loop is "slow".

> Can you explain this, sorry I didn't get the point

Use either =, equalTo:withMapping:, or match:inContext: to find the new node in the new AST that is equal to your old node.

Thierry

Re: AST tokens question

Mark Rizun


2014-10-29 11:40 GMT+02:00 Thierry Goubier <[hidden email]>:
> You insert new code inside the text view?

Yes.

> For me, if you replace nodes and display that, then you are "slowly" replacing nodes. Anything which has some display to a user in the loop is "slow".

I'm not doing this in a loop. I have an AST and a text view of it. Then I do one replace and update the text view.
When I wrote "all the time", I meant that replacing nodes is very important in my tool, as it is its main functionality.

> Use either =, equalTo:withMapping:, match:inContext: to find the relevant new node equal to your old node in the new ast.

Good, thanks for the advice. First I will try your suggestion with equality. If that doesn't work for me, I'll try my idea of calculating the intervals inside the tool.

Thanks again,
Mark
