why RBComment is not called RBCommentNode

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

why RBComment is not called RBCommentNode

stepharo
Hi marcus et al

why RBComment is not called RBCommentNode and why it is not a subclass
of RBNode?

Stef


Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

Nicolai Hess-3-2


2016-11-03 13:41 GMT+01:00 stepharo <[hidden email]>:
Hi marcus et al

why RBComment is not called RBCommentNode and why it is not a subclass of RBNode?

Stef



RBComment wasn't part of the original RefactoringBrowser, it only had a "comments" instance variable on its nodes.
This was refactored out to its own RBComment class (I think). But no "real" AST-Node class.

+1 for converting this into a real AST node.

Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

John Brant-2
Why convert it to a node? It has no affect on the running code.


John Brant

> On Nov 3, 2016, at 8:23 AM, Nicolai Hess <[hidden email]> wrote:
>
>
>
> 2016-11-03 13:41 GMT+01:00 stepharo <[hidden email]>:
> Hi marcus et al
>
> why RBComment is not called RBCommentNode and why it is not a subclass of RBNode?
>
> Stef
>
>
>
> RBComment wasn't part of the original RefactoringBrowser, it only had a "comments" instance variable on its nodes.
> This was refactored out to its own RBComment class (I think). But no "real" AST-Node class.
>
> +1 for converting this into a real AST node.
>


Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

stepharo
Because like that

     we have a correct and nice hierarchy and

     all the nodes inherits from it and

     we can have a nice visitor and make assumption about the root.

Regularity is nice and it helps controlling complexity.

Why a comment could not have a parent?

Stef


Le 3/11/16 à 14:53, John Brant a écrit :

> Why convert it to a node? It has no affect on the running code.
>
>
> John Brant
>
>> On Nov 3, 2016, at 8:23 AM, Nicolai Hess <[hidden email]> wrote:
>>
>>
>>
>> 2016-11-03 13:41 GMT+01:00 stepharo <[hidden email]>:
>> Hi marcus et al
>>
>> why RBComment is not called RBCommentNode and why it is not a subclass of RBNode?
>>
>> Stef
>>
>>
>>
>> RBComment wasn't part of the original RefactoringBrowser, it only had a "comments" instance variable on its nodes.
>> This was refactored out to its own RBComment class (I think). But no "real" AST-Node class.
>>
>> +1 for converting this into a real AST node.
>>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

John Brant-2
> On Nov 3, 2016, at 9:00 AM, stepharo <[hidden email]> wrote:
>
> Because like that
>
>    we have a correct and nice hierarchy and
>
>    all the nodes inherits from it and
>
>    we can have a nice visitor and make assumption about the root.
>
> Regularity is nice and it helps controlling complexity.
>
> Why a comment could not have a parent?

A comment isn’t code. The nodes are for code. A comment is more like a token — RBComment basically has the same two instance variables as RBToken. Should tokens also be nodes? For example, should a left paren token be represented as a node?

If a comment is a node, how is it visited? For example, if you have (“comment1” self “comment2” foo: “comment3” a bar: “comment4” b “comment5”), what is the traversal order for the nodes? Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools?


John Brant
Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

Nicolai Hess-3-2

Am 03.11.2016 17:15 schrieb "John Brant" <[hidden email]>:
>
> > On Nov 3, 2016, at 9:00 AM, stepharo <[hidden email]> wrote:
> >
> > Because like that
> >
> >    we have a correct and nice hierarchy and
> >
> >    all the nodes inherits from it and
> >
> >    we can have a nice visitor and make assumption about the root.
> >
> > Regularity is nice and it helps controlling complexity.
> >
> > Why a comment could not have a parent?
>
> A comment isn’t code. The nodes are for code. A comment is more like a token — RBComment basically has the same two instance variables as RBToken. Should tokens also be nodes? For example, should a left paren token be represented as a node?
>
> If a comment is a node, how is it visited? For example, if you have (“comment1” self “comment2” foo: “comment3” a bar: “comment4” b “comment5”), what is the traversal order for the nodes? Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools?
>
>
> John Brant

OK,  I think I am conceived.
Otherwise, it may be easier for the AST-based  formatter.

Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

stepharo
In reply to this post by John Brant-2

> A comment isn’t code. The nodes are for code. A comment is more like a token — RBComment basically has the same two instance variables as RBToken. Should tokens also be nodes? For example, should a left paren token be represented as a node?
>
> If a comment is a node, how is it visited?
> For example, if you have (“comment1” self “comment2” foo: “comment3” a bar: “comment4” b “comment5”), what is the traversal order for the nodes?
To reproduce the code as it was typed why the representation a message
is not a sequence. because you need to encode that you
have comment1 and self and comment2
We do not have a structure that is responsible for that order.
We could have two entries in AST node;
     one discrete where you can cherrypick what you want receiver
arguments selector
     and one that represent the node placement.

Now when we emit bytecode comment would do nothing.

How having them as token helps? You will visit the receiver and check if
it has a token and place it.
I found that strange that comments are not part of the code tree as
first class citizen.

> Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools
With comments cannot act as nullobjects for such operation?

Stef

Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

stepharo
https://pharo.fogbugz.com/f/cases/19281/We-should-improve-the-class-comment-of-RBComment

If we would have better comments.... a kind of a dream.

Stef

Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

John Brant-2
In reply to this post by stepharo

> On Nov 3, 2016, at 11:46 AM, stepharo <[hidden email]> wrote:
>
>
>> A comment isn’t code. The nodes are for code. A comment is more like a token — RBComment basically has the same two instance variables as RBToken. Should tokens also be nodes? For example, should a left paren token be represented as a node?
>>
>> If a comment is a node, how is it visited?
>> For example, if you have (“comment1” self “comment2” foo: “comment3” a bar: “comment4” b “comment5”), what is the traversal order for the nodes?
> To reproduce the code as it was typed why the representation a message is not a sequence. because you need to encode that you
> have comment1 and self and comment2

But how does this fit with a tree structure. For example, should “comment5” be associated with the “b” variable node, or the #foo:bar: message, or the sequence node that contains the #foo:bar: message, or the block or method node that contains the sequence node. You can choose, but only the writer of the comment can tell if you are right.

> We do not have a structure that is responsible for that order.
> We could have two entries in AST node;
>    one discrete where you can cherrypick what you want receiver arguments selector
>    and one that represent the node placement.
>
> Now when we emit bytecode comment would do nothing.
>
> How having them as token helps? You will visit the receiver and check if it has a token and place it.
> I found that strange that comments are not part of the code tree as first class citizen.

Tokens are essentially what they are now — although they aren’t in the RBToken hierarchy. So far, the only advantages that I’ve heard you mention for nodes are visiting nodes and having a parent. The example above shows why neither of these are valid — you can’t determine what is the correct parent for the comment.

Anyway, I think comments should not be in code, but rather on code. When you add a comment, you would select a node or nodes that it applied to. When viewing a method, the comment could be displayed inline (like we have now), off to the side like reviewer comments in word processors, or a highlight that when you move the mouse over would show in a popup. Furthermore, there could be different comment types. For example, one could have design comments, todo/review comments, example code comments, bug fix comments, etc. The code view could be customized to show each one differently. Comments could have pictures and other non-string items.

Having comments as objects outside the method’s source string would allow them to be easily searched. For example, someone could search for a certain bug fix. Currently, if a bug fix id is in a comment, it requires a full text search of all method text. This is slow. Therefore, many companies start adding these bug fix ids to the code as symbols so that a senders search will work. I’ve seen companies that put symbols like #Fix12345 as statements inside of methods. However, this doesn’t work for Smalltalk’s that remove unused literals, so in those cases they do something like adding a statement like #Fix12345 == #Fix12345 which generally isn’t optimized away.

In addition to comments, I believe that annotations (not primitives) should be on code instead of in code like they are now. With a good implementation one could unify the concepts of comments and annotations.

>> Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools
> With comments cannot act as nullobjects for such operation?

Yes, someone could spend time making all of this work with the existing code. However, why not fix the real problem in that comments and methods should not be a single string, but rather objects.


John Brant
Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

stepharo
> But how does this fit with a tree structure. For example, should “comment5” be associated with the “b” variable node, or the #foo:bar: message, or the sequence node that contains the #foo:bar: message, or the block or method node that contains the sequence node. You can choose, but only the writer of the comment can tell if you are right.
How tokens solves this problem?
I think that we do not really care one objective is to be able to
recreate the code from the tree. I worked on a project in C
where the guys want to know if the code is well commented and we built
heuristics to associate the comments to the closest node.

>
>> We do not have a structure that is responsible for that order.
>> We could have two entries in AST node;
>>     one discrete where you can cherrypick what you want receiver arguments selector
>>     and one that represent the node placement.
>>
>> Now when we emit bytecode comment would do nothing.
>>
>> How having them as token helps? You will visit the receiver and check if it has a token and place it.
>> I found that strange that comments are not part of the code tree as first class citizen.
> Tokens are essentially what they are now — although they aren’t in the RBToken hierarchy. So far, the only advantages that I’ve heard you mention for nodes are visiting nodes and having a parent. The example above shows why neither of these are valid — you can’t determine what is the correct parent for the comment.
>
> Anyway, I think comments should not be in code, but rather on code. When you add a comment, you would select a node or nodes that it applied to. When viewing a method, the comment could be displayed inline (like we have now), off to the side like reviewer comments in word processors, or a highlight that when you move the mouse over would show in a popup.
Yes you are right.

> Furthermore, there could be different comment types. For example, one could have design comments, todo/review comments, example code comments, bug fix comments, etc. The code view could be customized to show each one differently. Comments could have pictures and other non-string items.
It would be really nice to have that.
I would put optional type declaration (and they should be optional).
Now for the main comment of a method I would not separate it because
this is too important.
> Having comments as objects outside the method’s source string would allow them to be easily searched. For example, someone could search for a certain bug fix. Currently, if a bug fix id is in a comment, it requires a full text search of all method text. This is slow. Therefore, many companies start adding these bug fix ids to the code as symbols so that a senders search will work. I’ve seen companies that put symbols like #Fix12345 as statements inside of methods. However, this doesn’t work for Smalltalk’s that remove unused literals, so in those cases they do something like adding a statement like #Fix12345 == #Fix12345 which generally isn’t optimized away.
:)
> In addition to comments, I believe that annotations (not primitives) should be on code instead of in code like they are now. With a good implementation one could unify the concepts of comments and annotations.
Yes this is interesting.
Now the problem with annotations is that you have two kinds;
     - the ones that decorate
     - the ones that change or define the semantics (for those ones I
would not separate) them
>>> Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools
>> With comments cannot act as nullobjects for such operation?
> Yes, someone could spend time making all of this work with the existing code. However, why not fix the real problem in that comments and methods should not be a single string, but rather objects.

     Marcus did that during his phd and he blew up memory. So after we
got a guy that worked on tree compression but he never finished.
     Because this is the part that makes all these ideas flying or not.

Anyway thanks for the discussion. It lets me think and I learn
something. I improved the class comment.
Because if it would have been good in the first place I would not even
ask the question.
>
>
> John Brant
>


Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

John Brant-2

> On Nov 3, 2016, at 4:03 PM, stepharo <[hidden email]> wrote:
>
>> But how does this fit with a tree structure. For example, should “comment5” be associated with the “b” variable node, or the #foo:bar: message, or the sequence node that contains the #foo:bar: message, or the block or method node that contains the sequence node. You can choose, but only the writer of the comment can tell if you are right.
> How tokens solves this problem?
> I think that we do not really care one objective is to be able to recreate the code from the tree. I worked on a project in C
> where the guys want to know if the code is well commented and we built heuristics to associate the comments to the closest node.

The nodes have comments and the comments have their source and position so you can recreate the code from the tree already. It may not be as easy as you want, but all the information is there.


>>>> Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools
>>> With comments cannot act as nullobjects for such operation?
>> Yes, someone could spend time making all of this work with the existing code. However, why not fix the real problem in that comments and methods should not be a single string, but rather objects.
>
>    Marcus did that during his phd and he blew up memory. So after we got a guy that worked on tree compression but he never finished.
>    Because this is the part that makes all these ideas flying or not.

From what I remember, ASTs are roughly 10x the size of the code string. I’m not suggesting holding on to them in the image when they can be loaded/built from a file like the method sources currently are. Instead, we can associate a comment/annotation to a particular node in the AST without having to keep the AST around. For example, we could say that some comment is associated to the 4th node visited from the standard AST traversal. We don’t need to keep the AST around since we can recreate it and the 4th node visited will be the same.


John Brant
Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

Nicolas Cellier


2016-11-04 1:54 GMT+01:00 John Brant <[hidden email]>:

> On Nov 3, 2016, at 4:03 PM, stepharo <[hidden email]> wrote:
>
>> But how does this fit with a tree structure. For example, should “comment5” be associated with the “b” variable node, or the #foo:bar: message, or the sequence node that contains the #foo:bar: message, or the block or method node that contains the sequence node. You can choose, but only the writer of the comment can tell if you are right.
> How tokens solves this problem?
> I think that we do not really care one objective is to be able to recreate the code from the tree. I worked on a project in C
> where the guys want to know if the code is well commented and we built heuristics to associate the comments to the closest node.

The nodes have comments and the comments have their source and position so you can recreate the code from the tree already. It may not be as easy as you want, but all the information is there.


>>>> Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools
>>> With comments cannot act as nullobjects for such operation?
>> Yes, someone could spend time making all of this work with the existing code. However, why not fix the real problem in that comments and methods should not be a single string, but rather objects.
>
>    Marcus did that during his phd and he blew up memory. So after we got a guy that worked on tree compression but he never finished.
>    Because this is the part that makes all these ideas flying or not.

From what I remember, ASTs are roughly 10x the size of the code string. I’m not suggesting holding on to them in the image when they can be loaded/built from a file like the method sources currently are. Instead, we can associate a comment/annotation to a particular node in the AST without having to keep the AST around. For example, we could say that some comment is associated to the 4th node visited from the standard AST traversal. We don’t need to keep the AST around since we can recreate it and the 4th node visited will be the same.


John Brant

But this vision is a bit static. Code is living. Shall these annotations be lost on next refactoring? Or are we going to diff the two AST and salvage as many annotations as we can? Or are we going to keep an history of these comments in the versioning system?
Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

NorbertHartl

Am 04.11.2016 um 07:53 schrieb Nicolas Cellier <[hidden email]>:



2016-11-04 1:54 GMT+01:00 John Brant <[hidden email]>:

> On Nov 3, 2016, at 4:03 PM, stepharo <[hidden email]> wrote:
>
>> But how does this fit with a tree structure. For example, should “comment5” be associated with the “b” variable node, or the #foo:bar: message, or the sequence node that contains the #foo:bar: message, or the block or method node that contains the sequence node. You can choose, but only the writer of the comment can tell if you are right.
> How tokens solves this problem?
> I think that we do not really care one objective is to be able to recreate the code from the tree. I worked on a project in C
> where the guys want to know if the code is well commented and we built heuristics to associate the comments to the closest node.

The nodes have comments and the comments have their source and position so you can recreate the code from the tree already. It may not be as easy as you want, but all the information is there.


>>>> Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools
>>> With comments cannot act as nullobjects for such operation?
>> Yes, someone could spend time making all of this work with the existing code. However, why not fix the real problem in that comments and methods should not be a single string, but rather objects.
>
>    Marcus did that during his phd and he blew up memory. So after we got a guy that worked on tree compression but he never finished.
>    Because this is the part that makes all these ideas flying or not.

From what I remember, ASTs are roughly 10x the size of the code string. I’m not suggesting holding on to them in the image when they can be loaded/built from a file like the method sources currently are. Instead, we can associate a comment/annotation to a particular node in the AST without having to keep the AST around. For example, we could say that some comment is associated to the 4th node visited from the standard AST traversal. We don’t need to keep the AST around since we can recreate it and the 4th node visited will be the same.


John Brant

But this vision is a bit static. Code is living. Shall these annotations be lost on next refactoring? Or are we going to diff the two AST and salvage as many annotations as we can? Or are we going to keep an history of these comments in the versioning system?

+1

Norbert

Reply | Threaded
Open this post in threaded view
|

Re: why RBComment is not called RBCommentNode

John Brant-2
On 11/04/2016 03:50 AM, Norbert Hartl wrote:

>
>> Am 04.11.2016 um 07:53 schrieb Nicolas Cellier
>> <[hidden email]
>> <mailto:[hidden email]>>:
>>
>>
>>
>> 2016-11-04 1:54 GMT+01:00 John Brant <[hidden email]
>> <mailto:[hidden email]>>:
>>
>>
>>     > On Nov 3, 2016, at 4:03 PM, stepharo <[hidden email] <mailto:[hidden email]>> wrote:
>>     >>>> Furthermore, if comments are nodes, how do they affect all of the code rewriting and validation that is part of the RB and its rewrite tools
>>     >>> With comments cannot act as nullobjects for such operation?
>>     >> Yes, someone could spend time making all of this work with the existing code. However, why not fix the real problem in that comments and methods should not be a single string, but rather objects.
>>     >
>>     >    Marcus did that during his phd and he blew up memory. So after we got a guy that worked on tree compression but he never finished.
>>     >    Because this is the part that makes all these ideas flying or not.
>>
>>     From what I remember, ASTs are roughly 10x the size of the code
>>     string. I’m not suggesting holding on to them in the image when
>>     they can be loaded/built from a file like the method sources
>>     currently are. Instead, we can associate a comment/annotation to a
>>     particular node in the AST without having to keep the AST around.
>>     For example, we could say that some comment is associated to the
>>     4th node visited from the standard AST traversal. We don’t need to
>>     keep the AST around since we can recreate it and the 4th node
>>     visited will be the same.
>>
>>
>>     John Brant
>>
>>
>> But this vision is a bit static. Code is living. Shall these
>> annotations be lost on next refactoring? Or are we going to diff the
>> two AST and salvage as many annotations as we can? Or are we going to
>> keep an history of these comments in the versioning system?
>
> +1

It's only static if you do a poor implementation. The refactorings
already operate on the AST level so we don't need any AST diffing for
them. We would need to update the rewriter to update this information
for nodes that get replaced. This updating would likely need the help of
the annotation objects since some annotations might not want to be
copied to the replaced node, and others (e.g., comments) would need to
be copied.

The editor will need to be changed to display these annotations/comments
so we can also change it to keep track of the rearrangement of the
nodes. That leaves custom tools that use the compile:* methods. These
will need to be updated if they want to generate code with proper
annotations/comments.

Of course, all of these will need to be stored where they can be
versioned. Currently, we store some of the method properties outside of
a method body:
        !SomeClass methodsFor: 'some category' stamp: 'SomeAuthor 1/1/1111
11:11:11'!
The additional properties could be store similarly. The ANSI standard
already defines the SIF format which contains annotations that can be
added to filed-out objects.


John Brant