RB and reformatting

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
34 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RB and reformatting

Keith Alcock-3
(Dolphin) Smalltalkers,

Can anyone tell me why the RB wants to reformat most of the code that it refactors?  It
seems to me that it would be best to keep these two issues as separate as possible.  
There are probably times when the RB must generate new code and has to use its own
formatting conventions, but there are lots of places where simply replacing some text
would be preferable.  I won't try to list all the places where refactoring is really a
very clever text replacement, but changing a variable name is one of them.  When I
rename i to index, why should this code

renameTemporaryExample
        "This i should be called index, so rename it."

        1 to: 10 do: [ :i |
        ].
        ^self.

be reformatted to

renameTemporaryExample
        "This i should be called index, so rename it."

        1 to: 10 do: [:index | ].
        ^self

There really isn't a reason to change any CRs or remove periods except perhaps that
programming this solution was more expedient.

I'd really like to see an RB that makes as few changes to the format as possible.

Keith Alcock


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Chris Uppal-3
Keith Alcock wrote:

> Can anyone tell me why the RB wants to reformat most of the code that it
refactors?

I think it's because its re-write rules work in terms of the abstract syntax
trees, rather than the source itself.  If you think about it, then you'll
see that that is pretty much the only thing it *can* do, at least in the
general case.

But I agree, it re-writes code where there is no need at all.  No tools
should throw away semantically meaningful (for people, as opposed to
machines) information like whitespace and brackets, but that is exactly what
the RB does.

One of the big improvements in D5 was the RB, but -- for me -- the bloody
thing is just a nuisance.

I simply disable it.

Sad.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Peter van Rooijen
In reply to this post by Keith Alcock-3
> There really isn't a reason to change any CRs or remove periods except
perhaps that
> programming this solution was more expedient.
>
> I'd really like to see an RB that makes as few changes to the format as
possible.

Get VW7 :-).

Regards,

Peter

> Keith Alcock


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Don Roberts
In reply to this post by Keith Alcock-3
In the version of the RB that made it into vw7, the reformatter tries to
make minimal changes if possible. The other poster is right in that all
of the analysis and manipulation is done with AST and we have to get the
code back into readable form. To understand why even simple renamings
aren't search-and-replace, consider:

foo
        | foo|
        self foo: foo + sel foo

Renaming the temporary variable foo requires figuring out which foo's are
variables, and which aren't. The RB formats according to Kent Beck's best
practices. If you don't like the standard formatting, you can always
change the RBFormatter (it's pretty straightforward).

don

n Fri, 05 Jul 2002 16:25:29 -0500, Keith Alcock wrote:

> (Dolphin) Smalltalkers,
>
> Can anyone tell me why the RB wants to reformat most of the code that it
> refactors?  It seems to me that it would be best to keep these two
> issues as separate as possible. There are probably times when the RB
> must generate new code and has to use its own formatting conventions,
> but there are lots of places where simply replacing some text would be
> preferable.  I won't try to list all the places where refactoring is
> really a very clever text replacement, but changing a variable name is
> one of them.  When I rename i to index, why should this code
>
> renameTemporaryExample
> "This i should be called index, so rename it."
>
> 1 to: 10 do: [ :i |
> ].
> ^self.
>
> be reformatted to
>
> renameTemporaryExample
> "This i should be called index, so rename it."
>
> 1 to: 10 do: [:index | ].
> ^self
>
> There really isn't a reason to change any CRs or remove periods except
> perhaps that programming this solution was more expedient.
>
> I'd really like to see an RB that makes as few changes to the format as
> possible.
>
> Keith Alcock


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Peter van Rooijen
Don,

> Renaming the temporary variable foo requires figuring out which foo's are
> variables, and which aren't. The RB formats according to Kent Beck's best
> practices. If you don't like the standard formatting, you can always
> change the RBFormatter (it's pretty straightforward).

I didn't check this in VW7, but in the 3.5.x version of the RB (which I have
been using on VA) the formatter was not pluggable. So, I am in the situation
that I have a wonderful formatter (using VA, not RB parse trees), that
formats the way I like, but I cannot persuade the RB to use it after
refactoring a method. Thus, each refactoring means I am losing some of my
well-formatted code.

Could the formatter be pluggable in the future? Are you still going to
release new versions on other dialectversions than VW7+? There are some
annoying bugs that I could report to you if it seems useful.

Thanks,

Peter van Rooijen

> don


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Keith Alcock-3
In reply to this post by Don Roberts
Don,

Somewhere I said "really clever text replacement" and by that I meant replacing text
that is pointed to by some node in a parse tree.  I'll have to check out vw7, which
Peter also suggested.  Even if you have a parse tree, which I grant is a requirement,
each node should record its position in the original code (and it looks like it does).  
As the code is reconstituted, all unchanged (parts of) nodes should just write out the
original code (plus any trailing white space from the original if it is not attributed
to nodes).  Maybe that's how vw7 works already (or will when it's finished), but I'd
like to see an RBMirroringFormatter that just mirrors the format it saw in as much as
possible.  Maybe it's so irksome to me because I wrote code to do almost exactly this
about two years ago and I am loath to do it again.  Thanks for the background
information.

Keith


Don Roberts wrote:

>
> In the version of the RB that made it into vw7, the reformatter tries to
> make minimal changes if possible. The other poster is right in that all
> of the analysis and manipulation is done with AST and we have to get the
> code back into readable form. To understand why even simple renamings
> aren't search-and-replace, consider:
>
> foo
>         | foo|
>         self foo: foo + sel foo
>
> Renaming the temporary variable foo requires figuring out which foo's are
> variables, and which aren't. The RB formats according to Kent Beck's best
> practices. If you don't like the standard formatting, you can always
> change the RBFormatter (it's pretty straightforward).
>
> don
>
> n Fri, 05 Jul 2002 16:25:29 -0500, Keith Alcock wrote:
>
> > (Dolphin) Smalltalkers,
> >
> > Can anyone tell me why the RB wants to reformat most of the code that it
> > refactors?  It seems to me that it would be best to keep these two
> > issues as separate as possible. There are probably times when the RB
> > must generate new code and has to use its own formatting conventions,
> > but there are lots of places where simply replacing some text would be
> > preferable.  I won't try to list all the places where refactoring is
> > really a very clever text replacement, but changing a variable name is
> > one of them.  When I rename i to index, why should this code
> >
> > renameTemporaryExample
> >       "This i should be called index, so rename it."
> >
> >       1 to: 10 do: [ :i |
> >       ].
> >       ^self.
> >
> > be reformatted to
> >
> > renameTemporaryExample
> >       "This i should be called index, so rename it."
> >
> >       1 to: 10 do: [:index | ].
> >       ^self
> >
> > There really isn't a reason to change any CRs or remove periods except
> > perhaps that programming this solution was more expedient.
> >
> > I'd really like to see an RB that makes as few changes to the format as
> > possible.
> >
> > Keith Alcock


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

John Brant
In reply to this post by Peter van Rooijen
"Peter van Rooijen" <[hidden email]> wrote in message
news:ag79m1$i4q$[hidden email]...
>
> I didn't check this in VW7, but in the 3.5.x version of the RB (which I
have
> been using on VA) the formatter was not pluggable. So, I am in the
situation
> that I have a wonderful formatter (using VA, not RB parse trees), that
> formats the way I like, but I cannot persuade the RB to use it after
> refactoring a method. Thus, each refactoring means I am losing some of my
> well-formatted code.

I just checked, and in version 3.5.1 it is a pluggable formatter. You can
set it to use whatever formatter class you want: "RBProgramNode
formatterClass: MyFormatter". This also works in Dolphin (except that in
Dolphin, it is called StProgramNode).


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

John Brant
In reply to this post by Keith Alcock-3
"Keith Alcock" <[hidden email]> wrote in message
news:[hidden email]...

>  Even if you have a parse tree, which I grant is a requirement,
> each node should record its position in the original code (and it looks
like it does).
> As the code is reconstituted, all unchanged (parts of) nodes should just
write out the
> original code (plus any trailing white space from the original if it is
not attributed
> to nodes).  Maybe that's how vw7 works already (or will when it's
finished), but I'd
> like to see an RBMirroringFormatter that just mirrors the format it saw in
as much as
> possible.

This is similar to what the RB does in VW7. However, it can only determine a
few cases. For example, if you replace some code with a variable or a
literal, it just deletes the old code and puts in the new. However, this is
much harder in the general case. For example, consider adding a parameter to
a message send. If the message send looks like:
    self arg1: 1 arg2: 2
then what should the #arg1:arg2:arg3: message send look like? Should it be
on multiple lines or a single line? If we were to keep formatting
information, then it should go on the same line, but most people would
format it on multiple lines. As another example, consider extracting code
from the middle of a block. Since the code is in a block, it is most likely
indented more than plain method text is idented. When we extract this, if we
keep the same formatting, then the extracted text would be indented more
than we normally format methods.

In addition to these cases, you also have some cases that would appear to be
easy, but aren't as easy once you start studying them. For example, suppose
you want to replace all occurrences of the symbol #asdf with the number 2.
The simple implementation would delete the text for the symbol and insert
the text for the number. However, this does not work in all cases. For
example, consider #(-asdf). This is a literal array that represents the
symbols #- and #asdf. If we performed the simple replacement, we would end
up with #(-2) instead of the literal array with symbol #- and the number 2.
While these cases are difficult to fix, they are likely to case bugs since
it is easy to miss some case.


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Peter van Rooijen
In reply to this post by John Brant
"John Brant" <[hidden email]> wrote in message
news:ZyIV8.420916$352.56902@sccrnsc02...

> "Peter van Rooijen" <[hidden email]> wrote in message
> news:ag79m1$i4q$[hidden email]...
> >
> > I didn't check this in VW7, but in the 3.5.x version of the RB (which I
> have
> > been using on VA) the formatter was not pluggable. So, I am in the
> situation
> > that I have a wonderful formatter (using VA, not RB parse trees), that
> > formats the way I like, but I cannot persuade the RB to use it after
> > refactoring a method. Thus, each refactoring means I am losing some of
my
> > well-formatted code.
>
> I just checked, and in version 3.5.1 it is a pluggable formatter. You can
> set it to use whatever formatter class you want: "RBProgramNode
> formatterClass: MyFormatter".

John, forgive me for not checking this to be 100% sure before replying, but
I believe the formatter that you can plug in that way needs to support quite
a specific api to be pluggable. All that is really needed is source text or
the selector (if the method is already saved), plus the methodclass to go
into the formatter, and then the formatted source to come out. My formatter
(any formatter) could easily support such a plug-in api. What I'm asking for
is support for that. If I missed that and it is present, please tell me!

Thanks,

Peter van Rooijen

> This also works in Dolphin (except that in
> Dolphin, it is called StProgramNode).
>
>
> John Brant
>
>


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Keith Alcock-3
In reply to this post by John Brant
John Brant wrote:

>
> "Keith Alcock" <[hidden email]> wrote in message
> news:[hidden email]..
>
> This is similar to what the RB does in VW7. However, it can only determine a
> few cases. For example, if you replace some code with a variable or a
> literal, it just deletes the old code and puts in the new. However, this is
> much harder in the general case. For example, consider adding a parameter to
> a message send. If the message send looks like:
>     self arg1: 1 arg2: 2
> then what should the #arg1:arg2:arg3: message send look like? Should it be
> on multiple lines or a single line? If we were to keep formatting
> information, then it should go on the same line, but most people would
> format it on multiple lines. As another example, consider extracting code
> from the middle of a block. Since the code is in a block, it is most likely
> indented more than plain method text is idented. When we extract this, if we
> keep the same formatting, then the extracted text would be indented more
> than we normally format methods.
>

John,

I should definitely check out VW7 and then hope that any improvements will be ported to
D5.  I am not especially concerned about how the message send for #arg1:arg2:arg3 looks,
but I am concerned that the original method changes from

arg1: arg1 arg2: arg2

        1 to: 10 do: [ :i |
        ].
        ^self.

to

arg1: arg1 arg2: arg2 arg3: arg3
        1 to: 10 do: [:i | ].
        ^self


and that callers can change from

testArgs

        1 to: 10 do: [ :i |
                | tmp1 tmp2 |
                tmp1 := #(1 2 3).
                self arg1: i arg2: i.
        ].
        ^self.

to

testArgs
        1 to: 10
                do:
                        [:i |
                        | tmp1 tmp2 |
                        tmp1 := #(1 2 3).
                        self
                                arg1: i
                                arg2: i
                                arg3: nil].
        ^self

These changes have little to do with the message send.  It has probably already been
fixed, so I'll just wait for the changes to propagate to me.  Thanks.

Keith


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

John Brant
In reply to this post by Peter van Rooijen
"Peter van Rooijen" <[hidden email]> wrote in message
news:ag7udl$fa2$[hidden email]...
>
> John, forgive me for not checking this to be 100% sure before replying,
but
> I believe the formatter that you can plug in that way needs to support
quite
> a specific api to be pluggable. All that is really needed is source text
or
> the selector (if the method is already saved), plus the methodclass to go
> into the formatter, and then the formatted source to come out. My
formatter
> (any formatter) could easily support such a plug-in api. What I'm asking
for
> is support for that. If I missed that and it is present, please tell me!

OK, you specify the #formatterClass:. This class needs to understand #new
and whatever #new returns needs to understand #format:. That is all that is
required. The argument to the #format: message is the parse tree that needs
to be formatted and the return value is the formatted string. We cannot pass
a string, since parse tree can be constructed without any source.


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

John Brant
In reply to this post by Keith Alcock-3
"Keith Alcock" <[hidden email]> wrote in message
news:[hidden email]...
>
> I should definitely check out VW7 and then hope that any improvements will
be ported to
> D5.  I am not especially concerned about how the message send for
#arg1:arg2:arg3 looks,
> but I am concerned that the original method changes from

While you are not concerned, others would be concerned that a three argument
message was being formatted on a single line which is not following their
standard rules. You can't satisfy everyone. Therefore, we try to satisfy
those who use a standard formatter since their rules are codified.
Furthermore, if some piece of code requires some custom formatting, then in
my opinion, that code is wrong. It should be refactored so that formatting
does not matter.


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Blair McGlashan
In reply to this post by Peter van Rooijen
"Peter van Rooijen" <[hidden email]> wrote in message
news:ag5hg4$pro$[hidden email]...
> > There really isn't a reason to change any CRs or remove periods except
> perhaps that
> > programming this solution was more expedient.
> >
> > I'd really like to see an RB that makes as few changes to the format as
> possible.
>
> Get VW7 :-).

Or Dolphin 5.02, which might even be out before that.

Regards

Blair


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Peter van Rooijen
In reply to this post by John Brant
"John Brant" <[hidden email]> wrote in message
news:tz_V8.113730$Uu2.20114@sccrnsc03...

> "Peter van Rooijen" <[hidden email]> wrote in message
> news:ag7udl$fa2$[hidden email]...
> >
> > John, forgive me for not checking this to be 100% sure before replying,
> but
> > I believe the formatter that you can plug in that way needs to support
> quite
> > a specific api to be pluggable. All that is really needed is source text
> or
> > the selector (if the method is already saved), plus the methodclass to
go
> > into the formatter, and then the formatted source to come out. My
> formatter
> > (any formatter) could easily support such a plug-in api. What I'm asking
> for
> > is support for that. If I missed that and it is present, please tell me!
>
> OK, you specify the #formatterClass:. This class needs to understand #new
> and whatever #new returns needs to understand #format:. That is all that
is
> required. The argument to the #format: message is the parse tree that
needs
> to be formatted and the return value is the formatted string. We cannot
pass
> a string, since parse tree can be constructed without any source.

Hi John,

OK, so you understand that that is my problem. The #new and #format:
protocol are not the problem. My formatter has to be able to deal with an RB
parse tree be able to plugged into the RB. But my formatter only knows how
to deal with VA parse trees.

I suppose I could let my formatter ask (in #format:) for the source string
as constructed from the RB parse tree, then parse that using the VA parser,
then construct formatted source from the VA parse tree. Hmmm, it's not
pretty but it would work. If ... The RB parser loses its lossi-/bugginess.

It would be nice if you could eliminate the need for such a workaround by
providing an interface whereby the pluggable formatter can ask to receive
either a source string or an RB parse tree, instead of being forced to deal
with an RB parse tree.

John can you tell me:
1) Have you upgraded the RB parser to be able to process comments in such a
way that they won't be moved around the code without need? I understand a
(re)formatting parser has to be smarter than a parser that is used just to
feed the compiler, but since you are using it for formatting, perhaps you
would consider accepting the extra demands that places on the parser.
2) Have you fixed the bug where literal strings that extend over multiple
lines get modified by the RB formatter?
3) Do you plan to do anything to improve the RB on VA, or are we now
basically on our own in that dialect?
4) Are you interested in bug reports at all? There is something not working
well with refactoring class methods, but I always forget what it is, and
just repair by hand. If I knew you'd want to know about it, I could make
anote next time I encounter it and describe the symptoms and conditions.

Thanks and regards,

Peter van Rooijen

> John Brant


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Bill Schwab-2
Peter,

> I suppose I could let my formatter ask (in #format:) for the source string
> as constructed from the RB parse tree, then parse that using the VA
parser,
> then construct formatted source from the VA parse tree. Hmmm, it's not
> pretty but it would work. If ... The RB parser loses its lossi-/bugginess.
>
> It would be nice if you could eliminate the need for such a workaround by
> providing an interface whereby the pluggable formatter can ask to receive
> either a source string or an RB parse tree, instead of being forced to
deal
> with an RB parse tree.

I'll second the call for increasing the RB's "respect" for the formatting of
code provided to it.  As an example, I have a fair amount of code that
generates other code, and a good bit more that generates HTML that is going
to get a heavy reworking as soon as I can get to it.  The code in question
is quite useful, and yes, clever formatting helps in understanding and more
importantly in modifying/cloning it to make short work of the next tedious
task.  I suspect the same will be true of code that write SQL statements.

After some tweaking of the RB's block formatting, the results are ok to
quite good in most situations.  The big area in which mechanical formatting
is problematic is comments in cascades; the positioning is lost and the
formatted results can become meaningless.

I'm convinced that the RB is useful, but so are comments, especially those
that reveal how (and why!!) something changed over a year or two.


> 2) Have you fixed the bug where literal strings that extend over multiple
> lines get modified by the RB formatter?

Thanks for the heads-up on this one.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

John Brant
In reply to this post by Peter van Rooijen
"Peter van Rooijen" <[hidden email]> wrote in message
news:agmiff$js1$[hidden email]...
>
> OK, so you understand that that is my problem. The #new and #format:
> protocol are not the problem. My formatter has to be able to deal with an
RB
> parse tree be able to plugged into the RB. But my formatter only knows how
> to deal with VA parse trees.

When we wrote the RB for VA, VA hid all of the parsing code so we couldn't
use their code and had to write our own. Of course writing our own was good
for portability, but it also means that if someone extends the platform's
parser, then our stuff won't work with their stuff. Anyway, I'm guessing
that VA's parse trees look alot like the RB's parse trees so it wouldn't be
that difficult to migrate your formatter to use the RB's parse trees. Plus,
once you have a RB formatter, then it will work for a bunch of different
Smalltalks (including Dolphin).


> I suppose I could let my formatter ask (in #format:) for the source string
> as constructed from the RB parse tree, then parse that using the VA
parser,
> then construct formatted source from the VA parse tree. Hmmm, it's not
> pretty but it would work. If ... The RB parser loses its lossi-/bugginess.
>
> It would be nice if you could eliminate the need for such a workaround by
> providing an interface whereby the pluggable formatter can ask to receive
> either a source string or an RB parse tree, instead of being forced to
deal
> with an RB parse tree.

There are cases where it must receive a parse tree, so it always needs to
understand a parse tree.

> John can you tell me:
> 1) Have you upgraded the RB parser to be able to process comments in such
a
> way that they won't be moved around the code without need? I understand a
> (re)formatting parser has to be smarter than a parser that is used just to
> feed the compiler, but since you are using it for formatting, perhaps you
> would consider accepting the extra demands that places on the parser.

The problem with comments is that you don't know what code they go with. For
example, if you have '3 "comment" + 4', what is the comment going with? Is
it 3 or is it the #+ message send. If you have statement comments, you still
can't easily tell. For example, consider these two cases (hopefully, they
don't get reformatted to badly):

1)
    self doSomeReallyComplexThing.    "This is the start of the comment"
                                                        "This is the
continuation"
    self doAnotherThing.

2)
    self doSomething.
    "This is a comment for the next statement"
    self doAnotherThing.

In the first case, the comment on the line by itself goes with the previous
statement, and in the second case, the comment goes with the next statement.
A parser cannot make the distinction.

I have argued before that comments should be removed from the code and
should become annotations on the code. These annotations could then
attributed to a particular node in the parse tree by the user. I would like
some system like Word where the comment would either appear as bubble text
to the side of the code (like Word XP) or highlighted in yellow and when you
move the mouse over it, it would open a pop-up window of the comment (like
Word 97/2K?). When the comment was entered as an annotation, we would have
the information that is needed -- the exact node that the user is
commenting.

Anyway, the RB under VW7 (and soon to be for Dolphin 5 patch level 2) has
better support for keeping comments with their most appropriate parse node
instead of keeping them at the statement level.


> 2) Have you fixed the bug where literal strings that extend over multiple
> lines get modified by the RB formatter?

I don't know of any such bug. I'm guessing you are talking about a bug in
the VA text widgets. I can't remember all the details, but there used to be
a problem where if you set the text on a widget and immediately asked for
the text, you would get different strings. The widget was changing/removing
the line end characters.

> 3) Do you plan to do anything to improve the RB on VA, or are we now
> basically on our own in that dialect?

That depends on what jobs I get. If I get VA jobs, then I'll probably work
on updating the RB for VA, but if I don't get VA jobs, then I probably won't
do anything with it -- unless I don't get any jobs and have lots of free
time :-(. Tom Robinson has made a new VA version that (1) loads cleanly in
the VA6.0 image (the #-> method) and (2) contains code for retaining open
windows after image saves. I need to find out the procedures for getting
this on the st.cs archive.

> 4) Are you interested in bug reports at all? There is something not
working
> well with refactoring class methods, but I always forget what it is, and
> just repair by hand. If I knew you'd want to know about it, I could make
> anote next time I encounter it and describe the symptoms and conditions.

Sure, send the bug report. There are a few fixes waiting for a release
listed at:
    http://wiki.cs.uiuc.edu/RefactoringBrowser/Fixes
You problem may be listed there...


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Peter van Rooijen
"John Brant" <[hidden email]> wrote in message
news:WaIX8.333227$[hidden email]...
> "Peter van Rooijen" <[hidden email]> wrote in message
> news:agmiff$js1$[hidden email]...
> >
> > OK, so you understand that that is my problem. The #new and #format:
> > protocol are not the problem. My formatter has to be able to deal with
an
> RB
> > parse tree be able to plugged into the RB. But my formatter only knows
how
> > to deal with VA parse trees.
>
> When we wrote the RB for VA, VA hid all of the parsing code so we couldn't
> use their code and had to write our own. Of course writing our own was
good
> for portability, but it also means that if someone extends the platform's
> parser, then our stuff won't work with their stuff. Anyway, I'm guessing
> that VA's parse trees look alot like the RB's parse trees so it wouldn't
be
> that difficult to migrate your formatter to use the RB's parse trees.
Plus,
> once you have a RB formatter, then it will work for a bunch of different
> Smalltalks (including Dolphin).

Okay, I see the value in that. I am still undecided about what what my
position is regarding the differences between the VA and the RB parser.
Question arise like: 'what is the value of a generic Smalltalk parser?' 'how
should one treat dialect-specific constructs such as ## and {x.y}?'.

> > I suppose I could let my formatter ask (in #format:) for the source
string
> > as constructed from the RB parse tree, then parse that using the VA
> parser,
> > then construct formatted source from the VA parse tree. Hmmm, it's not
> > pretty but it would work. If ... The RB parser loses its
lossi-/bugginess.
> >
> > It would be nice if you could eliminate the need for such a workaround
by
> > providing an interface whereby the pluggable formatter can ask to
receive
> > either a source string or an RB parse tree, instead of being forced to
> deal
> > with an RB parse tree.
>
> There are cases where it must receive a parse tree, so it always needs to
> understand a parse tree.

I don;t follow that. My formatter could opt to reformat when source is
available only, could it not? It would do that, and for me it would not be
an important restriction.

> > John can you tell me:
> > 1) Have you upgraded the RB parser to be able to process comments in
such
> a
> > way that they won't be moved around the code without need? I understand
a
> > (re)formatting parser has to be smarter than a parser that is used just
to
> > feed the compiler, but since you are using it for formatting, perhaps
you
> > would consider accepting the extra demands that places on the parser.
>
> The problem with comments is that you don't know what code they go with.
For
> example, if you have '3 "comment" + 4', what is the comment going with? Is
> it 3 or is it the #+ message send. If you have statement comments, you
still

> can't easily tell. For example, consider these two cases (hopefully, they
> don't get reformatted to badly):
>
> 1)
>     self doSomeReallyComplexThing.    "This is the start of the comment"
>                                                         "This is the
> continuation"
>     self doAnotherThing.
>
> 2)
>     self doSomething.
>     "This is a comment for the next statement"
>     self doAnotherThing.
>
> In the first case, the comment on the line by itself goes with the
previous
> statement, and in the second case, the comment goes with the next
statement.
> A parser cannot make the distinction.

I accept that you cannot divine the author;s intention in general. But I
have the distinct impression that the RB parser takes less trouble than it
could to process comments thoughtfully. Is that impression correct?

>  have argued before that comments should be removed from the code and
> should become annotations on the code. These annotations could then
> attributed to a particular node in the parse tree by the user. I would
like
> some system like Word where the comment would either appear as bubble text
> to the side of the code (like Word XP) or highlighted in yellow and when
you
> move the mouse over it, it would open a pop-up window of the comment (like
> Word 97/2K?). When the comment was entered as an annotation, we would have
> the information that is needed -- the exact node that the user is
> commenting.
>
> Anyway, the RB under VW7 (and soon to be for Dolphin 5 patch level 2) has
> better support for keeping comments with their most appropriate parse node
> instead of keeping them at the statement level.

Is there anything in your agreement with Cincom that prevents such
improvements from showing up in other dialects?

> > 2) Have you fixed the bug where literal strings that extend over
multiple
> > lines get modified by the RB formatter?
>
> I don't know of any such bug. I'm guessing you are talking about a bug in
> the VA text widgets. I can't remember all the details, but there used to
be
> a problem where if you set the text on a widget and immediately asked for
> the text, you would get different strings. The widget was
changing/removing
> the line end characters.

I will try to reproduce it next time I see it, then tell you.

> > 3) Do you plan to do anything to improve the RB on VA, or are we now
> > basically on our own in that dialect?
>
> That depends on what jobs I get. If I get VA jobs, then I'll probably work
> on updating the RB for VA, but if I don't get VA jobs, then I probably
won't
> do anything with it -- unless I don't get any jobs and have lots of free
> time :-(. Tom Robinson has made a new VA version that (1) loads cleanly in
> the VA6.0 image (the #-> method) and (2) contains code for retaining open
> windows after image saves. I need to find out the procedures for getting
> this on the st.cs archive.

Good if it will be there!

> > 4) Are you interested in bug reports at all? There is something not
> working
> > well with refactoring class methods, but I always forget what it is, and
> > just repair by hand. If I knew you'd want to know about it, I could make
> > anote next time I encounter it and describe the symptoms and conditions.
>
> Sure, send the bug report. There are a few fixes waiting for a release
> listed at:
>     http://wiki.cs.uiuc.edu/RefactoringBrowser/Fixes
> You problem may be listed there...

I didn't see it, but it's good to know you want to hear abot it. I'll
investigate next time I encounter something and leyt you know.

Thanks,

Peter

> John Brant


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Bill Schwab
In reply to this post by John Brant
John,

> The problem with comments is that you don't know what code they go with.
For
> example, if you have '3 "comment" + 4', what is the comment going with? Is
> it 3 or is it the #+ message send. If you have statement comments, you
still

> can't easily tell. For example, consider these two cases (hopefully, they
> don't get reformatted to badly):
>
> 1)
>     self doSomeReallyComplexThing.    "This is the start of the comment"
>                                                         "This is the
> continuation"
>     self doAnotherThing.
>
> 2)
>     self doSomething.
>     "This is a comment for the next statement"
>     self doAnotherThing.
>
> In the first case, the comment on the line by itself goes with the
previous
> statement, and in the second case, the comment goes with the next
statement.
> A parser cannot make the distinction.

Very true.  But, we can learn how comments get assigned to statements and
arrange for things to appear at the correct level after the parser has done
its work.  The place where I have the biggest problem is comments in
cascades - they get scattered either above or below the cascade node, rather
than being assigned to the messages in the cascade.  Would it work to assign
the comments to the message that follows them?

I'm thinking of this as being analogous to handwriting recognition: the
recognizer and user train each other.


> I have argued before that comments should be removed from the code and
> should become annotations on the code. These annotations could then
> attributed to a particular node in the parse tree by the user. I would
like
> some system like Word where the comment would either appear as bubble text
> to the side of the code (like Word XP) or highlighted in yellow and when
you
> move the mouse over it, it would open a pop-up window of the comment (like
> Word 97/2K?). When the comment was entered as an annotation, we would have
> the information that is needed -- the exact node that the user is
> commenting.

Interesting.  We'd need an XML exchange format (or just an extension to SIF)
to go with it though, but, IIRC, that's in the works.  Even then, I still
think I'd like to have some simple rules for matching unassigned comments to
nodes, and then be able to override that if needed.  Also, what happens when
the node goes away?


> Anyway, the RB under VW7 (and soon to be for Dolphin 5 patch level 2) has
> better support for keeping comments with their most appropriate parse node
> instead of keeping them at the statement level.

Great!  You're probably way ahead of me on this, but, I wrote a small unit
test when I first began to suspect that parsing vs. formatting problems were
behind my "next biggest complaint" about the RB's formatting; it appears
below just in case it might be of use.

Have a good one,

Bill


===============

"Filed out from Dolphin Smalltalk 2002 release 5.00"!

TestCase subclass: #AcmeFormatterTestAreaTests
 instanceVariableNames: ''
 classVariableNames: ''
 poolDictionaries: ''
 classInstanceVariableNames: ''!
AcmeFormatterTestAreaTests guid: (GUID fromString:
'{3329E2DE-50F2-4B8D-8D89-4661491C06CC}')!
AcmeFormatterTestAreaTests comment: ''!
!AcmeFormatterTestAreaTests categoriesForClass!Unclassified! !
!AcmeFormatterTestAreaTests methodsFor!

format:selector

 ^SmalltalkSystem current formatterClass new format:( self parse:selector ).
!

gratuitousCascade
 "This is something to format"

 'hello'
  "First comment"
  size;

  "second comment"
  size;

  "third comment"
  size;
  yourself
!

parse:selector
  | aClass reformatted source parsed |

 aClass := self class.
 source := ( aClass compiledMethodAt:selector ) getSource.
 ^SmalltalkParser parseMethod:source in:aClass.
!

testGratuitousCascadeParsing
  | parsed cascade |

 parsed := self parse:#gratuitousCascade.
 "
  self format:#gratuitousCascade.
 "

 "Locate the cascade node and verify its type"
 cascade := parsed body statements first.
 self should:[cascade isKindOf:StCascadeNode].

 "There are three comments inside the cascade, one with
 each message"
 self should:[cascade messages allSatisfy:[:msg | msg comments size =1]].

 "I _think_ there should be one comment assigned to the top
 node"
 self should:[parsed comments size =1]! !
!AcmeFormatterTestAreaTests categoriesFor: #format:!public! !
!AcmeFormatterTestAreaTests categoriesFor: #gratuitousCascade!public! !
!AcmeFormatterTestAreaTests categoriesFor: #parse:!public! !
!AcmeFormatterTestAreaTests categoriesFor:
#testGratuitousCascadeParsing!public!Running! !

--
Wilhelm K. Schwab, Ph.D.
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Peter Goodall-4
In reply to this post by Don Roberts
In article <[hidden email]>,
[hidden email] says...

> In the version of the RB that made it into vw7, the reformatter tries to
> make minimal changes if possible. The other poster is right in that all
> of the analysis and manipulation is done with AST and we have to get the
> code back into readable form. To understand why even simple renamings
> aren't search-and-replace, consider:
>
> foo
> | foo|
> self foo: foo + sel foo
>
> Renaming the temporary variable foo requires figuring out which foo's are
> variables, and which aren't. The RB formats according to Kent Beck's best
> practices. If you don't like the standard formatting, you can always
> change the RBFormatter (it's pretty straightforward).
>

Shorter methods make formatting less important. Handful of lines
only, and Mr Beck's format works perfectly.

Remove your scroll-bars!

--Regards,
  Peter Goodall


Reply | Threaded
Open this post in threaded view
|

Re: RB and reformatting

Chris Uppal-3
In reply to this post by Peter van Rooijen
Peter van Rooijen wrote:

> Question arise like: 'what is the value of a generic Smalltalk parser?'
'how
> should one treat dialect-specific constructs such as ## and {x.y}?'.

Or, just for fun, my personal favourite "dialect-specific construct":

    42rodd.

Yes, I know it's a pathological example, but that's the kind of guy I am...
;-)

Dolphin treats this as:
     42r0 odd.
i.e.
    0 odd.
i.e.
    false.

VW treats it as 42895; which I guess is calculated by taking "odd" as a
base-42 number, with $o representing 24 and $d representing 13.  Why, I
don't know.  <shrug>.

VASt treats it as
    42 rodd.
which would normally provoke a DNU walkback.

Sqeak treats it as a syntax error, "digits expected".

Strongtalk treats it as a syntax error, "base must be betweeen 2 and 36".

According to ANSI (as I read it) it is requred to be an error (i.e. not
"implementation defined" or "undefined").

I don't currently have installations of  ST-MT or ST/X  (lost 'em in a hard
disk crash) or GNUSt.

    -- chris


12