
RB and reformatting


John Brant

Re: RB and reformatting

"Peter van Rooijen" <[hidden email]> wrote in message
news:agnr8q$47e$[hidden email]...
>
> Okay, I see the value in that. I am still undecided about what my
> position is regarding the differences between the VA and the RB parser.
> Questions arise like: 'what is the value of a generic Smalltalk parser?'
> 'how should one treat dialect-specific constructs such as ## and {x.y}?'.

Of course, I can question the usefulness of all the different dialects
having different syntax. For example, is the ##() really needed? I replaced
all occurrences of ##() in Dolphin once, and didn't really notice any
slowdown (of course, I might not have run the right code that shows where it
is needed). Would class variables be better to hold the ##() expressions? My
biggest complaint with the ##() expressions is that once they are compiled,
you can no longer do any of the standard Smalltalk searches (e.g., senders).
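
To make the comparison concrete, here is a minimal sketch of the two forms. The class name Clock and the class variable SecondsPerDay are made up for illustration, and the methods are shown in the Class>>selector display form rather than as file-in source.

```smalltalk
"Dolphin-specific form: the expression inside ##( ) is evaluated once,
 at compile time, and its result is stored in the method as a literal."
Clock>>secondsPerDay
    ^##(24 * 60 * 60)

"Portable alternative: a class variable (SecondsPerDay, assumed to be
 declared in the class definition) that is set on the class side."
Clock class>>initialize
    SecondsPerDay := 24 * 60 * 60

Clock>>secondsPerDay
    ^SecondsPerDay
```

In the class variable form every message sent in the initializing expression stays visible in ordinary method source, so searches such as senders still find it.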

Anyway, I believe a generic Smalltalk parser is useful since it allows
things like a fairly portable RB. Without the generic parser, there wouldn't
be a generic parse tree rewriter which the RB uses to do all of its
refactorings. Smalllint also uses the generic parser for most of its rules.
The parsers in VW and VA are both set up to compile code. They are not set up
to transform code, and the RB needs to transform code. Furthermore, without
a generic parser, an RB for Dolphin would not have been possible to do in
Smalltalk (since Dolphin's parser is in C++). I just wish the generic parser
would become the standard parser for all Smalltalks.
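
As a rough illustration of what the parse tree rewriter does, the sketch below rewrites "= nil" comparisons into "isNil" sends. It assumes the class is called ParseTreeRewriter, as in the RB distributions discussed here (some later ports prefix it with RB); the sample expression is invented.

```smalltalk
| tree rewriter |
"Parse a sample expression into an RB parse tree."
tree := RBParser parseExpression: 'foo = nil ifTrue: [self bar]'.

"``@object is a pattern variable that matches any expression node,
 searched for recursively throughout the tree."
rewriter := ParseTreeRewriter new.
rewriter replace: '``@object = nil' with: '``@object isNil'.

"executeTree: answers whether anything was rewritten; the rewritten
 tree is then available through #tree."
(rewriter executeTree: tree) ifTrue: [tree := rewriter tree].

"Turn the (possibly rewritten) tree back into source text."
tree formattedCode
```

Because the rewrite works on nodes rather than on text, it does not depend on how the original source was laid out.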

> I don't follow that. My formatter could opt to reformat when source is
> available only, could it not? It would do that, and for me it would not be
> an important restriction.

You could do something like:

RBMethodNode>>isUnmodified
    ^self = (RBParser parseMethod: self source onError: [:str :pos | ^false])

And in your formatter, just check if the method node isUnmodified. If so,
just get the method's source and format that with VA's parse trees.
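
A sketch of how that check might be wired into a formatter follows. MyFormatter and formatSource: are hypothetical names; formatSource: stands in for whatever dialect-native (for example VA) formatting call is available, and isUnmodified is the extension method shown above.

```smalltalk
"Hypothetical entry point: use the native formatter when the node still
 matches its source, and fall back to the RB formatter otherwise."
MyFormatter>>format: aMethodNode
    ^aMethodNode isUnmodified
        ifTrue: [self formatSource: aMethodNode source]
        ifFalse: [aMethodNode formattedCode]
```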

> I accept that you cannot divine the author's intention in general. But I
> have the distinct impression that the RB parser takes less trouble than it
> could to process comments thoughtfully. Is that impression correct?

You can always make more complex rules for formatting. The question is how
much does it buy you. If we spent a month updating the formatting rules
(making it more complex and less maintainable), and we only satisfied a
couple more people; would it be worth it? If you aren't getting paid for
such modifications and you have other more interesting things to do, I doubt
that many people would think it was worth it (especially, if you were
already satisfied with the output).

> > Anyway, the RB under VW7 (and soon to be for Dolphin 5 patch level 2) has
> > better support for keeping comments with their most appropriate parse node
> > instead of keeping them at the statement level.
>
> Is there anything in your agreement with Cincom that prevents such
> improvements from showing up in other dialects?

Nope, they licensed the source from us. We can do whatever we want with the
source, and Cincom can do whatever they want. I've already sent the parser
modifications to Blair, so they will probably show up in the next set of
patches to Dolphin.

John Brant

Blair McGlashan

Re: RB and reformatting

In reply to this post by Chris Uppal-3

"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...

> Peter van Rooijen wrote:
>
> > Questions arise like: 'what is the value of a generic Smalltalk parser?'
> > 'how should one treat dialect-specific constructs such as ## and {x.y}?'.
>
> Or, just for fun, my personal favourite "dialect-specific construct":
>
> 42rodd.
> ...

Yes, but in this case all the dialects (including Dolphin) that don't report
this as a syntax error are wrong. This is not a difference in syntax between
dialects, but a bug in a number of them.

Personally I think there is a lot of value to having a common core parser
for Smalltalk - "standards" are open to interpretation, so a single
interpretation of that standard is (IMO) better than many, even if it is
necessary to extend that parser to support dialect specific syntax such as
for FFI calls, etc.

Regards

Blair

Blair McGlashan

Re: RB and reformatting

In reply to this post by John Brant

"John Brant" <[hidden email]> wrote in message
news:JCiY8.35250$[hidden email]...

> "Peter van Rooijen" <[hidden email]> wrote in message
> news:agnr8q$47e$[hidden email]...
> >
> > Okay, I see the value in that. I am still undecided about what my
> > position is regarding the differences between the VA and the RB parser.
> > Questions arise like: 'what is the value of a generic Smalltalk parser?'
> > 'how should one treat dialect-specific constructs such as ## and {x.y}?'.
>
> Of course, I can question the usefulness of all the different dialects
> having different syntax. For example, is the ##() really needed?

No, it is not really needed, and frankly I wish we'd never put it in.
Nevertheless it is a fact of life now.

>...I replaced
> all occurrences of ##() in Dolphin once, and didn't really notice any
> slowdown (of course, I might not have run the right code that shows where it
> is needed).

Not all uses are performance optimizations. A common idiom is the ##(self)
usage, which creates a "static binding" to the class without requiring an
explicit reference to the global. Of course these could be rewritten too by
exchanging the self reference for the class.

>...Would class variables be better to hold the ##() expressions?

Probably.

>...My
> biggest complaint with the ##() expressions is that once they are compiled,
> you can no longer do any of the standard Smalltalk searches (e.g., senders).

Eliot pointed out to me that this is easily "fixed" by adding the literals
from the const expression to the literal frame of the method. In fact
because you list this as your biggest complaint, I have modified the Dolphin
compiler for 5.02 so that it retains references in the literal frame.
Although this loses some of the space optimizing quality of the ##(), that
probably isn't important. I've reduced the space overhead by retaining only
symbol and global references (and literal arrays in case they contain any of
those). It may be that even this is not appropriate, and it is better just
to retain all literals, so please reply with any comments you may have on
that.

Right, so with that sorted, the next biggest problem is that the RB cannot
rewrite such expressions because the parser does not handle them correctly
:-).

>
> Anyway, I believe a generic Smalltalk parser is useful since it allows
> things like a fairly portable RB. Without the generic parser, there wouldn't
> be a generic parse tree rewriter which the RB uses to do all of its
> refactorings. Smalllint also uses the generic parser for most of its rules.
> The parsers in VW and VA are both set up to compile code. They are not set up
> to transform code, and the RB needs to transform code. Furthermore, without
> a generic parser, an RB for Dolphin would not have been possible to do in
> Smalltalk (since Dolphin's parser is in C++). I just wish the generic parser
> would become the standard parser for all Smalltalks.

FWIW, I agree.

>...

Regards

Blair

Peter van Rooijen

Re: RB and reformatting

In reply to this post by Blair McGlashan

"Blair McGlashan" <[hidden email]> wrote in message
news:agu1s4$oa07f$[hidden email]...
> > 42rodd.
>
> Yes, but in this case all the dialects (including Dolphin) that don't report
> this as a syntax error are wrong. This is not a difference in syntax between
> dialects, but a bug in a number of them.
>
> Personally I think there is a lot of value to having a common core parser
> for Smalltalk

Blair,

I agree completely. This could work very well as a Camp Smalltalk project. A
standard parser (based on ANSI, but designed from the start to support
plug-ins for dialect/version-specific syntax constructs) would be great! This
would yield great code to study as well as save a lot of work in many areas,
and would also give us portable parse trees.

The beneficial effects of portable parse trees will be hard to overestimate.
I would suggest that the parser/parse trees be designed from the start to
support not only compilation, but also

- refactoring/rewriting
- source (re)formatting
- parse tree interpretation
- generation other than from source
- easy to keep around (cache) in the image
- easy queryability (so it's very simple - and fast - to ask things like
System allMethods select: [:m | (m referencesGlobal_nameString:
'AbtMRIManager') and: [m sendsSelector: #pruneDeadRequests]])

(to name but a few things that come to mind)

Some elements of the basic design could then be

- two way references between all elements of the parse tree
- direct links into the original source if available (i.e., keeping the
original source around)
- externalizability of the parse trees

What do you think?

Regards,

Peter van Rooijen

> - "standards" are open to interpretation, so a single
> interpretation of that standard is (IMO) better than many, even if it is
> necessary to extend that parser to support dialect specific syntax such as
> for FFI calls, etc.
>
> Regards
>
> Blair

Blair McGlashan

Re: RB and reformatting

Peter.

You wrote in message news:aguhhf$j9d$[hidden email]...

[Blair wrote]
> > Personally I think there is a lot of value to having a common core parser
> > for Smalltalk
>
>...
> I agree completely. This could work very well as a Camp Smalltalk project. A
> standard parser (based on ANSI, but designed from the start to support
> plug-ins for dialect/version-specific syntax constructs) would be great! This
> would yield great code to study as well as save a lot of work in many areas,
> and would also give us portable parse trees.

So why not use the RB parser? It appears that it already satisfies most of
your requirements. It is fast, well designed and I can testify that it is
easy to extend since we have already extended it (and associated formatters)
to parse (reformat) our own special FFI call syntax. I don't see the point
in inventing another parser since, quite apart from anything else, I doubt
it will be used. The best chance of getting a common parser adopted is the
RB parser. It is now in the Dolphin and VW images. Our intention is that it
should replace our own parser. I don't know Cincom's plans, but having two
parsers is not very satisfactory.

Regards

Blair

Peter van Rooijen

Re: RB and reformatting

"Blair McGlashan" <[hidden email]> wrote in message
news:agujus$og559$[hidden email]...
> Peter.
>
> So why not use the RB parser? It appears that it already satisfies most of
> your requirements. It is fast, well designed and I can testify that it is
> easy to extend since we have already extended it (and associated formatters)
> to parse (reformat) our own special FFI call syntax. I don't see the point
> in inventing another parser since, quite apart from anything else, I doubt
> it will be used. The best chance of getting a common parser adopted is the
> RB parser. It is now in the Dolphin and VW images. Our intention is that it
> should replace our own parser. I don't know Cincom's plans, but having two
> parsers is not very satisfactory.

What about the rights to the code? If people are really going to depend on
it, it would have to go PD, IMHO. That would be super-cool, and I'm sure it
would generate a lot of interest. That would be a really great way of
kickstarting this whole concept.

Regards,

Peter van Rooijen

> Regards
>
> Blair

Eliot Miranda

A Common Parser [Was: RB and reformatting]

In reply to this post by Blair McGlashan

Blair McGlashan wrote:

>
> Peter.
>
> You wrote in message news:aguhhf$j9d$[hidden email]...
>
> [Blair wrote]
> > > Personally I think there is a lot of value to having a common core parser
> > > for Smalltalk
> >
> >...
> > I agree completely. This could work very well as a Camp Smalltalk project. A
> > standard parser (based on ANSI, but designed from the start to support
> > plug-ins for dialect/version-specific syntax constructs) would be great! This
> > would yield great code to study as well as save a lot of work in many areas,
> > and would also give us portable parse trees.
>
> So why not use the RB parser? It appears that it already satisfies most of
> your requirements. It is fast, well designed and I can testify that it is
> easy to extend since we have already extended it (and associated formatters)
> to parse (reformat) our own special FFI call syntax. I don't see the point
> in inventing another parser since, quite apart from anything else, I doubt
> it will be used. The best chance of getting a common parser adopted is the
> RB parser. It is now in the Dolphin and VW images. Our intention is that it
> should replace our own parser. I don't know Cincom's plans, but having two
> parsers is not very satisfactory.

We haven't debriefed after vw7 to decide what to do yet, and given that
both parsers work, eliminating one isn't a high priority for us; we have
*lots* of much higher priority problems. However, at least some of
these other projects involve cleaning up the code management system(s)
(compiling, loading, browsing, change recording, version control), and
in that context parsing is an important part of the mix.

My main concern is that as an engineering organization we don't have the
freedom from constraints that a typical CampSmalltalk project has. I
don't want to be forced by circumstance into the situation of appearing
to obstruct such a project simply because vw engineering can't respond
fast enough. I therefore plead for any such project to appreciate the
constraints of the vendors and for that project to try hard, especially
when gathering requirements and when testing, to involve the vendors.
For a project like this to succeed I think the vendors should be the
(XP) customer.

If such a project doesn't treat the vendors as the customer and delivers
something they can't use it won't get into the current generation of
products, and that would affect many in the community. If, on the other
hand, the project faces up to the practical and political difficulties
up front it has a much better chance of succeeding.

--
_______________,,,^..^,,,____________________________
Eliot "tired and shagged-out after a 9-month vw7 squawk" Miranda

Dave Harris-3

Re: RB and reformatting

In reply to this post by Blair McGlashan

[hidden email] (Blair McGlashan) wrote (abridged):
> The best chance of getting a common parser adopted is the
> RB parser. It is now in the Dolphin and VW images. Our intention is
> that it should replace our own parser.

Am I right in thinking that having the parser in Smalltalk is very
different to having the compiler in Smalltalk, but that for some purposes
it would be as useful? It could enable us to add new syntax, provided the
new syntax could (in effect) be translated into the old syntax.

Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
[hidden email] | And close your eyes with holy dread,
| For he on honey dew hath fed
http://www.bhresearch.co.uk/ | And drunk the milk of Paradise."

Bijan Parsia-2

Re: RB and reformatting

On Mon, 15 Jul 2002, Dave Harris wrote:

> [hidden email] (Blair McGlashan) wrote (abridged):
> > The best chance of getting a common parser adopted is the
> > RB parser. It is now in the Dolphin and VW images. Our intention is
> > that it should replace our own parser.
>
> Am I right in thinking that having the parser in Smalltalk is very
> different to having the compiler in Smalltalk,

Er...I'm jumping into this thread rather late without checking back up it,
*but*, AFAIK, every smalltalk parser is implemented in Smalltalk. This is
absolutely true for Squeak.

> but that for some purposes
> it would be as useful? It could enable us to add new syntax, provided the
> new syntax could (in effect) be translated into the old syntax.

This is how the various alternative syntaxes in Squeak work, as well as
the Prolog/V port, and even the pretty printer.

Re: Blair about RB parser, there's a pretty good chance that it will
become "the" Squeak parser, as the current Squeak parser has some serious
inadequacies.

There's also the T-Gen Smalltalk Parser, but I suspect that the RB
advantages (especially being part of the RB!!) are overwhelming.

Cheers,
Bijan Parsia.

Paolo Bonzini-2

Re: RB and reformatting

In reply to this post by Dave Harris-3

> > The best chance of getting a common parser adopted is the
> > RB parser. It is now in the Dolphin and VW images. Our intention is
> > that it should replace our own parser.

... and GNU Smalltalk too :-) -- not yet released, but working on my hard disk.

Paolo

Dave Harris-3

Re: RB and reformatting

In reply to this post by Bijan Parsia-2

[hidden email] (Bijan Parsia) wrote (abridged):
> Er...I'm jumping into this thread rather late without checking back up
> it, *but*, AFAIK, every smalltalk parser is implemented in Smalltalk.
> This is absolutely true for Squeak.

I don't believe it's true for Dolphin. As I understand it, Dolphin's
parser is part of its compiler, and both are written in C++.

I gather they plan to move to a Smalltalk parser. I don't know if they
plan to move to a Smalltalk compiler, too. If they do, my question becomes
almost moot :-)

I say almost, because even if Dolphin's compiler was written in Smalltalk
and open to hacking, hacks to it would presumably not be portable to other
Smalltalk dialects. I imagine standardising the compiler is much harder
than standardising the parser. All the vendors want their own bytecodes
because they have different ideas about what the best bytecodes are;
bytecode design is part of VM design. So standardisation should stop at
the parser level.

Hence my earlier question: am I right in thinking standardisation at the
parser level is almost as useful as standardising the entire compiler?

Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
[hidden email] | And close your eyes with holy dread,
| For he on honey dew hath fed
http://www.bhresearch.co.uk/ | And drunk the milk of Paradise."

Bijan Parsia-2

Re: RB and reformatting

On Tue, 16 Jul 2002, Dave Harris wrote:

> [hidden email] (Bijan Parsia) wrote (abridged):
> > Er...I'm jumping into this thread rather late without checking back up
> > it, *but*, AFAIK, every smalltalk parser is implemented in Smalltalk.
> > This is absolutely true for Squeak.
>
> I don't believe it's true for Dolphin. As I understand it, Dolphin's
> parser is part of its compiler, and both are written in C++.

Wow. I'm astonished. I wouldn't have thought the performance demands of a
typical smalltalk compiler or parser required non-Smalltalk code, and I
wouldn't have thought that a Smalltalk vendor would have *preferred* to
write 'em in C++ :) Shoot my expectations in the foot! ;)

> I gather they plan to move to a Smalltalk parser. I don't know if they
> plan to move to a Smalltalk compiler, too. If they do, my question becomes
> almost moot :-)

Heh.

> I say almost, because even if Dolphin's compiler was written in Smalltalk
> and open to hacking, hacks to it would presumably not be portable to other
> Smalltalk dialects. I imagine standardising the compiler is much harder
> than standardising the parser. All the vendors want their own bytecodes
> because they have different ideas about what the best bytecodes are;
> bytecode design is part of VM design. So standardisation should stop at
> the parser level.

You mean "stop at the parse node classes level"? There could be hooks for
code transformation and optimization before hitting the bytecode, and having
standard tools for manipulating and analyzing the bytecode is nice.

> Hence my earlier question: am I right in thinking standardisation at the
> parser level is almost as useful as standardising the entire compiler?

Probably. Though, really, control at lower levels is nice too, especially
if you're implementing languages (like Prolog) or language extensions
(like Server Pages).

Indeed, I'm so used to having that access, I didn't imagine that it
wouldn't be in Dolphin (Squeak and VisualWorks having substantially the
same history, their compilers are in Smalltalk).

Cheers,
Bijan Parsia.

Andy Bower

Re: RB and reformatting

Bijan, Dave,

"Bijan Parsia" <[hidden email]> wrote in message
news:[hidden email]...

> On Tue, 16 Jul 2002, Dave Harris wrote:
>
> > [hidden email] (Bijan Parsia) wrote (abridged):
> > > Er...I'm jumping into this thread rather late without checking back up
> > > it, *but*, AFAIK, every smalltalk parser is implemented in Smalltalk.
> > > This is absolutely true for Squeak.
> >
> > I don't believe it's true for Dolphin. As I understand it, Dolphin's
> > parser is part of its compiler, and both are written in C++.
>
> Wow. I'm astonished. I wouldn't have thought the performance demands of a
> typical smalltalk compiler or parser required non-Smalltalk code, and I
> wouldn't have thought that a Smalltalk vendor would have *preferred* to
> write 'em in C++ :) Shoot my expectations in the foot! ;)

The Dolphin compiler (and parser) *is* currently written in C++. The reason
is that we needed somewhere to start when we were first writing the product
back in 1995. If you like, it was "the end of the recursion" when getting
the first Dolphin to boot. We had always intended to rewrite the beast in
Smalltalk but have never got around to it because at least the C++ one works
and there have always been plenty more urgent things to attend to. Sorry
about your foot BTW!

> > I gather they plan to move to a Smalltalk parser. I don't know if they
> > plan to move to a Smalltalk compiler, too. If they do, my question becomes
> > almost moot :-)
>
> Heh.

We now make use of the RB parser as part of the refactorings in Dolphin 5
and we have had to put the work into getting this to correctly handle
Dolphin syntax (the external interface syntax in particular). This means we
currently have two parsers in the system which is no good thing since they
can quite possibly disagree in certain situations. So our intention (no
timescale yet) is to replace both the C++ compiler and parser by the RB
parser and a compiler written in Smalltalk. The interesting side effect of
this is that compilation will probably be faster. IIRC Blair did some tests
and the RB parser was significantly faster than the C++ one. Okay, okay, I
wrote the C++ version!!

[snip]

> Indeed, I'm so used to having that access, I didn't imagine that it
> wouldn't be in Dolphin (Squeak and VisualWorks having substantially the
> same history, their compilers are in Smalltalk).

Squeak and VW never had to be rewritten from scratch though. As far as I
remember they are both directly descended from the original PARC image
tapes.

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com
---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---

John Brant

Re: RB and reformatting

In reply to this post by Peter van Rooijen

"Peter van Rooijen" <[hidden email]> wrote in message
news:aguhhf$j9d$[hidden email]...
>
> The beneficial effects of portable parse trees will be hard to overestimate.
> I would suggest that the parser/parse trees be designed from the start to
> support not only compilation, but also
>
> - refactoring/rewriting
> - source (re)formatting
> - parse tree interpretation

The RB parser has these three. The parse tree interpreter works with both VA
and VW and could easily be ported to Dolphin.

> - generation other than from source

I'm not sure what you want here, but it is possible to construct an RB
parse tree without parsing. Since they are just Smalltalk objects, you can
just create them directly.
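
For example, a message send can be put together from node objects alone. This is a sketch that assumes the usual RB node constructors (receiver:selector:arguments:, named:, value:); exact class-side selectors can differ between ports, and the names foo and bar: are invented.

```smalltalk
| node |
"Build the parse tree for  foo bar: 42  without parsing any source."
node := RBMessageNode
    receiver: (RBVariableNode named: 'foo')
    selector: #bar:
    arguments: (Array with: (RBLiteralNode value: 42)).

"Ask the tree for its source text."
node formattedCode
```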

> - easy to keep around (cache) in the image

I don't think this is a good idea. A parse tree is ~15x larger than the
corresponding source. In my VW image, there are >56,000 methods and it takes
only 30 seconds to parse them all. Of the 30 seconds, 18 seconds are
retrieving the source (XML parsing, the files are already cached into
memory) and only 12 seconds are for the RB parser. Caching all the parse
trees takes almost 200MB. Performing a global GC of 200MB takes 1.4 seconds.

> - easy queryability (so it's very simple - and fast - to ask things like
> System allMethods select: [:m | (m referencesGlobal_nameString:
> 'AbtMRIManager') and: [m sendsSelector: #pruneDeadRequests]])

The RB parse trees have support for such things. However, it is generally
better to perform these queries directly on the compiled methods since they
are already optimized for such queries.

> - two way references between all elements of the parse tree

I don't understand what you want here. The RB parse tree nodes have parent
and children references if that is what you are wanting.
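
A small sketch of navigating in both directions, assuming the standard RB node protocol (parent, children, arguments); the expression is just a sample.

```smalltalk
| tree argument |
tree := RBParser parseExpression: 'self foo: 1 + 2'.

"Downwards: the keyword send owns its receiver and argument nodes."
argument := tree arguments first.    "the node for 1 + 2"
tree children.                       "receiver and argument nodes"

"Upwards: every child refers back to the node that contains it."
argument parent == tree
```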

> - direct links into the original source if available (i.e., keeping the
> original source around)

The RB parse tree nodes have source locations for all tokens. Nodes that are
added/changed after parsing will get their locations set to nil.
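
As a sketch of how those locations can be used, the snippet below recovers the text of a sub-expression from the original source string. It assumes the start and stop accessors on parsed nodes answer 1-based character positions; the expression is a sample.

```smalltalk
| source tree argument |
source := 'self foo: 1 + 2'.
tree := RBParser parseExpression: source.
argument := tree arguments first.

"Slice the original source using the node's recorded positions;
 this yields the text of the argument expression."
source copyFrom: argument start to: argument stop
```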

> - externalizability of the parse trees

For the reasons given above, I think it would be better to have the
externalization be the source, not a parse tree.

John Brant