For various reasons I want to extend the Smalltalk language with extra syntax. The parsers for the compiler and the RB are separate (even the code highlighter has its own parser), and the parse tree node classes are likewise separate (i.e. duplicated). Probably the best way to extend the language is to keep these separations. But what are the design reasons for separating the parse tree concepts and parsers of the RB, the core compiler, and the code highlighter? Does anyone know the major arguments for separating these concepts?
On 12/24/2011 01:12 PM, rnsmit wrote:
> For various reasons I want to extend the Smalltalk language with extra syntax. [...] What are the design reasons for separating the parse tree concepts and parsers of the RB, the core compiler, and the code highlighter?

IIRC, the parsers are separate only for historical reasons, and from posts I've seen on various lists there seems to be general agreement that a single parsing framework for all of these purposes would be more desirable than the separate ones we have now.

Regards,

-Martin
On Dec 28, 2011, at 11:17 AM, Martin McClure wrote:
> On 12/24/2011 01:12 PM, rnsmit wrote:
>> [...] What are the design reasons for separating the parse tree concepts and parsers of the RB, the core compiler, and the code highlighter?
>
> IIRC, the parsers are separate only for historical reasons, and from posts I've seen on various lists there seems to be general agreement that a single parsing framework for all of these purposes would be more desirable than the separate ones we have now.

Don Roberts once pointed out to me that they were different not only for historical reasons, but for pragmatic reasons as well.

The original Smalltalk parser was built to produce byte codes. As the language evolved over time, it was important that it have some degree of backwards compatibility. The base parser has 3 of its 8 instance variables dedicated simply to being flexible about what kind of VisualWorks heritage code it is willing to consume:

    oldLanguage <Boolean>        if true, accept the "Blue Book" syntax
    newLanguage <Boolean>        if true, accept the revised syntax first used by ParcPlace Systems
    extendedLanguage <Boolean>   if true, accept extensions (type declarations, ByteArray literals) first used by ParcPlace Systems

What started out as a pretty straightforward RD parser has become more and more complex over the years as it's been tweaked to handle things like fast block closures, dotted references, hinted code streams, etc. A primary concern for this parser has been to go fast. For incremental development the speed's not a big deal, but when you're doing a large file-in or a big Store load, you want compilation to move as fast as possible.

The RBParser was built for the Refactoring project. It was built to support pattern matching so that the RB can do the very magic it does. Cross-platform portability was important too: not all Smalltalks have always exposed their parser/compiler, so the RB needed a parser/scanner that could be used independent of what the platform provided. And some speed was a concern as well; if you have to pattern match against the whole image for large refactorings, you need to be able to parse fast. Unlike the VW core parser, it had the advantage of not having to worry about so much "heritage".
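To make the pattern-matching point concrete, a minimal sketch (class names vary between RB ports: in VW they live in the Refactory.Browser namespace, and the searcher is ParseTreeSearcher in some versions, RBParseTreeSearcher in others):

    | searcher tree |
    tree := RBParser parseMethod: 'example  self isNil ifTrue: [^0]. ^self printString'.
    searcher := RBParseTreeSearcher new.
    searcher
        matches: '`@receiver isNil ifTrue: `@block'
        do: [:node :answer | Transcript show: node printString; cr].
    searcher executeTree: tree

The backtick metavariables (`@receiver, `@block) are what let one pattern match structurally similar code anywhere in the image, which is exactly what large refactorings need.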
The highlighting parser was built on SmaCC for two main reasons that I remember (I'm sure John can weigh in on what I missed). SmaCC was what they were playing with at the time, so it was most appealing to use that. And it had easy support for partial completions, something that is trickier to do in the classic RD approach.

It is an open-ended, yet-to-be-run experiment whether a single unified parser can be built that simultaneously satisfies all or most of these needs.

The disadvantages of having 3 different parsers should be obvious. It's bloat. It means that if you want to add support in VW for { . . } expressions (inline arrays; see the snippet after this message for what the syntax looks like), you've got to go do it in 3 different places.

Which approach to use is also interesting. On the one hand are the hand-crafted recursive descent parsers. What I like about these is that they're a) faster and b) I think better when someone wants to use the debugger to explore how the system works. OTOH, the grammar-based parsers can make exploring language evolutions very easy and quick (at least from the parsing point of view; you still have to do whatever work is behind your new construct). But you have to learn and be comfortable with the grammar language to do so, and debugging through a SmaCC parse sequence is nearly meaningless.

Jerry Kott has been working on a variant RBParser that can do partial ASTs. I am currently testing it. It currently runs about 2.5x faster than the SmaCC-based StParser. I'm hoping to use it to supplant the SmaCC-derived RBCodeHighlighting stuff in our next release.

--
Travis Griggs
Objologist
I multiply all time estimates by pi, to account for running around in circles.
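For reference, the inline-array syntax in question, as it reads in dialects that already accept it (e.g. Squeak/Pharo); each brace expression is evaluated at runtime and the results are collected into an Array:

    { 1 + 2. 3 * 4. 5 squared }        "evaluates to the Array #(3 12 25)"
    "equivalent to:  Array with: 1 + 2 with: 3 * 4 with: 5 squared"

VW itself does not accept this today, which is the point: adding it means touching the compiler parser, the RBParser, and the highlighting parser.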
Travis
> Jerry Kott has been working on a variant RBParser that can do partial ASTs. I am currently testing it. It currently runs about 2.5x faster than the SmaCC-based StParser. I'm hoping to use it to supplant the SmaCC-derived RBCodeHighlighting stuff in our next release.

I hope it is easier to modify. I made some changes to support our product and would be disappointed if it was even harder to work with.

Terry

===========================================================
Terry Raymond
Crafted Smalltalk
80 Lazywood Ln.  Tiverton, RI 02878
(401) 624-4517   [hidden email]
===========================================================
On Dec 29, 2011, at 11:52 AM, Terry Raymond wrote:
> I hope it is easier to modify. I made some changes to support our product and would be disappointed if it was even harder to work with.

Was it your plan to leave me guessing what these changes were? :) If you don't give us a little more detail than that, your "hope" isn't much more than a "wish upon a star." :D

--
Travis Griggs
Objologist
"I did not have time to write you a short program, so I wrote you a long one instead."
On 12/29/2011 11:23 AM, Travis Griggs wrote:
> It is an open-ended, yet-to-be-run experiment whether a single unified parser can be built that simultaneously satisfies all or most of these needs.

One possibility is to have a single grammar, or 'recognizer', with multiple parsers subclassed from it. This kind of technique is used in parser combinator systems such as Newspeak's parser and the Smalltalk parsing package PetitParser. It lets you have one place to maintain the grammar, but the different actions you want to take when parsing that grammar under different circumstances are fairly cleanly separated.

Regards,

-Martin
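A minimal sketch of that shape in PetitParser (the ToyGrammar/ToyEvaluator classes and their single production are made up for illustration; the class-definition message shown is the Squeak/Pharo form):

    PPCompositeParser subclass: #ToyGrammar
        instanceVariableNames: 'number'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Toy-Parsing'.

    ToyGrammar >> start
        ^ number

    ToyGrammar >> number
        "the grammar only recognizes a run of digits"
        ^ #digit asParser plus flatten

    "Subclasses reuse the grammar and attach whatever actions they need."
    ToyGrammar subclass: #ToyEvaluator
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Toy-Parsing'.

    ToyEvaluator >> number
        ^ super number ==> [:digits | digits asNumber]

    ToyGrammar new parse: '42'.      "-> '42'  (recognized only)"
    ToyEvaluator new parse: '42'.    "-> 42    (recognized and evaluated)"

The grammar stays in one class; a highlighter, a pattern matcher, or a compiler front end would just be further subclasses overriding the same productions with different actions.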
In some of my recent projects (e.g. SimpleXPath, SimpleXO) I experimented with PEG parsers. As they turned out to be handy, readable, and easier to debug, I moved the parsing code from SmaCC to Xtreams/Parsing and PetitParser.

+1

Steffen
Is there a chance that the new parsers will support binary selectors containing $|, e.g. #|@, #|= and such? I posted about this issue earlier, and it turned out that ANSI actually allows them.

Ciao,
Steffen
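For context, this is the kind of thing being asked for; the method and its use below are hypothetical, since the current parsers stop at the leading $| (presumably because $| is already scanned as its own token for temporaries, block arguments, and the plain | selector):

    Point >> |@ aPoint
        "hypothetical binary selector that starts with $|"
        ^ self x @ aPoint y

    (3 @ 4) |@ (5 @ 6)        "would answer 3 @ 6, if the parser accepted the selector"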