"Peter van Rooijen" <[hidden email]> wrote in message
news:agnr8q$47e$[hidden email]...

> Okay, I see the value in that. I am still undecided about what my position is regarding the differences between the VA and the RB parser. Questions arise like: 'what is the value of a generic Smalltalk parser?' 'how should one treat dialect-specific constructs such as ## and {x.y}?'.

Of course, I can question the usefulness of all the different dialects to have different syntax. For example, is the ##() really needed? I replaced all occurrences of ##() in Dolphin once, and didn't really notice any slowdown (of course, I might not have run the right code that shows where it is needed). Would class variables be better to hold the ##() expressions? My biggest complaint with the ##() expressions is that once they are compiled, you can no longer do any of the standard Smalltalk searches (e.g., senders).

Anyway, I believe a generic Smalltalk parser is useful since it allows things like a fairly portable RB. Without the generic parser, there wouldn't be a generic parse tree rewriter, which the RB uses to do all of its refactorings. Smalllint also uses the generic parser for most of its rules. The parsers in VW and VA are both set up to compile code. They are not set up to transform code, and the RB needs to transform code. Furthermore, without a generic parser, an RB for Dolphin would not have been possible to do in Smalltalk (since Dolphin's parser is in C++). I just wish the generic parser would become the standard parser for all Smalltalks.

> I don't follow that. My formatter could opt to reformat when source is available only, could it not? It would do that, and for me it would not be an important restriction.

You could do something like:

RBMethodNode>>isUnmodified
    ^self = (RBParser parseMethod: self source onError: [:str :pos | ^false])

And in your formatter, just check if the method node isUnmodified. If so, just get the method's source and format that with VA's parse trees.

> I accept that you cannot divine the author's intention in general. But I have the distinct impression that the RB parser takes less trouble than it could to process comments thoughtfully. Is that impression correct?

You can always make more complex rules for formatting. The question is how much it buys you. If we spent a month updating the formatting rules (making them more complex and less maintainable), and we only satisfied a couple more people, would it be worth it? If you aren't getting paid for such modifications and you have other more interesting things to do, I doubt that many people would think it was worth it (especially if you were already satisfied with the output).

> > Anyway, the RB under VW7 (and soon to be for Dolphin 5 patch level 2) has better support for keeping comments with their most appropriate parse node instead of keeping them at the statement level.
>
> Is there anything in your agreement with Cincom that prevents such improvements from showing up in other dialects?

Nope, they licensed the source from us. We can do whatever we want with the source, and Cincom can do whatever they want. I've already sent the parser modifications to Blair, so they will probably show up in the next set of patches to Dolphin.

John Brant
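As a rough illustration of the "generic parse tree rewriter" mentioned above, the snippet below parses an expression, applies one pattern-based rewrite rule, and prints the transformed source. It is a sketch, not code from this thread: it assumes the class and selector names found in recent Refactoring Browser ports (RBParser, RBParseTreeRewriter, formattedCode). Older releases name some of these classes differently (for example ParseTreeRewriter), so check your own dialect.

```smalltalk
"Sketch only: class names assume a recent RB port and may differ per dialect."
| tree rewriter |
tree := RBParser parseExpression: 'items isEmpty not ifTrue: [self process: items]'.
rewriter := RBParseTreeRewriter new.
"A name prefixed with ``@ is a pattern variable that matches any expression node."
rewriter
    replace: '``@receiver not ifTrue: ``@block'
    with: '``@receiver ifFalse: ``@block'.
"executeTree: answers true when at least one rule matched and rewrote the tree."
(rewriter executeTree: tree)
    ifTrue: [Transcript show: rewriter tree formattedCode; cr]
```

The point of the design is that the rule is written as ordinary Smalltalk source with placeholders, so the same rule can run in any dialect that hosts the parser.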
In reply to this post by Chris Uppal-3
"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...

> Peter van Rooijen wrote:
>
> > Questions arise like: 'what is the value of a generic Smalltalk parser?' 'how should one treat dialect-specific constructs such as ## and {x.y}?'.
>
> Or, just for fun, my personal favourite "dialect-specific construct":
>
>     42rodd.
> ...

Yes, but in this case all the dialects (including Dolphin) that don't report this as a syntax error are wrong. This is not a difference in syntax between dialects, but a bug in a number of them.

Personally I think there is a lot of value to having a common core parser for Smalltalk - "standards" are open to interpretation, so a single interpretation of that standard is (IMO) better than many, even if it is necessary to extend that parser to support dialect-specific syntax such as for FFI calls, etc.

Regards

Blair
In reply to this post by John Brant
"John Brant" <[hidden email]> wrote in message
news:JCiY8.35250$[hidden email]...

> "Peter van Rooijen" <[hidden email]> wrote in message
> news:agnr8q$47e$[hidden email]...
> >
> > Okay, I see the value in that. I am still undecided about what my position is regarding the differences between the VA and the RB parser. Questions arise like: 'what is the value of a generic Smalltalk parser?' 'how should one treat dialect-specific constructs such as ## and {x.y}?'.
>
> Of course, I can question the usefulness of all the different dialects to have different syntax. For example, is the ##() really needed?

No, it is not really needed, and frankly I wish we'd never put it in. Nevertheless it is a fact of life now.

>...I replaced all occurrences of ##() in Dolphin once, and didn't really notice any slowdown (of course, I might not have run the right code that shows where it is needed).

Not all uses are performance optimizations. A common idiom is the ##(self) usage, which creates a "static binding" to the class without requiring an explicit reference to the global. Of course these could be rewritten too by exchanging the self reference for the class.

>...Would class variables be better to hold the ##() expressions?

Probably.

>...My biggest complaint with the ##() expressions is that once they are compiled, you can no longer do any of the standard Smalltalk searches (e.g., senders).

Eliot pointed out to me that this is easily "fixed" by adding the literals from the const expression to the literal frame of the method. In fact, because you list this as your biggest complaint, I have modified the Dolphin compiler for 5.02 so that it retains references in the literal frame. Although this loses some of the space-optimizing quality of the ##(), that probably isn't important. I've reduced the space overhead by retaining only symbol and global references (and literal arrays in case they contain any of those).

It may be that even this is not appropriate, and it is better just to retain all literals, so please reply with any comments you may have on that.

Right, so with that sorted, the next biggest problem is that the RB cannot rewrite such expressions because the parser does not handle them correctly :-).

> Anyway, I believe a generic Smalltalk parser is useful since it allows things like a fairly portable RB. Without the generic parser, there wouldn't be a generic parse tree rewriter which the RB uses to do all of its refactorings. Smalllint also uses the generic parser for most of its rules. The parsers in VW and VA are both set up to compile code. They are not set up to transform code, and the RB needs to transform code. Furthermore, without a generic parser, an RB for Dolphin would not have been possible to do in Smalltalk (since Dolphin's parser is in C++). I just wish the generic parser would become the standard parser for all Smalltalks.

FWIW, I agree.

>...

Regards

Blair
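To make the ##() discussion concrete, here is a small sketch of the idioms mentioned above, written as browser-style method source rather than as a workspace expression. The selectors, the class variable SecondsPerDay and the class Shape are hypothetical names chosen for illustration; only the ##( ) syntax itself is Dolphin's, and the behaviour described in the comments is as characterised in this thread.

```smalltalk
"Dolphin-specific: the expression inside ##( ) is evaluated when the
 method is compiled, and the result is stored in the method as a literal."
secondsPerDay
    ^##(24 * 60 * 60)

"Portable rewrite 1: a plain expression, evaluated on every call."
secondsPerDay
    ^24 * 60 * 60

"Portable rewrite 2: a class variable (SecondsPerDay, hypothetical),
 assigned once in a class-side initialize method."
initialize
    SecondsPerDay := 24 * 60 * 60

secondsPerDay
    ^SecondsPerDay

"The ##(self) idiom in a method of a hypothetical class Shape:
 a static binding to the class without naming the global."
prototype
    ^##(self) new

"The same method with the self reference exchanged for the class."
prototype
    ^Shape new
```

Only the first and fourth variants need a parser that understands ##( ); the others are ordinary Smalltalk, so the usual searches such as senders and references see them without any special handling.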
In reply to this post by Blair McGlashan
"Blair McGlashan" <[hidden email]> wrote in message
news:agu1s4$oa07f$[hidden email]...

> > 42rodd.
>
> Yes, but in this case all the dialects (including Dolphin) that don't report this as a syntax error are wrong. This is not a difference in syntax between dialects, but a bug in a number of them.
>
> Personally I think there is a lot of value to having a common core parser for Smalltalk

Blair,

I agree completely. This could work very well as a Camp Smalltalk project. A standard parser (based on ANSI, but designed from the start to support plug-ins for dialect/version-specific syntax constructs) would be great! This would yield great code to study as well as save a lot of work in many areas, and would also give us portable parse trees.

The beneficial effects of portable parse trees will be hard to overestimate. I would suggest that the parser/parse trees be designed from the start to support not only compilation, but also

- refactoring/rewriting
- source (re)formatting
- parse tree interpretation
- generation other than from source
- easy to keep around (cache) in the image
- easy queryability (so it's very simple - and fast - to ask things like System allMethods select: [:m | (m referencesGlobal_nameString: 'AbtMRIManager') and: [m sendsSelector: #pruneDeadRequests]])

(to name but a few things that come to mind)

Some elements of the basic design could then be

- two way references between all elements of the parse tree
- direct links into the original source if available (i.e., keeping the original source around)
- externalizability of the parse trees

What do you think?

Regards,

Peter van Rooijen

> - "standards" are open to interpretation, so a single interpretation of that standard is (IMO) better than many, even if it is necessary to extend that parser to support dialect specific syntax such as for FFI calls, etc.
>
> Regards
>
> Blair
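For a sense of what such a query can look like at the parse tree level, here is a hedged sketch using the pattern searcher that ships with the Refactoring Browser. The names RBParser and RBParseTreeSearcher are those of recent RB ports (older releases use ParseTreeSearcher), and the method source being searched, including the selector current, is invented for the example.

```smalltalk
"Sketch only: class names assume a recent RB port; the parsed method is made up."
| tree searcher sendsPrune referencesGlobal |
tree := RBParser parseMethod: 'cleanUp
    ^AbtMRIManager current pruneDeadRequests'.
sendsPrune := false.
referencesGlobal := false.
searcher := RBParseTreeSearcher new.
"Matches any send of #pruneDeadRequests; the double backquote asks the
 searcher to keep looking inside the matched receiver as well."
searcher
    matches: '``@receiver pruneDeadRequests'
    do: [:node :answer | sendsPrune := true].
"Matches any reference to the variable AbtMRIManager."
searcher
    matches: 'AbtMRIManager'
    do: [:node :answer | referencesGlobal := true].
searcher executeTree: tree.
sendsPrune & referencesGlobal
```

Running this over every method in an image means parsing every method first, which is the cost to weigh against asking the compiled methods directly.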
Peter.
You wrote in message news:aguhhf$j9d$[hidden email]...

[Blair wrote]
> > Personally I think there is a lot of value to having a common core parser for Smalltalk
> >...
> I agree completely. This could work very well as a Camp Smalltalk project. A standard parser (based on ANSI, but designed from the start to support plug-ins for dialect/version-specific syntax constructs) would be great! This would yield great code to study as well as save a lot of work in many areas, and would also give us portable parse trees.

So why not use the RB parser? It appears that it already satisfies most of your requirements. It is fast, well designed and I can testify that it is easy to extend, since we have already extended it (and associated formatters) to parse (reformat) our own special FFI call syntax. I don't see the point in inventing another parser since, quite apart from anything else, I doubt it will be used. The best chance of getting a common parser adopted is the RB parser. It is now in the Dolphin and VW images. Our intention is that it should replace our own parser. I don't know Cincom's plans, but having two parsers is not very satisfactory.

Regards

Blair
"Blair McGlashan" <[hidden email]> wrote in message
news:agujus$og559$[hidden email]...

> Peter.
>
> So why not use the RB parser? It appears that it already satisfies most of your requirements. It is fast, well designed and I can testify that it is easy to extend since we have already extended it (and associated formatters) to parse (reformat) our own special FFI call syntax. I don't see the point in inventing another parser since, quite apart from anything else, I doubt it will be used. The best chance of getting a common parser adopted is the RB parser. It is now in the Dolphin and VW images. Our intention is that it should replace our own parser. I don't know Cincom's plans, but having two parsers is not very satisfactory.

What about the rights to the code? If people are really going to depend on it, it would have to go PD, IMHO. That would be super-cool, and I'm sure it would generate a lot of interest. That would be a really great way of kickstarting this whole concept.

Regards,

Peter van Rooijen

> Regards
>
> Blair
In reply to this post by Blair McGlashan
Blair McGlashan wrote:
> Peter.
>
> You wrote in message news:aguhhf$j9d$[hidden email]...
>
> [Blair wrote]
> > > Personally I think there is a lot of value to having a common core parser for Smalltalk
> > >...
> > I agree completely. This could work very well as a Camp Smalltalk project. A standard parser (based on ANSI, but designed from the start to support plug-ins for dialect/version-specific syntax constructs) would be great! This would yield great code to study as well as save a lot of work in many areas, and would also give us portable parse trees.
>
> So why not use the RB parser? It appears that it already satisfies most of your requirements. It is fast, well designed and I can testify that it is easy to extend since we have already extended it (and associated formatters) to parse (reformat) our own special FFI call syntax. I don't see the point in inventing another parser since, quite apart from anything else, I doubt it will be used. The best chance of getting a common parser adopted is the RB parser. It is now in the Dolphin and VW images. Our intention is that it should replace our own parser. I don't know Cincom's plans, but having two parsers is not very satisfactory.

We haven't debriefed after vw7 to decide what to do yet, and given that both parsers work, eliminating one isn't a high priority for us; we have *lots* of much higher priority problems. However, at least some of these other projects involve cleaning up the code management system(s) (compiling, loading, browsing, change recording, version control), and in that context parsing is an important part of the mix.

My main concern is that as an engineering organization we don't have the freedom from constraints that a typical CampSmalltalk project has. I don't want to be forced by circumstance into the situation of appearing to obstruct such a project simply because vw engineering can't respond fast enough.

I therefore plead for any such project to appreciate the constraints of the vendors and for that project to try hard, especially when gathering requirements and when testing, to involve the vendors. For a project like this to succeed I think the vendors should be the (XP) customer. If such a project doesn't treat the vendors as the customer and delivers something they can't use, it won't get into the current generation of products, and that would affect many in the community. If, on the other hand, the project faces up to the practical and political difficulties up front, it has a much better chance of succeeding.

--
_______________,,,^..^,,,____________________________
Eliot "tired and shagged-out after a 9-month vw7 squawk" Miranda
In reply to this post by Blair McGlashan
[hidden email] (Blair McGlashan) wrote (abridged):
> The best chance of getting a common parser adopted is the RB parser. It is now in the Dolphin and VW images. Our intention is that it should replace our own parser.

Am I right in thinking that having the parser in Smalltalk is very different to having the compiler in Smalltalk, but that for some purposes it would be as useful? It could enable us to add new syntax, provided the new syntax could (in effect) be translated into the old syntax.

Dave Harris, Nottingham, UK
[hidden email]
http://www.bhresearch.co.uk/

"Weave a circle round him thrice,
 And close your eyes with holy dread,
 For he on honey dew hath fed
 And drunk the milk of Paradise."
On Mon, 15 Jul 2002, Dave Harris wrote:
> [hidden email] (Blair McGlashan) wrote (abridged):
> > The best chance of getting a common parser adopted is the RB parser. It is now in the Dolphin and VW images. Our intention is that it should replace our own parser.
>
> Am I right in thinking that having the parser in Smalltalk is very different to having the compiler in Smalltalk,

Er...I'm jumping into this thread rather late without checking back up it, *but*, AFAIK, every smalltalk parser is implemented in Smalltalk. This is absolutely true for Squeak.

> but that for some purposes it would be as useful? It could enable us to add new syntax, provided the new syntax could (in effect) be translated into the old syntax.

This is how the various alternative syntaxes in Squeak work, as well as the Prolog/V port, and even the pretty printer.

Re: Blair about RB parser, there's a pretty good chance that it will become "the" Squeak parser, as the current Squeak parser has some serious inadequacies. There's also the T-Gen Smalltalk Parser, but I suspect that the RB advantages (especially being part of the RB!!) are overwhelming.

Cheers,
Bijan Parsia.
In reply to this post by Dave Harris-3
> > The best chance of getting a common parser adopted is the RB parser. It is now in the Dolphin and VW images. Our intention is that it should replace our own parser.

... and GNU Smalltalk too :-) -- not yet released, but working on my hard disk.

Paolo
In reply to this post by Bijan Parsia-2
[hidden email] (Bijan Parsia) wrote (abridged):
> Er...I'm jumping into this thread rather late without checking back up it, *but*, AFAIK, every smalltalk parser is implemented in Smalltalk. This is absolutely true for Squeak.

I don't believe it's true for Dolphin. As I understand it, Dolphin's parser is part of its compiler, and both are written in C++.

I gather they plan to move to a Smalltalk parser. I don't know if they plan to move to a Smalltalk compiler, too. If they do, my question becomes almost moot :-)

I say almost, because even if Dolphin's compiler was written in Smalltalk and open to hacking, hacks to it would presumably not be portable to other Smalltalk dialects. I imagine standardising the compiler is much harder than standardising the parser. All the vendors want their own bytecodes because they have different ideas about what the best bytecodes are; bytecode design is part of VM design. So standardisation should stop at the parser level.

Hence my earlier question: am I right in thinking standardisation at the parser level is almost as useful as standardising the entire compiler?

Dave Harris, Nottingham, UK
[hidden email]
http://www.bhresearch.co.uk/

"Weave a circle round him thrice,
 And close your eyes with holy dread,
 For he on honey dew hath fed
 And drunk the milk of Paradise."
On Tue, 16 Jul 2002, Dave Harris wrote:
> [hidden email] (Bijan Parsia) wrote (abridged):
> > Er...I'm jumping into this thread rather late without checking back up it, *but*, AFAIK, every smalltalk parser is implemented in Smalltalk. This is absolutely true for Squeak.
>
> I don't believe it's true for Dolphin. As I understand it, Dolphin's parser is part of its compiler, and both are written in C++.

Wow. I'm astonished. I wouldn't have thought the performance demands of a typical smalltalk compiler or parser required non-Smalltalk code, and I wouldn't have thought that a Smalltalk vendor would have *preferred* to write 'em in C++ :) Shoot my expectations in the foot! ;)

> I gather they plan to move to a Smalltalk parser. I don't know if they plan to move to a Smalltalk compiler, too. If they do, my question becomes almost moot :-)

Heh.

> I say almost, because even if Dolphin's compiler was written in Smalltalk and open to hacking, hacks to it would presumably not be portable to other Smalltalk dialects. I imagine standardising the compiler is much harder than standardising the parser. All the vendors want their own bytecodes because they have different ideas about what the best bytecodes are; bytecode design is part of VM design. So standardisation should stop at the parser level.

You mean "stop at the parse node classes level"? There could be hooks for code transformation and optimization before hitting the bytecode, and having standard tools for manipulating and analyzing the bytecode is nice.

> Hence my earlier question: am I right in thinking standardisation at the parser level is almost as useful as standardising the entire compiler?

Probably. Though, really, control at lower levels is nice too, especially if you're implementing languages (like Prolog) or language extensions (like Server Pages).

Indeed, I'm so used to having that access, I didn't imagine that it wouldn't be in Dolphin (Squeak and VisualWorks having substantially the same history, their compilers are in Smalltalk).

Cheers,
Bijan Parsia.
Bijan, Dave,

"Bijan Parsia" <[hidden email]> wrote in message news:[hidden email]...
> On Tue, 16 Jul 2002, Dave Harris wrote:
>
> > [hidden email] (Bijan Parsia) wrote (abridged):
> > > Er...I'm jumping into this thread rather late without checking back up it, *but*, AFAIK, every smalltalk parser is implemented in Smalltalk. This is absolutely true for Squeak.
> >
> > I don't believe it's true for Dolphin. As I understand it, Dolphin's parser is part of its compiler, and both are written in C++.
>
> Wow. I'm astonished. I wouldn't have thought the performance demands of a typical smalltalk compiler or parser required non-Smalltalk code, and I wouldn't have thought that a Smalltalk vendor would have *preferred* to write 'em in C++ :) Shoot my expectations in the foot! ;)

The Dolphin compiler (and parser) *is* currently written in C++. The reason is that we needed somewhere to start when we were first writing the product back in 1995. If you like, it was "the end of the recursion" when getting the first Dolphin to boot. We had always intended to rewrite the beast in Smalltalk but have never got around to it, because at least the C++ one works and there have always been plenty more urgent things to attend to. Sorry about your foot BTW!

> > I gather they plan to move to a Smalltalk parser. I don't know if they plan to move to a Smalltalk compiler, too. If they do, my question becomes almost moot :-)
>
> Heh.

We now make use of the RB parser as part of the refactorings in Dolphin 5, and we have had to put the work into getting this to correctly handle Dolphin syntax (the external interface syntax in particular). This means we currently have two parsers in the system, which is no good thing since they can quite possibly disagree in certain situations. So our intention (no timescale yet) is to replace both the C++ compiler and parser by the RB parser and a compiler written in Smalltalk.

The interesting side effect of this is that compilation will probably be faster. IIRC Blair did some tests and the RB parser was significantly faster than the C++ one. Okay, okay, I wrote the C++ version!!

[snip]

> Indeed, I'm so used to having that access, I didn't imagine that it wouldn't be in Dolphin (Squeak and VisualWorks having substantially the same history, their compilers are in Smalltalk).

Squeak and VW never had to be rewritten from scratch though. As far as I remember they are both directly descended from the original PARC image tapes.

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com

---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---
In reply to this post by Peter van Rooijen
"Peter van Rooijen" <[hidden email]> wrote in message
news:aguhhf$j9d$[hidden email]...

> The beneficial effects of portable parse trees will be hard to overestimate. I would suggest that the parser/parse trees be designed from the start to support not only compilation, but also
>
> - refactoring/rewriting
> - source (re)formatting
> - parse tree interpretation

The RB parser has these three. The parse tree interpreter works with both VA and VW and could easily be ported to Dolphin.

> - generation other than from source

I'm not sure what you are wanting here, but it is possible to construct an RB parse tree without parsing. Since they are just Smalltalk objects, you can just create them directly.

> - easy to keep around (cache) in the image

I don't think this is a good idea. A parse tree is ~15x larger than the corresponding source. In my VW image, there are >56,000 methods and it takes only 30 seconds to parse them all. Of the 30 seconds, 18 seconds are spent retrieving the source (XML parsing; the files are already cached in memory) and only 12 seconds are for the RB parser. Caching all the parse trees takes almost 200MB. Performing a global GC of 200MB takes 1.4 seconds.

> - easy queryability (so it's very simple - and fast - to ask things like System allMethods select: [:m | (m referencesGlobal_nameString: 'AbtMRIManager') and: [m sendsSelector: #pruneDeadRequests]])

The RB parse trees have support for such things. However, it is generally better to perform these queries directly on the compiled methods since they are already optimized for such queries.

> - two way references between all elements of the parse tree

I don't understand what you want here. The RB parse tree nodes have parent and children references, if that is what you are wanting.

> - direct links into the original source if available (i.e., keeping the original source around)

The RB parse tree nodes have source locations for all tokens. Nodes that are added/changed after parsing will get their locations set to nil.

> - externalizability of the parse trees

For the reasons given above, I think it would be better to have the externalization be the source, not a parse tree.

John Brant
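The points above about parent and children references, source locations, and building trees without parsing can be sketched in a few lines. This is an illustration under assumptions, not code from the thread: the class and selector names (RBParser, RBMessageNode, RBVariableNode, receiver:selector:, named:) are those of recent RB ports and may differ in older releases.

```smalltalk
"Sketch only: selector names assume a recent RB port."
| parsed built |
parsed := RBParser parseExpression: 'account balance'.

"Navigation: child nodes point back to their parent."
parsed receiver parent == parsed.
parsed children size.

"Source locations recorded by the parser for this node."
parsed start.
parsed stop.

"Build an equivalent tree directly, without parsing any source."
built := RBMessageNode
    receiver: (RBVariableNode named: 'account')
    selector: #balance.

"Node equality compares structure, which is what isUnmodified-style checks rely on."
built = parsed.
built formattedCode
```

Since a hand-built tree can always be turned back into source with formattedCode, the source text remains a workable external form, which is the trade-off argued for above.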