I have just downloaded the Dolphin version of SmaCC, with the vague idea that I might try to produce a better HTML parser than the SqueakMap one. However, I have just spent a frustrating time trying to understand how the example parsers in the distribution work. I have concluded that either I am a prize idiot (which is always possible, of course :-) ) or the example Smalltalk parser is seriously broken. If anyone else has tried this, could you please put me out of my misery.

I first tried just reading the parser specification, most of which seemed fairly easy to understand. I was puzzled, however, to find references to a method #asQualifiedReference in the actions, because I could find no trace of such a method. I thought 'Oh well, let's try parsing some real Smalltalk, no doubt that will show me where I am wrong.' However, I have been unable to get anything to parse correctly with the code as distributed.

After a lot of single-stepping through the debugger, I decided that the parser was not parsing numbers correctly. As a check, I asked it to evaluate:

    StParser parseExpression: '1'

and sure enough it failed, saying 'StScanner cannot understand #id'. The problem is that the #id is meant to be sent to a SmaCCToken, but the scanner is not returning a token for a number. I tracked this down to the method StScanner>>number, whose code reads:

    number
        stream skip: -1.
        self scanNumber: self numberId

I have verified that #scanNumber: /is/ returning a token, but this method is not. I put a caret at the start of the last line, and now it works correctly. I am not yet sure enough about Smalltalk syntax to know whether this is obvious.

However, although I can now parse numbers correctly, the parser still falls over on my original test input (which is in fact the source code of the method StParser class>>simplifyExpression: - I just chose a method more or less at random). Now I have no idea where I can go, because I cannot find my way around the relevant bits of the class hierarchy. The relevant error message is 'StMethodNode does not understand #tags:', but the highlighted code in the debugger (in method StParser>>reduceActionForMethod3:) seems to be addressing a class RBMethodNode, which does not exist; when I highlight this name and try to browse it, a browser opens on StMethodNode. However, assuming that these two class names have somehow been made aliases, I looked at the code for the failing method in StParser:

    reduceActionForMethod3: nodes
        ^(RBMethodNode
            selector: (nodes at: 1) first contents asSymbol
            arguments: (nodes at: 1) last
            body: (nodes at: 2) last)
            tags: (nodes at: 2) first;
            yourself

and noticed that StMethodNode has a method #tag: but not #tags:, so I changed the code above to refer to #tag: - this gave me a parse which completed successfully, though the inspector said the result is an invalid StMethodNode - I suspect this is because the #printOn: method does not work correctly.

There seem to be curious interactions between the class StParser in the SmaCC package and some of the classes in the Smalltalk Parser package; does this explain some of my problems? Even if so, it does not explain the changes I had to make to the StParser and StScanner methods mentioned above.

By the way, I did run the unit tests supplied with the SmaCC package, and they all passed. It does look as though the problems are in the example StParser, and maybe not in the superclass SmaCCParser. On the other hand, the method StParser>>reduceActionForMethod3: is automatically generated, so I am still worried about trusting SmaCC too much.

Sorry this has gone on so long. Typing it out is a sort of therapy, but I would really like to know where I have gone wrong. Should I try T-Gen instead?

Thanks in advance

Peter Kenny

On Tue, 28 Sep 2004 23:47:21 +0100, Peter Kenny <[hidden email]> wrote:

> By the way, I did run the unit tests supplied with the SmaCC package, and
> they all passed. It does look as though the problems are in the example
> StParser, and maybe not in the superclass SmaCCParser. On the other hand,
> the method StParser>>reduceActionForMethod3: is automatically generated,
> so I am still worried about trusting SmaCC too much.
>
> Sorry this has gone on so long. Typing it out is a sort of therapy, but I
> would really like to know where I have gone wrong. Should I try T-Gen
> instead?

Did you try the tutorial and other links in the last paragraph of http://www.refactory.com/Software/SmaCC/ ? Last I tried, CParser was working (CParser parse: 'int x = 1+1;'). Didn't try StParser though.

--
Regards
HweeBoon
MotionObj

"Yar Hwee Boon" <[hidden email]> wrote in message
news:[hidden email]...

> Did you try the tutorial and other links in the last paragraph of
> http://www.refactory.com/Software/SmaCC/ ? Last I tried, CParser was
> working (CParser parse: 'int x = 1+1;'). Didn't try StParser though.

Yes, I read through the tutorials etc. before I even downloaded the software, but they are pretty superficial. I can't yet work out, for example, whether I can fix the scanner so that it automatically treats accented letters in French and German as letters, or whether I have to enumerate them all. I find the only way to understand something like this is to work through code in detail. I have lots of Smalltalk code available to try out, but no C, so I chose StParser. Not encouraging so far!

Peter Kenny

In reply to this post by Peter Kenny-2
Peter Kenny wrote:
> I first tried just reading the parser specification, most of which seemed
> fairly easy to understand. I was puzzled, however, to find references to a
> method #asQualifiedReference in the actions, because I could find no trace
> of such a method.

The parser was written in VisualWorks. VW supports things like "Foo.Bar baz". Foo.Bar is a qualified reference. If you try to parse "Foo.Bar baz" in Dolphin, you'll get an error, but in VW it should work.

> After a lot of single-stepping through the debugger, I decided that the
> parser was not parsing numbers correctly. As a check, I asked it to
> evaluate:
>
>     StParser parseExpression: '1'
>
> and sure enough it failed, saying 'StScanner cannot understand #id'. The
> problem is that the #id is meant to be sent to a SmaCCToken, but the scanner
> is not returning a token for a number. I tracked this down to the method
> StScanner>>number, whose code reads:
>
>     number
>         stream skip: -1.
>         self scanNumber: self numberId

Yes, this should have a return. In an earlier version of SmaCC, the #createTokenFor: method would execute a block with a non-local return to return the token, but this was changed to make it more explicit. However, as you have noticed, the StParser examples didn't get changed. In addition to the #number method, you also need to change the #negativeNumber method.

> However, although I can now parse numbers correctly, the parser still falls
> over on my original test input (which is in fact the source code of the
> method StParser class>>simplifyExpression: - I just chose a method more or
> less at random). Now I have no idea where I can go, because I cannot find my
> way around the relevant bits of the class hierarchy. The relevant error
> message is 'StMethodNode does not understand #tags:', but the highlighted
> code in the debugger (in method StParser>>reduceActionForMethod3:) seems to
> be addressing a class RBMethodNode, which does not exist; when I highlight
> this name and try to browse it, a browser opens on StMethodNode. However,
> assuming that these two class names have somehow been made aliases, I looked
> at the code for the failing method in StParser:
>
>     reduceActionForMethod3: nodes
>         ^(RBMethodNode
>             selector: (nodes at: 1) first contents asSymbol
>             arguments: (nodes at: 1) last
>             body: (nodes at: 2) last)
>             tags: (nodes at: 2) first;
>             yourself
>
> and noticed that StMethodNode has a method #tag: but not #tags:, so I
> changed the code above to refer to #tag: - this gave me a parse which
> completed successfully, though the inspector said the result is an invalid
> StMethodNode - I suspect this is because the #printOn: method does not work
> correctly.

This was generated by SmaCC from the code in the Method productions. For example, the code above came from "{(RBMethodNode selector: '1' first contents asSymbol arguments: '1' last body: '2' last) tags: '2' first; yourself}" -- the code for the third production of Method. If you remove the #tags: message, then it should work. In VW, there can be multiple primitive/annotation tags for a single method.

> There seem to be curious interactions between the class StParser in the
> SmaCC package and some of the classes in the Smalltalk Parser package; does
> this explain some of my problems? Even if so, it does not explain the
> changes I had to make to the StParser and StScanner methods mentioned above.

Yes, the Smalltalk Parser package was derived from the Refactoring Browser's parser, and the StParser class was written for the RB's parser nodes.

> By the way, I did run the unit tests supplied with the SmaCC package, and
> they all passed. It does look as though the problems are in the example
> StParser, and maybe not in the superclass SmaCCParser. On the other hand,
> the method StParser>>reduceActionForMethod3: is automatically generated, so
> I am still worried about trusting SmaCC too much.

Everything you mention was in the StParser. None of the bugs were generated from SmaCC, but were in the code for the StParser.

John Brant

"John Brant" <[hidden email]> wrote in message
news:[hidden email]...

> Everything you mention was in the StParser. None of the bugs were
> generated from SmaCC, but were in the code for the StParser.

Thanks for the quick reply. I'm glad to know that it was not all down to my stupidity. I understand the problems of maintaining multiple versions of code, but it did give an unfortunate first impression. As a side effect, however, crawling all over the program trying to find the bugs has taught me quite a bit about how it all works, so the frustration was not all in vain. I shall press on with my own experiments with SmaCC.

Peter Kenny

In reply to this post by John Brant
John Brant wrote:
> Everything you mention was in the StParser. None of the bugs were
> generated from SmaCC, but were in the code for the StParser.

John,

I just got interested enough in SmaCC myself to try it out. Following the tutorial (just up to the first use of 'Compile LALR(1)') results in a message

    Error: Object class>>doIt at line 1: unrecognised character '`'

being written to the Transcript as the classes are generated. Running the test cases does the same thing, although it tends to get lost in the other warnings.

Putting a check in SmalltalkParser>>evaluateStatements shows that it's attempting to compile and evaluate the expression:

    Character value: `{:dict | RBLiteralNode value: (dict at: 'literal') value codePoint}

as a result of SmalltalkParser>>parseOptimisedExpression attempting to deal with:

    ##( ... as above ...)

which ultimately seems to come from SmaCCNode>>addImplementationSpecificRewritesTo:

I suspect that the re-write rule is intended to /generate/ a compile-time expression, not to have the ##(...) expression evaluated at once as the rule is created. Perhaps RBPatternParser needs an override of parseOptimisedExpression ?

    -- chris