... for anyone who has some time to spare. I am in the process of rewriting
my ChunkBrowser goodie (it was getting a little messy internally) and I spent a couple of hours this morning on the following. I thought it might be worth mentioning here as a sort of programming exercise just to see how many different ways of doing it we could come up with - no winners, losers or prizes though. It's quite simple. Given a String containing a Smalltalk method which is known to be syntactically and semantically valid (although not necessarily compilable in the current image) you have to answer the full selector of the method contained in the String. I've included a little framework below that runs all the methods in the current image through the parser and reports how many were parsed incorrectly and how long it takes. The framework also runs a empty block to give a ratio which should, I hope, be independent of the underlying machine speed. All you have to do is edit the #parse: method (adding more methods if/as needed) ... The idea is to get 0 errors and a ratio that is as low as possible. The current result of my efforts (I'll post it if there is any interest, but it's not pretty!) on a Default D4.01 image, which I imagine can be improved on, is Incorrect: 0 Comp: 1277 Parse: 3358 Ratio: 2.63 Ian Snip and save the following into a file. File/FileIn to load it into a Dolphin 4 image Start up with "Parser new run" -~-~-~ snip "Filed out from Dolphin Smalltalk 2000 release 4.01"! Object subclass: #Parser instanceVariableNames: '' classVariableNames: '' poolDictionaries: '' classInstanceVariableNames: ''! Parser comment: ''! Parser guid: (GUID fromString: '{65806B9F-2D03-41AF-8B60-7ADF22505F6B}')! !Parser categoriesForClass!Unclassified! ! !Parser methodsFor! getComparisonTime ^Time millisecondsToRun: [ Class allMethodsDo: [:each | each getSource]]! getParseTime | wrong | ^(Time millisecondsToRun: [ wrong := 0. Class allMethodsDo: [:each | (self parse: each getSource) = each selector asString ifFalse: [wrong := wrong + 1]]]) -> wrong! parse: aString ^'Your Code Here'! run | comparison parse | MemoryManager current collectGarbage. comparison := self getComparisonTime. MemoryManager current collectGarbage. parse := self getParseTime. MessageBox notify: (String writeStream nextPutAll: 'Incorrect: '; print: parse value; cr; nextPutAll: 'Comparison (mS): '; print: comparison; cr; nextPutAll: 'Parse (mS): '; print: parse key; cr; nextPutAll: 'Parse/Comparison ratio: ' ; print: ((parse key / comparison) roundTo: 0.01); contents)! ! !Parser categoriesFor: #getComparisonTime!*-unclassified!public! ! !Parser categoriesFor: #getParseTime!*-unclassified!public! ! !Parser categoriesFor: #parse:!*-unclassified!public! ! !Parser categoriesFor: #run!*-unclassified!public! ! |
"Ian Bartholomew" <[hidden email]> schrieb im Newsbeitrag
news:jwUF7.3628$ur6.217388@wards... > It's quite simple. Given a String containing a Smalltalk method which is > known to be syntactically and semantically valid (although not necessarily > compilable in the current image) you have to answer the full selector of the > method contained in the String. Here's my untested ad-hoc solution (assuming that there are spaces between the selector fragments and the parameters): parse: aString stream := aString readStream. token := stream upTo: $ . token last = $: ifTrue: [ selector := WriteStream on: (String new: 64). [token last = $:] whileTrue: [ selector nextPutAll: token. stream skipTo: $ . token := stream upTo: $ ]. ^selector contents]. ^token One could probably improve things by adding a few #skipSeparators and rewriting the #upTo: lines to either read a sequence of allowed binary selector characters or any number of letters, digits or underlines upto and including the : bye -- Stefan Matthias Aust // Truth Until Paradox |
Stefan,
> One could probably improve things by adding a few #skipSeparators and > rewriting the #upTo: lines to either read a sequence of allowed binary > selector characters or any number of letters, digits or underlines upto > and including the : There's a bit more to it though. Your code, when inserted in the framework, causes a walkback which prevents it running. I got round that (QwikFix) by replacing stream skipTo: $ . with stream skipWhile: [:c | c isWhitespace]. which allows for multiple spaces between parameters. I also had to remove the Inspector>>x method from the 4.01 image because I couldn't see a way of getting round that. (Aside: I've never noticed that method before - what, if anything, is it supposed to do??) After doing this the framework runs with your method, and quite fast too, but ends up with 12959 errors where the parsed selector doesn't match the actual selector. It's one of those problems that looks quite simple but there are a lot of little subtleties that can trip you up. I found it an interesting challenge anyway <g> Regards Ian |
In reply to this post by Ian Bartholomew-5
Ian,
My stab is below. It includes some extensions to ReadStream (is this allowed?), although it would be more appropriate to create a dedicated subclass. Results are: Incorrect: 0 Comparison (mS): 1789 Parse (mS): 4232 Parse/Comparison ratio: 2.35 It's an interesting excerise; in particular I hadn't considered the use of comments in a method selector, e.g. WebBrowserShell>>OnBeforeNavigate2... Thanks for a novel challenge. John Aspinall Solutions Software http://www.solutionsoft.co.uk/restore --- !Parser methodsFor! parse: aString | readStream nextSection writeStream | readStream := ReadStream on: aString. nextSection:= readStream firstSection. "Quick check for unary/binary selectors" (nextSection last == $:) ifFalse: [^nextSection]. writeStream := WriteStream on: (String new: 50). [writeStream nextPutAll: nextSection. readStream skipArgument. nextSection := readStream nextSection. nextSection notEmpty and: [(nextSection last == $:)]] whileTrue. ^writeStream contents! ! !Parser categoriesFor: #parse:!*-unclassified!public! ! !ReadStream methodsFor! firstSection "Optimised version of nextSection for analysing beginning of method" self skipWhile: [:c | c isSeparator not and: [(c == $:) ifTrue: [self next. false] ifFalse: [true]]]. ^self contents copyFrom: 1 to: self position ! nextSection "Return the next 'section' from the head of a method" | first | self skipSeparators ifFalse: [^'']. "Skip any comments" self peek == $" ifTrue: [self next; skipTo: $". ^self nextSection]. first := self position. self skipWhile: [ :c | c isAlphaNumeric or: [c == $_]]. self peek == $: ifTrue: [self next]. ^self contents copyFrom: first+1 to: self position ! skipArgument self skipSeparators ifFalse: [^nil]. "Skip any preceeding comments" self peek == $" ifTrue: [self next; skipTo: $". ^self skipArgument]. self skipWhile: [:c | c isSeparator not] ! ! !ReadStream categoriesFor: #firstSection!positioning!private! ! !ReadStream categoriesFor: #nextSection!positioning!private! ! !ReadStream categoriesFor: #skipArgument!positioning!private! ! Ian Bartholomew <[hidden email]> wrote in message news:jwUF7.3628$ur6.217388@wards... > ... for anyone who has some time to spare. I am in the process of rewriting > my ChunkBrowser goodie (it was getting a little messy internally) and I > spent a couple of hours this morning on the following. I thought it might be > worth mentioning here as a sort of programming exercise just to see how many > different ways of doing it we could come up with - no winners, losers or > prizes though. > > It's quite simple. Given a String containing a Smalltalk method which is > known to be syntactically and semantically valid (although not necessarily > compilable in the current image) you have to answer the full selector of the > method contained in the String. > > I've included a little framework below that runs all the methods in the > current image through the parser and reports how many were parsed > incorrectly and how long it takes. The framework also runs a empty block to > give a ratio which should, I hope, be independent of the underlying machine > speed. All you have to do is edit the #parse: method (adding more methods > if/as needed) ... > > The idea is to get 0 errors and a ratio that is as low as possible. The > current result of my efforts (I'll post it if there is any interest, but > it's not pretty!) on a Default D4.01 image, which I imagine can be improved > on, is > > Incorrect: 0 > Comp: 1277 > Parse: 3358 > Ratio: 2.63 > > Ian > > Snip |
In reply to this post by Ian Bartholomew-5
"Ian Bartholomew" <[hidden email]> schrieb im Newsbeitrag
news:9s9gsv$128ft4$[hidden email]... > There's a bit more to it though. Your code, when inserted in the framework, > causes a walkback which prevents it running. You're right. Below is a version that is actually working !Parser methodsFor! parse: aString | stream scan token selector | token := (stream := aString readStream) nextToken. token last == $: ifFalse: [^token]. selector := String writeStream: 128. [token last == $: and: [token first isLetter or: [token first == $_]]] whileTrue: [ selector nextPutAll: token. token := stream nextToken; nextToken]. ^selector contents! ! !Parser categoriesFor: #parse:!*-unclassified!public! ! !Stream methodsFor! nextToken | s n | self skipSeparators ifFalse: [^' ']. self peek == $" ifTrue: [^self next; skipTo: $"; nextToken]. s := String writeStream: 128. [self atEnd ifTrue: [^' ']. (n := self next) isSeparator or: [n == $:]] whileFalse: [s nextPut: n]. n == $: ifTrue: [s nextPut: n]. ^s contents! ! !Stream categoriesFor: #nextToken!accessing!public! ! Unfortunately, John Aspinall's contribution is faster :-( My ratio is only 2.88. I think, his solution is near the optimum. One could remove the notEmpty test by assuring that the nextSection will always return at least one character and one could also remove a few "self"s from the Stream methods. Interestingly, if I run John's code, I get a ration of 2.82. (1120ms comparing and 3155ms parsing). bye -- Stefan Matthias Aust // Truth Until Paradox |
In reply to this post by John Aspinall-2
John,
> My stab is below. It includes some extensions to ReadStream (is this > allowed?), although it would be more appropriate to create a dedicated > subclass. The only rule was that there are no rules <g> > It's an interesting excerise; in particular I hadn't considered the use of > comments in a method selector, e.g. WebBrowserShell>>OnBeforeNavigate2... Yes, that's the one that caught me. One other way in which your code fails (there are no instances of this in the image so I suppose it's not really applicable to this thread) is if there is any whitespace before the first character. Easily fixed by a couple of small changes to #firstSection. One other thing I thought of while writing this - comments *before* the selector are legitimate and compile correctly. No one would do that though - would they <g> I also found the exercise interesting, which was why I mentioned it in the group. My first version, in the original ChunkBrowser, used Streams but I thought that a version that accessed the String directly would be quicker. I was surprised to find that not only was there little difference in speed but the non-stream version was much more complicated than I expected. I've posted it below, but please wear protective glasses before looking at it. I think I'll go back to a stream version for the ChunkBrowser though, the slight difference in performance is probably not worth the ugly code!! Regards Ian "Filed out from Dolphin Smalltalk 2000 release 4.01"! Parser subclass: #IanParser instanceVariableNames: '' classVariableNames: '' poolDictionaries: '' classInstanceVariableNames: ''! IanParser comment: ''! IanParser guid: (GUID fromString: '{0CE35693-FC18-4002-8DD5-518C96D32430}')! !IanParser categoriesForClass!Unclassified! ! !IanParser methodsFor! parse: aString | position positionEnd selector | selector := String new. position := 1. "skip whitespace" [(aString at: position) isSeparator] whileTrue: [position := position + 1]. "just a selector" positionEnd := (position to: aString size) detect: [:i | (self validSelectorCharacters includes: (aString at: i)) not] ifNone: [0]. positionEnd = 0 ifTrue: [^aString copyFrom: position]. "One of the operators or a selector with no args" (aString at: positionEnd) = $: ifFalse: [^aString copyFrom: position to: positionEnd - 1]. "A selector with an argument so iterate, picking up the selector but ignoring the args" selector := aString copyFrom: position to: positionEnd. position := positionEnd. [ "skip whitespace (starting from :)" [position := position + 1. (aString at: position) isSeparator] whileTrue. "skip argument" position := (position to: aString size) detect: [:i | (self validArgumentCharacters includes: (aString at: i)) not] ifNone: [^selector]. "skip whitespace (starting from )" [(aString at: position) isSeparator] whileTrue: [position := position + 1]. "skip comments" [position < aString size and: [(aString at: position) = $"]] whileTrue: [ position := position + 1. [(aString at: position) = $"] whileFalse: [position := position + 1]. position := position + 1. [position < aString size and: [(aString at: position) isSeparator]] whileTrue: [position := position + 1]]. "possibly the next part of a selector - nb use argument characters for selector now" positionEnd := (position to: aString size) detect: [:i | (self validArgumentCharacters includes: (aString at: i)) not] ifNone: [^selector]. "Not a : means that we have found the start of the method text so use what we have" (aString at: positionEnd) = $: ifFalse: [^selector]. "Append the new part of the selector and continue" selector := selector , (aString copyFrom: position to: positionEnd). position := positionEnd] repeat. self halt "shouldn't get here"! validArgumentCharacters ^'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_'! validSelectorCharacters ^'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~_->=<|&,+*/ \%@'! ! !IanParser categoriesFor: #parse:!*-unclassified!public! ! !IanParser categoriesFor: #validArgumentCharacters!*-unclassified!public! ! !IanParser categoriesFor: #validSelectorCharacters!*-unclassified!public! ! |
In reply to this post by Stefan Matthias Aust
Stefan,
> You're right. Below is a version that is actually working Thanks, that runs fine and copes with the leading whitespace that I mentioned in the other thread. > Unfortunately, John Aspinall's contribution is faster :-( My ratio is only > 2.88. I think, his solution is near the optimum. One could remove the > notEmpty test by assuring that the nextSection will always return at least > one character and one could also remove a few "self"s from the Stream > methods. It does seem that whatever way you do it the 2.5 to 3 ratio is about the best that can be achieved. I did wonder, I haven't had any time to try much today, if defining (big) Streams in an instVar and then reusing them, rather than creating a new one for each String, might save a bit of time because of the reduced object creation/garbage collection. > Interestingly, if I run John's code, I get a ration of 2.82. (1120ms > comparing and 3155ms parsing). I was seeing a similar difference in the ratio as well. I did wonder if the amount of memory or OS in John's system might make a difference. There does also seem a little bit of difference between results even when running the framework on the same system. I'm not sure why. Regards Ian |
"Ian Bartholomew" <[hidden email]> schrieb im Newsbeitrag
news:jjbG7.4050$h4.261803@stones... > It does seem that whatever way you do it the 2.5 to 3 ratio is about the > best that can be achieved. I did wonder, I haven't had any time to try much > today, if defining (big) Streams in an instVar and then reusing them, rather > than creating a new one for each String, might save a bit of time because of > the reduced object creation/garbage collection. Don't think so. I looked at a profile generated with your Profiler and it seems that most time is spent in #isSeparator, #skipSeparators and skipping comments. My image contains some 20600 methods, so the memory for all streams is less than 6 MB (in my code). I don't think that this would trigger a GC already. > I was seeing a similar difference in the ratio as well. I did wonder if the > amount of memory or OS in John's system might make a difference. I've 384 MB if that matters. bye -- Stefan Matthias Aust // Truth Until Paradox |
In reply to this post by Ian Bartholomew-5
> I was seeing a similar difference in the ratio as well. I did wonder if
> the amount of memory or OS in John's system might make a difference. > There does also seem a little bit of difference between results even > when running the framework on the same system. I'm not sure why. FWIW, I was getting results varying between 2.29 and 2.39 - your code gives similar results here. Oddly, both show up significantly less (1.5ish) on the first run after firing up the image, the difference being due to a higher comparison figure. Cheers. John |
In reply to this post by Ian Bartholomew-5
Thanks all...
I had a look through all the replies to this post, both here and via e-mail, mulled it all over and came up with the following implementation. I think it's a reasonable compromise between readability, efficiency and accuracy [1] although others may disagree. Thanks to all who had a go. I think this is probably closest to Stefan's version but they all played a part in the thinking process. Regards Ian [1] From a quick series of test's I couldn't fool it, but maybe somebody else can? !String methodsFor! parseSelector "Answers a String representing the Smalltalk selector parsed from the start of the receiver. Embedded comments are ignored Test on the current image with - Class allMethodsDo: [:each | (each getSource parseSelector) = each selector asString ifFalse: [self halt]] Thanks to posters in the Dolphin newsgroup, and via e-mail, for suggestions on the best ways to implement this" | readStream writeStream token | #idbAdded. readStream := self readStream. (token := readStream nextSelectorToken) isSelector ifFalse: [^token]. writeStream := String writeStream: 100. [token isSelector] whileTrue: [ writeStream nextPutAll: token. token := readStream nextSelectorToken; nextSelectorToken]. ^writeStream contents! ! !String categoriesFor: #parseSelector!accessing!idb goodies!private! ! !PositionableStream methodsFor! nextSelectorToken "Answers the next token on the receiver and leaves the receiver positioned immediately after the token. A token in this context refers to any sequence of non separator characters terminated by either a separator, the end of the stream or a :. Used for parsing method selectors and arguments. Embedded comments, or sequences of comments, are skipped. This method may answer an empty String" | start | #idbAdded. self skipSeparators ifFalse: [^String new]. (self peekFor: $") ifTrue: [ self skipTo: $". ^self nextSelectorToken]. start := position + 1. [ | char | self atEnd ifTrue: [^collection copyFrom: start to: readLimit]. char := self next. char isSeparator ifTrue: [^collection copyFrom: start to: position - 1]. char = $: ifTrue: [^collection copyFrom: start to: position]] repeat! ! !PositionableStream categoriesFor: #nextSelectorToken!accessing!idb goodies!private! ! !String methodsFor! isSelector "Answers true if the receiver is a valid part of a method selector. i.e. is in the format [A-Za-z_]{anything}[:]" #idbAdded. ^self notEmpty and: [self last = $: and: [| char | (char := self first) isLetter or: [char = $_]]]! ! !String categoriesFor: #isSelector!idb goodies!private!testing! ! |
My last word on this subject (although it's been bugging me I suspect
everyone else is getting cheesed off with it <g>) Chris Uppal pointed out that missing separators cause the parser to fail - test^self +arg^arg+1 as well as the unexpected (!) :arg^arg+1 all compile and work but fail the existing parser. I've got a new version that is a bit more complex, I've moved it all into a separate class, but works with everything that I can think of to throw at it and with very little performance penalty. If anyone wants a copy the please mail me. Regards Ian |
Free forum by Nabble | Edit this page |