A little light exercise ...

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

A little light exercise ...

Ian Bartholomew-5
... for anyone who has some time to spare. I am in the process of rewriting
my ChunkBrowser goodie (it was getting a little messy internally) and I
spent a couple of hours this morning on the following. I thought it might be
worth mentioning here as a sort of programming exercise just to see how many
different ways of doing it we could come up with - no winners, losers or
prizes though.

It's quite simple. Given a String containing a Smalltalk method which is
known to be syntactically and semantically valid (although not necessarily
compilable in the current image) you have to answer the full selector of the
method contained in the String.

I've included a little framework below that runs all the methods in the
current image through the parser and reports how many were parsed
incorrectly and how long it takes. The framework also runs a empty block to
give a ratio which should, I hope, be independent of the underlying machine
speed.  All you have to do is edit the #parse: method (adding more methods
if/as needed) ...

The idea is to get 0 errors and a ratio that is as low as possible. The
current result of my efforts (I'll post it if there is any interest, but
it's not pretty!) on a Default D4.01 image, which I imagine can be improved
on, is

Incorrect: 0
Comp: 1277
Parse: 3358
Ratio:  2.63

Ian

Snip and save the following into a file.
File/FileIn to load it into a Dolphin 4 image
Start up with "Parser new run"

-~-~-~ snip

"Filed out from Dolphin Smalltalk 2000 release 4.01"!
Object subclass: #Parser
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    classInstanceVariableNames: ''!

Parser comment: ''!

Parser guid: (GUID fromString: '{65806B9F-2D03-41AF-8B60-7ADF22505F6B}')!

!Parser categoriesForClass!Unclassified! !

!Parser methodsFor!
getComparisonTime
    ^Time millisecondsToRun: [
        Class allMethodsDo: [:each |
            each getSource]]!

getParseTime
    | wrong |
    ^(Time millisecondsToRun: [
        wrong := 0.
        Class allMethodsDo: [:each |
            (self parse: each getSource) = each selector asString
                ifFalse: [wrong := wrong + 1]]]) -> wrong!

parse: aString
    ^'Your Code Here'!

run
    | comparison parse |
    MemoryManager current collectGarbage.
    comparison := self getComparisonTime.
    MemoryManager current collectGarbage.
    parse := self getParseTime.

    MessageBox notify: (String writeStream
        nextPutAll: 'Incorrect: ';
        print: parse value;
        cr;
        nextPutAll: 'Comparison (mS): ';
        print: comparison;
        cr;
        nextPutAll: 'Parse (mS): ';
        print: parse key;
        cr;
        nextPutAll: 'Parse/Comparison ratio: ' ;
        print: ((parse key / comparison) roundTo: 0.01);
        contents)! !

!Parser categoriesFor: #getComparisonTime!*-unclassified!public! !
!Parser categoriesFor: #getParseTime!*-unclassified!public! !
!Parser categoriesFor: #parse:!*-unclassified!public! !
!Parser categoriesFor: #run!*-unclassified!public! !


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Stefan Matthias Aust
"Ian Bartholomew" <[hidden email]> schrieb im Newsbeitrag
news:jwUF7.3628$ur6.217388@wards...

> It's quite simple. Given a String containing a Smalltalk method which is
> known to be syntactically and semantically valid (although not necessarily
> compilable in the current image) you have to answer the full selector of
the
> method contained in the String.

Here's my untested ad-hoc solution (assuming that there are spaces between
the selector fragments and the parameters):

parse: aString
    stream := aString readStream.
    token := stream upTo: $ .
    token last = $:
        ifTrue: [
            selector := WriteStream on: (String new: 64).
            [token last = $:] whileTrue: [
                selector nextPutAll: token.
                stream skipTo: $ .
                token := stream upTo: $ ].
            ^selector contents].
    ^token

One could probably improve things by adding a few #skipSeparators and
rewriting the #upTo: lines to either read a sequence of allowed binary
selector characters or any number of letters, digits or underlines upto and
including the :

bye
--
Stefan Matthias Aust  //  Truth Until Paradox


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Ian Bartholomew-5
Stefan,

> One could probably improve things by adding a few #skipSeparators and
> rewriting the #upTo: lines to either read a sequence of allowed binary
> selector characters or any number of letters, digits or underlines upto
> and including the :

There's a bit more to it though. Your code, when inserted in the framework,
causes a walkback which prevents it running. I got round that (QwikFix) by
replacing

stream skipTo: $ .

with

stream skipWhile: [:c | c isWhitespace].

which allows for multiple spaces between parameters.  I also had to remove
the Inspector>>x method from the 4.01 image because I couldn't see a way of
getting round that. (Aside: I've never noticed that method before - what, if
anything, is it supposed to do??)

After doing this the framework runs with your method, and quite fast too,
but ends up with 12959 errors where the parsed selector doesn't match the
actual selector.  It's one of those problems that looks quite simple but
there are a lot of little subtleties that can trip you up.  I found it an
interesting challenge anyway <g>

Regards
    Ian


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

John Aspinall-2
In reply to this post by Ian Bartholomew-5
Ian,

My stab is below. It includes some extensions to ReadStream (is this
allowed?), although it would be more appropriate to create a dedicated
subclass. Results are:

Incorrect: 0
Comparison (mS): 1789
Parse (mS): 4232
Parse/Comparison ratio: 2.35

It's an interesting excerise; in particular I hadn't considered the use of
comments in a method selector, e.g. WebBrowserShell>>OnBeforeNavigate2...

Thanks for a novel challenge.


John Aspinall
Solutions Software
http://www.solutionsoft.co.uk/restore

---

!Parser methodsFor!

parse: aString

    | readStream nextSection writeStream |

    readStream := ReadStream on: aString.
    nextSection:= readStream firstSection.

    "Quick check for unary/binary selectors"
    (nextSection last == $:) ifFalse: [^nextSection].

    writeStream := WriteStream on: (String new: 50).

    [writeStream nextPutAll: nextSection.

    readStream skipArgument.
    nextSection := readStream nextSection.
    nextSection notEmpty and: [(nextSection last == $:)]] whileTrue.

    ^writeStream contents! !
!Parser categoriesFor: #parse:!*-unclassified!public! !

!ReadStream methodsFor!

firstSection

    "Optimised version of nextSection for analysing beginning of method"

    self skipWhile: [:c | c isSeparator not and: [(c == $:) ifTrue: [self
next. false] ifFalse: [true]]].
    ^self contents copyFrom: 1 to: self position
!

nextSection

    "Return the next 'section' from the head of a method"

    | first |

    self skipSeparators ifFalse: [^''].

    "Skip any comments"
    self peek == $" ifTrue:
        [self next; skipTo: $".
        ^self nextSection].

    first := self position.
    self skipWhile: [ :c | c isAlphaNumeric or: [c == $_]].

    self peek == $: ifTrue: [self next].
    ^self contents copyFrom: first+1 to: self position
!

skipArgument

    self skipSeparators ifFalse: [^nil].

    "Skip any preceeding comments"
    self peek == $" ifTrue:
        [self next; skipTo: $".
        ^self skipArgument].

    self skipWhile: [:c | c isSeparator not] ! !
!ReadStream categoriesFor: #firstSection!positioning!private! !
!ReadStream categoriesFor: #nextSection!positioning!private! !
!ReadStream categoriesFor: #skipArgument!positioning!private! !



Ian Bartholomew <[hidden email]> wrote in message
news:jwUF7.3628$ur6.217388@wards...
> ... for anyone who has some time to spare. I am in the process of
rewriting
> my ChunkBrowser goodie (it was getting a little messy internally) and I
> spent a couple of hours this morning on the following. I thought it might
be
> worth mentioning here as a sort of programming exercise just to see how
many
> different ways of doing it we could come up with - no winners, losers or
> prizes though.
>
> It's quite simple. Given a String containing a Smalltalk method which is
> known to be syntactically and semantically valid (although not necessarily
> compilable in the current image) you have to answer the full selector of
the
> method contained in the String.
>
> I've included a little framework below that runs all the methods in the
> current image through the parser and reports how many were parsed
> incorrectly and how long it takes. The framework also runs a empty block
to
> give a ratio which should, I hope, be independent of the underlying
machine
> speed.  All you have to do is edit the #parse: method (adding more methods
> if/as needed) ...
>
> The idea is to get 0 errors and a ratio that is as low as possible. The
> current result of my efforts (I'll post it if there is any interest, but
> it's not pretty!) on a Default D4.01 image, which I imagine can be
improved

> on, is
>
> Incorrect: 0
> Comp: 1277
> Parse: 3358
> Ratio:  2.63
>
> Ian
>
> Snip


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Stefan Matthias Aust
In reply to this post by Ian Bartholomew-5
"Ian Bartholomew" <[hidden email]> schrieb im Newsbeitrag
news:9s9gsv$128ft4$[hidden email]...

> There's a bit more to it though. Your code, when inserted in the
framework,
> causes a walkback which prevents it running.

You're right. Below is a version that is actually working

!Parser methodsFor!

parse: aString
 | stream scan token selector |
 token := (stream := aString readStream) nextToken.
 token last == $: ifFalse: [^token].
 selector := String writeStream: 128.
 [token last == $: and: [token first isLetter or: [token first == $_]]]
whileTrue: [
  selector nextPutAll: token.
  token := stream nextToken; nextToken].
 ^selector contents! !
!Parser categoriesFor: #parse:!*-unclassified!public! !

!Stream methodsFor!

nextToken
 | s n |
 self skipSeparators ifFalse: [^' '].
 self peek == $" ifTrue: [^self next; skipTo: $"; nextToken].
 s := String writeStream: 128.
 [self atEnd ifTrue: [^' ']. (n := self next) isSeparator or: [n == $:]]
  whileFalse: [s nextPut: n].
 n == $: ifTrue: [s nextPut: n].
 ^s contents! !
!Stream categoriesFor: #nextToken!accessing!public! !


Unfortunately, John Aspinall's contribution is faster :-( My ratio is only
2.88. I think, his solution is near the optimum. One could remove the
notEmpty test by assuring that the nextSection will always return at least
one character and one could also remove a few "self"s from the Stream
methods.

Interestingly, if I run John's code, I get a ration of 2.82. (1120ms
comparing and 3155ms parsing).

bye
--
Stefan Matthias Aust  //  Truth Until Paradox


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Ian Bartholomew-5
In reply to this post by John Aspinall-2
John,

> My stab is below. It includes some extensions to ReadStream (is this
> allowed?), although it would be more appropriate to create a dedicated
> subclass.

The only rule was that there are no rules <g>

> It's an interesting excerise; in particular I hadn't considered the use of
> comments in a method selector, e.g. WebBrowserShell>>OnBeforeNavigate2...

Yes, that's the one that caught me.  One other way in which your code fails
(there are no instances of this in the image so I suppose it's not really
applicable to this thread) is if there is any whitespace before the first
character. Easily fixed by a couple of small changes to #firstSection.

One other thing I thought of while writing this - comments *before* the
selector are legitimate and compile correctly.  No one would do that
though - would they <g>

I also found the exercise interesting, which was why I mentioned it in the
group. My first version, in the original ChunkBrowser, used Streams but I
thought that a version that accessed the String directly would be quicker. I
was surprised to find that not only was there little difference in speed but
the non-stream version was much more complicated than I expected. I've
posted it below, but please wear protective glasses before looking at it.

I think I'll go back to a stream version for the ChunkBrowser though, the
slight difference in performance is probably not worth the ugly code!!

Regards
    Ian

"Filed out from Dolphin Smalltalk 2000 release 4.01"!
Parser subclass: #IanParser
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    classInstanceVariableNames: ''!

IanParser comment: ''!

IanParser guid: (GUID fromString: '{0CE35693-FC18-4002-8DD5-518C96D32430}')!

!IanParser categoriesForClass!Unclassified! !
!IanParser methodsFor!
parse: aString
    | position positionEnd selector |
    selector := String new.
    position := 1.

    "skip whitespace"
    [(aString at: position) isSeparator] whileTrue: [position := position +
1].

    "just a selector"
    positionEnd := (position to: aString size)
        detect: [:i | (self validSelectorCharacters includes: (aString at:
i)) not]
        ifNone: [0].
        positionEnd = 0 ifTrue: [^aString copyFrom: position].

    "One of the operators or a selector with no args"
    (aString at: positionEnd) = $:
        ifFalse: [^aString copyFrom: position to: positionEnd - 1].

    "A selector with an argument so iterate, picking up the selector but
        ignoring the args"
    selector := aString copyFrom: position to: positionEnd.
    position := positionEnd.

    [ "skip whitespace (starting from :)"
        [position := position + 1.
        (aString at: position) isSeparator] whileTrue.

        "skip argument"
        position := (position to: aString size)
            detect: [:i | (self validArgumentCharacters includes: (aString
at: i)) not]
            ifNone: [^selector].

        "skip whitespace (starting from )"
        [(aString at: position) isSeparator] whileTrue: [position :=
position + 1].

        "skip comments"
        [position < aString size and: [(aString at: position) = $"]]
whileTrue: [
            position := position + 1.
            [(aString at: position) = $"] whileFalse: [position := position
+ 1].
            position := position + 1.
            [position < aString size and: [(aString at: position)
isSeparator]]
                whileTrue: [position := position + 1]].

        "possibly the next part of a selector - nb use argument characters
for
            selector now"
        positionEnd := (position to: aString size)
            detect: [:i | (self validArgumentCharacters includes: (aString
at: i)) not]
            ifNone: [^selector].

        "Not a : means that we have found the start of the method text so
use what
            we have"
        (aString at: positionEnd) = $: ifFalse: [^selector].

        "Append the new part of the selector and continue"
        selector := selector , (aString copyFrom: position to: positionEnd).

        position := positionEnd] repeat.

    self halt "shouldn't get here"!

validArgumentCharacters
^'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_'!

validSelectorCharacters
^'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~_->=<|&,+*/
\%@'! !

!IanParser categoriesFor: #parse:!*-unclassified!public! !
!IanParser categoriesFor: #validArgumentCharacters!*-unclassified!public! !
!IanParser categoriesFor: #validSelectorCharacters!*-unclassified!public! !


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Ian Bartholomew-5
In reply to this post by Stefan Matthias Aust
Stefan,

> You're right. Below is a version that is actually working

Thanks, that runs fine and copes with the leading whitespace that I
mentioned in the other thread.

> Unfortunately, John Aspinall's contribution is faster :-( My ratio is only
> 2.88. I think, his solution is near the optimum. One could remove the
> notEmpty test by assuring that the nextSection will always return at least
> one character and one could also remove a few "self"s from the Stream
> methods.

It does seem that whatever way you do it the 2.5 to 3 ratio is about the
best that can be achieved. I did wonder, I haven't had any time to try much
today, if defining (big) Streams in an instVar and then reusing them, rather
than creating a new one for each String, might save a bit of time because of
the reduced object creation/garbage collection.

> Interestingly, if I run John's code, I get a ration of 2.82. (1120ms
> comparing and 3155ms parsing).

I was seeing a similar difference in the ratio as well. I did wonder if the
amount of memory or OS in John's system might make a difference.  There does
also seem a little bit of difference between results even when running the
framework on the same system.  I'm not sure why.

Regards
    Ian


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Stefan Matthias Aust
"Ian Bartholomew" <[hidden email]> schrieb im Newsbeitrag
news:jjbG7.4050$h4.261803@stones...

> It does seem that whatever way you do it the 2.5 to 3 ratio is about the
> best that can be achieved. I did wonder, I haven't had any time to try
much
> today, if defining (big) Streams in an instVar and then reusing them,
rather
> than creating a new one for each String, might save a bit of time because
of
> the reduced object creation/garbage collection.

Don't think so. I looked at a profile generated with your Profiler and it
seems that most time is spent in #isSeparator, #skipSeparators and skipping
comments. My image contains some 20600 methods, so the memory for all
streams is less than 6 MB (in my code).  I don't think that this would
trigger a GC already.

> I was seeing a similar difference in the ratio as well. I did wonder if
the
> amount of memory or OS in John's system might make a difference.

I've 384 MB if that matters.

bye
--
Stefan Matthias Aust  //  Truth Until Paradox


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

John Aspinall-2
In reply to this post by Ian Bartholomew-5
> I was seeing a similar difference in the ratio as well. I did wonder if
> the amount of memory or OS in John's system might make a difference.
> There does also seem a little bit of difference between results even
> when running the framework on the same system.  I'm not sure why.

FWIW, I was getting results varying between 2.29 and 2.39 - your code gives
similar results here.

Oddly, both show up significantly less (1.5ish) on the first run after
firing up the image, the difference being due to a higher comparison figure.

Cheers.

John


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Ian Bartholomew-5
In reply to this post by Ian Bartholomew-5
Thanks all...

I had a look through all the replies to this post, both here and via e-mail,
mulled it all over and came up with the following implementation. I think
it's a reasonable compromise between readability, efficiency and accuracy
[1] although others may disagree.

Thanks to all who had a go. I think this is probably closest to Stefan's
version but they all played a part in the thinking process.

Regards
    Ian

[1] From a quick series of test's I couldn't fool it, but maybe somebody
else can?


!String methodsFor!
parseSelector
    "Answers a String representing the Smalltalk selector parsed
    from the start of the receiver. Embedded comments are ignored
    Test on the current image with -
        Class allMethodsDo: [:each |
            (each getSource parseSelector) = each selector asString
                ifFalse: [self halt]]
    Thanks to posters in the Dolphin newsgroup, and via e-mail, for
suggestions
    on the best ways to implement this"

    | readStream writeStream token |
    #idbAdded.
    readStream := self readStream.
    (token := readStream nextSelectorToken) isSelector ifFalse: [^token].

    writeStream := String writeStream: 100.
    [token isSelector] whileTrue: [
        writeStream nextPutAll: token.
        token := readStream
            nextSelectorToken;
            nextSelectorToken].
    ^writeStream contents! !
!String categoriesFor: #parseSelector!accessing!idb goodies!private! !

!PositionableStream methodsFor!
nextSelectorToken
    "Answers the next token on the receiver and leaves the receiver
positioned
    immediately after the token. A token in this context refers to any
sequence
    of non separator characters terminated by either a separator, the end of
the
    stream or a :. Used for parsing method selectors and arguments.
    Embedded comments, or sequences of comments, are skipped.
    This method may answer an empty String"

    | start |
    #idbAdded.
    self skipSeparators ifFalse: [^String new].
    (self peekFor: $")
        ifTrue: [
            self skipTo: $".
            ^self nextSelectorToken].

    start := position + 1.
    [ | char |
        self atEnd ifTrue: [^collection copyFrom: start to: readLimit].
        char := self next.
        char isSeparator ifTrue: [^collection copyFrom: start to: position -
1].
        char = $: ifTrue: [^collection copyFrom: start to: position]]
repeat! !
!PositionableStream categoriesFor: #nextSelectorToken!accessing!idb
goodies!private! !

!String methodsFor!
isSelector
    "Answers true if the receiver is a valid part of a method selector.
    i.e. is in the format [A-Za-z_]{anything}[:]"

    #idbAdded.
    ^self notEmpty
        and: [self last = $:
        and: [| char | (char := self first) isLetter or: [char = $_]]]! !
!String categoriesFor: #isSelector!idb goodies!private!testing! !


Reply | Threaded
Open this post in threaded view
|

Re: A little light exercise ...

Ian Bartholomew-5
My last word on this subject (although it's been bugging me I suspect
everyone else is getting cheesed off with it <g>)

Chris Uppal pointed out that missing separators cause the parser to fail -

test^self

+arg^arg+1

as well as the unexpected (!)

:arg^arg+1

all compile and work but fail the existing parser.  I've got a new version
that is a bit more complex, I've moved it all into a separate class, but
works with everything that I can think of to throw at it and with very
little performance penalty.

If anyone wants a copy the please mail me.

Regards
    Ian