Spur images

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Spur images

Benjamin Pollack-2
My apologies if this is already spelled out somewhere and I simply can't
find it, but are there any Spur images for Pharo yet?  I can only find
Squeak ones.  It's not a big deal either way; I was just playing with an
algorithm that, after very heavy optimization, I was able to get down to
about 278ms per pass (from ~700ms initially from a naive
implementation).  For contrast, the equivalent Python runs in ~80ms.
Looking at MessageTally, it seems at least half of that 278ms is spent
noodling around in WideStrings and other small things that the Spur
object format ought to help with, so it'd be interesting to see how much
of a speed boost that gives without any further work.

Reply | Threaded
Open this post in threaded view
|

Re: Spur images

stepharo
Pharo has a new classbuilder with first class instance variables so the
bootstrap took longer and
doing so esteban and guille found bugs in Spur. So regularly but hidden
from the fame, esteban goes over it.
and we hope soon to have some images to beta testers.

> My apologies if this is already spelled out somewhere and I simply can't
> find it, but are there any Spur images for Pharo yet?  I can only find
> Squeak ones.  It's not a big deal either way; I was just playing with an
> algorithm that, after very heavy optimization, I was able to get down to
> about 278ms per pass (from ~700ms initially from a naive
> implementation).  For contrast, the equivalent Python runs in ~80ms.
> Looking at MessageTally, it seems at least half of that 278ms is spent
> noodling around in WideStrings and other small things that the Spur
> object format ought to help with, so it'd be interesting to see how much
> of a speed boost that gives without any further work.
tell us more about your algo because we want to know.



Reply | Threaded
Open this post in threaded view
|

Re: Spur images

EstebanLM
as Stef says, we are close, but not there yet.
I’m working on it and I hope to have something really soon.

Esteban

> On 03 Oct 2014, at 22:13, stepharo <[hidden email]> wrote:
>
> Pharo has a new classbuilder with first class instance variables so the bootstrap took longer and
> doing so esteban and guille found bugs in Spur. So regularly but hidden from the fame, esteban goes over it.
> and we hope soon to have some images to beta testers.
>> My apologies if this is already spelled out somewhere and I simply can't
>> find it, but are there any Spur images for Pharo yet?  I can only find
>> Squeak ones.  It's not a big deal either way; I was just playing with an
>> algorithm that, after very heavy optimization, I was able to get down to
>> about 278ms per pass (from ~700ms initially from a naive
>> implementation).  For contrast, the equivalent Python runs in ~80ms.
>> Looking at MessageTally, it seems at least half of that 278ms is spent
>> noodling around in WideStrings and other small things that the Spur
>> object format ought to help with, so it'd be interesting to see how much
>> of a speed boost that gives without any further work.
> tell us more about your algo because we want to know.
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Spur images

Benjamin Pollack-2
In reply to this post by stepharo
On Fri, 03 Oct 2014 16:13:35 -0400, stepharo <[hidden email]> wrote:
>> Looking at MessageTally, it seems at least half of that 278ms is spent
>> noodling around in WideStrings and other small things that the Spur
>> object format ought to help with, so it'd be interesting to see how much
>> of a speed boost that gives without any further work.
> tell us more about your algo because we want to know.

It's not that exciting (it's a problem we give job applicants as a  
take-home coding test), but I'll describe it, since it highlights a common  
corner where Pharo does not perform terribly well.

Basically, we give you a pile of play lists (a text file where each line  
is "Song1,Song2,Song3" and so on, representing a single play list), and  
then ask you to tell us which songs appear frequently together (where  
"frequently" is something we define--say, at least 100 times, for  
example).  While there are interesting complex algorithms, a simple one  
that works well for small datasets is simply to walk all of play lists and  
build up a hashtable that maps pairs of bands to a count, then walk the  
hashtable and print out any keys whose counts are over the threshold.  If  
this sounds like "dump all the permutations of each play list into a Bag  
and then walk Bag>>valuesAndCounts," you're right.  As I said, it's not  
that complex.

Python does very well on this because it hits on three things--hashtables,  
sets, and string processing--for which Python has obscenely good  
implementations in carefully optimized C.  (In fact, the 80ms time I  
quoted you is surprisingly close to the maximum speed you can get with  
genuine C or C++ implementation on the sample dataset as well.)

Pharo struggles to match Python for the same three reasons, but  
*especially* because its String functions are just much slower.  In fact,  
if you implement the algorithm I described above naïvely in Pharo, the  
execution time is about 780ms--nearly ten times slower.  After some  
reflection, I chopped the time to 270ms by converting all the band names  
to integers as the first step of the process (ArtistLister>>generateMap:  
in the profile trace below).  Not only does this avoid WideString>>hash  
and friends; it also allows me to bring in IdentityDictionary and  
IdentitySet in most locations due to Pharo's object format.

But there's a lot left.  I'm attaching the profile output below.  Notice  
that Pharo spends more time reading and processing the file (about 120ms)  
than the Python program spends executing total.

--Benjamin


  - 279 tallies, 279 msec.

**Tree**
--------------------------------
Process: (40s) Morphic UI Process: nil
--------------------------------
69.2% {193ms} ArtistLister>>groupsIn:
   |18.6% {52ms} FileReference(AbstractFileReference)>>contents
   |  |17.6% {49ms} MultiByteFileStream(FileStream)>>contents
   |  |  17.6% {49ms} MultiByteFileStream>>next:
   |  |    5.0% {14ms} ByteString>>at:put:
   |  |      |5.0% {14ms} WideString class>>from:
   |  |    5.0% {14ms} WideString>>at:put:
   |  |    5.0% {14ms} primitives
   |  |    1.4% {4ms} MultiByteFileStream>>next
   |  |      1.4% {4ms} UTF8TextConverter>>nextFromStream:
   |14.7% {41ms} WideString(String)>>lines
   |  |14.7% {41ms} WideString(String)>>linesDo:
   |  |  10.0% {28ms} WideString>>copyFrom:to:
   |  |    |5.4% {15ms} WideString(String)>>isOctetString
   |  |    |  |3.2% {9ms} WideString>>at:
   |  |    |  |  |2.2% {6ms} Character class>>value:
   |  |    |  |2.2% {6ms} primitives
   |  |    |4.7% {13ms} WideString(String)>>asOctetString
   |  |    |  3.6% {10ms} primitives
   |  |  4.7% {13ms} WideString(String)>>lineIndicesDo:
   |  |    4.7% {13ms} WideString(String)>>indexOf:startingAt:
   |  |      4.7% {13ms} WideString class(String  
class)>>indexOfAscii:inString:startingAt:
   |10.8% {30ms} ByteString(Object)>>split:
   |  |10.0% {28ms} ByteString(Object)>>split:do:
   |  |  5.7% {16ms} WideString>>copyFrom:to:
   |  |    |3.2% {9ms} WideString(String)>>asOctetString
   |  |    |2.5% {7ms} WideString(String)>>isOctetString
   |  |    |  1.8% {5ms} primitives
   |  |  4.3% {12ms} ByteString(SequenceableCollection)>>split:indicesDo:
   |  |    2.9% {8ms}  
WideString(SequenceableCollection)>>indexOfSubCollection:startingAt:
   |  |      |2.9% {8ms}  
WideString(String)>>indexOfSubCollection:startingAt:ifAbsent:
   |  |      |  2.9% {8ms} WideString(String)>>findString:startingAt:
   |  |      |    2.9% {8ms}  
WideString(String)>>findString:startingAt:caseSensitive:
   |  |      |      2.9% {8ms}  
WideString(String)>>findSubstring:in:startingAt:matchTable:
   |  |    1.4% {4ms} primitives
   |7.2% {20ms} Set(Collection)>>addAll:
   |  |5.0% {14ms} Set>>add:
   |  |  |3.6% {10ms} Set>>scanFor:
   |  |  |  |3.6% {10ms} ByteString(String)>>=
   |  |  |1.4% {4ms} Set(HashedCollection)>>atNewIndex:put:
   |  |  |  1.4% {4ms} Set(HashedCollection)>>fullCheck
   |  |  |    1.4% {4ms} Set>>grow
   |  |1.4% {4ms} OrderedCollection>>do:
   |5.4% {15ms} ArtistLister>>generateMap:
   |  |5.4% {15ms} IdentityDictionary(Dictionary)>>at:put:
   |  |  3.9% {11ms} Dictionary(HashedCollection)>>atNewIndex:put:
   |  |    |3.2% {9ms} Dictionary(HashedCollection)>>fullCheck
   |  |    |  2.5% {7ms} Dictionary(HashedCollection)>>grow
   |  |    |    1.4% {4ms} Dictionary>>noCheckAdd:
   |  |    |      1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil:
   |  |  1.4% {4ms} IdentityDictionary(HashedCollection)>>findElementOrNil:
   |4.3% {12ms} OrderedCollection>>collect:
   |  |4.3% {12ms} OrderedCollection>>addLast:
   |4.3% {12ms} Dictionary>>at:
   |  |4.3% {12ms} Dictionary>>at:ifAbsent:
   |  |  2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil:
   |  |    |2.2% {6ms} Dictionary>>scanFor:
   |  |    |  1.4% {4ms} ByteString(String)>>=
   |  |  2.2% {6ms} primitives
   |3.9% {11ms} ByteString(String)>>trimmed
   |  3.9% {11ms} ByteString(String)>>trimBoth
   |    3.2% {9ms} ByteString(String)>>trimBoth:
   |      2.2% {6ms} primitives
19.4% {54ms} ArtistLister>>popularPairsIn:onlyKeeping:
   |15.8% {44ms} OrderedCollection(Collection)>>asBag
   |  |15.8% {44ms} Bag class(Collection class)>>withAll:
   |  |  15.8% {44ms} Bag(Collection)>>addAll:
   |  |    13.6% {38ms} Bag>>add:
   |  |      |13.6% {38ms} Bag>>add:withOccurrences:
   |  |      |  10.0% {28ms} Dictionary>>at:put:
   |  |      |    |8.6% {24ms}  
Dictionary(HashedCollection)>>findElementOrNil:
   |  |      |    |  |7.9% {22ms} Dictionary>>scanFor:
   |  |      |    |  |  4.7% {13ms} Array(SequenceableCollection)>>hash
   |  |      |    |  |    |3.2% {9ms} primitives
   |  |      |    |  |  2.5% {7ms} Array(SequenceableCollection)>>=
   |  |      |    |  |    2.5% {7ms}  
Array(SequenceableCollection)>>hasEqualElements:
   |  |      |    |  |      1.4% {4ms} primitives
   |  |      |    |1.4% {4ms} Dictionary(HashedCollection)>>atNewIndex:put:
   |  |      |  3.6% {10ms} Dictionary>>at:ifAbsent:
   |  |      |    2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil:
   |  |      |      |2.2% {6ms} Dictionary>>scanFor:
   |  |      |      |  1.4% {4ms} Array(SequenceableCollection)>>=
   |  |      |      |    1.4% {4ms}  
Array(SequenceableCollection)>>hasEqualElements:
   |  |      |    1.4% {4ms} primitives
   |  |    2.2% {6ms} OrderedCollection>>do:
   |2.5% {7ms} ArtistLister>>uniquePairs:do:
4.3% {12ms} ArtistLister>>bandsAboveThresholdIn:
   |4.3% {12ms} Bag(Collection)>>addAll:
   |  4.3% {12ms} Bag>>add:
   |    4.3% {12ms} Bag>>add:withOccurrences:
   |      2.5% {7ms} Dictionary>>at:put:
   |      1.8% {5ms} Dictionary>>at:ifAbsent:
3.2% {9ms} Set>>includes:
   2.5% {7ms} Set(HashedCollection)>>findElementOrNil:
     2.5% {7ms} Set>>scanFor:

**Leaves**
6.8% {19ms} WideString(String)>>asOctetString
6.5% {18ms} ByteString(String)>>=
5.4% {15ms} Dictionary>>at:ifAbsent:
5.0% {14ms} WideString>>at:put:
5.0% {14ms} WideString class>>from:
5.0% {14ms} MultiByteFileStream>>next:
5.0% {14ms} WideString(String)>>isOctetString
4.7% {13ms} WideString class(String  
class)>>indexOfAscii:inString:startingAt:
4.3% {12ms} OrderedCollection>>addLast:
3.9% {11ms} WideString>>at:
3.6% {10ms} Array(SequenceableCollection)>>do:
3.6% {10ms} OrderedCollection>>do:
3.2% {9ms} Set>>scanFor:
3.2% {9ms} Dictionary(HashedCollection)>>atNewIndex:put:
3.2% {9ms} Array(SequenceableCollection)>>hash
2.5% {7ms} ArtistLister>>uniquePairs:do:
2.2% {6ms} Dictionary>>scanFor:
2.2% {6ms} Character class>>value:
2.2% {6ms} ByteString(String)>>trimBoth:
2.2% {6ms} Array(SequenceableCollection)>>hasEqualElements:
1.8% {5ms} Array class(Behavior)>>inheritsFrom:
1.4% {4ms} ByteString(SequenceableCollection)>>split:indicesDo:
1.4% {4ms} SmallInteger>>hashMultiply
1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil:

**Memory**
        old +5,124,784 bytes
        young -340,984 bytes
        used +4,783,800 bytes
        free +295,464 bytes

**GCs**
        full 0 totalling 0ms (0.0% uptime)
        incr 22 totalling 22ms (8.0% uptime), avg 1.0ms
        tenures 15 (avg 1 GCs/tenure)
        root table 0 overflows

Reply | Threaded
Open this post in threaded view
|

Re: Spur images

Sven Van Caekenberghe-2
How come you got WideStrings ?
What does the input look like, can you give a partial example ?

On 05 Oct 2014, at 16:03, Benjamin Pollack <[hidden email]> wrote:

> On Fri, 03 Oct 2014 16:13:35 -0400, stepharo <[hidden email]> wrote:
>>> Looking at MessageTally, it seems at least half of that 278ms is spent
>>> noodling around in WideStrings and other small things that the Spur
>>> object format ought to help with, so it'd be interesting to see how much
>>> of a speed boost that gives without any further work.
>> tell us more about your algo because we want to know.
>
> It's not that exciting (it's a problem we give job applicants as a take-home coding test), but I'll describe it, since it highlights a common corner where Pharo does not perform terribly well.
>
> Basically, we give you a pile of play lists (a text file where each line is "Song1,Song2,Song3" and so on, representing a single play list), and then ask you to tell us which songs appear frequently together (where "frequently" is something we define--say, at least 100 times, for example).  While there are interesting complex algorithms, a simple one that works well for small datasets is simply to walk all of play lists and build up a hashtable that maps pairs of bands to a count, then walk the hashtable and print out any keys whose counts are over the threshold.  If this sounds like "dump all the permutations of each play list into a Bag and then walk Bag>>valuesAndCounts," you're right.  As I said, it's not that complex.
>
> Python does very well on this because it hits on three things--hashtables, sets, and string processing--for which Python has obscenely good implementations in carefully optimized C.  (In fact, the 80ms time I quoted you is surprisingly close to the maximum speed you can get with genuine C or C++ implementation on the sample dataset as well.)
>
> Pharo struggles to match Python for the same three reasons, but *especially* because its String functions are just much slower.  In fact, if you implement the algorithm I described above naïvely in Pharo, the execution time is about 780ms--nearly ten times slower.  After some reflection, I chopped the time to 270ms by converting all the band names to integers as the first step of the process (ArtistLister>>generateMap: in the profile trace below).  Not only does this avoid WideString>>hash and friends; it also allows me to bring in IdentityDictionary and IdentitySet in most locations due to Pharo's object format.
>
> But there's a lot left.  I'm attaching the profile output below.  Notice that Pharo spends more time reading and processing the file (about 120ms) than the Python program spends executing total.
>
> --Benjamin
>
>
> - 279 tallies, 279 msec.
>
> **Tree**
> --------------------------------
> Process: (40s) Morphic UI Process: nil
> --------------------------------
> 69.2% {193ms} ArtistLister>>groupsIn:
>  |18.6% {52ms} FileReference(AbstractFileReference)>>contents
>  |  |17.6% {49ms} MultiByteFileStream(FileStream)>>contents
>  |  |  17.6% {49ms} MultiByteFileStream>>next:
>  |  |    5.0% {14ms} ByteString>>at:put:
>  |  |      |5.0% {14ms} WideString class>>from:
>  |  |    5.0% {14ms} WideString>>at:put:
>  |  |    5.0% {14ms} primitives
>  |  |    1.4% {4ms} MultiByteFileStream>>next
>  |  |      1.4% {4ms} UTF8TextConverter>>nextFromStream:
>  |14.7% {41ms} WideString(String)>>lines
>  |  |14.7% {41ms} WideString(String)>>linesDo:
>  |  |  10.0% {28ms} WideString>>copyFrom:to:
>  |  |    |5.4% {15ms} WideString(String)>>isOctetString
>  |  |    |  |3.2% {9ms} WideString>>at:
>  |  |    |  |  |2.2% {6ms} Character class>>value:
>  |  |    |  |2.2% {6ms} primitives
>  |  |    |4.7% {13ms} WideString(String)>>asOctetString
>  |  |    |  3.6% {10ms} primitives
>  |  |  4.7% {13ms} WideString(String)>>lineIndicesDo:
>  |  |    4.7% {13ms} WideString(String)>>indexOf:startingAt:
>  |  |      4.7% {13ms} WideString class(String class)>>indexOfAscii:inString:startingAt:
>  |10.8% {30ms} ByteString(Object)>>split:
>  |  |10.0% {28ms} ByteString(Object)>>split:do:
>  |  |  5.7% {16ms} WideString>>copyFrom:to:
>  |  |    |3.2% {9ms} WideString(String)>>asOctetString
>  |  |    |2.5% {7ms} WideString(String)>>isOctetString
>  |  |    |  1.8% {5ms} primitives
>  |  |  4.3% {12ms} ByteString(SequenceableCollection)>>split:indicesDo:
>  |  |    2.9% {8ms} WideString(SequenceableCollection)>>indexOfSubCollection:startingAt:
>  |  |      |2.9% {8ms} WideString(String)>>indexOfSubCollection:startingAt:ifAbsent:
>  |  |      |  2.9% {8ms} WideString(String)>>findString:startingAt:
>  |  |      |    2.9% {8ms} WideString(String)>>findString:startingAt:caseSensitive:
>  |  |      |      2.9% {8ms} WideString(String)>>findSubstring:in:startingAt:matchTable:
>  |  |    1.4% {4ms} primitives
>  |7.2% {20ms} Set(Collection)>>addAll:
>  |  |5.0% {14ms} Set>>add:
>  |  |  |3.6% {10ms} Set>>scanFor:
>  |  |  |  |3.6% {10ms} ByteString(String)>>=
>  |  |  |1.4% {4ms} Set(HashedCollection)>>atNewIndex:put:
>  |  |  |  1.4% {4ms} Set(HashedCollection)>>fullCheck
>  |  |  |    1.4% {4ms} Set>>grow
>  |  |1.4% {4ms} OrderedCollection>>do:
>  |5.4% {15ms} ArtistLister>>generateMap:
>  |  |5.4% {15ms} IdentityDictionary(Dictionary)>>at:put:
>  |  |  3.9% {11ms} Dictionary(HashedCollection)>>atNewIndex:put:
>  |  |    |3.2% {9ms} Dictionary(HashedCollection)>>fullCheck
>  |  |    |  2.5% {7ms} Dictionary(HashedCollection)>>grow
>  |  |    |    1.4% {4ms} Dictionary>>noCheckAdd:
>  |  |    |      1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil:
>  |  |  1.4% {4ms} IdentityDictionary(HashedCollection)>>findElementOrNil:
>  |4.3% {12ms} OrderedCollection>>collect:
>  |  |4.3% {12ms} OrderedCollection>>addLast:
>  |4.3% {12ms} Dictionary>>at:
>  |  |4.3% {12ms} Dictionary>>at:ifAbsent:
>  |  |  2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil:
>  |  |    |2.2% {6ms} Dictionary>>scanFor:
>  |  |    |  1.4% {4ms} ByteString(String)>>=
>  |  |  2.2% {6ms} primitives
>  |3.9% {11ms} ByteString(String)>>trimmed
>  |  3.9% {11ms} ByteString(String)>>trimBoth
>  |    3.2% {9ms} ByteString(String)>>trimBoth:
>  |      2.2% {6ms} primitives
> 19.4% {54ms} ArtistLister>>popularPairsIn:onlyKeeping:
>  |15.8% {44ms} OrderedCollection(Collection)>>asBag
>  |  |15.8% {44ms} Bag class(Collection class)>>withAll:
>  |  |  15.8% {44ms} Bag(Collection)>>addAll:
>  |  |    13.6% {38ms} Bag>>add:
>  |  |      |13.6% {38ms} Bag>>add:withOccurrences:
>  |  |      |  10.0% {28ms} Dictionary>>at:put:
>  |  |      |    |8.6% {24ms} Dictionary(HashedCollection)>>findElementOrNil:
>  |  |      |    |  |7.9% {22ms} Dictionary>>scanFor:
>  |  |      |    |  |  4.7% {13ms} Array(SequenceableCollection)>>hash
>  |  |      |    |  |    |3.2% {9ms} primitives
>  |  |      |    |  |  2.5% {7ms} Array(SequenceableCollection)>>=
>  |  |      |    |  |    2.5% {7ms} Array(SequenceableCollection)>>hasEqualElements:
>  |  |      |    |  |      1.4% {4ms} primitives
>  |  |      |    |1.4% {4ms} Dictionary(HashedCollection)>>atNewIndex:put:
>  |  |      |  3.6% {10ms} Dictionary>>at:ifAbsent:
>  |  |      |    2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil:
>  |  |      |      |2.2% {6ms} Dictionary>>scanFor:
>  |  |      |      |  1.4% {4ms} Array(SequenceableCollection)>>=
>  |  |      |      |    1.4% {4ms} Array(SequenceableCollection)>>hasEqualElements:
>  |  |      |    1.4% {4ms} primitives
>  |  |    2.2% {6ms} OrderedCollection>>do:
>  |2.5% {7ms} ArtistLister>>uniquePairs:do:
> 4.3% {12ms} ArtistLister>>bandsAboveThresholdIn:
>  |4.3% {12ms} Bag(Collection)>>addAll:
>  |  4.3% {12ms} Bag>>add:
>  |    4.3% {12ms} Bag>>add:withOccurrences:
>  |      2.5% {7ms} Dictionary>>at:put:
>  |      1.8% {5ms} Dictionary>>at:ifAbsent:
> 3.2% {9ms} Set>>includes:
>  2.5% {7ms} Set(HashedCollection)>>findElementOrNil:
>    2.5% {7ms} Set>>scanFor:
>
> **Leaves**
> 6.8% {19ms} WideString(String)>>asOctetString
> 6.5% {18ms} ByteString(String)>>=
> 5.4% {15ms} Dictionary>>at:ifAbsent:
> 5.0% {14ms} WideString>>at:put:
> 5.0% {14ms} WideString class>>from:
> 5.0% {14ms} MultiByteFileStream>>next:
> 5.0% {14ms} WideString(String)>>isOctetString
> 4.7% {13ms} WideString class(String class)>>indexOfAscii:inString:startingAt:
> 4.3% {12ms} OrderedCollection>>addLast:
> 3.9% {11ms} WideString>>at:
> 3.6% {10ms} Array(SequenceableCollection)>>do:
> 3.6% {10ms} OrderedCollection>>do:
> 3.2% {9ms} Set>>scanFor:
> 3.2% {9ms} Dictionary(HashedCollection)>>atNewIndex:put:
> 3.2% {9ms} Array(SequenceableCollection)>>hash
> 2.5% {7ms} ArtistLister>>uniquePairs:do:
> 2.2% {6ms} Dictionary>>scanFor:
> 2.2% {6ms} Character class>>value:
> 2.2% {6ms} ByteString(String)>>trimBoth:
> 2.2% {6ms} Array(SequenceableCollection)>>hasEqualElements:
> 1.8% {5ms} Array class(Behavior)>>inheritsFrom:
> 1.4% {4ms} ByteString(SequenceableCollection)>>split:indicesDo:
> 1.4% {4ms} SmallInteger>>hashMultiply
> 1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil:
>
> **Memory**
> old +5,124,784 bytes
> young -340,984 bytes
> used +4,783,800 bytes
> free +295,464 bytes
>
> **GCs**
> full 0 totalling 0ms (0.0% uptime)
> incr 22 totalling 22ms (8.0% uptime), avg 1.0ms
> tenures 15 (avg 1 GCs/tenure)
> root table 0 overflows
>


Reply | Threaded
Open this post in threaded view
|

Re: Spur images

Benjamin Pollack-2
On Sun, 05 Oct 2014 10:36:31 -0400, Sven Van Caekenberghe <[hidden email]>  
wrote:

> How come you got WideStrings ?
> What does the input look like, can you give a partial example ?

I'm guessing I got WideStrings because the file is indeed in UTF-8, with  
lots of characters outside the lower 128 code points.  A sample couple of  
lines might look like

        光田康典,The National,Sarah McLachlan,周杰倫,Indochine,Rise Against,City  
and Colour,Cæcilie Norby,El Cumbanchero,Death Letter
        The Beatles,The Who,Barenaked Ladies,The Doors,Bob Dylan

These are two play lists, one per line; each comma-delimited element is a  
band on that play list.

The full line for reading and tokenization is just "pathToFile  
asFileReference contents lines collect: [ :line | (',' split: line)  
collect: [ :ea | ea trimmed ]]."  Based on the profile indicating that a  
lot of time is lost on things like WideString>>copyFrom:to:, I wasn't  
optimistic about trying to stream the contents instead of just calling  
"contents lines", but I admit I didn't try.

--Benjamin

Reply | Threaded
Open this post in threaded view
|

Re: Spur images

Sven Van Caekenberghe-2
Hi Benjamin,

On 05 Oct 2014, at 22:37, Benjamin Pollack <[hidden email]> wrote:

> On Sun, 05 Oct 2014 10:36:31 -0400, Sven Van Caekenberghe <[hidden email]> wrote:
>
>> How come you got WideStrings ?
>> What does the input look like, can you give a partial example ?
>
> I'm guessing I got WideStrings because the file is indeed in UTF-8, with lots of characters outside the lower 128 code points.  A sample couple of lines might look like
>
> 光田康典,The National,Sarah McLachlan,周杰倫,Indochine,Rise Against,City and Colour,Cæcilie Norby,El Cumbanchero,Death Letter
> The Beatles,The Who,Barenaked Ladies,The Doors,Bob Dylan
>
> These are two play lists, one per line; each comma-delimited element is a band on that play list.
>
> The full line for reading and tokenization is just "pathToFile asFileReference contents lines collect: [ :line | (',' split: line) collect: [ :ea | ea trimmed ]]."  Based on the profile indicating that a lot of time is lost on things like WideString>>copyFrom:to:, I wasn't optimistic about trying to stream the contents instead of just calling "contents lines", but I admit I didn't try.
>
> --Benjamin

Working with WideStrings is way slower than working with ByteStrings, there is just no way around that. What is especially slow is the automagic switch from ByteString to WideString, for example in a String>>#streamContents: because a #becomeForward: is involved. If that happens for every line or every token, that would be crazy.

Apart from that, the tokenisation is not very efficient, #lines is a copy of your whole contents, so is the #split: and #trimmed. The algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye for performance might yield better results.

But I guess this is not really an exercise in optimisation. If it is, you should give us the dataset and code (and maybe runnable python code as reference), with some comments.

Sven


Reply | Threaded
Open this post in threaded view
|

Re: Spur images

Benjamin Pollack-2
On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <[hidden email]>  
wrote:

> Working with WideStrings is way slower than working with ByteStrings,  
> there is just no way around that. What is especially slow is the  
> automagic switch from ByteString to WideString, for example in a  
> String>>#streamContents: because a #becomeForward: is involved. If that  
> happens for every line or every token, that would be crazy.
>
> Apart from that, the tokenisation is not very efficient, #lines is a  
> copy of your whole contents, so is the #split: and #trimmed. The  
> algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye  
> for performance might yield better results.
>
> But I guess this is not really an exercise in optimisation. If it is,  
> you should give us the dataset and code (and maybe runnable python code  
> as reference), with some comments.

Meh, optimizing the languages maybe, but not the code.  The Python code  
I'm comparing against (and the "sane" C++ code I keep around, for that  
matter) makes these exact same decisions in the interest of readability.  
Indeed, the equivalent from my Python code for that line is

     with open(path) as f:
         lines = [[word.strip() for word in line.split(',')] for line in f]

This does avoid loading the whole into RAM just to split it, but is  
otherwise identical (i.e., strip() allocates an extra string, split()  
allocates lots of extra strings, and the list comprehension is analogous  
to the #collect: call, etc.), so it seems a fair comparison to me.

If you're curious, the maximum speed I've been able to get out of a  
carefully crafted, but not unreadable, C that simply passes around a bunch  
of (start, stop) tuples, rather than strings, comes down to about 40ms.  I  
know I can get better than that, but I lost interest in rewriting all the  
string routines.  I figure another 10ms-20ms could come off, at which  
point it'd finally be as much I/O bound as anything else.

--Benjamin

Reply | Threaded
Open this post in threaded view
|

Re: Spur images

Benjamin Pollack-2
In reply to this post by Sven Van Caekenberghe-2
On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <[hidden email]>  
wrote:
[snip]
> Apart from that, the tokenisation is not very efficient, #lines is a  
> copy of your whole contents, so is the #split: and #trimmed. The  
> algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye  
> for performance might yield better results.

So I was reflecting on this more.  If String and WideString were  
immutable, then it'd be easy to avoid all of these copies; you could  
instead pass around very tiny objects that had only three members (a  
String, a start position, a stop position), and avoid copying very much  
data.  It's that String and WideString are mutable that preclude that.  
For fun, since I know I won't mutate the stringsin this example, I  
actually did a quick spike where I replaced #copyFrom:to: with a new  
method I introduced called #viewFrom:to: that returned a StringView.  I'll  
post the code when I have a chance to clean it up if there's interest, but  
it looks like it pretty handedly chops off 120-150ms from that runtime  
(i.e., double the speed).

Has there been any thought to introducing some immutable collections?  Or  
maybe I'm just missing them?  They'd be useful not just for String and  
WideString, but really for basically any of the collection types.  The  
implementation in most cases would be as simple as overriding #at:put: and  
friends to throw "self shouldNotImplement", and then providing  
methods/classes like the one I introduced to allow taking advantage of the  
newfound immutability.

If there's interest, I'd be happy to submit a Slice we could use as a  
concrete RFC of what this could look like.

--Benjamin

Reply | Threaded
Open this post in threaded view
|

Re: Spur images

Sven Van Caekenberghe-2

On 10 Oct 2014, at 03:40, Benjamin Pollack <[hidden email]> wrote:

> On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <[hidden email]> wrote:
> [snip]
>> Apart from that, the tokenisation is not very efficient, #lines is a copy of your whole contents, so is the #split: and #trimmed. The algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye for performance might yield better results.
>
> So I was reflecting on this more.  If String and WideString were immutable, then it'd be easy to avoid all of these copies; you could instead pass around very tiny objects that had only three members (a String, a start position, a stop position), and avoid copying very much data.  It's that String and WideString are mutable that preclude that.  For fun, since I know I won't mutate the stringsin this example, I actually did a quick spike where I replaced #copyFrom:to: with a new method I introduced called #viewFrom:to: that returned a StringView.  I'll post the code when I have a chance to clean it up if there's interest, but it looks like it pretty handedly chops off 120-150ms from that runtime (i.e., double the speed).
>
> Has there been any thought to introducing some immutable collections?  Or maybe I'm just missing them?  They'd be useful not just for String and WideString, but really for basically any of the collection types.  The implementation in most cases would be as simple as overriding #at:put: and friends to throw "self shouldNotImplement", and then providing methods/classes like the one I introduced to allow taking advantage of the newfound immutability.
>
> If there's interest, I'd be happy to submit a Slice we could use as a concrete RFC of what this could look like.
>
> --Benjamin

I think it is interesting that you get real measurable improvements with user defined string views.

I have always felt that a problem for going in that direction is that most primitives (which are important to get good basic string performance) are not flexible enough to be used really efficiently. More concretely, they should take start/stop indexes on all string arguments. For example, ByteString class>>#compare:with:collated: or #stringHash:initialHash: - it should be possible to do these operations on substrings *without creating the substrings*.

Sven


Reply | Threaded
Open this post in threaded view
|

Re: Spur images

abergel
In reply to this post by Benjamin Pollack-2
Interesting.

When working on AstroCloud (http://astrocloudy.wordpress.com) we find out that Bitmap and FormCanvas were way too slow. Performing a global modification (e.g., changing a red for a blue) took long long time for very large images (2000 x 2000 pixels).
We did some intensive profiling and aggressive optimizations until the points we could not optimize anymore. We then tried to avoid manipulating objects in some very specific part of the code. For example, instead of having instance of the class Color, we have an array (unique and preallocated) that contains the red, green, and blue values.
Having these “primitives” values was convenient to call either a C function or very low level functions with native boost. We got a ~ x 30 speed up.

I do not know much about wide strings, but I suspect you can do something like that.  

So, my advice is: identify the method that cost you a lot, and write it natively. This has worked for us, in a non-trivial setting.

Let us know

Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.



On Oct 3, 2014, at 5:07 PM, Benjamin Pollack <[hidden email]> wrote:

> My apologies if this is already spelled out somewhere and I simply can't
> find it, but are there any Spur images for Pharo yet?  I can only find
> Squeak ones.  It's not a big deal either way; I was just playing with an
> algorithm that, after very heavy optimization, I was able to get down to
> about 278ms per pass (from ~700ms initially from a naive
> implementation).  For contrast, the equivalent Python runs in ~80ms.
> Looking at MessageTally, it seems at least half of that 278ms is spent
> noodling around in WideStrings and other small things that the Spur
> object format ought to help with, so it'd be interesting to see how much
> of a speed boost that gives without any further work.
>