My apologies if this is already spelled out somewhere and I simply can't
find it, but are there any Spur images for Pharo yet? I can only find Squeak ones. It's not a big deal either way; I was just playing with an algorithm that, after very heavy optimization, I was able to get down to about 278ms per pass (from ~700ms initially from a naive implementation). For contrast, the equivalent Python runs in ~80ms. Looking at MessageTally, it seems at least half of that 278ms is spent noodling around in WideStrings and other small things that the Spur object format ought to help with, so it'd be interesting to see how much of a speed boost that gives without any further work. |
Pharo has a new classbuilder with first class instance variables so the
bootstrap took longer and doing so esteban and guille found bugs in Spur. So regularly but hidden from the fame, esteban goes over it. and we hope soon to have some images to beta testers. > My apologies if this is already spelled out somewhere and I simply can't > find it, but are there any Spur images for Pharo yet? I can only find > Squeak ones. It's not a big deal either way; I was just playing with an > algorithm that, after very heavy optimization, I was able to get down to > about 278ms per pass (from ~700ms initially from a naive > implementation). For contrast, the equivalent Python runs in ~80ms. > Looking at MessageTally, it seems at least half of that 278ms is spent > noodling around in WideStrings and other small things that the Spur > object format ought to help with, so it'd be interesting to see how much > of a speed boost that gives without any further work. |
as Stef says, we are close, but not there yet.
I’m working on it and I hope to have something really soon. Esteban > On 03 Oct 2014, at 22:13, stepharo <[hidden email]> wrote: > > Pharo has a new classbuilder with first class instance variables so the bootstrap took longer and > doing so esteban and guille found bugs in Spur. So regularly but hidden from the fame, esteban goes over it. > and we hope soon to have some images to beta testers. >> My apologies if this is already spelled out somewhere and I simply can't >> find it, but are there any Spur images for Pharo yet? I can only find >> Squeak ones. It's not a big deal either way; I was just playing with an >> algorithm that, after very heavy optimization, I was able to get down to >> about 278ms per pass (from ~700ms initially from a naive >> implementation). For contrast, the equivalent Python runs in ~80ms. >> Looking at MessageTally, it seems at least half of that 278ms is spent >> noodling around in WideStrings and other small things that the Spur >> object format ought to help with, so it'd be interesting to see how much >> of a speed boost that gives without any further work. > tell us more about your algo because we want to know. > > > |
In reply to this post by stepharo
On Fri, 03 Oct 2014 16:13:35 -0400, stepharo <[hidden email]> wrote:
>> Looking at MessageTally, it seems at least half of that 278ms is spent >> noodling around in WideStrings and other small things that the Spur >> object format ought to help with, so it'd be interesting to see how much >> of a speed boost that gives without any further work. > tell us more about your algo because we want to know. It's not that exciting (it's a problem we give job applicants as a take-home coding test), but I'll describe it, since it highlights a common corner where Pharo does not perform terribly well. Basically, we give you a pile of play lists (a text file where each line is "Song1,Song2,Song3" and so on, representing a single play list), and then ask you to tell us which songs appear frequently together (where "frequently" is something we define--say, at least 100 times, for example). While there are interesting complex algorithms, a simple one that works well for small datasets is simply to walk all of play lists and build up a hashtable that maps pairs of bands to a count, then walk the hashtable and print out any keys whose counts are over the threshold. If this sounds like "dump all the permutations of each play list into a Bag and then walk Bag>>valuesAndCounts," you're right. As I said, it's not that complex. Python does very well on this because it hits on three things--hashtables, sets, and string processing--for which Python has obscenely good implementations in carefully optimized C. (In fact, the 80ms time I quoted you is surprisingly close to the maximum speed you can get with genuine C or C++ implementation on the sample dataset as well.) Pharo struggles to match Python for the same three reasons, but *especially* because its String functions are just much slower. In fact, if you implement the algorithm I described above naïvely in Pharo, the execution time is about 780ms--nearly ten times slower. After some reflection, I chopped the time to 270ms by converting all the band names to integers as the first step of the process (ArtistLister>>generateMap: in the profile trace below). Not only does this avoid WideString>>hash and friends; it also allows me to bring in IdentityDictionary and IdentitySet in most locations due to Pharo's object format. But there's a lot left. I'm attaching the profile output below. Notice that Pharo spends more time reading and processing the file (about 120ms) than the Python program spends executing total. --Benjamin - 279 tallies, 279 msec. **Tree** -------------------------------- Process: (40s) Morphic UI Process: nil -------------------------------- 69.2% {193ms} ArtistLister>>groupsIn: |18.6% {52ms} FileReference(AbstractFileReference)>>contents | |17.6% {49ms} MultiByteFileStream(FileStream)>>contents | | 17.6% {49ms} MultiByteFileStream>>next: | | 5.0% {14ms} ByteString>>at:put: | | |5.0% {14ms} WideString class>>from: | | 5.0% {14ms} WideString>>at:put: | | 5.0% {14ms} primitives | | 1.4% {4ms} MultiByteFileStream>>next | | 1.4% {4ms} UTF8TextConverter>>nextFromStream: |14.7% {41ms} WideString(String)>>lines | |14.7% {41ms} WideString(String)>>linesDo: | | 10.0% {28ms} WideString>>copyFrom:to: | | |5.4% {15ms} WideString(String)>>isOctetString | | | |3.2% {9ms} WideString>>at: | | | | |2.2% {6ms} Character class>>value: | | | |2.2% {6ms} primitives | | |4.7% {13ms} WideString(String)>>asOctetString | | | 3.6% {10ms} primitives | | 4.7% {13ms} WideString(String)>>lineIndicesDo: | | 4.7% {13ms} WideString(String)>>indexOf:startingAt: | | 4.7% {13ms} WideString class(String class)>>indexOfAscii:inString:startingAt: |10.8% {30ms} ByteString(Object)>>split: | |10.0% {28ms} ByteString(Object)>>split:do: | | 5.7% {16ms} WideString>>copyFrom:to: | | |3.2% {9ms} WideString(String)>>asOctetString | | |2.5% {7ms} WideString(String)>>isOctetString | | | 1.8% {5ms} primitives | | 4.3% {12ms} ByteString(SequenceableCollection)>>split:indicesDo: | | 2.9% {8ms} WideString(SequenceableCollection)>>indexOfSubCollection:startingAt: | | |2.9% {8ms} WideString(String)>>indexOfSubCollection:startingAt:ifAbsent: | | | 2.9% {8ms} WideString(String)>>findString:startingAt: | | | 2.9% {8ms} WideString(String)>>findString:startingAt:caseSensitive: | | | 2.9% {8ms} WideString(String)>>findSubstring:in:startingAt:matchTable: | | 1.4% {4ms} primitives |7.2% {20ms} Set(Collection)>>addAll: | |5.0% {14ms} Set>>add: | | |3.6% {10ms} Set>>scanFor: | | | |3.6% {10ms} ByteString(String)>>= | | |1.4% {4ms} Set(HashedCollection)>>atNewIndex:put: | | | 1.4% {4ms} Set(HashedCollection)>>fullCheck | | | 1.4% {4ms} Set>>grow | |1.4% {4ms} OrderedCollection>>do: |5.4% {15ms} ArtistLister>>generateMap: | |5.4% {15ms} IdentityDictionary(Dictionary)>>at:put: | | 3.9% {11ms} Dictionary(HashedCollection)>>atNewIndex:put: | | |3.2% {9ms} Dictionary(HashedCollection)>>fullCheck | | | 2.5% {7ms} Dictionary(HashedCollection)>>grow | | | 1.4% {4ms} Dictionary>>noCheckAdd: | | | 1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil: | | 1.4% {4ms} IdentityDictionary(HashedCollection)>>findElementOrNil: |4.3% {12ms} OrderedCollection>>collect: | |4.3% {12ms} OrderedCollection>>addLast: |4.3% {12ms} Dictionary>>at: | |4.3% {12ms} Dictionary>>at:ifAbsent: | | 2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil: | | |2.2% {6ms} Dictionary>>scanFor: | | | 1.4% {4ms} ByteString(String)>>= | | 2.2% {6ms} primitives |3.9% {11ms} ByteString(String)>>trimmed | 3.9% {11ms} ByteString(String)>>trimBoth | 3.2% {9ms} ByteString(String)>>trimBoth: | 2.2% {6ms} primitives 19.4% {54ms} ArtistLister>>popularPairsIn:onlyKeeping: |15.8% {44ms} OrderedCollection(Collection)>>asBag | |15.8% {44ms} Bag class(Collection class)>>withAll: | | 15.8% {44ms} Bag(Collection)>>addAll: | | 13.6% {38ms} Bag>>add: | | |13.6% {38ms} Bag>>add:withOccurrences: | | | 10.0% {28ms} Dictionary>>at:put: | | | |8.6% {24ms} Dictionary(HashedCollection)>>findElementOrNil: | | | | |7.9% {22ms} Dictionary>>scanFor: | | | | | 4.7% {13ms} Array(SequenceableCollection)>>hash | | | | | |3.2% {9ms} primitives | | | | | 2.5% {7ms} Array(SequenceableCollection)>>= | | | | | 2.5% {7ms} Array(SequenceableCollection)>>hasEqualElements: | | | | | 1.4% {4ms} primitives | | | |1.4% {4ms} Dictionary(HashedCollection)>>atNewIndex:put: | | | 3.6% {10ms} Dictionary>>at:ifAbsent: | | | 2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil: | | | |2.2% {6ms} Dictionary>>scanFor: | | | | 1.4% {4ms} Array(SequenceableCollection)>>= | | | | 1.4% {4ms} Array(SequenceableCollection)>>hasEqualElements: | | | 1.4% {4ms} primitives | | 2.2% {6ms} OrderedCollection>>do: |2.5% {7ms} ArtistLister>>uniquePairs:do: 4.3% {12ms} ArtistLister>>bandsAboveThresholdIn: |4.3% {12ms} Bag(Collection)>>addAll: | 4.3% {12ms} Bag>>add: | 4.3% {12ms} Bag>>add:withOccurrences: | 2.5% {7ms} Dictionary>>at:put: | 1.8% {5ms} Dictionary>>at:ifAbsent: 3.2% {9ms} Set>>includes: 2.5% {7ms} Set(HashedCollection)>>findElementOrNil: 2.5% {7ms} Set>>scanFor: **Leaves** 6.8% {19ms} WideString(String)>>asOctetString 6.5% {18ms} ByteString(String)>>= 5.4% {15ms} Dictionary>>at:ifAbsent: 5.0% {14ms} WideString>>at:put: 5.0% {14ms} WideString class>>from: 5.0% {14ms} MultiByteFileStream>>next: 5.0% {14ms} WideString(String)>>isOctetString 4.7% {13ms} WideString class(String class)>>indexOfAscii:inString:startingAt: 4.3% {12ms} OrderedCollection>>addLast: 3.9% {11ms} WideString>>at: 3.6% {10ms} Array(SequenceableCollection)>>do: 3.6% {10ms} OrderedCollection>>do: 3.2% {9ms} Set>>scanFor: 3.2% {9ms} Dictionary(HashedCollection)>>atNewIndex:put: 3.2% {9ms} Array(SequenceableCollection)>>hash 2.5% {7ms} ArtistLister>>uniquePairs:do: 2.2% {6ms} Dictionary>>scanFor: 2.2% {6ms} Character class>>value: 2.2% {6ms} ByteString(String)>>trimBoth: 2.2% {6ms} Array(SequenceableCollection)>>hasEqualElements: 1.8% {5ms} Array class(Behavior)>>inheritsFrom: 1.4% {4ms} ByteString(SequenceableCollection)>>split:indicesDo: 1.4% {4ms} SmallInteger>>hashMultiply 1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil: **Memory** old +5,124,784 bytes young -340,984 bytes used +4,783,800 bytes free +295,464 bytes **GCs** full 0 totalling 0ms (0.0% uptime) incr 22 totalling 22ms (8.0% uptime), avg 1.0ms tenures 15 (avg 1 GCs/tenure) root table 0 overflows |
How come you got WideStrings ?
What does the input look like, can you give a partial example ? On 05 Oct 2014, at 16:03, Benjamin Pollack <[hidden email]> wrote: > On Fri, 03 Oct 2014 16:13:35 -0400, stepharo <[hidden email]> wrote: >>> Looking at MessageTally, it seems at least half of that 278ms is spent >>> noodling around in WideStrings and other small things that the Spur >>> object format ought to help with, so it'd be interesting to see how much >>> of a speed boost that gives without any further work. >> tell us more about your algo because we want to know. > > It's not that exciting (it's a problem we give job applicants as a take-home coding test), but I'll describe it, since it highlights a common corner where Pharo does not perform terribly well. > > Basically, we give you a pile of play lists (a text file where each line is "Song1,Song2,Song3" and so on, representing a single play list), and then ask you to tell us which songs appear frequently together (where "frequently" is something we define--say, at least 100 times, for example). While there are interesting complex algorithms, a simple one that works well for small datasets is simply to walk all of play lists and build up a hashtable that maps pairs of bands to a count, then walk the hashtable and print out any keys whose counts are over the threshold. If this sounds like "dump all the permutations of each play list into a Bag and then walk Bag>>valuesAndCounts," you're right. As I said, it's not that complex. > > Python does very well on this because it hits on three things--hashtables, sets, and string processing--for which Python has obscenely good implementations in carefully optimized C. (In fact, the 80ms time I quoted you is surprisingly close to the maximum speed you can get with genuine C or C++ implementation on the sample dataset as well.) > > Pharo struggles to match Python for the same three reasons, but *especially* because its String functions are just much slower. In fact, if you implement the algorithm I described above naïvely in Pharo, the execution time is about 780ms--nearly ten times slower. After some reflection, I chopped the time to 270ms by converting all the band names to integers as the first step of the process (ArtistLister>>generateMap: in the profile trace below). Not only does this avoid WideString>>hash and friends; it also allows me to bring in IdentityDictionary and IdentitySet in most locations due to Pharo's object format. > > But there's a lot left. I'm attaching the profile output below. Notice that Pharo spends more time reading and processing the file (about 120ms) than the Python program spends executing total. > > --Benjamin > > > - 279 tallies, 279 msec. > > **Tree** > -------------------------------- > Process: (40s) Morphic UI Process: nil > -------------------------------- > 69.2% {193ms} ArtistLister>>groupsIn: > |18.6% {52ms} FileReference(AbstractFileReference)>>contents > | |17.6% {49ms} MultiByteFileStream(FileStream)>>contents > | | 17.6% {49ms} MultiByteFileStream>>next: > | | 5.0% {14ms} ByteString>>at:put: > | | |5.0% {14ms} WideString class>>from: > | | 5.0% {14ms} WideString>>at:put: > | | 5.0% {14ms} primitives > | | 1.4% {4ms} MultiByteFileStream>>next > | | 1.4% {4ms} UTF8TextConverter>>nextFromStream: > |14.7% {41ms} WideString(String)>>lines > | |14.7% {41ms} WideString(String)>>linesDo: > | | 10.0% {28ms} WideString>>copyFrom:to: > | | |5.4% {15ms} WideString(String)>>isOctetString > | | | |3.2% {9ms} WideString>>at: > | | | | |2.2% {6ms} Character class>>value: > | | | |2.2% {6ms} primitives > | | |4.7% {13ms} WideString(String)>>asOctetString > | | | 3.6% {10ms} primitives > | | 4.7% {13ms} WideString(String)>>lineIndicesDo: > | | 4.7% {13ms} WideString(String)>>indexOf:startingAt: > | | 4.7% {13ms} WideString class(String class)>>indexOfAscii:inString:startingAt: > |10.8% {30ms} ByteString(Object)>>split: > | |10.0% {28ms} ByteString(Object)>>split:do: > | | 5.7% {16ms} WideString>>copyFrom:to: > | | |3.2% {9ms} WideString(String)>>asOctetString > | | |2.5% {7ms} WideString(String)>>isOctetString > | | | 1.8% {5ms} primitives > | | 4.3% {12ms} ByteString(SequenceableCollection)>>split:indicesDo: > | | 2.9% {8ms} WideString(SequenceableCollection)>>indexOfSubCollection:startingAt: > | | |2.9% {8ms} WideString(String)>>indexOfSubCollection:startingAt:ifAbsent: > | | | 2.9% {8ms} WideString(String)>>findString:startingAt: > | | | 2.9% {8ms} WideString(String)>>findString:startingAt:caseSensitive: > | | | 2.9% {8ms} WideString(String)>>findSubstring:in:startingAt:matchTable: > | | 1.4% {4ms} primitives > |7.2% {20ms} Set(Collection)>>addAll: > | |5.0% {14ms} Set>>add: > | | |3.6% {10ms} Set>>scanFor: > | | | |3.6% {10ms} ByteString(String)>>= > | | |1.4% {4ms} Set(HashedCollection)>>atNewIndex:put: > | | | 1.4% {4ms} Set(HashedCollection)>>fullCheck > | | | 1.4% {4ms} Set>>grow > | |1.4% {4ms} OrderedCollection>>do: > |5.4% {15ms} ArtistLister>>generateMap: > | |5.4% {15ms} IdentityDictionary(Dictionary)>>at:put: > | | 3.9% {11ms} Dictionary(HashedCollection)>>atNewIndex:put: > | | |3.2% {9ms} Dictionary(HashedCollection)>>fullCheck > | | | 2.5% {7ms} Dictionary(HashedCollection)>>grow > | | | 1.4% {4ms} Dictionary>>noCheckAdd: > | | | 1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil: > | | 1.4% {4ms} IdentityDictionary(HashedCollection)>>findElementOrNil: > |4.3% {12ms} OrderedCollection>>collect: > | |4.3% {12ms} OrderedCollection>>addLast: > |4.3% {12ms} Dictionary>>at: > | |4.3% {12ms} Dictionary>>at:ifAbsent: > | | 2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil: > | | |2.2% {6ms} Dictionary>>scanFor: > | | | 1.4% {4ms} ByteString(String)>>= > | | 2.2% {6ms} primitives > |3.9% {11ms} ByteString(String)>>trimmed > | 3.9% {11ms} ByteString(String)>>trimBoth > | 3.2% {9ms} ByteString(String)>>trimBoth: > | 2.2% {6ms} primitives > 19.4% {54ms} ArtistLister>>popularPairsIn:onlyKeeping: > |15.8% {44ms} OrderedCollection(Collection)>>asBag > | |15.8% {44ms} Bag class(Collection class)>>withAll: > | | 15.8% {44ms} Bag(Collection)>>addAll: > | | 13.6% {38ms} Bag>>add: > | | |13.6% {38ms} Bag>>add:withOccurrences: > | | | 10.0% {28ms} Dictionary>>at:put: > | | | |8.6% {24ms} Dictionary(HashedCollection)>>findElementOrNil: > | | | | |7.9% {22ms} Dictionary>>scanFor: > | | | | | 4.7% {13ms} Array(SequenceableCollection)>>hash > | | | | | |3.2% {9ms} primitives > | | | | | 2.5% {7ms} Array(SequenceableCollection)>>= > | | | | | 2.5% {7ms} Array(SequenceableCollection)>>hasEqualElements: > | | | | | 1.4% {4ms} primitives > | | | |1.4% {4ms} Dictionary(HashedCollection)>>atNewIndex:put: > | | | 3.6% {10ms} Dictionary>>at:ifAbsent: > | | | 2.2% {6ms} Dictionary(HashedCollection)>>findElementOrNil: > | | | |2.2% {6ms} Dictionary>>scanFor: > | | | | 1.4% {4ms} Array(SequenceableCollection)>>= > | | | | 1.4% {4ms} Array(SequenceableCollection)>>hasEqualElements: > | | | 1.4% {4ms} primitives > | | 2.2% {6ms} OrderedCollection>>do: > |2.5% {7ms} ArtistLister>>uniquePairs:do: > 4.3% {12ms} ArtistLister>>bandsAboveThresholdIn: > |4.3% {12ms} Bag(Collection)>>addAll: > | 4.3% {12ms} Bag>>add: > | 4.3% {12ms} Bag>>add:withOccurrences: > | 2.5% {7ms} Dictionary>>at:put: > | 1.8% {5ms} Dictionary>>at:ifAbsent: > 3.2% {9ms} Set>>includes: > 2.5% {7ms} Set(HashedCollection)>>findElementOrNil: > 2.5% {7ms} Set>>scanFor: > > **Leaves** > 6.8% {19ms} WideString(String)>>asOctetString > 6.5% {18ms} ByteString(String)>>= > 5.4% {15ms} Dictionary>>at:ifAbsent: > 5.0% {14ms} WideString>>at:put: > 5.0% {14ms} WideString class>>from: > 5.0% {14ms} MultiByteFileStream>>next: > 5.0% {14ms} WideString(String)>>isOctetString > 4.7% {13ms} WideString class(String class)>>indexOfAscii:inString:startingAt: > 4.3% {12ms} OrderedCollection>>addLast: > 3.9% {11ms} WideString>>at: > 3.6% {10ms} Array(SequenceableCollection)>>do: > 3.6% {10ms} OrderedCollection>>do: > 3.2% {9ms} Set>>scanFor: > 3.2% {9ms} Dictionary(HashedCollection)>>atNewIndex:put: > 3.2% {9ms} Array(SequenceableCollection)>>hash > 2.5% {7ms} ArtistLister>>uniquePairs:do: > 2.2% {6ms} Dictionary>>scanFor: > 2.2% {6ms} Character class>>value: > 2.2% {6ms} ByteString(String)>>trimBoth: > 2.2% {6ms} Array(SequenceableCollection)>>hasEqualElements: > 1.8% {5ms} Array class(Behavior)>>inheritsFrom: > 1.4% {4ms} ByteString(SequenceableCollection)>>split:indicesDo: > 1.4% {4ms} SmallInteger>>hashMultiply > 1.4% {4ms} Dictionary(HashedCollection)>>findElementOrNil: > > **Memory** > old +5,124,784 bytes > young -340,984 bytes > used +4,783,800 bytes > free +295,464 bytes > > **GCs** > full 0 totalling 0ms (0.0% uptime) > incr 22 totalling 22ms (8.0% uptime), avg 1.0ms > tenures 15 (avg 1 GCs/tenure) > root table 0 overflows > |
On Sun, 05 Oct 2014 10:36:31 -0400, Sven Van Caekenberghe <[hidden email]>
wrote: > How come you got WideStrings ? > What does the input look like, can you give a partial example ? I'm guessing I got WideStrings because the file is indeed in UTF-8, with lots of characters outside the lower 128 code points. A sample couple of lines might look like 光田康典,The National,Sarah McLachlan,周杰倫,Indochine,Rise Against,City and Colour,Cæcilie Norby,El Cumbanchero,Death Letter The Beatles,The Who,Barenaked Ladies,The Doors,Bob Dylan These are two play lists, one per line; each comma-delimited element is a band on that play list. The full line for reading and tokenization is just "pathToFile asFileReference contents lines collect: [ :line | (',' split: line) collect: [ :ea | ea trimmed ]]." Based on the profile indicating that a lot of time is lost on things like WideString>>copyFrom:to:, I wasn't optimistic about trying to stream the contents instead of just calling "contents lines", but I admit I didn't try. --Benjamin |
Hi Benjamin,
On 05 Oct 2014, at 22:37, Benjamin Pollack <[hidden email]> wrote: > On Sun, 05 Oct 2014 10:36:31 -0400, Sven Van Caekenberghe <[hidden email]> wrote: > >> How come you got WideStrings ? >> What does the input look like, can you give a partial example ? > > I'm guessing I got WideStrings because the file is indeed in UTF-8, with lots of characters outside the lower 128 code points. A sample couple of lines might look like > > 光田康典,The National,Sarah McLachlan,周杰倫,Indochine,Rise Against,City and Colour,Cæcilie Norby,El Cumbanchero,Death Letter > The Beatles,The Who,Barenaked Ladies,The Doors,Bob Dylan > > These are two play lists, one per line; each comma-delimited element is a band on that play list. > > The full line for reading and tokenization is just "pathToFile asFileReference contents lines collect: [ :line | (',' split: line) collect: [ :ea | ea trimmed ]]." Based on the profile indicating that a lot of time is lost on things like WideString>>copyFrom:to:, I wasn't optimistic about trying to stream the contents instead of just calling "contents lines", but I admit I didn't try. > > --Benjamin Working with WideStrings is way slower than working with ByteStrings, there is just no way around that. What is especially slow is the automagic switch from ByteString to WideString, for example in a String>>#streamContents: because a #becomeForward: is involved. If that happens for every line or every token, that would be crazy. Apart from that, the tokenisation is not very efficient, #lines is a copy of your whole contents, so is the #split: and #trimmed. The algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye for performance might yield better results. But I guess this is not really an exercise in optimisation. If it is, you should give us the dataset and code (and maybe runnable python code as reference), with some comments. Sven |
On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <[hidden email]>
wrote: > Working with WideStrings is way slower than working with ByteStrings, > there is just no way around that. What is especially slow is the > automagic switch from ByteString to WideString, for example in a > String>>#streamContents: because a #becomeForward: is involved. If that > happens for every line or every token, that would be crazy. > > Apart from that, the tokenisation is not very efficient, #lines is a > copy of your whole contents, so is the #split: and #trimmed. The > algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye > for performance might yield better results. > > But I guess this is not really an exercise in optimisation. If it is, > you should give us the dataset and code (and maybe runnable python code > as reference), with some comments. Meh, optimizing the languages maybe, but not the code. The Python code I'm comparing against (and the "sane" C++ code I keep around, for that matter) makes these exact same decisions in the interest of readability. Indeed, the equivalent from my Python code for that line is with open(path) as f: lines = [[word.strip() for word in line.split(',')] for line in f] This does avoid loading the whole into RAM just to split it, but is otherwise identical (i.e., strip() allocates an extra string, split() allocates lots of extra strings, and the list comprehension is analogous to the #collect: call, etc.), so it seems a fair comparison to me. If you're curious, the maximum speed I've been able to get out of a carefully crafted, but not unreadable, C that simply passes around a bunch of (start, stop) tuples, rather than strings, comes down to about 40ms. I know I can get better than that, but I lost interest in rewriting all the string routines. I figure another 10ms-20ms could come off, at which point it'd finally be as much I/O bound as anything else. --Benjamin |
In reply to this post by Sven Van Caekenberghe-2
On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <[hidden email]>
wrote: [snip] > Apart from that, the tokenisation is not very efficient, #lines is a > copy of your whole contents, so is the #split: and #trimmed. The > algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye > for performance might yield better results. So I was reflecting on this more. If String and WideString were immutable, then it'd be easy to avoid all of these copies; you could instead pass around very tiny objects that had only three members (a String, a start position, a stop position), and avoid copying very much data. It's that String and WideString are mutable that preclude that. For fun, since I know I won't mutate the stringsin this example, I actually did a quick spike where I replaced #copyFrom:to: with a new method I introduced called #viewFrom:to: that returned a StringView. I'll post the code when I have a chance to clean it up if there's interest, but it looks like it pretty handedly chops off 120-150ms from that runtime (i.e., double the speed). Has there been any thought to introducing some immutable collections? Or maybe I'm just missing them? They'd be useful not just for String and WideString, but really for basically any of the collection types. The implementation in most cases would be as simple as overriding #at:put: and friends to throw "self shouldNotImplement", and then providing methods/classes like the one I introduced to allow taking advantage of the newfound immutability. If there's interest, I'd be happy to submit a Slice we could use as a concrete RFC of what this could look like. --Benjamin |
On 10 Oct 2014, at 03:40, Benjamin Pollack <[hidden email]> wrote: > On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe <[hidden email]> wrote: > [snip] >> Apart from that, the tokenisation is not very efficient, #lines is a copy of your whole contents, so is the #split: and #trimmed. The algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye for performance might yield better results. > > So I was reflecting on this more. If String and WideString were immutable, then it'd be easy to avoid all of these copies; you could instead pass around very tiny objects that had only three members (a String, a start position, a stop position), and avoid copying very much data. It's that String and WideString are mutable that preclude that. For fun, since I know I won't mutate the stringsin this example, I actually did a quick spike where I replaced #copyFrom:to: with a new method I introduced called #viewFrom:to: that returned a StringView. I'll post the code when I have a chance to clean it up if there's interest, but it looks like it pretty handedly chops off 120-150ms from that runtime (i.e., double the speed). > > Has there been any thought to introducing some immutable collections? Or maybe I'm just missing them? They'd be useful not just for String and WideString, but really for basically any of the collection types. The implementation in most cases would be as simple as overriding #at:put: and friends to throw "self shouldNotImplement", and then providing methods/classes like the one I introduced to allow taking advantage of the newfound immutability. > > If there's interest, I'd be happy to submit a Slice we could use as a concrete RFC of what this could look like. > > --Benjamin I think it is interesting that you get real measurable improvements with user defined string views. I have always felt that a problem for going in that direction is that most primitives (which are important to get good basic string performance) are not flexible enough to be used really efficiently. More concretely, they should take start/stop indexes on all string arguments. For example, ByteString class>>#compare:with:collated: or #stringHash:initialHash: - it should be possible to do these operations on substrings *without creating the substrings*. Sven |
In reply to this post by Benjamin Pollack-2
Interesting.
When working on AstroCloud (http://astrocloudy.wordpress.com) we find out that Bitmap and FormCanvas were way too slow. Performing a global modification (e.g., changing a red for a blue) took long long time for very large images (2000 x 2000 pixels). We did some intensive profiling and aggressive optimizations until the points we could not optimize anymore. We then tried to avoid manipulating objects in some very specific part of the code. For example, instead of having instance of the class Color, we have an array (unique and preallocated) that contains the red, green, and blue values. Having these “primitives” values was convenient to call either a C function or very low level functions with native boost. We got a ~ x 30 speed up. I do not know much about wide strings, but I suspect you can do something like that. So, my advice is: identify the method that cost you a lot, and write it natively. This has worked for us, in a non-trivial setting. Let us know Cheers, Alexandre -- _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;. On Oct 3, 2014, at 5:07 PM, Benjamin Pollack <[hidden email]> wrote: > My apologies if this is already spelled out somewhere and I simply can't > find it, but are there any Spur images for Pharo yet? I can only find > Squeak ones. It's not a big deal either way; I was just playing with an > algorithm that, after very heavy optimization, I was able to get down to > about 278ms per pass (from ~700ms initially from a naive > implementation). For contrast, the equivalent Python runs in ~80ms. > Looking at MessageTally, it seems at least half of that 278ms is spent > noodling around in WideStrings and other small things that the Spur > object format ought to help with, so it'd be interesting to see how much > of a speed boost that gives without any further work. > |
Free forum by Nabble | Edit this page |