PositionableStream>>nextWord slow


hding
I was trying to slurp up a file of about 30000 words or so using
PositionableStream>>nextWord on a ReadWriteStream on the file and it
was quite unexpectedly slow (Dolphin 5.0 PL3).  My guess is that the
problem is that it uses the contents method, which may wind up doing a
lot of copying.

For my purpose I was able to use something like the following modified
method, though I don't know that it's a general solution to the
problem.

nextWord2

        "Answer the next word in the receiver's element stream,
        delimited by elements which answer true to #isSeparator.
        Answer nil if there are no more words in the receiver."

        | word |
        self skipSeparators ifFalse: [^nil].
        word := WriteStream on: String new.
        [self peek ifNil: [^word contents]
                   ifNotNil: [:next | next isSeparator not]]
                whileTrue: [word nextPut: self next].
        ^word contents



--
Howard Ding
<[hidden email]>



Re: PositionableStream>>nextWord slow

Blair McGlashan-2
Howard

You wrote in message news:[hidden email]...
> I was trying to slurp up a file of about 30000 words or so using
> PositionableStream>>nextWord on a ReadWriteStream on the file and it
> was quite unexpectedly slow (Dolphin 5.0 PL3).  My guess is that the
> problem is that it uses the contents method, which may wind up doing a
> lot of copying.
> ...

You're right, this method's implementation is slow for large streams.
#nextWord should also be pushed up into SequencedStream so that it is
available for StdioFileStreams too. Recorded as #1241, patch below.

Thanks

Blair
-----------------------

!SequencedStream methodsFor!

nextWord
 "Answer the next 'word' in the receiver's element stream, where a word is defined as
 a sequence of one or more elements delimited by elements which answer true to
 #isSeparator. Leading separators are skipped. Answer nil if there are no more words
 in the receiver."

 | wordStream element |
 self skipSeparators ifFalse: [^nil].
 wordStream := self contentsSpecies writeStream: 32.
 [self atEnd] whileFalse:
   [element := self next.
   element isSeparator ifTrue: [^wordStream contents].
   wordStream nextPut: element].
 ^wordStream contents! !
!SequencedStream categoriesFor: #nextWord!accessing!public! !

PositionableStream removeSelector: #nextWord ifAbsent: []!
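
A minimal workspace sketch of how the patched #nextWord might be exercised, assuming the patch above has been filed in. The sample string and the variable names are illustrative only and do not come from the original posts. The loop relies on #nextWord answering nil once no words remain.

```smalltalk
"Illustrative only: collect every word from an in-memory stream."
| stream words word |
stream := ReadStream on: 'the quick  brown fox'.
words := OrderedCollection new.
"#nextWord skips leading separators and answers nil at the end of the stream."
[(word := stream nextWord) notNil] whileTrue: [words add: word].
words
```

Each word is built in its own small WriteStream, so the cost per word should depend on the length of the word rather than on the size of the whole stream.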