SequenceableCollection>>beginsWith:

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

SequenceableCollection>>beginsWith:

Chris Uppal-3
[I was plain wrong with the last of these I posted, but I'll risk another
one anyway]

I think the implementation of SequenceableCollection>>beginsWith: is a bit
wonkey.  It's based on #indexOfSubCollection:startingAt: answering 1, but
what that means is that if the receiver *doesn't* start with the prefix,
then it takes time proportional to the length of the receiver to find out.

E.g. I loaded a couple of Mbytes of a binary file into a ByteArray; checking
to see it began with one "magic number" for PDF data ('%PDF1.2' asByteArray)
took around half a second, replacing the implementation of #beginsWith: with
the obvious loop speeded the check up by 5 orders of magnitude...

BTW, I and others have sent a few posts about sub-optimal implementations of
algorithms lately.  It's nice to be able to balance it with a more positive
note; I've just had to create a specialised file scanner not unlike the
Windows find-in-files feature -- my Dolphin version is faster than the
Windows one, even though I haven't worked out how to do overlapped disk IO
yet (is this even possible -- I'm thinking of reading file data in one
thread and scanning it in another ?).

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: SequenceableCollection>>beginsWith:

Blair McGlashan
Chris

"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...
> [I was plain wrong with the last of these I posted, but I'll risk another
> one anyway]
>
> I think the implementation of SequenceableCollection>>beginsWith: is a bit
> wonkey.  It's based on #indexOfSubCollection:startingAt: answering 1, but
> what that means is that if the receiver *doesn't* start with the prefix,
> then it takes time proportional to the length of the receiver to find out.
>...

Thanks, that is a bit daft. Defect # 251.

>
> BTW, I and others have sent a few posts about sub-optimal implementations
of
> algorithms lately.  It's nice to be able to balance it with a more
positive
> note; I've just had to create a specialised file scanner not unlike the
> Windows find-in-files feature -- my Dolphin version is faster than the
> Windows one, even though I haven't worked out how to do overlapped disk IO
> yet (is this even possible -- I'm thinking of reading file data in one
> thread and scanning it in another ?).

It is possible, but I'd be concerned that the thread synchronisation
overhead might overwhelm the gains. An approach which might be very fast
would be to use memory mapped files - that way one could treat each file as
a byte array and scan it with fast primitives such as the Boyer-Moore
algorithm behind String>>indexOfSubCollection:startingAt: (primitive 149).
Leaving the OS to load the file into memory by demand paging is both simple
and probably more efficient. Unfortunately though, while the primitive
itself works just as well for ByteArrays, it doesn't handle
<ExternalAddress>s. The underlying stringSearch() function could be exported
from the VM (if anyone is interested) and called through the FFI mechanism
to search arbitrary blocks of memory very efficiently.

Regards

Blair