status of split join in pharo

status of split join in pharo

Oscar Nierstrasz

Hi Folks,

What's the status of split and join?

http://bugs.squeak.org/view.php?id=4874

I need them for a project and since they are not there yet, I have  
gone back to RubyShards.

http://squeaksource.com/RubyShards/

Keith, do you think this can be integrated into Pharo already, or does  
it need some work still?

(I think our implementation is not particularly efficient, but it  
should be serviceable.)

Cheers,
- on


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: status of split join in pharo

Stéphane Ducasse
Oscar,

It would be nice to see whether the implementation also makes sense
for other collections.
I do not have the time to check, but if a group of people can check and
report, this will help us make a decision.
I would like to have split and join in general because this is a cool
abstraction.
Stef

On Apr 5, 2009, at 9:51 AM, Oscar Nierstrasz wrote:

>
> Hi Folks,
>
> What's the status of split and join?

Re: status of split join in pharo

Stéphane Ducasse
In reply to this post by Oscar Nierstrasz
I checked the code and it looks nice.

Now I found

split: regexString
        self assert: regexString size > 0 description: 'Cannot split on empty regex'.
        ^regexString asRegex split: self

and assert: expects a block.

For the test

testSplit
        self assert: (eg split: 'the') size = 4.
        self assert: (eg split: 't\w+e') size = 5.
        self assert: (eg split: 'hello') size = 1.

it would be nice to also have an assert showing the result and not
just its size.


Then I have a question: what does splitBy: return?
Does it return a collection of the same kind as the receiver?
Why do you hard-code an OrderedCollection in splitBySubCollection: aSplitter?
Wouldn't self species do it too?
Stef
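
For anyone unsure what #species is for, here is a minimal workspace sketch, assuming a stock Squeak or Pharo image. It is not taken from RubyShards or from the Mantis 4874 code; it only illustrates why "self species" and "self class" can give different answers.

```smalltalk
"Workspace sketch (assumption: stock Squeak/Pharo collection classes).
 species answers the class to use for copies and derived collections,
 which is usually, but not always, the receiver's own class."

#(1 2 3) species.        "Array, same as its class"
(1 to: 3) class.         "Interval"
(1 to: 3) species.       "Array: an Interval cannot hold arbitrary elements"

"collect: builds its result from species, so an Interval yields an Array."
((1 to: 3) collect: [:each | each * each]) class.   "Array"
```

A splitter that creates its pieces via species would therefore hand back Arrays when the receiver is an Interval, and Strings when the receiver is a String, instead of always answering OrderedCollections.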



Re: status of split join in pharo

Oscar Nierstrasz
In reply to this post by Stéphane Ducasse

OK.  split: assumes regexes, which cannot work for collections, but this
can be generalized, as you suggest.

Will try to have a look at this.

- on

On Apr 5, 2009, at 10:26, Stéphane Ducasse wrote:

> Oscar,
>
> It would be nice to see whether the implementation also makes sense
> for other collections.

Re: status of split join in pharo

Oscar Nierstrasz
In reply to this post by Stéphane Ducasse

Hi Stef,

Object>>assert: also works with Boolean args, just like  
TestCase>>assert:

Does it make any difference?

I cannot find the testSplit method you mention.  I am looking at  
RubyShards-on.6.mcz

Am I looking in the wrong place?  I think Keith has written some more  
tests (according to Mantis).  Are they part of Keith's tests?

I think Damien Pollet (cdlm) wrote splitBy:

I suggest we rename the package SplitJoin.

But I want to know who is in charge first -- is it Keith, or Damien, or
Stef (or me)?  I do not want to step on anyone's toes.

Cheers,
- on

On Apr 5, 2009, at 10:33, Stéphane Ducasse wrote:

> I checked the code and it looks nice

Re: status of split join in pharo

keith1y
I wrote the split join implementation that is available on Mantis:

http://bugs.squeak.org/view.php?id=4874

I use it all the time. If you would like to improve on what is there, please continue to contribute to the Mantis page discussion/tests and code. That way we will get a polished implementation that can be added to Squeak or to Pharo.

The suggestion to use #species would be fine (I never use species myself, because I don't understand what it's really for).

When Stef says "I have checked the code and it looks nice", he didn't say which code he checked, so I am confused.

Keith





Re: status of split join in pharo

Stéphane Ducasse
> The suggestion to use #species would be fine (I never use species  
> myself, because I dont understand what its really for).

Or class.
The point is that you get back a collection of the same kind as the
receiver.
>
>
> When stef says "I have checked the code and it looks nice" he didnt  
> say which code he checked, so I am confused.

I looked at the latest version in the repository mentioned by Oscar,
RubyShards.



Re: status of split join in pharo

keith1y
Stéphane Ducasse wrote:

> I looked at the latest version in the repository mentioned by oscar  
> rubyshards
>
>  
Which appears to me to be the opposite of what Oscar suggested. As I
read the email, he asked what the status of Mantis 4874 was,
anticipating that it would be integrated. He had "gone back" to RubyShards in
the absence of the integration of 4874.

Keith



Re: status of split join in pharo

Stéphane Ducasse
> Which appears to me to be the opposite of what Oscar suggested.

I do not know.
He asked about RubyShards and I looked at them. Period.


Re: status of split join in pharo

Oscar Nierstrasz
In reply to this post by keith1y

Hi Keith,

Now I see there are attached files in Mantis.  But they all seem to  
date from 2006, whereas your latest comments are  from Jan 2009.  Are  
there more recent files from 2009 that I should look at?  If so, where  
are they?

What is the best way to proceed?  Shall  I create a Join project on  
SqueakSource, and if it is updated, post the latest version on Mantis  
too?

Cheers,
- on

On Apr 5, 2009, at 16:08, Keith Hodges wrote:

> Which appears to me to be the opposite of what Oscar suggested. As I
> read the email, he asked what the status of Mantis 4874 was,
> anticipating that it would be integrated. He had "gone back" to
> RubyShards in the absence of the integration of 4874.

Re: status of split join in pharo

Oscar Nierstrasz

OK, I had a closer look.

Keith's implementation is completely different from, and pre-dates,  
that of Damien and myself.

Keith's version works for SequenceableCollections, and uses a sequence  
to split a sequence.

Ours is more tailored towards Strings, and uses a regex to split a  
String.

Perhaps we can consider a merge in which sequences can be split using  
sequences, and Strings can additionally be split using regexes.

We should also take efficiency into account.  I have not yet run any
benchmarks to compare the implementations.

Who is interested in merging these two?

Cheers,
- on

On Apr 5, 2009, at 16:25, Oscar Nierstrasz wrote:

>
> Hi Keith,
>
> Now I see there are attached files in Mantis.  But they all seem to
> date from 2006, whereas your latest comments are  from Jan 2009.  Are
> there more recent files from 2009 that I should look at?  If so, where
> are they?

Re: status of split join in pharo

Stéphane Ducasse
I would be in favor of having a nice OO solution :)
I do not know what "uses a sequence to split a sequence" means.

Stef


Re: status of split join in pharo

Oscar Nierstrasz

With Keith's version you can do this:

#(1 10 11 2 10 11 3 10 11 4) splitOn: #(10 11)

I was assuming that the thing we use to split was a regex string.

'hello there' split: '\s'

Actually I see that Damien added this possibility in RubyShards as  
well.  This also works:

#(1 10 11 2 10 11 3 10 11 4) split: #(10 11)

It seems that RubyShards is more general, but we need to take a closer  
look at both solutions.  The interfaces are not the same.  There may  
be differences in performance.
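
To make "uses a sequence to split a sequence" concrete, here is a minimal workspace sketch. It is neither Keith's splitOn: nor Damien's split:, just one way such a splitter could be written with indexOfSubCollection:startingAt: and copyFrom:to:, both of which exist on SequenceableCollection in Squeak and Pharo. The variable names are made up for illustration.

```smalltalk
"Hypothetical workspace sketch, not the Mantis 4874 or RubyShards code."
| source splitter parts start index |
source := #(1 10 11 2 10 11 3 10 11 4).
splitter := #(10 11).
parts := OrderedCollection new.
start := 1.

"Find each occurrence of the splitter and keep what lies before it.
 indexOfSubCollection:startingAt: answers 0 when nothing more is found."
[ index := source indexOfSubCollection: splitter startingAt: start.
  index > 0 ] whileTrue: [
    parts add: (source copyFrom: start to: index - 1).
    start := index + splitter size ].

"Whatever remains after the last splitter is the final piece."
parts add: (source copyFrom: start to: source size).
parts   "an OrderedCollection(#(1) #(2) #(3) #(4))"
```

The same loop works unchanged on Strings (for example source := 'a, b, c' and splitter := ', '), since Strings are sequenceable collections too. The pieces come back in an OrderedCollection here purely for simplicity.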

- on


On Apr 5, 2009, at 17:47, Stéphane Ducasse wrote:

> I would be in favor of having a nice OO solution :)
> I do not know what "uses a sequence to split a sequence" means.

Re: status of split join in pharo

Oscar Nierstrasz

About performance:

I just did a quick experiment in the pier migration application where  
I need split and join.

I use split and join to remove comments from HTML files.  I ran the
tests without removing comments, and removing them using the two
different split/join implementations.

Keith's sequence splitter is blindingly fast, imposing no discernible
overhead, whereas my regex version slows all the tests down by 100%!

I would still like to have splitting on regexes, but it should  
probably not be the default for strings.  Maybe we can improve the  
implementation and speed it up ...

- on

On Apr 5, 2009, at 18:03, Oscar Nierstrasz wrote:

>
> With Keith's version you can do this:
>
> #(1 10 11 2 10 11 3 10 11 4) splitOn: #(10 11)

Re: status of split join in pharo

Oscar Nierstrasz

Oops. I made a mistake in the experiment.  There is actually less  
difference than I thought.

Here we load a web site, optionally using split and join to remove all  
comments.  My regex version seems to be only marginally worse than  
Keith's sequence splitting.


5289 "ON split on regex"
5327

5165 "KH split on sequence"
5160

2153 "no splitting"
2160

So regex splitting seems to be feasible.
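
One way to read these numbers, as a rough workspace calculation: average the two runs of each variant and compare against the run without splitting. The units are not stated above (presumably milliseconds, but that is an assumption), and two runs each is hardly a benchmark.

```smalltalk
"Workspace arithmetic on the figures above; variable names are made up."
| regex sequence baseline |
regex := (5289 + 5327) / 2.       "5308"
sequence := (5165 + 5160) / 2.    "10325/2, i.e. 5162.5"
baseline := (2153 + 2160) / 2.    "4313/2, i.e. 2156.5"

"Overhead of each variant relative to no splitting"
((regex - baseline) / baseline) asFloat.      "about 1.46"
((sequence - baseline) / baseline) asFloat.   "about 1.39"

"Regex overhead relative to sequence overhead"
((regex - baseline) / (sequence - baseline)) asFloat.   "about 1.05"
```

In this single experiment both variants add roughly 140% to the total time, and the regex version costs about 5% more than the sequence version.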

I can try to have a closer look and propose a merged solution, but  
right now my plate is rather full.

- on


On Apr 5, 2009, at 18:14, Oscar Nierstrasz wrote:

>
> About performance:
>
> I just did a quick experiment in the pier migration application where
> I need split and join.
>
> I use split and join to remove comments from HTMl files.  I ran the
> tests without removing comments, and removing them using the two
> different split/join implementations.
>
> Keith's sequence splitter is blindingly fast, imposing no discernable
> overhead, whereas my regex version slows all the tests down by 100%!
>
> I would still like to have splitting on regexes, but it should
> probably not be the default for strings.  Maybe we can improve the
> implementation and speed it up ...
>
> - on
>
> On Apr 5, 2009, at 18:03, Oscar Nierstrasz wrote:
>
>>
>> With Keith's version you can do this:
>>
>> #(1 10 11 2 10 11 3 10 11 4) splitOn: #(10 11)
>>
>> I was assuming that the thing we use to split was a regex string.
>>
>> 'hello there' split: '\s'
>>
>> Actually I see that Damien added this possibility in RubyShards as
>> well.  This also works:
>>
>> #(1 10 11 2 10 11 3 10 11 4) split: #(10 11)
>>
>> It seems that RubyShards is more general, but we need to take a closer
>> look at both solutions.  The interfaces are not the same.  There may
>> be differences in performance.
>>
>> - on
>>
>>
>> On Apr 5, 2009, at 17:47, Stéphane Ducasse wrote:
>>
>>> I would be in favor of a nice OO solution :)
>>> I do not know what "uses a sequence to split a sequence" means.
>>>
>>> Stef
>>>
>>>> OK, I had a closer look.
>>>>
>>>> Keith's implementation is completely different from, and pre-dates,
>>>> that of Damien and myself.
>>>>
>>>> Keith's version works for SequenceableCollections, and uses a
>>>> sequence to split a sequence.
>>>>
>>>> Ours is more tailored towards Strings, and uses a regex to split a
>>>> String.
>>>>
>>>> Perhaps we can consider a merge in which sequences can be split
>>>> using sequences, and Strings can additionally be split using regexes.
>>>>
>>>> We should also take efficiency into account.  I did not run any
>>>> benchmarks yet to compare the implementations.
>>>>
>>>> Who is interested in merging these two?
>>>>
>>>> Cheers,
>>>> - on
>>>>
>>>> On Apr 5, 2009, at 16:25, Oscar Nierstrasz wrote:
>>>>
>>>>>
>>>>> Hi Keith,
>>>>>
>>>>> Now I see there are attached files in Mantis.  But they all seem to
>>>>> date from 2006, whereas your latest comments are from Jan 2009.  Are
>>>>> there more recent files from 2009 that I should look at?  If so,
>>>>> where are they?
>>>>>
>>>>> What is the best way to proceed?  Shall I create a Join project on
>>>>> SqueakSource, and if it is updated, post the latest version on
>>>>> Mantis too?
>>>>>
>>>>> Cheers,
>>>>> - on
>>>>>
>>>>> On Apr 5, 2009, at 16:08, Keith Hodges wrote:
>>>>>
>>>>>> Stéphane Ducasse wrote:
>>>>>>>> I wrote the split join implementation that is available on
>>>>>>>> Mantis
>>>>>>>>
>>>>>>>> http://bugs.squeak.org/view.php?id=4874
>>>>>>>>
>>>>>>>> I use it all the time. If you would like to improve on what is
>>>>>>>> there, please continue to contribute to the Mantis page
>>>>>>>> discussion/tests and code. That way we will get a polished
>>>>>>>> implementation that can be added to Squeak or to Pharo.
>>>>>>>>
>>>>>>>> The suggestion to use #species would be fine (I never use
>>>>>>>> species myself, because I don't understand what it's really for).
>>>>>>>>
>>>>>>>
>>>>>>> or class
>>>>>>> the point is that you get back a collection of the same kind as
>>>>>>> the receiver
>>>>>>>
>>>>>>>> When Stef says "I have checked the code and it looks nice" he
>>>>>>>> didn't say which code he checked, so I am confused.
>>>>>>>>
>>>>>>>
>>>>>>> I looked at the latest version in the repository mentioned by
>>>>>>> Oscar, RubyShards
>>>>>>>
>>>>>>>
>>>>>> Which appears to me to be the opposite of what Oscar suggested.
>>>>>> If I read the email, he asked what the status of Mantis 4874 was,
>>>>>> anticipating that it be integrated. He had "gone back" to
>>>>>> RubyShards in the absence of the integration of 4874.
>>>>>>
>>>>>> Keith

Kind regards,
Oscar Nierstrasz
---
Prof. Dr. O. Nierstrasz    -- [hidden email]
Software Composition Group -- http://www.iam.unibe.ch/~scg
University of Bern         -- Tel/Fax +41 31 631.4618/3355
vcard:  http://www.iam.unibe.ch/~oscar/oscarNierstrasz.vcf



Re: status of split join in pharo

Stéphane Ducasse
OK, let us know when you have a solution ready for integration, because
this would be 8 methods well spent :)


Re: status of split join in pharo

keith1y
In reply to this post by Oscar Nierstrasz
Oscar Nierstrasz wrote:

> So regex splitting seems to be feasible.
>  
splitOn: uses double dispatch, so that it is the argument that determines
how the splitting is performed. All you have to do is implement
Regex>>splitUp: aString
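
The double dispatch can be sketched as follows. This is an illustration only, not the code attached to Mantis 4874: the method bodies are assumptions, and only the selectors splitOn: and splitUp: come from the message itself. The definitions are written in Class >> selector notation and would each be compiled in the named class.

```smalltalk
SequenceableCollection >> splitOn: aSplitter
    "First dispatch: hand the receiver to the splitter, whose class
     decides how the splitting is done."
    ^ aSplitter splitUp: self

SequenceableCollection >> splitUp: aSequence
    "Second dispatch, sequence splitter: cut aSequence at every occurrence
     of the receiver. Assumes the receiver is not empty."
    | pieces start index |
    pieces := OrderedCollection new.
    start := 1.
    [index := aSequence indexOfSubCollection: self startingAt: start.
     index > 0] whileTrue: [
        pieces add: (aSequence copyFrom: start to: index - 1).
        start := index + self size].
    pieces add: (aSequence copyFrom: start to: aSequence size).
    ^ pieces

Object >> splitUp: aSequence
    "Second dispatch, single-element splitter: cut aSequence at every
     element equal to the receiver."
    | pieces start |
    pieces := OrderedCollection new.
    start := 1.
    aSequence withIndexDo: [:each :index |
        each = self ifTrue: [
            pieces add: (aSequence copyFrom: start to: index - 1).
            start := index + 1]].
    pieces add: (aSequence copyFrom: start to: aSequence size).
    ^ pieces
```

With these in place, 'a,b' splitOn: $, would go through Object >> splitUp:, and 'banana' splitOn: 'an' through SequenceableCollection >> splitUp:. A regex class would join in by adding its own splitUp: method, without any change to splitOn:.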

Keith



Re: status of split join in pharo

Oscar Nierstrasz
In reply to this post by Stéphane Ducasse

Hi Syef,  ;-)

I am going over the two split-join implementations, merging  
functionality, generalizing and making the interfaces consistent.

About your proposal:

splitOn: should always return an OrderedCollection containing elements  
of the same type as the receiver.

'banana' splitOn: 'an' -> an OrderedCollection('b' '' 'a')

It doesn't make sense for this to return a String.

When joining, of course the original type should be returned no matter  
what kind of sequenceable collection was split.
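
The joining rule can be sketched in a workspace like this. It is a minimal illustration, not the code under test: joinUsing is a hypothetical block, and taking the result type from the first piece is an assumption made for the sketch.

```smalltalk
| joinUsing pieces |
joinUsing := [:somePieces :separator |
    | stream |
    "Build the result in the species of the pieces, so that pieces of a
     String join to a String and pieces of an Array join to an Array.
     Assumes at least one piece, which a split always produces."
    stream := WriteStream on: (somePieces first species new: 0).
    somePieces
        do: [:each | stream nextPutAll: each]
        separatedBy: [stream nextPutAll: separator].
    stream contents].

pieces := OrderedCollection withAll: #('b' '' 'a').
joinUsing value: pieces value: 'an'.    "'banana'"

pieces := OrderedCollection withAll: #(#(1) #(2) #(3) #(4)).
joinUsing value: pieces value: #(10 11).    "#(1 10 11 2 10 11 3 10 11 4)"
```

The pieces are held in an OrderedCollection in both cases, but the joined result is a String for the first and an Array for the second.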

I am preparing tests for all the cases.

- on

On Apr 5, 2009, at 10:33, Stéphane Ducasse wrote:

> then I have a question: what does splitBy: return?
> Does it return a collection of the same kind as the receiver?
> Why do you hard-code an OrderedCollection in splitBySubCollection:
> aSplitter?
> Wouldn't self species do it too?
> Syef



Re: status of split join in pharo

Stéphane Ducasse

On Apr 11, 2009, at 10:47 AM, Oscar Nierstrasz wrote:

>
> Hi Syef,  ;-)

:)

> I am going over the two split-join implementations, merging
> functionality, generalizing and making the interfaces consistent.
>
> About your proposal:
>
> splitOn: should always return an OrderedCollection containing elements
> of the same type as the receiver.
>
> 'banana' splitOn: 'an' -> an OrderedCollection('b' '' 'a')

ok :)


> It doesn't make sense for this to return a String.
>
> When joining, of course the original type should be returned no matter
> what kind of sequenceable collection was split.
>
> I am preparing tests for all the cases.

excellent.
When I see all the methods that we should remove from Morph,
I'm convinced that this addition will be a good one in proportion to
its size.

Stef


Re: [on] status of split join in pharo

Oscar Nierstrasz
In reply to this post by Oscar Nierstrasz

I have attempted to merge the functionality of Keith's Join package  
and Damien's and my RubyShards.

The protocol is basically Keith's.

Send splitOn: to a collection to split it, or send split: to the  
splitter with the collection as argument.

The reverse messages are joinUsing: and join:.

Everything has been generalized to work not only with Arrays and  
Strings but also OrderedCollections and SortedCollections.  Not only  
sequences and blocks can be used as splitters, but also objects and  
regexes.
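
Going by the description above, the four messages would be used roughly like this. It is a sketch that assumes the SplitJoin package is loaded; the receiver and argument order shown for joinUsing: and join: is one reading of "the reverse messages", and the only result given is the one quoted earlier in this thread.

```smalltalk
"Split: either ask the collection, or ask the splitter."
'banana' splitOn: 'an'.    "an OrderedCollection('b' '' 'a'), as given earlier in the thread"
'an' split: 'banana'.      "the same split, sent to the splitter instead"

"Join: the reverse messages."
('banana' splitOn: 'an') joinUsing: 'an'.    "sent to the pieces"
'an' join: ('banana' splitOn: 'an').         "sent to the joiner"
```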

Documentation is in SplitJoinTest on the class side, or see the class  
comment.

See http://www.squeaksource.com/SplitJoin.html

The latest version has been pushed into the PharoInbox.

- on

