Hi Folks,

What's the status of split and join?

http://bugs.squeak.org/view.php?id=4874

I need them for a project, and since they are not there yet, I have gone back to RubyShards.

http://squeaksource.com/RubyShards/

Keith, do you think this can be integrated into Pharo already, or does it still need some work?

(I think our implementation is not particularly efficient, but it should be serviceable.)

Cheers,
- on

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
|
Oscar,

It would be nice to see whether the implementation also makes sense in the presence of other collections. I do not have the time to check, but if a group of people can check and report, this will help us to make a decision.

I would like to have split and join in general because this is a cool abstraction.

Stef

On Apr 5, 2009, at 9:51 AM, Oscar Nierstrasz wrote:

> What's the status of split and join?
>
> http://bugs.squeak.org/view.php?id=4874
|
In reply to this post by Oscar Nierstrasz
I checked the code and it looks nice.

Now I found

    split: regexString
        self assert: regexString size > 0 description: 'Cannot split on empty regex'.
        ^regexString asRegex split: self

and assert: expects a block.

For the test

    testSplit
        self assert: (eg split: 'the') size = 4.
        self assert: (eg split: 't\w +e') size = 5.
        self assert: (eg split: 'hello') size = 1.

it would be nice to also have an assert showing the result and not just its size.

Then I have a question: what does splitBy: return? Does it return a collection of the same kind as the receiver? Why do you hard-code an OrderedCollection in splitBySubCollection: aSplitter? Would self species do it too?

Stef
|
In reply to this post by Stéphane Ducasse
OK. split: assumes regexes, which cannot work for collections in general, but this can be generalized, as you suggest. I will try to have a look at this.

- on

On Apr 5, 2009, at 10:26, Stéphane Ducasse wrote:

> it would be nice to see if the implementation make also sense in
> presence of other collection.
|
In reply to this post by Stéphane Ducasse
Hi Stef,

Object>>assert: also works with Boolean args, just like TestCase>>assert:

Does it make any difference?

I cannot find the testSplit method you mention. I am looking at RubyShards-on.6.mcz

Am I looking in the wrong place? I think Keith has written some more tests (according to Mantis). Are they part of Keith's tests?

I think Damien Pollet (cdlm) wrote splitBy:

I suggest we rename the package SplitJoin.

But I want to know who is in charge first -- is it Keith, or Damien, or Stef (or me)? I do not want to step on anyone's toes.

Cheers,
- on

On Apr 5, 2009, at 10:33, Stéphane Ducasse wrote:

> I checked the code and it looks nice
>
> and assert: expects a block.
|
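One plausible reason why both a Boolean and a block are accepted is sketched below. It assumes, without having checked the images discussed in this thread, that assert: simply sends #value to its argument. In current Pharo images Object>>value answers the receiver, so a Boolean and a block respond to #value alike.

```smalltalk
"Workspace sketch, not taken from RubyShards or from Mantis 4874."
| viaBoolean viaBlock |
viaBoolean := (3 > 2) value.    "a Boolean answers itself"
viaBlock := [3 > 2] value.      "a block answers the result of evaluating it"
viaBoolean = viaBlock           "both sides are true"
```

If that assumption holds, passing a block only delays evaluation of the condition; it does not change the outcome of the assertion.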
Oscar Nierstrasz wrote:

> But I want to know who is charge first -- is it Keith, or Damien, or
> Stef (or me)? I do not want to step on anyone's toes.

I wrote the split join implementation that is available on Mantis:

http://bugs.squeak.org/view.php?id=4874

I use it all the time. If you would like to improve on what is there, please continue to contribute to the Mantis page discussion, tests, and code. That way we will get a polished implementation that can be added to Squeak or to Pharo.

The suggestion to use #species would be fine (I never use species myself, because I don't understand what it's really for).

When Stef says "I have checked the code and it looks nice", he didn't say which code he checked, so I am confused.

Keith
|
> The suggestion to use #species would be fine (I never use species
> myself, because I dont understand what its really for).

Or class. The point is that you get back a collection of the same kind as the receiver.

> When stef says "I have checked the code and it looks nice" he didnt
> say which code he checked, so I am confused.

I looked at the latest version in the repository mentioned by Oscar: RubyShards.
|
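As a small illustration of the difference between class and species raised here: an Interval cannot hold arbitrary elements, so it names Array as the class to use for derived collections. The snippet is a workspace sketch using only base collection messages; it says nothing about how either split implementation chooses its result class.

```smalltalk
| interval doubled |
interval := 1 to: 5.
doubled := interval collect: [:each | each * 2].
interval class.      "Interval"
interval species.    "Array"
doubled class.       "Array, because collect: builds its result from species"
```

For most collections class and species are the same, which is why the two suggestions often give the same answer. They differ for receivers such as Interval, where creating a new instance of the receiver's own class would not work for the result.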
Stéphane Ducasse wrote:

> I looked at the latest version in the repository mentioned by oscar
> rubyshards

Which appears to me to be the opposite of what Oscar suggested. If I read the email, he asked what the status of Mantis 4874 was, anticipating that it would be integrated. He had "gone back" to RubyShards in the absence of the integration of 4874.

Keith
|
> Which appears to me to be the opposite of what Oscar suggested.

I do not know. He asked about RubyShards and I looked at them. Period.
|
In reply to this post by keith1y
Hi Keith,

Now I see there are attached files in Mantis. But they all seem to date from 2006, whereas your latest comments are from Jan 2009. Are there more recent files from 2009 that I should look at? If so, where are they?

What is the best way to proceed? Shall I create a Join project on SqueakSource and, whenever it is updated, post the latest version on Mantis too?

Cheers,
- on

On Apr 5, 2009, at 16:08, Keith Hodges wrote:

> Which appears to me to be the opposite of what Oscar suggested. If I
> read the email, he asked what the status of mantis 4874 was,
> anticipating that it be integrated.
|
OK, I had a closer look.

Keith's implementation is completely different from, and pre-dates, that of Damien and myself.

Keith's version works for SequenceableCollections, and uses a sequence to split a sequence.

Ours is more tailored towards Strings, and uses a regex to split a String.

Perhaps we can consider a merge in which sequences can be split using sequences, and Strings can additionally be split using regexes.

We should also take efficiency into account. I have not yet run any benchmarks to compare the implementations.

Who is interested in merging these two?

Cheers,
- on

On Apr 5, 2009, at 16:25, Oscar Nierstrasz wrote:

> Now I see there are attached files in Mantis. But they all seem to
> date from 2006, whereas your latest comments are from Jan 2009.
|
I would be in favor of having a nice OO solution :)
I do not know what "uses a sequence to split a sequence" means.

Stef

> Keith's version works for SequenceableCollections, and uses a sequence
> to split a sequence.
|
With Keith's version you can do this:

    #(1 10 11 2 10 11 3 10 11 4) splitOn: #(10 11)

I was assuming that the thing we use to split was a regex string:

    'hello there' split: '\s'

Actually I see that Damien added this possibility in RubyShards as well. This also works:

    #(1 10 11 2 10 11 3 10 11 4) split: #(10 11)

It seems that RubyShards is more general, but we need to take a closer look at both solutions. The interfaces are not the same. There may be differences in performance.

- on

On Apr 5, 2009, at 17:47, Stéphane Ducasse wrote:

> I would be in favor to have a nice oo solution :)
> I do not know what means "uses a sequence to split a sequence."
|
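To make "uses a sequence to split a sequence" concrete, here is a minimal workspace sketch. It is neither Keith's splitOn: nor the RubyShards split:. The names splitBlock and joinBlock are hypothetical, and the code relies only on base messages such as indexOfSubCollection:startingAt: and do:separatedBy:. The separator is assumed to be non-empty.

```smalltalk
| splitBlock joinBlock parts |

"Hypothetical helper: cut aSequence at every occurrence of aSeparator."
splitBlock := [:aSequence :aSeparator |
    | pieces start index |
    aSeparator isEmpty
        ifTrue: [self error: 'Cannot split on an empty separator'].
    pieces := OrderedCollection new.
    start := 1.
    [index := aSequence indexOfSubCollection: aSeparator startingAt: start.
     index > 0] whileTrue: [
        pieces add: (aSequence copyFrom: start to: index - 1).
        start := index + aSeparator size].
    "Whatever follows the last separator is the final piece."
    pieces add: (aSequence copyFrom: start to: aSequence size).
    pieces].

"Hypothetical helper: concatenate the pieces with aSeparator in between."
joinBlock := [:pieces :aSeparator |
    | stream |
    stream := WriteStream on: (aSeparator species new: 0).
    pieces
        do: [:each | stream nextPutAll: each]
        separatedBy: [stream nextPutAll: aSeparator].
    stream contents].

parts := splitBlock value: #(1 10 11 2 10 11 3 10 11 4) value: #(10 11).
parts size.                               "4, the pieces #(1) #(2) #(3) #(4)"
joinBlock value: parts value: #(10 11)    "#(1 10 11 2 10 11 3 10 11 4)"
```

The pieces come from copyFrom:to:, so they are of the same kind as the receiver. The outer container here is an OrderedCollection, which is the design choice questioned earlier in the thread. The same two blocks also work on Strings, for example splitting 'hello there' on ' ', but they do not cover regex separators.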
About performance:

I just did a quick experiment in the Pier migration application where I need split and join.

I use split and join to remove comments from HTML files. I ran the tests without removing comments, and with removing them using the two different split/join implementations.

Keith's sequence splitter is blindingly fast, imposing no discernible overhead, whereas my regex version slows all the tests down by 100%!

I would still like to have splitting on regexes, but it should probably not be the default for strings. Maybe we can improve the implementation and speed it up ...

- on

On Apr 5, 2009, at 18:03, Oscar Nierstrasz wrote:

> It seems that RubyShards is more general, but we need to take a closer
> look at both solutions. The interfaces are not the same. There may
> be differences in performance.
|
Oops. I made a mistake in the experiment. There is actually less difference than I thought.

Here we load a web site, optionally using split and join to remove all comments. My regex version seems to be only marginally worse than Keith's sequence splitting.

    5289 "ON split on regex"
    5327
    5165 "KH split on sequence"
    5160
    2153 "no splitting"
    2160

So regex splitting seems to be feasible.

I can try to have a closer look and propose a merged solution, but right now my plate is rather full.

- on

On Apr 5, 2009, at 18:14, Oscar Nierstrasz wrote:

> Keith's sequence splitter is blindingly fast, imposing no discernable
> overhead, whereas my regex version slows all the tests down by 100%!

Kind regards,
Oscar Nierstrasz
---
Prof. Dr. O. Nierstrasz -- [hidden email]
Software Composition Group -- http://www.iam.unibe.ch/~scg
University of Bern -- Tel/Fax +41 31 631.4618/3355
vcard: http://www.iam.unibe.ch/~oscar/oscarNierstrasz.vcf
|
ok let us know when you have a solution ready for integration because
this would be 8 methods well spent :)
In reply to this post by Oscar Nierstrasz
Oscar Nierstrasz wrote:
> Oops. I made a mistake in the experiment. There is actually less difference than I thought.
>
> Here we load a web site, optionally using split and join to remove all comments. My regex version seems to be only marginally worse than Keith's sequence splitting.
>
> 5289 "ON split on regex"
> 5327
>
> 5165 "KH split on sequence"
> 5160
>
> 2153 "no splitting"
> 2160
>
> So regex splitting seems to be feasible.

who it performs the splitting. All you have to do is implement Regex-#splitUp: aString

Keith
In reply to this post by Stéphane Ducasse
Hi Syef, ;-)

I am going over the two split-join implementations, merging functionality, generalizing and making the interfaces consistent.

About your proposal:

splitOn: should always return an OrderedCollection containing elements of the same type as the receiver.

    'banana' splitOn: 'an' -> an OrderedCollection('b' '' 'a')

It doesn't make sense for this to return a String.

When joining, of course the original type should be returned no matter what kind of sequenceable collection was split.

I am preparing tests for all the cases.

- on

On Apr 5, 2009, at 10:33, Stéphane Ducasse wrote:

> then I have a question what splitBy: returns?
> does it return a collection that is the same kind of the receiver?
> Why do you hard code an OrderedCollection in splitBySubCollection: aSplitter.
> self species would do it too?
> Syef
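To make the 'banana' example concrete, here is a minimal workspace sketch of splitting a sequence on a subsequence. It is not the code from Mantis 4874 or RubyShards: the block name splitBlock and its argument names are made up for illustration, it assumes a non-empty separator, and it relies only on indexOfSubCollection:startingAt: and copyFrom:to: from the standard sequenceable collection protocol. The pieces are gathered in an OrderedCollection, while each piece keeps the kind of the receiver.

```smalltalk
"Hypothetical sketch, not the SplitJoin implementation.
 Assumes the separator is not empty."
| splitBlock |
splitBlock := [:sequence :separator |
	| pieces start index |
	pieces := OrderedCollection new.
	start := 1.
	"Find each occurrence of the separator and copy out the piece before it."
	[index := sequence indexOfSubCollection: separator startingAt: start.
	 index > 0] whileTrue: [
		pieces add: (sequence copyFrom: start to: index - 1).
		start := index + separator size].
	"Whatever follows the last separator is the final piece (possibly empty)."
	pieces add: (sequence copyFrom: start to: sequence size).
	pieces].

splitBlock value: 'banana' value: 'an'.
"an OrderedCollection('b' '' 'a')"
```

The same block applied to #(1 10 11 2 10 11 3 10 11 4) with #(10 11) as separator should give four one-element Arrays, since copyFrom:to: answers a collection of the receiver's kind.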
On Apr 11, 2009, at 10:47 AM, Oscar Nierstrasz wrote:

> Hi Syef, ;-)

:)

> I am going over the two split-join implementations, merging functionality, generalizing and making the interfaces consistent.
>
> About your proposal:
>
> splitOn: should always return an OrderedCollection containing elements of the same type as the receiver.
>
> 'banana' splitOn: 'an' -> an OrderedCollection('b' '' 'a')

ok :)

> It doesn't make sense for this to return a String.
>
> When joining, of course the original type should be returned no matter what kind of sequenceable collection was split.
>
> I am preparing tests for all the cases.

Excellent. When I see all the methods that we should remove from Morph, I'm convinced that this addition will be a good one in proportion to its size.

Stef
In reply to this post by Oscar Nierstrasz
I have attempted to merge the functionality of Keith's Join package and Damien's and my RubyShards.

The protocol is basically Keith's. Send splitOn: to a collection to split it, or send split: to the splitter with the collection as argument. The reverse messages are joinUsing: and join:.

Everything has been generalized to work not only with Arrays and Strings but also OrderedCollections and SortedCollections. Not only sequences and blocks can be used as splitters, but also objects and regexes.

Documentation is in SplitJoinTest on the class side, or see the class comment.

See http://www.squeaksource.com/SplitJoin.html

The latest version has been pushed into the PharoInbox.

- on
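For readers who want to see what the reverse direction amounts to, the following workspace sketch joins a collection of pieces with a separator. It is only an illustration of the idea behind joinUsing: and join:, not the SplitJoin code: joinBlock and its argument names are hypothetical, and as a simplifying assumption the kind of the result is taken from the separator via species, so the pieces and the separator must be of compatible kinds (for example Strings joined by a String).

```smalltalk
"Hypothetical sketch, not the SplitJoin implementation.
 Assumption: the result takes its kind from the separator."
| joinBlock |
joinBlock := [:pieces :separator |
	| stream |
	stream := WriteStream on: (separator species new: 0).
	"Write each piece, putting the separator between consecutive pieces only."
	pieces
		do: [:each | stream nextPutAll: each]
		separatedBy: [stream nextPutAll: separator].
	stream contents].

joinBlock value: #('b' '' 'a') value: 'an'.
"'banana'"
```

Joining the pieces from the 'banana' example with 'an' gives back the original String, which is the round trip the split and join messages are meant to provide.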