optimizing select:thenCollect:

Stéphane Ducasse

optimizing select:thenCollect:

Hi all

Apparently #select:thenCollect: is currently implemented like this:

Collection>>select: selectBlock thenCollect: collectBlock
    "Utility method to improve readability."

    ^ (self select: selectBlock) collect: collectBlock

So it would be good to avoid making two passes over the same collection.
Any takers?

Stef
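
[Editorial sketch] To make the two passes concrete, here is a small playground snippet, assuming a Pharo image in which #select:thenCollect: is defined as quoted above. The sample array and the variable names are illustrative only and do not come from the thread.

```smalltalk
| numbers intermediate twoPass combined |
numbers := #(1 2 3 4 5 6).

"First pass: select: allocates an intermediate collection."
intermediate := numbers select: [ :each | each even ].

"Second pass: collect: walks the intermediate collection and allocates the result."
twoPass := intermediate collect: [ :each | each * each ].

"With the definition quoted above, this is the same two sends behind one selector."
combined := numbers
	select: [ :each | each even ]
	thenCollect: [ :each | each * each ].

"Both expressions should describe the same elements."
twoPass = combined
```

The intermediate collection is the cost being discussed: it is created only to be enumerated once and then discarded.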

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Nicolas Cellier

Re: optimizing select:thenCollect:

Not sure how you would handle this at such a high level of generality...
Maybe reserve the optimization for subclasses...

Nicolas

2009/7/15 Stéphane Ducasse <[hidden email]>:

> Hi all
>
> apparently
>
> Collection>>select: selectBlock thenCollect: collectBlock
> "Utility method to improve readability."
>
> ^ (self select: selectBlock) collect: collectBlock
>
> so it would be good to avoid to do two passes on the same collection.
> Any takers?
>
> Stef

hernan.wilkinson

Re: optimizing select:thenCollect:

In reply to this post by Stéphane Ducasse

Just a quick thought:

select: selectBlock thenCollect: collectBlock

    | newCollection |
    newCollection := self species new.
    self do: [:each | (selectBlock value: each) ifTrue: [newCollection add: (collectBlock value: each)]].
    ^newCollection

But we cannot say that this will always be faster... if newCollection grows too much, it could be slower. Also, it won't work on SortedCollection, because SortedCollection has to return an OrderedCollection and species returns SortedCollection... I did not see tests for this message; it would be great to have them and try :-)
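
[Editorial sketch] One way around both concerns, written as a playground snippet rather than as a method: always collect into an OrderedCollection, pre-sized to the receiver so it never has to grow. This is only a sketch under the assumption that answering an OrderedCollection is acceptable for every receiver, which is not what the existing method does (for an Array it answers an Array). The sample data is made up.

```smalltalk
| source result |
source := #(3 1 4 1 5 9 2 6) asSortedCollection.

"Pre-size the result to the receiver's size so add: never has to grow it."
result := OrderedCollection new: source size.

"Single pass: test and transform each element in the same loop."
source do: [ :each |
	each odd ifTrue: [ result add: each * 10 ] ].

"The elements arrive in the receiver's sorted order: 10 10 30 50 90."
result
```

Pre-sizing trades memory for speed: when the select block rejects most elements, the result reserves far more room than it needs.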


Lukas Renggli

Re: optimizing select:thenCollect:

> select: selectBlock thenCollect: collectBlock
>
>     | newCollection |
>     newCollection := self species new.
>     self do: [:each | (selectBlock value: each) ifTrue: [newCollection add:
> (collectBlock value: each)]].
>     ^newCollection

That code would also break Dictionary and Array.
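
[Editorial sketch] A quick way to see both failures in a playground, assuming Pharo's usual behaviour: Array does not support #add:, and Dictionary>>add: expects an Association. Each failing send is wrapped in an error handler so the snippet runs to completion; the variable names are illustrative.

```smalltalk
| arrayFailed dictionaryFailed |

"For an Array, species new answers an empty Array, which cannot grow with add:."
arrayFailed := [ Array new add: 42. false ]
	on: Error
	do: [ :error | true ].

"Dictionary>>add: expects an Association, not an arbitrary collected value."
dictionaryFailed := [ Dictionary new add: 42. false ]
	on: Error
	do: [ :error | true ].

"Both flags are expected to be true."
{ arrayFailed. dictionaryFailed }
```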

I think the methods #select:thenCollect: and #collect:thenSelect:
should be removed:

- This is a low level optimization that the compiler (or JIT) should
optimize, not the developer.
- 'a select: b thenCollect: c' is longer to write than '(a select: b)
collect: c'. Furthermore it is not that obvious what it does. I have
seen several people being confused about it.
- This is just #select:thenCollect:, what about
#reject:thenCollect:select:thenInjectInto:? What about all subclasses
that require a new implementation (Array, Dictionary, ...) of such
methods?

If people really want to be able to combine enumerator methods for
higher efficiency, the Collection hierarchy should be fixed. Having
external iterator objects like in C++ and Java is not that bad after
all:

result := aCollection iterator
    select: [ :e | ... ];
    collect: [ :e | ... ];
    contents

The accessors #select:, #reject:, #collect: ... on the iterator object
would just iteratively create a new iteration block that is lazily
executed when the result is accessed with #contents, #size or #do:.
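
[Editorial sketch] The idea of stacking iteration blocks that only run when the result is requested can be tried in a playground with plain blocks, without writing an iterator class. This is not the iterator API proposed above (no such class exists in the image under discussion); the names iterate, selected and collected are hypothetical.

```smalltalk
| iterate selected collected result |

"Source stage: feed every element to a consumer block."
iterate := [ :consumer | #(1 2 3 4 5 6) do: consumer ].

"select: stage: wrap the consumer so it only sees matching elements."
selected := [ :consumer |
	iterate value: [ :each |
		each even ifTrue: [ consumer value: each ] ] ].

"collect: stage: wrap the consumer so it sees transformed elements."
collected := [ :consumer |
	selected value: [ :each | consumer value: each * each ] ].

"Nothing has been enumerated so far; this send runs the single pass."
result := OrderedCollection new.
collected value: [ :each | result add: each ].

"Holds 4, 16 and 36."
result
```

Each stage only builds a new block around the previous one, so no intermediate collection is allocated between the select and collect steps.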

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Paolo Bonzini-2

Re: optimizing select:thenCollect:

> If people really want to be able to combine enumerator methods for
> higher efficiently the Collection hierarchy should be fixed. Having
> external iterator objects like in C++ and Java is not that bad after
> all:
>
> result := aCollection iterator
> select: [ :e | ... ];
> collect: [ :e | ... ];
> contents

It's already there and it's called #readStream. :-) It only lacks #size.

See http://code.google.com/p/pharo/issues/detail?id=958 for an outline
of the implementation.

Paolo


Lukas Renggli

Re: optimizing select:thenCollect:

>> If people really want to be able to combine enumerator methods for
>> higher efficiently the Collection hierarchy should be fixed. Having
>> external iterator objects like in C++ and Java is not that bad after
>> all:
>>
>> result := aCollection iterator
>> select: [ :e | ... ];
>> collect: [ :e | ... ];
>> contents
>
> It's already there and it's called #readStream. :-) It only lacks #size.

Not in Pharo: #readStream exists, but it lacks all the iteration methods.

> See http://code.google.com/p/pharo/issues/detail?id=958 for an outline
> of the implementation.

That would be a welcome addition to Pharo.

Having an iterator class also avoids having to implement all these
iteration methods over and over again, e.g. in visitors. Is this going
to be ANSI?

Lukas


Stéphane Ducasse

Re: optimizing select:thenCollect:

In reply to this post by Lukas Renggli

So in that case I would be in favor of removing them, because they
do not help and just bloat the image.

On Jul 16, 2009, at 8:21 AM, Lukas Renggli wrote:

> I think the methods #select:thenCollect: and #collect:thenSelect:
> should be removed: [...]

Stéphane Ducasse

Re: optimizing select:thenCollect:

In reply to this post by Lukas Renggli

>>> If people really want to be able to combine enumerator methods for
>>> higher efficiently the Collection hierarchy should be fixed.

Paolo, why would this be faster?

>>> Having
>>> external iterator objects like in C++ and Java is not that bad after
>>> all:
>>>
>>> result := aCollection iterator
>>> select: [ :e | ... ];
>>> collect: [ :e | ... ];
>>> contents
>>
>> It's already there and it's called #readStream. :-) It only lacks
>> #size.
>
> Not in Pharo, but it lacks all iteration methods.
>
>> See http://code.google.com/p/pharo/issues/detail?id=958 for an
>> outline
>> of the implementation.
>
> That would be a welcome addition to Pharo.
>
> Having an iterator class also avoids having to implement all these
> iteration methods over and over again, e.g. in visitors.

Lukas, do you have an example? I do not see why.
Also, with traits you just have to implement do: and you get all the
other ones for free.

> Is this going
> to be ANSI?

DNU? Why does ANSI have to be in the story?


Stéphane Ducasse

Re: optimizing select:thenCollect:

In reply to this post by Paolo Bonzini-2

Paolo

Why Iterable as a superclass of Stream and Collection?
Why not a trait?
Also, I'm wondering why this would be faster.

Hi all, I would like to ask your opinion about adding the "Iterable"
common superclass of Collection and Stream to Pharo. This is already in
GNU Smalltalk and is in my opinion a better alternative to the "lazy
collections" idea.

Here is how I did it:

1) Well, create the class and make Stream/Collection subclasses of it.

2) Push there from Collection #do: (abstract), #inject:into:, #fold: (if
you have it), #do:separatedBy:, #detect:, #detect:ifNone:, #count: (if
you have it), #allSatisfy:, #noneSatisfy:, #conform:, #contains: (if you
have the last two).

3) Add #nextPutAllOn:

nextPutAllOn: aStream
    "Write all the objects in the receiver to aStream"
    self do: [ :each | aStream nextPut: each ]

4) Change Stream>>#nextPutAllOn: to use "source nextPutAllOn: self".

5) On top of this I added "filtering" streams to implement #select:,
#reject: and #collect:. Here is a clean-room MIT implementation:

Stream >> select: block
    "Wrap the receiver; keep elements for which block answers true."
    ^FilterStream new stream: self block: block result: true

Stream >> reject: block
    "Wrap the receiver; keep elements for which block answers false."
    ^FilterStream new stream: self block: block result: false

Stream >> collect: block
    "Wrap the receiver; answer block's value for each element."
    ^CollectStream new stream: self block: block

FilterStream >> atEnd
    "atEnd is nil until we have looked ahead for the next match."
    atEnd isNil ifTrue: [
        [stream atEnd ifTrue: [^atEnd := true].
        next := stream next.
        (block value: next) == result] whileFalse.
        atEnd := false].
    ^atEnd

FilterStream >> next
    self atEnd ifTrue: [^self pastEnd].
    atEnd := nil.
    ^next

CollectStream >> atEnd
    ^stream atEnd

CollectStream >> next
    ^block value: stream next

On Jul 16, 2009, at 9:57 AM, Paolo Bonzini wrote:

> See http://code.google.com/p/pharo/issues/detail?id=958 for an outline
> of the implementation.

Andres Valloud-4

Re: optimizing select:thenCollect:

In reply to this post by Stéphane Ducasse

I'd favor removing such methods. The allocation optimization can be
implemented better whenever it's needed, and IME it doesn't happen that
often.

Stéphane Ducasse wrote:

> so in that case I would be in favor to remove them because this
> does not help and just bloat the image.

Lukas Renggli

Re: optimizing select:thenCollect:

In reply to this post by Stéphane Ducasse

>> Having an iterator class also avoids having to implement all these
>> iteration methods over and over again, e.g. in visitors.
>
> lukas do you have an example because I do not see why
> Also with traits you just have to implement do: and you get all the
> other one for free.

The class and metaclass system of Smalltalk is already at the limit of
complexity that I can handle. Traits make this existing model even
more complicated. Even with support from a trait aware browser I find
traits extremely hard to use. To me the most important thing is that
at all points in time I know exactly what changes when I compile a
method. This is absolutely not clear when using traits.

So far I haven't seen a single example where traits would really help.
As Paolo suggested the problem can be simply solved by introducing an
Iterable superclass to Collection and Stream.

Now you might say that this only works if I am not inheriting from
some other important class. True, an Iterable-trait would avoid that
problem. However, adding such a trait to an existing hierarchy looks
extremely scary to me. It potentially pollutes an existing protocol
with existing code that is most likely unrelated (or maybe it already
contains some iterator code for something else). I believe, that in
this case *not* using traits forces you to do the right thing:
delegate to another object that knows how to iterate over your object.

>> Is this going
>> to be ANSI?
>
> DNU? why ANSI has to be in the story?

Paolo is on the ANSI committee, I thought. Maybe that will be added?

Lukas


Paolo Bonzini-2

Re: optimizing select:thenCollect:

On 07/16/2009 12:04 PM, Lukas Renggli wrote:
> I believe, that in
> this case *not* using traits forces you to do the right thing:
> delegate to another object that knows how to iterate over your object.

Agreed. Adding a #do: method to a class can often be the right thing,
but I still have to see one case when a non-Collection, non-Stream class
would have advantages from a public #inject:into: or #fold: or #detect:
method. A sidenote -- things that do not count:

1) #select:/#reject:/#collect:, because those would be abstract in
Iterable anyway.

2) using the above methods from within the class itself, because 99.99%
of the cases you will have an underlying collection or stream and you
can access that underlying object without violating information hiding.

>>> Is this going
>>> to be ANSI?
>>
>> DNU? why ANSI has to be in the story?
>
> Paolo is in the ANSI committee, I though. Maybe that will be added?

There is no ANSI committee; that would have required paying money and
was not considered viable.

There is a mailing list (run by Bruce Badger) and a protocol (hosted at
the GNU Smalltalk site, see the "STEPs" link on the home page) for
submitting proposals for extensions to Smalltalk, hopefully so that they
can be implemented across vendors. But it has been dormant for too long. :-(

Paolo


Paolo Bonzini-2

Re: optimizing select:thenCollect:

In reply to this post by Stéphane Ducasse

On 07/16/2009 11:38 AM, Stéphane Ducasse wrote:
> Paolo
>
> why Iterable as a superclass of stream and collection?
> Why not a trait?
> Then I'm wondering why this would be faster.

Maybe for just "(a select: ...) collect: b" it would not be faster.

However, the important thing is to provide:

1) a new abstraction. Providing more polymorphism between Collections
and Streams can only do good, especially when the semantics are 100%
identical as for #detect: or #fold: (based on #do:). In GNU Smalltalk I
did a bit more by providing also #readStream and #nextPutAllOn:, but
that's not necessary, so I kept my Pharo proposal minimal.

2) a new bag of tools. Lazy filtering has been proposed in many
different ways, so it *is* useful. Adding Iterable is in my opinion the
best way to provide this tool in a way that fits the simplicity of the
standard Smalltalk class library (this is a way to say that
#doesNotUnderstand: tricks have their place, but other ways should be
explored as well).

Paolo


EstebanLM

Re: optimizing select:thenCollect:

In reply to this post by Andres Valloud-4

I use them a lot... just because "select:thenCollect:" looks clearer to
me than "(select:)collect:", and I can indent the code in a way that
looks nicer to me.
Of course, if you choose to remove it, I can change my code without any
problem (it is not so hard, after all :) ), but I just wanted to point
out that optimization is not always the reason a protocol is added...
sometimes it is about grammar too.

Cheers,
Esteban

On 2009-07-16 07:02:56 -0300, Andres Valloud
<[hidden email]> said:

> I'd favor removing such methods. The allocation optimization can be
> implemented better whenever it's needed, and IME it doesn't happen that
> often.

Lukas Renggli

Re: optimizing select:thenCollect:

> I use them a lot... just because "select:thenCollect:" looks clearer to
> me than "(select:)collect:", and I can indent the code in a more
> fashion way to me.
> Of course, if you choose remove it, I can change my code without any
> problem (it is not so hard, after all :) ), but I just wanted to point
> that optimization is not always the reason because a protocol can be
> added... some times, is gramatics too.

The following rewriter changes your code automatically. We use it as
part of Slime to ensure portability.

ParseTreeRewriter new
    replace: '``@collection collect: ``@block1 thenDo: ``@block2'
        with: '(``@collection collect: ``@block1) do: ``@block2';
    replace: '``@collection collect: ``@block1 thenSelect: ``@block2'
        with: '(``@collection collect: ``@block1) select: ``@block2';
    replace: '``@collection reject: ``@block1 thenDo: ``@block2'
        with: '(``@collection reject: ``@block1) do: ``@block2';
    replace: '``@collection select: ``@block1 thenCollect: ``@block2'
        with: '(``@collection select: ``@block1) collect: ``@block2';
    replace: '``@collection select: ``@block1 thenDo: ``@block2'
        with: '(``@collection select: ``@block1) do: ``@block2'
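
[Editorial sketch] For readers who have not used the Refactoring Browser rewriter, this is roughly how one of the rules above could be applied to a single expression in a playground. It assumes the Refactoring Browser classes are loaded under the names used in this message (ParseTreeRewriter and RBParser; later images may use different class names). The expression being rewritten, including people, isAdult and name, is made up.

```smalltalk
| rewriter tree |
rewriter := ParseTreeRewriter new.
rewriter
	replace: '``@collection select: ``@block1 thenCollect: ``@block2'
	with: '(``@collection select: ``@block1) collect: ``@block2'.

"Parse a sample expression that uses the combined selector."
tree := RBParser parseExpression:
	'people select: [ :p | p isAdult ] thenCollect: [ :p | p name ]'.

"executeTree: answers true when at least one pattern matched;
the rewritten tree is then available from the rewriter."
(rewriter executeTree: tree)
	ifTrue: [ rewriter tree formattedCode ]
	ifFalse: [ tree formattedCode ]
```

The answer is the source of the rewritten expression as a String; its exact layout depends on the formatter configured in the image.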

Lukas


Stéphane Ducasse

Re: optimizing select:thenCollect:

In reply to this post by Lukas Renggli

On Jul 16, 2009, at 12:04 PM, Lukas Renggli wrote:

>>> Having an iterator class also avoids having to implement all these
>>> iteration methods over and over again, e.g. in visitors.
>>
>> lukas do you have an example because I do not see why
>> Also with traits you just have to implement do: and you get all the
>> other one for free.
>
> The class and metaclass system of Smalltalk is already at the limit of
> complexity that I can handle. Traits make this existing model even
> more complicated. Even with support from a trait aware browser I find
> traits extremely hard to use. To me the most important thing is that
> at all points in time I know exactly what changes when I compile a
> method. This is absolutely not clear when using traits.

I find your statements a bit rude.
>
> So far I haven't seen a single example where traits would really help.

Really!
I'm writing a trait to get Magritte-aware XML output without always
having to inherit from a top class or copy and paste the same code into
my classes all the time.

> As Paolo suggested the problem can be simply solved by introducing an
> Iterable superclass to Collection and Stream.

I do not see how his solution will offer a faster implementation.

> Now you might say that this only works if I am not inheriting from
> some other important class. True, an Iterable-trait would avoid that
> problem. However, adding such a trait to an existing hierarchy looks
> extremely scary to me. It potentially pollutes an existing protocol
> with existing code that is most likely unrelated (or maybe it already
> contains some iterator code for something else).

Like what? collect:, select:, do:, reject:, inject:into:?

I think that you should also have a look at what other people are doing.

> I believe, that in
> this case *not* using traits forces you to do the right thing:
> delegate to another object that knows how to iterate over your object.
>
>>> Is this going
>>> to be ANSI?
>>
>> DNU? why ANSI has to be in the story?
>
> Paolo is in the ANSI committee, I though. Maybe that will be added?

ANSI is dead. It was a bunch of specifications pushed through because
Smalltalk was hot and vendors wanted, or did not want, a feature in.


Stéphane Ducasse

Re: optimizing select:thenCollect:

In reply to this post by Paolo Bonzini-2

On Jul 16, 2009, at 2:09 PM, Paolo Bonzini wrote:

> On 07/16/2009 12:04 PM, Lukas Renggli wrote:
>> I believe, that in
>> this case *not* using traits forces you to do the right thing:
>> delegate to another object that knows how to iterate over your object.
>
> Agreed. Adding a #do: method to a class can often be the right thing,
> but I still have to see one case when a non-Collection, non-Stream class
> would have advantages from a public #inject:into: or #fold: or #detect:
> method. A sidenote -- things that do not count:

In the Moose environment we have groups, and they cannot inherit from
Iterable.

> 1) #select:/#reject:/#collect:, because those would be abstract in
> Iterable anyway.
>
> 2) using the above methods from within the class itself, because 99.99%
> of the cases you will have an underlying collection or stream and you
> can access that underlying object without violating information hiding.

Yes, so what?
How does this stream-based implementation go faster than the default
collection one?


Stéphane Ducasse

Re: optimizing select:thenCollect:

In reply to this post by Paolo Bonzini-2

On Jul 16, 2009, at 2:16 PM, Paolo Bonzini wrote:

> On 07/16/2009 11:38 AM, Stéphane Ducasse wrote:
>> Paolo
>>
>> why Iterable as a superclass of stream and collection?
>> Why not a trait?
>> Then I'm wondering why this would be faster.
>
> Maybe for just "(a select: ...) collect: b" it would not be faster.
>
> However, the important thing is to provide:
>
> 1) a new abstraction. Providing more polymorphism between Collections
> and Streams can only do good, especially when the semantics are 100%
> identical as for #detect: or #fold: (based on #do:). In GNU Smalltalk I
> did a bit more by providing also #readStream and #nextPutAllOn:, but
> that's not necessary so I kept my pharo proposal minimal.
>
> 2) a new bag of tools. Lazy filtering has been proposed in many
> different ways, so it *is* useful. Adding Iterable is in my opinion the
> best way to provide this tool in a way that fits the simplicity of the
> standard Smalltalk class library (this is a way to say that
> #doesNotUnderstand: tricks have their place, but other ways should be
> explored as well).

Sure, but my point was that collect:thenSelect: looks badly
implemented.
Now, you can do that with composed streams. Damien has had that in Nile
for two years, too.

Stéphane Ducasse

Re: optimizing select:thenCollect:

In reply to this post by EstebanLM

On Jul 16, 2009, at 2:44 PM, Esteban Lorenzano wrote:

> I use them a lot... just because "select:thenCollect:" looks clearer to
> me than "(select:)collect:", and I can indent the code in a more
> fashion way to me.
> Of course, if you choose remove it, I can change my code without any
> problem (it is not so hard, after all :) ), but I just wanted to point
> that optimization is not always the reason because a protocol can be
> added... some times, is gramatics too.

Indeed, but then it would be good to have it efficient as well.

>
> Cheers,
> Esteban
>
> On 2009-07-16 07:02:56 -0300, Andres Valloud
> <[hidden email]> said:
>
>> I'd favor removing such methods. The allocation optimization can be
>>
>> implemented better whenever it's needed, and IME it doesn't happen
>> that
>>
>> often.
>>
>> Stéphane Ducasse wrote:
>>> so in that case I would be in favor to remove them because this
>>> does not help and just bloat the image.
>>>
>>>
>>>
>>> On Jul 16, 2009, at 8:21 AM, Lukas Renggli wrote:
>>>
>>>
>>
>>>>> select: selectBlock thenCollect: collectBlock
>>>>>
>>>>> | newCollection |
>>>>> newCollection := self species new.
>>>>> self do: [:each | (selectBlock value: each) ifTrue:
>>
>>>>> [newCollection add:
>>>>> (collectBlock value: each)]].
>>>>> ^newCollection
>>>>>
>>
>>>> That code would also break Dictionary and Array.
>>>>
>>>> I think the methods #select:thenCollect: and #collect:thenSelect:
>>>> should be removed:
>>>>
>>>> - This is a low-level optimization that the compiler (or JIT)
>>>>   should perform, not the developer.
>>>> - 'a select: b thenCollect: c' is longer to write than
>>>>   '(a select: b) collect: c'. Furthermore, it is not that obvious
>>>>   what it does. I have seen several people being confused about it.
>>>> - This is just #select:thenCollect:; what about
>>>>   #reject:thenCollect:select:thenInjectInto:? What about all the
>>>>   subclasses (Array, Dictionary, ...) that require a new
>>>>   implementation of such methods?
>>>>
>>>> If people really want to be able to combine enumerator methods for
>>>> higher efficiency, the Collection hierarchy should be fixed. Having
>>>> external iterator objects like in C++ and Java is not that bad
>>>> after all:
>>>>
>>>>     result := aCollection iterator
>>>>         select: [ :e | ... ];
>>>>         collect: [ :e | ... ];
>>>>         contents
>>>>
>>>> The accessors #select:, #reject:, #collect: ... on the iterator
>>>> object would just iteratively create a new iteration block that is
>>>> lazily executed when the result is accessed with #contents, #size,
>>>> or #do:.
>>>>
>>>> Cheers,
>>>> Lukas
>>>>
>>>> --
>>>> Lukas Renggli
>>>> http://www.lukas-renggli.ch
>>>>

Lukas Renggli

Re: optimizing select:thenCollect:

In reply to this post by Stéphane Ducasse

>> The class and metaclass system of Smalltalk is already at the limit of
>> complexity that I can handle. Traits make this existing model even
>> more complicated. Even with support from a trait aware browser I find
>> traits extremely hard to use. To me the most important thing is that
>> at all points in time I know exactly what changes when I compile a
>> method. This is absolutely not clear when using traits.
>
> I find your statements a bit rude.

I am not saying that you should not use them. I am just trying to
explain why I am not using them myself.

> I'm writing a trait to get magritte-aware xml output without having to
> always inherit
> from a top class or to copy and paste all the time the same code in my
> classes.

I would use delegation for that. Both Pier and Magritte use that pattern too.

>> As Paolo suggested the problem can be simply solved by introducing an
>> Iterable superclass to Collection and Stream.
>
> I do not see how his solution will offer a faster implementation

Evaluating

    ((aCollection
        select: [ :e | ... ])
        collect: [ :e | ... ])

iterates twice over aCollection, while

    ((aCollection readStream
        select: [ :e | ... ])
        collect: [ :e | ... ])
        contents

will iterate only once. Furthermore the second approach allows any
combination of any of the enumeration methods, not just a few
predefined ones. Also the implementation is less error prone, because
it just uses the standard methods on collection and stream.
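
To make the difference concrete, here is a minimal Playground sketch of the single-pass idea. It does not use the stream-level select:/collect: shown above, which depend on the proposed Iterable protocol. Instead it fuses both blocks by hand inside one do: and writes the results to a stream. The sample data and the names source, selectBlock, collectBlock, twoPass, and onePass are assumptions made up for illustration.

```smalltalk
"Sketch only: sample data and block bodies are illustrative assumptions."
| source selectBlock collectBlock twoPass onePass |
source := #(1 2 3 4 5 6).
selectBlock := [ :e | e even ].
collectBlock := [ :e | e * e ].

"Two passes: select: builds an intermediate Array, collect: walks it again."
twoPass := (source select: selectBlock) collect: collectBlock.

"One pass: filter and transform inside a single do:, writing to a stream."
onePass := Array streamContents: [ :out |
    source do: [ :e |
        (selectBlock value: e)
            ifTrue: [ out nextPut: (collectBlock value: e) ] ] ].

"Both variants should answer equal Arrays."
onePass = twoPass
```

Writing to a stream sidesteps the "species new" and "add:" problems raised earlier in the thread for Array, at the cost of always answering an Array here rather than a collection of the receiver's kind.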

>> Now you might say that this only works if I am not inheriting from
>> some other important class. True, an Iterable-trait would avoid that
>> problem. However, adding such a trait to an existing hierarchy looks
>> extremely scary to me. It potentially pollutes an existing protocol
>> with existing code that is most likely unrelated (or maybe it already
>> contains some iterator code for something else).
>
> Like what? collect:, select:, do:, reject:, inject:into:?

What if you want to iterate forward and backward over a collection?
What if you have multiple things in your model object that you want to
iterate over?

Would you add several iterable traits and prefix all selectors?

    aSystem do: ...
    aSystem reverseDo: ...
    aSystem itemsDo: ...

Or rather use delegation?

    aSystem iterator do: ...
    aSystem reverseIterator do: ...
    aSystem itemsIterator do: ...

I find the second solution much nicer because it extracts the strategy
for iterating over something into a separate object. In just about any
model there are multiple collections involved and multiple ways of
iterating through these objects, so embedding the iterators directly
into the container seems wrong to me.
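
As a rough illustration of the delegation style, the Playground sketch below hands out two separate iterator objects over the same data. It is only an approximation: plain ReadStreams stand in for the hypothetical #iterator and #reverseIterator accessors, and the names items, forward, backward, and seen are assumptions, not part of any proposed API.

```smalltalk
"Sketch only: ReadStreams play the role of external iterator objects."
| items forward backward seen |
items := #(#a #b #c).

"Each iteration strategy lives in its own object, not in the container."
forward := items readStream.
backward := items reversed readStream.

seen := OrderedCollection new.
forward do: [ :each | seen add: each ].
backward do: [ :each | seen add: each ].

"Elements visited front to back, then back to front."
seen asArray
```

The container itself gains no new enumeration selectors here. A model object would only need one accessor per iteration strategy, each answering such an object.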

> I think that you should also have a look at what other people are doing.

All I am saying is that collection iteration is one of the few things
that Smalltalk got wrong.

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
