collection flatCollect: #something VS (collection collect: #something) flattened

Julien Delplanque

collection flatCollect: #something VS (collection collect: #something) flattened

Hello everyone,

While using #flatCollect: on a collection, I realized that, for example,
these two code snippets do not behave the same way:

#(1 (1)) flatCollect: #yourself. "Raises an error because the array does
not contain only collections"

(#(1 (1)) collect: #yourself) flattened. "Returns #(1 1)"

Shouldn't these two code snippets behave the same way?

Thanks in advance for your answers.

Regards,

Julien

Henrik-Nergaard

Re: collection flatCollect: #something VS (collection collect: #something) flattened

>Shouldn't these two code snippets behave the same way?

#flatCollect: expects aBlock to return a collection for each element (see the method comment) and flattens only one level, while #flattened expands every subcollection it finds:
----------------------------------------------------------------
#( #(1 #(2) ) ) flatCollect: [ :x | x ]. "#(1 #(2))"
#( #(1 #(2) ) ) flattened. "#(1 2)"
----------------------------------------------------------------
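
If the one-level behaviour is wanted but the receiver mixes plain objects and collections, as in the original #(1 (1)) example, one possible workaround is to wrap the non-collections before they reach #flatCollect:. This is only a sketch, not something the method does by itself; the variable names are made up for illustration. Note that strings and symbols also answer true to #isCollection, so they would be spilled into characters.

```smalltalk
"Sketch (assumption: not part of #flatCollect: itself).
 Wrap every non-collection element in a one-element Array so the
 block always returns a collection, then flatten one level."
| mixed result |
mixed := #(1 (1)).
result := mixed flatCollect: [ :each |
	each isCollection
		ifTrue: [ each ]
		ifFalse: [ { each } ] ].
result "for this input, the same elements as (mixed collect: #yourself) flattened"
```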

Ps. Using a symbol instead of a block reduces performance.
[ 1 to: 1e9 do: [ :each | each ] ] timeToRun. "0:00:00:02.463"
[ 1 to: 1e9 do: #yourself ] timeToRun. "0:00:00:11.468"

Best regards,
Henrik

Julien Delplanque

Re: collection flatCollect: #something VS (collection collect: #something) flattened

On 12/01/17 12:32, Henrik Nergaard wrote:
>> Shouldn't these two code snippets behave the same way?
> #flatCollect: expects that aBlock returns a collection for each element (see method comment) and only flattens one level, while #flattened expands all sub collections it finds:
> ----------------------------------------------------------------
> #( #(1 #(2) ) ) flatCollect: [ :x | x ]. "#(1 #(2))"
> #( #(1 #(2) ) ) flattened. "#(1 2)"
> ----------------------------------------------------------------
Oh, ok so it's a feature :)
>
> Ps. Using a symbol instead of a block reduces performance.
> [ 1 to: 1e9 do: [ :each | each ] ] timeToRun. "0:00:00:02.463"
> [ 1 to: 1e9 do: #yourself ] timeToRun. "0:00:00:11.468"
Wow, I used symbols to make the example clear but I didn't know that.
That's sad, I think it is sexier to use a symbol for this kind of
thing. :(

Regards,

Julien

John Brant-2

Re: collection flatCollect: #something VS (collection collect: #something) flattened

On 01/12/2017 06:45 AM, Julien Delplanque wrote:
> On 12/01/17 12:32, Henrik Nergaard wrote:
>> Ps. Using a symbol instead of a block reduces performance.
>> [ 1 to: 1e9 do: [ :each | each ] ] timeToRun. "0:00:00:02.463"
>> [ 1 to: 1e9 do: #yourself ] timeToRun. "0:00:00:11.468"
> Wow, I used symbols to make the example clear but I didn't know that.
> That's sad, I think it is sexier to use a symbol to do this kind of
> things. :(

I'm not sure what this test is supposed to show. The first one is just a
loop counting to 1 billion inlined in a method. The second one is a
message send of #to:do: which is implemented as a whileTrue: loop which
will send the #value: message to #yourself. Essentially, it is showing
the time to evaluate "#yourself value: someInt" 1 billion times.

I think that a better test to show the performance difference is this:

[ 1 to: 1000000000 do: [ :i | [ :e | e ] value: i ] ] timeToRun.
"0:00:00:14.917"

[ 1 to: 1000000000 do: [ :i | #yourself value: i ] ] timeToRun.
"0:00:00:07.846"

These results might lead you to believe that symbols are faster than
blocks. However, the first one is also creating 1 billion blocks. If we
create the block once, then blocks are faster:

[ | b | b := [ :e | e ]. 1 to: 1000000000 do: [ :i | b value: i ] ]
timeToRun. "0:00:00:04.515"

So, if you know how many blocks you will create and how often each block
is evaluated, you could come up with the optimal solution. Or, you could
just write your code so the intent is expressed clearly and not worry
about performance until it is needed.

John Brant

Henrik-Nergaard

Re: collection flatCollect: #something VS (collection collect: #something) flattened

The test was meant to show the overhead when a symbol is used as a message argument instead of a block, not the evaluation of each of them by #value:
--------------------------------------------
#(a b c) collect: #asUppercase "#('A' 'B' 'C')"
#(a b c) collect: [ :each | each asUppercase ]. "#('A' 'B' 'C')"
--------------------------------------------

Best regards,
Henrik

John Brant-2

Re: collection flatCollect: #something VS (collection collect: #something) flattened

On 01/12/2017 08:03 AM, Henrik Nergaard wrote:
> The test was meant to show the overhead when a symbol is used as a message argument instead of a block, not the evaluation of each of them by #value:
> --------------------------------------------
> #(a b c) collect: #asUppercase "#('A' 'B' 'C')"
> #(a b c) collect: [ :each | each asUppercase ]. "#('A' 'B' 'C')"
> --------------------------------------------

Passing a symbol will be faster than creating and then passing a block.
If the block has already been created, then both should be about the
same speed. Evaluating a block is faster than evaluating a symbol like a
block (i.e., sending it the #value: message).
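
To make "evaluating a symbol like a block" concrete, here is a small sketch (variable names are only for illustration). Sending #value: to a symbol ends up performing that selector on the argument, which is why it can stand in for a one-argument block:

```smalltalk
"All three expressions send #asUppercase to 'abc'."
| viaSymbol viaBlock viaPerform |
viaSymbol := #asUppercase value: 'abc'.            "symbol used like a block"
viaBlock := [ :each | each asUppercase ] value: 'abc'. "real one-argument block"
viaPerform := 'abc' perform: #asUppercase.         "what the symbol version boils down to"
{ viaSymbol. viaBlock. viaPerform }
```

The symbol version goes through an extra indirection compared with the block, which fits the timings quoted earlier in the thread.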

Your initial tests used the #to:do: message. That message gets inlined
when the last argument is a one argument block. The code for the first
example "1 to: 1e9 do: [ :each | each ]" would have been inlined to look
something like this:

| current end |
end := 1e9.
current := 1.
[current <= end] whileTrue: [current := current + 1]

So there are no "real" block arguments as the "[:each | each]" gets removed.

BTW, there is a bug in the #to:do: inlining. The first argument is
evaluated before the receiver. Here's a test case that fails:

| stream |
stream := ReadStream on: #(2 1).
stream next to: stream next
do: [ :i | self error: 'This should not occur' ]

This should be evaluated as "2 to: 1 do: ...". Instead it is evaluated
as "1 to: 2 do: ..." .
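
For comparison, here is a sketch of the same test with the block stored in a variable first. The assumption is that the compiler only inlines #to:do: when the last argument is written as a literal block, so this version goes through a real message send and evaluates the receiver before the argument. The names count and body are made up for the example.

```smalltalk
"Sketch: avoid the inlined form by passing the block through a variable."
| stream count body |
stream := ReadStream on: #(2 1).
count := 0.
body := [ :i | count := count + 1 ].
"Real send: receiver (2) is evaluated first, then the argument (1)."
stream next to: stream next do: body.
count "stays 0, because 2 to: 1 do: ... runs no iterations"
```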

John Brant