OrderedSet pitfall

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

OrderedSet pitfall

Boris Popov, DeepCove Labs (SNN)

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Boris Popov, DeepCove Labs (SNN)

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Paul Baumann

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Faster_OrderedSet.st (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Boris Popov, DeepCove Labs (SNN)

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Paul Baumann

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Boris Popov, DeepCove Labs (SNN)

Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Paul Baumann

Boris,

 

It isn't even a tradeoff between allocation and speed if the collections are pre-sized properly. Collection class>>withAll: is missing overrides for collections that can be pre-sized.

 

Here is a 98% increase on the 95% increase I showed earlier.

 

| values |

values := (1 to: 500000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

818

817

 

46724

53783

 

1 - (817 / 46724) * -100.0

-98.2514

 

Just add this method to the code I provided earlier:

 

OrderedSet class>>withAll: aCollection

                " Answer a new instance of this class,

                whose elements are the elements of aCollection. "

 

                | newCollection |

                newCollection := self new: aCollection size.

                newCollection addAll: aCollection.

                ^newCollection

 

 

Paul Baumann

 

 

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:34
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Andres Valloud-6
In reply to this post by Paul Baumann
If your application spends a lot of time in findKeyOrNil: (or similar),
then it's because of collisions.  Here are two ways to go about it (I'm
sure you know most of this, but I think it's worthwhile writing it down
in general terms):

a) If you can avoid the collisions, then improve the hash function so
you can achieve O(1) efficiency on hashing (meaning: you find what you
are looking for in 1.01 or less lookups on average).  You can use the
Hash Analysis Tool to quantitatively and qualitatively evaluate hash
functions (available from the public Store repository, bundle "Hash
Analysis Tool" --- if you want more hash functions and data sets, load
"Hash Analysis Tool - Extensions" also).  Note that besides the extra
comparisons, hash misses cost a lot because the more you miss, the more
random memory reads you have to do in order to find what you are looking
for.  For example, I recently rewrote the non-interruptible GC.  The new
implementation is a bit more complex computationally, but since it
avoids unnecessary memory traffic it is up to 35% faster.

b) If you cannot avoid the collisions (e.g.: because you're using
identity hash), you should change the storage strategy of the hashed
collection.  For example, instead of open addressing linear probing (the
default strategy in every Smalltalk I know), use hash buckets.  You can
see an example of a hash bucketed set in the class SymbolTable, and
there is also the parcel UniqueObjectTable that generalizes the
approach, plus an example parcel that makes the image start sharing
bytecode arrays across all compiled codes.  You can find the parcels
under the parcels directory.

Finally, there's also the hashing book.

On 5/31/2011 7:28 AM, Paul Baumann wrote:

> Boris,
>
> Agreed. The biggest performance hit in the Collections is the hidden
> cost of #to:do: enumerations. It doesn't show in the profiler so tuning
> tends to focus on other costs instead of considering the design itself.
> Your example with OrderedSet>>includes: shows an extreme of how costly
> it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also
> suffer costs like these. There are collection designs that avoids those
> costs while also allowing index searches using primitive instead of #to:do:.
>
> Paul Baumann
>
> *From:*Boris Popov, DeepCove Labs [mailto:[hidden email]]
> *Sent:* Tuesday, May 31, 2011 10:15
> *To:* Paul Baumann; [hidden email]
> *Subject:* RE: OrderedSet pitfall
> *Importance:* High
>
> Paul,
>
> Totally. My comment was mostly to point out that there’s quite a bit of
> room for improvement in that base system class considering that it seems
> such an obvious candidate for this specific task based on the name alone.
>
> Regards,
>
> -Boris
>
> *From:*Paul Baumann [mailto:[hidden email]]
> *Sent:* 31 May 2011 10:11
> *To:* Boris Popov, DeepCove Labs; [hidden email]
> *Subject:* RE: OrderedSet pitfall
>
> Boris,
>
> That 24 minute measurement was funny, and easily avoided. The attached
> code makes a 95% performance improvement in that measurement. It could
> be made even faster by using different Dictionary implementation.
>
> Paul Baumann
>
> | values |
>
> values := (1 to: 50000) collect: [:i | Time microsecondClock].
>
> Time millisecondsToRun: [OrderedSet withAll: values].
>
> "Faster_OrderedSet"
>
> 1348
>
> "Original"
>
> 29433
>
> 1 - (1348 / 29433) * -100.0
>
> -95.4201
>
> *From:*[hidden email] [mailto:[hidden email]] *On
> Behalf Of *Boris Popov, DeepCove Labs
> *Sent:* Monday, May 30, 2011 15:16
> *To:* [hidden email]
> *Subject:* Re: [vwnc] OrderedSet pitfall
>
> Heh,
>
> values := (1 to: 500000) collect: [:i | Time _microsecondClock_].
>
> values asSet size. 93303
>
> Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"
>
> Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new
>
> into:
>
> [:assoc :ea |
>
> (assoc value includes: ea)
>
> ifFalse:
>
> [assoc value add: ea.
>
> assoc key add: ea].
>
> assoc]) key]. 157
>
> -Boris
>
> *From:*[hidden email] [mailto:[hidden email]] *On
> Behalf Of *Boris Popov, DeepCove Labs
> *Sent:* 30 May 2011 13:39
> *To:* [hidden email]
> *Subject:* [vwnc] OrderedSet pitfall
>
> FYI, in case someone else is tempted to use stock OrderedSet as a way of
> removing duplicates from their sequenceable collections whilst
> preserving their order,
>
> values := (1 to: 500000) collect: [:i | Time millisecondClockValue].
>
> values asSet size. 259
>
> Time millisecondsToRun: [OrderedSet withAll: values]. 4086
>
> Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new
>
> into:
>
> [:assoc :ea |
>
> (assoc value includes: ea)
>
> ifFalse:
>
> [assoc value add: ea.
>
> assoc key add: ea].
>
> assoc]) key]. 149
>
> Regards,
>
> -Boris
>
> ------------------------------------------------------------------------
>
> This message may contain confidential information and is intended for
> specific recipients unless explicitly noted otherwise. If you have
> reason to believe you are not an intended recipient of this message,
> please delete it and notify the sender. This message may not represent
> the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or
> affiliates, and does not constitute a contract or guarantee. Unencrypted
> electronic mail is not secure and the recipient of this message is
> expected to provide safeguards from viruses and pursue alternate means
> of communication where privacy or a binding message is desired.
>
>
> ------------------------------------------------------------------------
> This message may contain confidential information and is intended for
> specific recipients unless explicitly noted otherwise. If you have
> reason to believe you are not an intended recipient of this message,
> please delete it and notify the sender. This message may not represent
> the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or
> affiliates, and does not constitute a contract or guarantee. Unencrypted
> electronic mail is not secure and the recipient of this message is
> expected to provide safeguards from viruses and pursue alternate means
> of communication where privacy or a binding message is desired.
>
>
>
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Paul Baumann
In reply to this post by Paul Baumann

Whoops. Pre-sizing didn't compensate for the allocation cost between designs. There is a tradeoff between memory and speed. The performance difference is huge,  though I'm not going to wait over 24 minutes to get a precise number for the 500000x test to compare with 817 ms.

 

| values |

values := (1 to: 1000) collect: [:i | Time microsecondClock].

AllocationProfiler profile: [OrderedSet withAll: values].

 

"Faster_OrderedSet + pre-size"

12256 bytes

 

"Original OrderedSet"

5240 bytes

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Paul Baumann
Sent: Tuesday, May 31, 2011 11:35
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Boris,

 

It isn't even a tradeoff between allocation and speed if the collections are pre-sized properly. Collection class>>withAll: is missing overrides for collections that can be pre-sized.

 

Here is a 98% increase on the 95% increase I showed earlier.

 

| values |

values := (1 to: 500000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

818

817

 

46724

53783

 

1 - (817 / 46724) * -100.0

-98.2514

 

Just add this method to the code I provided earlier:

 

OrderedSet class>>withAll: aCollection

                " Answer a new instance of this class,

                whose elements are the elements of aCollection. "

 

                | newCollection |

                newCollection := self new: aCollection size.

                newCollection addAll: aCollection.

                ^newCollection

 

 

Paul Baumann

 

 

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:34
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Alan Knight-2
In reply to this post by Boris Popov, DeepCove Labs (SNN)
Haven't looked in detail, but I suspect that one could just use a binaryIndexOf:ifAbsent: search rather than a linear search and get a very large improvement without needing to introduce an additional collection or reliance on hashing. And for the particular case of addAll:, if that method knew enough to just sort the elements and then go once through them rejecting all those that were equal to the previous element, it would be quite a lot faster, probably faster than something relying on hashing.



[hidden email]
May 31, 2011 10:34 AM


Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 31, 2011 10:28 AM


Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 31, 2011 10:15 AM


Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 31, 2011 10:11 AM


Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 30, 2011 3:16 PM


Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Boris Popov, DeepCove Labs (SNN)
In reply to this post by Paul Baumann

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 184

Time millisecondsToRun: [OrderedSet withAll: values]. 2795

AllocationProfiler profile: [OrderedSet withAll: values]. 2660 bytes

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 112

AllocationProfiler profile: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 5360 bytes

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 12:57
To: Paul Baumann; Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Whoops. Pre-sizing didn't compensate for the allocation cost between designs. There is a tradeoff between memory and speed. The performance difference is huge,  though I'm not going to wait over 24 minutes to get a precise number for the 500000x test to compare with 817 ms.

 

| values |

values := (1 to: 1000) collect: [:i | Time microsecondClock].

AllocationProfiler profile: [OrderedSet withAll: values].

 

"Faster_OrderedSet + pre-size"

12256 bytes

 

"Original OrderedSet"

5240 bytes

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Paul Baumann
Sent: Tuesday, May 31, 2011 11:35
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Boris,

 

It isn't even a tradeoff between allocation and speed if the collections are pre-sized properly. Collection class>>withAll: is missing overrides for collections that can be pre-sized.

 

Here is a 98% increase on the 95% increase I showed earlier.

 

| values |

values := (1 to: 500000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

818

817

 

46724

53783

 

1 - (817 / 46724) * -100.0

-98.2514

 

Just add this method to the code I provided earlier:

 

OrderedSet class>>withAll: aCollection

                " Answer a new instance of this class,

                whose elements are the elements of aCollection. "

 

                | newCollection |

                newCollection := self new: aCollection size.

                newCollection addAll: aCollection.

                ^newCollection

 

 

Paul Baumann

 

 

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:34
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Alan Knight-2
In reply to this post by Alan Knight-2
OK, I guess I really didn't look in detail. I was thinking the OrderedSet kept them in a sorted order, rather than just the order in which they had been added.



[hidden email]
May 31, 2011 12:58 PM


Haven't looked in detail, but I suspect that one could just use a binaryIndexOf:ifAbsent: search rather than a linear search and get a very large improvement without needing to introduce an additional collection or reliance on hashing. And for the particular case of addAll:, if that method knew enough to just sort the elements and then go once through them rejecting all those that were equal to the previous element, it would be quite a lot faster, probably faster than something relying on hashing.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 31, 2011 10:34 AM


Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 31, 2011 10:28 AM


Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 31, 2011 10:15 AM


Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


[hidden email]
May 31, 2011 10:11 AM


Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Boris Popov, DeepCove Labs (SNN)
In reply to this post by Alan Knight-2

Alan,

 

The idea is to preserve the order, plus not all objects would answer to #<=?

 

I hope I didn’t completely misread your intent.

 

-Boris

 

From: Alan Knight [mailto:[hidden email]]
Sent: 31 May 2011 12:59
To: Boris Popov, DeepCove Labs
Cc: Paul Baumann; [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Haven't looked in detail, but I suspect that one could just use a binaryIndexOf:ifAbsent: search rather than a linear search and get a very large improvement without needing to introduce an additional collection or reliance on hashing. And for the particular case of addAll:, if that method knew enough to just sort the elements and then go once through them rejecting all those that were equal to the previous element, it would be quite a lot faster, probably faster than something relying on hashing.



 

[hidden email]
May 31, 2011 10:34 AM

 



Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


 

[hidden email]
May 31, 2011 10:28 AM

 



Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


 

[hidden email]
May 31, 2011 10:15 AM

 



Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


 

[hidden email]
May 31, 2011 10:11 AM

 



Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


 

[hidden email]
May 30, 2011 3:16 PM

 



Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Paul Baumann
In reply to this post by Andres Valloud-6
Andres,

Yes, findKeyOrNil: costs are from collisions where I said misses (oops!). It is nice to know about the Hash Analysis Tool. I have your book on hashing, but frankly have never read it. I'm appreciate the attention you draw on hash distribution though; that can be a big problem that few people are aware of.

Hashed bucket design is certainly better. It compensates for a narrow hash range in VW. It shortens the linear search for empty slots as the collection reaches capacity. It grows in smaller steps without having to rehash. Identity collections can use the primitive you added to VW years ago. A hashed bucket design that also local indexes with conditional buckets is even better, and it is efficient for both large and small collections. Values can be within association instances to be a true Dictionary substitute. An optimized variety would avoid creating the associations.

UniqueObjectTable was not in my 7.7.1 installation; perhaps it is a 7.8 improvement or a feature I hadn't installed. SymbolTable is in my 7.7.1 image but not 7.5. I'm glad Cincom is finally making improvements in this area, now bring them to the collections that everyone else uses. People have only been raising these topics for a couple decades now. I can give you a good implementation to start with if you aren't too far into the official Cincom one.

>>example parcel that makes the image start sharing bytecode arrays across all compiled codes

That should be more than example. VisualAge had that optimization two decades ago as part of a runtime packaging step. It reduced runtime image size significantly. Later they made compiled method optimizations like that automatic using special methods.

We all appreciate the performance improvements Cincom makes. You guys made big advances at the VM level. Now that that is done, make some to the base too. Just make sure there is a good migration plan.

Cincom should also take advantage of your primitive 442 in other collections with the same structure (like OrderedCollection):

OrderedCollection>>identityIndexOf: oldValue replaceWith: newValue startingAt: startIndex stoppingAt: stopIndex
        "Searches the specified portion of the receiver for the first reference
         to oldValue, replacing that reference with newValue, and answers
         the 1-based index at which the oldValue was found. Zero is returned if
         the oldValue cannot be found. The search is restricted to and inclusive
         of the specified start and stop indices. The primitive fails if the start and
         stop indices are not SmallIntegers, if either is out of range, or if the
         receiver is immutable and oldValue ~~ newValue."

        ^self
                basic_identityIndexOf: oldValue
                replaceWith: newValue
                startingAt: startIndex + firstIndex - 1
                stoppingAt: stopIndex + lastIndex - 1


Too bad an equalityIndexOf:replaceWith:startingAt:stoppingAt: can't be created to do equality comparisons with a primitive.

Paul Baumann



-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Andres Valloud
Sent: Tuesday, May 31, 2011 11:54
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

If your application spends a lot of time in findKeyOrNil: (or similar),
then it's because of collisions.  Here are two ways to go about it (I'm
sure you know most of this, but I think it's worthwhile writing it down
in general terms):

a) If you can avoid the collisions, then improve the hash function so
you can achieve O(1) efficiency on hashing (meaning: you find what you
are looking for in 1.01 or less lookups on average).  You can use the
Hash Analysis Tool to quantitatively and qualitatively evaluate hash
functions (available from the public Store repository, bundle "Hash
Analysis Tool" --- if you want more hash functions and data sets, load
"Hash Analysis Tool - Extensions" also).  Note that besides the extra
comparisons, hash misses cost a lot because the more you miss, the more
random memory reads you have to do in order to find what you are looking
for.  For example, I recently rewrote the non-interruptible GC.  The new
implementation is a bit more complex computationally, but since it
avoids unnecessary memory traffic it is up to 35% faster.

b) If you cannot avoid the collisions (e.g.: because you're using
identity hash), you should change the storage strategy of the hashed
collection.  For example, instead of open addressing linear probing (the
default strategy in every Smalltalk I know), use hash buckets.  You can
see an example of a hash bucketed set in the class SymbolTable, and
there is also the parcel UniqueObjectTable that generalizes the
approach, plus an example parcel that makes the image start sharing
bytecode arrays across all compiled codes.  You can find the parcels
under the parcels directory.

Finally, there's also the hashing book.

On 5/31/2011 7:28 AM, Paul Baumann wrote:

> Boris,
>
> Agreed. The biggest performance hit in the Collections is the hidden
> cost of #to:do: enumerations. It doesn't show in the profiler so tuning
> tends to focus on other costs instead of considering the design itself.
> Your example with OrderedSet>>includes: shows an extreme of how costly
> it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also
> suffer costs like these. There are collection designs that avoids those
> costs while also allowing index searches using primitive instead of #to:do:.
>
> Paul Baumann
>
> *From:*Boris Popov, DeepCove Labs [mailto:[hidden email]]
> *Sent:* Tuesday, May 31, 2011 10:15
> *To:* Paul Baumann; [hidden email]
> *Subject:* RE: OrderedSet pitfall
> *Importance:* High
>
> Paul,
>
> Totally. My comment was mostly to point out that there's quite a bit of
> room for improvement in that base system class considering that it seems
> such an obvious candidate for this specific task based on the name alone.
>
> Regards,
>
> -Boris
>
> *From:*Paul Baumann [mailto:[hidden email]]
> *Sent:* 31 May 2011 10:11
> *To:* Boris Popov, DeepCove Labs; [hidden email]
> *Subject:* RE: OrderedSet pitfall
>
> Boris,
>
> That 24 minute measurement was funny, and easily avoided. The attached
> code makes a 95% performance improvement in that measurement. It could
> be made even faster by using different Dictionary implementation.
>
> Paul Baumann
>
> | values |
>
> values := (1 to: 50000) collect: [:i | Time microsecondClock].
>
> Time millisecondsToRun: [OrderedSet withAll: values].
>
> "Faster_OrderedSet"
>
> 1348
>
> "Original"
>
> 29433
>
> 1 - (1348 / 29433) * -100.0
>
> -95.4201
>
> *From:*[hidden email] [mailto:[hidden email]] *On
> Behalf Of *Boris Popov, DeepCove Labs
> *Sent:* Monday, May 30, 2011 15:16
> *To:* [hidden email]
> *Subject:* Re: [vwnc] OrderedSet pitfall
>
> Heh,
>
> values := (1 to: 500000) collect: [:i | Time _microsecondClock_].
>
> values asSet size. 93303
>
> Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"
>
> Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new
>
> into:
>
> [:assoc :ea |
>
> (assoc value includes: ea)
>
> ifFalse:
>
> [assoc value add: ea.
>
> assoc key add: ea].
>
> assoc]) key]. 157
>
> -Boris
>
> *From:*[hidden email] [mailto:[hidden email]] *On
> Behalf Of *Boris Popov, DeepCove Labs
> *Sent:* 30 May 2011 13:39
> *To:* [hidden email]
> *Subject:* [vwnc] OrderedSet pitfall
>
> FYI, in case someone else is tempted to use stock OrderedSet as a way of
> removing duplicates from their sequenceable collections whilst
> preserving their order,
>
> values := (1 to: 500000) collect: [:i | Time millisecondClockValue].
>
> values asSet size. 259
>
> Time millisecondsToRun: [OrderedSet withAll: values]. 4086
>
> Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new
>
> into:
>
> [:assoc :ea |
>
> (assoc value includes: ea)
>
> ifFalse:
>
> [assoc value add: ea.
>
> assoc key add: ea].
>
> assoc]) key]. 149
>
> Regards,
>
> -Boris
>
> ------------------------------------------------------------------------
>
> This message may contain confidential information and is intended for
> specific recipients unless explicitly noted otherwise. If you have
> reason to believe you are not an intended recipient of this message,
> please delete it and notify the sender. This message may not represent
> the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or
> affiliates, and does not constitute a contract or guarantee. Unencrypted
> electronic mail is not secure and the recipient of this message is
> expected to provide safeguards from viruses and pursue alternate means
> of communication where privacy or a binding message is desired.
>
>
> ------------------------------------------------------------------------
> This message may contain confidential information and is intended for
> specific recipients unless explicitly noted otherwise. If you have
> reason to believe you are not an intended recipient of this message,
> please delete it and notify the sender. This message may not represent
> the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or
> affiliates, and does not constitute a contract or guarantee. Unencrypted
> electronic mail is not secure and the recipient of this message is
> expected to provide safeguards from viruses and pursue alternate means
> of communication where privacy or a binding message is desired.
>
>
>
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Andres Valloud-6
Paul,

On 5/31/2011 10:33 AM, Paul Baumann wrote:

> Identity collections can use the primitive you
> added to VW years ago.

I think the primitives you refer to were always there, all I did was to
use them.

> A hashed bucket design that also local indexes
> with conditional buckets is even better, and it is efficient for both
> large and small collections.

What do you mean?

> UniqueObjectTable was not in my 7.7.1 installation; perhaps it is a
> 7.8 improvement or a feature I hadn't installed. SymbolTable is in my
> 7.7.1 image but not 7.5. I'm glad Cincom is finally making
> improvements in this area, now bring them to the collections that
> everyone else uses. People have only been raising these topics for a
> couple decades now. I can give you a good implementation to start
> with if you aren't too far into the official Cincom one.

 > We all appreciate the performance improvements Cincom makes. You guys
 > made big advances at the VM level. Now that that is done, make some
 > to the base too. Just make sure there is a good migration plan.

I do have some hashing items on my plate for later in this release
cycle, but they relate more to identity hash.  I'm a bit too much
involved with GC and related items right now, I have to defer to Alan
for prioritization on general hashed collection work (e.g.: providing
multiple storage strategies).  I can say, though, that the hash bucketed
set implementation in UniqueObjectTable is pluggable and thread safe.

> That should be more than example. VisualAge had that optimization two
> decades ago as part of a runtime packaging step. It reduced runtime
> image size significantly. Later they made compiled method
> optimizations like that automatic using special methods.

The runtime packager unifies bytecode arrays too.  What the parcel does
is provide that functionality all the time as you write code.

> Too bad an equalityIndexOf:replaceWith:startingAt:stoppingAt: can't
> be created to do equality comparisons with a primitive.

Yes, this is a pain.  It's also why it's very important to find good
hash functions that find your object in essentially 1 try.  In
particular, if you can find a good quality hash function that allows you
to use open addressing linear probing (as opposed to e.g.: hash
buckets), then things go really fast because you find your object
sending basicAt: just once.

As further proof that hashing is awesome :D, right now method
dictionaries are sorted collections.  Method lookup involves either
linear search or binary search, depending on the method dictionary size.
  If method dictionaries were actual hashed collections, though,
megamorphic message sends would speed up by somewhere between 40% and
70% (I know because I did the experiment, so these percentages are
actual figures).  The gains occur because linear search and binary
search require more reads (and, in the case of binary search, the reads
occur at random locations), plus with binary search the jumps in the
code behave at random so the CPU can't predict them either.  And note
that these gains were obtained using the (comparatively slower) % (mod)
operator instead of & (bitAnd) to calculate the access index...

Andres.
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Paul Baumann
>> A hashed bucket design that also local indexes
>> with conditional buckets is even better, and it is efficient for both
>> large and small collections.

>What do you mean?

It would be better to just send you the code. The design I last implemented is thread safe (with semaphore only needed for growth). Sequence and primitives avoid semaphore costs while keeping it thread safe. It is a design that I've shared before, so there may be some related history.

>I can say, though, that the hash bucketed set implementation in UniqueObjectTable is pluggable and thread safe

Being "pluggable" can mean many things. I assume you are saying that it can have either weak keys, values, both, or neither (mine does that too). It may mean that it contains internal objects for external reference (like associations), or may avoid that cost entirely (mine does that too). It may mean that it can be configured for either identity or equality comparisons (mine does that too). It may mean that it can be configured for how to handle duplicates (mine does that too). It is a flexible and fast kind of collection that can act like several other standard collections while being faster for many uses.

Let me know if/when you want to see it. I wrote it a couple years ago. It will take a day to find the code and see if it needs any changes to work with newer VW releases. It is with a larger body of code that I'd trim out. It was created to scale well while minimizing memory consumption. It worked perfect and demonstrated value, but the official implementation it was to take the place of had also evolved to be good enough.

>"megamorphic message sends"

I'd never heard term, but I think I get the gist of it.

I sometimes do optimizations like those at the application level. For some large enumerations it is much faster to compile a special block that is execute more efficiently and avoid performs and context switches. The GemStone Collection Views implementation that I presented at Smalltalk Solutions uses this technique to allow dynamic queries while being very fast.

An application-level cross-indexing design is probably closer though. I recently implemented one of those. It entirely eliminates queries, as objects are cross referenced through object affiliations that are automatically maintained. There can be many combinations and classifications of affiliations, but the number of objects in each affiliation group is relatively small and avoids query because results are pre-determined. It removes the need to query through root collections. It is easy to compare/contrast items between affiliation groups. Join-like objects referring to affiliation groups means results can be live and dynamic.

In your case, the methods are affiliated with other methods to narrow the search for a match by a small set of possibilities by context. What I'm describing doesn't need much hashing though; it is more of a relationship matrix.

Paul Baumann



-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Andres Valloud
Sent: Tuesday, May 31, 2011 15:54
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

Paul,

On 5/31/2011 10:33 AM, Paul Baumann wrote:

> Identity collections can use the primitive you
> added to VW years ago.

I think the primitives you refer to were always there, all I did was to
use them.

> A hashed bucket design that also local indexes
> with conditional buckets is even better, and it is efficient for both
> large and small collections.

What do you mean?

> UniqueObjectTable was not in my 7.7.1 installation; perhaps it is a
> 7.8 improvement or a feature I hadn't installed. SymbolTable is in my
> 7.7.1 image but not 7.5. I'm glad Cincom is finally making
> improvements in this area, now bring them to the collections that
> everyone else uses. People have only been raising these topics for a
> couple decades now. I can give you a good implementation to start
> with if you aren't too far into the official Cincom one.

 > We all appreciate the performance improvements Cincom makes. You guys
 > made big advances at the VM level. Now that that is done, make some
 > to the base too. Just make sure there is a good migration plan.

I do have some hashing items on my plate for later in this release
cycle, but they relate more to identity hash.  I'm a bit too much
involved with GC and related items right now, I have to defer to Alan
for prioritization on general hashed collection work (e.g.: providing
multiple storage strategies).  I can say, though, that the hash bucketed
set implementation in UniqueObjectTable is pluggable and thread safe.

> That should be more than example. VisualAge had that optimization two
> decades ago as part of a runtime packaging step. It reduced runtime
> image size significantly. Later they made compiled method
> optimizations like that automatic using special methods.

The runtime packager unifies bytecode arrays too.  What the parcel does
is provide that functionality all the time as you write code.

> Too bad an equalityIndexOf:replaceWith:startingAt:stoppingAt: can't
> be created to do equality comparisons with a primitive.

Yes, this is a pain.  It's also why it's very important to find good
hash functions that find your object in essentially 1 try.  In
particular, if you can find a good quality hash function that allows you
to use open addressing linear probing (as opposed to e.g.: hash
buckets), then things go really fast because you find your object
sending basicAt: just once.

As further proof that hashing is awesome :D, right now method
dictionaries are sorted collections.  Method lookup involves either
linear search or binary search, depending on the method dictionary size.
  If method dictionaries were actual hashed collections, though,
megamorphic message sends would speed up by somewhere between 40% and
70% (I know because I did the experiment, so these percentages are
actual figures).  The gains occur because linear search and binary
search require more reads (and, in the case of binary search, the reads
occur at random locations), plus with binary search the jumps in the
code behave at random so the CPU can't predict them either.  And note
that these gains were obtained using the (comparatively slower) % (mod)
operator instead of & (bitAnd) to calculate the access index...

Andres.
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Andres Valloud-6
> It would be better to just send you the code. The design I last
> implemented is thread safe (with semaphore only needed for growth).
> Sequence and primitives avoid semaphore costs while keeping it thread
> safe. It is a design that I've shared before, so there may be some
> related history.

I can take a look at it, sure.

> Being "pluggable" can mean many things. I assume you are saying that
> it can have either weak keys, values, both, or neither (mine does
> that too).

I mean pluggable in that you can define how #= and #hash are implemented
by the objects in question.  It does not have pluggable storage strategies.

>> "megamorphic message sends"
> I'd never heard term, but I think I get the gist of it.

Well, more precisely it should be "megamorphic message send sites"... or
send sites where the number of classes overflows the caching capacity of
a PIC.  Good examples of megamorphism can usually be found by looking at
places where classes receive messages.

For instance, isKindOf: has a strong tendency to become megamorphic.  If
method dictionaries were hashed collections, then horrific cases such as

awfulSlow := Object withAllSubclasses reverse.
Time millisecondsToRun: [awfulSlow do: [:each | each isKindOf: Object]]

would run anywhere from 40% to 70% faster.  Of course the above is not
meant to reflect a realistic application, so the point is merely the
illustration.  Nevertheless, real applications do send isKindOf:, and
sometimes to a large extent.  Here is the actual code from a workspace
of an image with hashed method dictionaries:

"After, before conversion"
classes := Object withAllSubclasses reverse.
Time millisecondsToRun:
        [
                3000 timesRepeat: [classes do: [:x | x yourself]]
        ] 6485


"After, after conversion (rollback to loop unroll)"
classes := Object withAllSubclasses reverse.
Time millisecondsToRun:
        [
                3000 timesRepeat: [classes do: [:x | x yourself]]
        ] 3903

6485 / 3903.0 1.66154

"Before"
Time millisecondsToRun: [200 timesRepeat: [classes do: [:x | x isKindOf:
Object class]]] 1187

"After"
Time millisecondsToRun: [200 timesRepeat: [classes do: [:x | x isKindOf:
Object class]]] 860

"Before"
Time millisecondsToRun: [200 timesRepeat: [classes do: [:x | x
canUnderstand: #yourself]]] 3089

"After"
Time millisecondsToRun: [200 timesRepeat: [classes do: [:x | x
canUnderstand: #yourself]]] 2210

3089 / 2210.0 1.39774

Andres.
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Paul Baumann
In reply to this post by Boris Popov, DeepCove Labs (SNN)

Boris,

 

Your allocation results are different because your computer is faster. Your computer answered 32% fewer equal values than mine did from #millisecondClockValue. Your computer was 50% faster in the "OrderedSet withAll:" test and  65% faster on the OC->Set association. Your computer allocated 33% fewer bytes for that test because you had more equal values. Not a good comparison of implementation differences.

 

| values |

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size.

 

268

271

 

 

BTW, the OrderedSet class>>withAll: method I showed is not good for general use because so few of the additions were equal. The execution time was great, but the allocation was wasteful. There are techniques to pre-size with a reasonable estimate that will perform well, but that was not it. You'll get better performance in your examples if you pre-size the collections yourself according to what you estimate for your application.

 

"AllocationProfiler profile: [(OrderedSet new: 300) addAll: values]."  "1240 bytes"

 

...(values inject: (OrderedCollection new: 300) -> (Set new: 300)...

 

Your use of the Set would be more allocation efficient in your example code than the Dictionary I used in the refactored OrderedSet, but it is not faster when compared on a single computer. The OC->Set code (with pre-size shown above) ran in 221 ms on my computer, but the refactored OrderedSet  with the internal Dictionary did the work in 176 ms. Allocation bytes went from 4968 bytes to 9592 bytes; the dictionary is of course allocating Associations. Another Dictionary implementation would be faster while also being more efficient.

 

The Dictionary however would be faster when testing removals.  OrderedSet could be made to use either.  The allocations would be comparable because they rely on the same underlying collections.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 13:03
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 184

Time millisecondsToRun: [OrderedSet withAll: values]. 2795

AllocationProfiler profile: [OrderedSet withAll: values]. 2660 bytes

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 112

AllocationProfiler profile: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 5360 bytes

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 12:57
To: Paul Baumann; Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Whoops. Pre-sizing didn't compensate for the allocation cost between designs. There is a tradeoff between memory and speed. The performance difference is huge,  though I'm not going to wait over 24 minutes to get a precise number for the 500000x test to compare with 817 ms.

 

| values |

values := (1 to: 1000) collect: [:i | Time microsecondClock].

AllocationProfiler profile: [OrderedSet withAll: values].

 

"Faster_OrderedSet + pre-size"

12256 bytes

 

"Original OrderedSet"

5240 bytes

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Paul Baumann
Sent: Tuesday, May 31, 2011 11:35
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Boris,

 

It isn't even a tradeoff between allocation and speed if the collections are pre-sized properly. Collection class>>withAll: is missing overrides for collections that can be pre-sized.

 

Here is a 98% increase on the 95% increase I showed earlier.

 

| values |

values := (1 to: 500000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

818

817

 

46724

53783

 

1 - (817 / 46724) * -100.0

-98.2514

 

Just add this method to the code I provided earlier:

 

OrderedSet class>>withAll: aCollection

                " Answer a new instance of this class,

                whose elements are the elements of aCollection. "

 

                | newCollection |

                newCollection := self new: aCollection size.

                newCollection addAll: aCollection.

                ^newCollection

 

 

Paul Baumann

 

 

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:34
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.



This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: OrderedSet pitfall

Boris Popov, DeepCove Labs (SNN)

Paul,

 

Oh, I wasn’t intending this as a comparison with your results, rather as a basic report of comparing base implementation with an #inject:into: alternative that I ended up using for this specific purpose of removing duplicates and preserving order. Unfortunately, I’m a bit swamped these days to really follow the in-depth discussion on the improvements that could be made to OrderedSet internally to achieve better results, but it sounds like there are some sound ideas for Cincom to ponder if/when they choose to.

 

Thanks!

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 02 June 2011 09:58
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Your allocation results are different because your computer is faster. Your computer answered 32% fewer equal values than mine did from #millisecondClockValue. Your computer was 50% faster in the "OrderedSet withAll:" test and  65% faster on the OC->Set association. Your computer allocated 33% fewer bytes for that test because you had more equal values. Not a good comparison of implementation differences.

 

| values |

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size.

 

268

271

 

 

BTW, the OrderedSet class>>withAll: method I showed is not good for general use because so few of the additions were equal. The execution time was great, but the allocation was wasteful. There are techniques to pre-size with a reasonable estimate that will perform well, but that was not it. You'll get better performance in your examples if you pre-size the collections yourself according to what you estimate for your application.

 

"AllocationProfiler profile: [(OrderedSet new: 300) addAll: values]."  "1240 bytes"

 

...(values inject: (OrderedCollection new: 300) -> (Set new: 300)...

 

Your use of the Set would be more allocation efficient in your example code than the Dictionary I used in the refactored OrderedSet, but it is not faster when compared on a single computer. The OC->Set code (with pre-size shown above) ran in 221 ms on my computer, but the refactored OrderedSet  with the internal Dictionary did the work in 176 ms. Allocation bytes went from 4968 bytes to 9592 bytes; the dictionary is of course allocating Associations. Another Dictionary implementation would be faster while also being more efficient.

 

The Dictionary however would be faster when testing removals.  OrderedSet could be made to use either.  The allocations would be comparable because they rely on the same underlying collections.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 13:03
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 184

Time millisecondsToRun: [OrderedSet withAll: values]. 2795

AllocationProfiler profile: [OrderedSet withAll: values]. 2660 bytes

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 112

AllocationProfiler profile: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 5360 bytes

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 12:57
To: Paul Baumann; Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Whoops. Pre-sizing didn't compensate for the allocation cost between designs. There is a tradeoff between memory and speed. The performance difference is huge,  though I'm not going to wait over 24 minutes to get a precise number for the 500000x test to compare with 817 ms.

 

| values |

values := (1 to: 1000) collect: [:i | Time microsecondClock].

AllocationProfiler profile: [OrderedSet withAll: values].

 

"Faster_OrderedSet + pre-size"

12256 bytes

 

"Original OrderedSet"

5240 bytes

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Paul Baumann
Sent: Tuesday, May 31, 2011 11:35
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Boris,

 

It isn't even a tradeoff between allocation and speed if the collections are pre-sized properly. Collection class>>withAll: is missing overrides for collections that can be pre-sized.

 

Here is a 98% increase on the 95% increase I showed earlier.

 

| values |

values := (1 to: 500000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

818

817

 

46724

53783

 

1 - (817 / 46724) * -100.0

-98.2514

 

Just add this method to the code I provided earlier:

 

OrderedSet class>>withAll: aCollection

                " Answer a new instance of this class,

                whose elements are the elements of aCollection. "

 

                | newCollection |

                newCollection := self new: aCollection size.

                newCollection addAll: aCollection.

                ^newCollection

 

 

Paul Baumann

 

 

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:34
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Trading memory for speed, one could simply add an (identity) ‘set’ ivar to an OrderedSet to keep track of unique values, while using ‘values’ ivar to keep track of the order, which is essentially what my #inject:into: example was doing to remove duplicates from ordered collection.

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:28
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

Agreed. The biggest performance hit in the Collections is the hidden cost of #to:do: enumerations. It doesn't show in the profiler so tuning tends to focus on other costs instead of considering the design itself. Your example with OrderedSet>>includes: shows an extreme of how costly it can be. Methods like Dictionary>>findKeyOrNil: (on a miss) also suffer costs like these. There are collection designs that avoids those costs while also allowing index searches using primitive instead of #to:do:.

 

Paul Baumann

 

 

From: Boris Popov, DeepCove Labs [mailto:[hidden email]]
Sent: Tuesday, May 31, 2011 10:15
To: Paul Baumann; [hidden email]
Subject: RE: OrderedSet pitfall
Importance: High

 

Paul,

 

Totally. My comment was mostly to point out that there’s quite a bit of room for improvement in that base system class considering that it seems such an obvious candidate for this specific task based on the name alone.

 

Regards,

 

-Boris

 

From: Paul Baumann [mailto:[hidden email]]
Sent: 31 May 2011 10:11
To: Boris Popov, DeepCove Labs; [hidden email]
Subject: RE: OrderedSet pitfall

 

Boris,

 

That 24 minute measurement was funny, and easily avoided. The attached code makes a 95% performance improvement in that measurement. It could be made even faster by using different Dictionary implementation.

 

Paul Baumann

 

 

| values |

values := (1 to: 50000) collect: [:i | Time microsecondClock].

Time millisecondsToRun: [OrderedSet withAll: values].

 

"Faster_OrderedSet"

1348

"Original"

  29433

 

1 - (1348 / 29433) * -100.0

-95.4201

 

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Monday, May 30, 2011 15:16
To: [hidden email]
Subject: Re: [vwnc] OrderedSet pitfall

 

Heh,

 

values := (1 to: 500000) collect: [:i | Time microsecondClock].

values asSet size. 93303

Time millisecondsToRun: [OrderedSet withAll: values]. 1475110 "24 minutes"

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 157

 

-Boris

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: 30 May 2011 13:39
To: [hidden email]
Subject: [vwnc] OrderedSet pitfall

 

FYI, in case someone else is tempted to use stock OrderedSet as a way of removing duplicates from their sequenceable collections whilst preserving their order,

 

values := (1 to: 500000) collect: [:i | Time millisecondClockValue].

values asSet size. 259

Time millisecondsToRun: [OrderedSet withAll: values]. 4086

Time millisecondsToRun: [(values inject: OrderedCollection new -> Set new

                into:

                                [:assoc :ea |

                                (assoc value includes: ea)

                                                ifFalse:

                                                                [assoc value add: ea.

                                                                assoc key add: ea].

                                assoc]) key]. 149

 

Regards,

 

-Boris

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.

 


This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc