On Wed, 23 Dec 2009, Stéphane Ducasse wrote:
>> Of course, using repeated #add: is inefficient (#addAll: has a rule >>> for sorting once or at every #add:). >> >> That's true, but #addAll:'s rule is not generally useful. For example: >> >> | s | >> s := (1 to: 300000) asSortedCollection. >> [ s addAll: (1 to: 100000) ] timeToRun ===> 24698 >> >> | s | >> s := (1 to: 300000) asSortedCollection. >> [ s addAll: (1 to: 100001) ] timeToRun ===> 1124 (+1 element makes a big difference) > > why + 1 makes a so big difference? performance of the different methods of addition: addAll: aCollection aCollection size > (self size // 3) ifTrue: [aCollection do: [:each | self addLast: each]. self reSort] ifFalse: [aCollection do: [:each | self add: each]]. ^ aCollection A better check could solve this, but who wants to come up with a formula that's easy to calculate (aka fast) and gives accurate results? Levente > >> And in squeak: >> | s | >> s := (1 to: 300000) asOrderedCollection. >> [ s addAll: (1 to: 100000); sort ] timeToRun ===> 763 (mergesort beats quicksort) > > cool. > This is fun to see that during years we pushed so that squeak moves and we got bashed > for that and now squeak is moving because of us. At least we succeeded in something. > >> So the only use-case where SortedCollection is useful (IMO) is when you >> - have to keep the collection sorted and >> - you rarely some elements to it >> >> Getting rid of other uses is hard because #asSortedCollection always returns a copy, while most #as* selectors don't. >> >> >> Levente >> >> >>> >>>> If you pick the changes from squeak trunk, Array and OrderedCollection (and >>>> SortedCollection) will understand #sort and #sort: and will have the same >>>> semantics: sort in place (by a stable sorting algorithm). >>>> >>> >>> Don't open a new issue. There is >>> http://code.google.com/p/pharo/issues/detail?id=1214 >>> http://code.google.com/p/pharo/issues/detail?id=1346 >>> >>> Nicolas >>> >>>> >>>> Levente >>>> >>>>> >>>>> Stef >>>>> _______________________________________________ >>>>> Pharo-project mailing list >>>>> [hidden email] >>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >>>> >>>> _______________________________________________ >>>> Pharo-project mailing list >>>> [hidden email] >>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >>>> >>> >>> _______________________________________________ >>> Pharo-project mailing list >>> [hidden email] >>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >> _______________________________________________ >> Pharo-project mailing list >> [hidden email] >> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
>>>>
>>>> >> why + 1 makes a so big difference? > > Because #addAll: has a simple check that doesn't care about the performance of the different methods of addition: > > addAll: aCollection > aCollection size > (self size // 3) > ifTrue: > [aCollection do: [:each | self addLast: each]. > self reSort] > ifFalse: [aCollection do: [:each | self add: each]]. > ^ aCollection > > A better check could solve this, but who wants to come up with a formula that's easy to calculate (aka fast) and gives accurate results? Thanks. I like these kind of conversation because they deeply illustrate what I always like in Smalltalk: learning by reading the system. Stef _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Levente Uzonyi-2
Levente Uzonyi wrote:
> > > So the only use-case where SortedCollection is useful (IMO) is when you > - have to keep the collection sorted and > - you rarely some elements to it > > Getting rid of other uses is hard because #asSortedCollection always > returns a copy, while most #as* selectors don't. I was slightly confused by this, but now think I see what you're saying. Most #as* selectors return an instance of a different class, but answer self if the receiver is already of that class. But collections always answer a copy, even if it is already of that class. Wondering why this was designed this way... Perhaps it's the expected mutability of collections? When you send #asInteger, you don't care if you get the receiver back, because it's not mutable. But with a collection you almost always mutate it. In the case of #asString, Strings are mutable, but are not *commonly* mutated. Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Nicolas Cellier
Nicolas Cellier wrote:
>> >> #sorted, #sorted: -- VW already has this. VA is adding it. >> Squeak/Pharo should add it, IMO, but we'll add it to Grease if not. It >> returns a sorted copy of SequenceableCollections and a sorted Array of >> all other collections. >> > > Oh, that's exactly the one I would have chosen :) > We should go for it, both Squeak and Pharo. I fear that #sorted is too close to #sort, but with very different semantics, and could cause confusion. I'd think twice before following VW's lead on this one. Is this a common enough case to be warrant its own selector, instead of "myCollection copy sort" which is much clearer in intent? If so, perhaps #copySorted would be clearer? Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Stéphane Ducasse
Stéphane Ducasse wrote:
> > Not really :) > I was more looking at the user perspective. > You often want an orderedCollection (add:) vs Array at: > but which deals with duplicate as a Set. Could you clarify? What's an example? What is the index of something that you add but was already there? Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Nicolas Cellier
Nicolas Cellier wrote:
>> SortedCollection should be used as rarely as possible. In the past it was >> used for sorting which is a misuse of this data structure (I wonder why such >> data strucure exists at all, it's almost never useful). >> > > The only case should be to keep collection automatically sorted when > its contents further evolves... Yes, that has always been its only purpose. And it seems valuable, though I'd like to see it expanded to a collection with 0-n automatically-maintained sort orders, rather than exactly one sort order. Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Adrian Kuhn
Adrian Kuhn wrote:
> Levente Uzonyi <leves@...> writes: > >>> what is TreeSet? >> The common data structure to implement ordered sets. Which one do you use? >> >> IMO ordered sets (where ordering means the same as in OrderedCollection) >> should be implemented with a hashtable of linked elements. > > You're right. > > I confused ordered and sorted. > I'd think that a hashtable of elements that are linked into a tree structure would also be an attractive implementation of a sorted set. Insert and delete time would be O(log n) but lookup time would be O(1). Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Martin McClure-2
On Wed, 23 Dec 2009, Martin McClure wrote:
> Levente Uzonyi wrote: >> >> >> So the only use-case where SortedCollection is useful (IMO) is when you >> - have to keep the collection sorted and >> - you rarely some elements to it >> >> Getting rid of other uses is hard because #asSortedCollection always >> returns a copy, while most #as* selectors don't. > > I was slightly confused by this, but now think I see what you're saying. > > Most #as* selectors return an instance of a different class, but answer > self if the receiver is already of that class. But collections always > answer a copy, even if it is already of that class. All collections answer self, except instances of OrderedCollection SortedCollection and CharacterSet (at least in squeak). > > Wondering why this was designed this way... > I think these methods (#asOrderedCollection #asSortedCollection #asCharacterSet) were added later than the others and those who added them didn't think about this implementation detail. > Perhaps it's the expected mutability of collections? When you send > #asInteger, you don't care if you get the receiver back, because it's > not mutable. But with a collection you almost always mutate it. In the That's not true, LargeIntegers are mutable. I don't have numbers, but I think that most senders of #as* don't mutate the collection. Levente > case of #asString, Strings are mutable, but are not *commonly* mutated. > > Regards, > > -Martin > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Martin McClure-2
On Wed, 23 Dec 2009, Martin McClure wrote:
> Adrian Kuhn wrote: >> Levente Uzonyi <leves@...> writes: >> >>>> what is TreeSet? >>> The common data structure to implement ordered sets. Which one do you use? >>> >>> IMO ordered sets (where ordering means the same as in OrderedCollection) >>> should be implemented with a hashtable of linked elements. >> >> You're right. >> >> I confused ordered and sorted. >> > > I'd think that a hashtable of elements that are linked into a tree > structure would also be an attractive implementation of a sorted set. > Insert and delete time would be O(log n) but lookup time would be O(1). I don't see the point here. Without a tree structure (just a list) it's O(1) for both. Levente > > Regards, > > -Martin > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Stéphane Ducasse
Stéphane Ducasse wrote:
>> #sortBlock: is defined by ANSI on SortedCollection as an accessor and >> an instance creation method. That's what it sounds like and shouldn't >> be implemented on non-sorted collections. > > Why? what would be the reason not to use sortBlock: for others? The intention is different. #sortBlock: tells a collection to maintain itself as sorted even after additions and deletions. This message is not appropriate for collections that do not have that ability. #sortBlock: is not the best name for this functionality, but I think it's bad to use #sortBlock: with a different meaning. > > >> #sorted, #sorted: -- VW already has this. VA is adding it. > > What is sorted? a boolean predicate method? > What a bad name. Did they forgot Kent? An example of my previous message -- this name is confusing and should probably not be used. >> Squeak/Pharo should add it, IMO, but we'll add it to Grease if not. It >> returns a sorted copy of SequenceableCollections and a sorted Array of >> all other collections. > > Please not this is bogus. It does not work with certain subclasses of Sequenceable > And much more important it has a bad name. > Because with a nice name > sortCopy then it would be much better. > So with one method in VW you can get it > > sortCopy > ^ self sorted +1 Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Levente Uzonyi-2
2009/12/23 Levente Uzonyi <[hidden email]>:
> On Wed, 23 Dec 2009, Martin McClure wrote: > >> Levente Uzonyi wrote: >>> >>> >>> So the only use-case where SortedCollection is useful (IMO) is when you >>> - have to keep the collection sorted and >>> - you rarely some elements to it >>> >>> Getting rid of other uses is hard because #asSortedCollection always >>> returns a copy, while most #as* selectors don't. >> >> I was slightly confused by this, but now think I see what you're saying. >> >> Most #as* selectors return an instance of a different class, but answer >> self if the receiver is already of that class. But collections always >> answer a copy, even if it is already of that class. > > All collections answer self, except instances of OrderedCollection > SortedCollection and CharacterSet (at least in squeak). > Not to be confused with (self as: Array) that always answer a copy in Squeak/Pharo. While looking at it, FixedIdentitySet asArray ^ self, is this really expected ? Nicolas >> >> Wondering why this was designed this way... >> > > I think these methods (#asOrderedCollection #asSortedCollection > #asCharacterSet) were added later than the others and those who added them > didn't think about this implementation detail. > >> Perhaps it's the expected mutability of collections? When you send >> #asInteger, you don't care if you get the receiver back, because it's >> not mutable. But with a collection you almost always mutate it. In the > > That's not true, LargeIntegers are mutable. I don't have numbers, but I > think that most senders of #as* don't mutate the collection. > > > Levente > >> case of #asString, Strings are mutable, but are not *commonly* mutated. >> >> Regards, >> >> -Martin >> >> _______________________________________________ >> Pharo-project mailing list >> [hidden email] >> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >> > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Stéphane Ducasse
Stéphane Ducasse <stephane.ducasse@...> writes:
> I was more looking at the user perspective. > You often want an orderedCollection (add:) vs Array at: > but which deals with duplicate as a Set. A simple method should do that job Collection >> #addIfAbsent: element (self includes: element) ifFalse: [ self add: element ] usage coll := OrderedCollection new. coll addIfAbsent: #foo. coll addIfAbsent: #bar. coll addIfAbsent: #qux. coll addIfAbsent: #bar. self assert: coll asArray = #(foo bar qux). --AA _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Stéphane Ducasse
Stéphane Ducasse wrote:
>> >> Not all collections are sortable and #sort: is sorting in-place, so it's hardly doable. > > But it would be nice to have sort: and sortBlock: polymorph. > Because right now we cannot have OrderedCollection and Array or SortedCollection interchanged. OrderedCollection, Array, and SortedCollection by design all have different semantics, not just different internal implementations. In such a case you cannot expect to be able to interchange them in all circumstances. In many cases you can, but when for instance you use #sortBlock: you'd better not have an Array or OrderedCollection, and when you use #add: you'd better not have an Array, because those classes do not support the *meanings* of those selectors. Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Martin McClure-2
2009/12/23 Martin McClure <[hidden email]>:
> Nicolas Cellier wrote: >>> >>> #sorted, #sorted: -- VW already has this. VA is adding it. >>> Squeak/Pharo should add it, IMO, but we'll add it to Grease if not. It >>> returns a sorted copy of SequenceableCollections and a sorted Array of >>> all other collections. >>> >> >> Oh, that's exactly the one I would have chosen :) >> We should go for it, both Squeak and Pharo. > > I fear that #sorted is too close to #sort, but with very different > semantics, and could cause confusion. I'd think twice before following > VW's lead on this one. Is this a common enough case to be warrant its > own selector, instead of "myCollection copy sort" which is much clearer > in intent? If so, perhaps #copySorted would be clearer? > > Regards, > > -Martin > I'd like a selector that I can apply to any collection like a Set too. For example, in some implementations keys are a Set (original st80) in others keys are an Array (Squeak trunk/Pharo). So, in order to have more portable pieces of code I ended up with (self keys asArray sort). Maybe the most simple thing would be to define Collection>>sort ^self asArray sort > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
2009/12/23 Nicolas Cellier <[hidden email]>:
> 2009/12/23 Martin McClure <[hidden email]>: >> Nicolas Cellier wrote: >>>> >>>> #sorted, #sorted: -- VW already has this. VA is adding it. >>>> Squeak/Pharo should add it, IMO, but we'll add it to Grease if not. It >>>> returns a sorted copy of SequenceableCollections and a sorted Array of >>>> all other collections. >>>> >>> >>> Oh, that's exactly the one I would have chosen :) >>> We should go for it, both Squeak and Pharo. >> >> I fear that #sorted is too close to #sort, but with very different >> semantics, and could cause confusion. I'd think twice before following >> VW's lead on this one. Is this a common enough case to be warrant its >> own selector, instead of "myCollection copy sort" which is much clearer >> in intent? If so, perhaps #copySorted would be clearer? >> >> Regards, >> >> -Martin >> > > I'd like a selector that I can apply to any collection like a Set too. > For example, in some implementations keys are a Set (original st80) in > others keys are an Array (Squeak trunk/Pharo). > So, in order to have more portable pieces of code I ended up with > (self keys asArray sort). > Maybe the most simple thing would be to define > Collection>>sort > ^self asArray sort Arg, I pressed tab, then (oops) space and sent the message too soon... I wanted to add this comment: "Sort the collection in place if possible, otherwise answer a sorted Array. Subclasses that can sort in place should override this message." > >> _______________________________________________ >> Pharo-project mailing list >> [hidden email] >> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >> > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Adrian Kuhn
2009/12/23 Adrian Kuhn <[hidden email]>:
> Stéphane Ducasse <stephane.ducasse@...> writes: > >> I was more looking at the user perspective. >> You often want an orderedCollection (add:) vs Array at: >> but which deals with duplicate as a Set. > > A simple method should do that job > > Collection >> #addIfAbsent: element > (self includes: element) ifFalse: [ self add: element ] > > usage > > coll := OrderedCollection new. > coll addIfAbsent: #foo. > coll addIfAbsent: #bar. > coll addIfAbsent: #qux. > coll addIfAbsent: #bar. > self assert: coll asArray = #(foo bar qux). > > --AA > Yes in O(n)... The challenge was to provide a fast implementation. > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Nicolas Cellier
Nicolas Cellier wrote:
>>> >> I'd like a selector that I can apply to any collection like a Set too. >> For example, in some implementations keys are a Set (original st80) in >> others keys are an Array (Squeak trunk/Pharo). >> So, in order to have more portable pieces of code I ended up with >> (self keys asArray sort). >> Maybe the most simple thing would be to define >> Collection>>sort >> ^self asArray sort > > Arg, I pressed tab, then (oops) space and sent the message too soon... > I wanted to add this comment: > "Sort the collection in place if possible, otherwise answer a sorted Array. > Subclasses that can sort in place should override this message." > Interesting. Possibly good. But what would you do when you knew you wanted a *copy* that was sorted? "myCollection copy sort" would work, but might or might not create two copies, depending on whether the original class was one that could be sorted in place. I'm now leaning toward defining #copySorted or some such on Collection for this purpose. And *maybe* also doing what you propose with #sort. Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Nicolas Cellier
Nicolas Cellier wrote:
> > While looking at it, FixedIdentitySet asArray ^ self, is this really expected ? FixedIdentitySet is a very odd beast. I'm afraid I don't see the value of it at all, but I haven't looked at the code that uses it. Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
In reply to this post by Levente Uzonyi-2
Levente Uzonyi wrote:
>>> >> I'd think that a hashtable of elements that are linked into a tree >> structure would also be an attractive implementation of a sorted set. >> Insert and delete time would be O(log n) but lookup time would be O(1). > > I don't see the point here. Without a tree structure (just a list) it's > O(1) for both. How do you get O(1) for sorted inserts into a list? That's usually considered an O(n) operation. Or am I misunderstanding what you mean by list? Regards, -Martin _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
On Wed, 23 Dec 2009, Martin McClure wrote:
> Levente Uzonyi wrote: >>>> >>> I'd think that a hashtable of elements that are linked into a tree >>> structure would also be an attractive implementation of a sorted set. >>> Insert and delete time would be O(log n) but lookup time would be O(1). >> >> I don't see the point here. Without a tree structure (just a list) it's >> O(1) for both. > > How do you get O(1) for sorted inserts into a list? That's usually > considered an O(n) operation. Or am I misunderstanding what you mean by > list? Well, just realized that you are thinking about a sorted set. The original idea was to create an ordered set. Levente > > Regards, > > -Martin > > _______________________________________________ > Pharo-project mailing list > [hidden email] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [hidden email] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project |
Free forum by Nabble | Edit this page |