On 25.10.2016, at 23:54, Chris Muller <[hidden email]> wrote: > Hi Tobias and Eliot, > >>>> Folks there is NO BASIS at the level of Collection for assuming that >>>> do: [ : each | ^ >>>> false ] is faster than ^self size=0. In fact, the proper assumption >>>> is the opposite -- that #size is the optimized method, not #do:. >>> >>> I don't understand on what _either_ performance-assumption is grounded. > > Exactly. The original basis for making this change is groundless. No Chris, I said I don't understand _both_ assumptions. > >> self size = 0 doesn't work for infinite collections. > > What's an infinite collection, For example, a generator… or a right-open interval? [...] >> ts or more the "clever" implementation is faster. > > Then add Bag>>#isEmpty, don't change Collection>>#isEmpty, because it > changes a 30-year old assumption about how #isEmpty works for every > other kind of Collection. Kent Beck gives this example for 'Simple Delegation': Object subclass: #Vector instanceVariableNames: ‘elements’ Vector>>do: aBlock elements do: aBlock It seems the #do:-based implementation is 'more correct' here. I found no other recommendation in either Beck's or Thomas' style guides. To _me_ this suggests that what you perceive as 30-year-old-assumption is not as universal as '#do: should suffice'. Best regards -Tobias |
In reply to this post by monty-3
On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote:
> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override. > > Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too. I like Eliot's new implementation, in part because it teaches the reader something about how iteration works. It is still simple enough for an inexperienced person to read, but it also challenges the reader to think about the control flow. That said, I suspect that Monty and Chris are among the people with the most real-world experience in dealing with performance on very large collections. So, I would like to ask from a strictly practical point of view, does the change that we are discussing here cause real-world problems for Gemstone and Magma applications? Does the change make the maintenance of large collection classes more difficult for cross-dialect portability? Are there any collection classes (especially in Gemstone and Magma) that become slower in a real-world measurable sense as a result of this change? Dave |
+1.
Sent from my iPhone > On Oct 25, 2016, at 16:42, David T. Lewis <[hidden email]> wrote: > >> On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote: >> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override. >> >> Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too. > > I like Eliot's new implementation, in part because it teaches the reader something > about how iteration works. It is still simple enough for an inexperienced person > to read, but it also challenges the reader to think about the control flow. > > That said, I suspect that Monty and Chris are among the people with the most > real-world experience in dealing with performance on very large collections. > > So, I would like to ask from a strictly practical point of view, does the change > that we are discussing here cause real-world problems for Gemstone and Magma > applications? > > Does the change make the maintenance of large collection classes more difficult > for cross-dialect portability? > > Are there any collection classes (especially in Gemstone and Magma) that become > slower in a real-world measurable sense as a result of this change? > > Dave > > |
In reply to this post by Nicolas Cellier
Hi Nicolas, I understand the angle you are seeing this from, but these
we distill these points down, we end up with no real-world value; but some unintended negatives, in fact.. > did you see the implementation of Collection>>size? > > size > "Answer how many elements the receiver contains." > > | tally | > tally := 0. > self do: [:each | tally := tally + 1]. > ^ tally > > Let's forget a minute about specific subclasses. > Does iterating over the whole collection is really the best idea for > implementing isEmpty? Asking this question is already assuming too much. Collection>>#isEmpty should not be optimized in terms of another method two sends away. Collection is the ivory tower which only codifies the elegant definitions of the general behaviors. We should let the subclasses be concerned with the nitty-gritty implementation details needed to go crazy with their (less elegant) optimizations. **We shouldn't try to make performance optimizations in Collection. > When we implement a message in Collection, we should think in term of > Collection, not Interval, Set nor Array. The old implementation already was implemented in terms of Collection. > We're talking of several things here: > - universality of the default implementation: The old implementation was universal. #size always has and always will be part of Collection. > we don't need to know about the size to tell if empty Not any more than we need to avoid it. Not any more than we need to initiate a #do: enumeration (which is very expensive for some). > not all collection have a known size So what will happen if they're sent #size then? If #shouldNotImplement then they will override #isEmpty. > - optimization: > old implementation is fast for collections having an optimized size, > catastrophic for others. Those are more rare, they should override #isEmpty for their specific needs. > new implementation is reasonably fast, and full optimization can be > restored with a few 1-liner. "Reasonably fast" for which subclass? We can't ignore the subclasses. > - cleverness: > IMO this is not more clever than, and quite logically connected to > implementation of size > > I agree that having several implementations for the sake of optimization is > not ideal. It told me something was amiss with this change. Too many subclasses disagree. > That's what would have curbed my enthusiasm. > Though I doubt about any noticeable slowdown in anything else than > micro-benchmarks. The SQLCollection example would be a large degradation. ----- It all distills down to slightly-negative. We should revert it. |
In reply to this post by David T. Lewis
I agreed that it looked good in the ivory tower. The thing is, there
are these real-world detractors which we should not igore. To save my old fingers, let me just summarize them from the earlier posts. In summary: - it provides no known material benefit - it could affect unsuspecting legacy application performance materially, and in an insidious way - it's an anomalous usage of the API which works against performance expectations about #size and #do:. - the justification for this change (performance optimization) in Collection is not valid in the first place. No real-world positives, just negatives or zeros. Please revert. On Tue, Oct 25, 2016 at 6:42 PM, David T. Lewis <[hidden email]> wrote: > On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote: >> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override. >> >> Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too. > > I like Eliot's new implementation, in part because it teaches the reader something > about how iteration works. It is still simple enough for an inexperienced person > to read, but it also challenges the reader to think about the control flow. > > That said, I suspect that Monty and Chris are among the people with the most > real-world experience in dealing with performance on very large collections. > > So, I would like to ask from a strictly practical point of view, does the change > that we are discussing here cause real-world problems for Gemstone and Magma > applications? > > Does the change make the maintenance of large collection classes more difficult > for cross-dialect portability? > > Are there any collection classes (especially in Gemstone and Magma) that become > slower in a real-world measurable sense as a result of this change? > > Dave > > |
On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller <[hidden email]> wrote: Quite the opposite. It's a major improvement in basic API.I agreed that it looked good in the ivory tower. The thing is, there Having an O(1) runtime vs O(n) is a very large material benefit. Collection>>size is O(n) so it's not a good base for #isEmpty. - it could affect unsuspecting legacy application performance If you have performance problems, profile. - it's an anomalous usage of the API which works against performance The performance expectation of both #size and #do: in Collection is O(n). And there is nothing "anomalous" in using #do:, quite to the contrary, #do: is the *essence* of a collection of objects. - the justification for this change (performance optimization) in It is very valid optimization given the implementation of Collection>>size. No real-world positives, just negatives or zeros. Please revert. Yes, you will have to provide an optimized #isEmpty now, because you relied on an implementation detail of a superclass. But there never was a contract that #isEmpty needs to be implemented in terms of #size. The new implementation makes much more sense for an arbitrary abstract Collection, so I'd like to see it stay. - Bert - |
On 26 October 2016 at 09:35, Bert Freudenberg <[hidden email]> wrote:
+1. Especially since we've seen examples of Collections where #size just doesn't make sense: generators, Actor-style mailboxes, Xtreams (and, by extension, all reactive APIs). Which is exactly why in C# an IEnumerable (a Collection) doesn't even have a Count method. (It's added as an extension, and People Know(tm) to be careful using it on an abstract collection.) Really, to be properly Ivory Tower, and Elegant, Collection's #size should be removed. And as Eliot correctly points out, the only elegant way to ask a Collection about which you know nothing, whether it has any elements, is to _evaluate its first value_. I can't give real world Smalltalk experience reports, but I can say, with many burn marks to prove it, that asking an abstract collection of things for its size, is a terrible idea. frank
|
In reply to this post by Bert Freudenberg
I don't wish to belabor, but you completely ignored all of my (and
Monty's) arguments. Yes, there *is* a sort-of "contract" that #isEmpty should be implemented in terms of #size, and not #do:, that is now being violated. I explained why earlier and you even said it was a point worth discussing, but never did. The comment that was added to Collection>>#isEmpty advertises the flawed thinking. The flaws with all of your other statements were addressed in prior posts, too. I don't know why any of the advocates for this change couldn't address or even acknowledge those arguments. Even the comment that was added to Collection>>#isEmpty advertises the flawed thinking. I hope someone will go back and really _read_ the arguments being made in this thread. Just saw Franks note. Same thing.. :( On Wed, Oct 26, 2016 at 11:35 AM, Bert Freudenberg <[hidden email]> wrote: > On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller <[hidden email]> wrote: >> >> I agreed that it looked good in the ivory tower. The thing is, there >> are these real-world detractors which we should not igore. To save my >> old fingers, let me just summarize them from the earlier posts. In >> summary: >> >> - it provides no known material benefit > > > Having an O(1) runtime vs O(n) is a very large material benefit. > Collection>>size is O(n) so it's not a good base for #isEmpty. > >> >> - it could affect unsuspecting legacy application performance >> materially, and in an insidious way > > > If you have performance problems, profile. > >> >> - it's an anomalous usage of the API which works against performance >> expectations about #size and #do:. > > > The performance expectation of both #size and #do: in Collection is O(n). > And there is nothing "anomalous" in using #do:, quite to the contrary, #do: > is the *essence* of a collection of objects. > >> >> - the justification for this change (performance optimization) in >> Collection is not valid in the first place. > > > It is very valid optimization given the implementation of Collection>>size. > >> >> No real-world positives, just negatives or zeros. Please revert. > > > Quite the opposite. It's a major improvement in basic API. > > Yes, you will have to provide an optimized #isEmpty now, because you relied > on an implementation detail of a superclass. But there never was a contract > that #isEmpty needs to be implemented in terms of #size. The new > implementation makes much more sense for an arbitrary abstract Collection, > so I'd like to see it stay. > > - Bert - > > > > |
On Wed, Oct 26, 2016 at 11:41 AM, Chris Muller <[hidden email]> wrote: I don't wish to belabor, but you completely ignored all of my (and I don't see any such contract. Collection has a contract; to be fully functional as a collection class the subclass must implement #do:. This is stated; #do: is a subclass responsibility of Collection. I see no stated contract that #isEmpty is implemented in terms of #size. Given that in Collection #size is implemented in terms of #do:, we are free to implement #isEmpty et al in terms either of #do: or of #size. The new implementation is better for large collections, works for infinite collections, and is hence to be preferred.
I don't see what's special here. We've given arguments for the change; both Bert and Frank (effectively, in my view) refuted your arguments against the change. This is no different than other performance-related changes that have been made in the past. You have yet to present an example of code that is broken ny this change. Just saw Franks note. Same thing.. :( Looked pretty cogent and on point to me.
_,,,^..^,,,_ best, Eliot |
Hi Eliot, I put a lot of effort to convey these arguments for the
betterment of Squeak, I hope you'll oblige this last effort with an open and critical mind.. > I don't see any such contract. Collection has a contract; to be fully > functional as a collection class the subclass must implement #do:. This is > stated; #do: is a subclass responsibility of Collection. I see no stated > contract that #isEmpty is implemented in terms of #size. I said a "sort-of 'contract'", because the problem with this change is more about going against the universal reality of the Smalltalk environment and ecosystem, not against an explicit API contract. I made an explanation of this issue in my earlier post which began with "Since Smalltalk-80, sends to #do: ..." > Given that in > Collection #size is implemented in terms of #do:, we are free to implement > #isEmpty et al in terms either of #do: or of #size. The new implementation > is better for large collections, works for infinite collections, and is > hence to be preferred. If you understood that earlier post, then you understand why I believe we are NOT free to change Collection>>#isEmpty to operate in terms #do:, because we will have surreptitiously changed legacy applications which incur a much heavier cost with the #do: implementation. I provided a very plausible example of this (the SQLCollection) which should not be ignored. Now, I guess this change *already* did even slow down #isEmpty for some existing classes in the image(!!), to which Nicolas reponded to by copying-and-pasting the old implementation code into multiple classes in order to recover their original performance. Wow. In exchange for these blemishes, how many classes actually got benefit from inheriting the new implementation? Collection allSubclasses count: [ : each | (each lookupSelector: #size) methodClass = Collection] "0" So we're obviously not talking about benefit to any *in-image* kind of Collection, maybe just potential benefit to some... external application's Collection? But while hurting others..? Have we reached a point of enough diminishing returns yet? The reality is that #size is not only part of Smalltalk's original concept of a Collection since 30 years ago, but is universally expected to be fast, and Smalltalkers write there code under that assumption. That's why the responsibility should be on those unknown *exceptional* subclasses, the InfiniteCollection's or whatever, to deny their #size and provide their alternate #isEmpty's. Tallying it all up, we've got (-1) potential legacy impacts, (-1) all-new copy-and-pasted code in our core library, just so (+0) some exceptional-case external class MIGHT inherit this faster-for-them implementaiton rather than override it. I totally understand and agree with the ivory-tower arguments made in favor of this change, its just that in the practical world, it turns out to be a net negative. Best, Chris |
On Wed, Oct 26, 2016 at 05:51:40PM -0500, Chris Muller wrote:
> > In exchange for these blemishes, how many classes actually got benefit > from inheriting the new implementation? > > Collection allSubclasses count: [ : each | (each > lookupSelector: #size) methodClass = Collection] "0" > Fair point. Every subclass of Collection reimplements #size in some way. Collection>>size is never actually used by any class in the image, and possibly not in any classes outside the image. So Collection>>size serves as documentation of the intent, and as a default implementation if someone were to implement a new subclass of Collection directly. Some subclasses (e.g. Bag) would have similar characteristics and would presumably benefit from the new #isEmpty implementation, but most seem to have #size implementations that do not degrade in performance as size increases. So Chris is raising a fair question. The new #isEmpty looks like it should be faster, but is it actually so in practice? Dave p.s. I personally still prefer the new implementation on aesthetic grounds. |
In reply to this post by Chris Muller-3
On Thu, Oct 27, 2016 at 12:51 AM, Chris Muller <[hidden email]> wrote: In my book that's a speedup worth having.
| b | b := (1 to: 10000) asBag. [b isEmpty] bench before: : '5,540 per second. 181 microseconds per run.' now: '167,000 per second. 6 microseconds per run.' - Bert - |
In reply to this post by Bert Freudenberg
> there never was
> a contract that #isEmpty needs to be implemented in terms of #size. The > new implementation makes much more sense for an arbitrary abstract > Collection, so I'd like to see it stay. > > - Bert - +1 Stef |
In reply to this post by Bert Freudenberg
> | b |
> b := (1 to: 10000) asBag. > [b isEmpty] bench > > before: : '5,540 per second. 181 microseconds per run.' > now: '167,000 per second. 6 microseconds per run.' > > In my book that's a speedup worth having. You're depending on the superclass implementation in Collection for that speedup. What did you say about that yesterday? Honestly, Bert, it almost feels like you're being intellectually dishonest. You chose the only one that got sped up and ignored all the ones that got slowed down. AND, you also ignored all the other points (again) which render the above point irrelevant. Sad. I give up. You "win". |
Yes, in this case a series of tests across collection classes actually
would be a proof. A single point measurement is not a statistically relevant result. --Hannes On 10/27/16, Chris Muller <[hidden email]> wrote: >> | b | >> b := (1 to: 10000) asBag. >> [b isEmpty] bench >> >> before: : '5,540 per second. 181 microseconds per run.' >> now: '167,000 per second. 6 microseconds per run.' >> >> In my book that's a speedup worth having. > > You're depending on the superclass implementation in Collection for > that speedup. What did you say about that yesterday? > > Honestly, Bert, it almost feels like you're being intellectually > dishonest. You chose the only one that got sped up and ignored all > the ones that got slowed down. AND, you also ignored all the other > points (again) which render the above point irrelevant. Sad. > > I give up. You "win". > > |
Or to put it diffently: Bert supplied the first contribution in this
thread which is about performance actually measuring performance...... More is welcome. On 10/27/16, H. Hirzel <[hidden email]> wrote: > Yes, in this case a series of tests across collection classes actually > would be a proof. A single point measurement is not a statistically > relevant result. > > --Hannes > > On 10/27/16, Chris Muller <[hidden email]> wrote: >>> | b | >>> b := (1 to: 10000) asBag. >>> [b isEmpty] bench >>> >>> before: : '5,540 per second. 181 microseconds per run.' >>> now: '167,000 per second. 6 microseconds per run.' >>> >>> In my book that's a speedup worth having. >> >> You're depending on the superclass implementation in Collection for >> that speedup. What did you say about that yesterday? >> >> Honestly, Bert, it almost feels like you're being intellectually >> dishonest. You chose the only one that got sped up and ignored all >> the ones that got slowed down. AND, you also ignored all the other >> points (again) which render the above point irrelevant. Sad. >> >> I give up. You "win". >> >> > |
In reply to this post by Bert Freudenberg
Or just make Collection>>#size a subclassResponsibiliity!
This will force implementors to give proper #size implementations that are usually O(1), which almost all do already. Bag could get the #do:-based #isEmpty as an optimization, or we could add a dedicated 'tally' inst var like other collections have and its #size could return that, making the default #size-based #isEmpty OK for it. This is a compromise that addresses the concerns about the default Collection>>#isEmpty degrading linearly without a constant #size and the concerns about harming the performance of existing libraries and applications with custom collections that assume Collection>>#isEmpty is #size-based. Please consider it. > From: "Bert Freudenberg" <[hidden email]> > To: "Chris Muller" <[hidden email]>, "The general-purpose Squeak developers list" <[hidden email]> > Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work > > On Thu, Oct 27, 2016 at 12:51 AM, Chris Muller <[hidden email][mailto:[hidden email]]> wrote: > In exchange for these blemishes, how many classes actually got benefit > from inheriting the new implementation? > > | b | > b := (1 to: 10000) asBag. > [b isEmpty] bench > > before: : '5,540 per second. 181 microseconds per run.' > now: '167,000 per second. 6 microseconds per run.' > In my book that's a speedup worth having. > |
Free forum by Nabble | Edit this page |