Hi,
out of curiosity I did an initial port of Clojure's Reducers library. I've just pushed it to the public repository. It is by no means complete or optimized. In fact, I ask you for your impressions and ideas where to go from here.

So, what's in here?
Reducers builds upon the simple abstraction that a collection is a reducible object, i.e., an object that understands #reduce:init:. It provides fundamental collection operations similar to classical methods, e.g., like #collect:, but offers specific advantages:
- Reducers are composable and reusable.
- Memory efficiency, since no intermediate collections are constructed.
- Enable collection operations on custom types simply by implementing #reduce:init:.
- Possibility of parallel operations on collections, e.g., with Polycephaly 2.

For example, the sum of odd numbers could be computed like this:

| col red |
col := (1 to: 10**6).
(Reducer filter: #odd from: col) reduce: #+ init: 0.

"composition of reducers works like this:"
red := (Reducer filter: #odd) <> (Reducer map: #squared).
(red from: col) reduce: #+ init: 0.

Some questions:
1. What do you think of the current API?
2. How could one improve the execution time? (Optimizations to the construction of the (nested) blocks?)

Have fun!
Steffen
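PS: In case the contract is unclear from the above: for an ordinary collection, #reduce:init: is essentially a left fold. A minimal sketch (illustrative only; the methods actually shipped in the package may differ):

SequenceableCollection>>reduce: aFunction init: initialValue
	"Fold the elements from left to right, starting from initialValue.
	 aFunction is anything that answers #value:value:, e.g. a two-argument block;
	 the examples above also pass binary selectors like #+."
	| result |
	result := initialValue.
	self do: [:each | result := aFunction value: result value: each].
	^result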
I almost forgot to link these articles that introduce Reducers:
http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html
http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html

Regards,
Steffen
just out of curiosity: what's so bad about select and inject?
Why not just say

((1 to: 10**6) select: #odd) inject: 0 into: [:sum :each | sum + each].

Here it's like:

((col select: #odd) collect: #squared) inject: 0 into: [:sum :each | sum + each].

Maybe I'm missing something here, but the Collection class already implements most of the Reducer stuff.

Kind Regards
Karsten

Karsten Kusche - Dipl. Inf. (FH) - Georg Heeg eK - Köthen
Karsten has a point - and if you need to lazily apply filters over a potentially very large virtual collection, there is also Xtreams:

streamingReducer := (((1 to: 10**6) reading selecting: #odd)
	collecting: #squared)
	injecting: 0 into: [:sum :each | sum + each].
result := streamingReducer rest.

Cheers,
Michael
In reply to this post by Karsten Kusche
Thanks for your response.
> just out of curiosity: what's so bad about select and inject?

They're fine, just different. But obviously I didn't make the points clear. A second attempt with five advantages.

1) No intermediate collections

Consider the example of chained collection operations:

> ((col select: #odd)
>     collect: #squared)
>     inject: 0 into: [:sum :each | sum + each].

The calls of #select: and #collect: create two intermediate collections. This wastes memory, since they are discarded immediately. In contrast:

> (Reducer filter: #odd from:
>     (Reducer map: #squared from: col))
>     reduce: #+ init: 0

The reducers wrap collections (reducibles). They do not create a new collection but transform the reducing block (symbol) when #reduce:init: is called. Eventually, the source collection is asked to reduce itself with the transformed block in just one step. Thus, no intermediate collections are generated.

2) Laziness enables handling of infinite collections

Consider some infinite collection, e.g., the natural numbers. Assume an implementation similar to:

> Naturals>>reduce: function init: x
>     | result n |
>     result := x.
>     n := 0.
>     [result := [function value: result value: n]] repeat

With reducers you can operate on the first five numbers:

> (Reducer take: 5 from: Naturals new)
>     reduce: #+ init: 0.

operate on all but the first 5 numbers:

> (Reducer drop: 5 from: Naturals new) ...

or multiply the first five squared prime numbers:

> (Reducer take: 5 from:
>     (Reducer map: #squared from:
>         (Reducer filter: #isPrime from: Naturals new)))
>     reduce: #* init: 1.

3) Curried and composable

Reducers can be created and composed without any collection to allow reuse. Consider:

> squared := Reducer map: #squared.
> odds := Reducer filter: #odd.
> ((squared <> odds) from: collection) reduce: #+ init: 0.

4) Parallelizable (future work)

Clojure's reducers parallelize many of the common collection operations. This is possible since the reducing functions are transformed and then applied in one step to the leaves / collection items. For associative reducing functions, the work can then be parallelized with a fork/combine approach. Using Polycephaly, this could be implemented as well. Of course, the performance gain would depend heavily on the data and the reduction function.

5) Simple to enable collection operations on custom types

Randy Coulman pointed out that sometimes one needs to provide collection operations on custom data types. This usually requires implementing all the methods:
http://randycoulman.com//blog/2013/04/30/affordances-specialized-collections/

Reducers allow you to support all reductions just by implementing #reduce:init: in the custom type.

I recommend having a look at Rich Hickey's original posts on Clojure's reducers, too. I think that's pretty cool stuff.

http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html
http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html

What do you think?

Regards,
Steffen
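PS: to make point 5 a bit more tangible, suppose a hypothetical BinaryTreeNode class with instance variables left, element and right (names invented for illustration). A sketch:

BinaryTreeNode>>reduce: aFunction init: initialValue
	"In-order left fold over the elements stored in this (sub)tree."
	| result |
	result := initialValue.
	left isNil ifFalse: [result := left reduce: aFunction init: result].
	result := aFunction value: result value: element.
	right isNil ifFalse: [result := right reduce: aFunction init: result].
	^result

This single method is enough to make the whole reducer protocol available on trees, e.g.

(Reducer filter: #odd from: aTree) reduce: #+ init: 0.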
I put some typos in the naturals reduction. Better is:
Naturals>>reduce: function init: x
	| result n |
	result := x.
	n := -1.
	[n := n + 1.
	 result := function value: result value: n] repeat
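With this version the naturals start at 0, so for instance

(Reducer take: 5 from: Naturals new) reduce: #+ init: 0.

should answer 0 + 1 + 2 + 3 + 4 = 10, with #take: cutting the otherwise endless reduction short.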
In reply to this post by Steffen Märcker
I've done a lot of work with optimizing enumerations. I've done every kind of optimization that you hope to grow Reducers into. I find "Reducers" to be difficult to read, so I'm likely missing some of what makes Reducers great. I'm going to try to explain Reducers in my own words and let you reply with how it really works. My goal is to show you a few opportunities that you might not have thought of yet.
I see the "Reducer" examples you've given as somewhat similar to an Enumeration pattern as you might read from the GOF "Design Patterns" book. The Reducer is a model that controls how an enumeration is to proceed from where it currently is. It is a more sophisticated form of a Stream enumerating a collection of objects. A Stream has a position and rules for repositioning within a collection. A Stream also has a different protocol than a collection does, and is an interactive enumerator that does not need to create a results collection. The Reducer appears to be more like a Promise of results that are queried from a collection into a separate results collection. The Reducer has rules to determine what objects will be in the results collection. In a sense, a Reducer can be thought of as a Stream that is extended with the protocol of Collection and has special rules for determining the #next object. The point of the Reducer seems to be that it is to allow for results to be gathered independent of code that will be enumerating those results.

You've suggested that a Reducer can be used to break up work into parallel activities. I've used optimizations like that many times--but it only works where you can truly get parallel activities or where it is useful to have a separate query with a promise of results. You don't get much benefit when VW "green" processes are forked in a single-threaded VM. GemStone/S is one Smalltalk dialect where it is possible to break up work into separate OS processes that truly can work in parallel. Uses in VW tend to only be when dealing with delays to an external interface (like a socket, a database, or the UI). A trick that I like to use for VW is to get the GS/S session gem gathering results while VW is busy opening the window that will display the results. That trick allows the UI to always remain responsive. It requires some tricky use of semaphores and UIEvent processing, but it works well. It would be possible to model a Reducer to do that, but I think it would be a special kind of Reducer with those "Promise" tricks.

An enumeration framework (CollectionViews) that I maintain (that extends GS/S and VW code) breaks any kind of collection into smaller logical or physical groups by way of #by: and #by:do: methods on Collection. It is a quick way of chunking a collection of objects so that each chunk can be processed (or replicated, in my case) separately. That framework also has "SequenceUnion" objects that can also present those chunks as a single logical collection to a remote object space. I mention that as a practical example of what you may have in mind for Reducers. Consider using reducers as a way to move large amounts of data in chunks.

Enumerating a large collection using a complex block is a performance issue in both GS/S and VW. A complex block is a block closure that needs to know its execution context (I forget what Cincom calls them). Context-related objects are created and disposed of for each block evaluation. Code runs much slower than if you enumerate with a clean/simple block. Tuning to remove large complex-block enumerations is one of the best ways to improve GS/S performance. Your Reducers don't have those complex block costs because they have symbols to perform. The problem is that #perform: is just as slow as if you were using a complex block, because each execution creates and disposes an equal number of objects. My point is, if you want Reducers to perform well then make sure they can also perform simple blocks (with no external references).
Use of #perform: is convenient but it tends to be slow. One trick you can use in Reducers is to just-in-time compile a simple block from a selector that you'd normally perform. You execute the simple block repeatedly instead of having multiple #perform: executions. I have those features incorporated into experimental versions of my "Efficient GemStone Enumeration" framework. It basically optimizes some kinds of complex blocks to be simple blocks for the duration of a large enumeration.

You use Reducers to avoid creating intermediate collections. That is a very good tuning technique. I usually use #inject:into: to do tricks like that. I find the Reducers syntax to be somewhat confusing, and I'm very familiar with #inject:into:. Reducers may be a better and cleaner solution, but at this point it just reads like a different language. I wonder how operation order would be controlled with Reducers.

An important feature of an enumeration is the ability to exit the enumeration when a condition is met. Along with that is the need to act on the result(s) before returning them. I suggest you build features like that into Reducers if you don't already have them. Consider how you'd use a Reducer to model enumerations like #detect:ifNone: and #anySatisfy:; you would not want to enumerate more than is necessary, and the result you return is different.

You can use a #do: enumeration over all the Smalltalk collections, but there are often times you want to enumerate with either #keysAndValuesDo: or #associationsDo:. Reducers should have some way to specify what kinds of objects are to be enumerated and also have control over the kinds of objects that will be used in the results collection. For collections that are a collection of Association instances you may sometimes want the actual association of the original collection to be used in the results collection.

Years ago I'd implemented a model-based enumeration framework. My experience with that design was that the cost of creating the enumeration objects was on average equal to the enumeration savings it achieved; however, tests on smaller and simpler collections had much worse performance. It was faster in many cases, but not enough to justify it. Fortunately that design was valued for more than performance. The enumeration frameworks that I've written since then are much closer to the "metal" of the VM execution and have an extreme focus on eliminating all object creation along the way. The point is to tune your Reducer as much as practical to avoid instance creation costs, and realize that Reducers are still likely to be generally slower if broadly used.

I hope this helps with your work. No need to give a point-by-point response. Just glean what helps your implementation from my reply. There is a lot of fun to be had in code like that.

Regards,
Paul Baumann
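PS: to spell out the 'compile a simple block from a selector' trick: for a unary selector it boils down to building the block source once and evaluating it, roughly like this (a sketch; the exact compiler entry point varies by dialect, so treat Compiler evaluate: as a placeholder for whatever your image offers):

	| selector block |
	selector := #squared.
	"build '[:x | x squared]' once, outside the enumeration"
	block := Compiler evaluate: '[:x | x ' , selector asString , ']'.

The compiled block references nothing but its own argument, so it is a clean block, and evaluating it repeatedly inside the reduction avoids the per-element #perform: cost.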
Dear Paul,
thank you very much for your comprehensive response and the link to CollectionViews.

> I find "Reducers" to be difficult to read [...]

I suspected this issue. Probably it's due to three reasons:

1. Uncommon Selectors

Although I find #map:, #filter: etc. easy to read, they differ from the common language of #collect:, #select: and so on. I chose them to indicate their slightly different semantics. Namely, they do not produce a new collection and work on keyed collections as well. However, the selectors are not carved in stone; I am open to any ideas.

2. Decoupled Implementation

The protocol is implemented in Reducers and interfaces to collections only via #reduce:init:. Thus, we have

>> Reducer filter: #odd from: numbers.

instead of

>> numbers filter: #odd.

To make Reducers easy to extend, I decided not to extend Collection itself. This way, you just have to implement a new transformer class in order to support a new reduction operation (e.g. concatenate) on all implementors of #reduce:init:. You may add a convenience method to Reducer, but that's optional. For example:

>> Reducer class>>filter: aBlock from: aReducible
>>     ^self apply: (Mapping function: function) on: aReducible

How do you feel about this design choice? Maybe convenience methods should be generated automatically?

3. First Class Operations, Composition

The library treats reducers as minimal, composable, first class operations that live independently of a collection. This resembles the functional approach of the original Clojure implementation. It looks a bit uncommon in ST but offers (in my opinion) some nice opportunities.

> The point of the Reducer seems to be that it is to allow for results to
> be gathered independent of code that will be enumerating those results.

Indeed, you can put it this way. And the reducer operations are independent from the representation (read: source collection), too. They build entirely upon a minimal interface, i.e., #reduce:init:.

> I wonder how operation order would be controlled with Reducers.

Do you mean the order in which chained reductions are applied? The meaning of <> is (function) composition. Thus, composed reducers are to be read from right to left. For example:

>> (negated <> squared <> primes) from: Naturals new.
>> "is short for"
>> negated from: (squared from: (primes from: Naturals new)).
>> "and yields the negation of all squared primes"

where negated, squared and primes are straight-forward reducers, e.g. negated := Reducer map: #negated.

Optimized Blocks
----------------

Reducers rely on the #value: interface. Thus, blocks can be used instead of symbols to avoid the #perform: overhead. ("reduce: #+ init: 0" just reads nicely.) You're right, the performance crucially depends on the (generated) blocks. I am very interested in any suggestions to optimize them. Currently, almost all blocks are copying. This means they do not depend on the context but exclusively access some (copied) variables. For example:

>> Reducer filter: filterBlock

generates a block on #reduce:init: similar to:

>> transform: reduceBlock
>>     ^[:result :each |
>>         (filterBlock value: each)
>>             ifTrue: [reduceBlock value: result value: each]
>>             ifFalse: [result]]

Even if filterBlock and reduceBlock are clean and will not change, the transformation yields a copying block that executes slower than a clean one. I did some experiments with decompiling the clean input blocks and compiling a clean transformed block. However, this proved to be far too slow for serious application.
It would be nice to just inline/insert the compiled clean blocks into an already compiled transformation block. But I don't know whether this is possible at all. Do you have any ideas?

Keyed Collections
-----------------

Reducers support reduction with keys and values, selected by the arity of the provided blocks/symbols (binary or ternary). Assume, for example, a dictionary that maps some log files to their creation time (logs : File -> Time):

>> "yesterday's non-empty logs"
>> fromYesterday := Reducer filter: [:l :t | l notEmpty & (t day = yesterday)].
>> "gather associations"
>> associations := Reducer map: [:l :t | (l -> t)].
>> "apply to logs"
>> associations <> fromYesterday from: logs.

Probably, the associations reduction should be added to Reducer, e.g., as Reducer>>associations. Key-value reduction works for collections that understand #keysValuesDo:. Custom data types have to provide a reasonable #reduce:init: implementation to support it.

Stop on Condition
-----------------

Reducers provide a ReducedNotification (exception) for this purpose. For example, #take: and #takeWhile: make use of it:

>> Reducer take: 5 from: anInfiniteCollection.
>> "or"
>> Reducer takeWhile: [:day | day hasRain not] from: days.

Both the reducers and the generic implementation of #reduce:init: handle the notification. Thus, if a custom type does not handle it in its implementation of #reduce:init:, reducers still work.

Future Work
-----------

- Some common operations, e.g. concatenation, are missing
- Further block optimization
- Experiment with parallelism, e.g. using Polycephaly
- Unit tests

I wrote the Reducers package because I was curious whether and how Clojure's reducers library would work in Smalltalk. It's simply a spare time project. Thus, future development depends on the community's feedback and my free time.

I hope this did not get too lengthy. ;)

Kind Regards,
Steffen
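PS: in case it helps to see the cooperation with the notification spelled out, a generic #reduce:init: can handle it along these lines (a simplified sketch, not the literal code in the package):

Collection>>reduce: aFunction init: initialValue
	"Left fold; a ReducedNotification raised by a transformed function (e.g. one built by #take:) simply ends the reduction early."
	| result |
	result := initialValue.
	[self do: [:each | result := aFunction value: result value: each]]
		on: ReducedNotification
		do: [:ex | ex return].
	^result

Since the assignment for the interrupting element never completes, result still holds the value accumulated so far.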
Two minor points.
1. Reducer class>>filter:from: should be:

>> Reducer class>>filter: aBlock from: aReducible
>>     ^self apply: (Filtering function: aBlock) on: aReducible

2. You can write

>> associations := Reducer map: #->.

as a shorthand for

>> associations := Reducer map: [:l :t | (l -> t)].

Regards,
Steffen
In reply to this post by Steffen Märcker
I've just pushed a new version that improves the API a little bit:
- added Collection>>reducer to support a more Smalltalk-ish style:

	((1 to: 10) reducer filter: #odd) map: #squared

- composition style:

	squared := Reducer map: #squared.
	odds := Reducer filter: #odd.
	(squared <> odds) from: (1 to: 10)

Regards,
Steffen
Apart from the composition, what advantage does this give me over Xtreams? Xtreams is not just reading, it's also capable of writing, and it backs up to IO with a streaming API. Some kinds of streams are also positionable, which is another bonus.
Cheers,
Michael
Dear Michael,
I'll try to only indicate some answers, since I am not too familiar with Xtreams. I only used parts of it, e.g., the neat PEG package. Please keep in mind that Xtreams is an extensive library, whereas Reducers is in an early stage and neither complete nor optimized. For example, so far the only way to write to a collection/drain is to accumulate with #reduce:init:, e.g.,

> red reduce: #copyWith: init: #().

* Composition

You already mentioned this.

* Performance

Reducer objects do nothing more than transforming a reducing function (block, symbol). The reduction with the transformed function is done by the underlying source (~terminal). Thus, apart from the one-time transformation, there is no overhead. This property may give a performance edge over the classical enumeration operations and Xtreams. First simple tests look promising in this respect. However, the tests were not extensive and hardly representative.

* Parallelism

Since most operations, e.g., map and filter, do not assume any order or representation (see below: Concept), they are open for parallelization. Clojure's reducers library explores this in terms of a single fold function on top of its ForkJoin framework. ST Reducers could do that using Polycephaly, too. High-level(!) parallel fold might be useful for complex operations on generated or small sources (e.g., parallel verification and result accumulation for a collection of files / data sets). I don't know what Xtreams offers in this area.

* Simple to Extend

To support all the library's collection operations on a custom type, just implement #reduce:init:. Optionally, you may add #reducer to have convenient access to the API, e.g. "obj reducer take: 5". To add a new operation that is applicable to all reducibles, subclass ReducingFunctionTransformator and implement #transform:. Optionally, you may add a convenience message to Reducer, e.g. Reducer>>myCollOp. On the one hand, extending Xtreams to work with new types or to support new operations seems to require more work, e.g. implementing a read/write stream for the custom type. On the other hand, Xtreams offers a very rich API and has broad support for different sources and drains.

* Concept / Architecture

Xtreams is a streaming library that supports sequential reading and writing on a variety of sources and drains (called terminals). As far as I understand, streams are (potentially nested) specialized wrappers around the terminals. The transforming/enumeration work is mainly done by recursive calls between those layers.

In contrast, the Reducers approach builds only on reduction via #reduce:init: (resp. fold) and function transformation. Instead of wrapping a source (i.e. a Reducible) and distributing the work among the wrapper layers, the reducing functions (e.g. #+) themselves are transformed and executed directly on the source / leaves. For example:

> (((1 to: 10) reducer filter: #odd) map: #squared) reduce: #+ init: 0.

is short for

> f1 := (Filtering function: #odd) transform: #+.
> f2 := (Mapping function: #squared) transform: f1.
> (1 to: 10) reduce: f2 init: 0.

Reducers are actually just "helper" objects that allow the transformations to be expressed in a compositional and convenient way. The operations, like map and filter, have different, fundamentally simpler semantics than their sequence-based counterparts. Most of them don't know about:

* order
* laziness
* representation - they don't build anything

It's this simplicity that makes the concept powerful. I don't want to judge which approach is better.
Instead, I treat reducers as an alternative view of collections that provides a complementary set of fundamental operations with very promising properties.

Ciao,
Steffen
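PS: to make "simple to extend" concrete, a new operation is just one more transformer. For instance, a hypothetical side-effect transformer (handy for tracing a reduction; class and variable names invented here) would be little more than:

Tapping>>transform: reduceBlock
	"tapBlock is an instance variable holding a one-argument block; it sees every element but leaves the reduction untouched."
	^[:result :each |
		tapBlock value: each.
		reduceBlock value: result value: each]

plus, optionally, a convenience message on Reducer.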
In reply to this post by Michael Lucas-Smith-2
There is already a way to compose the blocks used by select:/collect: etc. and avoid intermediate collections. For example, in the case of

test1 := [:i | i odd].
test2 := [:i | (i \\ 3) = 0].

((0 to: 100) select: test1) select: test2

does indeed iterate over the collection twice. But:

(0 to: 100) select: [:i | (test1 value: i) and: [test2 value: i]]

only iterates over it once, with the two tests composed.

If you want nicer syntax you could create a BlockCollection class with ways to compose blocks. For example:

aBlockCollection := (test1 & test2) || test3.

Then modify select: to take a block collection so you can write

(1 to: 100) select: aBlockCollection.

Extensions to support block collections for collect: could also be written, say:

aBlockCollection := map1, map2, map3.

Then (1 to: 100) collect: aBlockCollection would give you the result of feeding the output of each block into the next.

Joerg
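PS: the simplest form of such composition needs no new class at all, since two composed tests are just another block, e.g. (a sketch, written as an extension of BlockClosure):

BlockClosure>>& aTestBlock
	"Answer a one-argument test that passes only when both the receiver and aTestBlock pass."
	^[:each | (self value: each) and: [aTestBlock value: each]]

and then simply: (1 to: 100) select: (test1 & test2).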
Hi Joerg!
> There is already a way to compose the blocks used by select:/collect:
> etc. and avoid intermediate collections.

Yes, there is - namely by hand. Let's consider the following example:

> (((1 to: 10) reducer filter: #odd) map: #squared) reduce: #+ init: 0.

It can be written the classic way:

> (((1 to: 10) select: #odd) collect: #squared) inject: 0 into: #+.

This creates two intermediate collections. Again, we can avoid them:

> (1 to: 10) inject: 0 into:
>     [:sum :n | n odd
>         ifTrue: [sum + n squared]
>         ifFalse: [sum]]

But now we have to think in terms of #inject:into: and craft the behavior of #select: and #collect: into the block by hand. This style is harder to read, especially for more complex conditions and mappings.

> If you want nicer syntax you could create a BlockCollection class with
> ways to compose blocks. For example:
> aBlockCollection := (test1 & test2) || test3.
> Then modify select: to take a block collection so you can write
> (1 to: 100) select: aBlockCollection.
> [... analogously for collect: block1, block2 ...]

This idea uses specific block combinators (#&, #||, #,) for the different types of enumeration messages. Besides implementing the BlockCollection class, you extend BlockClosure with those new messages. Furthermore, you have to modify each implementor for each of the enumeration methods #collect:, #select: and so on. This approach does not seem to be composable or easy to extend. Moreover, it doesn't give you the other benefits I've sketched before. (And it does not simplify the example above.)

Did you already give reducers a try?

Kind regards,
Steffen
> (((1 to: 10) reducer filter: #odd) map: #squared) reduce: #+ init: 0.
[...]
> (1 to: 10) inject: 0 into:
>     [:sum :n | n odd
>         ifTrue: [sum + n squared]
>         ifFalse: [sum]]
[...]
> This style is harder to read, especially for more complex conditions and mappings.

Actually, I'd like to completely disagree. It is, in my opinion, much easier to see what the latter does. The easiest to read is the most naive. Write the following extension methods: given Collection>>sum,

(((1 to: 10) select: #odd) collect: #squared) sum.

Or you could use Collection>>inject:into: here instead of writing Collection>>sum. Or you could go the other way and implement, for instance, Collection>>oddElements etc., and just do

(1 to: 10) allOddElements sumOfSquares.

What I really don't want to see is unnecessary complexity in the code because programmers think that it will be faster or more efficient. For instance, one of your concerns was the creation of an intermediate block. But you have offered very little proof that this is even a problem. Who knows what the VM will do, especially if it does JIT optimization? Or uses the GPU?

As a full-time performance engineer I spend a lot of time looking at incomprehensible code (often not Smalltalk) which has been 'pre-optimized' by developers, and this pre-optimization is largely (though not solely) responsible for its poor readability. Even where their efforts work (and you would be *amazed* how often they either have zero or even negative effects on real-world performance), the cost of maintaining such obscurantist code is nearly always vastly more than finding better hardware. In short, write your application first, then profile it and *then* optimize it - and only those bits which need it. The observed performance problems will very rarely coincide with the expectation of even some of the most talented developers.

These Reducer things do not look self-explanatory and to me that is adverse to the entire spirit of Smalltalk. My view would be that I would need to see *much* stronger evidence of *very* significant advantages of this approach before I would even consider using such an approach.
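PS: for completeness, the Collection>>sum extension mentioned above is a one-liner, e.g.:

Collection>>sum
	^self inject: 0 into: [:accumulator :each | accumulator + each]

and sumOfSquares etc. follow the same pattern.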
Dear John,

in general I agree.

I think the real issue is that Reducers are coming from a mindset different from Smalltalk: functional programming, which is rooted in mathematics. From my point of view functional programming looks rather at the “how to implement functionality” whereas Smalltalk’s philosophy is “what domain to model”. As a programmer of non-mathematical software I have an intuitive notion of what a Collection is (a collection of things). Yet what is a “Reducible”? Naively: something that can be reduced, i.e. a value?? If I am a programmer of mathematical software though, I may have an intuitive notion of what a Reducible is, just like I have a notion of vectors, matrices etc. Having said that, I think Reducers (a Do-er: is this OO thinking at all?) may have their place, yet not as a general concept.

Not to be misunderstood: I don’t judge anything like implementation details or claimed benefits of the one or other implementation. And I don’t want to belittle the work Steffen has done. I just talk about the philosophy behind it.

Cheers
Helge
Hi!
You're right Helge, Reducers originated from a functional (actually LISP-like) language, Clojure. My idea was to find out how the concept could work out in ST. It turned out that it translates directly to iterating with (nested) blocks. The function transformers are just factories that produce those blocks.

However, I am not sure why

> (((1 to: 10) reducer filter: #odd) map: #squared)
>     reduce: #+ init: 0.

appears so hard to read in contrast to

> (((1 to: 10) select: #odd) collect: #squared)
>     inject: 0 into: #+.

Is it the uncommon (renameable) selectors? Or the concept of building a recipe (i.e., a reducer) for a collection instead of using intermediate collections?

> Given Collection>>sum, (((1 to: 10) select: #odd) collect: #squared) sum.

Besides the intermediate collection, I am a bit uncertain about the benefits of #sum or #allOddElements, too. These methods are quite the opposite of the generic high-level concept of #collect:, #select: and so on. (Which build upon a functional concept, blocks aka anonymous functions.)

Also I am quite surprised by John's stress on optimization. Performance was just one of the possible benefits I can imagine. I never claimed that it will be significant or appropriate in all use cases. However, avoiding unnecessary work (e.g., intermediate collections) seems not too bad an idea. But what about the other opportunities?

Regards,
Steffen
In reply to this post by Nowak, Helge
As this conversation proves, one of the hardest things is to get people to actually try something new. It's more natural to respond with "I don't need it, I can cope with the basic functions I already have", or "I already know another add-on that offers some of the same benefits". Both are true, and I'm guilty on both counts. But in a sense the value of Reducers may be precisely that they offer something familiar to functional programmers, as Helge says. By giving them familiar vocabulary and things they value (e.g. parallelism), we make Smalltalk a little more palatable. And even old hands at Smalltalk may learn something new - after all, most agree that inject:into: always feels a little lacking.
I'm really glad to see Steffen's work, and the Xtreams work of Michael and Martin. I'm not using either in anger yet, because I don't have enough pain from the existing base collection and stream support to justify adding a dependency on a non-base component. But I do think my brain benefitted from being stretched in undergraduate days with FP, and more recently with a little greenfield project using Xtreams.
All the best,
Steve
From a previous message, as an old hand at Smalltalk (or old feet), I'll write it:

(1 to: 10) inject: 0 into:
	[:sum :n | sum + (n odd ifTrue: [n squared] ifFalse: [0])]

But I like a new way of programming if there is a real change in point of view. Perhaps that's the case with Reducers. And I don't like to change a technique if it is still working efficiently.

I have a concrete case: I have large arrays of numbers. I want to show different aspects of these series of numbers. I may have to show samples taken randomly from the whole set, because calculations are too long to be done on every element. For example, I can sample 1 out of 100 of the elements. Conversely, I may have to do the same calculations over and over on portions of this population.

How may Reducers help me? Should I implement a system that remembers what is already calculated, which improves performance but saturates memory, or a "lazy" evaluation system that will develop and calculate only the values really needed?
In reply to this post by Steven Kelly
+1... sometimes I don't understand why we focus so much on picking a winner to the detriment of diversity. One could also say such a tendency is at play when we compare different computer languages with Smalltalk. I think we all picked our favorites a long time ago; what doesn't seem to follow for me is the apparent conclusion that since there must be *a* favorite, everything else is unworthy of attention.

We tend to pay a huge price when we want to interface Smalltalk to that outside "alien" world that doesn't want to conform to our expectations of elegance, uniformity, and simplicity --- and when we would love to share our excitement and passion for what we do. Maybe we should be a bit more aware of these types of limitations so that we can keep an open mind to different ways of thinking.

FYI, Gauss used to say that he learned a new language every couple of years to "keep his mind flexible".