(Warning, long post)
I do not understand all the commotion over generators. If a Collection class cannot do what you want, use a Stream. Take a read only Stream and subclass from it a class called GenMyData. Inside GenMyData write code that generates the data you need on the fly and feeds it to the user upon request using the ReadStream interface. Perhaps there should be an Abstract class "Generator" that is a subclass of Stream to provide some generic generator functionality. What exactly do Python generators provide that my proposal does not? And is not my interface cleaner? And more adaptable? We could generalize this idea to that of a Pipe. A Pipe would be a generalization (subclass) of a ReadWriteStream. A Pipe can accept data using the WriteStream interface. The Pipe can then process the data in some way and then feed the result to the user upon request using the ReadStream interface. A pipe is more complicated than a generator because it needs to be able to block. If data is being fed into it but not being taken out then eventually it may fill up. Similarly, if data is being removed from it, then it may run out of data and again need to block. Blocking may mean a process blocks or it may mean sending a message to another pipe to send or accept data. I have not worked out the details. Given how useful pipes are in Linux, I think a Generic Pipe class would be VERY useful. Of particular use would be operators such as '<<', '||', '>>', and tee. These operators would provide functionality similar to what Linux does with pipes. The '<<' operator would pass the argument (a ReadStream, Generator, or Pipe) to the receiver which would use it as if it were a ReadStream. The '>>' operator would pass the argument (a WriteStream or Pipe) to the receiver which would use it as if it were a WriteStream. I guess we need another concept here, the reverse of a generator, which accepts data like a WriteStream, does some processing with the data, and keeps the result. Let's call this an Absorber. The '||' operator (assuming that receiver is a ReadStream, Generator, or Pipe) requires that the argument to be a WriteStream, Absorber, or Pipe. The result of the operation is an Absorber, a Generator, or a Pipe depending upon the classes of the receiver and operand. If the receiver is a Generator (or ReadStream) and the argument is an Absorber (or WriteSteam) then the actual calculation is triggered and the result is an Absorber (or WriteStream) which holds the final result of the computation. Actually, whenever the first argument is a Generator (or ReadStream), the computation could be triggered but it would eventually block if the second argument is a Pipe. The Tee class is a generalization (subclass) of class Pipe. it is instanciated with a collection argument where the elements of the collection are WriteStreams, Absorbers, Pipes, or Tees. The Tee class will forward the results of its computation to each of the elements of the collection. I suppose there could be the reverse of a Tee, ReverseTee, which accepts input from any of a collection of ReadStreams, Pipes, or ReverseTees. ReadStreams, Generators, and Pipes have a method tee: aCollection which creates a Tee instance and connects it to the receiver. I am sure there are numerous objects that make sense as Generators, Absorbers and Pipes. For example Random number generators should be Generators. Wow! Wasn't that insightful! e.g. An instances of class Prime could be a Generator from which prime numbers can be fetched. 
It would store the first 10,000 primes, say, and generate further primes on demand. E.g., class PrimeFactors could be instantiated with a number which the instance factors; the instance would be a Generator from which the factors of the initial number can be fetched in sorted order. E.g., an instance of class Sum could be an Absorber that stores the sum of the objects (numbers, vectors, matrices, etc.) fed to it. E.g., an instance of class TextToLines could be a Pipe which acts like a WriteStream of text and a ReadStream of lines of text. That is, it is fed text one character at a time and the same text can be fetched from it one line at a time. Is that confusing? I think some terminology needs to be standardized here so we know what we are talking about.

I have always wanted to implement Pipes in Smalltalk but have never found the time. I think it could be a fun project and VERY useful.

Exercise: write code in Smalltalk to copy a file to another file, assuming the first file exists and is readable and the second file can be created and written. That is, ignore any error possibilities. Then do the same exercise using a pipe. (Don't implement the '||' operator; we only want to look at the code to see which is simpler. A sketch of the non-pipe version appears after this post.)

Bugs: you could create an infinite loop by linking a sequence of pipes into a loop. Is this always a bug? How would the computation be triggered in this case?

Is the hierarchy all wrong? Should Absorber and Generator be subclasses of Pipe (i.e., pipes with some functionality removed), and ReadStream and WriteStream be subclasses of Generator and Absorber respectively? This is reopening the controversy of what is the best hierarchy for streams, only worse. :-( Pragmatically speaking, I think Absorbers, Generators, and Pipes should have a separate hierarchy from Streams but, of course, share interface. So whatayaall think?

Frankly, I think the biggest problem is that pipes would be altogether too convenient; they would encourage writing code that is easy to write but has too high an overhead because of the need to deal with blocking. That is, pipes require good judgment, or that performance not be an issue. If your code is too slow and some pipes you are using are the problem, then rewrite that code, perhaps avoiding the pipes.

While I do not have the time to do this project on my own, I could probably spare some time to help out if a group could form to do this. Besides, I am sure others could add much insight. Note that this project involves more than creating the abstract classes Generator, Absorber, and Pipe. These three classes would be the heads of class hierarchies just like Stream is. I don't know what the subclasses would be, though; Pipe might have a subclass BufferedPipe. Also, I hasten to point out that many classes would simply implement the Pipe, Generator, or Absorber interface and not actually be subclasses of any of these classes.

Interesting question: are there classes in Squeak that implement the ReadStream, WriteStream, or ReadWriteStream interfaces now but are not subclasses of these classes? If not, should there be? That is, are there classes in Squeak that should implement these interfaces? In the Java world, Pipe, Generator, and Absorber would be Interfaces. I don't like Java much, but Interfaces I liked.

Regards, to those of you still here.
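To make the GenMyData idea above concrete, here is a rough, untested sketch. It subclasses the abstract Stream rather than ReadStream itself, since Squeak's ReadStream expects to be created on a collection; the class name and the squares-on-demand payload are placeholders only:

	Stream subclass: #GenMyData
		instanceVariableNames: 'count'
		classVariableNames: ''
		poolDictionaries: ''
		category: 'Pipes-Sketch'

	GenMyData class>>new
		"Bypass Stream's on:-style constructors for this sketch."
		^ self basicNew setUp

	GenMyData>>setUp
		count := 0

	GenMyData>>next
		"Generate the next datum on the fly: here, the next square number."
		count := count + 1.
		^ count * count

	GenMyData>>atEnd
		"An endless generator."
		^ false

A user then just pulls from it like any ReadStream:

	squares := GenMyData new.
	squares next.	"1"
	squares next.	"4"
	squares next.	"9"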
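On the blocking details: Squeak's existing SharedQueue already provides half of the behaviour, since #next suspends the reading Process until data arrives. (SharedQueue is unbounded, so it never blocks the writer; a bounded variant would be needed for that half.) A tiny untested sketch:

	| pipe |
	pipe := SharedQueue new.
	"The forked reader blocks inside #next whenever the queue is empty."
	[3 timesRepeat: [Transcript show: pipe next printString; cr]] fork.
	#(1 2 3) do: [:each | pipe nextPut: each].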
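For the non-pipe half of the exercise, a minimal Squeak version might look like this (untested; the file names are placeholders, and error handling is ignored as the exercise allows):

	| in out |
	in := FileStream readOnlyFileNamed: 'source.txt'.
	out := FileStream newFileNamed: 'copy.txt'.
	out nextPutAll: in upToEnd.	"read the whole file and write it back out"
	in close.
	out close.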
Ralph Boland
--
Had a wife and kids in Florida, Jack (Nicklaus)
Went out for a "ride" but I couldn't get back
As a ten timer being hunted down
OR
When the wife found out who I was knowing (biblically speaking)
I hit a hydrant and I just kept going (till I hit a tree).
...
Ralph Boland wrote:
> I do not understand all the commotion over generators.

Generators transform a push model (#do:) into a pull model (#next). Let's say you have two algorithms: one that produces values via a push model and one that consumes values via a pull model:

producer := SomeAlgorithm new.
producer do: [:value | "produces the value from the algorithm"].

consumer := AnotherAlgorithm new.
consumer process: inputStream. "consumes the value from inputStream"

Plugging these two together can be a royal pain. You basically have the following three options:

1) Preproduce all the values from the producer and create a stream that holds them:

values := Array streamContents: [:s | producer do: [:each | s nextPut: each]].
consumer process: values readStream.

This can work very well if the number of values is limited and if you know you're going to consume all the produced values (it doesn't work that well with infinite sequences of numbers).

2) Rewrite either one of the algorithms. This can work very well if it's your own code and reasonably simple, and can be extremely painful if it's complex.

3) Use a Generator. With a generator, the solution becomes simply:

consumer process: (Generator on: [:yield | producer do: yield]).

When you can use neither solution 1) nor 2), generators simplify your life *greatly*. In fact I'd say they are indispensable if you're dealing with a complex existing environment.

> If a Collection class cannot do what you want, use a Stream.
> Take a read only Stream and subclass from it a class called GenMyData.
> Inside GenMyData write code that generates the data you
> need on the fly and feeds it to the user upon request using
> the ReadStream interface.

See above. You're describing option #2. Rewriting an existing algorithm to "feed" the data properly can be decidedly non-trivial. You can start with Integer>>primesUpTo:do:, which is a very simple algorithm. The point being that, while all algorithms can be expressed either way, the generic way in which a Generator does this replaces the need to reimplement each specific algorithm.

Cheers,
  - Andreas
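To make option 3) concrete, here is a small, untested sketch of the primes case. It assumes the primes enumeration is reachable class-side as Integer primesUpTo:do:, and that the argument passed into the Generator block understands #yield: in addition to #value:.

	| primes |
	primes := Generator on: [:g |
		Integer primesUpTo: 100 do: [:each | g yield: each]].
	primes next.	"2"
	primes next.	"3"
	[primes atEnd] whileFalse: [Transcript show: primes next printString; cr].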
As additional motivation, the Motivation section of PEP 255 ("Simple Generators") is a great
read to explain why generators are interesting and what the alternatives are:

Abstract

This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement.

Motivation

When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced.

For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach.

An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time.

Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand.

A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads).

A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics.
This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations.

That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example:

    def fib():
        a, b = 0, 1
        while 1:
            yield b
            a, b = b, a+b

When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call.

The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind.
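For comparison, the same fib producer can be written with the Generator class discussed earlier in this thread. An untested sketch (again assuming the block argument understands #yield:):

	| fib |
	fib := Generator on: [:g | | a b t |
		a := 0.  b := 1.
		[true] whileTrue: [
			g yield: b.
			t := a + b.
			a := b.
			b := t]].
	fib next.	"1"
	fib next.	"1"
	fib next.	"2"
	fib next.	"3"

As in the Python version, each #next resumes the block immediately after the #yield:.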
Cheers,
  - Andreas
Tim Budd has already been mentioned, but I would like to reinforce that
by suggesting that anyone interested in this subject read chapter 8 of "A Little Smalltalk":

http://stephane.ducasse.free.fr/FreeBooks/LittleSmalltalk/ALittleSmalltalk.pdf

The implementation of generators in Smalltalk is shown (Little Smalltalk uses that instead of Streams) and several interesting examples are given.

-- Jecel