Re: Generators in Smalltalk??

Re: Generators in Smalltalk??

Ralph Boland
(Warning, long post)

I do not understand all the commotion over generators.

If a Collection class cannot do what you want, use a Stream.
Take a read-only Stream and subclass from it a class called GenMyData.
Inside GenMyData, write code that generates the data you
need on the fly and feeds it to the user upon request via
the ReadStream interface.
Perhaps there should be an abstract class "Generator", a subclass
of Stream, to provide some generic generator functionality.
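
Here is roughly what I mean, as a minimal sketch (the class and
selector names are made up, and to keep the sketch self-contained I
implement the read protocol directly on Object rather than wiring it
into the Stream hierarchy):

    Object subclass: #SquareGenerator
        instanceVariableNames: 'index limit'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Generators-Sketch'

    SquareGenerator >> setLimit: n
        index := 0.
        limit := n

    SquareGenerator >> next
        "Compute the next value on demand; nothing is stored up front."
        index := index + 1.
        ^ index * index

    SquareGenerator >> atEnd
        ^ index >= limit

    "usage, via the ordinary ReadStream protocol:"
    gen := SquareGenerator new setLimit: 5.
    [gen atEnd] whileFalse: [Transcript show: gen next printString; cr]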

What exactly do Python generators provide that my proposal does not?
And is not my interface cleaner?
And more adaptable?

We could generalize this idea to that of a Pipe.  A Pipe would be
a generalization (subclass) of a  ReadWriteStream.  A Pipe can accept
data using the WriteStream interface.  The Pipe
can then process the data in some way and then feed the result
to the user upon request using the ReadStream interface.
A pipe is more complicated than a generator because it needs to
be able to block.  If data is being fed into it but not being taken out
then eventually it may fill up.  Similarly, if data is being removed
from it, then it may run out of data and again need to block.
Blocking may mean a process blocks or it may mean sending a message
to another pipe to send or accept data.  I have not worked out the details.
Given how useful pipes are in Linux, I think a Generic Pipe class
would be VERY useful.  Of particular use would be operators such
as '<<',   '||',  '>>', and  tee.  These operators would provide functionality
similar to what  Linux does with pipes.
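
For the blocking part, Squeak's existing SharedQueue already gives a
taste of it: its #next suspends the calling process until data arrives
(though, being unbounded, its #nextPut: never blocks the writer the way
a bounded pipe would):

    | pipe |
    pipe := SharedQueue new.
    [Transcript show: pipe next printString; cr] fork.  "blocks until fed"
    pipe nextPut: 42.  "wakes the forked reader, which prints 42"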

The '<<' operator would pass the argument (a ReadStream, Generator,
or Pipe)  to the receiver which would use it as if it were a  ReadStream.

The '>>'  operator would pass the argument (a WriteStream or Pipe)
to the receiver which would use it as if it were a WriteStream.
I guess we need another concept here, the reverse of a generator, which accepts
data like a WriteStream, does some processing with the data, and keeps
the result.
Let's call this an Absorber.

The '||' operator (assuming the receiver is a ReadStream, Generator,
or Pipe) requires the argument to be a WriteStream, Absorber, or Pipe.
The result of the operation is an Absorber, a Generator, or a Pipe,
depending upon the classes of the receiver and argument.
If the receiver is a Generator (or ReadStream)
and the argument is an Absorber (or WriteStream),
then the actual calculation is triggered and the result is an Absorber
(or WriteStream) which holds the final result of the computation.
Actually, whenever the receiver is a Generator (or ReadStream), the
computation could be triggered, but it would eventually block if the
argument is a Pipe.
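
In use, a chain might read like this (every class here is hypothetical,
just to show the shape of the code):

    "pull primes, push them through a doubling Pipe, absorb the sum"
    result := Primes new || Doubler new || Sum new.
    "the leftmost receiver is a Generator and the final argument an
     Absorber, so the whole computation is triggered and result holds
     the finished Sum"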

The Tee class is a generalization (subclass) of class Pipe.  It is
instantiated with a collection argument whose elements are WriteStreams,
Absorbers, Pipes, or Tees.  The Tee class will forward the results of its
computation to each of the elements of the collection.
I suppose there could be the reverse of a Tee, ReverseTee,
which accepts input from any of a collection of ReadStreams, Pipes, or
ReverseTees.
ReadStreams, Generators, and Pipes have a method  tee: aCollection
which creates a Tee instance and connects it to the receiver.
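
Something like (hypothetical again, assuming aFileStream is an already
open WriteStream):

    Primes new tee: { Sum new. aFileStream. Transcript }.
    "each value flowing out of Primes is forwarded to all three"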

I am sure there are numerous objects that make sense as Generators, Absorbers
and Pipes.  For example, random number generators should be Generators.
Wow! Wasn't that insightful!

e.g. An instance of class Prime could be a Generator from which prime numbers
can be fetched. It would store the first 10,000 primes, say,
and generate further primes on demand.

e.g.  Class PrimeFactors could be instantiated with a number, which the
instance factors.  The instance would be a Generator from which the factors
of the initial number can be fetched in sorted order.

e.g.  An instance of class Sum could be an Absorber that stores the sum of the
objects (numbers, vectors, matrices, etc.) fed to it.
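
Sum is small enough to sketch in full (names illustrative):

    Object subclass: #Sum
        instanceVariableNames: 'total'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Generators-Sketch'

    Sum >> nextPut: anObject
        "Absorb one value; works for anything that answers #+."
        total := total isNil ifTrue: [anObject] ifFalse: [total + anObject].
        ^ anObject

    Sum >> contents
        "Answer the absorbed result, as a WriteStream would."
        ^ total

    "usage:"
    sum := Sum new.
    #(1 2 3 4) do: [:each | sum nextPut: each].
    sum contents  "answers 10"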

e.g.  An instance of class TextToLines could be a Pipe which acts like a
WriteStream of text and a ReadStream of lines of text.
That is, it is fed text one character at a time and
the same text can be fetched from it one line at a time.
Is that confusing?  I think some terminology needs to be standardized
here so we know what we are talking about.
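
A rough sketch of TextToLines (illustrative only; a real Pipe would
also handle blocking when no complete line is buffered yet):

    Object subclass: #TextToLines
        instanceVariableNames: 'buffer'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Generators-Sketch'

    TextToLines >> buffer
        "Lazy initialization keeps the sketch dialect-neutral."
        ^ buffer ifNil: [buffer := OrderedCollection new]

    TextToLines >> nextPut: aCharacter
        "WriteStream side: accept one character."
        self buffer addLast: aCharacter.
        ^ aCharacter

    TextToLines >> atEnd
        "ReadStream side: a line is available once a cr is buffered."
        ^ (self buffer includes: Character cr) not

    TextToLines >> next
        "ReadStream side: answer the next complete line, sans terminator."
        | line |
        line := WriteStream on: String new.
        [self buffer first = Character cr]
            whileFalse: [line nextPut: self buffer removeFirst].
        self buffer removeFirst.  "discard the cr"
        ^ line contents

    "usage:"
    t := TextToLines new.
    'one line' do: [:ch | t nextPut: ch].
    t nextPut: Character cr.
    t atEnd ifFalse: [Transcript show: t next; cr]  "prints 'one line'"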

I have always wanted to implement Pipes in Smalltalk but have never found the
time.  I think it could be a fun project and VERY useful.


Exercise:  Write code in Smalltalk to copy a file to another file, assuming the
first file exists and is readable and the second file can be created
and written.
That is, ignore any error possibilities.
Then do the same exercise using a pipe.  (Don't implement the '||' operator;
we only want to look at the code to see which is simpler.)
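
For reference, the non-pipe half might look like this in Squeak
(character at a time for clarity, errors ignored as stated, file names
made up):

    | in out |
    in := FileStream readOnlyFileNamed: 'source.txt'.
    out := FileStream newFileNamed: 'copy.txt'.
    [in atEnd] whileFalse: [out nextPut: in next].
    in close.
    out close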

Bugs.
You could create an infinite loop by linking  a sequence of pipes into a loop.
Is this always a bug?
How would the computation be triggered in this case?

Is the hierarchy all wrong?
Should Absorber and Generator be subclasses of Pipe (i.e. pipes with
some functionality removed?), and ReadStream and WriteStream be
subclasses of Generator and Absorber respectively?  This is reopening
the controversy over the best hierarchy for Streams, only worse. :-(
Pragmatically speaking, I think Absorbers, Generators, and Pipes should
have a hierarchy separate from Streams but, of course, share its interface.

So whatayaall think?

Frankly, I think the biggest problem is that pipes would be altogether too
convenient; they would encourage writing code that is easy to write but
has too high an overhead because of the need to deal with blocking.
That is, pipes require good judgment, or that performance not be an issue.
If your code is too slow and some pipes you are using are the problem, then
rewrite that code, perhaps avoiding the pipes.

While I do not have the time to do this project on my own
I could probably spare some time to help out if a group
could form to do this.  Besides, I am sure others could add much insight.

Note that this project involves more than creating the abstract classes
Generator, Absorber, and Pipe.  These three classes would be the
heads of class hierarchies, just as Stream is.
I don't know what the subclasses would be though.
Pipe might have a subclass BufferedPipe.

Also, I hasten to point out that many classes would simply implement the
Pipe, Generator, or Absorber interface and not actually be subclasses
of any of these classes.

Interesting question:  Are there classes in Squeak that implement the
ReadStream, WriteStream, or ReadWriteStream interfaces now but are not
subclasses of these classes?
If not, should there be?
That is, are there classes in Squeak that should implement these interfaces?
In the Java world, Pipe, Generator, and Absorber would be Interfaces.
I don't like Java much, but Interfaces I liked.


Regards to those of you still here.

Ralph Boland




--
Had a wife and kids in Florida, Jack (Nicklaus)
Went out for a "ride" but I couldn't get back
As a ten timer being hunted down
OR
When the wife found out who I was knowing (biblically speaking)
I hit a hydrant and I just kept going (till I hit a tree).
...


Re: Generators in Smalltalk??

Andreas.Raab
Ralph Boland wrote:
> I do not understand all the commotion over generators.

Generators transform a push model (#do:) into a pull model (#next).
Let's say you have two algorithms: one that produces values via a push
model and one that consumes values via a pull model:

producer := SomeAlgorithm new.
producer do: [:value | "produces the value from the algorithm"].

consumer := AnotherAlgorithm new.
consumer process: inputStream. "consumes the values from inputStream"

Plugging these two together can be a royal pain. You basically have the
following three options:

1) Preproduce all the values from the producer and create a stream that
holds them:

values := Array streamContents:[:s| producer do:[:each| s nextPut: each]].
consumer process: values readStream.

This can work very well if the number of values is limited and if you
know you're going to consume all the produced values (doesn't work that
well with infinite sequences of numbers).

2) Rewrite either one of the algorithms. Can work very well if it's your
own code and reasonably simple, and can be extremely painful if it's
complex.

3) Use a Generator. With a generator, the solution becomes simply:

consumer process: (Generator on:[:yield| producer do: yield]).

When you can use neither solution 1) nor 2), generators simplify your life
*greatly*. In fact, I'd say they are indispensable if you're dealing
with a complex existing environment.

> If a Collection class cannot do what you want, use a Stream.
> Take a read only Stream and subclass from it a class called GenMyData.
> Inside GenMyData write code that generates the data you
> need on the fly and feeds it to the user upon request using
> the ReadStream interface.

See above. You're describing option #2. Rewriting an existing algorithm
to "feed" the data properly can be decidedly non-trivial. You can start
with Integer>>primesUpTo:do:, which is a very simple algorithm. The point
being that, while all algorithms can be expressed either way, the generic
way in which a Generator does that removes the need to reimplement each
specific algorithm.
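
Concretely, something like this (a sketch against the Generator class
in recent Squeak images; primesUpTo:do: lives class-side on Integer):

primes := Generator on: [:g |
    Integer primesUpTo: 100 do: [:each | g yield: each]].
primes next. "2"
primes next. "3, and so on; the push-style algorithm was not touched"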

Cheers,
   - Andreas


Re: Generators in Smalltalk??

Andreas.Raab
As additional motivation, the Motivation section of PEP 255 is a great
read to explain why generators are interesting and what the alternatives
are:

Abstract

     This PEP introduces the concept of generators to Python, as well
     as a new statement used in conjunction with them, the "yield"
     statement.

Motivation

     When a producer function has a hard enough job that it requires
     maintaining state between values produced, most programming languages
     offer no pleasant and efficient solution beyond adding a callback
     function to the producer's argument list, to be called with each value
     produced.

     For example, tokenize.py in the standard library takes this approach:
     the caller must pass a "tokeneater" function to tokenize(), called
     whenever tokenize() finds the next token.  This allows tokenize to be
     coded in a natural way, but programs calling tokenize are typically
     convoluted by the need to remember between callbacks which token(s)
     were seen last.  The tokeneater function in tabnanny.py is a good
     example of that, maintaining a state machine in global variables, to
     remember across callbacks what it has already seen and what it hopes to
     see next.  This was difficult to get working correctly, and is still
     difficult for people to understand.  Unfortunately, that's typical of
     this approach.

     An alternative would have been for tokenize to produce an entire parse
     of the Python program at once, in a large list.  Then tokenize clients
     could be written in a natural way, using local variables and local
     control flow (such as loops and nested if statements) to keep track of
     their state.  But this isn't practical:  programs can be very large, so
     no a priori bound can be placed on the memory needed to materialize the
     whole parse; and some tokenize clients only want to see whether
     something specific appears early in the program (e.g., a future
     statement, or, as is done in IDLE, just the first indented statement),
     and then parsing the whole program first is a severe waste of time.

     Another alternative would be to make tokenize an iterator[1],
     delivering the next token whenever its .next() method is invoked.  This
     is pleasant for the caller in the same way a large list of results
     would be, but without the memory and "what if I want to get out early?"
     drawbacks.  However, this shifts the burden on tokenize to remember
     *its* state between .next() invocations, and the reader need only
     glance at tokenize.tokenize_loop() to realize what a horrid chore that
     would be.  Or picture a recursive algorithm for producing the nodes of
     a general tree structure:  to cast that into an iterator framework
     requires removing the recursion manually and maintaining the state of
     the traversal by hand.

     A fourth option is to run the producer and consumer in separate
     threads.  This allows both to maintain their states in natural ways,
     and so is pleasant for both.  Indeed, Demo/threads/Generator.py in the
     Python source distribution provides a usable synchronized-communication
     class for doing that in a general way.  This doesn't work on platforms
     without threads, though, and is very slow on platforms that do
     (compared to what is achievable without threads).

     A final option is to use the Stackless[2][3] variant implementation of
     Python instead, which supports lightweight coroutines.  This has much
     the same programmatic benefits as the thread option, but is much more
     efficient.  However, Stackless is a controversial rethinking of the
     Python core, and it may not be possible for Jython to implement the
     same semantics.  This PEP isn't the place to debate that, so suffice it
     to say here that generators provide a useful subset of Stackless
     functionality in a way that fits easily into the current CPython
     implementation, and is believed to be relatively straightforward for
     other Python implementations.

     That exhausts the current alternatives.  Some other high-level
     languages provide pleasant solutions, notably iterators in Sather[4],
     which were inspired by iterators in CLU; and generators in Icon[5], a
     novel language where every expression "is a generator".  There are
     differences among these, but the basic idea is the same:  provide a
     kind of function that can return an intermediate result ("the next
     value") to its caller, but maintaining the function's local state so
     that the function can be resumed again right where it left off.  A
     very simple example:

         def fib():
             a, b = 0, 1
             while 1:
                 yield b
                 a, b = b, a+b

     When fib() is first invoked, it sets a to 0 and b to 1, then yields b
     back to its caller.  The caller sees 1.  When fib is resumed, from its
     point of view the yield statement is really the same as, say, a print
     statement:  fib continues after the yield with all local state intact.
     a and b then become 1 and 1, and fib loops back to the yield, yielding
     1 to its invoker.  And so on.  From fib's point of view it's just
     delivering a sequence of results, as if via callback.  But from its
     caller's point of view, the fib invocation is an iterable object that
     can be resumed at will.  As in the thread approach, this allows both
     sides to be coded in the most natural ways; but unlike the thread
     approach, this can be done efficiently and on all platforms.  Indeed,
     resuming a generator should be no more expensive than a function call.

     The same kind of approach applies to many producer/consumer functions.
     For example, tokenize.py could yield the next token instead of invoking
     a callback function with it as argument, and tokenize clients could
     iterate over the tokens in a natural way:  a Python generator is a kind
     of Python iterator[1], but of an especially powerful kind.
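
For comparison, the fib example above transliterates almost directly to
the Smalltalk Generator (a sketch):

fib := Generator on: [:g | | a b next |
    a := 0. b := 1.
    [true] whileTrue: [
        g yield: b.
        next := a + b.
        a := b.
        b := next]].
(1 to: 6) collect: [:i | fib next]. "#(1 1 2 3 5 8)"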

Cheers,
   - Andreas


Re: Generators in Smalltalk??

Jecel Assumpcao Jr
Tim Budd has already been mentioned, but I would like to reinforce that
by suggesting that anyone interested in this subject read chapter 8 of
"A Little Smalltalk":

http://stephane.ducasse.free.fr/FreeBooks/LittleSmalltalk/ALittleSmalltalk.pdf

The implementation of generators in Smalltalk is shown (Little Smalltalk
uses them instead of Streams) and several interesting examples are
given.

-- Jecel