[ANN] Iterators

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[ANN] Iterators

Julien Delplanque-2
Hello,

I wanted to have an iterator framework for Pharo for a long time.

So I started building it step by step and today I think that, while it still requires more documentation, it is ready to be announced and used by others.


The idea is that, as described by the iterator design pattern, any object that needs to be walked provides one or many iterators.

In the library, #iterator method is the convention to get the default iterator of any object (if it has one).

Iterators provides a DSL to deal with iterators combination.

It is inspired from shell’s streams manipulation syntax:

- The pipe "|" allows one to chain iterators
- The ">" allows one to create a new collection with data transformed through chained iterators
- The ">>" allows one to fill an existing collection with data transformed through chained iterators
 
For example, one can write:

iterator := #(1 2 3) iterator.
iterator
| [ :x | x * 2 ] collectIt
| [ :x :y | x + y ] reduceIt
> Array "#(12)"

Or

iterator := #(1 2 3) iterator.
collectionToFill := OrderedCollection new.
iterator
| [ :x | x * 2 ] collectIt
| [ :x :y | x + y ] reduceIt
> collectionToFill.
collectionToFill "anOrderedCollection(12)"

The equivalent of "/dev/null" in Linux also exists:

iterator := #(1 2 3) iterator.
iterator
| [ :x | x * 2 ] collectIt
| [ :object | object logCr ] doIt "Just print incoming objects in transcript."
> NullAddableObject "Special object that ignore incoming objects."

There are documentation and examples on the GitHub repository.


Initially, the goal was to avoid to duplicate all collection’s iterator methods (e.g. #collect:, #select:, etc) in composite objects.

Thus, it provides an IteratorWithCollectionAPI which wrap an iterator and provides all the methods we want (#collect:, #select:, …).

Via IteratorWithCollectionAPI, your objects automatically gets the Collection API, you just need to access it via #iterator message:

myObject iterator select: [ :x | x isFoo ]

This is another way to use the framework, just to avoid code duplication.


Future work is to provide the possibility to have iterator with multiple inputs.

I already have an undocumented prototype on the repository that works like this:

it1 := (1 to: 10) iterator.
it2 := (1 to: 10) iterator.
it1 & it2
| [ :x :y | x@y ] mergeIt
> Array. "{(1@1). (2@2). (3@3). (4@4). (5@5). (6@6). (7@7). (8@8). (9@9). (10@10)}"


Yes, "&" operator will again kind of mimic the one from the shell.


Hope it helps other people.

Feedback is welcome.

Cheers,

Julien

---
Julien Delplanque
Doctorant à l’Université de Lille
Bâtiment B 40, Avenue Halley 59650 Villeneuve d'Ascq
Numéro de téléphone: +333 59 35 86 40

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Iterators

Julien Delplanque-2

I made a mistake in the following snippet:

> Le 23 août 2019 à 16:14, Julien <[hidden email]> a écrit :
>
> iterator := #(1 2 3) iterator.
> collectionToFill := OrderedCollection new.
> iterator
> | [ :x | x * 2 ] collectIt
> | [ :x :y | x + y ] reduceIt
> > collectionToFill.
> collectionToFill "anOrderedCollection(12)"

">>" operator needs to be used here, so the correct code is:

iterator := #(1 2 3) iterator.
collectionToFill := OrderedCollection new.
iterator
        | [ :x | x * 2 ] collectIt
        | [ :x :y | x + y ] reduceIt
        >> collectionToFill.
collectionToFill "anOrderedCollection(12)"

Julien
Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Iterators

Kasper Osterbye
In reply to this post by Julien Delplanque-2
Hi

This is supercool!

I had been wondering myself of the lack of such a library. The Xtreams seem to be a bit related, but they focus on IO more that map/reduce programming.

I am intrigued by your choice of argument-less methods (reduceIt, collectIt,…). I would most likely have opted for a syntax in the style of 
((iterator collect: […]) select: […]) groupBy: [..] etc

Your syntax do away with the parenthesis hell. I like that aspect.

Do you see a connection to the database query (linq like) thing, working towards the same api for the iterators and queries ?

Going that direction might also enable some more advanced ideas for the & operator (more akin to join).

Cool stuff Julien!

Best,

Kasper

On 23 August 2019 at 16.15.31, Julien ([hidden email]) wrote:

Hello,

I wanted to have an iterator framework for Pharo for a long time.

So I started building it step by step and today I think that, while it still requires more documentation, it is ready to be announced and used by others.


The idea is that, as described by the iterator design pattern, any object that needs to be walked provides one or many iterators.

In the library, #iterator method is the convention to get the default iterator of any object (if it has one).

Iterators provides a DSL to deal with iterators combination.

It is inspired from shell’s streams manipulation syntax:

- The pipe "|" allows one to chain iterators
- The ">" allows one to create a new collection with data transformed through chained iterators
- The ">>" allows one to fill an existing collection with data transformed through chained iterators
 
For example, one can write:

iterator := #(1 2 3) iterator.
iterator
| [ :x | x * 2 ] collectIt
| [ :x :y | x + y ] reduceIt
> Array "#(12)"

Or

iterator := #(1 2 3) iterator.
collectionToFill := OrderedCollection new.
iterator
| [ :x | x * 2 ] collectIt
| [ :x :y | x + y ] reduceIt
> collectionToFill.
collectionToFill "anOrderedCollection(12)"

The equivalent of "/dev/null" in Linux also exists:

iterator := #(1 2 3) iterator.
iterator
| [ :x | x * 2 ] collectIt
| [ :object | object logCr ] doIt "Just print incoming objects in transcript."
> NullAddableObject "Special object that ignore incoming objects."

There are documentation and examples on the GitHub repository.


Initially, the goal was to avoid to duplicate all collection’s iterator methods (e.g. #collect:, #select:, etc) in composite objects.

Thus, it provides an IteratorWithCollectionAPI which wrap an iterator and provides all the methods we want (#collect:, #select:, …).

Via IteratorWithCollectionAPI, your objects automatically gets the Collection API, you just need to access it via #iterator message:

myObject iterator select: [ :x | x isFoo ]

This is another way to use the framework, just to avoid code duplication.


Future work is to provide the possibility to have iterator with multiple inputs.

I already have an undocumented prototype on the repository that works like this:

it1 := (1 to: 10) iterator.
it2 := (1 to: 10) iterator.
it1 & it2
| [ :x :y | x@y ] mergeIt
> Array. "{(1@1). (2@2). (3@3). (4@4). (5@5). (6@6). (7@7). (8@8). (9@9). (10@10)}"


Yes, "&" operator will again kind of mimic the one from the shell.


Hope it helps other people.

Feedback is welcome.

Cheers,

Julien

---
Julien Delplanque
Doctorant à l’Université de Lille
Bâtiment B 40, Avenue Halley 59650 Villeneuve d'Ascq
Numéro de téléphone: +333 59 35 86 40

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Iterators

Herby Vojčík
In reply to this post by Julien Delplanque-2
On 23. 8. 2019 16:14, Julien wrote:

> Hello,
>
> I wanted to have an iterator framework for Pharo for a long time.
>
> So I started building it step by step and today I think that, while it
> still requires more documentation, it is ready to be announced and used
> by others.
>
> I present you Iterators : https://github.com/juliendelplanque/Iterators
>
> The idea is that, as described by the iterator design pattern, any
> object that needs to be walked provides one or many iterators.
>
> In the library, #iterator method is the convention to get the default
> iterator of any object (if it has one).
>
> Iterators provides a DSL to deal with iterators combination.
>
> It is inspired from shell’s streams manipulation syntax:
>
> - The pipe "|" allows one to chain iterators
> - The ">" allows one to create a new collection with data transformed
> through chained iterators
> - The ">>" allows one to fill an existing collection with data
> transformed through chained iterators
> For example, one can write:
>
> iterator := #(1 2 3) iterator.
> iterator
> | [ :x | x * 2 ] collectIt
> | [ :x :y | x + y ] reduceIt
>  > Array "#(12)"

Isn't this something readStream should provide?

   str := #(1 2 3) readStream.
   str
     | [ :x | x * 2 ] collectIt
     | [ :x :y | x + y ] reduceIt
     > Array "#(12)"

It is an object from which you take the front element, one at a time.

Why have something very similar with different name?

Herby

> Or
>
> iterator := #(1 2 3) iterator.
> collectionToFill := OrderedCollection new.
> iterator
> | [ :x | x * 2 ] collectIt
> | [ :x :y | x + y ] reduceIt
>  > collectionToFill.
> collectionToFill "anOrderedCollection(12)"
>
> The equivalent of "/dev/null" in Linux also exists:
>
> iterator := #(1 2 3) iterator.
> iterator
> | [ :x | x * 2 ] collectIt
> | [ :object | object logCr ] doIt "Just print incoming objects in
> transcript."
>  > NullAddableObject "Special object that ignore incoming objects."
>
> There are documentation and examples on the GitHub repository.
>
> —
>
> Initially, the goal was to avoid to duplicate all collection’s iterator
> methods (e.g. #collect:, #select:, etc) in composite objects.
>
> Thus, it provides an IteratorWithCollectionAPI which wrap an iterator
> and provides all the methods we want (#collect:, #select:, …).
>
> Via IteratorWithCollectionAPI, your objects automatically gets the
> Collection API, you just need to access it via #iterator message:
>
> myObject iterator select: [ :x | x isFoo ]
>
> This is another way to use the framework, just to avoid code duplication.
>
> —
>
> Future work is to provide the possibility to have iterator with multiple
> inputs.
>
> I already have an undocumented prototype on the repository that works
> like this:
>
> it1 := (1 to: 10) iterator.
> it2 := (1 to: 10) iterator.
> it1 & it2
> | [ :x :y | x@y ] mergeIt
>  > Array. "{(1@1). (2@2). (3@3). (4@4). (5@5). (6@6). (7@7). (8@8).
> (9@9). (10@10)}"
>
>
> Yes, "&" operator will again kind of mimic the one from the shell.
>
> —
>
> Hope it helps other people.
>
> Feedback is welcome.
>
> Cheers,
>
> Julien
>
> ---
> Julien Delplanque
> Doctorant à l’Université de Lille
> http://juliendelplanque.be/phd.html
> Equipe Rmod, Inria
> Bâtiment B 40, Avenue Halley 59650 Villeneuve d'Ascq
> Numéro de téléphone: +333 59 35 86 40
>



Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Iterators

Herby Vojčík
On 23. 8. 2019 19:23, Herby Vojčík wrote:> On 23. 8. 2019 16:14, Julien
wrote:
 >> Hello,
 >>
 >> I wanted to have an iterator framework for Pharo for a long time.
 >>
 >> So I started building it step by step and today I think that, while it
 >> still requires more documentation, it is ready to be announced and
 >> used by others.
 >>
 >> I present you Iterators : https://github.com/juliendelplanque/Iterators
 >>
 >> The idea is that, as described by the iterator design pattern, any
 >> object that needs to be walked provides one or many iterators.
 >>
 >> In the library, #iterator method is the convention to get the default
 >> iterator of any object (if it has one).
 >>
 >> Iterators provides a DSL to deal with iterators combination.
 >>
 >> It is inspired from shell’s streams manipulation syntax:
 >>
 >> - The pipe "|" allows one to chain iterators
 >> - The ">" allows one to create a new collection with data transformed
 >> through chained iterators
 >> - The ">>" allows one to fill an existing collection with data
 >> transformed through chained iterators
 >> For example, one can write:
 >>
 >> iterator := #(1 2 3) iterator.
 >> iterator
 >> | [ :x | x * 2 ] collectIt
 >> | [ :x :y | x + y ] reduceIt
 >>  > Array "#(12)"
 >
 > Isn't this something readStream should provide?
 >
 >    str := #(1 2 3) readStream.
 >    str
 >      | [ :x | x * 2 ] collectIt
 >      | [ :x :y | x + y ] reduceIt
 >      > Array "#(12)"
 >
 > It is an object from which you take the front element, one at a time.
 >
 > Why have something very similar with different name?
 >
 > Herby
Now that I think about it, the problem may be a nomenclature.

This is my understanding, fix me if I am mistaken:

There are pull-based and push-based sequences out there. In Smalltalk it
can be said a sequence is pull-based if it has #next and #atEnd; it is
push-based if it has #do:.

AFAICT the tools to read pull-based sequence is called an iterator, and
it can have transformations like filter, map etc., sometimes called
"ix". The approach to transform push-based sequence (called observable)
is called reactive programming ("rx").

It seems you create rx-like library, but called the object iterator.

Maybe the set of operations you provide should be define for both
push-based and pull-based sequences, and called names that conform to
common canon.

Herby


Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Iterators

Steffen Märcker
In reply to this post by Julien Delplanque-2
Hi Julien,

nice work! Could you please tell how your approach is related to
transducers from the user perspective and technically?
(https://github.com/Pharophile/Transducers)

Your example suggests that the API is quite similar to the data flow API
of transducers. Let me show your example using transducers.

> result := (#+ init: 0) <~ [:x | x * 2] map <~ #(1 2 3).
> OrderedCollection with: result.

Or the more classical way:

> result := #(1, 2, 3)
>     transduce: [:x | x * 2] map
>     reduce: #+
>     init: 0.
> OrderedCollection with: result.


Best regards,
Steffen

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Iterators

Richard O'Keefe
In reply to this post by Herby Vojčík
Distinguishing between "pull-based" and "push-based" streams in
Smalltalk makes no sense, because
   do: aBlock
      [self atEnd] whileFalse: [aBlock value: self next].
is in the same <gettableStream> protocol in the ANSI standard
as #atEnd and #next, and in most Smalltalk systems it is in
Stream.  There should never be anything that has #atEnd and
#next that doesn't have #do:.

In some Smalltalk systems, Stream and Collection both inherit
from a common superclass.  In GNU Smalltalk, it's called Iterable,
and provides, amongst other things
  stream1 , stream2
  stream allSatisfy: filterBlock
  stream anySatisfy: filterBlock
  stream collect: transformBlock
  stream count: filterBlock
  stream detect: filterBlock
  stream detect: filterBlock ifNone: exceptionBlock
  stream do: actionBlock
  stream do: actionBlock separatedBy: separatorBlock
  stream fold: combinerBlock
  stream inject: initial into: combinerBlock
  stream noneSatisfy: filterBlock
  stream reject: filterBlock
  stream select: filterBlock

My Smalltalk library does not do that, but it does have an
Enumerable class that's somewhat similar (the distinction is
that multiple traversal of a Collection must work, but only
single traversal of an Enumerable is required), so

  readStream asEnumerable reject: filterBlock

This makes about 257 Enumerable methods available.

Combinators are good, but they should fit in with existing
naming conventions.  *Arbitrary* overloadings are a bad idea.

The pipe "|" allows one to chain iterators
- The ">" allows one to create a new collection with data transformed through chained iterators
- The ">>" allows one to fill an existing collection with data transformed through chained iterators
 
The vertical bar is used in Smalltalk for "or".  It would make sense
to have
   MonadicBlock
     methods for: 'combinators'
       | other
         ^[:x | (self value: x) | (other value: x)]
and so on.  Smalltalk already has a selector that means "concatenation",
namely #, so
   stream1 , stream2
as in GNU Smalltalk would not be too confusing.

Similarly, ">" means "greater than", and it would be deeply confusing to
use it for some sort of data construction.  We would expect
stream1 > stream2
iff the future values of stream1 (as a sequence) are lexicographically
greater than the future values of stream2 (as a sequence).  Or you
could argue for "the relative order of two streams is the relative
order of their positions if the have the same underlying Collection or
file system object, otherwise undefined".


Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Iterators

Julien Delplanque-2
In reply to this post by Steffen Märcker


> Le 24 août 2019 à 09:29, Steffen Märcker <[hidden email]> a écrit :
>
> Hi Julien,

Hello Steffen,

>
> nice work! Could you please tell how your approach is related to
> transducers from the user perspective and technically?
> (https://github.com/Pharophile/Transducers)

It is quite similar, except that I actually started it the other way around:

- First I wanted objects that can iterate on data structure (or any object)
- Second I wanted object to eventually provide multiple iterators in order to walk them in multiple ways
- Third I came with the IteratorWithCollectionAPI decorator which implements #select:, #collect: and so one in order to avoid re-implementing these methods on objects wrapping a collection
- Since IteratorWithCollectionAPI is a decorator, I came up with the idea to push further the composition feature and at this point, I get Iterators to be similar to Transducer

The original idea of Iterators can be summarised as:

myObjectThatShouldImplementCollectionButIsNotACollection iterator collect: #foo.

myObjectThatShouldImplementCollectionButIsNotACollection iterator do: #foo.

myObjectThatShouldImplementCollectionButIsNotACollection iterator doWithIndex: #foo.



Technically I don’t know what are the similarities as I did not dig in Transducers implementation.

From the DSL design point of view, I wanted that the composition through binary messages keeps the data flow to go from top left to down right when you read the code.

This gave me the idea to mimic shell language for stream manipulation.

Hope it answers your questions.

Cheers,

Julien