Re: Xtreams skipThroughAll: ?

kobetic
"Steven Kelly"<[hidden email]> wrote:

> Date: August 8, 2010 6:46:02 PM
> From: "Steven Kelly" <[hidden email]>
> To: "Michael Lucas-Smith"<[hidden email]>
> Cc: [hidden email]
> Subject: Re: [vwnc] Xtreams skipThroughAll: ?
>
> Thanks for replying, Michael! Unfortunately the content probably isn't regular enough for parsing. I tend to try a little light stream hacking for simple problems like these.
>  
> In most cases the bit of text I want to read varies - I can't search explicitly for "Indoor". But I do know what precedes and follows the bit of text I want.
>  
> Maybe this is just something that old streams do better than Xstreams? For now I'm just combining old and new:
>  
> substream rest readStream skipThroughAll: '<tr><td><b>'; upToAll: '</b>'

Maybe not quite as straightforward, but it doesn't seem that much worse to me:

        (substream ending: '<tr><td><b>') -= 0.
        (substream ending: '</b>') rest

Would it be worth extending the skipping/reading API for this, or is the ability to nest substreams sufficient?
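For reference, the two expressions above can be put together into one workspace snippet. This is only a sketch assembled from code quoted in this thread: the sample html string is borrowed from the follow-up message, and the variable names are illustrative.

```smalltalk
"Sample input, borrowed from the follow-up message in this thread."
html := '<tr><td><b>Indoor</b>&nbsp;</td>'.
substream := html reading.

"Skip through the leading marker by seeking to the end of a
 substream that ends at that marker."
(substream ending: '<tr><td><b>') -= 0.

"Read up to, but not including, the closing marker."
text := (substream ending: '</b>') rest.
```

Going by the positions reported later in the thread, text should end up holding the characters between the two markers.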
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Xtreams skipThroughAll: ?

Steven Kelly
> > For now I'm just combining old and new:
> >
> > substream rest readStream skipThroughAll: '<tr><td><b>'; upToAll:
> > '</b>'
>
> Maybe not quite as straightforward, but doesn't seem that much worse to
> me:
>
> (substream ending: '<tr><td><b>') -= 0.
> (substream ending: '</b>') rest

Thanks, Martin, that seems to do the trick. I was surprised by it,
though: why would positioning the new sub-substream formed by #ending:
also position the original substream? Here's my investigation of this,
with debugging print-it's indented.

html := '<tr><td><b>Indoor</b>&nbsp;</td>'.
substream := html reading.
   substream position. "0"
s2 := substream ending: '<tr><td><b>'.
   substream position. "0"
   s2 position. "0"
   s2 == substream. "false"
s2 -= 0.
   s2 position. "11"
   substream position. "11"

It's the last line that puzzled me. Obviously, this is how Xtreams
works, and maybe it's unavoidable and even desired, but it wasn't what I
expected. I think I'd also have expected s2 to be strictly before <tr>,
so -= 0 would have left it before the <tr>. Obviously, I need to do some
work with Xtreams to get my mental model right!

All the best,
Steve




Re: Xtreams skipThroughAll: ?

Michael Lucas-Smith-2
>
> s2 -= 0.
>   s2 position. "11"
>   substream position. "11"
>
> It's the last line that puzzled me. Obviously, this is how Xtreams
> works, and maybe it's unavoidable and even desired, but it wasn't what I
> expected. I think I'd also have expected s2 to be strictly before <tr>,
> so -= 0 would have left it before the <tr>. Obviously, I need to do some
> work with Xtreams to get my mental model right!

Creating an ending stream on the stream means it will read until it hits the thing it matches. The -= is a seek to 0 elements from the end of the stream, so it makes the stream read until it hits the match. This is going to move the ending stream and the main stream because it's not exploring. You could just as easily use #rest instead of '-= 0', except that -= 0 is a seek and won't create a big array in memory like #rest might accidentally do (if you're streaming over gigabytes of data looking for that elusive match).

Michael

Re: Xtreams skipThroughAll: ?

kobetic
Positioning certainly does have some interesting subtleties, but I'm curious why the behavior surprised you. What would you expect the positioning call to do in the context of the stream stack?

As Michael pointed out

        (substream ending: '<tr><td><b>') rest.
        (substream ending: '</b>') rest

would do the trick as well. I was just trying to mimic the skipThroughAll: behavior as closely as I could. Admittedly the situation is a bit more complicated with -=. Like most of the other positioning messages, it only works on positionable stacks. The only positioning message that works universally is ++ ("skip forward"). Currently, "-= 0" is the only way to say "skip to end", but it doesn't work universally because -= is more general and requires positionability in the general case. We didn't like adding a dedicated selector just for -= 0. Maybe we should special-case it and make it work universally for the 0 argument.
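To make the difference between the positioning messages concrete, here is a small sketch on a plain collection stream, which is positionable, so both ++ and -= apply. The variable names are illustrative only.

```smalltalk
"A collection-backed read stream is positionable."
s := 'abcdef' reading.

"Skip forward two elements; this is the message that works on any stream."
s ++ 2.
next := s get.

"Seek to two elements before the end; this needs a positionable stack."
s -= 2.
tail := s rest.

"Seek to the very end, i.e. 'skip to end'."
s -= 0.
```

On a non-positionable stack only the ++ step can be relied on, which is the limitation described above.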


Re: Xtreams skipThroughAll: ?

kobetic
[hidden email] wrote:
> ... We didn't like adding dedicated selector just for -= 0. Maybe we should special case it and make it work universally for the 0 argument.

Although, on second thought, what should we do for infinite streams?

        [ 1 ] reading -= 0
       
Spin ?
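Application code can sidestep the question by bounding the infinite source before asking for its end. The sketch below assumes that limiting: (the positionable substream mentioned elsewhere in this thread) can be stacked on a block-backed stream; the variable names are illustrative.

```smalltalk
"A block-backed stream never ends by itself: every read evaluates the block again."
ones := [ 1 ] reading.

"Cap it at three elements, so that 'the end' is well defined for the substream."
firstThree := (ones limiting: 3) rest.
```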

Re: Xtreams skipThroughAll: ?

Michael Lucas-Smith-2
> Maybe we should special case it and make it work universally for the 0 argument.
>
I'm pretty sure -= 0 is a special case already.

Michael

Re: Xtreams skipThroughAll: ?

Steven Kelly
Thanks for the explanations, guys! Since Martin asked why I was surprised, let's improve the example a little to show that better:
 
html := 'abc<tr><td><b>Indoor</b>&nbsp;</td>'.
s1 := html reading.
s2 := s1 ending: '<tr><td><b>'.

I'd have expected s2 to be essentially like: 'abc' reading. I knew or presumed that the underlying collection wasn't being copied, so expected that to be shared with s1, but I thought s2 would have its own position. I also thought the possible range of positions for s2 would only be those first three letters, since I asked for it exclusive of the #ending: match. After creating s2 I would have thought s1's position would either be 0 or after <tr><td><b>.
 
s2 -= 0.
 
I'd have thought this would place the position of s2 after 'abc', but before <tr><td><b> (which I wouldn't expect to find in s2 at all). I wouldn't expect it to affect s1's position at all. Why do I find that surprising? In Smalltalk if you send a message to an object, and it returns a new object of the same type as the first one, operations on the new object itself don't affect the parent object. I perceived s2 to be of the same type as s1, because it responded to the same API. I thought its job was to pretend to be "'abc' reading", but without actually copying the stream collection.
 
HTH,
Steve


Re: Xtreams skipThroughAll: ?

Michael Lucas-Smith-2
We've certainly thought about making the substreams have their position appear to be relative instead of absolute. It's on our todo list in fact.

Xtreams are a facade over the real volatile data, so there is no way it would be a regular object that is independent. That's definitely not what Xtreams are.

If you've ever seen the GIL library in Boost, it uses the same pattern to implement fancy facades over an n-d (2d) data set. It's an image manipulation library, so you can do things like:

image := filename readImage.
blackandwhite := image asBlackandwhite.

The blackandwhite variable will not actually copy any data at that point, so much like Xtreams, you can stack manipulations over the top of a single source and it's not until you actually go to process the data that anything is done. I've been pondering a GIL-like library with Xtreams because of the similarities.
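The same deferred behavior can be sketched with a stream stack. The example assumes the Xtreams collecting: transform; the counter and variable names are illustrative, and the point is only that building the stack does not touch the data.

```smalltalk
evaluations := 0.
source := 'indoor' reading.

"Stacking a transform only builds the facade; the block has not run yet."
shouting := source collecting: [:each |
	evaluations := evaluations + 1.
	each asUppercase].

"Elements are transformed only as they are pulled through the top of the stack."
result := shouting rest.
```

Inspecting evaluations before and after the last line should show that the work happens on reading, not on stacking.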

Michael



Re: Xtreams skipThroughAll: ?

Steven Kelly
My point wasn't about substream positions being relative instead of absolute, but about a substream having its own position, independent of the position in the parent stream. Remember though that I'm just answering Martin's question about what surprised me and why, not trying to say that what I thought was how Xtreams should behave. At most, my surprise might tell you that you could add a comment about position sharing to the substreams section on the Wiki.
 
I hadn't seen GIL but it looks nice: I'm a big fan of facades, both for clarity and performance. Mind you, I'd probably expect any graphicsPosition in blackandwhite to be freely movable without affecting the graphicsPosition in image :-). But I can see arguments for both.
 
Steve
 



Re: Xtreams skipThroughAll: ?

kobetic
"Steven Kelly"<[hidden email]> wrote:
> Thanks for the explanations, guys! Since Martin asked why I was surprised, let's improve the example a little to show that better:
>  
> html := 'abc<tr><td><b>Indoor</b>&nbsp;</td>'.
> s1 := html reading.
> s2 := s1 ending: '<tr><td><b>'.
>
> I'd have expected s2 to be essentially like: 'abc' reading. I knew or presumed that the underlying collection wasn't being copied, so expected that to be shared with s1, but I thought s2 would have its own position. I also thought the possible range of positions for s2 would only be those first three letters, since I asked for it exclusive of the #ending: match.

And that should be the case, but it looks like we have a bug. In general an 'ending:' substream is not positionable, but the way the seeking methods are implemented in the superclass, they ignore the isPositionable test. Moreover something like 's2 ++ 5' just breaks the substream instead of raising Incomplete. That needs to be fixed.


> After creating s2 I would have thought s1's position would either be 0 or after <tr><td><b>.
>  
> s2 -= 0.
>  
> I'd have thought this would place the position of s2 after 'abc', but before <tr><td><b> (which I wouldn't expect to find in s2 at all).

Semantically that should be the case.

> I wouldn't expect it to affect s1's position at all.

That isn't quite possible given our goals. Substreams are meant to work on any stream, including non-positionable ones, so they are written so that they only read from the underlying source; they never step back or anything like that. Therefore s2 necessarily has to read the whole boundary sequence from s1 before it can decide that it is truly at end.

> Why do I find that surprising? In Smalltalk if you send a message to an object, and it returns a new object of the same type as the first one, operations on the new object itself don't affect the parent object. I perceived s2 to be of the same type as s1, because it responded to the same API. I thought its job was to pretend to be "'abc' reading", but without actually copying the stream collection.

OK, so it seems we need to clarify this part about substreams. The source of the substream becomes an integral part of the substream (the substream just sits on top of the source stream stack), it is not some sort of substream factory that would be detached from the substream once it's created.

Anyway, s2 really should be like 'abc' reading for most intents and purposes. I guess the not-so-obvious aspect is the effects in the context of the whole stream stack. In some sense one would like to be able to ignore that part; after all, that's what Xtreams are all about, abstracting away the actual source. On the other hand, in real life, the stack has to be managed, so dealing with its structure and state at certain points is unavoidable. Admittedly it can be quite challenging to wrap your head around the state of a complex stream stack, and we certainly don't have any complete answers. We spent some time coming up with a reasonably helpful printString of stacks. It's not clear yet how successful we were. I think we'll have to improve on these issues as we gain some experience using Xtreams in real-life scenarios.
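The effect on the stack can be observed directly in a workspace. This sketch reuses the example string from earlier in the thread; the variable names before and after are illustrative.

```smalltalk
s1 := 'abc<tr><td><b>Indoor</b>&nbsp;</td>' reading.
s2 := s1 ending: '<tr><td><b>'.

"s2 presents only what precedes the boundary."
before := s2 rest.

"To find its own end, s2 had to pull the boundary itself out of s1,
 so s1 now continues after the boundary."
after := s1 rest.
```

If the explanation above holds, before is 'abc' and after starts at 'Indoor'.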


Re: Xtreams skipThroughAll: ?

Steven Kelly
>> I also thought the possible range of positions for s2 would only be those
>>  first three letters, since I asked for it exclusive of the #ending: match.
>
> And that should be the case, but it looks like we have a bug.
 
OK, thanks for letting me know. It's hard to know what's a mental model clash and what's a bug, when trying to learn something new and powerful like Xtreams!

>> I wouldn't expect (s2 -= 0) to affect s1's position at all.
>
> That isn't quite possible given our goals. ... s2 has to read the whole
> boundary sequence from s1 before it can decide that it is truly at end.
Got it. I suppose I thought s1 would be positioned after the ending <tr><td><b>, but s2 before it.  If s2 shares not only the underlying collection (or other source) but also the position, that obviously can't happen. I wonder whether giving s2 its own position would be impossible? Alternatively, maybe all s2 processing could be in a block (like exploring), during which position would be restricted to positions legal in s2; after the block finished, position of s1 would jump to after the ending <tr><td><b>. Something like this (which I imagine is possible to implement using the existing #exploring:)
 
s1 := 'abc<tr><td><b>Indoor</b>&nbsp;</td>' reading.
s1 ending: '<tr><td><b>' do: [:s2 |
   s2 get "returns $a"].
s1 get "returns $I"
 
I kind of like this approach: part of the problem in my earlier tests was that both s1 and s2 were in scope, and at the same level, so it seemed as if you should be able to manipulate both of them somewhat independently. Obviously you can't! Creating and using s2 within a block based on a message sent to s1 hints that you shouldn't mess with s1 while in the block (like not messing with a collection that you are iterating over).
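A minimal sketch of such an #ending:do: follows. It is a hypothetical extension method, not part of Xtreams, and it assumes it is installed on the Xtreams read stream class. It also does not use #exploring: as suggested above; it simply drains the substream with rest once the block is done.

```smalltalk
ending: aPattern do: aBlock
	"Hypothetical extension, not part of Xtreams. Evaluate aBlock with a
	 substream that ends at aPattern, then drain whatever the block left
	 unread, so that the receiver ends up just past the boundary."
	| sub |
	sub := self ending: aPattern.
	^[aBlock value: sub] ensure: [sub rest]
```

With this in place, the example above would leave s1 positioned at 'Indoor' no matter how much of s2 the block consumed. Draining with rest allocates the skipped elements, so a seek-based drain may be preferable for large inputs.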
 
> We spent some time coming up with reasonably helpful printString
> of stacks. It's not clear yet how successful we were.
 
Very! I haven't used it enough to answer fully, but I was already surprised to see such a useful printString. So much better than old VW streams, where the first thing I have to do is open Trippy, look at the position, go to the collection, scroll to the right index, select a few indexes either side, and then try and read the individual elements as a string!
 
All the best,
Steve


Re: Xtreams skipThroughAll: ?

kobetic
"Steven Kelly"<[hidden email]> wrote:
> >> I wouldn't expect (s2 -= 0) to affect s1's position at all.
> >
> > That isn't quite possible given our goals. ... s2 has to read the whole
> > boundary sequence from s1 before it can decide that it is truly at end.
>
> Got it. I suppose I thought s1 would be positioned after the ending <tr><td><b>, but s2 before it.  If s2 shares not only the underlying collection (or other source) but also the position, that obviously can't happen. I wonder whether giving s2 its own position would be impossible?

What would you want to get out of it? Maybe positionability of ending: substreams? It's probably doable, but at the time we decided to punt on that. Currently, ending: substreams are non-positionable, like many other transforms. Positionability is tricky in many cases, and I think we have a reasonably solid answer with our positioning layer, which can turn any stack into a positionable one for the cost of buffering. So if non-positionability of any type of transform is causing you grief, the positioning layer can hopefully help. In some sense making a transform positionable is an optimization (not just in terms of speed, but also in terms of memory or complexity in general). We try to do that at least for those transforms that are easily positionable (e.g. limiting: in the substream category).
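As an illustration of the positioning layer, the sketch below wraps an ending: substream so that it can step backwards. It assumes the layer is created with the #positioning message and that -- is the skip-backward counterpart of ++; check both against your Xtreams version.

```smalltalk
s1 := 'abc<tr><td><b>Indoor</b>&nbsp;</td>' reading.

"Wrap the non-positionable ending: substream in a buffering, positionable layer."
s2 := (s1 ending: '<tr><td><b>') positioning.

"Consume two elements."
firstTwo := s2 read: 2.

"Step back over them; the layer serves them from its buffer."
s2 -- 2.
again := s2 get.
```

The buffering is the cost mentioned above: the layer has to keep what it has read in order to replay it.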

> Alternatively, maybe all s2 processing could be in a block (like exploring), during which position would be restricted to positions legal in s2; after the block finished, position of s1 would jump to after the ending <tr><td><b>. Something like this (which I imagine is possible to implement using the existing #exploring:)
>  
> s1 := 'abc<tr><td><b>Indoor</b>&nbsp;</td>' reading.
> s1 ending: '<tr><td><b>' do: [:s2 |
>    s2 get "returns $a"].
> s1 get "returns $I"
>  
> I kind of like this approach: part of the problem in my earlier tests was that both s1 and s2 were in scope, and at the same level, so it seemed as if you should be able to manipulate both of them somewhat independently. Obviously you can't! Creating and using s2 within a block based on a message sent to s1 hints that you shouldn't mess with s1 while in the block (like not messing with a collection that you are iterating over).

I think you can say the same about any element of a stream stack. Messing with a stream in the middle of a stack is likely to mess up the transforms above it. So, as a general guideline, code should interact with the top of the stack. And that's consistent with the primary goal of abstracting away specifics of the actual source. Code using a complex stream stack should work the same with a simple collection stream with the same content as that produced by the stack. In that sense it shouldn't be aware of the inner structure of the stack.

Of course, it's not as simple as that. There must be other code that actually prepares the stack for the consumer. That code, on the other hand, necessarily has intimate knowledge of (at least some part of) the stack, to know when to add some transforms and when to take some away. Also, sometimes you want to fork the stack into several branches and read from different branches in an interleaved fashion. For example, one pattern we use is setting up several different interpreting: transforms on top of the same stack (an interpreting: transform interprets bytes from its source as a particular C type, e.g. int, float, double, ...), and reading from the right interpreter depending on what type of value is expected next. These kinds of tricks require understanding how the transform interacts with its source, e.g. you have to rely on the fact that an interpreting: transform will leave the source positioned right after the last byte of the last value read, so that you can interpret the next value. (Note that this would be the usual outcome with transforms that just read from the source, which we try to do with as many as we can.)
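The forking pattern might look roughly like the sketch below. The type symbols #unsignedLong and #float are assumptions about the names interpreting: accepts, and the zero-filled byte array only stands in for real binary data.

```smalltalk
"Sixteen zero bytes, comfortably more than one integer plus one float."
source := (ByteArray new: 16) reading.

"Two interpreters share one source."
ints := source interpreting: #unsignedLong.
floats := source interpreting: #float.

"Read an integer first."
count := ints get.

"Then read a float; this relies on ints having left the source
 positioned right after the last byte of the integer."
value := floats get.
```

The code that builds ints and floats needs to know about the shared source, while code that only consumes one of them does not.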

Anyway, I don't think we have enough experience with composition/management of stream stacks to provide a comprehensive guideline. We'll just need to learn more about it together.
 
> > We spent some time coming up with reasonably helpful printString
> > of stacks. It's not clear yet how successful we were.
>  
> Very! I haven't used it enough to answer fully, but I was already surprised to see such a useful printString. So much better than old VW streams, where the first thing I have to do is open Trippy, look at the position, go to the collection, scroll to the right index, select a few indexes either side, and then try and read the individual elements as a string!

Thanks. I believe there still are quite a few specific stream types that could print more informatively, but we'll improve that as we go.

Cheers,

Martin

Re: Xtreams skipThroughAll: ?

Steven Kelly
Thanks Martin, that all sounds good to me!

> > I wonder whether giving s2 its own position would be impossible?
>
> What would you want to get out of it ?

It would just help stop the side effects of some operations on s2 from
affecting s1. It sounds like a major change, though, and I don't think
there's enough evidence that it's worth it. I'll go with my #ending:do:
variant, implemented using #exploring:.
 
> Messing with a stream in the middle of a stack is likely to mess up the
> transforms above it. So, as a general guideline, code should interact
> with the top of the stack.

I agree. I'll just note in passing that that is exactly what I thought I
was doing: s2 is the top of the stack. I'll play around with #ending:do:
to see if I can effectively insulate the rest of the stack from position
changes.

Thanks for all your help!
Steve
