Thoughts on Xtreams

Chris Cunnington-4
I'm using the Altitude web framework. It is based on Xtreams. I've tied myself into knots and now I have had an epiphany about what the problem is. I don't think I have a working model in my head about what Xtreams is doing or why. I'm going to say a few things here in the hope that other people will comment.

My conception of Streams comes from Ch. 10 of "Squeak By Example". This is a very friendly chapter (Thank you Stephane Ducasse), because the examples are short, complete and fit into a Workspace.

There are some examples on the Xtreams Google Code page that do fit into a Workspace, but I don't think they help understanding Xtreams, really. There is a bigger picture solution that Xtreams addresses. I can see where it's pointing, I just cannot see how to use it yet.

The problem is I keep wanting to see a whole stream at once, to put it into a Workspace. Xtreams does not let you see a whole stream coming over a socket at once. Ever. It's maddening. I want to see what I'm dealing with and then deal with it. All I see are classes piled on top of each other.

And then I remembered a conversation I had at an event where someone said, "Xtreams are just like the infinite streams described in 'The Structure And Interpretation Of Computer Programs'. Isn't that cool? You can process an infinite stream."

Yes, I nodded. It sure is cool. I made a mental note to look up streams in SICP at a future date.

This is that date. And SICP has some fascinating things to say about streams. From section 3.5:

"Stream processing lets us model systems that have state without ever using assignment or mutable data. This has important implications, both theoretical and practical, because we can build models that avoid drawbacks inherent in introducing assignment. On the other hand, the stream framework raises difficulties of its own, and the question of which modelling technique leads to more modular and easily maintained systems remains open."

"Unfortunately, if we represent sequences as lists, this elegance is bought at the price of severe inefficiency with respect to both time and space required by our computations. When we represent manipulations on sequences as transformations of lists, our programs must construct and copy data structures (which may be huge) at every step of the process."

Our current streams implementation is inefficient because it copies everything whole, which consumes masses of unnecessary memory. With Xtreams we can manage an infinitely long stream, saving resources. But it leads to the problem I've been complaining about, which is that I cannot see the stream I'm working on. That is something I'm going to have to adapt to. It sounds worth the effort. But as the first passage makes clear, it's modelling the data in a different, dare I say non-object-oriented way (functional?).
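
To make the contrast concrete, here is a minimal sketch in Python rather than Smalltalk. It is only an analogy for the SICP idea and does not use the Xtreams API; the names `squares_of_evens` and `first` are made up for illustration.

```python
from itertools import count, islice

def squares_of_evens(numbers):
    """Lazily keep the even numbers and square them, one element at a time."""
    for n in numbers:
        if n % 2 == 0:
            yield n * n

def first(n, stream):
    """Pull only the first n elements out of a (possibly infinite) stream."""
    return list(islice(stream, n))

# count() never ends, yet nothing is copied or materialised up front:
# each element is produced only when the consumer asks for it.
result = first(5, squares_of_evens(count()))
print(result)
```

The price is the one described above: there is never a moment at which the whole stream exists in memory to be inspected, only the elements that have been asked for so far.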

Why do we want Xtreams? It saves resources and allows us to process infinitely. In comparison, our Streams classes are crude and inefficient. What price do we have to pay? We need to mentally conceptualize streams in a different -- perhaps, extremely different -- way.

I do not have a mental model for what Xtreams is doing yet. How do other people conceptualize using Xtreams in contrast to the existing Streams implementation?

Chris







Re: Thoughts on Xtreams

Colin Putney-3


On Fri, Oct 2, 2015 at 8:06 AM, Chris Cunnington <[hidden email]> wrote:
 
> I do not have a mental model for what Xtreams is doing yet. How do other people conceptualize using Xtreams in contrast to the existing Streams implementation?

I think the elegance of Xtreams boils down to one thing: composition is better than inheritance. 

If you look at the Squeak stream hierarchy, it's a nightmare. Among the highlights:

  • There's a whole subhierarchy dedicated to compression. It successfully shares code for different types of compression but since DeflateStream inherits from PositionalStream, it can only write compressed data into memory. If you want to write compressed data to a file, you'll have to write to a compression stream, get its contents and write that to a file stream.
  • On the flip side, there's CrLfFileStream, which converts line endings when reading and writing to a file. Except wait, it's obsolete now and CrLfFileStream new actually returns an instance of MultiByteFileStream. This class has an annoying name, because it's camel-cased on syllable boundaries as well as word boundaries. Ugh. Worse, it combines line-end conversion with encoding conversion, but only when in text mode. Well, most of the time, when in text mode. You gotta be careful about those few methods that manipulate the file position in terms of bytes, because that can leave it in the middle of a multibyte character and then nothing works right. And if you want to do any of this conversion on data in memory, you're outta luck because MultiByteFileStream only works on data in files.
  • Luckily MultiByteBinaryOrTextStream is here to save the day. (Again with the capitals on syllable breaks.) It *does* work on data in memory. It has a whole separate implementation of the encoding and line-ending conversion code, plus no-op implementations of the file-related stuff in MultiByteFileStream so the two are polymorphic. So convenient.
  • There's also ReadWriteStream, which subclasses WriteStream, and re-implements all of ReadStream's functionality.
  • There's also SocketStream, for convenience in doing network IO. Oh wait, it's not part of the Stream hierarchy at all. Never mind.
There's more (lots more), but let's not get sidetracked. The point is that there is just no way to have a sane inheritance hierarchy for a whole bunch of orthogonal concerns:
  • the underlying data storage - memory, file, socket or something more exotic
  • data transformation - encoding, compression, buffering, chunking etc
  • reading vs writing
Where Squeak streams try to do everything in one object, and combine different options via inheritance, Xtreams splits a stream into a pipeline of objects that each provide a separate bit of functionality. It's so much more flexible.
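
The same compositional idea can be sketched outside Smalltalk. The example below is Python, not Xtreams, and is only an analogy: it stacks the standard-library classes `gzip.GzipFile` and `io.TextIOWrapper` on top of an arbitrary byte sink, so that line-end conversion, encoding, and compression are separate layers. The helper name `compressed_text_writer` is hypothetical.

```python
import gzip
import io

def compressed_text_writer(sink, encoding="utf-8", newline="\r\n"):
    """Build a write pipeline: text -> line endings + encoding -> gzip -> sink.

    `sink` can be any writable binary object: memory, a file, a socket file.
    """
    compressor = gzip.GzipFile(fileobj=sink, mode="wb")
    return io.TextIOWrapper(compressor, encoding=encoding, newline=newline)

# The storage layer is chosen independently of the transformations.
sink = io.BytesIO()                  # could equally be open("out.gz", "wb")
writer = compressed_text_writer(sink)
writer.write("héllo\nworld\n")
writer.close()                       # flushes every layer; the sink stays open

compressed = sink.getvalue()
```

Swapping the `BytesIO` for a file object changes where the bytes land without touching the compression or encoding layers, which is the problem the DeflateStream bullet above describes.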

In the specific case of Altitude, Xtreams provide two main advantages:

First, the framework can build a custom pipeline of streams based on message headers. To handle a request, we just examine the headers, build the appropriate sequence of transformation streams and hand that off to the application for reading. When the response is ready, we again look at the headers, build a stream that performs all the transformations that the app has indicated it wants, and let the app write into it. We can use any of the features of HTTP, while still providing a simple and consistent interface to the app.

Second, it really helps with perceived performance. Browsers have incredibly optimized strategies for processing data as it arrives from the network. They're really good at rendering a page incrementally, adding content and refining the appearance as more data arrives. Xtreams allows Altitude to take advantage of that by doing the same thing on the server side. As the app is rendering content, we push it through the transformation streams in small chunks, and get it on the network as quickly as possible. (ALStreamingExample is a good demo of this.)

Of course, there are a lot of other little details to like about Xtreams, but I think composition is what really makes it shine.

Colin




Re: Thoughts on Xtreams

Chris Cunnington-4


On 2015-10-03 1:45 AM, Colin Putney wrote:


> Where Squeak streams try to do everything in one object, and combine different options via inheritance, Xtreams splits a stream into a pipeline of objects that each provide a separate bit of functionality. It's so much more flexible.

I think I'm starting to see Xtreams like this.

A block can be used as a filter for iterating over the elements of a stream. If you have different filter blocks, then you can wrap each of them in a class and have a library of different filters. If you compose these classes, then you have a pipeline of blocks that does a variety of things at once.

This has a functional feel because you cannot stop and examine things, because a filter has no state. It's just passing things on. Once it's composed, you need to wait until the end to see what happened. As such, it can handle infinite streams, as suggested in SICP.

> In the specific case of Altitude, Xtreams provide two main advantages:
>
> First, the framework can build a custom pipeline of streams based on message headers. To handle a request, we just examine the headers, build the appropriate sequence of transformation streams and hand that off to the application for reading.
> When the response is ready, we again look at the headers, build a stream that performs all the transformations that the app has indicated it wants, and let the app write into it. We can use any of the features of HTTP, while still providing a simple and consistent interface to the app.

request := ALAuthenticationRequest new.    
aRequest readEntityWith: [:in| ALJsonParser parse: in for: request ]

This was doing my head in. I expected all the stream contents to be rallied in the aRequest location, the whole stream would be there, so I could see it. But the Relay/Transforms phase is not over at that point. There is no state. It's not until the data is all in the ALAuthenticationRequest that I can examine it. That block is like another Xtreams filter. Stateless. All it does is sort things as they go by. And that's why I was confused. Foiled expectations.

> Second, it really helps with perceived performance. Browsers have incredibly optimized strategies for processing data as it arrives from the network. They're really good at rendering a page incrementally, adding content and refining the appearance as more data arrives. Xtreams allows Altitude to take advantage of that by doing the same thing on the server side. As the app is rendering content, we push it through the transformation streams in small chunks, and get it on the network as quickly as possible. (ALStreamingExample is a good demo of this.)

I think the word here is chunking. Since HTTP/1.1 came out I imagine clients have been optimizing for this. Servers? Not so much. I imagine that servers have never worked very hard to send data with the fine level of granularity equivalent to what clients can receive.
But I suppose this is really about buffer size and flushing frequency. I imagine Seaside could chunk its responses, but instead saves every response into one large buffer and then flushes once.
With Altitude I can put 'self halt' in the middle of a page and watch half a page render. The buffer size is set to 1K and flushes when the buffer is full, which doesn't take long.
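
The flush-when-full behaviour can be sketched as follows. This is a Python illustration, not Altitude's code: the class name `ChunkingWriter` and its `send` callback are assumptions, and only the 1K buffer size comes from the description above.

```python
class ChunkingWriter:
    """Collects written bytes and hands them to `send` whenever the buffer fills."""

    def __init__(self, send, buffer_size=1024):
        self.send = send
        self.buffer_size = buffer_size
        self.buffer = bytearray()

    def write(self, data):
        self.buffer.extend(data)
        # Flush every full buffer's worth; keep the remainder for later.
        while len(self.buffer) >= self.buffer_size:
            self.send(bytes(self.buffer[:self.buffer_size]))
            del self.buffer[:self.buffer_size]

    def close(self):
        # Push out whatever is left once rendering has finished.
        if self.buffer:
            self.send(bytes(self.buffer))
            self.buffer.clear()

sent = []                         # stands in for the network
out = ChunkingWriter(sent.append)
out.write(b"a" * 1500)            # the first 1024 bytes leave immediately
halfway = len(sent)               # a halt here finds part of the page already sent
out.write(b"b" * 1000)            # fills the buffer a second time
out.close()                       # the tail follows at the end
```

A framework that buffers the whole response would correspond to a `buffer_size` larger than any page, so that the only send happens in `close`.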

Chris









Re: Thoughts on Xtreams

Colin Putney-3


On Mon, Oct 5, 2015 at 9:04 AM, Chris Cunnington <[hidden email]> wrote:

> But I suppose this is really about buffer size and flushing frequency. I imagine Seaside could chunk its responses, but instead saves every response into one large buffer and then flushes once.
> With Altitude I can put 'self halt' in the middle of a page and watch half a page render. The buffer size is set to 1K and flushes when the buffer is full, which doesn't take long.

Right. Seaside feels slower, even when the rendering time is the same. However, Seaside handles application errors more gracefully than Altitude. If there's an error halfway down the page, Seaside catches it and sends a nice error page to the browser, with a walkback and a debug link. Altitude can't do that, since the headers and half the page have already been sent. Different tradeoffs.





Re: Thoughts on Xtreams

mkobetic
In reply to this post by Chris Cunnington-4
"Chris Cunnington"<[hidden email]> wrote:
> This has a functional feel because you cannot stop and examine things,
> because a filter has no state. It's just passing things on. Once it's
> composed, you need to wait until the end to see what happened. As such,
> it can handle infinite streams, as suggested in SICP.

Debugging streams is definitely an issue that could use attention. To be fair, I don't think it's any worse with Xtreams than it is with the classic streams. Sure, if you have an in-memory stream you just dive into the underlying collection, but that's the same with Xtreams. Frameworks like Altitude have very good reasons for not doing that, however. Admittedly, that doesn't change the fact that it's still challenging to debug those kinds of streams.

There are some basic tools that could be used in some circumstances. There's DuplicateRead/WriteStream that can be injected at any point of the stream stack/pipeline to copy whatever flows through it into another destination (Transcript, Stdout, file etc). Problem is, you're usually handed a ready made stack of streams and you shouldn't be making assumptions about its composition.
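
The idea of a duplicating stream injected into a pipeline can be sketched like this. It is a Python analogy only, not the Xtreams DuplicateRead/WriteStream classes; `tap` and `upper` are hypothetical names.

```python
def tap(stream, sink):
    """Pass every element through unchanged while copying it to `sink`."""
    for item in stream:
        sink.append(item)   # side channel: could be a log, a file, stdout...
        yield item

def upper(stream):
    """An ordinary transformation stage further down the pipeline."""
    for item in stream:
        yield item.upper()

seen = []
pipeline = upper(tap(iter(["get", "post"]), seen))
result = list(pipeline)     # draining the pipeline also fills `seen`
```

The sketch also shows the limitation mentioned above: the tap has to be inserted while the pipeline is being built, which is not possible when you are handed a ready-made stack.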

That however is exactly what a competent stream inspector needs to do. Ultimately it needs to break open the encapsulation, carefully step around the internals of whatever stream layer it's digging into without triggering a state change, and excavate whatever relevant bit of info there is. There is a poor man's skeleton of something like that in the printOn:/streamingPrintOn: methods, but it's too bare bones to be useful in many cases. TBH, I'm not convinced that's the right way to go about it either.

> I think the word here is chunking. Since HTTP/1.1 came out I imagine
> clients have been optimizing for this. Servers? Not so much. I imagine
> that servers have never worked very hard to send data with the fine
> level of granularity equivalent to what clients can receive.
> But I suppose this is really about buffer size and flushing frequency. I
> imagine Seaside could chunk its responses, but instead saves every
> response into one large buffer and then flushes once.
> With Altitude I can put 'self halt' in the middle of a page and watch
> half a page render. The buffer size is set to 1K and flushes when the
> buffer is full, which doesn't take long.

I'd say it's actually the other way around: it's the servers that care about chunking. The protocol requires that the server either specifies the size of the response in bytes, or chunks it (I'm deliberately ignoring the HTTP/1.0 zombie). So for any dynamically computed response the server will want to chunk. Otherwise it can't send any bytes until the full response is generated. Note that you may have to actually generate several copies of the content in various stages of transfer encoding in order to get the correct final byte size. Stream processing + chunking is really the only sane way to handle that IMO.
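
For reference, the HTTP/1.1 chunked transfer coding frames each fragment as a hexadecimal length, CRLF, the data, CRLF, and ends the body with a zero-length chunk. A minimal Python sketch of that framing follows (no trailers or chunk extensions; the function name `chunked` is made up, and this is not Altitude's implementation):

```python
def chunked(fragments):
    """Frame an iterable of byte fragments as an HTTP/1.1 chunked body."""
    for fragment in fragments:
        if fragment:  # an empty chunk would signal end-of-body, so skip it
            yield b"%X\r\n" % len(fragment) + fragment + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk followed by the empty trailer section

# Fragments can be sent as soon as they are rendered; the total size
# never has to be known in advance.
body = b"".join(chunked([b"Hello, ", b"", b"world!"]))
```

Because `chunked` is itself lazy, it can sit at the end of a rendering pipeline and emit each frame as soon as the fragment before it is available.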

Browsers actually don't really care whether the response is chunked or not, they will partially render whatever fragment of the page they have as soon as they get it. Granted I'm handwaving over the much more complex reality of today's web pages. Regardless it's the server side where chunking enables sending incomplete response fragments.

Martin


Re: Thoughts on Xtreams

Frank Shearar-3
In reply to this post by Chris Cunnington-4
On 5 October 2015 at 17:04, Chris Cunnington <[hidden email]> wrote:

> request := ALAuthenticationRequest new.
> aRequest readEntityWith: [:in| ALJsonParser parse: in for: request ]
>
> This was doing my head in. I expected all the stream contents to be rallied
> in the aRequest location, the whole stream would be there, so I could see
> it. But the Relay/Transforms phase is not over at that point. There is no
> state. It's not until the data is all in the ALAuthenticationRequest that I
> can examine it. That block is like another Xtreams filter. Stateless. All it
> does is sort things as they go by. And that's why I was confused. Foiled
> expectations.

I have a minor quibble with "stateless" in that sentence. An Xtreams
object can be as stateful as you like - streams that count, group,
or deduplicate must all keep some kind of state.

But I think what you're talking about is that when you compose this
pipeline of processors, you have assembled a _computation_, you have
not yet _computed_ anything. So the state of the stream itself isn't
visible. Bear in mind too that Xtreams is _lazy_: you only get to see
the transformed output of a stream upon request. Exactly like a nested
pipeline of blocks, in fact.
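
A small sketch of both points, stateful stages and laziness, written in Python as an analogy rather than with the Xtreams API (the names `deduplicate` and `log` are hypothetical):

```python
def deduplicate(stream, log):
    """A stateful stage: it remembers what it has already passed on."""
    seen = set()
    for item in stream:
        log.append(item)          # record that this element was actually pulled
        if item not in seen:
            seen.add(item)
            yield item

log = []
pipeline = deduplicate(iter("aabca"), log)
pulled_before = list(log)         # still empty: composing computed nothing
first_two = [next(pipeline), next(pipeline)]
pulled_after = list(log)          # only as much input as was needed was read
```

The state (`seen`) lives inside the stage, but it only advances when a consumer asks for the next element, which is why there is nothing to look at right after the pipeline has been assembled.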

It would be super cool if there was an inspector that could _show_ you
how the composed stream will manipulate the data...

frank