lesson learned trying external configuration of compression stream classes

Wayne Johnston
https://www.instantiations.com/docs/92/wwhelp/wwhimpl/js/html/wwhelp.htm#href=pr/compression2.html
Because the best compression choice depends on circumstances, I was tempted to not hard-code a decision on which read & write stream classes to use.  But I found at least for LZ4, the lz4Compress and lz4Decompress methods are much faster than the equivalent general stream code shown on that page.  Just FYI.

Still liking the idea of being able to configure the class, I may add my own methods on the class side of the stream classes.  For instance, in LZ4WriteStream I would add a method like #compress: which would simply use lz4Compress.
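
A minimal sketch of that idea, written in Python with standard-library codecs rather than the VA Smalltalk classes, since only the selectors quoted in this thread are known here. The class names `ZlibCodec` and `LzmaCodec`, the `CODECS` table, and `round_trip` are all hypothetical; the point is only that the caller picks a codec by configuration while each class-side method delegates to the fast one-shot call.

```python
import lzma
import zlib


class ZlibCodec:
    """Hypothetical wrapper: class-side helpers delegate to one-shot zlib calls."""

    @classmethod
    def compress(cls, data: bytes) -> bytes:
        return zlib.compress(data)

    @classmethod
    def decompress(cls, data: bytes) -> bytes:
        return zlib.decompress(data)


class LzmaCodec:
    """Hypothetical wrapper with the same interface, backed by one-shot lzma calls."""

    @classmethod
    def compress(cls, data: bytes) -> bytes:
        return lzma.compress(data)

    @classmethod
    def decompress(cls, data: bytes) -> bytes:
        return lzma.decompress(data)


# The externally configurable part: the codec class is looked up by name,
# so the choice is not hard-coded at the call site.
CODECS = {"zlib": ZlibCodec, "lzma": LzmaCodec}


def round_trip(codec_name: str, payload: bytes) -> bytes:
    """Compress and decompress payload with the configured codec."""
    codec = CODECS[codec_name]
    return codec.decompress(codec.compress(payload))
```

Because both classes answer the same two messages, swapping the configured name is the only change a caller needs.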

--
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/va-smalltalk/d015f615-b9a1-437b-b86d-4c0cf73b2d3a%40googlegroups.com.

Re: lesson learned trying external configuration of compression stream classes

Seth Berman
Greetings Wayne,

Your observation is correct, and that is the expected performance.  Each approach, streams vs. one-shot, has its own trade-offs.
For more information, see my "Unified Compression Streams" talk at ESUG 2019:
https://youtu.be/neTO5M1Y6e0

One-Shot
The most performant, but requires the data to be in-memory with the boundaries known.  This may or may not be the case.
And you can't fit the compression/decompression behavior seamlessly into existing usages of streams.

Streaming
Useful to fit into existing streams and can wrap memory/file/socket streams as they are composable and polymorphic with Streams.
Does NOT require that the size of the data be known or that it already be in memory.
For example, you can work on data as it comes across the socket.  You can write in-memory zip files across the socket without
ever needing file I/O.  If you have 1GB worth of data, you can work on it incrementally in small chunks instead of requiring the
creation of a 1GB ByteArray in memory.  Or perhaps the data is even too large for a Smalltalk container... you can stream across 16GB worth of data if you want.
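
To make the trade-off concrete, here is a rough sketch in Python, with the standard `zlib` module standing in for the VA Smalltalk one-shot methods and compression streams. The helper names `compress_one_shot`, `compress_streaming`, and `chunked` are made up for illustration. The one-shot call needs the whole input in memory; the streaming version only ever sees one chunk at a time and writes to any file-like sink.

```python
import io
import zlib


def compress_one_shot(data: bytes) -> bytes:
    """One-shot: the entire input must already be in memory."""
    return zlib.compress(data)


def compress_streaming(chunks, sink) -> None:
    """Streaming: feed chunks as they arrive and write output to a sink.

    `chunks` is any iterable of bytes; `sink` is any object with write().
    """
    compressor = zlib.compressobj()
    for chunk in chunks:
        sink.write(compressor.compress(chunk))
    # flush() emits whatever the compressor is still buffering internally.
    sink.write(compressor.flush())


def chunked(data: bytes, size: int):
    """Yield successive slices of data, simulating data arriving in pieces."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


payload = b"VA Smalltalk " * 10_000
one_shot = compress_one_shot(payload)

sink = io.BytesIO()  # stands in for a file or socket stream
compress_streaming(chunked(payload, 4096), sink)
streamed = sink.getvalue()
```

Both results decompress to the same payload; the difference is only in how much data has to be resident at once and whether the code composes with other streams.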

An example where the streams came in really handy was with IMAP.  IMAP supports the COMPRESS extension, which means that
the data going back/forth with the server is compressed using DEFLATE.  Normally, the connection works with a SocketStream, but
all I needed to do was subclass InflateReadStream to create SstImap4CompressionStream.  This stream wrapped the socketStream
and required very little on my part to get the behavior.  This would not have worked quite the same with one-shot APIs and would have
complicated the implementation quite a bit.

- Seth


Re: lesson learned trying external configuration of compression stream classes

Wayne Johnston
Thanks. Makes sense.

I was thinking Brotli was real slow, then I realized the default is maximum compression.

What would be cool is a way for a client to send a compressed string to a server, and the server to write the uncompressed string to a file, without the server needing the full uncompressed string in its memory.

Re: lesson learned trying external configuration of compression stream classes

Seth Berman
Hi Wayne,

Are you saying there is no way to support your scenario now?  How so?
If the server wrapped an algo-specific EsCompressionReadStream around the receive socket, then you could just
pull a buffer worth of data and pass that to a CfsWriteFileStream.
Then loop and keep doing this until all the data has been received.
Or I may have misunderstood what you were asking.

- Seth

Re: lesson learned trying external configuration of compression stream classes

Wayne Johnston
Sorry, I should have said with normal SST client-server communication.  Yes, I saw mention of being able to hook in with sockets, but I think that would require a separate (from SST) socket/port with this specialized handling.  Or I could just be displaying my ignorance...

Re: lesson learned trying external configuration of compression stream classes

Wayne Johnston
Scratch my socket thought.  I now remember my thinking.

If a server was sent (say) a zstdCompressed string which was originally 1GB, and the server wants to write the uncompressed string to a file, the server essentially has to do:
myCfsWriteStream nextPutAll: compressedBytes zstdUncompressed
thereby getting that whole 1GB string in memory.

Re: lesson learned trying external configuration of compression stream classes

Seth Berman
Hi Wayne,

To answer your first question, yes, you need an SstSocketStream from Server Smalltalk - TCP Communications as the source/sink to wrap an EsCompressionStream around.

As to your second point, I'm saying that it can be arranged such that 'compressedBytes' can be pulled off the socket incrementally and pushed to the file stream in parts.
It's certainly easier to do what you showed, I won't deny that.  But you can arrange things such that the server has a socket, wrapped by a socket stream.
So getting data on the server-side is a bunch of #next, #next:, #nextLine... actions on internally buffered data in the socket stream instead of a bunch of #recv calls on a primitive socket.

Once you know the next number of bytes to get, which may be zstdCompressed...then wrap a ZstdReadStream around the socket stream and go into a
next loop, grabbing bits of uncompressed zstd data and pushing it into a file stream.
Here is a really primitive example to make the point...and this is possibly blocking on zstdStream next since it wraps a socket stream.
bytesExpected timesRepeat: [fileStream nextPut: zstdStream next]
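
The same loop, moved from one byte at a time to one buffer at a time, might look like the following sketch. It is written in Python, with `zlib` (DEFLATE) standing in for Zstandard and `io.BytesIO` objects standing in for the socket stream and file stream; `inflate_to_sink` and its `chunk_size` parameter are hypothetical names. Only one compressed chunk and its inflated output are held at a time, so the full uncompressed payload is never built in memory.

```python
import io
import zlib


def inflate_to_sink(source, sink, chunk_size: int = 4096) -> int:
    """Read compressed bytes from source in chunks and write the
    decompressed bytes to sink. Returns the number of bytes written.

    source needs read(n); sink needs write(bytes). A real socket-backed
    source may block inside read() while waiting for more data.
    """
    decompressor = zlib.decompressobj()
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read signals end of input
            break
        data = decompressor.decompress(chunk)
        sink.write(data)
        written += len(data)
    # Emit anything still buffered inside the decompressor.
    tail = decompressor.flush()
    sink.write(tail)
    return written + len(tail)


original = b"line of text\n" * 100_000
source = io.BytesIO(zlib.compress(original))  # stands in for the socket stream
sink = io.BytesIO()                           # stands in for the file stream
written = inflate_to_sink(source, sink)
```

Here the end of the compressed data is marked by the source running dry; when the compressed payload is followed by other data on the same connection, the reader would instead have to stop after the expected number of bytes.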

- Seth

The trick here is to make sure you don't overread into the zstd stream since it is internally buffered.
If you don't want to mess around with

Re: lesson learned trying external configuration of compression stream classes

Seth Berman

"The trick here is to make sure you don't overread into the zstd stream since it is internally buffered.
If you don't want to mess around with"

- too much context switching...just forget this part at the end
