Newbie to Smalltalk and VW.
I am interested in reading and writing some large, complicated binary files. Is there any documentation on how to get started with such a project in VW? In Squeak, I had to practically roll my own solution and the I/O performance is terrible (about 50x slower than C, 25x slower than Python). I wanted to give VW a try and see if the JIT helped performance.

The binary format uses every C type in the book. So I am looking for big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned integers, IEEE 32-bit and 64-bit floating point types.

Any pointers appreciated.

Thanks,

David
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
David Finlayson wrote:
> Newbie to Smalltalk and VW.

Welcome!

> I am interested in reading and writing some large, complicated binary
> files. Is there any documentation on how to get started with such a
> project in VW? In Squeak, I had to practically roll my own solution
> and the I/O performance is terrible (about 50x slower than C, 25x
> slower than Python). I wanted to give VW a try and see if the JIT
> helped performance.
>
> The binary format uses every C type in the book. So I am looking for
> big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned
> integers, IEEE 32-bit and 64-bit floating point types.

The situation is pretty much the same on VW: roll your own.
This may look bad, but since you seem to want max performance this is a good thing: it allows you to tweak the code for speed, and since you own it, no third-party stuff will break when you do so. VW has a decent profiler in the Advanced Tools package, you will need it ;-)

The JIT of VW will make things faster relative to Squeak, but be aware that it only does local optimizations. It can optimize a single method but has no algorithms to optimize code spread out over several methods (inlining, specialization etc). The upshot is that you will want to write your inner loops such that each is defined inside a single method so the JIT has a chance to optimize them (IOW do some manual inlining to minimize message sends).

I also recently found that parsing need not be the bottleneck. A while ago I played with parsing the (binary) iTunes library of an iPod and discovered that parsing several MB took a split second (recursive descent parser). When I hooked that parser up to a node builder the performance dropped dramatically though. I guess this means that if you are parsing the file to extract only part of the information, then it pays off to handle such filtering in your parser and not further down the line.

As usual, watch out for premature optimizations: write your code as naively as possible, then *measure* where the time goes in this baseline implementation before considering an optimization (and as mentioned above, make sure to measure the performance of not only the parser but also the rest of your data path: node building, tree walking, tree transforming etc). I've been surprised by such measurements many times; rarely was the time spent where I predicted it would be....

Enjoy!

Reinout
-------
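A rough way to get the baseline measurement Reinout describes, before reaching for the profiler, is to time a naive read loop in a workspace. This is only a sketch: the file name 'data.bin' is a placeholder, and the loop deliberately does nothing but count bytes, so it measures raw stream overhead rather than any real parsing.

```smalltalk
"Workspace sketch: time a naive byte-by-byte read as a baseline.
 'data.bin' is a placeholder file name, not something from this thread."
| stream count ms |
count := 0.
ms := Time millisecondsToRun: [
    stream := 'data.bin' asFilename readStream.
    stream binary.
    "Read one byte at a time; close the file even if the loop fails."
    [[stream atEnd] whileFalse: [stream next. count := count + 1]]
        ensure: [stream close]].
Transcript show: count printString, ' bytes in ', ms printString, ' ms'; cr
```

Running the same timing around the full data path (parsing plus whatever builds objects from the parsed values) would show which stage actually dominates.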
On Wed, 20 Aug 2008 10:14:29 +0200
Reinout Heeck <[hidden email]> wrote:

> The situation is pretty much the same on VW: roll your own.
> This may look bad but since you seem to want max performance this is a
> good thing, it allows you to tweak the code for speed - since you own it
> no third party stuff will break when you do so. VW has a decent profiler
> in the Advanced Tools package, you will need it ;-)

Another helpful item would be Andres Valloud's "Mentoring Course for Smalltalk", readily available at lulu.com.

s.
In reply to this post by David Finlayson-4
For some starting tips, see:
http://www.cincomsmalltalk.com/userblogs/cincom/blogView?content=smalltalk_daily_libraries

Scroll down through the topics; there's a screencast on binary files there.

James Robertson
Cincom Smalltalk Product Evangelist
http://www.cincomsmalltalk.com/blog/blogView
Talk Small and Carry a Big Class Library
In reply to this post by David Finlayson-4
David,
I'd recommend trying to read a file into an instance of UninterpretedBytes, and then using methods such as doubleAt: to read objects from it. Those methods are primitives, so assuming the format is stable enough and the required coordination is not too onerous, this approach may pay off.

Just a thought,
Andres.

PS: I see your address is from USGS. Are you by any chance doing work with earthquakes? I live in California and I really can't help wondering about the mathematical models behind e.g. the shaking forecast...
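Andres's suggestion might look something like the following workspace sketch. It makes several assumptions not stated in his message: the file name 'data.bin' is a placeholder, the file is small enough to hold in memory, its first eight bytes are an IEEE 64-bit double in the platform's byte order, and the ByteArray is converted with changeClassTo: as described elsewhere in this thread.

```smalltalk
"Workspace sketch: read a whole file, then decode values by offset.
 'data.bin' and the layout (a double at offset 1) are assumptions."
| stream bytes value |
stream := 'data.bin' asFilename readStream.
stream binary.
"Read everything in one go; close the file even on error."
bytes := [stream upToEnd] ensure: [stream close].
"Reinterpret the ByteArray so the primitive accessors are available."
bytes changeClassTo: UninterpretedBytes.
"Offsets are 1-based."
value := bytes doubleAt: 1.
Transcript show: value printString; cr
```

With this approach the parser walks an index through the buffer rather than pulling bytes from a stream, so each decoded value costs one primitive call instead of a read plus a conversion.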
In reply to this post by Reinout Heeck-2
On Aug 20, 2008, at 1:14 AM, Reinout Heeck wrote:

> David Finlayson wrote:
>> Newbie to Smalltalk and VW.
>
> Welcome!

Ditto.

>> I am interested in reading and writing some large, complicated binary
>> files. Is there any documentation on how to get started with such a
>> project in VW? In Squeak, I had to practically roll my own solution
>> and the I/O performance is terrible (about 50x slower than C, 25x
>> slower than Python). I wanted to give VW a try and see if the JIT
>> helped performance.
>>
>> The binary format uses every C type in the book. So I am looking for
>> big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned
>> integers, IEEE 32-bit and 64-bit floating point types.
>
> The situation is pretty much the same on VW: roll your own.
> This may look bad but since you seem to want max performance this is a
> good thing, it allows you to tweak the code for speed - since you own it
> no third party stuff will break when you do so. VW has a decent profiler
> in the Advanced Tools package, you will need it ;-)

The scaffolding for what you want to do is mostly there; you just need to add some methods. I've built binary reading abilities twice now at other companies.

First of all, you make sure your stream is in binary mode:

    stream := 'somefilename.ext' asFilename readStream.
    stream binary

Then you're going to add a number of new methods to the Stream class (put them in your own package). Here's an example:

    Stream>>nextBESigned32
        | bytes |
        bytes := self next: 4.
        bytes changeClassTo: UninterpretedBytes.
        ^bytes longAt: 1 bigEndian: true

What this method does is read N bytes (where N is the byte size of the entity you're trying to interpret from the stream). Since the stream is in binary mode, next: returns a ByteArray. But all of the "interpret these bytes as this kind of fp/integer" methods are on a similar class called UninterpretedBytes. So we convert the byte array to that class type, and then invoke the appropriate API.

You can browse UninterpretedBytes and see all of the reading methods there. To avoid having to do the conversion when I've done these, I've copied the various floatAt:bigEndian: etc. APIs over to ByteArray.

--
Travis Griggs
Objologist
"Every institution finally perishes by an excess of its own first principle." - Lord Acton
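The original question also asks for 24-bit integers, which the thread does not cover. One option, in the same style as Travis's Stream extensions, is to assemble the value by hand from three bytes. The two selectors below are hypothetical names chosen to match his naming pattern, and the sketch assumes the stream is in binary mode so that next answers an Integer between 0 and 255.

```smalltalk
Stream>>nextBEUnsigned24
    "Hypothetical helper: read three bytes, most significant first,
     and combine them into an unsigned 24-bit value."
    | b1 b2 b3 |
    b1 := self next.
    b2 := self next.
    b3 := self next.
    ^(b1 bitShift: 16) + (b2 bitShift: 8) + b3

Stream>>nextBESigned24
    "Hypothetical helper: reinterpret the unsigned value as
     two's complement (values with the top bit set are negative)."
    | u |
    u := self nextBEUnsigned24.
    ^u >= 16r800000
        ifTrue: [u - 16r1000000]
        ifFalse: [u]
```

A little-endian variant would read the bytes in the same order but shift the third byte by 16 and leave the first unshifted. Keeping each reader in a single short method also fits the earlier advice about giving the JIT self-contained inner loops.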
In reply to this post by Andres Valloud-3
That was what I was looking for. I haven't had time to read through all the VW documentation yet, just trying to wrap my head around it all.

> PS: I see your address is from USGS. Are you by any chance doing work
> with earthquakes? I live in California and I really can't help
> wondering about the mathematic models behind e.g.: the shaking forecast...

Not directly. I do sonar signal processing for our Coastal and Marine Geology program. I recently worked on a project to locate the San Andreas Fault beneath San Francisco's reservoir system (Crystal Springs). We do a lot of custom programming to extract the data we need from sonar and seismic instruments. I am just exploring the pros and cons of a higher-level language than C for some of our upcoming projects.