[vwnc] Reading binary files

David Finlayson-4
Newbie to Smalltalk and VW.

I am interested in reading and writing some large, complicated binary
files. Is there any documentation on how to get started with such a
project in VW? In Squeak, I had to practically roll my own solution
and the I/O performance is terrible (about 50x slower than C, 25x
slower than Python). I wanted to give VW a try and see if the JIT
helped performance.

The binary format uses every C type in the book. So I am looking for
big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned
integers, IEEE 32-bit and 64-bit floating point types.

Any pointers appreciated.

Thanks,

David
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] Reading binary files

Reinout Heeck-2
David Finlayson wrote:
> Newbie to Smalltalk and VW.
>  
Welcome!


> I am interested in reading and writing some large, complicated binary
> files. Is there any documentation on how to get started with such a
> project in VW? In Squeak, I had to practically roll my own solution
> and the I/O performance is terrible (about 50x slower than C, 25x
> slower than Python). I wanted to give VW a try and see if the JIT
> helped performance.
>
> The binary format uses every C type in the book. So I am looking for
> big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned
> integers, IEEE 32-bit and 64-bit floating point types.
>  
The situation is pretty much the same on VW: roll your own.
This may look bad, but since you seem to want maximum performance it is a
good thing: it allows you to tweak the code for speed, and since you own it
no third-party stuff will break when you do so. VW has a decent profiler
in the Advanced Tools package; you will need it ;-)


The JIT of VW will make things faster relative to Squeak, but be aware
that it does only local optimizations: it can optimize a single method
but has no algorithms to optimize code spread out over several methods
(inlining, specialization, etc.). The fallout is that you will want to
write your inner loops such that each is defined inside a single method
so the JIT has a chance to optimize them (in other words, do some manual
inlining to minimize message sends).
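
As a minimal sketch of what such manual inlining could look like, the method below decodes big-endian unsigned 16-bit values with the whole inner loop written out in one method. The selector decodeBEUnsigned16: is hypothetical (it is not part of VisualWorks), and whether this actually beats a per-value helper method is something to measure rather than assume.

```smalltalk
decodeBEUnsigned16: aByteArray
	"Hypothetical helper, not part of VisualWorks. Answer an Array holding
	 the big-endian unsigned 16-bit values stored in aByteArray. The loop
	 body uses only indexed access and integer arithmetic, with no calls
	 to per-value helper methods."
	| count result |
	count := aByteArray size // 2.
	result := Array new: count.
	1 to: count do: [:i |
		| offset |
		offset := (i - 1) * 2.
		result
			at: i
			put: ((aByteArray at: offset + 1) bitShift: 8)
				+ (aByteArray at: offset + 2)].
	^result
```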


I also recently experienced that parsing need not be the bottleneck.
A while ago I played with parsing the (binary) iTunes library of an iPod
and discovered that parsing several MB took a split second (recursive
descent parser). When I hooked that parser to a node builder, though, the
performance dropped dramatically. I guess this means that if you
are parsing the file to extract only part of the information, it
pays off to handle such filtering in your parser and not further down
the line.

As usual, watch out for premature optimization: write your code as
naively as possible, then *measure* where the time goes in this baseline
implementation before considering an optimization (and, as mentioned
above, make sure to measure the performance of not only the parser but
also the rest of your data path: node building, tree walking, tree
transforming, etc.).
I've been surprised by such measurements many times; rarely was the time
spent where I predicted it would be....
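
A rough way to take such a baseline measurement from a workspace is sketched below. The two workloads are stand-ins invented for illustration (a plain scan over a zero-filled ByteArray versus the same scan with an object allocated per byte); in practice the blocks would wrap the real parser and the real node builder.

```smalltalk
"Workspace sketch. Both workloads are illustrative stand-ins, not a real
 parser or node builder."
| bytes scanMs buildMs |
bytes := ByteArray new: 1000000.
scanMs := Time millisecondsToRun: [
	| sum |
	sum := 0.
	1 to: bytes size do: [:i | sum := sum + (bytes at: i)]].
buildMs := Time millisecondsToRun: [
	| nodes |
	nodes := OrderedCollection new.
	1 to: bytes size do: [:i |
		nodes add: (Array with: i with: (bytes at: i))]].
Transcript
	show: 'scan only: ', scanMs printString, ' ms'; cr;
	show: 'scan plus node building: ', buildMs printString, ' ms'; cr
```

For anything beyond a quick check like this, the profiler mentioned above is the better tool, since it shows where inside each step the time goes.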




Enjoy!

Reinout
-------



Re: [vwnc] Reading binary files

Stefan Schmiedl
On Wed, 20 Aug 2008 10:14:29 +0200
Reinout Heeck <[hidden email]> wrote:

> The situation is pretty much the same on VW: roll your own.
> This may look bad, but since you seem to want maximum performance it is a
> good thing: it allows you to tweak the code for speed, and since you own it
> no third-party stuff will break when you do so. VW has a decent profiler
> in the Advanced Tools package; you will need it ;-)

Another helpful item would be Andres Valloud's "Mentoring Course for
Smalltalk", readily available at lulu.com.

s.

Re: [vwnc] Reading binary files

James Robertson-7
In reply to this post by David Finlayson-4
For some starting tips, see:

http://www.cincomsmalltalk.com/userblogs/cincom/blogView?content=smalltalk_daily_libraries


Scroll down through the topics; there's a screencast on binary files
there.

James Robertson
Cincom Smalltalk Product Evangelist
http://www.cincomsmalltalk.com/blog/blogView
Talk Small and Carry a Big Class Library




On Aug 20, 2008, at 2:02 AM, David Finlayson wrote:

> Newbie to Smalltalk and VW.
>
> I am interested in reading and writing some large, complicated binary
> files. Is there any documentation on how to get started with such a
> project in VW? In Squeak, I had to practically roll my own solution
> and the I/O performance is terrible (about 50x slower than C, 25x
> slower than Python). I wanted to give VW a try and see if the JIT
> helped performance.
>
> The binary format uses every C type in the book. So I am looking for
> big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned
> integers, IEEE 32-bit and 64-bit floating point types.
>
> Any pointers appreciated.
>
> Thanks,
>
> David

Re: [vwnc] Reading binary files

Andres Valloud-3
In reply to this post by David Finlayson-4
David,

I'd recommend trying to read a file into an instance of
UninterpretedBytes, and then using methods such as doubleAt: to read
objects from it.  Those methods are primitives, so assuming the format
is stable enough and the required coordination is not too onerous, this
approach may pay off.
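
A minimal workspace sketch of this suggestion follows. The file name is a placeholder, the assumption that a double sits at byte offset 1 is purely illustrative, and the changeClassTo: conversion is the one described elsewhere in this thread. The byte order doubleAt: uses has not been checked here, so it should be verified against the file format.

```smalltalk
"Workspace sketch. The file name is a placeholder, and the layout (a double
 stored at byte offset 1) is an assumption made for illustration."
| stream bytes first |
stream := 'somefilename.ext' asFilename readStream.
stream binary.
"Read the whole file as a ByteArray and close the stream even on error."
bytes := [stream upToEnd] ensure: [stream close].
"Reinterpret the ByteArray so that the primitive accessors are available."
bytes changeClassTo: UninterpretedBytes.
first := bytes doubleAt: 1
```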

Just a thought,
Andres.

PS: I see your address is from USGS.  Are you by any chance doing work
with earthquakes?  I live in California and I really can't help
wondering about the mathematical models behind, e.g., the shaking forecast...


David Finlayson wrote:

> Newbie to Smalltalk and VW.
>
> I am interested in reading and writing some large, complicated binary
> files. Is there any documentation on how to get started with such a
> project in VW? In Squeak, I had to practically roll my own solution
> and the I/O performance is terrible (about 50x slower than C, 25x
> slower than Python). I wanted to give VW a try and see if the JIT
> helped performance.
>
> The binary format uses every C type in the book. So I am looking for
> big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned
> integers, IEEE 32-bit and 64-bit floating point types.
>
> Any pointers appreciated.
>
> Thanks,
>
> David

Re: [vwnc] Reading binary files

Travis Griggs-3
In reply to this post by Reinout Heeck-2

On Aug 20, 2008, at 1:14 AM, Reinout Heeck wrote:

> David Finlayson wrote:
>> Newbie to Smalltalk and VW.
>>
> Welcome!

Ditto.

>
>
>
>> I am interested in reading and writing some large, complicated binary
>> files. Is there any documentation on how to get started with such a
>> project in VW? In Squeak, I had to practically roll my own solution
>> and the I/O performance is terrible (about 50x slower than C, 25x
>> slower than Python). I wanted to give VW a try and see if the JIT
>> helped performance.
>>
>> The binary format uses every C type in the book. So I am looking for
>> big- and little-endian support; 8, 16, 24, 32 bit signed and unsigned
>> integers, IEEE 32-bit and 64-bit floating point types.
>>
> The situation is pretty much the same on VW: roll your own.
> This may look bad, but since you seem to want maximum performance it is a
> good thing: it allows you to tweak the code for speed, and since you own it
> no third-party stuff will break when you do so. VW has a decent profiler
> in the Advanced Tools package; you will need it ;-)

The scaffolding for what you want to do is mostly there; you just need
to add some methods. I've built binary reading abilities twice now at
other companies.

First of all, you make sure your stream is in binary mode:

stream := 'somefilename.ext' asFilename readStream.
stream binary

Then you're going to add a number of new methods to the Stream class  
(put them in your own package). Here's an example:

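Stream>>nextBEUnsigned32
        "Answer the next 4 bytes as a big-endian unsigned 32-bit integer.
         The stream must be in binary mode."
        | bytes |
        bytes := self next: 4.
        bytes changeClassTo: UninterpretedBytes.
        ^bytes unsignedLongAt: 1 bigEndian: true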

This method reads N bytes (where N is the byte size of the entity
you're trying to interpret from the stream). Since you're in binary
mode, next: returns a ByteArray. But all of the "interpret these bytes
as this kind of fp/integer" methods are on a similar class called
UninterpretedBytes, so we convert the byte array to that class and then
invoke the appropriate API. You can browse UninterpretedBytes and see
all of the reading methods there.
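
The same pattern should carry over to the other types. As a sketch, a little-endian 32-bit float reader might look like the following; the selector nextLEFloat32 is made up for illustration, and floatAt:bigEndian: is assumed to take the same arguments as the unsigned accessor used above.

```smalltalk
Stream>>nextLEFloat32
	"Hypothetical extension method: answer the next 4 bytes as a
	 little-endian IEEE 32-bit float. Assumes the stream is in binary
	 mode, so that next: answers a ByteArray."
	| bytes |
	bytes := self next: 4.
	bytes changeClassTo: UninterpretedBytes.
	^bytes floatAt: 1 bigEndian: false
```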

To avoid having to do the conversion, when I've done this I've copied
the various floatAt:bigEndian: etc. APIs over to ByteArray.
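
For the 24-bit integers in the original question, one option is to assemble the value from individual bytes with plain arithmetic. The sketch below assumes there is no ready-made 24-bit accessor (this has not been checked); both selectors are hypothetical, and the stream must be in binary mode so that next answers an Integer.

```smalltalk
Stream>>nextBEUnsigned24
	"Hypothetical extension method: answer the next 3 bytes as a
	 big-endian unsigned 24-bit integer."
	| b1 b2 b3 |
	b1 := self next.
	b2 := self next.
	b3 := self next.
	^(b1 bitShift: 16) + (b2 bitShift: 8) + b3

Stream>>nextBESigned24
	"Hypothetical extension method: answer the next 3 bytes as a
	 big-endian twos-complement signed 24-bit integer."
	| value |
	value := self nextBEUnsigned24.
	^value >= 16r800000
		ifTrue: [value - 16r1000000]
		ifFalse: [value]
```

A little-endian variant would only swap the shifts applied to b1 and b3.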

--
Travis Griggs
Objologist
"Every institution finally perishes by an excess of its own first  
principle." - Lord Acton




Re: [vwnc] Reading binary files

David Finlayson-4
In reply to this post by Andres Valloud-3
That was what I was looking for. I haven't had time to read through
all the VW documentation yet, just trying to wrap my head around it
all.

>
> PS: I see your address is from USGS.  Are you by any chance doing work
> with earthquakes?  I live in California and I really can't help
> wondering about the mathematical models behind, e.g., the shaking forecast...
>

Not directly. I do sonar signal processing for our Coastal and Marine
Geology program. I recently worked on a project to locate the San
Andreas Fault beneath San Francisco's reservoir system (Crystal
Springs). We do a lot of custom programming to extract the data we need
from sonar and seismic instruments. I am just exploring the pros and
cons of a higher-level language than C for some of our upcoming
projects.