about bufferedFileStream


about bufferedFileStream

Stéphane Ducasse
Hi Levente,

I was wondering why the bufferedFileStream made the system faster.
I mean, are we querying the same contents on several occasions?

Stef

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: [Pharo-project] about bufferedFileStream

Igor Stasenko
2010/1/21 Stéphane Ducasse <[hidden email]>:
> Hi Levente,
>
> I was wondering why the bufferedFileStream made the system faster.
> I mean, are we querying the same contents on several occasions?
>
Because disk access is many times slower than even the most "intense" memory access.

> Stef
>



--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] about bufferedFileStream

Nicolas Cellier
In reply to this post by Stéphane Ducasse
2010/1/21 Stéphane Ducasse <[hidden email]>:
> Hi Levente,
>
> I was wondering why the bufferedFileStream made the system faster.
> I mean, are we querying the same contents on several occasions?
>
> Stef
>

My interpretation is this:

Overhead time spent > core time required.
Whether you read/write 1 byte or 2000 bytes, the cost of the primitive
call + underlying system call (fread/fwrite) is almost the same
(the overhead time dominates).
If you issue 2000 primitive calls to read bytes one by one, versus a
single call reading them all at once, the main difference is 1999 * overhead.
The buffer uses the much faster ByteString/ByteArray primitives
#at:/#at:put:, whose cost is negligible.
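
The effect Nicolas describes can be sketched outside Smalltalk, too. Below is a rough Python analogy (not Pharo code): `os.read` is unbuffered, so each call costs roughly one system call, and reading N bytes one call at a time pays that overhead N times, while a large request pays it once.

```python
import os
import tempfile
import time

# Scratch file with ~100 KB of known bytes to read back.
data = bytes(range(256)) * 391          # 100 096 bytes
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

def read_bytewise(path):
    """One os.read call (one system call) per byte: pays the per-call overhead every time."""
    fd = os.open(path, os.O_RDONLY)
    chunks = []
    while True:
        b = os.read(fd, 1)
        if not b:
            break
        chunks.append(b)
    os.close(fd)
    return b"".join(chunks)

def read_chunked(path, chunk_size=1 << 20):
    """Large requests: the same bytes, a handful of calls' worth of overhead."""
    fd = os.open(path, os.O_RDONLY)
    chunks = []
    while True:
        b = os.read(fd, chunk_size)
        if not b:
            break
        chunks.append(b)
    os.close(fd)
    return b"".join(chunks)

t0 = time.perf_counter()
bytewise = read_bytewise(path)
t1 = time.perf_counter()
chunked = read_chunked(path)
t2 = time.perf_counter()

assert bytewise == chunked == data      # identical contents either way
print(f"byte-by-byte: {t1 - t0:.4f} s, chunked: {t2 - t1:.4f} s")
os.remove(path)
```

Both functions return the same bytes; only the number of calls differs, and the byte-by-byte version is dramatically slower.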

Nicolas



Re: [Pharo-project] about bufferedFileStream

Nicolas Cellier
In reply to this post by Igor Stasenko
2010/1/21 Igor Stasenko <[hidden email]>:
> 2010/1/21 Stéphane Ducasse <[hidden email]>:
>> Hi Levente,
>>
>> I was wondering why the bufferedFileStream made the system faster.
>> I mean, are we querying the same contents on several occasions?
>>
> Because disk access is many times slower than even the most "intense" memory access.
>

It depends on how the file access is parameterized... Maybe the OS is
buffering behind our back anyway.
There is also a cost associated with the primitive (converting the file
ID into a C FILE*, consistency checks, etc.), and maybe one associated
with fread() itself (it would be worth testing that with a simple C
program).

Nicolas

>> Stef
>>


Re: [Pharo-project] about bufferedFileStream

Schwab,Wilhelm K
In reply to this post by Stéphane Ducasse
Stef,

Probably unrelated: some time back I noticed that "long" methods pretty clearly draw twice, once in black and again with Shout's coloring.

Bill



-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Stéphane Ducasse
Sent: Thursday, January 21, 2010 12:16 PM
To: [hidden email] Development
Subject: [Pharo-project] about bufferedFileStream

Hi Levente,

I was wondering why the bufferedFileStream made the system faster.
I mean, are we querying the same contents on several occasions?

Stef


Re: [Pharo-project] about bufferedFileStream

Levente Uzonyi-2
In reply to this post by Stéphane Ducasse
On Thu, 21 Jan 2010, Stéphane Ducasse wrote:

> Hi Levente,
>
> I was wondering why the bufferedFileStream made the system faster.
> I mean, are we querying the same contents on several occasions?

As Nicolas said, not all primitives have the same cost. The file
primitives cost a lot more than primitive 65 or plain Smalltalk code.
And as Igor said, accessing the disk costs a lot more than accessing
memory, but that only matters in rare cases.

Here's an example showing that the file primitive costs almost the same
whether it reads 1 byte or 2000:

#(1 2000) collect: [ :bufferSize |
	(1 to: 5) collect: [ :run |
		StandardFileStream readOnlyFileNamed: SourceFiles second name do: [ :file |
			| buffer fileID |
			fileID := file instVarNamed: #fileID.
			buffer := String new: bufferSize.
			Smalltalk garbageCollect.
			[ 1 to: 100000 do: [ :each |
				file
					primSetPosition: fileID to: 0;
					primRead: fileID into: buffer startingAt: 1 count: bufferSize ] ] timeToRun ] ] ]

===> #(
  #(458 466 463 466 467)
  #(482 484 492 483 484))

This means that a primitive send takes ~2.5 microseconds, so it can be
issued ~400000 times per second.
If you read one byte per primitive call, that means 400KB/sec; if you
read 2000 bytes per call, you get 800MB/sec (your disk may be slower, of course).

Using the buffer you get the following values:
(1 to: 10) collect: [ :run |
	StandardFileStream readOnlyFileNamed: SourceFiles second name do: [ :file |
		Smalltalk garbageCollect.
		[ 1 to: 100000 do: [ :each |
			file next ] ] timeToRun ] ]
===> #(15 14 12 11 12 12 11 12 12 12)

It takes ~12 milliseconds to read 100KB if you read the bytes
one by one. That means 8.3MB/sec. This is far from 800MB/sec, but much
better than 400KB/sec.
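
The arithmetic behind these figures can be double-checked directly (a sketch; the measured numbers are taken from the results above):

```python
# ~465 ms for 100 000 loop iterations, each issuing two primitive sends
ms_for_100k_iterations = 465
primitive_sends = 2 * 100_000
us_per_send = ms_for_100k_iterations * 1000 / primitive_sends
print(us_per_send)                      # ~2.3 µs, quoted above as "~2.5"

sends_per_sec = 1 / 2.5e-6              # ~400 000 primitive sends per second
print(sends_per_sec * 1 / 1e3)          # 1 byte/send     -> ~400 KB/s
print(sends_per_sec * 2000 / 1e6)       # 2000 bytes/send -> ~800 MB/s

# Buffered case: 100 000 single-byte reads via #next in ~12 ms
bytes_per_sec = 100_000 / 0.012
print(bytes_per_sec / 1e6)              # ~8.3 MB/s
```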


Why does it matter? Because MultiByteFileStream (which is the default
FileStream) can only read (and convert) single bytes.


Levente


>
> Stef
>

Re: [Pharo-project] about bufferedFileStream

Stéphane Ducasse
Thanks.
I was wondering whether, in addition to chunked versus single-element
reading, the logic of the tools was being fooled, for example by accessing
the source code twice (maybe once to identify the selector and once for
the method body, or something like that).

Stef

On Jan 21, 2010, at 8:19 PM, Levente Uzonyi wrote:

> On Thu, 21 Jan 2010, Stéphane Ducasse wrote:
>
>> Hi Levente,
>>
>> I was wondering why the bufferedFileStream made the system faster.
>> I mean, are we querying the same contents on several occasions?
>
> As Nicolas said, not all primitives have the same cost. The file primitives cost a lot more than primitive 65 or some smalltalk code.
> And as Igor said accessing the disk costs a lot more than accessing memory, but that only counts in rare cases.

Yes, I can imagine that.
Marcus in the past did an experiment keeping all the code in memory, and it made source code searches fast :)

> Why does it matter? Because MultiByteFileStream (which is the default FileStream) can only read (and convert) single bytes.

OK, so this means that reading files could really be sped up by supporting a kind of "larger chunk" reading.
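
The "larger chunk" idea can be sketched as a tiny buffered reader. This is a hypothetical illustration in Python, not Pharo's actual BufferedFileStream: the expensive call happens once per chunk, while individual bytes come from memory.

```python
import os

class ChunkBufferedReader:
    """Hypothetical sketch: refill an in-memory chunk with one large
    os.read, then serve single bytes from it until it runs out."""

    def __init__(self, path, chunk_size=2000):
        self.fd = os.open(path, os.O_RDONLY)
        self.chunk_size = chunk_size
        self.buffer = b""
        self.pos = 0

    def next(self):
        """Return the next byte as an int, or None at end of file."""
        if self.pos >= len(self.buffer):
            # The only expensive call: one read per chunk_size bytes.
            self.buffer = os.read(self.fd, self.chunk_size)
            self.pos = 0
            if not self.buffer:
                return None
        byte = self.buffer[self.pos]
        self.pos += 1
        return byte

    def close(self):
        os.close(self.fd)
```

Every call to `next` beyond the first in a chunk touches only the in-memory buffer, so the per-call overhead is paid once per chunk_size bytes instead of once per byte.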
