Hi Levente

I was wondering why the bufferedFileStream made the system faster.
I mean, are we querying the same contents on several occasions?

Stef

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
2010/1/21 Stéphane Ducasse <[hidden email]>:
> Hi Levente
>
> I was wondering why the bufferedFileStream made the system faster.
> I mean, are we querying the same contents on several occasions?

Because disk access is many times slower than any "intense" memory access.

> Stef

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Stéphane Ducasse
2010/1/21 Stéphane Ducasse <[hidden email]>:
> Hi Levente
>
> I was wondering why the bufferedFileStream made the system faster.
> I mean, are we querying the same contents on several occasions?
>
> Stef

My interpretation is this one: overhead time spent > core time required.

Whether you read/write 1 byte or 2000 bytes, the cost of the primitive call + underlying system call (fread/fwrite) is almost the same: the overhead time dominates. If you issue 2000 primitive calls reading bytes one by one, versus a single call reading all of them at once, the main difference is 1999 * overhead. The buffer itself uses much faster primitives (ByteString/ByteArray #at:/#at:put:), so its cost doesn't count.

Nicolas
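Nicolas's "1999 * overhead" argument can be sanity-checked outside Smalltalk. Below is a rough Python sketch, not the Pharo primitives: `os.read` issues one unbuffered system call per invocation, standing in for the file primitive, and the scratch file comes from `tempfile`:

```python
import os
import tempfile
import time

# Create a small scratch file to read back.
payload = b"x" * 2000
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

fd = os.open(path, os.O_RDONLY)

# One call per byte: 2000 system calls, the overhead is paid 2000 times.
os.lseek(fd, 0, os.SEEK_SET)
t0 = time.perf_counter()
data_slow = b"".join(os.read(fd, 1) for _ in range(2000))
t_slow = time.perf_counter() - t0

# One call for everything: the overhead is paid once.
os.lseek(fd, 0, os.SEEK_SET)
t0 = time.perf_counter()
data_fast = os.read(fd, 2000)
t_fast = time.perf_counter() - t0

os.close(fd)
os.unlink(path)

assert data_slow == data_fast == payload
print(f"2000 x 1-byte reads: {t_slow * 1e6:.0f} us")
print(f"1 x 2000-byte read:  {t_fast * 1e6:.0f} us")
```

The exact timings depend on the machine, but the byte-by-byte loop should be slower by roughly the per-call overhead times 1999.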
In reply to this post by Igor Stasenko
2010/1/21 Igor Stasenko <[hidden email]>:
> 2010/1/21 Stéphane Ducasse <[hidden email]>:
>> Hi Levente
>>
>> I was wondering why the bufferedFileStream made the system faster.
>> I mean, are we querying the same contents on several occasions?
>
> Because disk access is many times slower than any "intense" memory access.

It depends on how the file access is parameterized... Maybe the OS is buffering behind our back anyway. There is also a cost associated with the primitive (converting the file ID into a C FILE*, consistency checks, etc.), and maybe one associated with fread() itself (it would be worth testing that with a simple C program).

Nicolas
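Whether or not the OS buffers behind our back, a buffer inside the image still saves the per-call overhead Nicolas describes. A small Python illustration, with `io.BufferedReader` standing in for a buffered file stream and a hypothetical `CountingRaw` class counting how many low-level reads actually happen:

```python
import io

class CountingRaw(io.RawIOBase):
    """Raw stream that counts how many low-level reads it receives."""
    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

raw = CountingRaw(b"a" * 1000)
buffered = io.BufferedReader(raw, buffer_size=4096)

# 1000 one-byte reads at the top layer...
data = b"".join(buffered.read(1) for _ in range(1000))

assert data == b"a" * 1000
print("low-level reads:", raw.calls)  # ...serviced by very few refills
```

The thousand `read(1)` calls at the top are absorbed by a handful of large refills at the bottom, which is exactly the effect a BufferedFileStream aims for regardless of any OS-level caching underneath.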
In reply to this post by Stéphane Ducasse
Stef,
Probably unrelated, but some time back I noted that "long" methods pretty clearly draw twice: once in black, and again with Shout's coloring.

Bill

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Stéphane Ducasse
Sent: Thursday, January 21, 2010 12:16 PM
To: [hidden email] Development
Subject: [Pharo-project] about bufferedFileStream

Hi Levente

I was wondering why the bufferedFileStream made the system faster.
I mean, are we querying the same contents on several occasions?

Stef
In reply to this post by Stéphane Ducasse
On Thu, 21 Jan 2010, Stéphane Ducasse wrote:
> Hi Levente
>
> I was wondering why the bufferedFileStream made the system faster.
> I mean, are we querying the same contents on several occasions?

As Nicolas said, not all primitives have the same cost. The file primitives cost a lot more than primitive 65 or some Smalltalk code. And as Igor said, accessing the disk costs a lot more than accessing memory, but that only counts in rare cases.

Here's an example showing that the file primitive costs almost the same whether it reads 1 or 2000 bytes:

#(1 2000) collect: [ :bufferSize |
    (1 to: 5) collect: [ :run |
        StandardFileStream readOnlyFileNamed: SourceFiles second name do: [ :file |
            | buffer fileID |
            fileID := file instVarNamed: #fileID.
            buffer := String new: bufferSize.
            Smalltalk garbageCollect.
            [ 1 to: 100000 do: [ :each |
                file
                    primSetPosition: fileID to: 0;
                    primRead: fileID into: buffer startingAt: 1 count: bufferSize ] ] timeToRun ] ] ]

===> #(
    #(458 466 463 466 467)
    #(482 484 492 483 484))

This means that a primitive send takes ~2.5 microseconds, so it can be used ~400000 times/second. If you read one byte per primitive, that means 400KB/sec; if you read 2000 bytes per primitive, you get 800MB/sec (your disk may be slower, of course).

Using the buffer, you get the following values:

(1 to: 10) collect: [ :run |
    StandardFileStream readOnlyFileNamed: SourceFiles second name do: [ :file |
        Smalltalk garbageCollect.
        [ 1 to: 100000 do: [ :each |
            file next ] ] timeToRun ] ]

===> #(15 14 12 11 12 12 11 12 12 12)

It takes ~12 milliseconds to read 100KB if you read the bytes one-by-one. That means 8.3MB/sec. This is far from 800MB/sec, but much better than the 400KB/sec.

Why does it matter? Because MultiByteFileStream (which is the default FileStream) can only read (and convert) single bytes.

Levente
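The refill pattern behind Levente's `file next` numbers can be sketched in any language. Here is a hedged Python version (`BufferedFileStreamSketch` is a hypothetical illustration, not the Pharo class): large reads refill an in-memory buffer, and single bytes are then served from memory at #at:-like cost.

```python
import io

class BufferedFileStreamSketch:
    """Minimal sketch of a buffered 'next'-style reader: large reads
    refill an in-memory buffer; single bytes are served from it."""
    BUFFER_SIZE = 2000  # matches the 2000-byte case in the benchmark

    def __init__(self, raw):
        self._raw = raw  # any object with .read(n) -> bytes
        self._buffer = b""
        self._pos = 0

    def next(self):
        """Return the next byte as a 1-byte bytes object, or b'' at EOF."""
        if self._pos >= len(self._buffer):
            # The only costly call: one big read per BUFFER_SIZE bytes.
            self._buffer = self._raw.read(self.BUFFER_SIZE)
            self._pos = 0
            if not self._buffer:
                return b""
        byte = self._buffer[self._pos:self._pos + 1]
        self._pos += 1
        return byte

data = b"Pharo sources"
stream = BufferedFileStreamSketch(io.BytesIO(data))
out = b"".join(iter(stream.next, b""))
assert out == data
```

With a 2000-byte buffer, only one expensive read happens per 2000 `next` calls, which is where the jump from ~400KB/sec to ~8.3MB/sec in the measurements above comes from.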
Thanks.
I was wondering if, in addition to chunk versus single-element reading, the logic of the tools was not fooled into, for example, accessing the source code twice (maybe once to identify the selector and once for the method body, or something like that).

Stef

On Jan 21, 2010, at 8:19 PM, Levente Uzonyi wrote:

> On Thu, 21 Jan 2010, Stéphane Ducasse wrote:
>
>> Hi Levente
>>
>> I was wondering why the bufferedFileStream made the system faster.
>> I mean, are we querying the same contents on several occasions?
>
> As Nicolas said, not all primitives have the same cost. The file primitives cost a lot more than primitive 65 or some Smalltalk code.
> And as Igor said, accessing the disk costs a lot more than accessing memory, but that only counts in rare cases.

Yes, I imagine that. Marcus in the past did an experiment getting all the code into memory, and it was fast to do some source code search :)

> Here's an example showing that the file primitive costs almost the same whether it reads 1 or 2000 bytes:
> [benchmark code snipped]
>
> It takes ~12 milliseconds to read 100KB if you read the bytes one-by-one. That means 8.3MB/sec. This is far from 800MB/sec, but much better than the 400KB/sec.
>
> Why does it matter? Because MultiByteFileStream (which is the default FileStream) can only read (and convert) single bytes.

OK, so this means that reading files could be really sped up by supporting a kind of "larger chunk" reading.

> Levente
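Stef's "larger chunk" idea is compatible with character conversion: an incremental decoder can consume whole buffers instead of single bytes. A hedged Python sketch (UTF-8 via the stdlib `codecs` module; Pharo's TextConverter machinery differs, so this only illustrates the principle):

```python
import codecs
import io

raw = io.BytesIO("héllo wörld".encode("utf-8"))

# Read in large chunks and feed them to an incremental decoder:
# conversion no longer forces one-byte-per-primitive reads.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = []
while True:
    chunk = raw.read(2000)  # one "primitive" call per 2000 bytes
    pieces.append(decoder.decode(chunk, final=not chunk))
    if not chunk:
        break

text = "".join(pieces)
assert text == "héllo wörld"
```

The decoder keeps any partial multi-byte sequence across chunk boundaries, so chunked reading stays correct even for variable-width encodings.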