Dear Squeakers,
I want to count files with a certain extension in a folder recursively. Here is the code I use:

    | dir count runtime |
    count := 0.
    dir := FileDirectory on: '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'.
    runtime := Time millisecondsToRun: [
        dir directoryTreeDo: [:each |
            (each last name endsWith: '.emlx') ifTrue: [count := count + 1]]].
    {count. runtime}. #(289747 66109)

As you can see, it finds 289,747 files and takes about 66 seconds. Is there any faster way to do this given the current VM primitives?

The reason I ask is that the equivalent Python code takes between 1.5 and 6 seconds. :-/

    #!/usr/local/bin/python3
    import os
    import time

    path = '/Users/bernhard/Library/Mail/V4/D77E3582-7EBE-4B5A-BFE0-E30BF6AE995F/Smalltalk.mbox/Squeak.mbox'

    print(path)

    start = time.time()
    emlx = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith('.emlx'):
                emlx += 1

    runtime = time.time() - start

    print(emlx, runtime)

It seems to have to do with an optimized os.scandir() function, described here: https://www.python.org/dev/peps/pep-0471/

Cheers,
Bernhard
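For illustration, the os.scandir() approach that PEP 471 describes can also be used directly. The point of os.scandir() is that each directory entry usually carries its file type already, so deciding whether an entry is a directory normally needs no extra stat call per file. The following is a minimal sketch, not the code that was timed above; the function name count_files_with_suffix is hypothetical, and the sketch counts every non-directory entry whose name matches.

```python
import os

def count_files_with_suffix(root, suffix):
    """Count non-directory entries under root whose names end with suffix."""
    count = 0
    stack = [root]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # is_dir() normally uses the type info returned with the
                    # directory listing, so no separate stat call is needed.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.name.endswith(suffix):
                        count += 1
        except OSError:
            # Unreadable directory: skip it and keep going.
            continue
    return count
```

It would be called as count_files_with_suffix(path, '.emlx'), with path being the mailbox folder from the script above.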
It is probably far too bit-rotted to be of any use now, but here is what I came up with 15 years ago to improve this:

http://wiki.squeak.org/squeak/2274

I did briefly look at this again a couple of years ago, and put the updates on SqueakSource. But I think I found that the directory primitives are nowhere near as big a win now as they were 15 years ago. Nevertheless it may still be of some interest.

Dave
Hi Dave,
Thanks for the answer. I guess I would need to build the latest version of the plugin myself, right? (I am on macOS Sierra.)

I could load DirectoryPlugin. However, VMConstruction-Plugins-DirectoryPlugin needs InterpreterPlugin to be available.

Bernhard
Hi Bernhard,
InterpreterPlugin is part of the VMMaker package, so you would need to be working in an image with VMMaker loaded (maybe one of the prepared images from Eliot's site).

I should have checked my own notes before replying - I cannot explain the reason for this, but it seems that the readdir() primitives no longer provided any performance benefit when I tested them a couple of years ago. Here is what I wrote in the summary on http://www.squeaksource.com/DirectoryPlugin:

  Performance characteristics have changed significantly since Squeak circa 2003. The readdir() primitives no longer provide any benefit, but the file testing primitives still yield a couple orders of magnitude improvement for some functions.

So ... I guess that some additional profiling would be in order.

Dave
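The summary above separates two costs: enumerating directory entries (readdir()) and testing each file. One rough way to start the profiling mentioned here is to time the two separately. The sketch below does this in Python for a single directory; it says nothing about the Squeak primitives themselves, the function name time_listing is hypothetical, and the second pass may benefit from the operating system's cache, so the numbers are only indicative.

```python
import os
import time

def time_listing(path):
    """Return (seconds for names only, seconds for names plus one lstat per entry)."""
    # Pass 1: enumerate names only.
    start = time.perf_counter()
    os.listdir(path)
    names_only = time.perf_counter() - start

    # Pass 2: enumerate names and query file metadata for each entry.
    start = time.perf_counter()
    for name in os.listdir(path):
        os.lstat(os.path.join(path, name))
    names_and_stat = time.perf_counter() - start

    return names_only, names_and_stat
```

If the second figure dominates, the per-file tests rather than the directory enumeration are the place to look.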
On Mon, Oct 17, 2016 at 1:17 PM, David T. Lewis <[hidden email]> wrote:

> Hi Bernhard,
>
> InterpreterPlugin is part of the VMMaker package, so you would need to be working in an image with VMMaker loaded (maybe one of the prepared images from Eliot's site).

There aren't any. There is a script in the image subdirectory of http://www.github.com/opensmalltalk/vm which builds one; see image/buildspurtrunkvmmakerimage.sh

_,,,^..^,,,_
best, Eliot
The whole image-side code starting from #directoryTreeDo: could use some optimization, but that would only make it at most 1.5x faster. If I were you, I'd use OSProcess and execute this:

    find directory -name '*.emlx'

It's not that nice, but it shouldn't take more than a second to find the files.

Levente
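The same idea of delegating the walk to find and only counting its output can be sketched in Python, to match the script earlier in the thread. This is an illustration of the technique, not the OSProcess code suggested above; the function name count_with_find is hypothetical, and the -type f and -print0 options are additions to the command as posted (they restrict matches to regular files and make names containing newlines count once).

```python
import subprocess

def count_with_find(root, pattern):
    """Run find under root and count the regular files whose names match pattern."""
    result = subprocess.run(
        ["find", root, "-type", "f", "-name", pattern, "-print0"],
        check=True,              # raise if find exits with an error
        stdout=subprocess.PIPE,  # capture the list of matching paths
    )
    # Each matching path is terminated by a NUL byte, so counting NULs counts files.
    return result.stdout.count(b"\0")
```

It would be called as count_with_find(path, '*.emlx'). Because the arguments are passed as a list, no shell is involved and the pattern needs no extra quoting.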