Hi!
I'm trying to do some OCR from Squeak using tesseract. I installed OSProcess. So far so good. But I don't know what classes to use for stdIn and stdOut. My code would look something like this:

    | proc stdIn stdOut d |
    stdIn := ???
    stdOut := ???
    proc := ExternalUnixOSProcess
        forkAndExec: '/usr/bin/tesseract'
        arguments: #('-' '-' '--dpi' '100')
        environment: nil
        descriptors: (Array with: stdIn with: stdOut with: nil).
    proc ifNil: [self class noAccessorAvailable].
    d := Delay forMilliseconds: 50.
    [proc runState == #complete] whileFalse: [d wait].
    " and now read the text from stdOut..."

Can someone fill in the blanks or point me to code that does similar things? Thanks very much.

Martin

On Thu, May 14, 2020 at 09:37:26PM +0200, Martin Kuball wrote:
> I'm trying to do some OCR from squeak using tesseract. I installed OSProcess.
> So far so good. But I don't know what classes to use for stdIn and stdOut.
> [...]
> Can someone fill in the blanks or point me to code that does similar things?
> Thanks very much.

Hi Martin,

First, please also install CommandShell in addition to OSProcess. Get the latest versions of both OSProcess and CommandShell, regardless of the version of Squeak you are using. If you are using SqueakMap to load them, then please select the versions labelled "(head)".

Start out by trying something like this:

    OSProcess outputOf: 'tesseract - - --dpi 100'

I'm not sure if this will do what you want but please give it a try, and if it does not work I'll try to give a better answer.

This uses a couple of new methods that I added to OSProcess recently, but have not mentioned until now. If it proves to be useful to you, you will be the first :-)

Assuming that it works, here is what will have happened:

- The argument string is parsed into a unix-style command pipeline
- The pipeline is all objects, with OS process proxies doing the work
- When evaluated, any stderr result will show up in an error notifier in your image (proceed through the notifier)
- Command stdout is collected and answered as the result of #outputOf:

I would recommend running this in a debugger so you can step through it and see what is going on.

Dave

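A minimal sketch of the pipeline behaviour described above, using a harmless command instead of tesseract. It assumes a Unix-like host with `echo` and `wc` on the PATH and recent OSProcess and CommandShell versions loaded, so that #outputOf: is available; the exact whitespace in the answer depends on the host's `wc`.

```smalltalk
"Evaluate in a workspace with 'print it'.
 Assumption: OSProcess and CommandShell (recent versions) are loaded."
| words |
"The command string is parsed into a two-stage pipeline: echo feeds wc."
words := OSProcess outputOf: 'echo this is a test | wc -w'.
"stdout of the last command in the pipeline is answered as a String."
words
```

If the command writes anything to stderr, expect the error notifier mentioned in the list above rather than a silent failure.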
If one ignores the message when selecting "head", then that's a risk, no?

    "The package you are about to install is not listed as being compatible
    with your image version (Squeak5.3), so the package may not work properly.
    Do you still want to proceed with the install Yes/No"

If you downloaded via "Squeak Map Catalog" (Package Loader) the "head" versions of

- OSProcess
- CommandShell

then selecting "head" works but gives a warning that it is "potentially" not compatible with the VM (or image?).

In my case when I install (successfully, I think) I get:

    OSProcess versionString prints: '4.6.19'
    CommandShell versionString prints: '4.7.10'

What I worry about is that what I am installing with "head" may be more recent than the one supported by the image or VM.

How can I check that they are compatible?

Thanks,
David Stes

--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

I was asking myself the question on "install the 'head' version", because when I read that you can "always" install the 'head' version, I was thinking that for a class like OSProcess, it could matter ...

If I'm not mistaken the VM has some "primitives" built in, and those primitives should match the class that is installed, I guess.

Most likely you would encounter errors like "primitive" not existing or similar, but nevertheless I wonder whether there is a way to see whether the primitives that are provided by the VM (or image?) are "compatible" with the OSProcess and CommandShell that one installs.

For example, is there a way to see what version of the primitives for "UnixOSProcessPlugin" is loaded in the VM?

It's not very appealing to just "try" and experimentally see whether it works, as it will not be obvious whether any problem is due to a mismatch of a wrong version of a class installed, or due to some other issue.

David Stes

In reply to this post by Martin Kuball
On Fri, 15 May 2020 at 03:37, Martin Kuball <[hidden email]> wrote:
> Hi!

Another option would be calling the library functions directly using FFI.
I notice...

cheers -ben

In reply to this post by stes
On 15/05/20 8:17 pm, stes wrote:
> Most likely you would encounter errors like "primitive" not existing
> or similar, but nevertheless I wonder whether there is a way to see,
> whether the primitives that are provided by the VM (or image?),
> are "compatible" with the OSProcess and CommandShell that one installs.
>
> For example is there a way to see what version of the primitives
> for "UnixOSProcessPlugin" is loaded in the VM ?

You can get the names of the loaded modules with:

    Smalltalk listLoadedModules

(old) or

    SmalltalkImage current listLoadedModules

You can also see these in "About Squeak" -> VM Modules section.

AFAIK, the modules themselves, being in machine code, are not reified as objects that can be inspected in Squeak. They are private to the VM.

HTH .. Subbu

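A small sketch of how the module list could be narrowed down to the OSProcess plugin. It assumes a Squeak version in which String understands #includesSubstring: (older images spell it #includesSubString:), and that `OSProcess accessor canAccessSystem` is available in the loaded OSProcess version; treat both as assumptions to verify in your image. A plugin is typically listed only after one of its primitives has been called.

```smalltalk
"Evaluate each expression with 'print it'."
| modules |
modules := Smalltalk listLoadedModules.

"Keep only the entries that mention OSProcess, e.g. UnixOSProcessPlugin."
modules select: [:each | each includesSubstring: 'OSProcess'].

"Assumption: answers true when the plugin primitives are reachable."
OSProcess accessor canAccessSystem.

"Versions of the image-side packages, as quoted earlier in the thread."
OSProcess versionString.
CommandShell versionString.
```

This shows which module the VM has loaded, not whether it matches a given package version; as noted in the thread, there is no formal compatibility check.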
In reply to this post by stes
Hi David,
I'm afraid you will just have to take my word for it :-)

SqueakMap is nice because it makes the packages easily findable, but a long-standing annoyance is that I have no way of expressing the version compatibility. The warnings from SqueakMap look alarming, but I don't know how to fix that.

Bottom line: For any version of Squeak that has been released in the last ten years or so, please use the latest version of OSProcess and CommandShell.

Dave

On Fri, May 15, 2020 at 05:57:54AM -0500, stes wrote:
> If one ignores the message when selecting "head" , then that's a risk no ?
> [...]
> How can I check that they are compatible ?

In reply to this post by David T. Lewis
On Thursday, 14 May 2020, 23:27:59 CEST, David T. Lewis wrote:
> First, please also install CommandShell in addition to OSProcess. Get the
> latest versions of both OSProcess and CommandShell, regardless of the
> version of Squeak you are using. If you are using SqueakMap to load them,
> then please select the versions labelled "(head)".
>
> Start out by trying something like this:
>
>     OSProcess outputOf: 'tesseract - - --dpi 100'
> [...]

Hi Dave,

thanks for your answer. Actually I did install CommandShell because I thought it might help me understand the usage of OSProcess. And maybe it will if I give it more time.

So here is what I did: I configured the following repository:

    MCHttpRepository
        location: 'http://www.squeaksource.com/OSProcess'
        user: ''
        password: ''

and installed OSProcess-Base dtl.71, OSProcess-AIO dtl.9 and OSProcess-Unix dtl.35. But I do not see any mention of a head label. So where did I go wrong?

The command you suggested worked, at least if I provide the image to tesseract as a file. I will go on with the debugger and try to find out how to feed the image data to stdIn.

Martin

In reply to this post by Ben Coman
On Friday, 15 May 2020, 18:25:58 CEST, Ben Coman wrote:
> On Fri, 15 May 2020 at 03:37, Martin Kuball <[hidden email]> wrote:
> > I'm trying to do some OCR from squeak using tesseract. I installed
> > OSProcess.
>
> Another option would be calling the library functions directly using FFI.
> I notice...
> https://github.com/ottopedi/Squeak_Tesseract
>
> cheers -ben

The readme says it's on Windows 10. So it will not work on Linux out of the box, or will it?

Martin

In reply to this post by Martin Kuball
Hi Martin,
On Fri, May 15, 2020 at 09:20:17PM +0200, Martin Kuball wrote:
> So here is what I did: I configure the following repository:
>
> MCHttpRepository
>     location: 'http://www.squeaksource.com/OSProcess'
>     user: ''
>     password: ''
>
> and installed OSProcess-Base dtl.71, OSProcess-AIO dlt.9 and OSProcess-Unix
> dtl.35. But I do not see any mention of a head label. So where did I go wrong?

From the OSProcess repository, load OSProcess-dtl.118. From the CommandShell repository, load CommandShell-dtl.109. These are currently the most recent versions. Ignore the sub-packages such as "OSProcess-Unix"; that is something I did to support Pharo (something of a fool's errand if I may say so). All of the sub-packages are included in the full OSProcess and CommandShell packages.

A shortcut to do this is:

    Installer ss project: 'OSProcess'; install: 'OSProcess'.
    Installer ss project: 'CommandShell'; install: 'CommandShell'.

The "(head)" version labels in the SqueakMap package loader do the same thing, except for the alarming warning messages which you can safely ignore.

> The command you suggested worked. At least if I provide the image to tesseract
> as a file. I will go on with the debugger and try to find out how to feed the
> image data to stdIn.

The basic Unix shell redirection operators #> and #< should work. For example, try evaluating this:

    OSProcess outputOf: 'cat < /etc/services | edit'

I am not familiar with tesseract, but you can probably use the same approach.

Dave

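A sketch of the redirection approach applied to the tesseract command quoted earlier in the thread. The image path /tmp/page.png is hypothetical, and whether tesseract accepts this kind of input on stdin depends on the installed version, so the first expression uses `wc` as a safer check that `<` is handled by the command parser.

```smalltalk
"Assumption: OSProcess and CommandShell are loaded, Unix-like host."
| count text |

"Sanity check: feed an existing file to a command through stdin."
count := OSProcess outputOf: 'wc -w < /etc/services'.

"Same idea with tesseract; '/tmp/page.png' is a hypothetical image file.
 The '- -' arguments ask tesseract to read stdin and write stdout,
 as in the original question."
text := OSProcess outputOf: 'tesseract - - --dpi 100 < /tmp/page.png'.
text
```

This still goes through a file on disk; feeding image bytes that exist only inside the image requires writing to the child's stdin pipe directly.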
In reply to this post by David T. Lewis
Hi Martin,

> I don't know what classes to use for stdIn and stdOut.

Squeak provides access to stdin and stdout from the FileStream class, even without OSProcess loaded.

> But I do not see any mention of a head label. So where did I go wrong?

The head label is a naming convention used on some SqueakMap packages to build a developer's workstation to work on that package. It usually means "load the latest code" including all tests and tools packages. It's useful early on in a project, but when finally deploying your own app, you will definitely want to specify a fixed version, and _not_ the head version, otherwise your package could easily rot.

> SqueakMap is nice because it makes the packages easily findable,
> but a long-standing annoyance is that I have no way of expressing
> the version compatibility.

The above needs a clarification. SqueakMap is widely misunderstood to be an SCM tool. It's not, it's actually Squeak's App Store. It's 100% about letting Publishers define working software Releases (including specifying the "version compatibility" -- which version of Squeak they were tested on) that can then be consumed with "one click" by Users, while providing a good UX for both. I'm planning a revamp of it later this year.

> The warnings from SqueakMap look alarming,

This is the message Dave is referring to. The head version is the developer's version. When developers intending to work on a package see this message, they already know about potential compatibility issues. The message is correct. If it feels alarming, it means you should select the one listed under the "safely-available" filter.

Best,
  Chris

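A minimal sketch of the FileStream access mentioned above, assuming a Squeak version that provides `FileStream stdout`. Note that these are the standard streams of the Squeak VM process itself, not those of a child process such as tesseract, so they do not by themselves fill in the stdIn/stdOut blanks of the original question.

```smalltalk
"Write one line to the standard output of the running Squeak VM.
 The text appears in the terminal that started the VM, if any."
FileStream stdout
    nextPutAll: 'hello from Squeak';
    lf;
    flush.
```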
On 18/05/20 5:22 am, Chris Muller wrote:
> The head version is the developers version. When developers intending
> to work on a package see this message, they already know about potential
> compatibility issues. The message is correct. If it feels alarming, it
> means you should select the one listed under the "safely-available" filter.

When you revamp the code later, could you also reword the message to make this explicit? Perhaps something like,

    "The package you are about to install has not yet been tested for
    compatibility with your image version (Squeak 5.2). You may want to
    select from the packages listed under "safely-available" filter. Do you
    still want to proceed with the install?"

Regards .. Subbu

Good idea, absolutely. My plans for the revamp are initially for a new backend that provides a simple API that will eventually enable both a web AND new ToolBuilder interface.

 - Chris

On Mon, May 18, 2020 at 4:34 AM K K Subbu <[hidden email]> wrote:
> On 18/05/20 5:22 am, Chris Muller wrote:

In reply to this post by David T. Lewis
Hi David,
I finally found a solution to my problem. Successfully converted 1300 images in a couple of minutes. The code I came up with is roughly the following (using the wc program here as a proof of concept before switching to tesseract and image data for the input):

    | input output stdErr proc accessProtect res err |
    input := ExternalPipe nonBlockingPipe.
    output := ExternalPipe blockingPipe.
    stdErr := ExternalPipe nonBlockingPipe.
    proc := ExternalOSProcess concreteClass
        programName: '/usr/bin/wc'
        arguments: #('-w')
        initialEnvironment: nil.
    proc initialStdIn: input reader.
    proc initialStdOut: output writer.
    proc initialStdErr: stdErr writer.
    accessProtect := Semaphore forMutualExclusion.
    accessProtect critical: [
        proc value.
        input nextPutAll: 'this is a test'.
        input closeWriter.
        res := output upToEndOfFile.
        err := stdErr upToEndOfFile
    ].
    proc waitForTermination.
    output close.
    stdErr close.
    proc exitStatus ~= 0
        ifTrue: [^ 'error ' , proc exitStatus printString , ': ' , err]
        ifFalse: [^ res]

Maybe it is possible to reuse more of your high-level code. But I wanted low overhead and complete control over the input and output streams.

By the way, uninstalling an older version of OSProcess and installing the new one left an OSProcess watcher with an ObsoleteUnixOSProcessAccessor running.

And a final question: If you want to get a deeper understanding of process handling in Squeak, what would you recommend to read? Is the blue book still a good reference?

Thanks again for your help and the great code.

Martin

On Saturday, 16 May 2020, 00:13:19 CEST, David T. Lewis wrote:
> The basic Unix shell redirector operators #> and #< should work. For
> example, try evaluating this:
>
>     OSProcess outputOf: 'cat < /etc/services | edit'
> [...]

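For comparison, a hedged sketch of how the same wc experiment might look with the higher-level PipeableOSProcess from CommandShell, which sets up the pipes itself. It assumes that PipeableOSProcess understands #command:, #nextPutAll:, #close (closing the child's stdin) and #output in the loaded CommandShell version; check these selectors in your image before relying on them.

```smalltalk
"Assumption: CommandShell is loaded; selectors as described in the lead-in."
| p res |
p := PipeableOSProcess command: 'wc -w'.

"Write the input data to the child's stdin."
p nextPutAll: 'this is a test'.

"Close the input side so that wc sees end of file and can finish."
p close.

"Collect everything the child wrote to stdout."
res := p output.
res
```

The low-level version in the post gives explicit control over each pipe (blocking or non-blocking); this variant trades that control for brevity.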
On 23/05/20 2:05 am, Martin Kuball wrote:
> I finally found a solution to my problem. Successfully converted 1300 images in
> a couple of minutes. The code I came up with is roughly the following (using
> the wc program here as a proof of concept before switching to tesseract and
> image data for the input):

Very nice! You could also look at waitForCommand: which spawns an external process and waits for its completion.

> Maybe it is possible to reuse more of your high-level code. But I wanted low
> overhead and complete control over the input and output streams.

Squeak is not just an application running on the host. It is a whole virtual machine. You have to think of the host as another node in a network. Using stdin/stdout makes the Squeak process a co-routine with the host process, and you must be prepared to handle SIGPIPE etc. You may not want such close coupling.

Back in 2008-09, students (11-13 yrs) wanted to use Etoys/Squeak to practice Math and languages with complex scripts (Hindi, Kannada) that only LaTeX supported. Simon Guest had written a LatexMorph for processing simple LaTeX code on the host. I improved it to handle complex Indic scripts. Students would type LaTeX sentences in a text morph. LatexMorph would save this string in a file, run latex and dvipng to produce an image and read it back as an ImageMorph. It was a hack using just OSProcess waitForCommand: and tmpfs (to avoid disk i/o) but it was fast enough for live LaTeX renders and even teachers took to it. We sure had a lot of fun with it.

See https://squeaksource.com/LatexMorph (~ 500 lines of Squeak. The operative LatexUnix class is only around 50 lines). I can send you the changeset if you are interested.

HTH .. Subbu

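A minimal sketch of the file-based waitForCommand: pattern described above, applied to the thread's tesseract use case. The file names under /tmp are hypothetical, the tesseract invocation (input image, output base name) is an assumption about the installed CLI, and reading the result with `FileStream readOnlyFileNamed:` is just one of several ways to do it.

```smalltalk
"Assumptions: Unix-like host, tesseract installed, /tmp/page.png exists.
 tesseract is asked to write its text to /tmp/page.txt."
| proc stream text |
proc := OSProcess waitForCommand: 'tesseract /tmp/page.png /tmp/page --dpi 100'.

"waitForCommand: answers only after the external program has finished."
proc exitStatus = 0
    ifTrue: [
        stream := FileStream readOnlyFileNamed: '/tmp/page.txt'.
        text := [stream contents] ensure: [stream close]]
    ifFalse: [text := 'tesseract failed, status ', proc exitStatus printString].
text
```

Exchanging data through files keeps the image and the external program loosely coupled, at the cost of some file system traffic; a tmpfs mount, as mentioned above, can reduce that cost.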
In reply to this post by Martin Kuball
Hi Martin,
On Fri, May 22, 2020 at 10:35:08PM +0200, Martin Kuball wrote:
> I finally found a solution to my problem. Successfully converted 1300 images in
> a couple of minutes. The code I came up with is roughly the following (using
> the wc program here as a proof of concept before switching to tesseract and
> image data for the input):
> [...]

I'm glad it worked out for you!

> Maybe it is possible to reuse more of your high-level code. But I wanted low
> overhead and complete control over the input and output streams.

Indeed, it can be interesting to work with this at a lower level so you can see exactly what is going on.

> By the way uninstalling an older version of OSProcess and installing the new
> one left a OSProcess watcher with an ObsoleteUnixOSProcessAccessor running.

This does not surprise me. That would just be the child process watcher that was left running after removing all the OSProcess classes, so I guess you would have to terminate the old process in that case.

> And a final question: If you want to get a deeper understanding of process
> handling in squeak, what would you recommend to read? Is the blue book still a
> good reference?

I think we are talking about two different things here. In Smalltalk, a process is very lightweight, more like what you would call a thread in most operating systems today. The blue book descriptions are still very relevant. There have been some changes and improvements in Squeak over the years, but the basics would still be the same.

The "processes" in OSProcess are a different thing entirely. These refer to the processes of the underlying operating system. For an operating system like Unix (Linux) or Windows, these processes are heavier-weight, and they carry quite a lot of execution context in addition to the basic schedulable unit of execution.

If you were to make a comparison, a Process in Squeak is like a "green thread" in typical operating system lingo.

But to your question - yes, I would start with the blue book, and then others on this list (notably Eliot Miranda) can give explanations of the finer points and the more recent changes.

Dave

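The distinction between the two kinds of process can be seen side by side in a workspace. This is only an illustrative sketch, assuming OSProcess is loaded on a Unix-like host with a `sleep` command; the variable names are arbitrary.

```smalltalk
"Evaluate the lines one by one with 'print it'."
| green child |

"A Squeak Process: scheduled inside the VM, sharing the image's memory."
green := [(Delay forMilliseconds: 200) wait] fork.

"An operating system process: a separate program started by the host."
child := OSProcess command: 'sleep 1'.

green class.                    "the Smalltalk class Process"
child pid.                      "process id assigned by the host OS"
OSProcess thisOSProcess pid.    "process id of the Squeak VM itself"
```

The Squeak Process never shows up in the host's process table, while the child started by OSProcess does, next to the VM's own process.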
In reply to this post by Martin Kuball
Sorry, I just saw this thread. For the future, I did a small wrapper like
this that might help [1]. It looks like I didn't convert the repo to Tonel
yet, and IIRC there are few dependencies other than OSP, so it may load in
Squeak (or could be used for inspiration). I would also be happy to accept
PRs to make it so. I probably will convert to Tonel at some point, although
it looks like Tonel support in Squeak may be imminent, so hopefully no
problem there :)

Martin Kuball wrote
> Maybe it is possible to reuse more of your high-level code.

This is what I ended up with [2]:

    | p result |
    p := PipeableOSProcess waitForCommand: commandString.
    p succeeded ifFalse: [ ^ self error: 'tesseract failed with: ', p errorUpToEnd ].
    result := self tempFile readStreamDo: [ :str | str contents ].
    self tempFile delete.
    ^ result.

If I had to do it again, I'd probably try via FFI. I've had a longstanding
belief that wrapping command line stuff is an easy way to "get it to work"
and then I can later "get it right" with FFI, but after doing a lot of this
sort of thing, there are so many quirks that I think it might usually be
easier to just start with FFI (although maybe the grass is always
greener...).

[1]. https://github.com/seandenigris/Tesseract-St
[2]. https://github.com/seandenigris/Tesseract-St/blob/master/src/Tesseract.package/Tesseract.class/instance/evaluate..st

-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html
|
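For readers on Squeak, the same temp-file idea could be tried in a workspace roughly as follows. This is a sketch, not the code from the repository above: both paths are hypothetical, and it relies on the tesseract command line writing its result to the given output base name with `.txt` appended. `waitForCommand:`, `succeeded` and `errorUpToEnd` are used as in the snippet above.

```smalltalk
| imagePath outBase proc text |
imagePath := '/tmp/page.png'.    "hypothetical input image"
outBase := '/tmp/ocr-result'.    "hypothetical output base; tesseract appends .txt"
proc := PipeableOSProcess waitForCommand: 'tesseract ', imagePath, ' ', outBase.
proc succeeded
    ifFalse: [self error: 'tesseract failed with: ', proc errorUpToEnd].
"Read the whole result file, then remove it again."
text := (FileStream readOnlyFileNamed: outBase, '.txt') contentsOfEntireFile.
(FileDirectory on: '/tmp') deleteFileNamed: 'ocr-result.txt'.
text
```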
In reply to this post by David T. Lewis
Hi David,
On Tuesday, 26 May 2020, 01:28:43 CEST, David T. Lewis wrote:
> Hi Martin,
>
> On Fri, May 22, 2020 at 10:35:08PM +0200, Martin Kuball wrote:
> > Hi David,
> >
> > I finally found a solution to my problem. Successfully converted 1300
> > images in a couple of minutes. The code I came up with is roughly the
> > following (using the wc program here as a proof of concept before
> > switching to tesseract and image data for the input):
> >
> >     | input output stdErr proc accessProtect res err |
> >     input := ExternalPipe nonBlockingPipe.
> >     output := ExternalPipe blockingPipe.
> >     stdErr := ExternalPipe nonBlockingPipe.
> >     proc := ExternalOSProcess concreteClass
> >         programName: '/usr/bin/wc'
> >         arguments: #('-w')
> >         initialEnvironment: nil.
> >     proc initialStdIn: input reader.
> >     proc initialStdOut: output writer.
> >     proc initialStdErr: stdErr writer.
> >     accessProtect := Semaphore forMutualExclusion.
> >     accessProtect critical: [
> >         proc value.
> >         input nextPutAll: 'this is a test'.
> >         "closing the write end lets wc see end of input"
> >         input closeWriter.
> >         res := output upToEndOfFile.
> >         err := stdErr upToEndOfFile
> >     ].
> >     proc waitForTermination.
> >     output close.
> >     stdErr close.
> >     proc exitStatus ~= 0
> >         ifTrue: [^ 'error ' , proc exitStatus printString , ': ' , err]
> >         ifFalse: [^ res]
>
> I'm glad it worked out for you!
>
> > Maybe it is possible to reuse more of your high-level code. But I wanted
> > low overhead and complete control over the input and output streams.
>
> Indeed, it can be interesting to work with this at a lower level so you
> can see exactly what is going on.
>
> > By the way, uninstalling an older version of OSProcess and installing the
> > new one left an OSProcess watcher with an ObsoleteUnixOSProcessAccessor
> > running.
>
> This does not surprise me. That would just be the child process watcher that
> was left running after removing all the OSProcess classes, so I guess you
> would have to terminate the old process in that case.

That's exactly what I did ;).

> > And a final question: If you want to get a deeper understanding of process
> > handling in Squeak, what would you recommend to read? Is the blue book
> > still a good reference?
>
> I think we are talking about two different things here. In Smalltalk, a
> process is very lightweight, more like what you would call a thread in most
> operating systems today. The blue book descriptions are still very relevant.
> There have been some changes and improvements in Squeak over the years, but
> the basics would still be the same.

Sorry, my question really was a little bit ambiguous. It was about the
Smalltalk "internal" process model, like class Semaphore. I still don't
understand why you use the critical: method in many places. Or, for example,
I added a Delay for debugging purposes after starting the external process.
But that did not work. The process exited immediately with an error that it
could not read from stdIn.

> The "processes" in OSProcess are a different thing entirely. These refer to
> the processes of the underlying operating system. For an operating system
> like Unix (Linux) or Windows, these processes are heavier-weight, and they
> carry quite a lot of execution context in addition to the basic schedulable
> unit of execution.
>
> If you were to make a comparison, a Process in Squeak is like a "green
> thread" in typical operating system lingo.
>
> But to your question - yes I would start with the blue book, and then others
> on this list (notably Eliot Miranda) can give explanations of the finer
> points and the more recent changes.
>
> Dave
>
> > Thanks again for your help and the great code.
> >
> > Martin
|
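On the question of what `critical:` is for, a small self-contained illustration may help. It is a generic sketch of mutual exclusion between Smalltalk processes using only standard Squeak classes. It makes no claim about how OSProcess itself uses `critical:` internally, and all variable names are made up for the example.

```smalltalk
| lock done counter |
lock := Semaphore forMutualExclusion.   "at most one process inside critical: at a time"
done := Semaphore new.                  "signalled once by each worker when it finishes"
counter := 0.
1 to: 2 do: [:i |
    [1000 timesRepeat: [lock critical: [counter := counter + 1]].
     done signal] fork].
done wait; wait.                        "block until both workers have signalled"
counter                                 "2000"
```

Each forked block is a lightweight Smalltalk process in the sense described above. The block passed to `critical:` runs only while no other process holds the same semaphore, so a read-modify-write such as the increment cannot be interleaved with another worker's increment.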
In reply to this post by Sean P. DeNigris
Hi Sean,
your solution with temp files is definitely simpler and more maintainable.
But out of sheer ambition I wanted to do it without them. At the moment,
though, I do not have enough ambition to do it with FFI. Getting the data
structures right is not that easy. I remember having a hard time writing a
connector for the xvid library more than 10 years ago. But I had a lot more
spare time back then.

Martin
|
In reply to this post by Sean P. DeNigris
On Tuesday, 26 May 2020, 15:44:53 CEST, Sean P. DeNigris wrote:
> https://github.com/seandenigris/Tesseract-St

By the way, how do you open this in Monticello? Do I have to manually clone
it first? Thanks.
|