Execute Tet pdflib


Execute Tet pdflib

pauguillot
Hello,

1- I need help extracting selected text from a PDF document.
PDFlib TET can do that, and I need help executing its command line from
Pharo.
I can do the job with a .bat file, but I think we can do better.

2- How can we properly code the delay (while we wait for the .bat file to finish)?

Thank you for your help.


https://www.pdflib.com/products/tet/

| bat pdf stream pdfAsText contents |
bat := 'C:\some\code.bat'.
pdf := 'C:\some\document.pdf'.
stream := bat asFileReference writeStream.

"Windows console commands written to the batch file.
 The TET call has to stay on one line in the .bat, so the long option is built by concatenation."
stream
        nextPutAll: 'CD "C:\Program Files\PDFlib\TET 5.0 64-bit\bin"';
        nextPut: Character linefeed;
        nextPutAll: 'TET --samedir --lastpage last-1 --pageopt "includebox={ ' ,
                '{98.38 693.32 253.41 709.91} {522.70 183.66 595 226.30} ' ,
                '{416.97 574.79 479 598.03} {401.33 773.91 453 789.55} ' ,
                '{294.18 575.74 369.56 598.5} {295.6 492.3 373.35 524.43} ' ,
                '{150.53 414.55 219.28 438.25} {47.17 494.67 98.85 520.75} ' ,
                '{112.26 684.01 193.66 698.03} {533.61 212.57 595 620} ' ,
                '{100.75 657.29 217.85 673.41}}"';
        nextPut: Character space;
        nextPutAll: pdf;
        nextPut: Character linefeed;
        nextPutAll: 'EXIT';
        close.
OS2Process command: bat.

"Output path: same name with .txt instead of .pdf, i.e. C:\some\document.txt"
pdfAsText := (pdf allButLast: 4) , '.txt'.
(Delay forMilliseconds: 500) wait.
contents := pdfAsText asFileReference contents.
bat asFileReference delete.
pdfAsText asFileReference delete.
contents
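
The .txt path above is the PDF path with its extension swapped. A minimal sketch of the same step done through a file reference instead of copying characters, assuming `withExtension:` replaces the existing extension and that TET names its --samedir output after the PDF, as the script above expects:

```smalltalk
| pdf pdfAsText |
pdf := 'C:\some\document.pdf' asFileReference.
"Assumption: withExtension: swaps .pdf for .txt and keeps the directory"
pdfAsText := pdf withExtension: 'txt'.
pdfAsText basename
```

The result is already a file reference, so `contents` and `delete` can be sent to it directly.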

  ================================================================
           Message sent via the Girona.com Webmail
                      http://www.girona.com
  ================================================================



Re: Execute Tet pdflib

Ben Coman
On Tue, Dec 6, 2016 at 4:28 AM,  <[hidden email]> wrote:
> Hello,
>
> 1- I need help extracting selected text from a PDF document.
> PDFlib TET can do that, and I need help executing its command line from Pharo.
> I can do the job with a .bat file, but I think we can do better.

Someone else will need to help with that.

>
> 2- How can we properly code the delay (while we wait for the .bat file to finish)?

Can you expand on the problem?
"(Delay forMilliseconds: 500) wait." looks fine to me.

cheers -ben



Re: Execute Tet pdflib

pauguillot
It works fine, but the delay is fixed rather than adaptive.
I want to wait until the pdfAsText document is available before reading its
contents, not wait for some constant time.
Thank you -Pau




Re: Execute Tet pdflib

pauguillot
In reply to this post by Ben Coman
OK, that's the best I can do. -Pau

| bin pdf pdfAsText result |
bin := '"C:\Program Files\PDFlib\TET 5.0 64-bit\bin\tet.exe"'.
pdf := '"C:\*\document.pdf"'.
pdfAsText := 'C:\*\document.txt' asFileReference.

"Run TET directly, without the intermediate .bat file"
OS2Process command: bin , ' --samedir ' , pdf.
"Poll until the output file appears"
[ pdfAsText exists ] whileFalse: [ (Delay forMilliseconds: 1) wait ].
"Extra fixed pause, presumably so TET can finish writing the file"
(Delay forMilliseconds: 300) wait.
result := pdfAsText contents.
pdfAsText delete.
result
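
A possible refinement of the polling loop above, as a sketch only: give the wait an upper time limit so the image does not loop forever if TET fails, and treat the file as complete once its size is unchanged between two polls instead of pausing a fixed 300 ms. The name waitForStableFile, the 50 ms poll interval, and the 10 second limit are illustrative assumptions, not part of Pharo or TET. That an unchanged size means TET has finished is also an assumption; a file that is created empty and written later could be reported as stable too early.

```smalltalk
| waitForStableFile pdfAsText result |
"Hypothetical helper: answers true once aFile exists and its size is the same
 on two consecutive polls, or false if aDuration elapses first."
waitForStableFile := [ :aFile :aDuration |
	| deadline lastSize stable |
	deadline := DateAndTime now + aDuration.
	lastSize := -1.
	stable := false.
	[ stable or: [ DateAndTime now > deadline ] ] whileFalse: [
		| size |
		"-1 stands for 'file not there yet'"
		size := aFile exists ifTrue: [ aFile size ] ifFalse: [ -1 ].
		stable := size >= 0 and: [ size = lastSize ].
		lastSize := size.
		stable ifFalse: [ (Delay forMilliseconds: 50) wait ] ].
	stable ].

pdfAsText := 'C:\*\document.txt' asFileReference.
"Answers the extracted text, or nil if the file never settled within the limit"
result := (waitForStableFile value: pdfAsText value: 10 seconds)
	ifTrue: [ pdfAsText contents ]
	ifFalse: [ nil ].
result
```

Polling every 50 ms rather than every 1 ms keeps the image from spinning while the external process runs; the interval can be tuned to how quickly the result is needed.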





