Automation of MS Office from Pharo

Re: Automation of MS Office from Pharo

tesonep@gmail.com
Hi!!!
  This was a great report. I have submitted a fix to the master branch of Pharo-COM.

Basically, the problem was that the BSTR in the Variant was freed twice:
once when the value was accessed and again when the struct was freed.
Why it works with other BSTRs when they are smaller, I don't know.
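The ownership problem described above can be illustrated with a toy model. This is a hypothetical Python sketch, not Pharo-COM code. The classes `Bstr`, `BuggyVariant` and `FixedVariant` are invented for illustration, and the real fix in Pharo-COM may be structured differently. Note also that a real allocator does not raise an error on a double free. The behaviour is undefined, which is consistent with the bug only surfacing for some string sizes.

```python
class DoubleFreeError(RuntimeError):
    """Raised when a buffer that was already released is released again."""


class Bstr:
    """Toy stand-in for a BSTR buffer that may be released only once."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.freed = False

    def free(self) -> None:
        if self.freed:
            # A real allocator would not raise here: a second free is undefined
            # behaviour and may or may not crash, depending on the buffer.
            raise DoubleFreeError("BSTR freed twice")
        self.freed = True


class BuggyVariant:
    """Mirrors the reported bug: both the accessor and the struct free the BSTR."""

    def __init__(self, text: str) -> None:
        self.bstr = Bstr(text)

    def value(self) -> str:
        result = self.bstr.text
        self.bstr.free()  # first free: when the value is accessed
        return result

    def free(self) -> None:
        self.bstr.free()  # second free: when the struct itself is freed


class FixedVariant:
    """Mirrors the fix: the accessor only reads; the struct owns the single free."""

    def __init__(self, text: str) -> None:
        self.bstr = Bstr(text)

    def value(self) -> str:
        if self.bstr.freed:
            raise ValueError("variant has already been freed")
        return self.bstr.text  # read without releasing the buffer

    def free(self) -> None:
        self.bstr.free()  # the only place the BSTR is released
```

With `FixedVariant`, reading a mail body and then freeing the variant releases the buffer exactly once. With `BuggyVariant`, the same sequence reaches the second free.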

I have added another smoke test using Word.

Can you try the fix?

Thanks to you both for helping me with the reports; they were great.

On Wed, Apr 8, 2020 at 9:37 AM PBKResearch <[hidden email]> wrote:

> Tomaz, that was my understanding from the VBA piece you cited yesterday. So presumably it must be something in Pharo-COM that imposes the limits we have seen. I am OK at the moment, because all this work is just an exploration of possibilities; I can wait until you and Pablo have sorted it out. But judging from the results of your tests, a maximum of 16K on 64-bit systems must be a serious limitation, so something in Pharo-COM needs fixing.
>
> For my immediate work, I shall continue exporting the full text using MailItem.SaveAs; my further processing uses files I have exported manually in this way, so it’s not a problem.
>
> Thanks
>
> Peter Kenny
>
> From: Pharo-users <[hidden email]> On Behalf Of Tomaž Turk
> Sent: 08 April 2020 07:58
> To: Any question about pharo is welcome <[hidden email]>
> Subject: Re: [Pharo-users] Automation of MS Office from Pharo
>
> Thanks, Stephane, for the acknowledgement. Peter, as I understand it, the limits of the COM BSTR data type are defined by the header's length prefix (which is 4 bytes) and by software implementations; for instance, the String data type in Visual Basic for Applications is described as "a variable-length string can contain up to approximately 2 billion (2^31) characters", which is in line with the BSTR header. I'm not sure whether the OS architecture (32- or 64-bit) influences these values.
>
> Best wishes,
>
> Tomaz



--
Pablo Tesone.
[hidden email]
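Tomaz's point about the 4-byte length prefix can be made concrete. The sketch below is a hypothetical Python model of the documented BSTR memory layout: a 32-bit byte count, then UTF-16LE character data, then a two-byte null terminator. In real COM code the BSTR pointer points at the first character, just after the prefix. The helper names `encode_bstr`, `decode_bstr` and `MAX_CODE_UNITS` are invented for this illustration. Because the prefix counts bytes rather than characters, a 32-bit prefix allows roughly 2^31 UTF-16 code units. That is consistent with the VBA figure, so a 16K ceiling would not come from the format itself.

```python
import struct

# 4-byte little-endian unsigned length prefix, as on Windows x86/x64.
PREFIX = struct.Struct("<I")


def encode_bstr(text: str) -> bytes:
    """Lay out text the way a BSTR sits in memory: prefix, UTF-16LE data, NUL."""
    data = text.encode("utf-16-le")
    # The prefix stores the byte count of the data, excluding the terminator.
    return PREFIX.pack(len(data)) + data + b"\x00\x00"


def decode_bstr(buf: bytes) -> str:
    """Read the byte count from the prefix and decode exactly that many bytes."""
    (nbytes,) = PREFIX.unpack_from(buf, 0)
    data = buf[PREFIX.size:PREFIX.size + nbytes]
    return data.decode("utf-16-le")


# Largest byte count a 32-bit prefix can hold, expressed in 2-byte code units.
MAX_CODE_UNITS = (2**32 - 1) // 2
```

For example, `decode_bstr(encode_bstr("x" * 1_000_000))` gives back the original string. That is roughly the size of the strings tested after the fix.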

Re: Automation of MS Office from Pharo

eftomi
Thanks, Pablo, for your quick response! I tested 32- and 64-bit images with strings longer than 1,000,000 characters, and it works just fine.

Best wishes,
Tomaz


------ Original Message ------
From: "[hidden email]" <[hidden email]>
To: "Any question about pharo is welcome" <[hidden email]>
Cc: "Tomaž Turk" <[hidden email]>
Sent: 8.4.2020 9:54:35
Subject: Re: [Pharo-users] Automation of MS Office from Pharo

Re: Automation of MS Office from Pharo

Guillermo Polito
Cool, thanks to the three :)

@Peter, I’d like you to tell us at the end whether you succeeded in automating your mail workflow ^^

> On 8 Apr 2020, at 10:15, Tomaž Turk <[hidden email]> wrote:

Re: Automation of MS Office from Pharo

Peter Kenny
Hello Pablo

Success! I have rerun one of the troublesome cases with no problem. I then reran the test on the latest 16 messages, collecting all the HTML outputs in an array, which took just over 5 seconds in total; quicker than I expected. The longest texts, from the most verbose newsletters, run from 267K to over 300K, so that is where they ran into the limit we found yesterday.

I now know that I can pass the HTML to the XMLHTMLParser in memory, without saving it to a file first; this will be more convenient. Thanks for the prompt help.

@Guillermo: all this is a proof of possibility; it will need to be combined with other bits I am working on to make the automated system I want. I shall let you know when it is worked out, but it won't be in the next few days!

Thanks to all

Peter Kenny

-----Original Message-----
From: Pharo-users <[hidden email]> On Behalf Of Guillermo Polito
Sent: 08 April 2020 09:20
To: Tomaž Turk <[hidden email]>; Any question about pharo is welcome <[hidden email]>
Subject: Re: [Pharo-users] Automation of MS Office from Pharo

Re: Automation of MS Office from Pharo

Ben Coman
I'd be very interested to hear how this ends up.
Parsing Outlook mails from Pharo may prove useful in my day job.

cheers -ben

On Wed, 8 Apr 2020 at 18:02, PBKResearch <[hidden email]> wrote: