Pharo-Chrome (was: Soup bug(fix))

Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
Hi Stef,

On 12 November 2017 at 14:47, Stephane Ducasse <[hidden email]> wrote:

> exampleNavigation
> | chrome page logger |
> logger := InMemoryLogger new.
> logger start.
> chrome := GoogleChrome new
> debugOn;
> debugSession;
> open;
> yourself.
> page := chrome tabPages first.
> page enablePage.
> page enableDOM.
> page navigateTo: 'http://pharo.org'.
> page getDocument.
> page getMissingChildren.
> page updateTitle.
> logger stop.
> ^{ chrome. page. logger. }
>
> but in fact I realised that I would like to have a simple doc :)
>
>
> On Sun, Nov 12, 2017 at 2:44 PM, Stephane Ducasse
> <[hidden email]> wrote:
>> Hi alistair
>>
>> this is cool.
>> Do you have one little example so that we can see how we can use it?
>>
>> Stef

Fair enough :-)

I'll try and extend the readme to include some basic documentation.

Cheers,
Alistair



>> On Sat, Nov 11, 2017 at 4:38 PM, Alistair Grant <[hidden email]> wrote:
>>> On 9 November 2017 at 00:00, Kjell Godo <[hidden email]> wrote:
>>>> i like to collect some newspaper comics from an online newspaper
>>>>      but it takes really long to do it by hand
>>>> i tried Soup but i didn’t get anywhere
>>>>      the pictures were hidden behind a script or something
>>>> is there anything to do about that?
>>>
>>> Most of the web pages I want to scrape use javascript to construct the
>>> DOM, which makes Soup, XMLHTMLParser, etc. useless.
>>>
>>> I've extended Torsten's Pharo-Chrome library and use that to navigate
>>> the DOM in a way similar to Soup:
>>>
>>> https://github.com/akgrant43/Pharo-Chrome
>>>
>>> This gets around the issue with javascript since it waits for the
>>> browser to load the page, run the javascript and construct the DOM.
>>>
>>> HTH,
>>> Alistair
>>>
>>>
>>>
>>>>         i don’t want to collect them all
>>>> i have the XPath .pdf but i haven’t read it yet
>>>>
>>>> these browsers seem to gobble up memory
>>>>      and while open they just keep getting bigger till the OS session crashes
>>>>      might there be a browser that is more minimal?
>>>>
>>>> Vivaldi seems better at not bloating up RAM
>>>
>


Re: Pharo-Chrome (was: Soup bug(fix))

Stephane Ducasse-3
Thanks, and one day we can turn it into another little booklet :)

Stef


Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
In reply to this post by Alistair Grant
Hi Sean,

Thanks for your feedback!  (responses below)


On 12 November 2017 at 18:11, Sean P. DeNigris <[hidden email]> wrote:
> Alistair Grant wrote
>> https://github.com/akgrant43/Pharo-Chrome
>
> Wow, that was a wild ride!

Sorry about that.


> Lessons learned along the way:
> 1. On a Mac, to use the snazzy `chrome` terminal command referenced all over
> the place in the docs, you must first `alias chrome="/Applications/Google\
> Chrome.app/Contents/MacOS/Google\ Chrome"`

I'm an Ubuntu Linux user, however if you look at OSXChromePlatform
class>>defaultExecutableLocation you can see that is where it should
be looking for the exe, so the alias shouldn't really be necessary.
Torsten wrote this, so he may have more insight.


> 2. Chrome must be started with certain flags: `chrome
> --remote-debugging-port=9222 --disable-gpu` (not sure if the last flag is
> needed, but `#get:` seemed to hang before using; reference
> https://developers.google.com/web/updates/2017/04/headless-chrome)

I've been using this without headless mode.  I'll add a headless flag
that also disables the gpu.
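
Whether Chrome is actually listening on the debugging port can be checked from the image before calling `#get:`. A minimal sketch, assuming Chrome was started with `--remote-debugging-port=9222` as quoted above and that Zinc (`ZnClient`) is loaded in the image; `/json/version` is the DevTools HTTP endpoint that reports the browser version. The 5 second timeout is an arbitrary choice.

```smalltalk
"Sketch only: ask the DevTools HTTP endpoint for its version info.
 Port 9222 comes from the flag quoted above; adjust it if Chrome
 was started with a different port."
| response |
response := [ ZnClient new
		timeout: 5;
		get: 'http://localhost:9222/json/version' ]
	on: Error
	do: [ :error | nil ].
"Answer the JSON text on success, or a short diagnostic otherwise."
response ifNil: [ 'Chrome is not reachable on port 9222' ]
```

If this answers the diagnostic string, the problem is probably in how Chrome was launched rather than in the DOM code.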



> 3. Beacon has renamed InMemoryLogger to MemoryLogger
> 4. I guess Beacon has renamed `#log` to `#emit`

Sorry about that.  I didn't realise that the Pharo-Chrome baseline is
loading Beacon stable while my install script upgrades it to
#development.  #development is more recent, so I'll update the
baseline.



> 5. I had to comment out `chromeProcess sigterm.` because `chromeProcess` was
> nil and also #sigterm seemed not to be defined anywhere in the image. I'm
> not sure what the issue is there.

chromeProcess is set in GoogleChrome>>openURL:.  Can you give me a
small example that demonstrates the problem?

#sigterm is implemented by OSSUnixSubprocess, which is what I
ultimately use to launch the Chrome process on Ubuntu.

But... this will be broken on Mac at the moment because the current
method of launching chrome doesn't keep track of the process, so
doesn't support #sigterm.  Do you know if OSSUnixSubprocess works on
Mac?  If it does, I can update the code (but not test it :-().


> Pull request issued for #3 & #4.

Once I update the baseline this shouldn't be required.


> Also, I'm not sure what platforms you
> support, but you may want to tag the example methods with <gtExample> or
> similar so that they are runnable from the browser and open an inspector if
> there is an interesting return value.

Good idea, I'll do this.

I'm also making a few other changes:

1. Add an #extractTables method that searches through the page and
returns an array of rows for each table it finds in the page
(something that can easily be loaded in to DataFrame using #fromRows:,
but I don't want to make Pharo-Chrome dependent on DataFrame at the
moment).  Most of the time I use Pharo-Chrome it is extracting data
from tables.

2. I don't know of any reliable way to tell when a page has loaded
since there can always be javascript that periodically updates the
page.  At the moment it waits until the page hasn't changed for a
configurable amount of time.  I'm planning to add a check for specific
content to determine if the page is considered loaded.

3. Add some documentation to the readme :-)
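
The "wait until the page hasn't changed for a configurable amount of time" idea from item 2 can be sketched independently of Chrome. This is not the Pharo-Chrome implementation; `waitUntilStable`, `snapshotBlock`, `quietPeriod` and `maxWait` are hypothetical names, and the 100 ms polling interval is an arbitrary choice.

```smalltalk
"Sketch only: poll a snapshot block until its value has stayed the same
 for quietPeriod, or until maxWait has elapsed. All names are hypothetical."
| waitUntilStable |
waitUntilStable := [ :snapshotBlock :quietPeriod :maxWait |
	| previous current lastChange deadline |
	previous := snapshotBlock value.
	lastChange := DateAndTime now.
	deadline := lastChange + maxWait.
	[ (DateAndTime now - lastChange) < quietPeriod
		and: [ DateAndTime now < deadline ] ]
		whileTrue: [
			(Delay forMilliseconds: 100) wait.
			current := snapshotBlock value.
			current = previous ifFalse: [
				"Something changed, so restart the quiet period."
				previous := current.
				lastChange := DateAndTime now ] ].
	previous ].

"Usage: the snapshot here is a constant, so the call returns after the quiet period."
waitUntilStable
	value: [ 42 ]
	value: (Duration milliSeconds: 200)
	value: 2 seconds
```

The `maxWait` bound is there because a page with periodically running javascript may never go quiet; a check for specific content could replace or complement the comparison of snapshots.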



> -----
> Cheers,
> Sean

I'll let you know when I have a new version available (hopefully in
the next few days).


Thanks again,
Alistair


Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
I've committed some fixes to the development branch:

1. MacOS hopefully works now (I don't have access to the platform, so
can't test it).
2. The development version of Beacon is loaded (which is required for
the InMemoryLogger).
3. The README is a tiny bit better.
4. Added #extractTables.

As an example of how historical stock market data can be extracted,
the following retrieves data for the Australian S&P/ASX 200 index
(^AXJO) from Yahoo Finance:


| rootNode tables historicalData dataFrame |

rootNode := GoogleChrome get: 'https://finance.yahoo.com/quote/%5EAXJO/history?p=%5EAXJO'.
tables := rootNode extractTables.
historicalData := (tables sorted: #size ascending) last.
dataFrame := DataFrame fromRows: (historicalData select: [ :each | each size = 7 ]).
dataFrame asStringTable.

"
     |  1             2         3         4         5         6            7
-----+-----------------------------------------------------------------------------
1    |  Date          Open      High      Low       Close*    Adj Close**  Volume
2    |  Nov 14, 2017  6,021.80  6,021.80  5,957.10  5,966.00  5,966.00     -
3    |  Nov 13, 2017  6,029.40  6,029.40  6,010.70  6,021.80  6,021.80     -
4    |  Nov 10, 2017  6,049.40  6,049.40  6,020.70  6,029.40  6,029.40     -
etc.
"
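
The selection logic in the example can be tried without Chrome or DataFrame. A minimal sketch, assuming `#extractTables` answers a collection of tables, each of which is a collection of rows (as described earlier in the thread); the literal data below is made up for illustration and `detectMax:` is used in place of sorting.

```smalltalk
"Sketch only: literal data standing in for the result of #extractTables,
 assumed to be a collection of tables, each a collection of rows."
| tables largest fullRows |
tables := {
	{ #('Menu' 'Home') }.
	{ #('Date' 'Open' 'High' 'Low' 'Close*' 'Adj Close**' 'Volume').
	  #('Nov 14, 2017' '6,021.80' '6,021.80' '5,957.10' '5,966.00' '5,966.00' '-').
	  #('*Close price adjusted for splits.') } }.
"Pick the table with the most rows, then keep only the 7-column rows."
largest := tables detectMax: [ :table | table size ].
fullRows := largest select: [ :row | row size = 7 ].
fullRows
```

Filtering on the row size presumably drops footnote or separator rows that do not have the full set of columns, so that every remaining row has the same shape.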


To load the development version on MacOS or Linux in a 32 bit image:

"Assuming you don't have OSProcess loaded:"
Metacello new
    configuration: 'OSSubprocess';
    repository: 'github://marianopeck/OSSubprocess:master/repository';
    version: #stable;
    load.

Metacello new
    baseline: 'Chrome';
    repository: 'github://akgrant43/Pharo-Chrome:development/repository';
    load.


Cheers,
Alistair



Re: Pharo-Chrome (was: Soup bug(fix))

Sean P. DeNigris
Alistair Grant wrote
> I've committed some fixes to the development branch:

Thanks!

I tried your example, but apparently the OSXProcess class, which is
referenced in openChromeWith: is missing. Also, no class in the image seems
to define #createProcess:, which is sent to OSXProcess there



-----
Cheers,
Sean

Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
Hi Sean,

On 14 November 2017 at 19:06, Sean P. DeNigris <[hidden email]> wrote:
> Alistair Grant wrote
>> I've committed some fixes to the development branch:
>
> Thanks!
>
> I tried your example, but apparently the OSXProcess class, which is
> referenced in openChromeWith: is missing. Also, no class in the image seems
> to define #createProcess:, which is sent to OSXProcess there

This looks like you are using an old (cached?) version.  Maybe try
"Pull incoming commits" from Iceberg?

You should have (minus the broken formatting from pasting):


OSXChromePlatform>>openChromeWith: arguments

| executableLocation process |
executableLocation := self defaultExecutableLocation copyReplaceAll: ' ' with: '\ '.
process := AKGOSProcess command: executableLocation arguments: arguments.
process run.
^process


HTH,
Alistair


Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant

P.S. Don't forget this is on the development branch.

Cheers,
Alistair


Re: Pharo-Chrome (was: Soup bug(fix))

Offray Vladimir Luna Cárdenas-2
Hi Alistair,

I have tried to run the examples, but it seems that the installation
doesn't include all the needed packages. At first I installed OSUnix and
then OSLinuxUbuntu. Neither of them seems to include "AKGOSProcess", so
the "GoogleChrome get: 'http://pharo.org'" example raises:
"#command:arguments: was sent to nil". Which package provides the proper
installation, including the dependencies, for 64-bit Manjaro Linux?

Thanks,

Offray



Re: Pharo-Chrome (was: Soup bug(fix))

Offray Vladimir Luna Cárdenas-2
OK, the development branch solves this, as shown in
http://ws.stfx.eu/O6J4CJ1FZF89. Now I'm getting an unresponsive image
until I close Chrome, but I think that was discussed in the thread. I'll
review it.

Cheers,

Offray



Re: Pharo-Chrome (was: Soup bug(fix))

Offray Vladimir Luna Cárdenas-2
In reply to this post by Alistair Grant
Hi Alistair,

The example is not working for me. When I run it, a Chrome session is
opened but nothing happens there, except that my image freezes until I
close Chrome, and then I get this message: "ConnectionTimedOut: Cannot
connect to 127.0.0.1:9222". What is the expected behavior? PharoChrome
expects the user to have a Google account or be logged in by default to
work (that would be a shame for those of us who don't have a Google
account and still value our privacy).

Thanks,

Offray



Re: Pharo-Chrome (was: Soup bug(fix))

Offray Vladimir Luna Cárdenas-2
The last was a question :-P Is PharoChrome expecting to be logged in to
some Google account to work?

Cheers,

Offray


On 14/11/17 18:18, Offray Vladimir Luna Cárdenas wrote:

> Hi Alistair,
>
> The example is not working for me. When I run it, a chrome session is
> open but nothing happens there, except that my image gets frozen until I
> close chrome and then I get this message: "ConnectionTimedOut: Cannot
> connect to 127.0.0.1:9222". What is the expected behavior? PharoChrome
> expects the user to have a Google account or be logged in by default to
> work (that would be a shame for those of us that don't a Google account
> and still value our privacy).
>
> Thanks,
>
> Offray
>
>
> On 14/11/17 11:26, Alistair Grant wrote:
>> I've committed some fixes to the development branch:
>>
>> 1. MacOS hopefully works now (I don't have access to the platform, so
>> can't test it).
>> 2. The development version of Beacon is loaded (which is required for
>> the InMemoryLogger).
>> 3. The README is a tiny bit better.
>> 4. Added #extractTables.
>>
>> As an example of how historical stock market data can be extracted,
>> the following retrieves data for the Australian S&P200 index from
>> yahoo:
>>
>>
>> | rootNode tables historicalData dataFrame |
>>
>> rootNode := GoogleChrome get:
>> 'https://finance.yahoo.com/quote/%5EAXJO/history?p=%5EAXJO'.
>> tables := rootNode extractTables.
>> historicalData := (tables sorted: #size ascending) last.
>> dataFrame := DataFrame fromRows: (historicalData select: [ :each |
>> each size = 7 ]).
>> dataFrame asStringTable.
>>
>> "
>>      |  1             2         3         4         5         6
>>     7
>> -----+-----------------------------------------------------------------------------
>> 1    |  Date          Open      High      Low       Close*    Adj
>> Close**  Volume
>> 2    |  Nov 14, 2017  6,021.80  6,021.80  5,957.10  5,966.00  5,966.00
>>     -
>> 3    |  Nov 13, 2017  6,029.40  6,029.40  6,010.70  6,021.80  6,021.80
>>     -
>> 4    |  Nov 10, 2017  6,049.40  6,049.40  6,020.70  6,029.40  6,029.40
>>     -
>> etc.
>> "
>>
>>
>> To load the development version on MacOS or Linux in a 32 bit image:
>>
>> "Assuming you don't have OSProcess loaded:"
>> Metacello new
>>     configuration: 'OSSubprocess';
>>     repository: 'github://marianopeck/OSSubprocess:master/repository';
>>     version: #stable;
>>     load.
>>
>> Metacello new
>>     baseline: 'Chrome';
>>     repository: 'github://akgrant43/Pharo-Chrome:development/repository';
>>     load.
>>
>>
>> Cheers,
>> Alistair
>>
>>
>> On 12 November 2017 at 20:09, Alistair Grant <[hidden email]> wrote:
>>> Hi Sean,
>>>
>>> Thanks for your feedback!  (responses below)
>>>
>>>
>>> On 12 November 2017 at 18:11, Sean P. DeNigris <[hidden email]> wrote:
>>>> Alistair Grant wrote
>>>>> https://github.com/akgrant43/Pharo-Chrome
>>>> Wow, that was a wild ride!
>>> Sorry about that.
>>>
>>>
>>>> Lessons learned along the way:
>>>> 1. On a Mac, to use the snazzy `chrome` terminal command referenced all over
>>>> the place in the docs, you must first `alias chrome="/Applications/Google\
>>>> Chrome.app/Contents/MacOS/Google\ Chrome"`
>>> I'm an Ubuntu Linux user, however if you look at OSXChromePlatform
>>> class>>defaultExecutableLocation you can see that is where it should
>>> be looking for the exe, so the alias shouldn't really be necessary.
>>> Torsten wrote this, so maybe has more insight.
>>>
>>>
>>>> 2. Chrome must be started with certain flags: `chrome
>>>> --remote-debugging-port=9222 --disable-gpu` (not sure if the last flag is
>>>> needed, but `#get:` seemed to hang before using; reference
>>>> https://developers.google.com/web/updates/2017/04/headless-chrome)
>>> I've been using this without headless mode.  I'll add a headless flag
>>> that also disables the gpu.
>>>
>>>
>>>
>>>> 3. Beacon has renamed InMemoryLogger to MemoryLogger
>>>> 4. I guess Beacon has renamed `#log` to `#emit`
>>> Sorry about that.  I didn't realise that the Pharo-Chrome baseline is
>>> loading Beacon stable while my install script upgrades it to
>>> #development.  #development is more recent, so I'll update the
>>> baseline.
>>>
>>>
>>>
>>>> 5. I had to comment out `chromeProcess sigterm.` because `chromeProcess` was
>>>> nil and also #sigterm seemed not to be defined anywhere in the image. I'm
>>>> not sure what the issue is there.
>>> chromeProcess is set in GoogleChrome>>openURL:.  Can you give me a
>>> small example that demonstrates the problem?
>>>
>>> #sigterm is implemented by OSSUnixSubprocess, which is what I
>>> ultimately use to launch the Chrome process on Ubuntu.
>>>
>>> But... this will be broken on Mac at the moment because the current
>>> method of launching chrome doesn't keep track of the process, so
>>> doesn't support #sigterm.  Do you know if OSSUnixSubprocess works on
>>> Mac?  If it does, I can update the code (but not test it :-().
>>>
>>>
>>>> Pull request issued for #3 & #4.
>>> Once I update the baseline this shouldn't be required.
>>>
>>>
>>>> Also, I'm not sure what platforms you
>>>> support, but you may want to tag the example methods with <gtExample> or
>>>> similar so that they are runnable from the browser and open an inspector if
>>>> there is an interesting return value.
>>> Good idea, I'll do this.
>>>
>>> I'm also making a few other changes:
>>>
>>> 1. Add an #extractTables method that searches through the page and
>>> returns an array of rows for each table it finds in the page
>>> (something that can easily be loaded in to DataFrame using #fromRows:,
>>> but I don't want to make Pharo-Chrome dependent on DataFrame at the
>>> moment).  Most of the time I use Pharo-Chrome it is extracting data
>>> from tables.
>>>
>>> 2. I don't know of any reliable way to tell when a page has loaded
>>> since there can always be javascript that periodically updates the
>>> page.  At the moment it waits until the page hasn't changed for a
>>> configurable amount of time.  I'm planning to add a check for specific
>>> content to determine if the page is considered loaded.
>>>
>>> 3. Add some documentation to the readme :-)
>>>
>>>
>>>
>>>> -----
>>>> Cheers,
>>>> Sean
>>> I'll let you know when I have a new version available (hopefully in
>>> the next few days).
>>>
>>>
>>> Thanks again,
>>> Alistair
>
>
>



Re: Pharo-Chrome (was: Soup bug(fix))

Sean P. DeNigris
Administrator
In reply to this post by Alistair Grant
Alistair Grant wrote
> This looks like you are using an old (cached?) version.

Ugh, yes. I just deleted the local clone and let Iceberg reclone.

Now when I tried:
    `GoogleChrome get:
'https://finance.yahoo.com/quote/%5EAXJO/history?p=%5EAXJO'`
I got:
    Error: Error: posix_spawn(), code: 2, description: No such file or
directory
Even though pasting the command into Terminal successfully launched Chrome.

BTW I had to insert a leading / into the executable location.



-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html


Re: Pharo-Chrome (was: Soup bug(fix))

Mariano Martinez Peck
If this is a problem with OSSubprocess I am happy to help debug it, but please share the exact steps to reproduce it and which code to look at, along with which OS and which Pharo version. It should also be a 32-bit image (OSSubprocess doesn't work on 64-bit yet).

Thanks,

On Tue, Nov 14, 2017 at 9:47 PM, Sean P. DeNigris <[hidden email]> wrote:
Alistair Grant wrote
> This looks like you are using an old (cached?) version.

Ugh, yes. I just deleted the local clone and let Iceberg reclone.

Now when I tried:
    `GoogleChrome get:
'https://finance.yahoo.com/quote/%5EAXJO/history?p=%5EAXJO'`
I got:
    Error: Error: posix_spawn(), code: 2, description: No such file or
directory
Even though pasting the command into Terminal successfully launched Chrome.

BTW I had to insert a leading / into the executable location.




Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
In reply to this post by Offray Vladimir Luna Cárdenas-2
Hi Offray,

On 15 November 2017 at 00:18, Offray Vladimir Luna Cárdenas
<[hidden email]> wrote:
> Hi Alistair,
>
> The example is not working for me. When I run it, a chrome session is
> open but nothing happens there, except that my image gets frozen until I
> close chrome and then I get this message: "ConnectionTimedOut: Cannot
> connect to 127.0.0.1:9222". What is the expected behavior?

I'm not sure why this is happening.  Chrome only allows one instance
per profile to be running; however, the example should be creating its
own profile (which is what happens when GoogleChrome>>debugSession is
sent).

Can you check that the profile directory is being created:
/tmp/pharo/GoogleChrome/debugSession/

Also, you should have several processes running, similar to:

alistair 11001  6953  3 07:34 pts/19   00:00:57
/opt/google/chrome/chrome
--user-data-dir=/tmp/pharo/GoogleChrome/debugSession
--remote-debugging-port=9222
alistair 11005 11001  0 07:34 pts/19   00:00:00
/opt/google/chrome/chrome --type=zygote
--enable-crash-reporter=9472c7b5-b817-49a9-a2df-266ef87a1707,unknown
--user-data-dir=/tmp/pharo/GoogleChrome/debugSession
alistair 11009 11005  0 07:34 pts/19   00:00:00
/opt/google/chrome/chrome --type=zygote
--enable-crash-reporter=9472c7b5-b817-49a9-a2df-266ef87a1707,unknown
--user-data-dir=/tmp/pharo/GoogleChrome/debugSession
alistair 11193 11009  6 07:35 pts/19   00:01:51
/opt/google/chrome/chrome --type=renderer
--field-trial-handle=13786453131923986905,2801831905294320914,131072
--service-pipe-token=4E2DA31A2AA7D6D8585A99928CABF01B --lang=en-GB
--enable-crash-reporter=9472c7b5-b817-49a9-a2df-266ef87a1707,unknown
--user-data-dir=/tmp/pharo/GoogleChrome/debugSession
--enable-offline-auto-reload --enable-offline-auto-reload-visible-only
--enable-pinch --num-raster-threads=2
--enable-main-frame-before-activation
--content-image-texture-target=(lots of numbers removed)

You can see that the first process has the separate profile
(--user-data-dir) and remote debugging enabled.  The last process
listed above is the one rendering the page (I ran GoogleChrome
class>>exampleNavigation to get this).

Maybe as a last resort you could try ensuring that no other instances
of chrome are running before you try the example.
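
The two checks above (profile directory, debugger listening on 9222) can also be run from a playground. This is only a minimal sketch: it assumes the profile path and port shown above, and that Zinc's ZnClient is present in the image. /json/version is Chrome's standard remote-debugging HTTP endpoint, not part of Pharo-Chrome.

```smalltalk
| profileDir |
"Profile directory that #debugSession is expected to create (path taken from the message above)."
profileDir := '/tmp/pharo/GoogleChrome/debugSession' asFileReference.
Transcript show: 'Profile exists: ', profileDir exists printString; cr.

"Ask Chrome's remote-debugging endpoint for its version.
An error here (e.g. ConnectionTimedOut) suggests nothing is listening on 9222."
[ Transcript show: (ZnClient new timeout: 5; get: 'http://127.0.0.1:9222/json/version'); cr ]
	on: Error
	do: [ :e | Transcript show: 'No debugger on 9222: ', e messageText printString; cr ].
```

If the directory exists but the second check fails, that would point at Chrome having started without remote debugging, for example because another instance was already running.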



> PharoChrome
> expects the user to have a Google account or be logged in by default to
> work (that would be a shame for those of us that don't have a Google account
> and still value our privacy).

No, by default it won't be logged in (since it is creating a separate profile).

Thanks,
Alistair



> Thanks,
>
> Offray


Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
In reply to this post by Sean P. DeNigris
Hi Sean,

Sorry (and to Offray) for the trouble, but thanks for persevering.

On 15 November 2017 at 01:47, Sean P. DeNigris <[hidden email]> wrote:

> Alistair Grant wrote
>> This looks like you are using an old (cached?) version.
>
> Ugh, yes. I just deleted the local clone and let Iceberg reclone.
>
> Now when I tried:
>     `GoogleChrome get:
> 'https://finance.yahoo.com/quote/%5EAXJO/history?p=%5EAXJO'`
> I got:
>     Error: Error: posix_spawn(), code: 2, description: No such file or
> directory
> Even though pasting the command into Terminal successfully launched Chrome.
>
> BTW I had to insert a leading / into the executable location.

Would you mind setting a breakpoint in
AKGOSProcess>>command:arguments:, printing the command and arguments
and making sure that the --user-data-dir exists?  (I'm not familiar
with MacOS and am wondering if maybe there is some sandboxing causing
trouble).

Also, as Mariano requested, can you confirm that it is MacOS, which
version of Pharo, and a 32 bit VM?

Thanks,
Alistair


Re: Pharo-Chrome (was: Soup bug(fix))

Sean P. DeNigris
Administrator
Alistair Grant wrote
> Sorry (and to Offray) for the trouble, but thanks for persevering.

Not at all! Thanks for updating the library :)


Alistair Grant wrote
> Would you mind setting a breakpoint in
> AKGOSProcess>>command:arguments:, printing the command and arguments
> and making sure that the --user-data-dir exists?

Sure:
command = '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome'
"(after I added the leading slash)".
args = #('--user-data-dir=/tmp/pharo/GoogleChrome/debugSession'
'--remote-debugging-port=9222')


Alistair Grant wrote
> and making sure that the --user-data-dir exists?

It does.

Also, as a sanity check, the following works:
    OSSUnixSubprocess new
        command: 'open';
        arguments: { 'http://www.pharo.org' };
        run
As well as from the Terminal:
$ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
--user-data-dir=/tmp/pharo/GoogleChrome/debugSession
--remote-debugging-port=9222


Alistair Grant wrote
> Also, as Mariano requested, can you confirm that it is MacOS, which
> version of Pharo, and a 32 bit VM?

I think I'm on 32 bit. I created a fresh 6.1 from Launcher. How can one tell
for sure if one is in a 32 vs 64 bit image?

Latest update: #60520

Operating System/Hardware
-------------------------
Mac OS 1013.1 intel

Virtual Machine
---------------
/Users/sean/Documents/Pharo/vms/61-x86/Pharo.app/Contents/MacOS/Pharo
CoInterpreter VMMaker.oscog-eem.2254 uuid:
4f2c2cce-f4a2-469a-93f1-97ed941df0ad Jul 20 2017
StackToRegisterMappingCogit VMMaker.oscog-eem.2252 uuid:
2f3e9b0e-ecd3-4adf-b092-cce2e2587a5c Jul 20 2017
VM: 201707201942 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
Date: Thu Jul 20 12:42:21 2017 -0700 $ Plugins: 201707201942
https://github.com/OpenSmalltalk/opensmalltalk-vm.git $

Mac OS X built on Jul 20 2017 21:45:23 UTC Compiler: 4.2.1 Compatible Apple
LLVM 6.1.0 (clang-602.0.53)
VMMaker versionString VM: 201707201942
https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Thu Jul 20
12:42:21 2017 -0700 $ Plugins: 201707201942
https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
CoInterpreter VMMaker.oscog-eem.2254 uuid:
4f2c2cce-f4a2-469a-93f1-97ed941df0ad Jul 20 2017
StackToRegisterMappingCogit VMMaker.oscog-eem.2252 uuid:
2f3e9b0e-ecd3-4adf-b092-cce2e2587a5c Jul 20 2017



-----
Cheers,
Sean

Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
Hi Sean,

On 15 November 2017 at 10:23, Sean P. DeNigris <[hidden email]> wrote:

> Alistair Grant wrote
>> Sorry (and to Offray) for the trouble, but thanks for persevering.
>
> Not at all! Thanks for updating the library :)
>
>
> Alistair Grant wrote
>> Would you mind setting a breakpoint in
>> AKGOSProcess>>command:arguments:, printing the command and arguments
>> and making sure that the --user-data-dir exists?
>
> Sure:
> command = '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome'
> "(after I added the leading slash)".

I've added the slash, so hopefully you won't have to do this again.


> args = #('--user-data-dir=/tmp/pharo/GoogleChrome/debugSession'
> '--remote-debugging-port=9222')
>
>
> Alistair Grant wrote
>> and making sure that the --user-data-dir exists?
>
> It does.
>
> Also, as a sanity check, the following works:
>     OSSUnixSubprocess new
>         command: 'open';
>         arguments: { 'http://www.pharo.org' };
>         run
> As well as from the Terminal:
> $ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
> --user-data-dir=/tmp/pharo/GoogleChrome/debugSession
> --remote-debugging-port=9222
>
>
> Alistair Grant wrote
>> Also, as Mariano requested, can you confirm that it is MacOS, which
>> version of Pharo, and a 32 bit VM?
>
> I think I'm on 32 bit. I created a fresh 6.1 from Launcher. How can one tell
> for sure if one is in a 32 vs 64 bit image?

OSPlatform current isUnix32

But if the OSSUnixSubprocess command above is working it must be 32 bit.
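
As a cross-check that does not go through the platform class, the pointer size can be read from the VM. A small sketch, assuming `Smalltalk vm wordSize` is available in the Pharo version in use:

```smalltalk
"Pointer size in bytes as reported by the VM: 4 on a 32-bit image, 8 on a 64-bit one."
Smalltalk vm wordSize.
```

Print or inspect the result in a playground; the `61-x86` directory in the VM path quoted below is a second hint that this is a 32-bit VM.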

I'm struggling to figure this one out.  Sorry for making you do all
the work, but the only things I can think of at the moment are:

1. If chrome is your default browser, can you try replacing the
explicit command with "open" since it seems to work above, i.e. in
OSXChromePlatform class>>openChromeWith: just set executableLocation
:= 'open'.

2. Try just opening the browser without any optional arguments, i.e.

OSSUnixSubprocess new
    command: '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome';
    arguments: { 'http://www.pharo.org' };
    run



> Latest update: #60520
>
> Operating System/Hardware
> -------------------------
> Mac OS 1013.1 intel
>
> Virtual Machine
> ---------------
> /Users/sean/Documents/Pharo/vms/61-x86/Pharo.app/Contents/MacOS/Pharo
> CoInterpreter VMMaker.oscog-eem.2254 uuid:
> 4f2c2cce-f4a2-469a-93f1-97ed941df0ad Jul 20 2017
> StackToRegisterMappingCogit VMMaker.oscog-eem.2252 uuid:
> 2f3e9b0e-ecd3-4adf-b092-cce2e2587a5c Jul 20 2017
> VM: 201707201942 https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
> Date: Thu Jul 20 12:42:21 2017 -0700 $ Plugins: 201707201942
> https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
>
> Mac OS X built on Jul 20 2017 21:45:23 UTC Compiler: 4.2.1 Compatible Apple
> LLVM 6.1.0 (clang-602.0.53)
> VMMaker versionString VM: 201707201942
> https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Thu Jul 20
> 12:42:21 2017 -0700 $ Plugins: 201707201942
> https://github.com/OpenSmalltalk/opensmalltalk-vm.git $
> CoInterpreter VMMaker.oscog-eem.2254 uuid:
> 4f2c2cce-f4a2-469a-93f1-97ed941df0ad Jul 20 2017
> StackToRegisterMappingCogit VMMaker.oscog-eem.2252 uuid:
> 2f3e9b0e-ecd3-4adf-b092-cce2e2587a5c Jul 20 2017
>
>
>
> -----
> Cheers,
> Sean
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>


Re: Pharo-Chrome (was: Soup bug(fix))

Sean P. DeNigris
Administrator
Alistair Grant wrote
> Sorry for making you do all the work

Not at all; happy to help. It takes a village! BTW I tracked it down to the
spaces in the command path. IIRC from my OSP hacking days, it probably has
something to do with the path not being run through the shell to interpret
the $\s.



-----
Cheers,
Sean

Re: Pharo-Chrome (was: Soup bug(fix))

Alistair Grant
Hi Sean,

On 15 November 2017 at 23:04, Sean P. DeNigris <[hidden email]> wrote:
> Alistair Grant wrote
>> Sorry for making you do all the work
>
> Not at all; happy to help. It takes a village! BTW I tracked it down to the
> spaces in the command path. IIRC from my OSP hacking days, it probably has
> something to do with the path not being run through the shell to interpret
> the $\s.

I'm glad (and relieved :-)) to hear that it is working.

Would you mind sending the modified command path that you're using so
I can update the code?  (I guess that it is just removing the
backslashes, but just in case...).

Thanks!
Alistair


Re: Pharo-Chrome (was: Soup bug(fix))

Sean P. DeNigris
Administrator
Alistair Grant wrote
> I'm glad (and relieved :-)) to hear that it is working.
>
> Would you mind sending the modified command path that you're using so
> I can update the code?  (I guess that it is just removing the
> backslashes, but just in case...).

That is correct…

OSSUnixSubprocess new
        command: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
        arguments: #('--user-data-dir=/tmp/pharo/GoogleChrome/debugSession'
'--remote-debugging-port=9222');
        run.
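
The unescaped path works because the command string is handed to the OS as-is: with no shell in between, a backslash is just another character in the file name. A rough way to see the difference from a playground (a sketch only; the variable names are illustrative):

```smalltalk
| escaped plain |
"Path as it would be typed at a shell prompt, with backslash-escaped spaces."
escaped := '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome'.
"Strip the escapes; without a shell in between, nothing else would remove them."
plain := escaped copyReplaceAll: '\ ' with: ' '.
"Inspect both answers: only a path that exists on disk can be spawned."
{ escaped asFileReference exists. plain asFileReference exists }
```

On a Mac with Chrome in its default location one would expect only the second entry to be true, which matches the "No such file or directory" error reported earlier.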



-----
Cheers,
Sean