Coding XPath as Smalltalk


Coding XPath as Smalltalk

Peter Kenny

Hello

 

I am using XPath as a way of dissecting web pages, especially from Wiktionary. Generally I get good results, but I could get useful extra flexibility by using the binary Smalltalk operators to represent XPath, as mentioned at the end of the class comment for XPath. However, the description there is very terse, and I am having difficulty seeing how to include more complex expressions, especially attribute tests. I have put some of my XPath expressions through the XPath compiler and looked at the output, and out of that I have found expressions which work but look very clumsy. As an example, I have used the fragment:

 

document xPath: '//div[@id=''catlinks'']//li//text()'

 

and found that an equivalent is:

 

document // 'div' ?? [:node :x :y | (node attributeAt: 'id') = 'catlinks'] // 'li' // [:n | n isStringNode].

(I had to put two dummy arguments in the three-argument block to get it to work.)

 

Is there a more extensive explanation of the use of these binary operators? If not, could some kind person show me the most concise translation of the sample XPath above, to give me a start in working out more complex cases?

 

Many thanks for any help.

 

Peter Kenny

 


Re: Coding XPath as Smalltalk

cedreek
Hi Peter,

I have never used XPath, so I cannot help there. I just wonder whether you could not use Soup to "dissect" your web pages?

HTH,

Cédrik


Re: Coding XPath as Smalltalk

Peter Kenny

Hi Cédrik

 

I started out using Soup, but I found that it does what its name suggests and jumbles up the contents of the pages. I now parse the pages with XMLHTMLParser, which preserves the original structure exactly. The point of XPath is that it is a convenient way of specifying a route through the structure to the desired information. So the XPath I cited says ‘find a DIV node, at any depth, which has id="catlinks", then find any descendant of it which is an LI node, then find all the text nodes below that.’
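
The route described above can be tried on a small page. This is only a sketch: the HTML string is invented for illustration, XMLHTMLParser and xPath: are the names used in this thread (the selector may be spelled differently in other releases of the package), and `string` is assumed to answer the characters of a string node.

```smalltalk
"Sketch only: the HTML below is made up for illustration."
| document texts |
document := XMLHTMLParser parse:
	'<html><body><div id="catlinks"><ul><li><a href="#">Nouns</a></li><li><a href="#">Verbs</a></li></ul></div><div id="other"><ul><li>ignored</li></ul></div></body></html>'.

"Any div with id='catlinks', then any li below it, then every text node below that li."
texts := document xPath: '//div[@id=''catlinks'']//li//text()'.

"Each result should be a string node; `string` is assumed to answer its characters."
texts do: [:each | Transcript show: each string; cr].
```

The second div is there to show that the attribute test restricts the search to the catlinks block before the li step is applied.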

 

Peter

 

 


Re: Coding XPath as Smalltalk

hernanmd
In reply to this post by Peter Kenny
Hi Peter,

2016-09-01 10:26 GMT-03:00 PBKResearch <[hidden email]>:

> Hello
>
> I am using XPath as a way of dissecting web pages, especially from Wiktionary.

Any specific reason to not use the SPARQL endpoint?

> Generally I get good results, but I could get useful extra flexibility by using the binary Smalltalk operators to represent XPath, as mentioned at the end of the class comment for XPath. However, the description there is very terse, and I am having difficulty seeing how to include more complex expressions, especially attribute tests.

Which XPath version are you using? How did you install it?

 


Re: Coding XPath as Smalltalk

Peter Kenny

Hi Hernan

 

I don’t understand your first question – I can’t see a connection between SPARQL and what I am doing.

 

I downloaded XPath from http://smalltalkhub.com/mc/PharoExtras/XPath/. However, I am probably using a somewhat out of date version; I downloaded it about a year ago.

 

Peter

 


Re: Coding XPath as Smalltalk

hernanmd


2016-09-01 16:51 GMT-03:00 PBKResearch <[hidden email]>:

> Hi Hernan
>
> I don’t understand your first question – I can’t see a connection between SPARQL and what I am doing.

You could get the Wiktionary data by querying a SPARQL endpoint, http://wiktionary.dbpedia.org/sparql, instead of scraping web pages (which seems more difficult).

> I downloaded XPath from http://smalltalkhub.com/mc/PharoExtras/XPath/. However, I am probably using a somewhat out of date version; I downloaded it about a year ago.

I don't know about that version. I copied an old version from SqueakSource (with permission) and have updated it from time to time, but not by much. There is also an XPath2 repository which you may try.

Hernán
 


Re: Coding XPath as Smalltalk

monty-3
In reply to this post by Peter Kenny
Peter, you're using an ancient version with bugs that were fixed last fall. The newest version has more tests and correct behavior (checked against a reference implementation). Just download a new Moose image and you'll get it, along with an up-to-date XMLParser. (But if you insist on upgrading in your old image, run "XPath initialize" afterwards.)

The binary syntax (there are keyword equivalents now) officially supports only XPath axis selectors like #/ and #// that take node test arguments. The node tests can be name tests like 'name', '*', 'prefix:*' or type tests like 'text()', 'comment()', 'element(name)'.

Filters aren't officially supported with that syntax, but you can always use select: on the result. ?? was removed, but I might add it back as shorthand. Filters are implemented differently now.
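
As a rough illustration of the select: suggestion, the attribute test from the original question might be written along these lines. This is a sketch under assumptions, not the package's documented idiom: `document` is taken to be an already parsed XMLDocument, and attributeAt: is the accessor used in the original post. The temporary variable names are made up for the example.

```smalltalk
"Sketch under assumptions: `document` is a parsed XMLDocument and the
 XPath package from this thread is loaded, so nodes understand //."
| catlinksDivs texts |

"Axis step with a name test, then an ordinary Smalltalk filter on the result."
catlinksDivs := (document // 'div')
	select: [:each | (each attributeAt: 'id') = 'catlinks'].

"Continue from each surviving div; 'text()' is a type test, so no block is needed."
texts := OrderedCollection new.
catlinksDivs do: [:div |
	texts addAll: (div // 'li' // 'text()')].
```

Collecting into an OrderedCollection avoids relying on whether the result of select: still understands //. One difference from the string form: if li elements are nested, the same text node could be collected more than once, whereas an XPath node set would contain it only once.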


Re: Coding XPath as Smalltalk

monty-3
In reply to this post by hernanmd
 
Hernan, the PharoExtras/XPath repo has a major rewrite of your package to support all of XPath 1.0 plus XPath 2.0 extensions such as the element() and attribute() type tests and namespace literals in name tests, like '{namespaceURI}localName'. A rewrite was needed because the old lib only implemented a small subset of the spec and would loop forever on some inputs.
 

Re: Coding XPath as Smalltalk

stepharo
In reply to this post by monty-3
Hi monty

In which repository is this maintained version?

PharoExtras?

Is it the entry in the catalog?

Stef




Re: Coding XPath as Smalltalk

hernanmd
In reply to this post by monty-3
Thank you Monty for the clarification. I should say the original XPath package was written by Phil Hargett and I just added a couple of methods. Glad you rewrote the lib!
Cheers,

Hernán



Re: Coding XPath as Smalltalk

Tudor Girba-2
In reply to this post by stepharo
Hi,

This is the latest stable version:

        spec
                name: 'XPath';
                className: #ConfigurationOfXPath;
                versionString: #stable;
                repository: 'http://www.smalltalkhub.com/mc/PharoExtras/XPath/main'.
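
For anyone loading this by hand rather than through a new image, a load script along the following lines should work. It is a sketch that assumes the Metacello scripting API is available in the image and that the configuration name and repository URL are the ones quoted above.

```smalltalk
"Sketch: load the stable XPath configuration from the repository quoted above."
Metacello new
	configuration: 'XPath';
	repository: 'http://www.smalltalkhub.com/mc/PharoExtras/XPath/main';
	version: #stable;
	load.

"Per monty's advice earlier in the thread, after upgrading in an old image:"
XPath initialize.
```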

Doru



--
www.tudorgirba.com
www.feenk.com

"What we can governs what we wish."






Re: Coding XPath as Smalltalk

Tudor Girba-2
In reply to this post by hernanmd
Hi,

Indeed, Monty is doing a great job at maintaining and evolving the XML support.

Cheers,
Doru



--
www.tudorgirba.com
www.feenk.com

"Live like you mean it."



Re: Coding XPath as Smalltalk

monty-3
In reply to this post by stepharo
> Sent: Saturday, September 03, 2016 at 2:02 AM
> From: stepharo <[hidden email]>
> To: "Any question about pharo is welcome" <[hidden email]>
> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>
> Hi monty
>
> In which repository is this maintained version?

PharoExtras/XPath (you gave me the write access).

 
> PharoExtras?
>
> Is it the entry in the catalog?

It has a catalog entry at http://catalog.pharo.org and a CI job at https://ci.inria.fr/pharo-contribution/job/XPath/


Re: Coding XPath as Smalltalk

stepharo
In reply to this post by Tudor Girba-2
+ 1000000


Le 3/9/16 à 08:17, Tudor Girba a écrit :

> Hi,
>
> Indeed, Monty is doing a great job at maintaining and evolving the XML support.
>
> Cheers,
> Doru
>
>
>> On Sep 3, 2016, at 8:06 AM, Hernán Morales Durand <[hidden email]> wrote:
>>
>> Thank you Monty for the clarification. I should say the original XPath package was written by Phil Hargett and I just added a couple of methods. Glad you rewrote the lib!
>> Cheers,
>>
>> Hernán
>>
>>
>> 2016-09-03 3:01 GMT-03:00 monty <[hidden email]>:
>>  
>> Hernan, the PharoExtras/XPath repo has a major rewrite of your package to support all of XPath 1.0 + XPath 2.0 extensions like the element() and attribute() type tests and namespace literals in name tests like '{namespaceURI}localName'. A rewrite was needed because the old lib only implemented a small subset of the spec and would infinite loop on some inputs.
>>  
>> Sent: Thursday, September 01, 2016 at 3:56 PM
>> From: "Hernán Morales Durand" <[hidden email]>
>>
>> To: "Any question about pharo is welcome" <[hidden email]>
>> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>>  
>>  
>> 2016-09-01 16:51 GMT-03:00 PBKResearch <[hidden email]>:
>> Hi Hernan
>>
>>  
>> I don’t understand your first question – I can’t see a connection between SPARQL and what I am doing.
>>
>>  
>>  
>> You could get the Wiktionary data by querying a SPARQL endpoint (http://wiktionary.dbpedia.org/sparql) instead of scraping web pages, which seems more difficult.
>>  
>>  
>> I downloaded XPath from http://smalltalkhub.com/mc/PharoExtras/XPath/. However, I am probably using a somewhat out of date version; I downloaded it about a year ago.
>>
>>  
>>  
>> I don't know about that version. I copied an old version from SqueakSource (with permission) and updated it from time to time, but there is not much. There is also an XPath2 repository which you may try.
>>  
>> Hernán
>>  
>>  
>> Peter
>>
>>  
>> From: Pharo-users [mailto:[hidden email]] On Behalf Of Hernán Morales Durand
>> Sent: 01 September 2016 18:54
>> To: Any question about pharo is welcome <[hidden email]>
>> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>>
>>  
>> Hi Peter,
>>
>>  
>> 2016-09-01 10:26 GMT-03:00 PBKResearch <[hidden email]>:
>>
>> Hello
>>
>>  
>> I am using XPath as a way of dissecting web pages, especially from Wiktionary.
>>
>>  
>> Any specific reason to not use the SPARQL endpoint?
>>
>>
>>  
>>
>> Generally I get good results, but I could get useful extra flexibility by using the binary Smalltalk operators to represent XPath, as mentioned at the end of the class comment for XPath. However, the description there is very terse, and I am having difficulty seeing how to include more complex expressions, especially attribute tests.
>>
>>  
>> Which XPath version are you using? How did you install it?
>>
>>
>>  
>>
>> I have put some of my XPath expressions through the XPath compiler and looked at the output, and out of that I have found expressions which work but look very clumsy. As an example, I have used the fragment:
>>
>>  
>> document xPath: '//div[@id=''catlinks'']//li//text()'
>>
>>  
>> and found that an equivalent is:
>>
>>  
>> document //'div' ?? [:node :x :y|(node attributeAt: 'id') = 'catlinks']//'li'//[:n| n isStringNode].
>>
>> (I had to put two dummy arguments in the three-argument block to get it to work.)
>>
>>  
>> Is there a more extensive explanation of the use of these binary operators? If not, could some kind person show me the most concise translation of the sample XPath above, to give me a start in working out more complex cases?
>>
>>  
>> Many thanks for any help.
>>
>>  
>> Peter Kenny
>>
>>  
>>  
>>
> --
> www.tudorgirba.com
> www.feenk.com
>
> “Live like you mean it.”
>
>
>



Re: Coding XPath as Smalltalk

stepharo
In reply to this post by monty-3


On 3/9/16 at 08:41, monty wrote:
>> Sent: Saturday, September 03, 2016 at 2:02 AM
>> From: stepharo <[hidden email]>
>> To: "Any question about pharo is welcome" <[hidden email]>
>> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>>
>> Hi monty
>>
>> In which repository is this maintained version?
> PharoExtras/XPath (you gave me the write access).

Excellent.
I like the idea that we all share and improve common, identifiable places.
>
>  
>> PharoExtras?
>>
>> Is it the entry in the catalog?
> It has a catalog entry at http://catalog.pharo.org and a CI job at https://ci.inria.fr/pharo-contribution/job/XPath/

super cool!
I'm happy

>
>> Stef



Re: Coding XPath as Smalltalk

Sven Van Caekenberghe-2
In reply to this post by Tudor Girba-2

> On 03 Sep 2016, at 08:17, Tudor Girba <[hidden email]> wrote:
>
> Hi,
>
> Indeed, Monty is doing a great job at maintaining and evolving the XML support.

Yes, indeed!

> Cheers,
> Doru
>



Re: Coding XPath as Smalltalk

Peter Kenny
In reply to this post by monty-3
Hi Monty

Many thanks. I have picked up a project that I had not worked on for a while, which explains why I am using an old image. I shall try the latest Moose image, as you suggest. My only anxiety is that I need to be able to use a rather ancient package called TextLint, and I do not know whether it will load OK in a new Pharo. If not, I shall try to update my existing image.

With the latest XPath, will it be clear how to use the binary syntax to carry out node tests like the example of '//div[@id=''catlinks'']//' that I cited below? The case I am interested in is where the actual identifier ('catlinks' in this case) is a variable rather than a constant. It would be possible to do it in standard XPath by assembling the XPath string with a variable component, but it might be more convenient in the binary syntax.

Many thanks for your help.

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:[hidden email]] On Behalf Of monty
Sent: 03 September 2016 06:54
To: [hidden email]
Subject: Re: [Pharo-users] Coding XPath as Smalltalk

Peter, you're using an ancient version with bugs that were fixed last fall. The newest version has more tests and correct behavior (checked against a reference implementation). Just download a new Moose image and you'll get it, along with an up to date XMLParser. (But if you insist on upgrading in your old image, run "XPath initialize" after)

The binary syntax (there are keyword equivalents now) officially only supports XPath axis selectors like #/ and #// that take node test arguments, where the node tests can be name tests like 'name', '*', 'prefix:*' or type tests like 'text()', 'comment()', 'element(name)'.

Filters aren't officially supported with that syntax, but you can always use select: on the result. ?? was removed, but I might add it back as shorthand. Filters are implemented differently now.

> From: PBKResearch <[hidden email]>
> To: [hidden email]
> Subject: [Pharo-users] Coding XPath as Smalltalk
>
> Hello
>  
> I am using XPath as a way of dissecting web pages, especially from Wiktionary. Generally I get good results, but I could get useful extra flexibility by using the binary Smalltalk operators to represent XPath, as mentioned at the end of the class comment for XPath. However, the description there is very terse, and I am having difficulty seeing how to include more complex expressions, especially attribute tests. I have put some of my XPath expressions through the XPath compiler and looked at the output, and out of that I have found expressions which work but look very clumsy. As an example, I have used the fragment:
>  
> document xPath: '//div[@id=''catlinks'']//li//text()'
>  
> and found that an equivalent is:
>  
> document //'div' ?? [:node :x :y|(node attributeAt: 'id') = 'catlinks']//'li'//[:n| n isStringNode].
> (I had to put two dummy arguments in the three-argument block to get it to work.)
>  
> Is there a more extensive explanation of the use of these binary operators? If not, could some kind person show me the most concise translation of the sample XPath above, to give me a start in working out more complex cases?
>  
> Many thanks for any help.
>  
> Peter Kenny



Re: Coding XPath as Smalltalk

monty-3


> Sent: Saturday, September 03, 2016 at 5:30 AM
> From: PBKResearch <[hidden email]>
> To: "'Any question about pharo is welcome'" <[hidden email]>
> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>
> Hi Monty
>
> Many thanks. I have picked up a project that I had not worked on for a while, which explains why I am using an old image. I shall try the latest Moose image, as you suggest. My only anxiety is that I need to be able to use a rather ancient package called TextLint, and I do not know whether it will load OK in a new Pharo. If not, I shall try to update my existing image.

If you'd looked at the CI job, you'd see that XPath builds on Pharo 5 through 3 (but should work back to 1.4). You can always start fresh with a clean, old image from http://files.pharo.org/image/ or the Moose website if TextLint doesn't work anymore.

> With the latest XPath, will it be clear how to use the binary syntax to carry out node tests like the example of '//div[@id=''catlinks'']//' that I cited below? The case I am interested in is where the actual identifier ('catlinks' in this case) is a variable rather than a constant. It would be possible to do it in standard XPath by assembling the XPath string with a variable component, but it might be more convenient in the binary syntax.
>

You could do this:
 ((doc // 'div') select: [:each | (each attributeAt: 'id') = catlinks]) // 'li' // 'text()'

where "catlinks" is a var. Or you could use xPath:context: with an XPath var that you dynamically bind using custom contexts:
 doc
     xPath: '//div[@id=$catlinks]//li//text()'
     context: (XPathContext variables: {'catlinks' -> catlinks})

The advantage over this:
 doc xPath: '//div[@id=''', catlinks, ''']//li//text()'

is that the xPath: expression string is the same each time, so it is compiled only once, on first use, and cached for later uses (inspect 'XPath compiledXPathCache'). With string concatenation, the expression is recompiled each time the xPath: expression string arg changes.
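
As a rough playground sketch of the two variable-driven approaches side by side: the sample markup, the variable names (wantedId, viaSelect, viaContext) and the use of XMLDOMParser parse: to build the document are assumptions for illustration only; the select:, xPath:context: and XPathContext variables: calls are the ones shown above.

```smalltalk
"Sketch only: sample markup and variable names are made up for illustration."
| doc wantedId viaSelect viaContext |
doc := XMLDOMParser parse:
    '<html><body><div id="catlinks"><ul><li>Nouns</li><li>Verbs</li></ul></div><div id="other"><ul><li>Skip</li></ul></div></body></html>'.
wantedId := 'catlinks'.

"Binary syntax: filter the intermediate node set with select:, then keep navigating."
viaSelect := ((doc // 'div')
    select: [:each | (each attributeAt: 'id') = wantedId]) // 'li' // 'text()'.

"Fixed expression string plus a bound XPath variable, so the compiled
 expression can be reused from the cache whatever wantedId holds."
viaContext := doc
    xPath: '//div[@id=$wantedId]//li//text()'
    context: (XPathContext variables: {'wantedId' -> wantedId}).
```

Both results should pick out the same text nodes; inspecting them in the playground is an easy way to check that on your own pages.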

> Many thanks for your help.
>
> Peter Kenny


Re: Coding XPath as Smalltalk

monty-3
In reply to this post by Sven Van Caekenberghe-2
Thanks!

> Sent: Saturday, September 03, 2016 at 4:31 AM
> From: "Sven Van Caekenberghe" <[hidden email]>
> To: "Any question about pharo is welcome" <[hidden email]>
> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>
>
> > On 03 Sep 2016, at 08:17, Tudor Girba <[hidden email]> wrote:
> >
> > Hi,
> >
> > Indeed, Monty is doing a great job at maintaining and evolving the XML support.
>
> Yes indeed !
>
> > Cheers,
> > Doru


Re: Coding XPath as Smalltalk

monty-3
In reply to this post by stepharo


> Sent: Saturday, September 03, 2016 at 2:56 AM
> From: stepharo <[hidden email]>
> To: [hidden email]
> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>
>
>
> On 3/9/16 at 08:41, monty wrote:
> >> Sent: Saturday, September 03, 2016 at 2:02 AM
> >> From: stepharo <[hidden email]>
> >> To: "Any question about pharo is welcome" <[hidden email]>
> >> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
> >>
> >> Hi monty
> >>
> >> In which repository is this maintained version?
> > PharoExtras/XPath (you gave me the write access).
>
> Excellent.
> I like the idea that we all share and improve common, identifiable places.

Yes, I also moved XMLParserHTML and XMLParserStAX from PharoExtras/XMLParser to separate PharoExtras repos. PharoExtras could be the place for community-maintained standard libs.

