Introducing SimpleXPath

Introducing SimpleXPath

Steffen Märcker
Hi,

during my work on the XML mapping framework SimpleXO, I realized that the
XML querying code could be useful on its own, too. So I factored it out into
a library named SimpleXPath and published it in the Cincom Public Repository
under the MIT license. It covers roughly the location-path subset of XPath
(without predicates) but offers some distinct features:

- paths are built as pure Smalltalk expressions
- extended wildcard support
- simple API

Example:
(RootStep // 'source' /@ 'id') "XPath: //source/@id"
        contextNode: anXmlNode;
        nodesDo: [:node | Transcript show: node stringValue; cr].

The above code prints the 'id' value of all 'source' elements in the XML  
document from which anXmlNode is taken.
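Instead of printing the values, the same path can collect them via #nodes. This is a minimal sketch, assuming SimpleXPath is loaded and anXmlNode is a node of an already parsed document, as in the example; the variable names are illustrative only. Since #nodes answers a set, the values are copied into an OrderedCollection so that equal id strings coming from different elements are all kept.

```smalltalk
| idPath ids |
"Same query as in the example, XPath: //source/@id"
idPath := RootStep // 'source' /@ 'id'.
idPath contextNode: anXmlNode.
"#nodes answers all matched nodes; copy their string values into an ordered collection"
ids := OrderedCollection new.
idPath nodes do: [:node | ids add: node stringValue].
Transcript show: ids size printString , ' id value(s) found'; cr.
```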

I am interested in your opinions. I'd be glad if you gave it a try and
shared your thoughts here. Below I've attached the package comment
explaining the API, just in case. ;)

Regards and happy coding!
Steffen




SimpleXPath is an XML query library based on a subset of the XPath 1.0
language. It provides a handy API to construct paths and a parser for
abbreviated XPath location paths without predicates.

See also: http://www.w3.org/TR/xpath/.

I. NodeSets
-----------------
The result of constructing a path or parsing an XPath location path is a
NodeSet. When applied to an XML node (the context node), a NodeSet gives
access to the nodes it selects.
1. Call #contextNode: to set the node the NodeSet is applied to.
2. Then call one of
        #nodes to get a set of all matched nodes,
        #nodesDo: with a one-argument block to iterate over all matched nodes, or
        #selectNodes: with a one-argument block to select some of the matched nodes.
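A short sketch of the three access methods, assuming SimpleXPath is loaded and anXmlNode is a node of a parsed document; the element name 'source' is borrowed from the example above. What kind of collection #selectNodes: answers is not specified here, so the sketch only asks for its size.

```smalltalk
| sources withText |
sources := RootStep // 'source'.
sources contextNode: anXmlNode.

"#nodesDo: visits every matched node"
sources nodesDo: [:node | Transcript show: node stringValue; cr].

"#selectNodes: keeps only the nodes for which the block answers true
 (assumed here to answer a collection)"
withText := sources selectNodes: [:node | node stringValue notEmpty].
Transcript show: withText size printString , ' of ' , sources nodes size printString; cr.
```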
If you are working with tags that have prefixed names, resolve the
associated namespaces before using a NodeSet: call #resolveNamespaces: with
a dictionary that maps each prefix to its namespace.
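For prefixed names, the query can resolve its prefixes first and then run as usual. This is a sketch under several assumptions: the dictionary maps prefix strings to namespace URI strings, #resolveNamespaces: updates the receiving path in place, and #// accepts a qualified name built with #+ just as ChildAxis ? accepts one. The SVG namespace is only an example.

```smalltalk
| namespaces path |
"Map every prefix used in the path to its namespace (SVG chosen as an example)"
namespaces := Dictionary new.
namespaces at: 'svg' put: 'http://www.w3.org/2000/svg'.

"XPath: //svg:rect/@id"
path := RootStep // ('svg' + 'rect') /@ 'id'.
path resolveNamespaces: namespaces.   "assumed to resolve in place, before the path is used"
path contextNode: anXmlNode.
path nodesDo: [:node | Transcript show: node stringValue; cr].
```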

II. Path construction API:
------------------------------------
To construct a path programmatically, use the Axis classes and the methods
in the protocol "path construction".
1. Single path steps:
        ChildAxis ? 'name'. "select all child nodes tagged with 'name'"
        ChildAxis ? ('prefix' + 'name'). "select all child nodes tagged with 'prefix:name'"
        AttributeAxis ? 'id'. "select all attribute nodes tagged with 'id'"

        SelfAxis ? AnyNodeTest. "select the context node itself"
        DescendantOrSelfAxis ? CommentTest. "select all comment nodes among the context node and its descendants"

2. Concatenate steps with #/ :
        (ChildAxis ? 'name') / (ChildAxis ? ('second' + 'name')).
        (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').

        "Often, the axis can be omitted:"
        'name' / ('second' + 'name'). "same as"
                (ChildAxis ? 'name') / (ChildAxis ? ('second' + 'name')).
        AnyNodeTest / (AttributeAxis ? 'id'). "same as"
                (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').

        "Similar to XPath, #/@, #// and #//@ abbreviate attribute and descendant-or-self steps:"
        AnyNodeTest /@ 'id'. "same as"
                (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').
        'name' // CommentTest. "same as"
                (ChildAxis ? 'name') / (DescendantOrSelfAxis ? AnyNodeTest) / (ChildAxis ? CommentTest).
        'name' //@ 'id'. "same as"
                (ChildAxis ? 'name') / (DescendantOrSelfAxis ? AnyNodeTest) / (AttributeAxis ? 'id').

3. Query from the document root with a RootStep:
        RootStep // AnyNodeTest. "all nodes below the document root"
        RootStep //@ 'id'. "the 'id' attribute of every node that has one"

4. Create the union of two NodeSets with #| :
        (RootStep // 'element') | (RootStep // CommentTest).

        "#\@ abbreviates the union with an attribute step:"
        CommentTest \@ 'id'. "same as"
                (ChildAxis ? CommentTest) | (AttributeAxis ? 'id').

5. The wildcards # and * match a single character and multiple characters,
respectively, in local tag names:
        ChildAxis ? 'name_##'. "selects e.g. <name_01 />"
        AttributeAxis ? '*_id'. "selects e.g. ... svg_id='0x5' ..."
        "NOTE: XPath allows * only as the whole local name, e.g. //prefix:* "

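The path-construction items above can be exercised together roughly as follows. This is a sketch, assuming SimpleXPath is loaded and that anXmlNode and anElement (hypothetical names) are nodes of the same parsed document, with anElement an element node; tag names such as 'item' are made up. The last part only illustrates the # and * convention with the standard String>>match:, which uses the same two wildcard characters; whether SimpleXPath relies on it internally is an assumption, and match: typically ignores case whereas XML names do not.

```smalltalk
| children short long missing fromHere fromThere unionCount matches |
"Item 1: a relative step only looks at the context node (direct <item> children)"
children := (ChildAxis ? 'item') contextNode: anElement; nodes.

"Item 2: an abbreviated path and its explicit form should select the same nodes"
short := ('name' //@ 'id') contextNode: anXmlNode; nodes.
long := ((ChildAxis ? 'name') / (DescendantOrSelfAxis ? AnyNodeTest) / (AttributeAxis ? 'id'))
            contextNode: anXmlNode;
            nodes.
missing := short reject: [:node | long includes: node].

"Item 3: a RootStep path starts at the document root, so any node
 of the same document should give the same result"
fromHere := (RootStep // CommentTest) contextNode: anXmlNode; nodes.
fromThere := (RootStep // CommentTest) contextNode: anElement; nodes.

"Item 4: a union is itself a NodeSet and is evaluated like any other path"
unionCount := 0.
((RootStep // 'element') | (RootStep // CommentTest))
    contextNode: anXmlNode;
    nodesDo: [:node | unionCount := unionCount + 1].

"Item 5: wildcard convention via String>>match: (the receiver is the pattern;
 # matches exactly one character, * matches any sequence)"
matches := Array
    with: ('name_##' match: 'name_01')
    with: ('name_##' match: 'name_1')
    with: ('*_id' match: 'svg_id')
    with: ('*_id' match: 'svg_ref').

Transcript
    show: 'children: ' , children size printString; cr;
    show: 'missing from explicit form: ' , missing size printString; cr;
    show: 'comments: ' , fromHere size printString , ' / ' , fromThere size printString; cr;
    show: 'union size: ' , unionCount printString; cr;
    show: 'wildcards: ' , matches printString; cr.
```

The wildcard array should read true, false, true, false: 'name_1' has one character too few for 'name_##', and 'svg_ref' does not end in '_id'.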
III. Parser API:
--------------------
To parse an abbreviated XPath location path, use SimpleXPathParser. Note
that predicate expressions are not supported.
Call
        #parseString: with an XPath string to parse it into a NodeSet, or
        #validateString: to check whether the string is free of syntax errors.
If parsing fails, a SyntaxError is raised that gives the error position
and a brief description.
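A sketch of the parser calls described above. It assumes #validateString: and #parseString: are understood by the SimpleXPathParser class itself and that #validateString: answers a Boolean; neither detail is stated in the comment, so if they are instance-side in your version, send them to SimpleXPathParser new instead. SyntaxError is the exception named above and may need namespace qualification. The predicate in the last expression is deliberately unsupported syntax.

```smalltalk
| path failed |
"Syntax check only (assumed to answer a Boolean)"
(SimpleXPathParser validateString: '//source/@id')
    ifFalse: [Transcript show: 'invalid path'; cr].

"Parse into a NodeSet and use it like a hand-built path"
path := SimpleXPathParser parseString: '//source/@id'.
path contextNode: anXmlNode.
path nodesDo: [:node | Transcript show: node stringValue; cr].

"Predicates are not supported, so this should raise a SyntaxError"
failed := false.
[SimpleXPathParser parseString: '//source[@id]']
    on: SyntaxError
    do: [:ex |
        failed := true.
        Transcript show: 'parse failed: ' , ex description; cr.
        ex return: nil].
```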
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Introducing SimpleXPath

stephane ducasse-2
Good idea!

Stef

On Dec 5, 2011, at 2:29 PM, Steffen Märcker wrote:




Re: Fixing SimpleXPath

Steffen Märcker
In reply to this post by Steffen Märcker
Hello again,

I've pushed a new version of SimpleXPath that fixes a serious bug: a
Path/NodeSet could enumerate the same node twice. The spec does not allow
this, since node-sets are sets.

I highly recommend updating to this version. This affects SimpleXO as
well, since duplicate nodes may cause an incorrect XML-to-object mapping.
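To check whether an installed version is affected, one can count how often a node is enumerated. A minimal sketch, assuming SimpleXPath is loaded and anXmlNode is a node of a parsed document; it also assumes the same node is always enumerated as the same object, which is why an IdentitySet is used. The two operands of the union overlap on purpose, so an affected version may report duplicates while a fixed one should report none.

```smalltalk
| seen duplicates |
seen := IdentitySet new.
duplicates := 0.
"Every 'element' node matched on the left is also matched on the right"
((RootStep // 'element') | (RootStep // AnyNodeTest))
    contextNode: anXmlNode;
    nodesDo: [:node |
        (seen includes: node) ifTrue: [duplicates := duplicates + 1].
        seen add: node].
Transcript show: 'duplicates: ' , duplicates printString; cr.
```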

Regards, Steffen


On 05.12.2011, 14:29, Steffen Märcker <[hidden email]> wrote:
