Fwd: [vwnc] Introducing SimpleXPath


Fwd: [vwnc] Introducing SimpleXPath

Steffen Märcker
Hi,

While working on the XML mapping framework SimpleXO, I realized that the
XML querying code could be useful on its own, too. I factored it out into a
library named SimpleXPath and made it available in the Cincom Public
Repository under the MIT license. It is similar to the XPath location path
subset (without predicates) but offers some distinct features:

- paths are built as pure Smalltalk expressions
- extended wildcard support
- simple API

Example:
(RootStep // 'source' /@ 'id') "XPath: //source/@id"
        contextNode: anXmlNode;
        nodesDo: [:node | Transcript show: node stringValue; cr].

The above code prints the value of the 'id' attribute of every 'source'
element in the XML document from which anXmlNode is taken.

I am interested in your opinions. I'd be glad if you gave it a try and
shared your thoughts here. Below I've attached the package comment
explaining the API, just in case. ;)

Regards and happy coding!
Steffen




SimpleXPath is an XML query library based on a subset of the XPath 1.0
language. It provides a handy API to construct paths and a parser for
abbreviated XPath location paths without predicates.

See also: http://www.w3.org/TR/xpath/.

I. NodeSets
-----------------
The result of constructing a path or parsing an XPath location path is a
NodeSet. If applied to an XML node, a NodeSet provides access to the nodes
selected by this set.
1. Call #contextNode: to define the node a NodeSet is applied to.
2. Call
        #nodes to get a set of all matched nodes,
        #nodesDo: with a one-argument block to iterate over all matched nodes, and
        #selectNodes: with a one-argument block to select some of the matched nodes.
If you are working with tags that have prefixed names, make sure to
resolve the associated namespaces before using a NodeSet:
call #resolveNamespaces: with a dictionary that maps all prefixes to
their namespaces.
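
The steps above might look as follows in a workspace. This is only a sketch: anXmlNode stands for an XML node obtained elsewhere, the 'svg' prefix and element name are placeholders, and it is an assumption that #resolveNamespaces: is sent before #contextNode: and that #selectNodes: expects a block answering a Boolean.

```smalltalk
| nodeSet namespaces allIds nonEmptyIds |

"Build a path with the construction API, roughly //svg:source/@id."
nodeSet := RootStep // ('svg' + 'source') /@ 'id'.

"Map every prefix used in the path to its namespace."
namespaces := Dictionary new.
namespaces at: 'svg' put: 'http://www.w3.org/2000/svg'.
nodeSet resolveNamespaces: namespaces.

"Apply the NodeSet to a node; anXmlNode is assumed to exist already."
nodeSet contextNode: anXmlNode.

"A set of all matched nodes."
allIds := nodeSet nodes.

"Iterate over the matched nodes."
nodeSet nodesDo: [:node | Transcript show: node stringValue; cr].

"Keep only some of the matched nodes (assumption: the block answers a Boolean)."
nonEmptyIds := nodeSet selectNodes: [:node | node stringValue notEmpty].
```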

II. Path construction API:
------------------------------------
To construct a path programmatically, use the Axis classes and the methods
from the protocol "path construction".
1. Single path steps:
        ChildAxis ? 'name'. "select all child nodes tagged with 'name'"
        ChildAxis ? ('prefix' + 'name'). "select all child nodes tagged with 'prefix:name'"
        AttributeAxis ? 'id'. "select all attribute nodes tagged with 'id'"

        SelfAxis ? AnyNodeTest. "select the context node itself"
        DescendantOrSelfAxis ? CommentTest. "select all descendant comment nodes"

2. Concatenate steps with #/ :
        (ChildAxis ? 'name') / (ChildAxis ? ('second' + 'name')).
        (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').

        "Often, the axis can be omitted:"
        'name' / ('second' + 'name'). "same as"
                (ChildAxis ? 'name') / (ChildAxis ? ('second' + 'name')).
        AnyNodeTest / (AttributeAxis ? 'id'). "same as"
                (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').

        "Similar to XPath, #/@, #// and #//@ abbreviate attribute and descendant-or-self steps:"
        AnyNodeTest /@ 'id'. "same as"
                (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').
        'name' // CommentTest. "same as"
                (ChildAxis ? 'name') / (DescendantOrSelfAxis ? AnyNodeTest) / (ChildAxis ? CommentTest).
        'name' //@ 'id'. "same as"
                (ChildAxis ? 'name') / (DescendantOrSelfAxis ? AnyNodeTest) / (AttributeAxis ? 'id').

3. Query from the document root with a RootStep:
        RootStep // AnyNodeTest. "all nodes"
        RootStep //@ 'id'. "the 'id' attributes of all nodes"

4. Create the union of two NodeSets with #| :
        (RootStep // 'element') | (RootStep // CommentTest).

        "#\@ abbreviates the union with an attribute step:"
        CommentTest \@ 'id'. "same as"
                (ChildAxis ? CommentTest) | (AttributeAxis ? 'id').

5. The wildcards # and * match a single character and multiple characters,
respectively, in local tag names:
        ChildAxis ? 'name_##'. "selects e.g. <name_01 />"
        AttributeAxis ? '*_id'. "selects e.g. ... svg_id='0x5' ..."
        "NOTE: XPath allows * only for the whole tag name, e.g. //prefix:* "

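Putting the pieces of section II together, a combined query could be sketched like this. Only the classes and operators described above are used; the element names 'chapter', 'section' and 'fig_##' are made up for illustration, and anXmlNode is assumed to be an existing XML node.

```smalltalk
| figures commentsOrIds |

"Roughly //chapter/section/fig_NN, where ## stands for exactly two characters."
figures := (RootStep // 'chapter') / (ChildAxis ? 'section') / (ChildAxis ? 'fig_##').

"Union of all comment nodes and all attributes whose local name ends in '_id'."
commentsOrIds := (RootStep // CommentTest)
        | ((RootStep // AnyNodeTest) / (AttributeAxis ? '*_id')).

"Apply a path to a node and print what it selects."
figures
        contextNode: anXmlNode;
        nodesDo: [:node | Transcript show: node stringValue; cr].
```

The axes are spelled out explicitly here; according to the abbreviations above, the shorter forms with #/ and #//@ should select the same nodes.
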
III. Parser API:
--------------------
To parse an abbreviated XPath location path, use SimpleXPathParser.
Predicate expressions are not supported.
Call
        #parseString: with an XPath string to parse it and obtain a NodeSet, and
        #validateString: to check whether the string is free of syntax errors.
If parsing fails, a SyntaxError is raised that gives the error position
and a brief description.
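
A sketch of how the parser might be used. The text above does not say whether #parseString: and #validateString: are sent to the SimpleXPathParser class or to an instance; this example assumes an instance created with #new. It also assumes that #validateString: answers a Boolean and that anXmlNode is an existing XML node.

```smalltalk
| parser nodeSet |
parser := SimpleXPathParser new. "assumption: instance-side API"

"Check the syntax first (assumed to answer true or false)."
(parser validateString: '//source/@id')
        ifTrue: [
                nodeSet := parser parseString: '//source/@id'.
                nodeSet
                        contextNode: anXmlNode;
                        nodesDo: [:node | Transcript show: node stringValue; cr]].

"Alternatively, parse directly and handle the SyntaxError.
 The path below ends in a dangling '@' and is presumably rejected."
[parser parseString: '//source/@']
        on: SyntaxError
        do: [:error | Transcript show: error description; cr].
```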
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: [vwnc] Introducing SimpleXPath

Tudor Girba-2
Thanks, Steffen.

I will look into it, but I do not have VW at hand right now.

Cheers,
Doru


2011/12/19 Steffen Märcker <[hidden email]>:




--
www.tudorgirba.com

"Every thing has its own flow"
