XMLParser weirdness


Andreas.Raab
Hi -

I just spent about two hours staring at code because of an oddity in the
XML parser's printing of nodes. Here's an example:

node:= (XMLElement new) name: 'foo';
        addContent: (XMLStringNode string: 'Hello World');
        setAttributes: (Dictionary new);
        yourself.

This prints '<foo>Hello World</foo>' which is fine. However, the
following construction, which adds just a single attribute:

node:= (XMLElement new) name: 'foo';
        addContent: (XMLStringNode string: 'Hello World');
        setAttributes: (Dictionary newFromPairs: {#id. 1});
        yourself.

now prints as '<foo id="1"/>' (i.e., it loses its content string). Looking
at the code in XMLElement>>printXmlOn: it does something weird if the
writer is considered "non-canonical", i.e.,

        "... snip ..."
        (writer canonical not
                and: [self isEmpty and: [self attributes isEmpty not]])
                ifTrue: [writer endEmptyTag: self name]
        "... snap ..."

Two questions about this: 1) What's the meaning of 'canonical' XML? Is
this a well-defined (sub-)set of XML? If so, where can I read about it?
2) Is the above a bug or a feature? I'm wondering in particular about
XMLElement>>isEmpty, which only considers the elements but not any
contents.

Any help is greatly welcome.

Cheers,
   - Andreas

Re: XMLParser weirdness

Bert Freudenberg

On 10.08.2010, at 21:21, Andreas Raab wrote:

> Hi -
>
> I just spent about two hours staring at code because of an oddity in the XML parser's printing of nodes. Here's an example:
>
> node:= (XMLElement new) name: 'foo';
> addContent: (XMLStringNode string: 'Hello World');
> setAttributes: (Dictionary new);
> yourself.
>
> This prints '<foo>Hello World</foo>' which is fine. However, the following construction, which adds just a single attribute:
>
> node:= (XMLElement new) name: 'foo';
> addContent: (XMLStringNode string: 'Hello World');
> setAttributes: (Dictionary newFromPairs: {#id. 1});
> yourself.
>
> now prints as '<foo id="1"/>' (i.e., it loses its content string). Looking at the code in XMLElement>>printXmlOn: it does something weird if the writer is considered "non-canonical", i.e.,
>
> "... snip ..."
> (writer canonical not
> and: [self isEmpty and: [self attributes isEmpty not]])
> ifTrue: [writer endEmptyTag: self name]
> "... snap ..."
>
> Two questions about this: 1) What's the meaning of 'canonical' XML? Is this a well-defined (sub-)set of XML? If so, where can I read about it? 2) Is the above a bug or a feature? I'm wondering in particular about XMLElement>>isEmpty, which only considers the elements but not any contents.
>
> Any help is greatly welcome.
>
> Cheers,
>  - Andreas


Sounds like #isEmpty is buggy; it certainly should look at both contents and elements. And "canonical" may mean that there are no empty "shorthand" tags, only paired opening and closing tags.
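
A minimal sketch of what such a fix might look like, in the usual Class>>selector notation. It assumes that XMLElement offers an #elements accessor for its child elements and a #contents accessor for its string nodes; neither accessor is confirmed in this thread, so check the actual class definitions before filing anything in.

```smalltalk
XMLElement >> isEmpty
    "Hypothetical fix: answer true only if the receiver has neither child
     elements nor string contents. The accessors #elements and #contents
     are assumptions about the class, not taken from its actual source."
    ^ self elements isEmpty and: [self contents isEmpty]
```

With a check along these lines, the 'Hello World' example would no longer count as empty, so the non-canonical branch would not collapse it into '<foo id="1"/>'.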

- Bert -



Re: XMLParser weirdness

Andreas.Raab
On 8/10/2010 12:33 PM, Bert Freudenberg wrote:
> Sounds like #isEmpty is buggy; it certainly should look at both contents and elements. And "canonical" may mean that there are no empty "shorthand" tags, only paired opening and closing tags.

Good theory, but it seems that in that case the test that says:

        (writer canonical not
                and: [self isEmpty and: [self attributes isEmpty not]])

could be shortened to just

        (writer canonical not
                and: [self isEmpty])

no? I mean why would it matter if the list of attributes is empty or
not? The way it's right now, you get:

node:= (XMLElement new) name: 'foo';
     setAttributes: (Dictionary new);
     yourself.

=> '<foo></foo>'

even when running 'non-canonical' (due to 'self attributes isEmpty not'
failing).
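
To make the effect of the extra attribute check concrete, here is a small workspace sketch that tabulates the current test for a non-canonical writer. It uses plain Booleans in place of real nodes (the variable names are made up for illustration), so it only mirrors the logic of the test quoted above and does not touch the XML classes.

```smalltalk
"Tabulate the current empty-tag test for a non-canonical writer.
 isEmpty and hasAttributes stand in for 'self isEmpty' and
 'self attributes isEmpty not'; they are illustrative names only."
| canonical |
canonical := false.
#(true false) do: [:isEmpty |
    #(true false) do: [:hasAttributes |
        | usesEmptyTag |
        usesEmptyTag := canonical not
            and: [isEmpty and: [hasAttributes]].
        Transcript
            show: 'isEmpty: ', isEmpty printString;
            show: '  hasAttributes: ', hasAttributes printString;
            show: '  empty tag: ', usesEmptyTag printString;
            cr]]
```

Only the row with both isEmpty and hasAttributes true selects the empty tag, which is why an empty element without attributes still prints as '<foo></foo>'.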

Cheers,
   - Andreas

Re: XMLParser weirdness

Bert Freudenberg

On 10.08.2010, at 22:08, Andreas Raab wrote:

> On 8/10/2010 12:33 PM, Bert Freudenberg wrote:
>> Sounds like #isEmpty is buggy; it certainly should look at both contents and elements. And "canonical" may mean that there are no empty "shorthand" tags, only paired opening and closing tags.
>
> Good theory, but it seems that in that case the test that says:
>
> (writer canonical not
> and: [self isEmpty and: [self attributes isEmpty not]])
>
> could be shortened to just
>
> (writer canonical not
> and: [self isEmpty])
>
> no? I mean why would it matter if the list of attributes is empty or not? The way it's right now, you get:
>
> node:= (XMLElement new) name: 'foo';
>    setAttributes: (Dictionary new);
>    yourself.
>
> => '<foo></foo>'
>
> even when running 'non-canonical' (due to 'self attributes isEmpty not' failing).
>
> Cheers,
>  - Andreas

I can't see canonicalization having anything to do with whether attributes are present:

        http://www.w3.org/TR/xml-c14n#Example-SETags

- Bert -