Squeak-Dev mailing list

XMLParser weirdness


Andreas.Raab

XMLParser weirdness

Hi -

I just spent about two hours staring at code because of an oddity in the
XML parser's printing of nodes. Here's an example:

node := (XMLElement new)
	name: 'foo';
	addContent: (XMLStringNode string: 'Hello World');
	setAttributes: (Dictionary new);
	yourself.

This prints '<foo>Hello World</foo>', which is fine. However, the
following construction, which adds just a single attribute:

node := (XMLElement new)
	name: 'foo';
	addContent: (XMLStringNode string: 'Hello World');
	setAttributes: (Dictionary newFromPairs: {#id. 1});
	yourself.

now prints as '<foo id="1"/>' (i.e., it loses its content string). Looking
at the code in XMLElement>>printXmlOn:, it does something weird if the
writer is considered "non-canonical", i.e.,

"... snip ..."
(writer canonical not
	and: [self isEmpty and: [self attributes isEmpty not]])
		ifTrue: [writer endEmptyTag: self name]
"... snap ..."

Two questions about this: 1) What's the meaning of 'canonical' XML? Is
this a well-defined (sub-)set of XML? If so, where can I read about it?
2) Is the above a bug or a feature? I'm wondering in particular about
XMLElement>>isEmpty, which only considers child elements but not any
string contents.

Any help is greatly appreciated.

Cheers,
- Andreas
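To make the reported behaviour concrete, here is a small Python model of the print decision quoted above. It is not Squeak code: it assumes an element keeps string `contents`, child `elements` and `attributes` separately, and that `is_empty` looks at child elements only, as described for XMLElement>>isEmpty. `Element`, `is_empty` and `print_xml` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for XMLElement (a model, not the Squeak class)."""
    name: str
    contents: list = field(default_factory=list)    # string nodes such as 'Hello World'
    elements: list = field(default_factory=list)    # child Element instances
    attributes: dict = field(default_factory=dict)

    def is_empty(self) -> bool:
        # As reported: only child elements are checked; string contents are ignored.
        return not self.elements

    def print_xml(self, canonical: bool = False) -> str:
        attrs = "".join(f' {k}="{v}"' for k, v in self.attributes.items())
        # Same test as the quoted snippet:
        # (writer canonical not and: [self isEmpty and: [self attributes isEmpty not]])
        if not canonical and self.is_empty() and self.attributes:
            # Shorthand empty tag: the string contents are never written.
            return f"<{self.name}{attrs}/>"
        body = "".join(self.contents)
        body += "".join(child.print_xml(canonical) for child in self.elements)
        return f"<{self.name}{attrs}>{body}</{self.name}>"


no_attrs = Element("foo", contents=["Hello World"])
one_attr = Element("foo", contents=["Hello World"], attributes={"id": 1})
print(no_attrs.print_xml())  # <foo>Hello World</foo>
print(one_attr.print_xml())  # <foo id="1"/>
```

Adding a single attribute flips the condition, and because `is_empty` never looks at `contents`, the content string is dropped in the shorthand tag.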

Bert Freudenberg

Re: XMLParser weirdness

On 10.08.2010, at 21:21, Andreas Raab wrote:

> Two questions about this: 1) What's the meaning of 'canonical' XML? Is this a well-defined (sub-)set of XML? If so, where can I read about it? 2) Is the above a bug or a feature? I'm wondering in particular about XMLElement>>isEmpty which only considers the elements but not eventual contents.

Sounds like #isEmpty is buggy, it certainly should look at both contents and elements. And "canonical" may mean that there are no empty "shorthand" tags but always an opening and closing tag.

- Bert -
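The effect of the suggested #isEmpty fix on the quoted print test can be sketched in Python (a model, not Squeak code). `is_empty_current`, `is_empty_fixed` and `uses_shorthand` are hypothetical helpers: the first mirrors the reported behaviour, the second checks both string contents and child elements as suggested, and the third is the unchanged condition from XMLElement>>printXmlOn:.

```python
def is_empty_current(contents: list, elements: list) -> bool:
    # Reported behaviour of XMLElement>>isEmpty: child elements only.
    return not elements


def is_empty_fixed(contents: list, elements: list) -> bool:
    # Suggested behaviour: empty only with no string contents and no child elements.
    return not contents and not elements


def uses_shorthand(canonical: bool, empty: bool, attributes: dict) -> bool:
    # The condition quoted from XMLElement>>printXmlOn:, left unchanged.
    return not canonical and empty and bool(attributes)


contents, elements, attributes = ["Hello World"], [], {"id": 1}
print(uses_shorthand(False, is_empty_current(contents, elements), attributes))  # True: content lost
print(uses_shorthand(False, is_empty_fixed(contents, elements), attributes))    # False: content kept
```

With the fixed emptiness check, an element that carries a content string is no longer written as `<foo id="1"/>`, while a truly empty element with attributes still gets the shorthand form.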

Andreas.Raab

Re: XMLParser weirdness

On 8/10/2010 12:33 PM, Bert Freudenberg wrote:
> Sounds like #isEmpty is buggy, it certainly should look at both contents and elements. And "canonical" may mean that there are no empty "shorthand" tags but always an opening and closing tag.

Good theory, but it seems that in that case the test that says:

(writer canonical not
	and: [self isEmpty and: [self attributes isEmpty not]])

could be shortened to just

(writer canonical not
	and: [self isEmpty])

no? I mean, why would it matter whether the list of attributes is empty or
not? The way it is right now, you get:

node := (XMLElement new)
	name: 'foo';
	setAttributes: (Dictionary new);
	yourself.

=> '<foo></foo>'

even when running 'non-canonical' (because 'self attributes isEmpty not'
evaluates to false).

Cheers,
- Andreas
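The difference between the current test and the shortened one can be sketched in Python (a model, not Squeak code). Here an element counts as empty when its printed body is the empty string; `print_element` and its `require_attributes` switch are hypothetical, with `True` modelling the current `self attributes isEmpty not` clause and `False` the proposed shorter test.

```python
def print_element(name: str, attributes: dict, body: str,
                  canonical: bool, require_attributes: bool) -> str:
    """Print one element; `body` is its already-printed content ('' when empty)."""
    attrs = "".join(f' {k}="{v}"' for k, v in attributes.items())
    empty = body == ""
    shorthand = not canonical and empty
    if require_attributes:
        # Current code: shorthand only when there is at least one attribute.
        shorthand = shorthand and bool(attributes)
    if shorthand:
        return f"<{name}{attrs}/>"
    return f"<{name}{attrs}>{body}</{name}>"


# An element with no attributes and no content, written non-canonically:
print(print_element("foo", {}, "", canonical=False, require_attributes=True))   # <foo></foo>
print(print_element("foo", {}, "", canonical=False, require_attributes=False))  # <foo/>
```

Only the attribute-free empty element behaves differently; in canonical mode both variants write a start-end tag pair.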

Bert Freudenberg

Re: XMLParser weirdness

On 10.08.2010, at 22:08, Andreas Raab wrote:

> I mean why would it matter if the list of attributes is empty or not?

I can't see the canonicalization having anything to do with attributes being present or not:

http://www.w3.org/TR/xml-c14n#Example-SETags

- Bert -
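The linked section belongs to the W3C Canonical XML 1.0 recommendation, under which empty elements are converted to start-end tag pairs whether or not they have attributes, and attributes are written in lexicographic order. A simplified Python sketch of that rule follows; it assumes unqualified attribute names and ignores namespaces, whitespace normalization and character escaping, and `canonical_element` is a hypothetical name.

```python
def canonical_element(name: str, attributes: dict, body: str = "") -> str:
    # Canonical XML never uses the <name/> shorthand, with or without attributes.
    # Attributes are emitted sorted by name (simplified: no namespaces, no escaping).
    attrs = "".join(f' {k}="{attributes[k]}"' for k in sorted(attributes))
    return f"<{name}{attrs}>{body}</{name}>"


print(canonical_element("e1", {}))                                # <e1></e1>
print(canonical_element("e3", {"name": "elem3", "id": "elem3"}))  # <e3 id="elem3" name="elem3"></e3>
```

In this model the attribute dictionary only affects what goes inside the start tag, never whether an end tag is written.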