Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections


jaayer


============ Forwarded message ============
From : jaayer
To : "Jan van de Sandt"
Date : Mon, 09 Aug 2010 14:55:29 -0700
Subject : Re: Squeaksource XML Parser - Enhanced support for CDATA Sections
============ Forwarded message ============



---- On Mon, 09 Aug 2010 04:44:07 -0700 Jan van de Sandt wrote ----

>Hello,
>
>Ok, I understand the problem :-) thanks for explaining it.
>
>
>Here is a new version. In this version the XMLDOMParser has an extra property preserveCDATASections so the behaviour is now configurable. For now the default value of this property is true.

Thank you for the patch. I have merged it in, but with the following modifications:
1) Preservation of CDATA sections is disabled by default, as most of the time you don't really care whether parsed character data originally contained &amp;, &lt; or other pseudoentities to escape special characters or if those special characters were guarded within a CDATA section.
2) #addCDATASection: has not been added. I really don't want a proliferation of #add* methods in XMLElement or XMLNodeWithElements for every node type. #addContent: is special, as it accepts a node or a string, and #addElement: was once needed for the special handling element nodes required but is no longer needed. #addNode: should be preferred.
3) The messages added to XMLDOMParser were renamed to #isInCDataSection, #preservesCDataSections, and #preservesCDataSections: to make it clearer that they take or return boolean values.

Here is an example that demonstrates parsing with CDATA section preservation:
doc := (XMLDOMParser on: '')
 preservesCDataSections: true;
 parseDocument.
doc root firstNode
When evaluated with cmd-p, it produces:


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

Stéphane Ducasse
Just an api point

why preservesC...
and not
        preserverC

I'm always confused by the infinitive versus third person singular situation.

Stef


Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

NorbertHartl

On 10.08.2010, at 10:45, Stéphane Ducasse wrote:

> Just an api point
>
> why preservesC...
> and not
> preserverC
>
> I'm always confused by the infinitive versus third person singular situation.
>
You mean preserveC...? The preserve(r)C.. was a typo, right? I think preservesC... qualifies for a testing selector.

But as I wrote in my description of the problem I would call it

coalesceCDATASections: aBoolean

or

enableCoalescing
disableCoalescing

The functionality that is described here is better known as coalescing, and that name describes better what is going on. If a parser is coalescing, two things will happen: CDATA sections will be read in as text nodes, and then consecutive text nodes are coalesced into a single text node.
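
As a rough illustration of those two steps, here is a workspace sketch. It models nodes as plain kind -> content associations with made-up sample data; it is not the XMLParser node API, only an assumption-laden picture of what coalescing means.

```smalltalk
| nodes coalesced |
"Each node is modelled as kind -> content (hypothetical model, not XMLParser classes)."
nodes := {#text -> 'a '. #cdata -> '<b>'. #text -> ' c'. #element -> 'br'. #cdata -> 'd'}.
coalesced := OrderedCollection new.
nodes do: [:each |
    | kind |
    "Step 1: a CDATA section is read in as an ordinary text node."
    kind := each key = #cdata ifTrue: [#text] ifFalse: [each key].
    "Step 2: a text node directly after another text node is merged into it."
    (kind = #text and: [coalesced notEmpty and: [coalesced last key = #text]])
        ifTrue: [coalesced last value: coalesced last value , each value]
        ifFalse: [coalesced add: kind -> each value]].
coalesced
```

Evaluating this should leave three entries: one merged text node holding 'a <b> c', the element, and a separate text node 'd', because the element in between keeps the last CDATA section from merging with the earlier text.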

my 2 cents,

Norbert



Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

Stéphane Ducasse
>
> coalesceCDATASections: aBoolean
>
> or
>
> enableCoalescing
> disableCoalescing

Most of the time you need all three, because the first one lets you easily build scripts.

for me
       
> coalesceCDATASections: aBoolean

is a setter not a testing selector

Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

jaayer
In reply to this post by Stéphane Ducasse


---- On Tue, 10 Aug 2010 01:45:42 -0700 Stéphane Ducasse  wrote ----

>Just an api point
>
>why preservesC...
>and not
>    preserverC
>
>I'm always confused by the infinitive versus third person singular situation.
>
>Stef

The infinitive in English is two words with possibly other words separating them, the word "to" and then the verb lacking any "s" or "es" or other tense, number or person modifiers: "to program" or "to code."

I choose preservesCDataSections because it is more obvious that it returns a boolean and that the corresponding preservesCDataSections: accepts a boolean. If you call the testing message preserveCDataSections, it sounds more like you are commanding the receiver to do so rather than asking if it already does.

Also, my mail client ate the example code, so here it is again:
doc :=
        (XMLDOMParser on: '<root><![CDATA[&foo;&bar;]]></root>')
                preservesCDataSections: true;
                parseDocument.
doc root firstNode
produces:
 <![CDATA[&foo;&bar;]]>



Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

Stéphane Ducasse
>>
>
> The infinitive in English is two words with possibly other words separating them, the word "to" and then the verb lacking any "s" or "es" or other tense, number or person modifiers: "to program" or "to code."

Thanks. I know the difference :) I meant in method selectors include: vs includes:

> I choose preservesCDataSections because it is more obvious that it returns a boolean and that the corresponding preservesCDataSections: accepts a boolean.

> If you call the testing message preserveCDataSections,

No, I would write it as

isPreservingCDataSections
doesPreserveCDataSections

for me
        preservesCDataSections:
would be better written as
        preserveCDataSections:

Because then I do not have to think about whether I should add an s or not.


> it sounds more like you are commanding the receiver to do so rather than asking if it already does.
>
> Also, my mail client ate the example code, so here it is again:
> doc :=
> (XMLDOMParser on: '<root><![CDATA[&foo;&bar;]]></root>')
> preservesCDataSections: true;


Yes but it looks like an order too and I do not understand the difference between
using
        preserveSCD....
and
        parseDocument (with no S after parseDocument)

> parseDocument.
> doc root firstNode
> produces:
> <![CDATA[&foo;&bar;]]>

I follow the conventions of Beck and of Smalltalk with Style (see my web page).


Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

jaayer


---- On Tue, 10 Aug 2010 12:17:36 -0700 Stéphane Ducasse  wrote ----

>>>
>>
>> The infinitive in English is two words with possibly other words separating them, the word "to" and then the verb lacking any "s" or "es" or other tense, number or person modifiers: "to program" or "to code."
>
>Thanks. I know the difference :) I meant in method selectors include: vs includes:

I think #includes*, #has* and similar messages use the third person singular so that when you use them in an expression like this:
aCollection includes: anObject
What you are really doing is affirming something (inclusion of an object) about some subject (a collection). A sentence with a subject and a predicate that affirms or denies something about that subject is a proposition, and propositions in two-valued logic are either true or false (just as the Smalltalk expression above would be when evaluated).

>
>isPreservingCDataSections
>doesPreserveCDataSections

The first form is already in use elsewhere in the API and has some advantages over the third person singular form (the "is" prefix). However, it can also imply an unnecessary temporal restriction to the present. For example, compare #resolvesExternalEntities with #isResolvingExternalEntities. The second selector could just mean--if true--that the parser supports resolution of external entities, but it could also mean that the parser is right now, at this very moment, resolving external entities. While the "does" form does not suffer from these ambiguities, it is also the longest and ugliest of the bunch.

>
>Yes but it looks like an order too and I do not understand the difference between
>using
>    preserveSCD....
>and
>    parseDocument (with no S after parseDocument)

Imperative forms of a verb in English never have an "s" or "es" at the end of them. That means #preservesCDataSections (or #includes: or any other similar message) could never be taken to be an imperative command given to the receiver and instead must form, with the receiver and any arguments, some type of propositional sentence that is either true or false.
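
A small workspace sketch of that distinction, using only standard collection messages (the variable name is arbitrary and chosen just for illustration):

```smalltalk
| numbers |
numbers := OrderedCollection new.
"Imperative form, no trailing s: commands the receiver to change."
numbers add: 3.
"Third person singular: states a proposition about the receiver and answers a Boolean."
numbers includes: 3.
"The 'is' prefix reads as a proposition too, and also answers a Boolean."
numbers isEmpty
```

Here the #includes: send answers true and #isEmpty answers false, while #add: is sent for its effect on the receiver.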

>I follow the conventions of Beck and of Smalltalk with Style (see my web page).

I have read Kent's Smalltalk Best Practice Patterns, though not the other one. I will check it out, and I appreciate your feedback, Stéphane.



Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

jaayer
In reply to this post by NorbertHartl


---- On Tue, 10 Aug 2010 02:18:19 -0700 Norbert Hartl  wrote ----

>
>But as I wrote in my description of the problem I would call it
>
>coalesceCDATASections: aBoolean
>
>or
>
>enableCoalescing
>disableCoalescing

The downside of enable/disable pairs is the need for three messages (two to modify, one to test and lazily initialize) rather than two.
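
To make the count concrete, here is a hedged sketch in the usual Class>>selector listing form (a method listing, not a workspace script). SketchParser, its instance variable, and the enable/disable selector names are hypothetical and chosen only for illustration; this is not the actual XMLDOMParser source.

```smalltalk
"Style A: two messages. The getter also lazily initializes the default (false)."
SketchParser>>preservesCDataSections
    ^ preservesCDataSections ifNil: [preservesCDataSections := false]

SketchParser>>preservesCDataSections: aBoolean
    preservesCDataSections := aBoolean

"Style B: three messages. The same getter is still needed for testing,
 and the single setter is replaced by two modifiers."
SketchParser>>enableCDataPreservation
    preservesCDataSections := true

SketchParser>>disableCDataPreservation
    preservesCDataSections := false
```

With style A a script can pass a computed boolean straight to the setter; with style B the caller has to branch to pick one of the two modifiers.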
 
>The functionality that is described here is better known as coalescing, and that name describes better what is going on. If a parser is coalescing, two things will happen: CDATA sections will be read in as text nodes, and then consecutive text nodes are coalesced into a single text node.
>
>my 2 cents,
>
>Norbert

I think "preserve" is better, if only because "coalesceCData" just implies a joining together of CDATA sections and says nothing about their status in the DOM tree as XMLString or XMLCData nodes, although I am not wed to it.



Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

jaayer
In reply to this post by jaayer


---- On Tue, 10 Aug 2010 13:41:09 -0700 jaayer  wrote ----

>The first form is already in use elsewhere in the API and has some advantages over the third person singular form (the "is" prefix).

I meant to say here that the "is" prefix is an advantage over the third person singular form (because people recognize it right away as indicating a boolean return value) and not that it is the third person singular (it isn't).



Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

NorbertHartl
In reply to this post by jaayer

On 11.08.2010, at 00:56, jaayer wrote:

> ---- On Tue, 10 Aug 2010 02:18:19 -0700 Norbert Hartl wrote ----
>
>> But as I wrote in my description of the problem I would call it
>>
>> coalesceCDATASections: aBoolean
>>
>> or
>>
>> enableCoalescing
>> disableCoalescing
>
> The downside of enable/disable pairs is the need for three messages (two to modify, one to test and lazily initialize) rather than two.
>
>> The functionality that is described here is better known as coalescing, and that name describes better what is going on. If a parser is coalescing, two things will happen: CDATA sections will be read in as text nodes, and then consecutive text nodes are coalesced into a single text node.
>>
>> my 2 cents,
>>
>> Norbert
>
> I think "preserve" is better, if only because "coalesceCData" just implies a joining together of CDATA sections and says nothing about their status in the DOM tree as XMLString or XMLCData nodes, although I am not wed to it.

I think it is hard to find a word that describes completely what is going on, and I think that common sense/common usage is also a kind of argument. I didn't try to work out for myself what the best describing word would be (quite hard as a non-native speaker). If you search the net then you might see (as I did) that it is quite common for this effect to be described as coalescing. That was my only reason to speak up, because I think it is more recognizable this way.
If you know about coalescing, then the state in the DOM tree is pretty obvious. Nodes can coalesce only if they are of the same kind. Since a CDATA section _is_ a text node, all CDATA nodes are converted to simple text nodes and then all of the text nodes coalesce into one. The state in the DOM after coalescing is always that there is a single text node.

Norbert



Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

Stéphane Ducasse
Funnily enough, for me coalesce is far more obscure than preserve.

Now my point was not about this specific message, but I would like to get some guidelines for specifying a consistent API.
And I'm always torn when writing code about whether I should use an s or not.

Stef


Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

NorbertHartl

On 11.08.2010, at 10:31, Stéphane Ducasse wrote:

> Funnily enough, for me coalesce is far more obscure than preserve.
>
Might be the French in you :)

> Now my point was not about this specific message, but I would like to get some guidelines for specifying a consistent API.
> And I'm always torn when writing code about whether I should use an s or not.
>
I think we all want to figure out how it is done best in general. Discussing a single selector in one piece of code would hardly justify a longer thread :)

To me this is really important. I read the Beck books but can't even remember which one it was, because it is not one of those things you read once and remember afterwards. Well, at least in my case that doesn't work, and I am far too lazy to read it over and over.

Norbert


Re: Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

Stéphane Ducasse
>> Funnily enough, for me coalesce is far more obscure than preserve.
>
> Might be the French in you :)

Probably :)
We can't mess with our roots :)

>
>> Now my point was not about this specific message, but I would like to get some guidelines for specifying a consistent API.
>> And I'm always torn when writing code about whether I should use an s or not.
>>
> I think we all want to figure out how it is done best in general. Discussing a single selector in one piece of code would hardly justify a longer thread :)
>
> To me this is really important. I read the Beck books but can't even remember which one it was, because it is not one of those things you read once and remember afterwards. Well, at least in my case that doesn't work, and I am far too lazy to read it over and over.

It is 4 pages, so it is worth the effort. I will reread them, as well as Smalltalk with Style and probably Smalltalk by Example.

Stef

