============ Forwarded message ============
From : jaayer
To : "Jan van de Sandt"
Date : Mon, 09 Aug 2010 14:55:29 -0700
Subject : Re: Squeaksource XML Parser - Enhanced support for CDATA Sections
============ Forwarded message ============

---- On Mon, 09 Aug 2010 04:44:07 -0700 Jan van de Sandt wrote ----

>Hello,
>
>Ok, I understand the problem :-) thanks for explaining it.
>
>Here is a new version. In this version the XMLDOMParser has an extra property preserveCDATASections so the behaviour is now configurable. For now the default value of this property is true.

Thank you for the patch. I have merged it in, but with the following modifications:

1) Preservation of CDATA sections is disabled by default, as most of the time you don't really care whether parsed character data originally contained &amp;, &lt; or other pseudoentities to escape special characters or if those special characters were guarded within a CDATA section.

2) #addCDATASection: has not been added. I really don't want a proliferation of #add* methods in XMLElement or XMLNodeWithElements for every node type. #addContent: is special, as it accepts a node or a string, and #addElement: was once needed for the special handling element nodes required but is no longer needed. #addNode: should be preferred.

3) The messages added to XMLDOMParser were renamed to #isInCDataSection, #preservesCDataSections, and #preservesCDataSections: to make it clearer that they take or return boolean values.

Here is an example that demonstrates parsing with CDATA section preservation:

doc := (XMLDOMParser on: '')
    preservesCDataSections: true;
    parseDocument.
doc root firstNode

When evaluated with cmd-p, it produces:

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Just an api point

why preservesC...
and not
    preserverC

I'm always confused with the infinitive and third person singular situation.

Stef

> Here is an example that demonstrates parsing with CDATA section preservation:
> doc := (XMLDOMParser on: '')
>     preservesCDataSections: true;
>     parseDocument.
> doc root firstNode
> When evaluated with cmd-p, it produces:
On 10.08.2010, at 10:45, Stéphane Ducasse wrote:

> Just an api point
>
> why preservesC...
> and not
> preserverC
>
> I'm always confused with the infinitive and third person singular situation.

You mean preserveC...? The preserve(r)C.. was a typo, right? I think preservesC... qualifies for a testing selector. But as I wrote in my description of the problem I would call it

    coalesceCDATASections: aBoolean

or

    enableCoalescing
    disableCoalescing

The functionality that is described here is better known as coalescing, and that name describes better what is going on. If a parser is coalescing, two things will happen: CDATA sections will be read in as text nodes, and then subsequent text nodes are coalesced into a single text node.

my 2 cents,

Norbert
> coalesceCDATASections: aBoolean
>
> or
>
> enableCoalescing
> disableCoalescing

most of the time you need the 3 because the first one lets you easily build scripts

for me
> coalesceCDATASections: aBoolean
is a setter, not a testing selector
In reply to this post by Stéphane Ducasse
---- On Tue, 10 Aug 2010 01:45:42 -0700 Stéphane Ducasse wrote ----

>Just an api point
>
>why preservesC...
>and not
> preserverC
>
>I'm always confused with the infinitive and third person singular situation.
>
>Stef

The infinitive in English is two words, possibly with other words separating them: the word "to" and then the verb lacking any "s" or "es" or other tense, number or person modifiers, as in "to program" or "to code."

I chose preservesCDataSections because it is more obvious that it returns a boolean and that the corresponding preservesCDataSections: accepts a boolean. If you call the testing message preserveCDataSections, it sounds more like you are commanding the receiver to do so rather than asking if it already does.

Also, my mail client ate the example code, so here it is again:

doc :=
    (XMLDOMParser on: '<root><![CDATA[&foo;&bar;]]></root>')
        preservesCDataSections: true;
        parseDocument.
doc root firstNode

produces:

<![CDATA[&foo;&bar;]]>
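For contrast with the example above, here is a sketch that parses the same string with and without preservation. It sends only the messages quoted in this thread (on:, preservesCDataSections:, parseDocument, root, firstNode) and assumes the XMLParser package from SqueakSource is loaded; the variable names are made up for illustration.

```smalltalk
| xml preserved plain |
xml := '<root><![CDATA[&foo;&bar;]]></root>'.

"Preservation switched on: the cascade configures the parser and then
 parses, so the value of the whole cascade is the parsed document."
preserved := (XMLDOMParser on: xml)
    preservesCDataSections: true;
    parseDocument.
preserved root firstNode.   "the CDATA section node shown above"

"Default settings: per the merge notes preservation is off, so the
 first child is expected to be an ordinary string node instead."
plain := (XMLDOMParser on: xml) parseDocument.
plain root firstNode.
```

Inspecting both results side by side is probably the quickest way to see what the flag changes in the DOM tree.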
> The infinitive in English is two words, possibly with other words separating them: the word "to" and then the verb lacking any "s" or "es" or other tense, number or person modifiers, as in "to program" or "to code."

Thanks. I know the difference :) I meant in method selectors
    include: vs includes:

> I chose preservesCDataSections because it is more obvious that it returns a boolean and that the corresponding preservesCDataSections: accepts a boolean.
> If you call the testing message preserveCDataSections,

No I would write
    isPreservingCDataSections
    doesPreserveCDataSections

for me preservesCDataSections: should better be written as

    preserveCDataSections:

Because then I do not have to think about whether I should put an S or not.

> it sounds more like you are commanding the receiver to do so rather than asking if it already does.
>
> Also, my mail client ate the example code, so here it is again:
> doc :=
> (XMLDOMParser on: '<root><![CDATA[&foo;&bar;]]></root>')
> preservesCDataSections: true;

Yes but it looks like an order too and I do not understand the difference between
using
    preserveSCD....
and
    parseDocument (with no S after parseDocument)

> parseDocument.
> doc root firstNode
> produces:
> <![CDATA[&foo;&bar;]]>

I follow the conventions of Beck and of Smalltalk with Style (see my web page).
---- On Tue, 10 Aug 2010 12:17:36 -0700 Stéphane Ducasse wrote ----

>> The infinitive in English is two words with possibly other words separating them, the word "to" and then the verb lacking any "s" or "es" or other tense, number or person modifiers: "to program" or "to code."
>
>Thanks. I know the difference :) I meant in method selectors include: vs includes:

I think #includes*, #has* and similar messages use the third person singular so that when you use them in an expression like this:

    aCollection includes: anObject

what you are really doing is affirming something (inclusion of an object) about some subject (a collection). A sentence with a subject and a predicate that affirms or denies something about that subject is a proposition, and propositions in two-valued logic are either true or false (just as the Smalltalk expression above would be when evaluated).

>isPreservingCDataSections
>doesPreserveCDataSections

The first form is already in use elsewhere in the API and has some advantages over the third person singular form (the "is" prefix). However, it can also imply an unnecessary temporal restriction to the present. For example, compare #resolvesExternalEntities with #isResolvingExternalEntities. The second selector could just mean, if true, that the parser supports resolution of external entities, but it could also mean that the parser is right now, at this very moment, resolving external entities. While the "does" form does not suffer from these ambiguities, it is also the longest and ugliest of the bunch.

>Yes but it looks like an order too and I do not understand the difference between
>using
> preserveSCD....
>and
> parseDocument (with no S after parseDocument)

Imperative forms of a verb in English never have an "s" or "es" at the end of them. That means #preservesCDataSections (or #includes: or any other similar message) could never be taken to be an imperative command given to the receiver and instead must form, with the receiver and any arguments, some type of propositional sentence that is either true or false.

>I follow Beck and Smalltalk with Style (see my web page) convention.

I have read Kent's Smalltalk Best Practice Patterns, though not the other one. I will check it out, and I appreciate your feedback, Stéphane.
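The distinction drawn above between imperative and propositional selectors can be seen with plain collection messages. This is only a small illustration using standard OrderedCollection protocol; the variable name is arbitrary.

```smalltalk
| aCollection |
aCollection := OrderedCollection new.

"Imperative selectors: a bare verb that tells the receiver to do something."
aCollection add: 3; add: 4.
aCollection remove: 3.

"Testing selectors: third person singular or an is- prefix.
 Together with the receiver each one reads as a proposition
 and answers a Boolean."
aCollection includes: 4.   "true"
aCollection includes: 3.   "false"
aCollection isEmpty.       "false"
```

Read aloud, "aCollection includes: 4" is a statement that is either true or false, while "aCollection add: 3" is a command.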
In reply to this post by NorbertHartl
---- On Tue, 10 Aug 2010 02:18:19 -0700 Norbert Hartl wrote ----

>But as I wrote in my description of the problem I would call it
>
>coalesceCDATASections: aBoolean
>
>or
>
>enableCoalescing
>disableCoalescing

The downside of enable/disable pairs is the need for three messages (two to modify, one to test and lazily initialize) rather than two.

>The functionality that is described here is better known as coalescing. And it describes better what is going on. If a parser is coalescing two things will happen. CDATA sections will be read in as text nodes and then subsequent text nodes are coalesced into a single text node.
>
>my 2 cents,
>
>Norbert

I think "preserve" is better, if only because "coalesceCData" just implies a joining together of CDATA sections and says nothing about their status in the DOM tree as XMLString or XMLCData nodes. I am not wed to it, though.
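To make the "three messages rather than two" point concrete, here is a playground sketch that puts both styles on one throwaway class. Everything in it is hypothetical: NamingSketchParser, its package name, and the enable/disable selectors are invented for illustration and are not part of XMLParser. It also assumes a Pharo image that understands subclass:instanceVariableNames:classVariableNames:package:.

```smalltalk
| parserClass parser |
"Hypothetical class, for illustration only."
parserClass := Object
    subclass: #NamingSketchParser
    instanceVariableNames: 'preservesCDataSections'
    classVariableNames: ''
    package: 'NamingSketch'.

"Two-message style: a tester that lazily initializes,
 plus a setter that takes a Boolean."
parserClass compile: 'preservesCDataSections
    ^ preservesCDataSections ifNil: [preservesCDataSections := false]'.
parserClass compile: 'preservesCDataSections: aBoolean
    preservesCDataSections := aBoolean'.

"Enable/disable style: two modifiers, and the tester is still needed."
parserClass compile: 'enableCDataPreservation
    preservesCDataSections := true'.
parserClass compile: 'disableCDataPreservation
    preservesCDataSections := false'.

parser := parserClass new.
parser preservesCDataSections.          "false, via lazy initialization"
parser preservesCDataSections: true.
parser preservesCDataSections.          "true"
parser disableCDataPreservation.
parser preservesCDataSections.          "false"
```

The Boolean setter is also the form that is easy to drive from a script or a configuration value, which is the advantage mentioned earlier in the thread for having it.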
In reply to this post by jaayer
---- On Tue, 10 Aug 2010 13:41:09 -0700 jaayer wrote ----

>The first form is already in use elsewhere in the API and has some advantages over the third person singular form (the "is" prefix).

I meant to say here that the "is" prefix is an advantage over the third person singular form (because people recognize it right away as indicating a boolean return value) and not that it is the third person singular (it isn't).
In reply to this post by jaayer
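On 11.08.2010, at 00:56, jaayer wrote:

> I think "preserve" is better, if only because "coalesceCData" just implies a joining together of CDATA sections and says nothing about their status in the DOM tree as XMLString or XMLCData nodes. Although I am not wed to it.

I think it is hard to find a word that describes completely what is going on. And I think that common sense/common usage is also kind of an argument. I didn't start to think for myself what would be the best describing word (quite hard as a non-native speaker). If you search the net then you might see (as I did) that it is quite common that this effect is described as coalescing. That was my only reason to speak up, because I think its recognition is better this way.

If you know about coalescing then the state in the DOM tree is pretty obvious. The nodes can coalesce only if they are of the same kind. Since a CDATA section _is_ a text node, all CDATA nodes are converted to simple text nodes and then all of the text nodes coalesce into one. The state in the DOM is always that there is a single text node after coalescing.

Norbert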
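The two steps described above (treat CDATA as text, then merge neighbouring text) can be sketched as a toy model. Associations of kind -> content stand in for DOM nodes here; this is an assumption made for illustration and says nothing about how XMLDOMParser is actually implemented.

```smalltalk
| nodes coalesced |
"Toy child list: kind -> content."
nodes := {
    #text -> 'one '.
    #cdata -> '<two>'.
    #text -> ' three'.
    #element -> 'br'.
    #cdata -> 'four' }.

coalesced := OrderedCollection new.
nodes do: [:each |
    | isCharacterData |
    isCharacterData := #(#text #cdata) includes: each key.
    (isCharacterData
        and: [coalesced notEmpty
        and: [coalesced last key = #text]])
        ifTrue: [
            "Step 2: merge into the text node just before this one."
            coalesced last value: coalesced last value , each value]
        ifFalse: [
            "Step 1: CDATA is downgraded to plain text; other nodes pass through."
            coalesced add:
                (isCharacterData
                    ifTrue: [#text -> each value]
                    ifFalse: [each])]].

coalesced
"three nodes remain:
 #text -> 'one <two> three', #element -> 'br', #text -> 'four'"
```

In this model the element in the middle keeps the two runs of character data apart, so "a single text node" holds for each uninterrupted run rather than for the whole parent.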
funnily enough for me coalesce is far more obscure than preserve.

Now my point was not for this specific message but I would like to get some guidelines to specify consistent API. And I'm always torn apart when writing code if I should use s or not.

Stef

On Aug 11, 2010, at 10:07 AM, Norbert Hartl wrote:

> On 11.08.2010, at 00:56, jaayer wrote:
>
>> I think "preserve" is better, if only because "coalesceCData" just implies a joining together of CDATA sections and says nothing about their status in the DOM tree as XMLString or XMLCData nodes. Although I am not wed to it.
>
> I think it is hard to find a word that describes completely what is going on. And I think that common sense/common usage is also kind of an argument. I didn't start to think for myself what would be the best describing word (quite hard as a non-native speaker). If you search the net then you might see (as I did) that it is quite common that this effect is described as coalescing. That was my only reason to speak up, because I think its recognition is better this way.
>
> If you know about coalescing then the state in the DOM tree is pretty obvious. The nodes can coalesce only if they are of the same kind. Since a CDATA section _is_ a text node, all CDATA nodes are converted to simple text nodes and then all of the text nodes coalesce into one. The state in the DOM is always that there is a single text node after coalescing.
>
> Norbert
On 11.08.2010, at 10:31, Stéphane Ducasse wrote:

> funnily enough for me coalesce is far more obscure than preserve.

Might be the French in you :)

> Now my point was not for this specific message but I would like to get some guidelines to specify consistent API.
> And I'm always torn apart when writing code if I should use s or not.

I think we all want to figure out how it is done best in general. Discussing a single selector in one piece of code would hardly justify a longer thread :)

To me this is really important. I read the Becks but can't even remember which of those. It is not one of those things you read once and still remember afterwards. Well, at least in my case this doesn't work, and I am far too lazy to read it over and over.

Norbert
>> funnily enough for me coalesce is far more obscure than preserve.
>
> Might be the French in you :)

Probably :) We can't mess up with our roots :)

>> Now my point was not for this specific message but I would like to get some guidelines to specify consistent API.
>> And I'm always torn apart when writing code if I should use s or not.
>
> I think we all want to figure out how it is done best in general. Discussing a single selector in one piece of code would hardly justify a longer thread :)
>
> To me this is really important. I read the Becks but can't even remember which of those. It is not one of those things you read once and still remember afterwards. Well, at least in my case this doesn't work, and I am far too lazy to read it over and over.

It is 4 pages so this is worth the effort. I will reread them and Smalltalk with Style, and probably Smalltalk by Example.

Stef