Hi,

We are manipulating an XML document, and I would like to get rid of the spurious empty string nodes. We saw that the GT panes handle them with a check like this:

    (aNodeWithElements isStringNode
        and: [ aNodeWithElements isEmpty
            or: [ aNodeWithElements isWhitespace ] ])

Is there a way not to produce empty nodes at all? Or is there a simple way to avoid having to handle them? Right now, each time we deal with a node we have to check.

Stef
We tried:

    | parser doc visitor |
    parser := XMLDOMParser new
        on: self xmlContents;
        preservesIgnorableWhitespace: false.
    doc := parser parseDocument.

but we still have the empty nodes around.

Stef

On Tue, Dec 5, 2017 at 2:29 PM, Stephane Ducasse <[hidden email]> wrote:
> we are manipulating an XML document and I would like to get rid of the
> spurious empty string.
> [...]
In reply to this post by Stephane Ducasse-3
By "empty XML nodes," do you mean whitespace-only string nodes? Those are included because the spec assumes that all whitespace inside an element is significant: https://www.w3.org/TR/xml/#sec-white-space

The exception is when the DTD declares the element as having only element children ("element content"): https://www.w3.org/TR/xml/#dt-elemcontent

For example, if you declare an element like this:

    <!ELEMENT one (two,three*,four?)>

then any whitespace around a "two," "three," or "four" element child of a "one" element is insignificant and ignored (unless #preservesIgnorableWhitespace: is true). Other parsers, like LibXML2 and Xerces, behave the same way.

I'll see if I can come up with an easier way to deal with this, like an optional parser setting, new enumeration methods, or maybe a tree transformation.

> Sent: Tuesday, December 05, 2017 at 8:29 AM
> From: "Stephane Ducasse" <[hidden email]>
> To: "Pharo Development List" <[hidden email]>
> Subject: [Pharo-dev] How to get rid of empty XML nodes?
>
> we are manipulating an XML document and I would like to get rid of the
> spurious empty string.
> [...]
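The DTD exception described above can be sketched as follows. This is a minimal, unverified example: the sample document is made up, it assumes the parser reads the internal DTD subset under its default settings, and it assumes #root answers the document's root element. Only #on:, #parseDocument, and #nodes are taken from this thread.

```smalltalk
| xml doc |
"Hypothetical sample document. The internal DTD subset declares
 <one> as having element content only, so the line breaks and
 indentation between its children are ignorable whitespace."
xml := '<!DOCTYPE one [
  <!ELEMENT one (two,three*,four?)>
  <!ELEMENT two EMPTY>
  <!ELEMENT three EMPTY>
  <!ELEMENT four EMPTY>
]>
<one>
  <two/>
  <three/>
</one>'.

"Create the parser with the instance creation message #on:."
doc := (XMLDOMParser on: xml) parseDocument.

"Assumption: #root answers the <one> element.
 Inspect its children to see which nodes were kept."
doc root nodes inspect.
```

If the DTD is honored, the children of <one> should be just the two elements, with no whitespace-only string nodes between them. Without the DOCTYPE declaration, the same document would be expected to produce the extra string nodes discussed in this thread.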
Hi monty,

On Fri, Dec 8, 2017 at 9:03 AM, monty <[hidden email]> wrote:
> By "empty XML nodes," do you mean whitespace-only string nodes?

Yes.

> Those are included because all in-element whitespace is assumed
> significant by the spec: https://www.w3.org/TR/xml/#sec-white-space

I know. There was a discussion about this a while ago. I just lost a couple of hours understanding it :(

But this is a super, super annoying practice. We had to test each node to see whether it is an empty node, which makes everything a lot more complex without real justification, besides the fact that the standardizers probably never implemented any real cases. From that perspective, the standard is really out of touch with reality.

> The exception is if the element is declared in the DTD as only having
> element children ("element content"):
> https://www.w3.org/TR/xml/#dt-elemcontent

Well, the XML files that I had (I did not choose XML; I would have preferred JSON :) ) had no DTD :(

So at the end of the day, this wonderful standard puts all the stress and burden on people.

> For example, if you declare an element like this:
>
> <!ELEMENT one (two,three*,four?)>
>
> Any whitespace around a "two," "three," or "four" element child of a
> "one" element is insignificant and ignored (unless
> #preservesIgnorableWhitespace: is true). Other parsers, like LibXML2
> and Xerces, behave the same way.
>
> I'll see if I can come up with some easier way to deal with this, like
> an optional parser setting, new enumeration methods, or maybe a tree
> transformation.

It would be A HUGE PLUS!

Because the reality is that people have XML files with just nodes and no empty nodes, and they are forced to ...
Let me know, because I could try.

I was showing Pharo learners how to use Pharo to import code, and this was a big drag.

Stef

I tried to set some values in the parser, but it did not work.
BTW, I saw that the configuration logic forces me to write the following:

    | parser doc visitor |
    parser := XMLDOMParser new
        on: self xmlContents;
        preservesIgnorableWhitespace: true.

and not:

    | parser doc visitor |
    parser := XMLDOMParser new
        preservesIgnorableWhitespace: true;
        on: self xmlContents.
> On 08.12.2017 at 14:21, Stephane Ducasse <[hidden email]> wrote:
>
> But this is a super, super annoying practice. We had to test each node
> to see whether it is an empty node, which makes everything a lot more
> complex without real justification, besides the fact that the
> standardizers probably never implemented any real cases.
> From that perspective, the standard is really out of touch with reality.

Are you sure you are not oversimplifying things? XML would be even more complex if these cases were covered by the standard. It is not easy to decide whether whitespace is important or not.

Where does the whitespace come from in your case? Most probably the XML is pretty-printed, which inserts whitespace into the serialized text. So why not just stop pretty-printing? Then your problem is gone.

Norbert
Norbert,

Should I tell the tool generating the XML that it is an idiot? I cannot even do that: it is a tool I do not control, and I have no control over what I get.

Now, why can't we control whether it matters that people add a line return or not? Why can't I be in charge of deciding? I take the risk of the interpretation, but right now the "standard" does not help me at all. It just tells me that it is good for me.

In the past I implemented "standards" like XMI, only to find that there were bugs in the spec.

In the end, each time I visit a node I have to check:

    visitNodeWithElements: aNodeWithElements
        | currentNode |
        currentNode := OkStubNode new.
        "Strip whitespace-only string nodes before looking at the children."
        self cleanNode: aNodeWithElements.
        aNodeWithElements hasChildren
            ifTrue: [ | tokenNode |
                self cleanNode: aNodeWithElements nodes first.
                tokenNode := self visitElement: aNodeWithElements nodes first.
                self assert: tokenNode isToken.
                currentNode addChild: tokenNode.
                aNodeWithElements nodes allButFirst
                    do: [ :each | currentNode addChild: (self visitNodeWithElements: each) ] ].
        ^ currentNode

And I do not like to modify a structure while I'm visiting it.

    cleanNode: aNodeWithElements
        "Remove the empty and whitespace-only string children of the node."
        aNodeWithElements removeNodes:
            (aNodeWithElements nodes select: [ :e |
                e isStringNode and: [ e isEmpty or: [ e isWhitespace ] ] ])

So I understand why people are moving away from XML.

Stef

On Fri, Dec 8, 2017 at 4:02 PM, Norbert Hartl <[hidden email]> wrote:
> Are you sure you do not oversimplify things?
> [...]
> So why not just stopping to pretty print and your problem is gone.

Attachment: Original-java.xml (16K)
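One way to avoid modifying the tree during the visit, sketched under assumptions: instead of removing nodes, answer a filtered collection of the children and iterate over that. #significantNodesOf: is a hypothetical helper name. It uses only #nodes, #isStringNode, #isEmpty, and #isWhitespace from the code above, plus the standard collection message #reject:, which is assumed here to answer a new collection and leave the node's own children alone.

```smalltalk
significantNodesOf: aNodeWithElements
    "Hypothetical helper: answer the children of aNodeWithElements
     without the empty or whitespace-only string nodes.
     The tree itself is not modified."
    ^ aNodeWithElements nodes reject: [ :each |
        each isStringNode
            and: [ each isEmpty or: [ each isWhitespace ] ] ]
```

The visitor could then enumerate `(self significantNodesOf: aNodeWithElements)` wherever it currently enumerates `aNodeWithElements nodes`, and the calls to #cleanNode: would no longer be needed.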
Sure, it can get quite annoying. It would be good to have a switch to prevent the creation of whitespace-only nodes at parse time.

Norbert

> On 10.12.2017 at 08:42, Stephane Ducasse <[hidden email]> wrote:
>
> Should I say to the tool generating the XML that it is an idiot? Even
> that I cannot. It is a tool I do not control.
> [...]
In reply to this post by Stephane Ducasse-3
See #removeAllFormattingNodes and its comment in the latest version.

Also, instances of SAXHandler and its subclasses are meant to be created with #on: (or another "instance creation" message), _not #new_; otherwise they won't be properly initialized. The class comment is clear about this, but I should have overridden #new to raise an error, like Stream does. Your misuse was helpful in bringing this to my attention, and I have added a Stream-like #new implementation to SAXHandler.

> Sent: Friday, December 08, 2017 at 9:21 AM
> From: "Stephane Ducasse" <[hidden email]>
> To: "Pharo Development List" <[hidden email]>
> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>
> I tried to set some values in the parser but it did not work.
> BTW I saw that the configuration logic forces to write the following
>
> | parser doc visitor |
> parser := XMLDOMParser new
> on: self xmlContents;
> preservesIgnorableWhitespace: true.
> [...]
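Put together, the two points above might look like the sketch below. It is unverified: the sample document is made up, and sending #removeAllFormattingNodes to the parsed document is an assumption, since the message above names the selector but not its receiver. The method comment in the latest version is the authority on what it removes.

```smalltalk
| xml doc |
"Hypothetical pretty-printed input with no DTD."
xml := '<one>
  <two/>
  <three/>
</one>'.

"Use the instance creation message #on: rather than #new,
 so that the parser is properly initialized."
doc := (XMLDOMParser on: xml) parseDocument.

"Assumption: the document understands #removeAllFormattingNodes."
doc removeAllFormattingNodes.
```

Doing the cleanup once, right after parsing, keeps the visitor free of per-node whitespace checks and avoids modifying the tree while visiting it.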
Thanks, monty!

This is a really important addition :) It is a super frequent scenario.

Stef

On Fri, Jan 26, 2018 at 8:37 AM, monty <[hidden email]> wrote:
> See #removeAllFormattingNodes and its comment in the latest version.
> [...]
I attached a commit patch (apply with `git am ...`) for the 'books.pharo.org' repo that updates the Scraping .pdf link. (The .pdf it currently links to is obsolete.)

> Sent: Friday, January 26, 2018 at 2:30 PM
> From: "Stephane Ducasse" <[hidden email]>
> To: "Pharo Development List" <[hidden email]>
> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>
> Tx Monty!
> This is a really important addition :)
> Because a super frequent scenario.

Attachment: 0001-updated-Scraping-booklet-.pdf-link.patch (1K)
Tx monty.
I will update it because I do not want to lose all the PDFs if Bintray collapses. I plan to revise all the booklets since I will put them on Lulu so that people can get them printed.

Stef

On Mon, Jan 29, 2018 at 2:00 PM, monty <[hidden email]> wrote:
> I attached a commit patch (apply with `git am ...`) to the 'books.pharo.org' repo to update the Scraping .pdf link. (The .pdf it links to now is obsolete.)
>
>> Sent: Friday, January 26, 2018 at 2:30 PM
>> From: "Stephane Ducasse" <[hidden email]>
>> To: "Pharo Development List" <[hidden email]>
>> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>>
>> Tx Monty!
>> This is a really important addition :)
>> Because a super frequent scenario.
>>
>> Stef
>>
>> On Fri, Jan 26, 2018 at 8:37 AM, monty <[hidden email]> wrote:
>> > See #removeAllFormattingNodes and its comment in the latest version.
>> >
>> > And instances of SAXHandler and subclasses are meant to be created with #on: (or another "instance creation" message), _not #new_, otherwise they won't be properly initialized. The class comment is clear about this, but I should have overridden #new to raise an error like Stream does. Your misuse was helpful in bringing this to my attention, and I added a Stream-like #new implementation to SAXHandler.
>> >
>> >> Sent: Friday, December 08, 2017 at 9:21 AM
>> >> From: "Stephane Ducasse" <[hidden email]>
>> >> To: "Pharo Development List" <[hidden email]>
>> >> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>> >>
>> >> Hi monty
>> >>
>> >>
>> >> On Fri, Dec 8, 2017 at 9:03 AM, monty <[hidden email]> wrote:
>> >> > By "empty XML nodes," do you mean whitespace-only string nodes?
>> >>
>> >> Yes
>> >>
>> >> > Those are included because all in-element whitespace is assumed significant by the spec: https://www.w3.org/TR/xml/#sec-white-space
>> >>
>> >> I know. There was a discussion a while ago. I just lost a couple of
>> >> hours understanding that :(
>> >>
>> >> But this is a super super super annoying practices.
>> >> We had to test each nodes to see if it is a empty nodes so it makes
>> >> everything a lot more complex without real justification
>> >> beside the fact that these standardizers probably never implemented
>> >> some real cases.
>> >> This standard is a really out of reality from that perspective.
>> >>
>> >> > The exception is if the element is declared in the DTD as only having element children ("element content"): https://www.w3.org/TR/xml/#dt-elemcontent
>> >>
>> >> Well the XML files that I had (I did not choose XML because I would
>> >> have prefer JSON :) ), had no DTD :(
>> >>
>> >> So at the end of the day, this wonderful standard puts all the stress
>> >> and burden to people.
>> >>
>> >> >
>> >> > For example, if you declare an element like this:
>> >> >
>> >> > <!ELEMENT one (two,three*,four?)>
>> >> >
>> >> > Any whitespace around a "two," "three," or "four" element child of a "one" element is insignificant and ignored (unless #preservesIgnorableWhitespace: is true). Other parsers, like LibXML2 and Xerces, behave the same way.
>> >> >
>> >> > I'll see if I can come up with some easier way to deal with this, like an optional parser setting, new enumeration methods, or maybe a tree transformation.
>> >>
>> >> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
>> >>
>> >>
>> >> Because reality is that people have XML files with just nodes and no
>> >> empty nodes and they are forced to
>> >> Let me know because I could try.
>> >>
>> >> I was showing how to use Pharo to import code to pharo learners and
>> >> this was a big drag.
>> >>
>> >> Stef
>> >>
>> >>
>> >> I tried to set some values in the parser but it did not work.
>> >> BTW I saw that the configuration logic forces to write the following
>> >>
>> >> | parser doc visitor |
>> >> parser := XMLDOMParser new
>> >> on: self xmlContents;
>> >> preservesIgnorableWhitespace: true.
>> >>
>> >> and not
>> >>
>> >> | parser doc visitor |
>> >> parser := XMLDOMParser new
>> >> preservesIgnorableWhitespace: true.
>> >> on: self xmlContents;
>> >>
>> >>
>> >> >
>> >> >> Sent: Tuesday, December 05, 2017 at 8:29 AM
>> >> >> From: "Stephane Ducasse" <[hidden email]>
>> >> >> To: "Pharo Development List" <[hidden email]>
>> >> >> Subject: [Pharo-dev] How to get rid of empty XML nodes?
>> >> >>
>> >> >> )Hi
>> >> >>
>> >> >> we are manipulating an XML document and I would like to get rid of the
>> >> >> spurious empty string.
>> >> >> We saw that the gt panes are doing it.
>> >> >>
>> >> >> (aNodeWithElements isStringNode
>> >> >> and: [aNodeWithElements isEmpty
>> >> >> or: [aNodeWithElements isWhitespace]]
>> >> >>
>> >> >> Is there a way not to produce empty nodes?
>> >> >> Is there a simple way not to have to handle them
>> >> >>
>> >> >> Now each time we are dealing with a node with have to check.
>> >> >>
>> >> >> Stef
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >
>>
>>
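
For readers landing on this thread, monty's two points (create the parser with #on: rather than #new, then call #removeAllFormattingNodes on the parsed tree) can be put together roughly as below. This is a minimal, untested sketch: the XML string is made up for illustration, and it assumes an XMLParser version recent enough to include #removeAllFormattingNodes. The method comment is the authority on exactly which string nodes get removed.

```smalltalk
| parser doc |
"Create the parser with #on: (not #new) so that it is properly initialized."
parser := XMLDOMParser on: '<one>
  <two/>
  <three/>
</one>'.
doc := parser parseDocument.

"Drop the whitespace-only string nodes that exist only for formatting.
 Assumption: the loaded XMLParser version already has this method."
doc removeAllFormattingNodes.

"Inspect the children of the root element to see what is left."
doc root nodes
```

Doing the cleanup once on the parsed document avoids repeating the isStringNode / isWhitespace check at every place the tree is traversed.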