Hello,

I am attempting to read and write an XML document.

Currently I have parsed the document successfully. I have basic navigation and have learned how to modify the XMLDocument.

Now I want to write the modified document back to the file system. What I have tried so far is:

    writer := XMLWriter new.
    xmldoc document writeXMLOn: writer.
    writer stream.
    f := File openForWriteFileNamed: '/home/jimmie/xmldoc.xml'.
    f nextPutAll: (writer write contents).
    f flush.
    f close.

It does write an XML document to the file system. However, it has exploded in size. The original is 28 MB and is in UTF-8. The newly written file is 112 MB and is UTF-32.

I do not know why the encoding changed, or how to correct or manually set the encoding.

Any help in understanding how to correctly write an XML document that I have read and minimally modified would be greatly appreciated.

Thanks.

Jimmie
I still do not know how to do this correctly. But I have something that seems to work for the moment.

Using #asUTF8Bytes

    f nextPutAll: (writer write contents asUTF8Bytes).

Now the file is in UTF-8 and normal size.

Jimmie

On 09/13/2017 12:02 PM, Jimmie Houchin wrote:
> [...]
> It does write an xml document to the file system. However, it has
> exploded in size. The original is 28mb and is in UTF-8. The newly
> written file is 112mb and is UTF-32.
> [...]
Jimmie Houchin-5 wrote
> I still do not know how to do this correctly. But I have something that
> seems to work for the moment.
>
> Using #asUTF8Bytes
>
> f nextPutAll: (writer write contents asUTF8Bytes).
>
> Now the file is in UTF-8 and normal size.

openForWriteFileNamed: opens a binary stream (which, incidentally, also lets you put strings/widestrings as if they were bytes/doublewords). To write a string source as UTF-8, the best way is to wrap the binary stream in a stream which converts strings to UTF-8 bytes. An example:

    binaryStream := (ByteArray new: 100) writeStream.
    encodedStream := ZnCharacterWriteStream on: binaryStream encoding: #utf8.
    encodedStream nextPutAll: '€'.
    binaryStream contents

The "best"* way is to use an API which provides an encoded stream with scoped use (so you don't have to close it manually); for instance:

    '/home/jimmie/xmldoc.xml' asFileReference writeStreamDo: [ :ws |
        ws nextPutAll: writer write contents ]

This should default to a file stream outputting UTF-8.

Cheers,
Henry

*May or may not work in Pharo 7 though, seeing as how the old default encoded stream has been deprecated; I haven't checked.

--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
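The fourfold growth reported earlier in the thread (28 MB to 112 MB) is consistent with each character being written as a 32-bit word instead of as UTF-8. The following is a minimal workspace sketch of that arithmetic, reusing the ZnCharacterWriteStream wrapping shown above. The variable names are illustrative, and the figure of 4 bytes per character is an assumption inferred from the UTF-32 observation, not something checked against the File implementation.

```smalltalk
| text binaryStream encodedStream utf8Size rawWideSize |

"One ASCII character and one character outside Latin-1."
text := 'a€'.

"Encode through a character stream wrapped around a binary stream."
binaryStream := (ByteArray new: 100) writeStream.
encodedStream := ZnCharacterWriteStream on: binaryStream encoding: #utf8.
encodedStream nextPutAll: text.

"UTF-8: 1 byte for $a plus 3 bytes for the euro sign = 4 bytes."
utf8Size := binaryStream contents size.

"Assumed raw wide-string layout: 4 bytes per character = 8 bytes."
rawWideSize := text size * 4.
```

For a document that is mostly ASCII, the UTF-8 size approaches one byte per character, so the assumed raw layout comes out close to four times larger, which matches the sizes reported.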
Thanks. I am trying to learn the ways of Pharo 6 and not use StandardFileStream and MultiByteFileStream. So I do not know all of the best ways to do things. Thanks for the education. Your "best" way worked perfectly in Pharo 6.

Again, thanks.

Jimmie

On 09/14/2017 07:24 AM, Henrik Sperre Johansen wrote:
> [...]
> The "best"* way is to use an API which provides an encoded stream with
> scoped use (so you don't have to close it manually); for instance:
> '/home/jimmie/xmldoc.xml' asFileReference writeStreamDo: [ :ws | ws
> nextPutAll: writer write contents ]
> should default to a file stream outputting utf8.
> [...]
Henrik is cool :)
On Thu, Sep 14, 2017 at 8:17 PM, Jimmie Houchin <[hidden email]> wrote:
> Thanks. I am trying to learn the ways of Pharo 6 and not use
> StandardFileStream and MultiByteFileStream. So I do not know all of the best
> ways to do things. Thanks for the education. Your "best" way worked
> perfectly in Pharo 6.
> [...]
In reply to this post by Jimmie Houchin-5
If you want to write a DOM tree to a file, send #printToFileNamed: (or a related message like #canonicallyPrintToFileNamed: or #printToFileNamed:beforeWritingDo:) to the root. See the XMLNode "printing" category for more. This will automatically encode the file with the encoding the XMLDocument>>#encoding attribute specifies (if recognized), and it's portable across Pharo, Squeak, and GemStone. Use #parseFileNamed:/#onFileNamed: to get portable automatic file decoding when parsing.
> Sent: Wednesday, September 13, 2017 at 1:02 PM
> From: "Jimmie Houchin" <[hidden email]>
> To: "Any question about pharo is welcome" <[hidden email]>
> Subject: [Pharo-users] Writing XML
>
> [...]
> Any help in understanding how to correctly write an XML document that I
> have read and minimally modified would be greatly appreciated.
> [...]
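The advice above can be sketched as a short read, modify, write round trip. This is only an outline under stated assumptions: the output file name is hypothetical (chosen so the original is not overwritten), the modification step is left as a comment, and the selectors are the ones named in the message above, sent to XMLDOMParser and to the parsed document.

```smalltalk
| doc |

"Parse with automatic decoding of the source file."
doc := XMLDOMParser parseFileNamed: '/home/jimmie/xmldoc.xml'.

"The encoding the parser recorded for the document; per the message above,
 this is the encoding used again when printing to a file (if recognized)."
doc encoding.

"... modify the DOM tree here ..."

"Write the tree back out. The target path is a hypothetical example."
doc printToFileNamed: '/home/jimmie/xmldoc-modified.xml'.
```

Compared with building an XMLWriter and a file stream by hand, this leaves both the encoding and the closing of the file to the library.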
Thanks for the reply.
I appreciate the education the people on this list provide.

I was already using XMLDOMParser #onFileNamed: to open the file. It was showing the correct #encoding for the parsed file. It was just the writing of the nearly identical file which was different.

I tried #printToFileNamed: as you wrote, and it did as you explained.

Thanks again.

Jimmie

On 09/15/2017 09:29 AM, monty wrote:
> If you want to write a DOM tree to a file, send #printToFileNamed: (or a
> related message like #canonicallyPrintToFileNamed: or
> #printToFileNamed:beforeWritingDo:) to the root.
> [...]
In reply to this post by monty-3
I didn't pay attention to this previously, but I just noticed that using #printToFileNamed: preserved the DOM tree's original line ending, whereas previously I had to ensure that the XMLWriter #lineEnding was changed from defaultLineEnding to canonicalLineEnding. The original document used LF and not CR.

Overall this was a nice win. It cleaned up my method to save the file and reduced 7 lines to 2. Nice. :)

Thanks.

Jimmie
> Sent: Friday, September 15, 2017 at 4:30 PM
> From: "Jimmie Houchin" <[hidden email]>
> To: "Any question about pharo is welcome" <[hidden email]>
> Subject: Re: [Pharo-users] Writing XML
>
> I didn't pay attention to this previously. But I just noticed that using
> #printToFileNamed: preserved the DOM tree's original line ending, whereas
> previously I had to ensure that the XMLWriter #lineEnding was changed
> from defaultLineEnding to canonicalLineEnding. The original document
> used LF and not CR.

To clarify, #printToFileNamed: and company use CRLF on Windows and LF elsewhere. XMLWriter uses Pharo's line ending (LE) by default, which is CR, but it will use:

- the preferred LE of your platform (LF or CRLF) with #enablePlatformSpecificLineBreak,
- LF when canonical XML (https://www.w3.org/TR/xml-c14n) is enabled, or
- whatever LE you want with #lineBreak:.

You can use #printToFileNamed:beforeWritingDo: with a block that sends #lineBreak: to the writer argument to get a custom LE when printing a DOM tree to a file.

> Overall this was a nice win.
> It cleaned up my method to save the file and reduced 7 lines to 2.
> Nice. :)
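A custom line ending when printing a DOM tree to a file, as described above, might look like the sketch below. It assumes that #lineBreak: accepts a string such as `String lf` and that the block receives the writer as its single argument. Both points follow the description in the message but have not been checked against a particular XMLWriter version.

```smalltalk
| doc |

doc := XMLDOMParser parseFileNamed: '/home/jimmie/xmldoc.xml'.

"Force LF line endings regardless of platform by configuring the
 writer just before the tree is written."
doc
	printToFileNamed: '/home/jimmie/xmldoc.xml'
	beforeWritingDo: [ :writer | writer lineBreak: String lf ].
```

Sending #enablePlatformSpecificLineBreak in the block instead would select LF or CRLF according to the platform, per the same message.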