Writing XML

Jimmie Houchin-5
Hello,

I am attempting to read and write an XML document.

Currently I have parsed the document successfully. I have basic
navigation and have learned how to modify the XMLDocument.

Now I want to write the modified document back to the file system.
What I have tried so far is:

writer := XMLWriter new.
xmldoc document writeXMLOn: writer.
writer stream.
f := File openForWriteFileNamed: '/home/jimmie/xmldoc.xml'.
f nextPutAll: (writer write contents).
f flush.
f close.

It does write an XML document to the file system. However, it has
exploded in size. The original is 28 MB and is in UTF-8. The newly
written file is 112 MB and is UTF-32.

I do not know why the change in encoding or how to correct or manually
set the encoding.

Any help in understanding how to correctly write an XML document that I
have read and minimally modified would be greatly appreciated.

Thanks.

Jimmie

Re: Writing XML

Jimmie Houchin-5
I still do not know how to do this correctly. But I have something that
seems to work for the moment.

Using #asUTF8Bytes

f nextPutAll: (writer write contents asUTF8Bytes).

Now the file is in UTF-8 and normal size.


Jimmie


Re: Writing XML

Henrik Sperre Johansen
Jimmie Houchin-5 wrote
> I still do not know how to do this correctly. But I have something that
> seems to work for the moment.
>
> Using #asUTF8Bytes
>
> f nextPutAll: (writer write contents asUTF8Bytes).
>
> Now the file is in UTF-8 and normal size.


openForWriteFileNamed: opens a binary stream (which, incidentally, also
lets you put strings/wide strings as if they were bytes/doublewords). To
write a string as UTF-8, the best way is to wrap the binary stream in a
stream that converts strings to UTF-8 bytes. An example:

binaryStream := (ByteArray new: 100) writeStream.
encodedStream := ZnCharacterWriteStream on: binaryStream encoding: #utf8.
encodedStream nextPutAll: '€'.
binaryStream contents
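
A small sketch of why the size quadrupled, assuming (as the jump from 28 MB to 112 MB suggests) that the string being written was a WideString, which Pharo stores at 32 bits per character. The sample text is hypothetical, and #utf8Encoded is the Zinc convenience for encoding a string into a ByteArray:

```smalltalk
| text utf8 |
text := 'price: €'.        "hypothetical sample; € is outside Latin-1, so this is a WideString"
utf8 := text utf8Encoded.   "ByteArray holding the UTF-8 encoding"
text size.                  "8 characters"
utf8 size.                  "10 bytes: 7 ASCII characters plus 3 bytes for €"
text size * 4.              "32 bytes if every character is dumped as a raw 32-bit word"
```

For mostly ASCII XML, the raw dump comes out close to four times the UTF-8 size, which matches the numbers in the original post.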

The "best"* way is to use an API that provides an encoded stream with
scoped use (so you don't have to close it manually). For instance:

'/home/jimmie/xmldoc.xml' asFileReference
    writeStreamDo: [ :ws | ws nextPutAll: writer write contents ]

should default to a file stream outputting UTF-8.

Cheers,
Henry

*May or may not work in Pharo 7 though, seeing as how the old default
encoded stream has been deprecated, I haven't checked.



Re: Writing XML

Jimmie Houchin-5
Thanks. I am trying to learn the ways of Pharo 6 and not use
StandardFileStream and MultiByteFileStream. So I do not know all of the
best ways to do things. Thanks for the education. Your "best" way worked
perfectly in Pharo 6.

Again, thanks.


Jimmie


Re: Writing XML

Stephane Ducasse-3
Henrik is cool :)

Re: Writing XML

monty-3
In reply to this post by Jimmie Houchin-5
If you want to write a DOM tree to a file, send #printToFileNamed: (or a related message like #canonicallyPrintToFileNamed: or #printToFileNamed:beforeWritingDo:) to the root. See the XMLNode "printing" category for more. This will automatically encode the file with the encoding the XMLDocument>>#encoding attribute specifies (if recognized), and it's portable across Pharo, Squeak, and GemStone. Use #parseFileNamed:/#onFileNamed: to get portable automatic file decoding when parsing.
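
A minimal sketch of the round trip described above, using only the messages named in this post. The output path and the variable name are hypothetical, and the modification step is left out:

```smalltalk
| doc |
"Parse with automatic decoding; the input path is the one from the original question."
doc := XMLDOMParser parseFileNamed: '/home/jimmie/xmldoc.xml'.

"... navigate and modify doc here ..."

"Write the tree back. The file is encoded according to the document's #encoding,
 so no manual stream or encoder setup is needed. The output path is hypothetical."
doc printToFileNamed: '/home/jimmie/xmldoc-out.xml'.
```

Writing to a separate file first makes it easy to compare the size and encoding against the original before overwriting it.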

Re: Writing XML

Jimmie Houchin-5
Thanks for the reply.
I appreciate the education the people on this list provide.

I was already doing the XMLDOMParser #onFileNamed: to open the file.
It was showing the correct #encoding for the parsed file.
It was just the writing of the nearly identical file which was different.
I tried #printToFileNamed: as you wrote, and it did as you explained.

Thanks again.

Jimmie

Re: Writing XML

Jimmie Houchin-5
In reply to this post by monty-3
I didn't pay attention to this previously. But I just noticed that using
#printToFileNamed: preserved the DOM tree's original line ending, whereas
previously I had to ensure that the XMLWriter #lineEnding was changed
from defaultLineEnding to canonicalLineEnding. The original document
used LF and not CR.

Overall this was a nice win.
It cleaned up my method to save the file and reduced 7 lines to 2.
Nice.  :)

Thanks.

Jimmie



Re: Writing XML

monty-3


> I didn't pay attention to this previously. But I just noticed that using
> #printToFileNamed: preserved the DOM tree's original line ending, whereas
> previously I had to ensure that the XMLWriter #lineEnding was changed
> from defaultLineEnding to canonicalLineEnding. The original document
> used LF and not CR.

To clarify, #printToFileNamed: and company use CRLF on Windows and LF elsewhere. XMLWriter uses Pharo's line ending (CR) by default, but it will use the preferred line ending of your platform (LF or CRLF) with #enablePlatformSpecificLineBreak, LF when canonical XML (https://www.w3.org/TR/xml-c14n) is enabled, or whatever line ending you want with #lineBreak:.

You can use #printToFileNamed:beforeWritingDo: with a block that sends #lineBreak: to the writer argument to get a custom LE when printing a DOM tree to a file.
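
A sketch of that last suggestion, forcing LF regardless of platform. The variable name is hypothetical, and passing `String lf` as the argument to #lineBreak: is an assumption about the accepted argument type:

```smalltalk
| doc |
doc := XMLDOMParser parseFileNamed: '/home/jimmie/xmldoc.xml'.

"The block receives the XMLWriter before anything is written,
 so its line break can be overridden here."
doc
    printToFileNamed: '/home/jimmie/xmldoc.xml'
    beforeWritingDo: [ :writer | writer lineBreak: String lf ].
```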
