
Parsing an xml


mani kartha

Parsing an xml

Hi all,

From what I understand, I can parse an XML document in the following ways:

1) create & communicate with a DOM using XML.Parser with DOM_SAXDriver

2) create & use MyDOM_SAXDriver and get the information in my required format while the Parser actually parses the XML

3) XML to Object Binding

4) create a DOM and use XPath

Situation: I need to parse a large XML file, and only about 25% of its data is useful to me. The data I want to collect is scattered across the file. Say the XML has sections A, B and C: section A gives one set of independent information, section B gives another independent set, and section C describes how to relate and use the information in sections A and B. Together, A, B and C still account for only 25% of the total data in the file.

Which of the four approaches above is best for this situation, or are there other ways? I would also like to know how you weigh development effort against execution performance (if a trade-off is necessary at all).

Thanks in advance,

Mani

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Steffen Märcker

Re: Parsing an xml

Hi,

I'll speak from my experience with projects involving XML.
First, I'd give the DOM parser a try to see whether it handles the
documents fast enough and does not run out of memory. Only if it
doesn't would I go and build a specialized DOM_SAXDriver/SAX parser.
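
As a rough illustration of the streaming approach, here is a minimal sketch
in Python using its standard xml.sax module, not the VisualWorks XML.Parser
API. The element names (root, A, B, C, bulk, item, link) and the
SectionCollector class are invented for the example. The handler keeps only
what lies inside sections A, B and C and lets everything else stream past
without building a tree.

```python
import xml.sax

# Hypothetical layout: sections A, B and C plus a large <bulk> part to skip.
SAMPLE = b"""<root>
  <A><item id="a1">alpha</item></A>
  <bulk><junk>ignored</junk><junk>ignored</junk></bulk>
  <B><item id="b1">beta</item></B>
  <C><link a="a1" b="b1"/></C>
</root>"""


class SectionCollector(xml.sax.ContentHandler):
    """Collects items from sections A and B and links from section C."""

    def __init__(self):
        super().__init__()
        self.section = None      # wanted section we are currently inside
        self.current_id = None   # id of the <item> being read, if any
        self.text = []           # character chunks of the current <item>
        self.items = {"A": {}, "B": {}}
        self.links = []

    def startElement(self, name, attrs):
        if self.section is None:
            # Outside any wanted section: only watch for A, B or C opening.
            if name in ("A", "B", "C"):
                self.section = name
        elif name == "item" and self.section in ("A", "B"):
            self.current_id = attrs.get("id")
            self.text = []
        elif name == "link" and self.section == "C":
            self.links.append((attrs.get("a"), attrs.get("b")))

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.current_id is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "item" and self.current_id is not None:
            self.items[self.section][self.current_id] = "".join(self.text)
            self.current_id = None
        elif name == self.section:
            self.section = None


handler = SectionCollector()
xml.sax.parseString(SAMPLE, handler)

# Relate A and B through the links found in C.
related = [(handler.items["A"][a], handler.items["B"][b])
           for a, b in handler.links]
```

The same event-driven pattern should carry over to a custom SAX driver,
though the VisualWorks callback names differ and are not shown here.
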
I think querying the DOM with XPath "by hand" is an easy and convenient
way to get your information, especially for testing purposes. But it
becomes pretty complex and hard to maintain as soon as the document, the
data or the relations in your model become more complex. If possible, I'd
stick to one of the XO (XML-to-object) libraries.
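
To make the "DOM plus XPath by hand" option concrete, here is a small sketch
in Python with the standard xml.etree.ElementTree module, which supports
only a subset of XPath. It is not the VisualWorks XPath implementation, and
the document layout (sections A, B, C with item and link elements) is
assumed purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: sections A, B and C plus unrelated <bulk> data.
SAMPLE = """<root>
  <A><item id="a1">alpha</item></A>
  <bulk><junk>ignored</junk></bulk>
  <B><item id="b1">beta</item></B>
  <C><link a="a1" b="b1"/></C>
</root>"""

# The whole document is parsed into an in-memory tree first.
root = ET.fromstring(SAMPLE)

# One path expression per section picks out the interesting nodes.
items_a = {e.get("id"): e.text for e in root.findall("./A/item")}
items_b = {e.get("id"): e.text for e in root.findall("./B/item")}

# Section C says which entry of A belongs to which entry of B.
pairs = [(items_a[link.get("a")], items_b[link.get("b")])
         for link in root.findall("./C/link")]

# Predicates allow direct lookups by attribute value.
beta = root.find(".//item[@id='b1']").text
```

Each query is short, but the joining logic lives in ordinary code around the
queries, which is the part that tends to grow as the model gets richer.
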
I've had good experiences with Cincom's solution, which handles both
directions, i.e., XML to object and vice versa. It is well documented and
supports XML Schema specifications.
If marshalling objects to XML is not required or the XML is 'convoluted',
I stick to SimpleXO. It's a library I had to build to map pretty bad and
complex XML documents easily to objects. It provides very flexible
mappings that can be specified using a dedicated DSL or pure Smalltalk
code. The latter is the recommended approach. SimpleXO and its test suite
are available in the public repository under the MIT license. Please tell
me if you give it a try!
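
For readers unfamiliar with XML-to-object binding, the general idea can be
sketched as follows. This is plain Python with dataclasses and a hand-made
mapping table; it does not show the API of SimpleXO or of Cincom's binding
framework, and the classes Item and Link, the MAPPING table and the bind
function are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

# Hypothetical layout: sections A, B and C plus unrelated <bulk> data.
SAMPLE = """<root>
  <A><item id="a1">alpha</item></A>
  <bulk><junk>ignored</junk></bulk>
  <B><item id="b1">beta</item></B>
  <C><link a="a1" b="b1"/></C>
</root>"""


@dataclass
class Item:
    id: str
    text: str


@dataclass
class Link:
    a: str
    b: str


# Declarative part: which element tag becomes which class.
MAPPING = {"item": Item, "link": Link}


def bind(element):
    """Return an object for a mapped element, or None for unmapped tags."""
    cls = MAPPING.get(element.tag)
    if cls is None:
        return None
    values = {}
    for field in fields(cls):
        # A field named "text" takes the element text; all other fields
        # are read from the attribute of the same name.
        if field.name == "text":
            values[field.name] = element.text
        else:
            values[field.name] = element.get(field.name)
    return cls(**values)


root = ET.fromstring(SAMPLE)
# Walk the tree in document order and keep only the mapped elements.
objects = [obj for obj in map(bind, root.iter()) if obj is not None]
```

The mapping is stated once in a table, so a change in the XML layout means
editing the table and the classes rather than every query.
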

Kind regards,
Steffen

In reply to mani kartha's message of 07.02.2014, 14:46.
