Re: XML Demarshalling (Hernán Morales Durand)

Larry Gadallah
On 16 July 2010 11:54,  <[hidden email]> wrote:

> Message: 7
> Date: Fri, 16 Jul 2010 15:52:27 -0300
> From: Hernán Morales Durand <[hidden email]>
> Subject: Re: [Pharo-project] XML Demarshalling
> To: [hidden email]
> Message-ID:
>        <[hidden email]>
> Content-Type: text/plain; charset=UTF-8
>
> How big is your XML file? Molecular biology?
>
> Hernán
>
> 2010/7/16 Larry Gadallah <[hidden email]>:
>> Hi all:
>>
>> I am a noob searching for an example of how to demarshall a set of
>> simple and complex XML nodes into an object using Pastell or something
>> similar. I have found many examples of how to pick out all of the
>> elements of a certain type into a collection, but I'm trying to grab
>> the data from a mixed collection of (~20) elements that are children
>> of a particular element type and unpack them into an object.
>>
>> Does anyone know of some good examples of how to do this using Pastell
>> or something similar?
>>

Hernan:

The files are definitely not molecular biology. They are quite small,
and they represent simple catalog items with typical attributes (e.g.
item number, name, rating).

The element hierarchy looks something like this:

<catalog>
  <status>
  </status>
  <products>
    <list>
      <product>
        <id></id>
        <name></name>
        <reviews>
          ...
        </reviews>
        <labels>
          ....
        </labels>
        <type></type>
      </product>
    </list>
  </products>
</catalog>

So, what I want to do is be able to demarshall <product> elements into
a product object. I have seen plenty of examples of how to grab the
content of all of the name elements, but none showing how to pull all
of the sub-elements of a given element type into an object.

Thanks,
--
Larry Gadallah, VE6VQ/W7                          lgadallah AT gmail DOT com
PGP Sig: 917E DDB7 C911 9EC1 0CD9  C06B 06C4 835F 0BB8 7336

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: XML Demarshalling (Hernán Morales Durand)

Levente Uzonyi-2
On Fri, 16 Jul 2010, Larry Gadallah wrote:

> So, what I want to do is be able to demarshall <product> elements into
> a product object. I have seen plenty of examples of how to grab the
> content of all of the name elements, but none showing how to pull all
> of the sub-elements of a given element type into an object.
If your XML documents are small (they fit into memory), then the best
approach is to use XMLDOMParser to build a DOM tree. Then you can do all
kinds of queries depending on the structure. The following example assumes
a flat structure and non-repeating child nodes:

doc := '<yourxmldocument/>'.
dom := XMLDOMParser parseDocumentFrom: doc readStream.
"Collect one Dictionary per <product>, keyed by child element name."
objects := Array streamContents: [ :stream |
	dom tagsNamed: #product do: [ :tag |
		| object |
		object := Dictionary new.
		"Copy the text content of every direct child element."
		tag elementsDo: [ :element |
			object at: element name put: element contentString ].
		stream nextPut: object ] ].
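
If a plain Dictionary is not enough, the collected name/value pairs can be
copied into a domain object afterwards. What follows is only a minimal
sketch under stated assumptions: the Product class, its instance variables
(id, name, type) and the sample values are hypothetical, and the dictionary
keys are assumed to be strings. It relies only on base-image messages such
as Object>>instVarNamed:put:, not on any XML API, and "fields" stands in for
one of the dictionaries built by the snippet above.

```smalltalk
| productClass fields product |

"Hypothetical domain class; its name and variables are assumptions."
productClass := Object subclass: #Product
	instanceVariableNames: 'id name type'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Catalog-Sketch'.

"Stand-in for one dictionary produced per <product> element (sample data)."
fields := Dictionary new.
fields
	at: 'id' put: '42';
	at: 'name' put: 'Widget';
	at: 'type' put: 'tool'.

"Copy each known field into the matching instance variable;
 fields missing from the dictionary are left as nil."
product := productClass new.
#('id' 'name' 'type') do: [ :fieldName |
	product
		instVarNamed: fieldName
		put: (fields at: fieldName ifAbsent: [ nil ]) ].

product instVarNamed: 'name'.   "answers the value copied from fields"
```

Listing the field names explicitly keeps unexpected child elements from
raising an error; nested or repeating children (such as reviews and labels)
would still need their own handling.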

(Btw the idea that XMLElement >> #tag is returning a symbol and the
#tags*[dD]o: methods are expecting a symbol for tag is a very bad idea. It
degrades performance.)

And where did the XML-Parser go from Pharo?


Levente


Re: XML Demarshalling (Hernán Morales Durand)

Alexandre Bergel
> (Btw the idea that XMLElement >> #tag is returning a symbol and the #tags*[dD]o: methods are expecting a symbol for tag is a very bad idea. It degrades performance.)

Feel free to provide a fix. I will be pleased to review and include it.

Pastell is also very convenient for XPath querying.

Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: XML Demarshalling (Hernán Morales Durand)

Levente Uzonyi-2
On Fri, 16 Jul 2010, Alexandre Bergel wrote:

>> (Btw the idea that XMLElement >> #tag is returning a symbol and the #tags*[dD]o: methods are expecting a symbol for tag is a very bad idea. It degrades performance.)
>
> Feel free to provide a fix. I will be pleased to review and include it.

Well, thinking about it again I found that it depends on the use case.
It's faster if the xml is small, has only a few different tags and not
many tags use namespaces. It's slower in other cases. Also the
parse/extract matters. Btw I still can't find the package in a core image.


Levente


Re: XML Demarshalling (Hernán Morales Durand)

Alexandre Bergel
> Well, thinking about it again I found that it depends on the use case. It's faster if the xml is small, has only a few different tags and not many tags use namespaces. It's slower in other cases. Also the parse/extract matters. Btw I still can't find the package in a core image.

Which package are you referring to? XML-Support in the core?

Cheers,
Alexandre


--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: XML Demarshalling (Hernán Morales Durand)

jaayer
In reply to this post by Levente Uzonyi-2


---- On Fri, 16 Jul 2010 12:25:54 -0700 Levente Uzonyi  wrote ----

>(Btw the idea that XMLElement >> #tag is returning a symbol and the
>#tags*[dD]o: methods are expecting a symbol for tag is a very bad idea. It
>degrades performance.)
>
>And where did the XML-Parser go from Pharo?

XMLSupport has been completely string-based for a few months now. Expecting or supplying symbols will probably still work in Squeak and Pharo as Symbol is a subclass of String and #test = ' test' evaluates to true, but this is not portable and, as you pointed out, results in a degradation of performance.
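
For readers less familiar with the distinction, here is a small workspace
sketch that uses only base-image messages (the variable names are
illustrative). Equal symbols are a single shared object and can be compared
by identity, whereas strings are compared character by character. Whether a
Symbol is = to a String with the same characters depends on the dialect, so
the last line normalizes both sides to String before comparing, which is one
conservative option rather than the approach XMLSupport itself takes.

```smalltalk
| s1 s2 y1 y2 |

"Two distinct String objects with the same characters."
s1 := 'product' copy.
s2 := 'product' copy.

"Converting to Symbol answers the unique symbol for those characters."
y1 := s1 asSymbol.
y2 := s2 asSymbol.

s1 = s2.    "true: same characters"
s1 == s2.   "false: two separate String objects"
y1 == y2.   "true: equal symbols are the same object"

"Normalizing to String avoids relying on how a dialect
 defines = between a Symbol and a String."
y1 asString = s2.   "true"
```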



Re: XML Demarshalling (Hernán Morales Durand)

jaayer
In reply to this post by Levente Uzonyi-2


---- On Fri, 16 Jul 2010 15:54:31 -0700 Levente Uzonyi  wrote ----

>On Fri, 16 Jul 2010, Alexandre Bergel wrote:
>
>>> (Btw the idea that XMLElement >> #tag is returning a symbol and the #tags*[dD]o: methods are expecting a symbol for tag is a very bad idea. It degrades performance.)
>>
>> Feel free to provide a fix. I will be pleased to review and include it.

It was fixed some time ago; I posted about it on this very group a few months back.

>Well, thinking about it again I found that it depends on the use case.
>It's faster if the xml is small, has only a few different tags and not
>many tags use namespaces. It's slower in other cases. Also the
>parse/extract matters. Btw I still can't find the package in a core image.


The package is XMLSupport on squeaksource.com.



Re: XML Demarshalling (Hernán Morales Durand)

Stéphane Ducasse
is there a configurationOfXMLParser for metacelloRepository?

Stef


Re: XML Demarshalling (Hernán Morales Durand)

jaayer


---- On Sun, 18 Jul 2010 04:01:33 -0700 Stéphane Ducasse  wrote ----

>is there a configurationOfXMLParser for metacelloRepository?
>
>Stef
>

Alexandre uploaded two ConfigurationOfXMLSupport packages: http://www.squeaksource.com/XMLSupport.html



Re: XML Demarshalling (Hernán Morales Durand)

Stéphane Ducasse
excellent!

tx alex

Re: XML Demarshalling (Hernán Morales Durand)

Alexandre Bergel
In reply to this post by jaayer
> #test = ' test' evaluates to true, but this is not portable and, as you pointed out, results in a degradation of performance.

I am sure Jaayer meant #test = 'test', without space between the ''

Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: XML Demarshalling (Hernán Morales Durand)

jaayer


---- On Sun, 18 Jul 2010 12:46:48 -0700 Alexandre Bergel <[hidden email]> wrote ----

 > > #test = ' test' evaluates to true, but this is not portable and, as you pointed out, results in a degradation of performance.
 >  
 > I am sure Jaayer meant #test = 'test', without space between the ''
 >  
 > Cheers,
 > Alexandre

Oops. Good catch.



Re: XML Demarshalling (Hernán Morales Durand)

Levente Uzonyi-2
In reply to this post by jaayer
On Sat, 17 Jul 2010, jaayer wrote:

>
>
> ---- On Fri, 16 Jul 2010 12:25:54 -0700 Levente Uzonyi  wrote ----
>
>> (Btw the idea that XMLElement >> #tag is returning a symbol and the
>> #tags*[dD]o: methods are expecting a symbol for tag is a very bad idea. It
>> degrades performance.)
>>
>> And where did the XML-Parser go from Pharo?
>
> XMLSupport has been completely string-based for a few months now. Expecting or supplying symbols will probably still work in Squeak and Pharo as Symbol is a subclass of String and #test = ' test' evaluates to true, but this is not portable and, as you pointed out, results in a degradation of performance.

I see. Lots of things changed this year.
There was no string-symbol comparison in the code I checked. The parser
converted all tag names to symbols and the queries expected symbol
arguments, so symbols were compared with symbols. The problem with this
approach is that the symbol table is spammed with all the tags found in
the XML document and parsing is slower. The queries are faster of course
(if no namespaces are involved).
Btw is the package meant to be cross-dialect?


Levente


Re: XML Demarshalling (Hernán Morales Durand)

Alexandre Bergel
> There was no string-symbol comparison in the code I checked. The parser converted all tag names to symbols and the queries expected symbol arguments, so symbols were compared with symbols. The problem with this approach is that the symbol table is spammed with all the tags found in the XML document and parsing is slower. The queries are faster of course (if no namespaces are involved).
> Btw is the package meant to be cross-dialect?

Not that I am aware of. Maybe Dale ported it to Gemstone, but I am not sure.

Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: XML Demarshalling (Hernán Morales Durand)

hernanmd
In reply to this post by Larry Gadallah
Hi Larry,

I think what you're searching for is something like some experiments I
did with this project: http://www.squeaksource.com/LanguageInfo.html

You will find information on how to install and test it here:
http://swikicaicyt.homeip.net/WebOpus/44 (do not forget to download
and uncompress the data files).

For your specific question, you may put a halt into
LIStatisticsReader>>addStatistics or CDLRISO15924Reader>>addScripts to
see examples of demarshalling an XML file. Hope it helps.

Hernán


Re: XML Demarshalling (Hernán Morales Durand)

NorbertHartl
In reply to this post by Alexandre Bergel

On 19.07.2010, at 16:33, Alexandre Bergel wrote:

>> There was no string-symbol comparison in the code I checked. The parser converted all tag names to symbols and the queries expected symbol arguments, so symbols were compared with symbols. The problem with this approach is that the symbol table is spammed with all the tags found in the XML document and parsing is slower. The queries are faster of course (if no namespaces are involved).
>> Btw is the package meant to be cross-dialect?
>
> Not that I am aware of. Maybe Dale ported it to Gemstone, but I am not sure.
>
I did it. But the last version I ported is probably two months old. I (or someone else) will pick up the newest version and bring it to GemStone. We discussed the whole symbol thing because in GemStone #test = 'test' is false, so we had to tweak some things. I hope the latest version in ConfigurationOfXMLSupport for GemStone is still right. I can provide more details if needed.

Norbert


