Loading large XML files in MOOSE

Loading large XML files in MOOSE

SergeStinckwich
Dear all,

I want to process large XML files (typically event logs with more than
80 MB of data), but at the moment I cannot do that with the XML DOM
parser: it says I reached the read limit after 3094 XML lines. I also
expect problems managing such a large file in memory after that.

Should I switch to an event-driven XML parser in order to avoid
loading the whole XML file into memory? Do we have such a parser for
Pharo?

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev

Re: Loading large XML files in MOOSE

Tudor Girba-2
I encountered a similar situation before, but there is a way to go beyond that limit.

I do not have the code at my disposal right now, but look at the XMLDOMParser constructor, and at some point you will see a hardcoded limit value. You should be able to pass another one in.


Cheers,
Doru


--

"Every thing has its own flow"


Re: Loading large XML files in MOOSE

Tudor Girba-2
Hi,

OK, I looked a bit, and here is a way to get it to work. It is far from ideal, given that it loads the entire file into a string before parsing it, but it will surely work with an XML file of your size:

contents := fileReference readStreamDo: [ :stream | stream contents ].
svnlog := (XMLDOMParser on: contents )
               documentReadLimit: contents size;
               parseDocument.
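
The same two steps can be wrapped in a block so that the limit always follows the size of the input. This is only a sketch: it reuses the on:, documentReadLimit: and parseDocument messages from the snippet above, while the block name parseWithSizedLimit and the inline sample string are made up for illustration.

```smalltalk
| parseWithSizedLimit document |

"Parse a String with the read limit set to the string's own size,
 so the limit is at least as large as the input."
parseWithSizedLimit := [ :xmlString |
    (XMLDOMParser on: xmlString)
        documentReadLimit: xmlString size;
        parseDocument ].

"Small inline sample (hypothetical). For a real file, pass
 (fileReference readStreamDo: [ :stream | stream contents ]) instead."
document := parseWithSizedLimit value: '<log><trace><event/></trace></log>'.

"Answers the name of the root element of the parsed document."
document root name
```

This still holds both the full string and the full DOM tree in memory, so it only removes the read limit; it does not reduce memory use.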

Cheers,
Doru




Re: Loading large XML files in MOOSE

abergel
Serge, let us know how it goes. What you are trying to do is really important.

Alexandre



--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.




Re: Loading large XML files in MOOSE

SergeStinckwich
In reply to this post by Tudor Girba-2
Thank you, Doru, for the help.

First case: a 74 MB XML file. I am able to load the string into memory,
but the XML parser crashes the image before reaching the end.
Second case: a 10 MB XML file. I am able to load the string into memory
and parse the XML file. When I start to process the parsed XML in order
to create an object structure, the "Space is too low" panel appears.

I would like to give the XMLPullParser a try. I guess it works
correctly, because I found it in the Configuration Browser.



Re: Loading large XML files in MOOSE

abergel
Serge, you are using a Mac, aren’t you? Have you tried to increase the memory available to the image?

Open the Info.plist file, contained in the VM folder, using a text editor. From what I found on the internet, 1880000000 is apparently the largest value you can set there.



Cheers,
Alexandre



Re: Loading large XML files in MOOSE

SergeStinckwich
Thank you for the trick, Alex. We should put this information somewhere, maybe in a tuning chapter of the Pharo Enterprise book?
Unfortunately, even with the trick, I was unable to parse a 10 MB XML file... Pharo's memory grows up to 1.9 GB and the image crashes after that.

I will now try with the pull parser.





Re: Loading large XML files in MOOSE

Tudor Girba-2
I think you are doing something wrong. Could you pass me the XML file?

Doru



Re: Loading large XML files in MOOSE

SergeStinckwich
You can try this in a fresh Moose 5.0 image:

==================================================================
| graph builder contents document |

"Load the Moose-XES package."
Gofer it
   package: 'Moose-XES'; load.

"Download the gzipped event log and decompress it into a String."
contents := ((GZipReadStream on: 'http://data.3tu.nl/repository/uuid:f5ea9bc6-536f-4744-9c6f-9eb45a907178/DATA1' asUrl retrieveContents) upToEnd) asString.

"Parse the whole log into a DOM tree, with the read limit raised to the input size."
document := (XMLDOMParser on: contents)
               documentReadLimit: contents size;
               parseDocument.

"Build the object model from the DOM tree and run the alpha algorithm on it."
graph := (XESAlphaAlgorithm on: (XESParser new parseDocument: document)) run.

"Visualize the resulting graph."
builder := RTGraphBuilder new.
builder nodes if: [ :m | m type = #transition ]; shape: (RTEllipse new size: 20) + RTLabel.
builder nodes if: [ :m | m type = #place ]; shape: (RTBox new size: 20).
builder edges
   seed: graph edges;
   connectFrom: #from;
   connectTo: #to;
   useInLayout.
builder layout forceCharge: -350.
builder addAll: graph nodes.
builder open.
==================================================================

The problem is not parsing the XML file but generating the object tree after that.

Thank you.



Re: Loading large XML files in MOOSE

Tudor Girba-2
I am not sure I understand.

The issue is not the XML loading at all. Instead, it seems to be in the code that produces the model out of the DOM tree.

In this case, why would you look for the PullXMLParser?

Doru




Re: Loading large XML files in MOOSE

SergeStinckwich



On Wed, Jun 11, 2014 at 4:22 PM, Tudor Girba <[hidden email]> wrote:
> I am not sure I understand.
>
> The issue is not the XML loading at all. Instead, it seems to be in the
> code that produces the model out of the DOM tree.

Yes, you are right.

> In this case, why would you look for the PullXMLParser?

Saving some memory space for my model ;-)
 

Re: Loading large XML files in MOOSE

Tudor Girba-2
I see. But I think your problem is of a different nature, given that you said it went to 1.9 GB :)

Doru





Re: Loading large XML files in MOOSE

SergeStinckwich
Hmm, you are right. I have to think about how to reduce the footprint of my
model creation...
I don't have time to wait for Spur ;-)



Re: Loading large XML files in MOOSE

hernanmd
I am *very* surprised by this thread, given that I have reported several
times on the Pharo mailing lists that XML DOM parsing is not the right
solution for processing large XML files.

XMLDOM parsing is not even adequate for files larger than 10 MB.
Why do some people insist on using it?

The XMLPullParser is the right way to go.

Cheers,

Hernán
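
For anyone wondering what a streaming approach might look like, here is a minimal sketch. It does not use the XMLPullParser. Instead it uses a SAX-style handler, which is the event-driven alternative asked about at the start of the thread. It assumes that SAXHandler provides the startElement:attributes: callback and understands the same on:, documentReadLimit: and parseDocument messages as the XMLDOMParser snippet earlier in the thread. The class name LogEntryCounter, the category and the 'logentry' element name are made up for illustration.

```smalltalk
"Hypothetical handler: counts elements as they stream past, without building a DOM tree."
SAXHandler subclass: #LogEntryCounter
    instanceVariableNames: 'count'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'XMLSketch'.

LogEntryCounter >> count
    "Answer how many matching elements have been seen so far (0 before parsing)."
    ^ count ifNil: [ 0 ]

LogEntryCounter >> startElement: aQualifiedName attributes: aDictionary
    "Invoked once per start tag. The element name 'logentry' is an assumption
     about the log format; adapt it to the real input."
    aQualifiedName = 'logentry'
        ifTrue: [ count := self count + 1 ]

"Usage in a workspace. contents holds the XML source, as in the earlier snippet.
 Raising the read limit is assumed to be needed here just as for the DOM parser."
| handler |
handler := LogEntryCounter on: contents.
handler documentReadLimit: contents size.
handler parseDocument.
handler count
```

Only the counter lives in memory here, so the parser itself adds little beyond the input. If the model built from the events keeps every element, the footprint problem discussed above remains.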



Re: Loading large XML files in MOOSE

Tudor Girba-2
Hi,

I would distinguish between use cases.

XMLDOM is extremely convenient for prototyping, in particular in conjunction with GTInspector. I do not have to leave the inspection context to write code somewhere else and come back at a later time. Instead, I focus on extracting what I want. I have worked like this with XML files of ~100 MB without any problems other than the funny default size limit that you have to override explicitly.

However, it is true that I would not recommend it for long-term usage.

Cheers,
Doru

