PetitParser and huge files


PetitParser and huge files

Blondeau Vincent

Hi,

 

I would like to parse a log file with PetitParser. I wrote the parser and it works with a 5.3 MB test file.

But the real file is 73.6 MB…

The main problem is that PetitParser calls asPetitStream, which calls contents on the file.

At this point, primitive 72 (#elementsForwardIdentityTo:) fails with #'insufficient object memory'. However, my image is new, so it should not raise an error: I have at least 500 MB of free space.

 

Is there a way to parse the file as a stream, so as to avoid the call to contents?

Maybe by using PPStream?

 

Thanks in advance,

 

Cheers,

Vincent

 


*************************************************************************************
"Ce message et les pièces jointes sont confidentiels et réservés à l'usage exclusif de ses destinataires. Il peut également être protégé par le secret professionnel. Si vous recevez ce message par erreur, merci d'en avertir immédiatement l'expéditeur et de le détruire. L'intégrité du message ne pouvant être assurée sur Internet, la responsabilité de Worldline ne pourra être recherchée quant au contenu de ce message. Bien que les meilleurs efforts soient faits pour maintenir cette transmission exempte de tout virus, l'expéditeur ne donne aucune garantie à cet égard et sa responsabilité ne saurait être recherchée pour tout dommage résultant d'un virus transmis.

This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted."

_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.list.inf.unibe.ch/listinfo/moose-dev

Re: PetitParser and huge files

Guillaume Larcheveque
You should feed your parser line by line if it is a log file.

It is a bad idea to give a file stream to a parser: a parser needs random access to the stream position, including moving backwards, so it will be terribly inefficient.
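To make the line-by-line suggestion concrete, here is a minimal, language-neutral sketch in Python rather than PetitParser code. The log format, `LINE_RE`, `parse_line` and `parse_log` are all hypothetical stand-ins for the real per-line parser; the only point is that a single line is in memory at any time.

```python
import io
import re
from typing import Iterator, Optional, TextIO

# Assumed log format, for illustration only: "<timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)")


def parse_line(line: str) -> Optional[dict]:
    """Stand-in for the real per-line parser; returns None when the line does not match."""
    match = LINE_RE.fullmatch(line)
    return match.groupdict() if match else None


def parse_log(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per line without loading the whole file."""
    for raw in stream:  # text streams are iterated lazily, one line at a time
        record = parse_line(raw.rstrip("\n"))
        if record is not None:
            yield record


# An in-memory stream stands in for the large log file.
sample = io.StringIO(
    "2016-02-24T18:49:00 INFO started\n"
    "garbage\n"
    "2016-02-24T18:49:01 ERROR disk full\n"
)
records = list(parse_log(sample))
```

This only works when each log entry fits on one line; entries that span several lines would need to be grouped before being handed to the parser.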

2016-02-24 18:49 GMT+01:00 Blondeau Vincent <[hidden email]>:

--
Guillaume Larcheveque



Re: PetitParser and huge files

Jan Kurš

Hi,

PetitParser does not usually backtrack much. So if you provide a class with the PetitStream interface and a limited buffer for backtracking (e.g. a few lines), PetitParser will work.

The core problem will be to implement the #position: message.
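As a rough illustration of the idea, and not of PetitParser's or Smalltalk/X's actual implementation, the following Python sketch wraps a sequential source and keeps only the most recent characters, so that the position can be moved back a limited distance. The class name `BoundedRewindStream`, its `window` parameter and its `seek` method (playing the role of #position:) are hypothetical.

```python
import io
from typing import TextIO


class BoundedRewindStream:
    """Sequential reader that retains the last `window` characters for backtracking."""

    def __init__(self, source: TextIO, window: int = 4096) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self._source = source
        self._window = window
        self._buffer = ""       # most recently read characters
        self._buffer_start = 0  # absolute position of self._buffer[0]
        self._position = 0      # absolute position of the next character

    @property
    def position(self) -> int:
        return self._position

    def seek(self, target: int) -> None:
        """Move to an absolute position; fails if it lies outside the retained window."""
        buffer_end = self._buffer_start + len(self._buffer)
        if not self._buffer_start <= target <= buffer_end:
            raise ValueError(f"position {target} is outside the buffered window")
        self._position = target

    def next(self) -> str:
        """Return the next character, or '' at end of input."""
        offset = self._position - self._buffer_start
        if offset == len(self._buffer):
            # Nothing buffered ahead of the position: pull one more character.
            chunk = self._source.read(1)
            if chunk == "":
                return ""
            self._buffer += chunk
            # Drop characters that have fallen out of the window.
            excess = len(self._buffer) - self._window
            if excess > 0:
                self._buffer = self._buffer[excess:]
                self._buffer_start += excess
            offset = self._position - self._buffer_start
        self._position += 1
        return self._buffer[offset]


stream = BoundedRewindStream(io.StringIO("abcdef"), window=3)
first_four = "".join(stream.next() for _ in range(4))  # consumes "abcd"
stream.seek(1)            # allowed: the window still holds "bcd"
replayed = stream.next()
try:
    stream.seek(0)        # position 0 has already been discarded
    rewound_too_far = False
except ValueError:
    rewound_too_far = True
```

Reading one character at a time keeps the sketch short; a real implementation would read in blocks. The important design choice is that repositioning outside the window fails loudly, which is what a grammar that backtracks further than the buffer would run into.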

I think Jan Vrany did something similar in the Smalltalk/X dialect, and I plan to add this feature as well.

Cheers, Jan


On Wed, Feb 24, 2016, 9:57 AM Guillaume Larcheveque <[hidden email]> wrote:

Re: PetitParser and huge files

Stephan Eggermont-3
On 25-02-16 03:12, Jan Kurš wrote:
>
> PetitParser does not usually backtrack much. So if you provide a class
> with PetitStream interface and limited buffer for backtracking (eg few
> lines) PetitParser will work.
>
Of course it sometimes backtracks a lot. That just depends on your
language definition.

Stephan

Re: PetitParser and huge files

Blondeau Vincent
In reply to this post by Jan Kurš

Hi,

 

In my case there is not much backtracking (only a few lines), so streaming parsing with a small buffer will be enough.

 

Do you have a link to Jan Vrany's source code?

 

Cheers,

Vincent

 
