Citezen loading (Rio-Kernel defect)

Schwab,Wilhelm K
Damien,

I found the problem: #newForWin32 is missing the #new, so it sends the class to do an instance's job.  I fixed that, but Monticello was determined to undo my efforts.  After a few tries at getting around that, I realized that it should run on Linux, so I did the installation there and then added my fix so that it should run on win32 as well.  That appears to have worked.
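
For readers unfamiliar with this kind of defect, here is a rough Python analogue of a constructor that forgets to create the instance. It is only a sketch: the class `Executive` and its methods are hypothetical stand-ins, not Rio's actual code, and Python reports the mistake differently from Smalltalk.

```python
class Executive:
    """Hypothetical stand-in for a platform-specific helper; not Rio's class."""

    def __init__(self):
        self.platform = None

    def configure(self, platform):
        # Instance-side setup: needs a real instance to store state on.
        self.platform = platform
        return self

    @classmethod
    def new_for_win32_buggy(cls):
        # Bug analogue: the instance-side method is invoked on the class,
        # because the step that creates the instance was left out.
        return cls.configure("win32")

    @classmethod
    def new_for_win32(cls):
        # Fix analogue: create the instance first, then configure it.
        return cls().configure("win32")


try:
    Executive.new_for_win32_buggy()
except TypeError as err:
    print("buggy constructor failed:", err)

print(Executive.new_for_win32().platform)
```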

I find that it cannot parse my "real" .bib files, but it does seem to handle files that I build up from Google Scholar entries.  The .bib files in question are tested in the sense that LaTeX/BibTeX are happy with them, but I have yet to try bibclean on them.  Is there any documentation or example code available?

Bill



_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Citezen loading (Rio-Kernel defect)

Oscar Nierstrasz

Hi Bill,

We're using Citezen now for parsing and rendering the SCG bibliography
and all our publications.

http://scg.unibe.ch/publications

(See Citezen-Pier.)

I found the best strategy to be to split a bibtex file up into  
strings, one for each entry and check if parsing raises an error.  
When there is an error, I just print the raw string and the error msg  
so I can fix it.
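
The strategy above can be sketched as follows. This is an illustration in Python under stated assumptions, not Citezen's code: `split_entries`, `parse_entry` and `check_bibliography` are hypothetical names, the splitter counts braces naively (ignoring quoting and escapes), and `parse_entry` is a deliberately strict toy parser standing in for the real one.

```python
import re

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
FIELD_RE = re.compile(r'\s*(\w+)\s*=\s*(\{[^{}]*\}|"[^"]*"|\d+)\s*(,|$)')


def split_entries(text):
    """Split BibTeX source into one raw string per '@type{...}' entry."""
    entries, i = [], 0
    while True:
        start = text.find("@", i)
        if start == -1:
            break
        j = text.find("{", start)
        if j == -1:
            entries.append(text[start:].strip())
            break
        depth = 0
        while j < len(text):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        # An unbalanced entry swallows the rest of the text and will then
        # fail in parse_entry, which is what we want to have reported.
        entries.append(text[start:j + 1].strip())
        i = j + 1
    return entries


def parse_entry(raw):
    """Strict toy parser: raises ValueError on anything it does not expect."""
    m = ENTRY_RE.match(raw)
    if not m:
        raise ValueError("malformed entry header")
    kind, key, body = m.groups()
    body, fields, pos = body.strip(), {}, 0
    while pos < len(body):
        fm = FIELD_RE.match(body, pos)
        if not fm:
            raise ValueError(f"cannot parse field near {body[pos:pos + 20]!r}")
        fields[fm.group(1).lower()] = fm.group(2).strip('{}"')
        pos = fm.end()
    return {"type": kind.lower(), "key": key, "fields": fields}


def check_bibliography(text):
    """Parse entry by entry; collect (raw entry, error message) for failures."""
    parsed, problems = [], []
    for raw in split_entries(text):
        try:
            parsed.append(parse_entry(raw))
        except ValueError as err:
            problems.append((raw, str(err)))
    return parsed, problems


sample = (
    "@article{knuth84,\n  title = {Literate Programming},\n  year = 1984\n}\n\n"
    "@book{broken,\n  title = Unquoted Title\n}\n"
)
parsed, problems = check_bibliography(sample)
for raw, message in problems:
    print(message)
    print(raw)
```

Because each entry is parsed in isolation, one bad entry no longer prevents the rest of the file from loading, and the printed raw string points straight at what needs fixing.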

But a more robust and forgiving parser is needed.

Cheers,
- on

On Jun 5, 2009, at 16:20, Schwab,Wilhelm K wrote:

> Damien,
>
> I found the problem: #newForWin32 is missing the #new, so it sends
> the class to do an instance's job.  I fixed that, but Monticello was
> determined to undo my efforts.  After a few tries at getting around
> that, I realized that it should run on Linux, so I did the
> installation there and then added my fix so that it should run on
> win32 as well.  That appears to have worked.
>
> I find that it cannot parse my "real" .bib files, but it does seem
> to handle files that I build up from Google Scholar entries.
> The .bib files in question are tested in the sense that LaTeX/BibTeX
> are happy with them, but I have yet to try bibclean on them.  Is
> there any documentation or example code available?
>
> Bill

Re: Citezen loading (Rio-Kernel defect)

Damien Pollet
On Fri, Jun 5, 2009 at 18:41, Oscar Nierstrasz<[hidden email]> wrote:
> But a more robust and forgiving parser is needed.

Well at least it would be nice if it could report errors, but that's a
pain to do with LR parsers.
I think I have an OMeta attempt somewhere but it will probably be
slower than SmaCC.
Personally I'd rather have a strict but simple parser and some
assistance to fix bad .bib files than a permissive parser.

Maybe one step would be to keep the current parser, but only use it to
parse entries one by one instead of using a grammar rule for the whole
file.

> On Jun 5, 2009, at 16:20, Schwab,Wilhelm K wrote:
>> I found the problem: #newForWin32 is missing the #new, so it sends

That's a Rio bug then, please report it there.

>> the class to do an instance's job.  I fixed that, but Monticello was
>> determined to undo my efforts.  After a few tries at getting around
>> that, I realized that it should run on Linux, so I did the
>> installation there and then added my fix so that it should run on
>> win32 also.  That appears to have worked.


--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet


Re: Citezen loading (Rio-Kernel defect)

Lukas Renggli
>> But a more robust and forgiving parser is needed.
>
> Well at least it would be nice if it could report errors, but that's a
> pain to do with LR parsers.

SmaCC can report errors.

I guess that Citezen needs something that can silently parse across
errors: something that builds as complete a model as possible from
input that is not necessarily entirely valid.

I suggest that you have a look at the PetitParser PEG framework
(http://source.lukas-renggli.ch/petit.html). It uses an
object-oriented approach, pure Smalltalk syntax, and comes with an
extensive test suite that covers 100% of the code. Ambiguous grammars
are supported and you can speed them up using memoization if you want
to.
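
To give a feel for the PEG style mentioned above, here is a tiny sketch of parser combinators with ordered choice and memoization. It is written in Python and is not PetitParser's API: all names (`literal`, `sequence`, `choice`, `many`, `none_of`, `memoized`, `braced`) are made up for illustration.

```python
from functools import lru_cache

# A parser is a function (text, pos) -> end position, or None on failure.


def literal(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def none_of(chars):
    def parse(text, pos):
        ok = pos < len(text) and text[pos] not in chars
        return pos + 1 if ok else None
    return parse


def sequence(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    # PEG ordered choice: the first alternative that succeeds wins.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def many(parser):
    def parse(text, pos):
        while True:
            end = parser(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


def memoized(parser):
    # Packrat-style cache keyed on (text, pos).
    return lru_cache(maxsize=None)(parser)


@memoized
def braced(text, pos):
    # braced <- '{' (braced / [^{}])* '}'
    rule = sequence(literal("{"), many(choice(braced, none_of("{}"))), literal("}"))
    return rule(text, pos)


print(braced("{a{b}c}", 0))   # end position of the balanced group
print(braced("{a{b}c", 0))    # None: the closing brace is missing
```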

> I think I have an OMeta attempt somewhere but it will probably be
> slower than SmaCC.

Parsing all methods of Object and Morph (1390 methods in total) takes:

   688ms with the hand-written RBParser,
   806ms with the hand-written Squeak parser,
   2518ms with the pre-compiled and heavily optimized SmaCC parser,
   3700ms with the unoptimized PetitParser Smalltalk parser

I don't know where OMeta would be in this comparison; unfortunately it
only includes a parser for Smalltalk expressions. A probably less
accurate comparison with just a single expression parser (the
factorial function) parsed 1000 times gives:

   560ms with the hand-written RBParser,
   602ms with the hand-written Squeak parser,
   2564ms with the pre-compiled and heavily optimized SmaCC parser,
   4867ms with the unoptimized PetitParser Smalltalk parser,
   25098ms with the OMeta Smalltalk expression parser

I have no idea why OMeta is so slow. Otherwise, however, I conclude
that it doesn't matter much speed-wise what kind of parser you pick.
LR parsers are probably not that much of a hype anymore ;-)
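
To put the figures above in proportion, the following Python snippet expresses each timing as a multiple of the RBParser time. The numbers are the ones reported in this message; the helper name `slowdowns` is just an illustrative choice.

```python
# Timings in milliseconds, as reported in this message.
all_methods = {
    "RBParser": 688,
    "Squeak parser": 806,
    "SmaCC": 2518,
    "PetitParser": 3700,
}
factorial_x1000 = {
    "RBParser": 560,
    "Squeak parser": 602,
    "SmaCC": 2564,
    "PetitParser": 4867,
    "OMeta": 25098,
}


def slowdowns(timings, baseline="RBParser"):
    """Return each parser's time as a multiple of the baseline's time."""
    base = timings[baseline]
    return {name: round(ms / base, 1) for name, ms in timings.items()}


for label, timings in (("Object + Morph", all_methods),
                       ("factorial x 1000", factorial_x1000)):
    print(label, slowdowns(timings))
```

On these figures SmaCC and PetitParser stay within a single-digit factor of the hand-written parsers, while OMeta is roughly 45 times slower than RBParser on the expression benchmark.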

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Re: Citezen loading (Rio-Kernel defect)

Damien Pollet
On Sun, Jun 7, 2009 at 15:20, Lukas Renggli<[hidden email]> wrote:
> that it doesn't matter much speed-wise what kind of parser you pick.
> LR parsers are probably not that much of a hype anymore ;-)

Well, at least parsing scg.bib should be acceptably quick. Currently
it's slow enough that a progress bar would make sense :)

I quickly went through the code in Helvetia when you announced it.
I'll have a look for Citezen, but unfortunately I'm going to be quite
busy until July. So in the meantime, if anyone has ideas, please go
for it, the repository is open.

I will probably re-enter Citezen for the ESUG awards, and try to
actually get something to show this time.

--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet
