
Citezen loading (Rio-Kernel defect)


Schwab,Wilhelm K

Citezen loading (Rio-Kernel defect)

Damien,

I found the problem: #newForWin32 is missing the #new, so it sends the class to do an instance's job. I fixed that, but Monticello was determined to undo my efforts. After a few tries at getting around that, I realized that it should run on Linux, so I did the installation there and then added my fix so that it should run on win32 also. That appears to have worked.
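For readers who have not met this kind of defect, here is a minimal sketch of it in Python. It is not the Rio code (which is Smalltalk and is not quoted in this thread); the class `Executive` and all of its methods are hypothetical names chosen for illustration. The point is only that a factory method which forgets to instantiate ends up sending an instance-side message to the class itself.

```python
class Executive:
    """Hypothetical stand-in for a platform-specific helper class."""

    def __init__(self):
        self.separator = "/"

    def use_win32(self):
        # Instance-side configuration: needs a real instance as `self`.
        self.separator = "\\"
        return self

    @classmethod
    def new_for_win32_broken(cls):
        # Bug: the instantiation step is missing, so the class itself
        # is asked to do an instance's job.
        return cls.use_win32()

    @classmethod
    def new_for_win32(cls):
        # Fix: create the instance first, then configure it.
        return cls().use_win32()
```

In this sketch the broken factory fails with a `TypeError` because no instance is supplied, while the fixed one returns a configured instance.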

I find that it cannot parse my "real" .bib files, but it does seem to handle things that I build up from entries from Google Scholar. The bib files in question are tested in the sense that LaTeX/BibTeX are happy with them, but I have yet to try bibclean on them. Is there any documentation or example code available?

Bill

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Oscar Nierstrasz

Re: Citezen loading (Rio-Kernel defect)

Hi Bill,

We're using Citezen now for parsing and rendering the SCG bibliography
and all our publications.

http://scg.unibe.ch/publications

(See Citezen-Pier.)

I found the best strategy to be to split a bibtex file up into
strings, one for each entry and check if parsing raises an error.
When there is an error, I just print the raw string and the error msg
so I can fix it.
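
The strategy above can be sketched roughly as follows. This is a Python illustration, not Citezen code: `split_entries`, `parse_entry_strict` and `check_bibliography` are hypothetical names, and the "strict parser" here is a toy that only checks the entry header and brace balance. A real setup would call the actual BibTeX parser in its place.

```python
import re


def split_entries(text):
    """Split BibTeX source into strings, one per '@type{' at line start."""
    starts = [m.start() for m in re.finditer(r"(?m)^[ \t]*@\w+\s*[{(]", text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def parse_entry_strict(entry):
    """Toy strict parser (assumption): header plus balanced braces only."""
    header = re.match(r"@(\w+)\s*\{\s*([^,\s{}]+)\s*,", entry)
    if header is None:
        raise ValueError("missing '@type{key,' header")
    depth = 0
    for ch in entry:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unexpected '}'")
    if depth != 0:
        raise ValueError("unbalanced braces")
    return header.group(1).lower(), header.group(2)


def check_bibliography(text):
    """Parse entries one by one; keep the raw string and message on error."""
    parsed, failures = [], []
    for entry in split_entries(text):
        try:
            parsed.append(parse_entry_strict(entry))
        except ValueError as err:
            failures.append((entry, str(err)))
    return parsed, failures
```

Printing each `(entry, message)` pair in `failures` gives the raw string next to its error, so one bad entry no longer hides the rest of the file. The splitting rule is naive: an `@` at the start of a line inside a field value would be mistaken for a new entry.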

But a more robust and forgiving parser is needed.

Cheers,
- on


Damien Pollet

Re: Citezen loading (Rio-Kernel defect)

On Fri, Jun 5, 2009 at 18:41, Oscar Nierstrasz<[hidden email]> wrote:
> But a more robust and forgiving parser is needed.

Well at least it would be nice if it could report errors, but that's a
pain to do with LR parsers.
I think I have an OMeta attempt somewhere but it will probably be
slower than SmaCC.
Personally I'd rather have a strict but simple parser and some
assistance to fix bad .bib files than a permissive parser.

Maybe one step would be to keep the current parser, but only use it to
parse entries one by one instead of using a grammar rule for the whole
file.

> On Jun 5, 2009, at 16:20, Schwab,Wilhelm K wrote:
>> I found the problem: #newForWin32 is missing the #new, so it sends
>> the class to do an instance's job.

That's a Rio bug then, please report it there.

--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet


Lukas Renggli

Re: Citezen loading (Rio-Kernel defect)

>> But a more robust and forgiving parser is needed.
>
> Well at least it would be nice if it could report errors, but that's a
> pain to do with LR parsers.

SmaCC can report errors.

I guess for Citezen something that can silently parse across errors is
needed. Something that builds as complete a model as possible from
input that is not necessarily entirely valid.

I suggest that you have a look at the PetitParser PEG framework
(http://source.lukas-renggli.ch/petit.html). It uses an
object-oriented approach, pure Smalltalk syntax, and comes with an
extensive test suite that covers 100% of the code. Ambiguous grammars
are supported and you can speed them up using memoization if you want
to.
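
To make the idea of a PEG with memoization concrete, here is a small Python sketch. It does not use or reproduce the PetitParser API; the classes `Lit`, `Seq`, `Choice` and `Memo` are names invented for this illustration. Each parser takes a text and a position and returns the new position, or `None` on failure. `Choice` is ordered (the first alternative that succeeds wins), and `Memo` caches results per input position so that backtracking does not redo work.

```python
class Lit:
    """Match a literal string."""

    def __init__(self, s):
        self.s = s

    def parse(self, text, pos):
        return pos + len(self.s) if text.startswith(self.s, pos) else None


class Seq:
    """Match all parts one after another."""

    def __init__(self, *parts):
        self.parts = parts

    def parse(self, text, pos):
        for part in self.parts:
            pos = part.parse(text, pos)
            if pos is None:
                return None
        return pos


class Choice:
    """Ordered choice: try alternatives in order, backtracking on failure."""

    def __init__(self, *alternatives):
        self.alternatives = alternatives

    def parse(self, text, pos):
        for alternative in self.alternatives:
            end = alternative.parse(text, pos)
            if end is not None:
                return end
        return None


class Memo:
    """Cache the wrapped parser's result for each (text, position)."""

    def __init__(self, inner):
        self.inner = inner
        self.cache = {}
        self.calls = 0  # how often the wrapped parser really ran

    def parse(self, text, pos):
        key = (text, pos)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.inner.parse(text, pos)
        return self.cache[key]


# Both alternatives start with the same sub-parser, so the second
# alternative reuses the cached result after the first one fails.
word = Memo(Lit("author"))
field = Choice(Seq(word, Lit(" = ")), Seq(word, Lit("=")))
```

Parsing `"author=x"` with `field` fails on the first alternative, backtracks, and succeeds on the second without running `Lit("author")` again.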

> I think I have an OMeta attempt somewhere but it will probably be
> slower than SmaCC.

Parsing all methods of Object and Morph (1390 methods in total) takes:

688ms with the hand-written RBParser,
806ms with the hand-written Squeak parser,
2518ms with the pre-compiled and heavily optimized SmaCC parser,
3700ms with the unoptimized PetitParser Smalltalk parser

I don't know where OMeta would be in this comparison; unfortunately it
only includes a parser for Smalltalk expressions. A probably less
accurate comparison with just a single expression parser (the
factorial function) parsed 1000 times gives:

560ms with the hand-written RBParser,
602ms with the hand-written Squeak parser,
2564ms with the pre-compiled and heavily optimized SmaCC parser,
4867ms with the unoptimized PetitParser Smalltalk parser,
25098ms with the OMeta Smalltalk expression parser

I have no idea why OMeta is so slow. Otherwise, however, I conclude
that it doesn't matter much speed-wise what kind of parser you pick.
LR parsers are probably not that much of a hype anymore ;-)
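
For anyone who wants to repeat this kind of measurement elsewhere, a timing loop along these lines is enough. This is a Python sketch, not the code used for the numbers above: `time_parses_ms` is a hypothetical helper, and Python's own `ast.parse` stands in for the parser under test.

```python
import ast
import time

# A small factorial function as the input, parsed repeatedly.
FACTORIAL_SOURCE = (
    "def factorial(n):\n"
    "    return 1 if n <= 1 else n * factorial(n - 1)\n"
)


def time_parses_ms(parse, source, repetitions=1000):
    """Return the wall-clock milliseconds spent parsing `source` repeatedly."""
    start = time.perf_counter()
    for _ in range(repetitions):
        parse(source)
    return (time.perf_counter() - start) * 1000.0


elapsed_ms = time_parses_ms(ast.parse, FACTORIAL_SOURCE)
```

Passing a different callable as `parse` compares parsers on the same input; results depend heavily on the machine and on warm-up effects, so only the ratios are meaningful.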

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch


Damien Pollet

Re: Citezen loading (Rio-Kernel defect)

On Sun, Jun 7, 2009 at 15:20, Lukas Renggli<[hidden email]> wrote:
> that it doesn't matter much speed-wise what kind of parser you pick.
> LR parsers are probably not that much of a hype anymore ;-)

Well, at least parsing scg.bib should be acceptably quick. Currently
it's slow enough that a progress bar would make sense :)

I quickly went through the code in Helvetia when you announced it.
I'll have a look for Citezen, but unfortunately I'm going to be quite
busy until July. So in the meantime, if anyone has ideas, please go
for it, the repository is open.

I will probably re-enter Citezen for the ESUG awards, and try to
actually get something to show this time.

--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet
