Damien,
I found the problem: #newForWin32 is missing the #new, so it sends the class to do an instance's job. I fixed that, but Monticello was determined to undo my efforts. After a few tries at getting around that, I realized that it should run on Linux, so I did the installation there and then added my fix so that it should run on win32 also. That appears to have worked.

I find that it cannot parse my "real" .bib files, but it does seem to handle things that I build up from entries from Google Scholar. The bib files in question are tested in the sense that LaTeX/BibTeX are happy with them, but I have yet to try bibclean on them. Is there any documentation or example code available?

Bill

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Hi Bill,

We're using Citezen now for parsing and rendering the SCG bibliography and all our publications.

http://scg.unibe.ch/publications

(See Citezen-Pier.)

I found the best strategy to be to split a BibTeX file up into strings, one for each entry, and check whether parsing each one raises an error. When there is an error, I just print the raw string and the error message so I can fix it.

But a more robust and forgiving parser is needed.

Cheers,
- on

On Jun 5, 2009, at 16:20, Schwab,Wilhelm K wrote:

> I find that it cannot parse my "real" .bib files, but it does seem
> to handle things that I build up from entries from Google scholar.
> The bib files in question are tested in the sense the LaTeX/BibTeX
> are happy with them, but I have yet to try bibclean on them. Is
> there any documentation or example code available?
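The entry-by-entry strategy described above can be sketched roughly as follows. This is a minimal illustration in Python rather than Smalltalk, and it does not use Citezen: `split_entries`, `parse_entry`, and `parse_file` are hypothetical names, and `parse_entry` is only a stand-in for whatever strict parser is actually in use. The splitter assumes brace-delimited entries (`@type{...}`) and ignores parenthesized entries and escaped braces.

```python
import re

# Start of a brace-delimited BibTeX entry, e.g. "@article{".
ENTRY_START = re.compile(r"@\w+\s*\{")


def split_entries(text):
    """Split BibTeX source into one raw string per entry by counting braces."""
    entries = []
    pos = 0
    while True:
        match = ENTRY_START.search(text, pos)
        if match is None:
            break
        depth = 0
        end = len(text)  # fallback: braces never balance, take the rest
        for i in range(match.end() - 1, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        entries.append(text[match.start():end])
        pos = end
    return entries


def parse_entry(raw):
    """Hypothetical strict parser; a stand-in for the real one."""
    if raw.count("{") != raw.count("}"):
        raise ValueError("unbalanced braces")
    header = re.match(r"@(\w+)\s*\{\s*([^,\s{}]+)\s*,", raw)
    if header is None:
        raise ValueError("missing citation key")
    return {"type": header.group(1).lower(), "key": header.group(2)}


def parse_file(text):
    """Parse each entry separately; collect (raw string, error message) on failure."""
    parsed, failures = [], []
    for raw in split_entries(text):
        try:
            parsed.append(parse_entry(raw))
        except ValueError as error:
            failures.append((raw, str(error)))
    return parsed, failures
```

With this shape, one malformed entry no longer aborts the whole file: the good entries are still returned, and each failure carries the raw text needed to fix it by hand.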
On Fri, Jun 5, 2009 at 18:41, Oscar Nierstrasz<[hidden email]> wrote:
> But a more robust and forgiving parser is needed.

Well, at least it would be nice if it could report errors, but that's a pain to do with LR parsers. I think I have an OMeta attempt somewhere, but it will probably be slower than SmaCC.

Personally, I'd rather have a strict but simple parser and some assistance to fix bad .bib files than a permissive parser. Maybe one step would be to keep the current parser, but only use it to parse entries one by one instead of using a grammar rule for the whole file.

> On Jun 5, 2009, at 16:20, Schwab,Wilhelm K wrote:
>> I found the problem: #newForWin32 is missing the #new, so it sends
>> the class to do an instance's job.

That's a Rio bug then, please report it there.

--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet
>> But a more robust and forgiving parser is needed.
> Well at least it would be nice if it could report errors, but that's a
> pain to do with LR parsers.

SmaCC can report errors. I guess for Citezen something that can silently parse across errors is needed: something that builds as complete a model as possible from input that is not necessarily entirely valid.

I suggest that you have a look at the PetitParser PEG framework (http://source.lukas-renggli.ch/petit.html). It uses an object-oriented approach, pure Smalltalk syntax, and comes with an extensive test suite that covers 100% of the code. Ambiguous grammars are supported, and you can speed them up using memoization if you want to.

> I think I have an OMeta attempt somewhere but it will probably be
> slower than SmaCC.

Parsing all methods of Object and Morph (1390 methods in total) takes:

  688ms with the hand-written RBParser,
  806ms with the hand-written Squeak parser,
  2518ms with the pre-compiled and heavily optimized SmaCC parser,
  3700ms with the non-optimized PetitParser Smalltalk parser

I don't know where OMeta would be in this comparison; unfortunately it only includes a parser for Smalltalk expressions. A probably less accurate comparison with just a single expression parser (the factorial function), parsed 1000 times, gives:

  560ms with the hand-written RBParser,
  602ms with the hand-written Squeak parser,
  2564ms with the pre-compiled and heavily optimized SmaCC parser,
  4867ms with the non-optimized PetitParser Smalltalk parser,
  25098ms with the OMeta Smalltalk expression parser

I have no idea why OMeta is so slow. Otherwise, however, I conclude that it doesn't matter much speed-wise what kind of parser you pick. LR parsers are probably not that much of a hype anymore ;-)

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
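To put the timings above side by side, here is a small Python sketch that expresses each parser's time as a multiple of the fastest one in the same benchmark. The numbers are copied from the message; the dictionary labels and the helper name `relative_to_fastest` are made up for illustration.

```python
# Timings in milliseconds, copied from the message above.
methods_ms = {
    "RBParser": 688,
    "Squeak parser": 806,
    "SmaCC": 2518,
    "PetitParser": 3700,
}
expression_ms = {
    "RBParser": 560,
    "Squeak parser": 602,
    "SmaCC": 2564,
    "PetitParser": 4867,
    "OMeta": 25098,
}


def relative_to_fastest(timings):
    """Return each timing as a multiple of the fastest entry."""
    fastest = min(timings.values())
    return {name: ms / fastest for name, ms in timings.items()}


for label, timings in (("methods", methods_ms), ("expression", expression_ms)):
    for name, factor in relative_to_fastest(timings).items():
        print(f"{label:10s} {name:14s} {factor:6.2f}x")
```

On these figures SmaCC and PetitParser come out at roughly 3.7 and 5.4 times the RBParser time in the first benchmark, and OMeta at about 45 times in the second.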
On Sun, Jun 7, 2009 at 15:20, Lukas Renggli<[hidden email]> wrote:
> that it doesn't matter much speed-wise what kind of parser you pick.
> LR parsers are probably not that much of a hype anymore ;-)

Well, at least parsing scg.bib should be acceptably quick. Currently it's slow enough that a progress bar would make sense :)

I quickly went through the code in Helvetia when you announced it. I'll have a look for Citezen, but unfortunately I'm going to be quite busy until July. So in the meantime, if anyone has ideas, please go for it; the repository is open.

I will probably re-enter Citezen for the ESUG awards, and try to actually get something to show this time.

--
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet