HTML parser (again)


Andrei Stebakov
I've been looking for a nice and fast HTML parser.
I've found Zulq Alam's Soup
(http://www.squeaksource.com/@vHckXt8_6gVtXFxy/XMrjDbIs). It looks nice,
but it's way too slow for me (it takes 5 sec to parse the page; my
current Lisp parser takes about 1 sec for that).
I found another one, Todd Blanchard's HTML and CSS parser
(http://www.squeaksource.com/@iMgHmTKVxU00wEdz/A0jkqk71) but I
couldn't load it into Pharo 1.1 or Squeak 4.1.
It complains about a syntax error and leaves a progress bar that
I can't kill...
I wonder if anyone (Todd?) can take a look at the parser and figure
out how to fix it?

What other options do I have for an HTML parser?
Looking at Pharo's speed, I wonder if there is any way to optimize it. Is
a JIT or some other speed optimization planned for Pharo/Squeak?

Thank you,
Andrei

Re: [Pharo-project] HTML parser (again)

laurent laffont


On Wed, Aug 18, 2010 at 7:50 AM, Andrei Stebakov <[hidden email]> wrote:
> What other options do I have for an HTML parser?
> Looking at Pharo's speed, I wonder if there is any way to optimize it. Is
> a JIT or some other speed optimization planned for Pharo/Squeak?


What do you need to do?

There's XMLSupport http://www.squeaksource.com/XMLSupport.html
Scamper might have a standalone HTML parser http://www.squeaksource.com/Scamper.html

The CogVM has JIT.

Laurent.
 

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project



Re: [Pharo-project] HTML parser (again)

Andrei Stebakov
Web page scraping. The XML parser chokes on bad HTML input.

Re: [Pharo-project] HTML parser (again)

Andrei Stebakov
In reply to this post by laurent laffont
I tried to load Scamper's Network-HTML and got a syntax error during reloading:
HtmlTokenizer private-initialization initialize:
initialize: s
        text _ s withSqueakLineEndings.
        pos _ Nothing more expected ->1.
        textAreaLevel _ 0.

Re: HTML parser (again)

Eliot Miranda-2
In reply to this post by Andrei Stebakov


On Tue, Aug 17, 2010 at 10:50 PM, Andrei Stebakov <[hidden email]> wrote:
> I've been looking for a nice and fast HTML parser.
> I've found Zulq Alam's Soup
> (http://www.squeaksource.com/@vHckXt8_6gVtXFxy/XMrjDbIs). It looks nice,
> but it's way too slow for me (it takes 5 sec to parse the page; my
> current Lisp parser takes about 1 sec for that).

Have you tried Cog, as Laurent suggests? It may make the difference you need. In any case, I'd be interested in the speed comparison.
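
For anyone wanting to make that comparison, a minimal workspace sketch follows. `Time millisecondsToRun:` is the usual way to time a block in Squeak and Pharo; the `Soup fromString:` entry point and the `pageSource` stand-in are assumptions for illustration and are not taken from this thread, so substitute whichever parser call is actually being measured.

```smalltalk
"Evaluate with 'print it' to see the elapsed time in milliseconds."
| pageSource |
pageSource := '<html><body><p>Hello</body></html>'. "stand-in for the real page"
Time millisecondsToRun: [
    "Assumed parser entry point; replace with the call being measured."
    Soup fromString: pageSource ]
```

Running the same expression on the same page under the classic VM and under Cog would give the two numbers to compare.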

cheers,
Eliot 



Re: [Pharo-project] HTML parser (again)

Mariano Martinez Peck
In reply to this post by Andrei Stebakov


On Wed, Aug 18, 2010 at 5:55 PM, Andrei Stebakov <[hidden email]> wrote:
> I tried to load Scamper's Network-HTML and got a syntax error during reloading:
> HtmlTokenizer private-initialization initialize:
> initialize: s
>        text _ s withSqueakLineEndings.
>        pos _ Nothing more expected ->1.
>        textAreaLevel _ 0.


That code uses underscore for assignment, which is no longer allowed in Pharo 1.1 unless you explicitly enable a specific setting.

So either enable that setting or update the code (in another image).
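
A minimal sketch of the second option, assuming the original method simply read `pos _ 1` (the `Nothing more expected ->` text in the quoted error looks like the compiler's marker rather than part of the source): replace each `_` with `:=`.

```smalltalk
initialize: s
    "Same method as in the error report, with := instead of _ for assignment."
    text := s withSqueakLineEndings.
    pos := 1.
    textAreaLevel := 0.
```

The rest of the package would presumably need the same change wherever `_` is used for assignment.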

cheers

mariano

 
Re: [Pharo-project] HTML parser (again)

Andrei Stebakov
Where can I make this setting?

Re: [Pharo-project] HTML parser (again)

Andrei Stebakov
In reply to this post by laurent laffont
Is there a one-click image for CogVM somewhere so I can download it?


Re: [Pharo-project] HTML parser (again)

Andrei Stebakov
In reply to this post by laurent laffont
As for Scamper, when I try to evaluate (in Pharo 1.1)
tok := HtmlTokenizer on: '<html />'.

there is an error:

Error: My subclass should have overridden #contents
Proceed
Abandon
Debug
HtmlTokenizer(Object)>>error:
HtmlTokenizer(Object)>>subclassResponsibility
HtmlTokenizer(Stream)>>contents
HtmlTokenizer(Stream)>>printOn:
[] in HtmlTokenizer(Object)>>printStringLimitedTo:
String class(SequenceableCollection class)>>streamContents:limitedTo:
HtmlTokenizer(Object)>>printStringLimitedTo:
HtmlTokenizer(Object)>>printString
TextMorphForShoutEditor(ParagraphEditor)>>printIt
[] in TextMorphForShoutEditor(ParagraphEditor)>>printIt:
TextMorphForShoutEditor(ParagraphEditor)>>terminateAndInitializeAround:
TextMorphForShoutEditor(ParagraphEditor)>>printIt:
TextMorphForShoutEditor(ParagraphEditor)>>dispatchOnKeyEvent:with:
TextMorphForShoutEditor(TextMorphEditor)>>dispatchOnKeyEvent:with:
TextMorphForShoutEditor(ParagraphEditor)>>keystroke:
TextMorphForShoutEditor(TextMorphEditor)>>keystroke:
[] in [] in TextMorphForShout(TextMorph)>>keyStroke:
TextMorphForShout(TextMorph)>>handleInteraction:
TextMorphForShout(TextMorphForEditView)>>handleInteraction:
[] in TextMorphForShout(TextMorph)>>keyStroke:



Re: [Pharo-project] HTML parser (again)

Andrei Stebakov
I am sorry, this error only happens when I use "print it" instead of "do it".
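
Going by the stack trace in the previous message, the difference seems to be that "print it" sends `printString` to the result, which ends up in `Stream>>printOn:` and then in `#contents`, which `HtmlTokenizer` does not implement; "do it" never prints the result. A rough sketch of one way around it, where the `printOn:` override is an assumption for illustration and not part of Scamper:

```smalltalk
"Hypothetical workaround: give HtmlTokenizer its own printOn: so that
 printing no longer goes through Stream>>printOn: and #contents."
HtmlTokenizer compile: 'printOn: aStream
    aStream nextPutAll: ''an HtmlTokenizer'''.

"After that, 'print it' on this line should show the object instead of an error."
HtmlTokenizer on: '<html />'.
```

The tokenizer itself is unaffected either way; only the way it is displayed in the workspace changes.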

Re: [Pharo-project] HTML parser (again)

stephane ducasse-2
In reply to this post by Andrei Stebakov
No, CogVM is not ready for us.


> Is there a one-click image for CogVM somewhere so I can download it?


Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)

laurent laffont
In reply to this post by Andrei Stebakov
On Wed, Aug 18, 2010 at 7:48 PM, Andrei Stebakov <[hidden email]> wrote:
> Is there a one-click image for CogVM somewhere so I can download it?

It's planned but for now it seems you have to build it yourself.

Laurent

 


Re: HTML parser (again)

Sean P. DeNigris
In reply to this post by Andrei Stebakov
Todd Blanchard's HTML and CSS parser at http://www.squeaksource.com/htmlcssparser now loads in Squeak 4.1 and Pharo 1.1.  It can be found at http://www.squeaksource.com/SPDProjectUpdates (HTML package).

I'm forwarding my post about this experience from the Pharo list to a new thread, to discuss how to make community contributions to unsupported packages easier.

Cheers,
Sean

Re: [Pharo-project] HTML parser (again)

johnmci
In reply to this post by stephane ducasse-2
I will try to push a CogVM for the Mac this weekend; Eliot and I are planning some time then to get this out the door.

On 2010-08-18, at 2:05 PM, stephane ducasse wrote:

> no CogVM is not ready for us.
>
>
>

--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================





Re: [Pharo-project] HTML parser (again)

Casey Ransberger-2
In reply to this post by stephane ducasse-2
I only have a cellphone with me, so I can't check, but I think if you file in changesNecessaryForCogToWork.cs (I might have the filename wrong), Cog will probably work. Do note that after saving an image with Cog, you won't be able to open it with a classic VM.

On Aug 18, 2010, at 2:05 PM, stephane ducasse <[hidden email]> wrote:

> no CogVM is not ready for us.
>
>
>> Is there a one-click image for CogVM somewhere so I can download it?

Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)

Tudor Girba
In reply to this post by johnmci
That would be really great. As I mentioned before, I have been using the
CogVM since its release, and it is pretty stable (with the exception of
crashes due to this socket problem).

Is there a place to report possible bugs related to it, or is this  
mailing list the most appropriate place?

Cheers,
Doru


On 19 Aug 2010, at 01:26, John M McIntosh wrote:

> I will try to push a CogVM for the Mac this weekend; Eliot and I are
> planning some time then to get this out the door.
>
> On 2010-08-18, at 2:05 PM, stephane ducasse wrote:
>
>> no CogVM is not ready for us.
>>
>>
>>

--
www.tudorgirba.com

"Speaking louder won't make the point worthier."