HTML parser (again)

HTML parser (again)

Andrei Stebakov
I've been looking for a nice and fast HTML parser.
I found Zulq Alam's Soup
(http://www.squeaksource.com/@vHckXt8_6gVtXFxy/XMrjDbIs). It looks nice,
but it's way too slow for me (it takes 5 sec to parse the page; my
current Lisp parser takes about 1 sec for that).
I found another one, Todd Blanchard's HTML and CSS parser
(http://www.squeaksource.com/@iMgHmTKVxU00wEdz/A0jkqk71), but I
couldn't load it into Pharo 1.1 or Squeak 4.1.
It complains about a syntax error and leaves a progress bar that
I can't kill...
I wonder if anyone (Todd?) could take a look at the parser and figure
out how to fix it?

What other options do I have for an HTML parser?
Looking at Pharo's speed, I wonder if there is any way to optimize it. Is
a JIT or some other speed optimization planned for Pharo/Squeak?

Thank you,
Andrei

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: HTML parser (again)

laurent laffont


On Wed, Aug 18, 2010 at 7:50 AM, Andrei Stebakov <[hidden email]> wrote:
> What other options do I have for an HTML parser?
> Looking at Pharo's speed, I wonder if there is any way to optimize it. Is
> a JIT or some other speed optimization planned for Pharo/Squeak?

What do you need to do?

There's XMLSupport: http://www.squeaksource.com/XMLSupport.html
Scamper might have a standalone HTML parser: http://www.squeaksource.com/Scamper.html

The CogVM has a JIT.

Laurent
 


Re: [squeak-dev] Re: HTML parser (again)

Andrei Stebakov
Web page scraping. The XML parser chokes on bad HTML input.

On Wed, Aug 18, 2010 at 2:34 AM, laurent laffont
<[hidden email]> wrote:

> What do you need to do ?

Re: [squeak-dev] Re: HTML parser (again)

Nick


On 18 August 2010 16:01, Andrei Stebakov <[hidden email]> wrote:
Web page scraping. XML parser chokes on bad html input.


How about using Selenium:


Re: [squeak-dev] Re: HTML parser (again)

Andrei Stebakov
In reply to this post by laurent laffont
I tried to load Scamper's Network-HTML, but I got a Syntax Error during reloading:
HtmlTokenizer private-initialization initialize:
initialize: s
        text _ s withSqueakLineEndings.
        pos _ Nothing more expected ->1.
        textAreaLevel _ 0.


Re: [squeak-dev] Re: HTML parser (again)

Mariano Martinez Peck


On Wed, Aug 18, 2010 at 5:55 PM, Andrei Stebakov <[hidden email]> wrote:
> I tried to load Scamper's Network-HTML, I got a Syntax Error during reloading:
> HtmlTokenizer private-initialization initialize:
> initialize: s
>        text _ s withSqueakLineEndings.
>        pos _ Nothing more expected ->1.
>        textAreaLevel _ 0.


That code uses the underscore as the assignment operator, which is no longer allowed in Pharo 1.1 unless you explicitly enable a specific setting.

So either enable that setting or update the code (in another image).
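
For the "update the code" route, here is a minimal sketch of what the quoted method might look like with := instead of the underscore. It assumes that the "Nothing more expected ->" text in the quoted source is the compiler's error marker and that the original statement was simply "pos _ 1."; the real package source may differ.

```smalltalk
initialize: s
	"Sketch only: the same three statements as in the quoted method,
	 with := replacing the underscore assignment.
	 Assumption: the original second statement was 'pos _ 1.'"
	text := s withSqueakLineEndings.
	pos := 1.
	textAreaLevel := 0.
```

The same substitution would be needed in every method of the package that still uses the underscore, which is why enabling the setting is probably the quicker way to get the code loaded.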

cheers

mariano

 

Re: [squeak-dev] Re: HTML parser (again)

Andrei Stebakov
Where can I make this setting?

On Wed, Aug 18, 2010 at 1:14 PM, Mariano Martinez Peck
<[hidden email]> wrote:

> That code uses the underscore as the assignment operator, which is no longer
> allowed in Pharo 1.1 unless you explicitly enable a specific setting.

Re: [squeak-dev] Re: HTML parser (again)

Mariano Martinez Peck


On Wed, Aug 18, 2010 at 7:34 PM, Andrei Stebakov <[hidden email]> wrote:
Where can I make this setting?


In Pharo, go to System -> Settings. In the "search for" field, type "underscore" and hit Enter.

You will see a setting that says "allow underscore as assignment".
 


Re: [squeak-dev] Re: HTML parser (again)

Andrei Stebakov
In reply to this post by laurent laffont
Is there a one-click image for CogVM somewhere so I can download it?



Re: [squeak-dev] Re: HTML parser (again)

Andrei Stebakov
In reply to this post by laurent laffont
As for Scamper, when I try to evaluate this in Pharo 1.1:
tok := HtmlTokenizer on: '<html />'.

There is an error:

Error: My subclass should have overridden #contents
Proceed
Abandon
Debug
HtmlTokenizer(Object)>>error:
HtmlTokenizer(Object)>>subclassResponsibility
HtmlTokenizer(Stream)>>contents
HtmlTokenizer(Stream)>>printOn:
[] in HtmlTokenizer(Object)>>printStringLimitedTo:
String class(SequenceableCollection class)>>streamContents:limitedTo:
HtmlTokenizer(Object)>>printStringLimitedTo:
HtmlTokenizer(Object)>>printString
TextMorphForShoutEditor(ParagraphEditor)>>printIt
[] in TextMorphForShoutEditor(ParagraphEditor)>>printIt:
TextMorphForShoutEditor(ParagraphEditor)>>terminateAndInitializeAround:
TextMorphForShoutEditor(ParagraphEditor)>>printIt:
TextMorphForShoutEditor(ParagraphEditor)>>dispatchOnKeyEvent:with:
TextMorphForShoutEditor(TextMorphEditor)>>dispatchOnKeyEvent:with:
TextMorphForShoutEditor(ParagraphEditor)>>keystroke:
TextMorphForShoutEditor(TextMorphEditor)>>keystroke:
[] in [] in TextMorphForShout(TextMorph)>>keyStroke:
TextMorphForShout(TextMorph)>>handleInteraction:
TextMorphForShout(TextMorphForEditView)>>handleInteraction:
[] in TextMorphForShout(TextMorph)>>keyStroke:




Re: [squeak-dev] Re: HTML parser (again)

Andrei Stebakov
I am sorry, this error only happens when I "print it" instead of "do it".
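
Judging from the stack trace, "print it" sends #printString to the result, which reaches Stream>>printOn: and then Stream>>contents, and HtmlTokenizer does not seem to override #contents. A rough workspace sketch of the difference, assuming the same Pharo 1.1 image with Scamper's Network-HTML loaded:

```smalltalk
| tok |
"Equivalent of 'do it': the tokenizer is created and nothing asks it to print itself."
tok := HtmlTokenizer on: '<html />'.

"Equivalent of what 'print it' adds: asking the tokenizer for its printString.
 Per the trace, this goes through Stream>>printOn: to Stream>>contents,
 so it is expected to raise the same subclassResponsibility error."
tok printString.
```

So the tokenizer itself may well be usable; only its default printing appears to be broken in this image.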

On Wed, Aug 18, 2010 at 2:10 PM, Andrei Stebakov <[hidden email]> wrote:

> As for Scamper when I try to evaluate (in Pharo 1.1)
> tok := HtmlTokenizer on: '<html />'.
>
> There is an error:
>
> Error: My subclass should have overridden #contents

Re: HTML parser (again)

Sean P. DeNigris
Administrator
In reply to this post by Andrei Stebakov
Andrei Stebakov wrote:
> I found another one, Todd Blanchard's HTML and CSS parser
> (http://www.squeaksource.com/@iMgHmTKVxU00wEdz/A0jkqk71) but I
> couldn't load it into Pharo 1.1 or Squeak 4.1.
> It complains about some syntax error and leaves the progress bar which
> I can't kill...
> I wonder if anyone (Todd?) can take a look at the parser and figure
> out how to fix it?

I fixed it - I swear I didn't mean to.  I had other things to do and I promised myself I was just going to take a look, but one thing led to another and...

I don't have write access to the repo, so you can get it here: http://www.squeaksource.com/SPDProjectUpdates (look for the HTML package)

Sean

p.s. I've never used it, so I don't know if it works, but it loads
Cheers,
Sean

Re: HTML parser (again)

Andrei Stebakov
Thank you, Sean, it loads now.
I only wish the library had some test cases or description since it's
not obvious how to use it.

On Wed, Aug 18, 2010 at 2:55 PM, Sean P. DeNigris <[hidden email]> wrote:

> I fixed it - I swear I didn't mean to.  I had other things to do and I
> promised myself I was just going to take a look, but one thing led to
> another and...
>
> I don't have write access to the repo, so you can get it here:
> http://www.squeaksource.com/SPDProjectUpdates (look for the HTML package)

Re: HTML parser (again)

laurent laffont
In reply to this post by Sean P. DeNigris
Marcus, Stéphane,

Is it possible to give the Scamper repository public write access? Or at least add Andrei and Sean...

Cheers,

Laurent 

Re: HTML parser (again)

Stéphane Ducasse
Sure.
I was not aware I was an admin. I saw that Marcus did it ;)

Stef

On Aug 18, 2010, at 10:14 PM, laurent laffont wrote:

> Marcus, Stéphane,
>
> is it possible to have the Scamper repository with public write access ? Or at least add Andrei and Sean...
Re: HTML parser (again)

laurent laffont
Thanks.

Sean, could you put your package there?

Laurent


On Wed, Aug 18, 2010 at 11:08 PM, Stéphane Ducasse <[hidden email]> wrote:
> sure
> I was not aware I was admin. I saw that marcus did it ;)
>
> Stef

Re: [squeak-dev] Re: HTML parser (again)

laurent laffont
In reply to this post by Andrei Stebakov
On Wed, Aug 18, 2010 at 7:48 PM, Andrei Stebakov <[hidden email]> wrote:
Is there a one-click image for CogVM somewhere so I can download it?

It's planned but for now it seems you have to build it yourself.

Laurent

 


Re: HTML parser (again)

Sean P. DeNigris
Administrator
In reply to this post by laurent laffont
laurent laffont wrote:
> Sean, could you put your package there?
The wonderful world of Squeak packages...  The package I fixed is Todd Blanchard's HTML & CSS Validating Parser at http://www.squeaksource.com/htmlcssparser/, not the Scamper HTML from http://www.squeaksource.com/HTML, although both packages are called "HTML."

However, this is a lovely opportunity to repeat my call for either (or maybe both):
* (my favorite) create an inbox for each project on SqS, just like for Squeak and Pharo trunk, so users can choose between the bleeding edge (which would include contributions like this one) or the last officially blessed one; but they would all be in the same place and obvious to find.
* or, send an email to all SqS emails saying that if they don't affirm responsibility for their project within X amount of time, the repo will be released to the community i.e. made w/r.

I also seem to remember a suggestion at one point to have a list of people that were approved to commit to any repo on SqS.

The point is, make it easy to contribute and people will.  It is a downer to go through the work of fixing packages, only to put them in my own repo where they may never be found by users, because the repo is read-only and I can't get in touch with the admins.

<rant>
Also, adding oneself to each repo is RUBBISH!!!!!  Even though I usually take the time, I shudder at the thought of all the community fixes that were kept personally or thrown away because it was a hassle to share them.  I'm sure many people, like me, just fix things that are broken.  This is the whole beauty of a live system that's turtles all the way down - my system's menus are broken, great, I just spend 20 minutes fixing them for every user on the planet vs. the typical X months (if ever) for an OS vendor to get around to a fix.
</rant>

Sean
Cheers,
Sean

Re: [squeak-dev] Re: HTML parser (again)

johnmci
In reply to this post by Andrei Stebakov
I will try to push a CogVM for the Mac this weekend; Eliot and I are planning some time then to get this out the door.

On 2010-08-18, at 2:05 PM, stephane ducasse wrote:

> No, CogVM is not ready for us.

--
===========================================================================
John M. McIntosh <[hidden email]>   Twitter:  squeaker68882
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================






Re: HTML parser (again)

laurent laffont
In reply to this post by Sean P. DeNigris
Yes Sean, SqueakSource is actually not very "share friendly".  I wonder whether one solution would be a single repository for everything, since Monticello seems to handle branches itself.  It seems INRIA will start working on this around September or October.

An automatic inbox is a solution too, but it doesn't mean the packages will be integrated into the mainstream version.

I like publicly writable repositories.

I also wonder why SS is free for private repositories; those should be paid for (so the fees can pay someone to manage and evolve SqueakSource).

Laurent

