Html Parser

Html Parser

flebber
I was wondering if there is an HTML parser for Squeak. I want to
capture data from a website, convert it to XML, and export it
into an Excel program I have.

Is this possible in Squeak?
_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners

Re: Html Parser

Paul C Johnson
I have no idea.

On Sat, Oct 9, 2010 at 5:07 AM, Sayth Renshaw <[hidden email]> wrote:
> I was wondering if there is an HTML parser for Squeak. I want to
> capture data from a website, convert it to XML, and export it
> into an Excel program I have.
>
> Is this possible in Squeak?

--
Later
Paul


Re: Html Parser

K K Subbu
In reply to this post by flebber
On Saturday 09 Oct 2010 5:37:56 pm Sayth Renshaw wrote:
> I was wondering if there is an HTML parser for Squeak. I want to
> capture data from a website, convert it to XML, and export it
> into an Excel program I have.
>
> Is this possible in Squeak?
Yes. Browse the HtmlParser class.

A good way to dig out such information is the Message Finder
(world-menu->windows->find message names). Or select the string "html"
or "parser" and press CTRL+SHIFT+W.

HTH .. Subbu

Re: Html Parser

Levente Uzonyi-2
In reply to this post by flebber
On Sat, 9 Oct 2010, Sayth Renshaw wrote:

> I was wondering if there is an HTML parser for Squeak. I want to
> capture data from a website, convert it to XML, and export it
> into an Excel program I have.
>
> Is this possible in Squeak?

Yes it is. We are using Soup (http://www.squeaksource.com/Soup.html) to
parse HTML files. It's pretty good, though not perfect. There are also 2-3
other HTML parsers for Squeak. We're using this one because it's designed
to parse non-standards-compliant HTML files (which are very common).
The tools for building XML are in the Squeak image; look for XMLNode and
its subclasses (XMLDocument, XMLNodeWithElements, XMLString, etc.).


Levente
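
For readers who want a starting point, here is a minimal sketch of what parsing with Soup might look like once the package is loaded. The selectors used (Soup fromString:, findAllTags:, attributeAt:) are recalled from the Soup package and are assumptions as far as this thread goes. The HTML string and the example.com links are made up for illustration. Browse the Soup class in your image to confirm the names before relying on them.

```smalltalk
"Sketch only: assumes the Soup package from squeaksource is loaded.
 The selector names are assumptions; check them in the Soup class."
| html soup |
html := '<html><body>
<a href="http://example.com/one">One</a>
<p>An unclosed paragraph
<a href="http://example.com/two">Two</a>
</body></html>'.
soup := Soup fromString: html.
"Print the target of every link to the Transcript."
(soup findAllTags: 'a') do: [:link |
    Transcript show: (link attributeAt: 'href'); cr]
```

Evaluate it in a workspace with the Transcript open. The unclosed <p> is there on purpose, since tolerating such markup is the reason given above for choosing Soup.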

Re: Html Parser

flebber
In reply to this post by flebber
I was trying to follow the guide at http://softwareengineering.vazexqi.com/2007/05/26/installing-packages-in-squeak to install the Beautiful Soup package. However, I cannot locate the SqueakMap Package Loader.

I know the menus and interface have been updated since this tutorial (and they look good), but I am looking in the "old desktop menu" and still can't locate it.

Thanks

Sayth.

Re: Html Parser

Levente Uzonyi-2
On Sun, 10 Oct 2010, Sayth Renshaw wrote:

> I was trying to follow the guide at
> http://softwareengineering.vazexqi.com/2007/05/26/installing-packages-in-squeak
> to install the Beautiful Soup package. However, I cannot locate the
> SqueakMap Package Loader.
>
> I know the menus and interface have been updated since this tutorial (and
> they look good), but I am looking in the "old desktop menu" and still
> can't locate it.

This package (and all packages on squeaksource.com) can be installed with
the Monticello Browser or with Installer. If you're using Squeak 4.1 or
4.2 alpha, you can open the Monticello Browser from the Tools menu of the
Docking Bar (at the top of the screen). If you're using an earlier
version of Squeak, open the desktop menu, select Open..., then select
Monticello Browser (by the way, the SqueakMap Package Loader can also be
found in this menu).

Once the Monticello Browser is open, add the Soup repository and
load the latest version. If you have never used the Monticello Browser,
you'll find this link useful (the images are a bit outdated, but this part
seems to be OK): http://wiki.squeak.org/squeak/43#Opening%20a%20Repository
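
When you add an HTTP repository, the Monticello Browser asks for a repository description. For Soup it would look roughly like the following. The location is an assumption based on the usual squeaksource.com layout (the project page URL without the .html suffix), and the empty user and password are assumed to be enough for read-only access.

```smalltalk
"Assumed repository description for Soup; verify the location
 against the project page before loading."
MCHttpRepository
    location: 'http://www.squeaksource.com/Soup'
    user: ''
    password: ''
```

After accepting the description, open the repository and load the newest Soup version from the list.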

If you want to use Installer to install this package, then evaluate the
following in a workspace:

Installer squeaksource
  project: 'Soup';
  install: 'Soup'


Levente

Re: Html Parser

Bert Freudenberg
In reply to this post by Levente Uzonyi-2
On 09.10.2010, at 12:27, Levente Uzonyi wrote:

> On Sat, 9 Oct 2010, Sayth Renshaw wrote:
>
>> I was wondering if there is an HTML parser for Squeak. I want to
>> capture data from a website, convert it to XML, and export it
>> into an Excel program I have.
>>
>> Is this possible in Squeak?
>
> Yes it is. We are using Soup (http://www.squeaksource.com/Soup.html) to parse HTML files. It's pretty good, though not perfect. There are also 2-3 other HTML parsers for Squeak. We're using this one because it's designed to parse non-standards-compliant HTML files (which are very common). The tools for building XML are in the Squeak image; look for XMLNode and its subclasses (XMLDocument, XMLNodeWithElements, XMLString, etc.).
>
>
> Levente

Oh great, I had no idea there was a Beautiful Soup port for Squeak. It's excellent for scraping web pages.

- Bert -

