Xtreams skipThroughAll: ?

Xtreams skipThroughAll: ?

Steven Kelly

Hi,

 

I’m enjoying playing around with Xtreams. In particular, the substreams definitely make parsing text much more pleasantly OO. One thing I haven’t figured out yet is how to do things like skipThroughAll:. E.g. if I want to pick the word “Indoor” out of this HTML:

 

<tr><td><b>Indoor</b>&nbsp;</td>

 

There are good facilities for making a substream that cuts off the end after “Indoor” (ending: '</b'). But is there anything for starting the substream after an opening marker (and making sure that marker exists)? I suppose a nice equivalent would be “starting: '<b>'”, and maybe also starting:ending:, since so often the separator we’re trying to substream on has both opening and closing forms – e.g. XML, brackets, quotes.

 

Cheers,

Steve


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Re: Xtreams skipThroughAll: ?

Michael Lucas-Smith-2
On 8/7/10 3:17 PM, Steven Kelly wrote:

[...]

It sounds a lot like you're getting into parsing here. You might consider using Xtreams-PEG?

Besides that, you can explore:
isIndoor := stream exploring: [(stream read: 6) = 'Indoor'].

Of course, in this case the stream must be positionable, so if you're on a socket, make sure you first wrap it as a positionable stream:
stream := socketstream positioning.

Cheers,
Michael

Re: Xtreams skipThroughAll: ?

Steven Kelly
Thanks for replying, Michael! Unfortunately the content probably isn't regular enough for parsing. I tend to try a little light stream hacking for simple problems like these.
 
In most cases the bit of text I want to read varies - I can't search explicitly for "Indoor". But I do know what precedes and follows the bit of text I want.
 
Maybe this is just something that old streams do better than Xtreams? For now I'm just combining old and new:
 
substream rest readStream skipThroughAll: '<tr><td><b>'; upToAll: '</b>'
 
That's the beauty of substreams for me. Before, if the bold tag was never closed, this would read all the way to the end of the HTML page, losing all the information from here onwards. Now the substream created with "ending: '</tr>'" limits the damage to just this row.
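For anyone following along, here is a rough workspace sketch of how the pieces above might fit together. It assumes Xtreams is loaded (for reading, ending: and rest) and that skipThroughAll: and upToAll: are available on the classic ReadStream, as in the one-liner above. The sample html string and the variable names are made up for illustration.

```smalltalk
"Sketch only: the html sample and variable names are illustrative assumptions."
| html row text |
html := '<tr><td><b>Indoor</b>&nbsp;</td></tr><tr><td><b>Outdoor</b></td></tr>'.

"Xtreams substream that stops at the end of the first table row."
row := html reading ending: '</tr>'.

"rest collects what the substream yields; the classic stream then skips
 past the opening markup and reads up to the closing bold tag."
text := (row rest readStream)
    skipThroughAll: '<tr><td><b>';
    upToAll: '</b>'.
```

With this sample, text is intended to end up holding the word Indoor. If the bold tag were never closed, the classic stream would only run to the end of the row's contents rather than the end of the page.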
 
Cheers,
Steve
 


From: [hidden email] on behalf of Michael Lucas-Smith
Sent: Sun 08/08/2010 22:46
To: [hidden email]
Subject: Re: [vwnc] Xtreams skipThroughAll: ?

[...]