Tools for easy subtext extraction from text

Re: Tools for easy subtext extraction from text

Guillermo Polito


On Sun, Jan 22, 2017 at 1:14 PM, Denis Kudriashov <[hidden email]> wrote:

2017-01-22 11:54 GMT+01:00 Guillermo Polito <[hidden email]>:
Stef, we need to think about it carefully. Streams are used in the kernel for many tasks. Replacing them with a big framework would be a huge drawback for bootstrapping purposes.

I would not say that Xtreams is a bigger library than the current streams in the system. I measured it a bit:

"5 packages: core parts + file streams + socket streams"
ps := RPackageOrganizer default packages select: [ :each | each name beginsWith: 'Xtreams-' ]. 
ps sum: [ :each | each definedClasses size ] "45".
ps sum: [ :each | (each definedClasses sum: [ :c | c methods size ])
+ each extensionMethods size]  "585".

Just curious, how about extension methods also?
 

And current streams:

Stream package definedClasses size "13".
(Stream package definedClasses sum: [ :c | c methods size ])
+ Stream package extensionMethods size "304".

these are the kernel ones
 
({AbstractBinaryFileStream. FileStream} flatCollect: #withAllSubclasses) size."6"
({AbstractBinaryFileStream. FileStream} flatCollect: #withAllSubclasses) sum: [ :c | c methods size ]."226"
 

These are file streams, I'm not counting them in as kernel streams. And maybe I'm wrong but Xtreams requires them, doesn't it?
 
SocketStream methods size "81"

Sockets are not in the kernel, they are loaded afterwards 
 
So in summary, the current streams are ~600 methods, which is similar to Xtreams.

So this is not quite true. In any case, I'm not simply against it; I'd like us to make a serious analysis of the impact before we integrate something like this. How many things would change? Is it modular? Can we maintain it?
 
But maybe the current streams are a much bigger code base. I did not take into account the compression part, encodings and others.

Again, not everything is in the kernel.
 

Anyway, replacing the current streams completely is a huge task. I doubt that we can go that way.



Re: Tools for easy subtext extraction from text

stepharong
In reply to this post by Guillermo Polito


Stef, we need to think about it carefully. Streams are used in the kernel for many tasks. Replacing them with a big framework would be a huge drawback for bootstrapping purposes.

Alternatively, we could think of refactoring the kernel to not use streams, but so far this is not possible... the kernel uses the compiler and the code importer that depend on parsing streams...

We should plan :)

sure as always :)




On Sat, Jan 21, 2017 at 10:07 PM, stepharong <[hidden email]> wrote:
Yes, I should finish converting everything.
I hope that in Pharo 70 we will be able to add an Xtreams-like library and remove the old streams,
but this is a large task.

stef


On Sat, 21 Jan 2017 21:06:34 +0100, [hidden email] <[hidden email]> wrote:


On Sat, Jan 21, 2017 at 3:08 PM, Peter Uhnak <[hidden email]> wrote:
On Sat, Jan 21, 2017 at 02:01:59PM +0100, Denis Kudriashov wrote:
> Hi.
>
> 2017-01-20 16:15 GMT+01:00 Peter Uhnak <[hidden email]>:
>
> > In Ruby it is dead simple:
> > str[/\[(.*)\]/,1].hex # "=> 37"
> >
>
> I always wonder when people think it is dead simple.
> I use streams for such cases. It is logical, readable and dead simple

I've never mentioned readability, because the code is throwaway.
I guess if you are not used to regexes it could look odd, but as a Linux user I find it quite ordinary; if I had to extract the information I would just pipe it through sed or grep.

I wouldn't use such thing in code that I want to keep, but I explicitly mentioned that.


> approach without crappy syntax. And with the Xtreams library it becomes much
> easier and more fun
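
The stream-based code referred to here is not shown in this part of the thread. As a rough illustration of the idea only (read up to the opening bracket, then up to the closing one), here is a minimal Python sketch using `io.StringIO`. The helper names `read_up_to` and `extract_bracketed_hex` and the sample input are made up for the example; this is not the Pharo or Xtreams API.

```python
import io


def read_up_to(stream, delimiter):
    """Read characters until `delimiter` or end of stream.

    The delimiter is consumed but not included in the result.
    (Hypothetical helper, loosely modelled on a stream "up to" operation.)
    """
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:
            return "".join(chars)
        chars.append(ch)


def extract_bracketed_hex(text):
    """Return the hex number between the first '[' and the next ']' as an int."""
    stream = io.StringIO(text)
    read_up_to(stream, "[")           # skip everything before the opening bracket
    digits = read_up_to(stream, "]")  # collect what sits between the brackets
    return int(digits, 16)


# Hypothetical input; the original string is not shown in the thread.
value = extract_bracketed_hex("address [25] found")
```

Unlike the greedy regex `\[(.*)\]`, this stops at the first closing bracket, so the two approaches can differ on input that contains several bracketed groups.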

Are there any docs for Xtreams? I found several repositories, but none explain what Xtreams even is.

---

>
>> In Ruby it is dead simple:
>>
>
> and dead unreadable
>
> Pharo way is both dead simple and dead readable

Ditto as above. Readability was never the question. And if it was, then you just doubled the regex complexity, and made the code more confusing by turning the problem upside down, due to the limited API.

Complaining about the compact syntax makes as much sense as complaining that `1+2` is too cryptic and should be written as `1 digitAdd: 2` (which you can do btw); the point of compactness is that when you know what you are doing you can save some time.

You can always write .match() instead of []; e.g. in Python:

int(re.split(r'\[(.*)\]', str)[1], 16)
int(re.search(r'\[(.*)\]', str).group(1), 16)
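
For anyone who wants to run the one-liners above, a self-contained sketch might look like the following. The function name `extract_hex` and the sample input are assumptions; the original string is not shown in the thread, only that the Ruby version yields 37, which is hex 25.

```python
import re


def extract_hex(text):
    """Parse the text between the first '[' and the last ']' as a hex number."""
    match = re.search(r"\[(.*)\]", text)  # same (greedy) pattern as in the post
    if match is None:
        raise ValueError("no bracketed value found")
    return int(match.group(1), 16)


sample = "value [25] here"  # hypothetical input
result = extract_hex(sample)
print(result)
```

The explicit `None` check is the only real addition: `re.search` returns `None` when nothing matches, so the one-liner would fail with an `AttributeError` on input without brackets.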

But my point was not addressing this particular problem, but the general problem --- I often find it much easier to preprocess data with standard Linux tools and then feed it to Pharo than to try to do the same in Pharo itself.

Peter






--
Using Opera's mail client: http://www.opera.com/mail/





Re: Tools for easy subtext extraction from text

stepharong
In reply to this post by Peter Uhnak
BTW we were discussing deprecating Regexp and using a RePlugin to bind to a default mainstream regular-expression lib,
so that people can use in Pharo the same expressions that they are used to.

Any ideas?
If this is important for you, you can help make it happen.


Re: Tools for easy subtext extraction from text

stepharong
In reply to this post by Guillermo Polito
Guille

I think that we should also have a look at Xtreams and see whether there is a core inside.

What I think is that the current stream implementation is quite terrible.

Stef

Re: Xtreams docs (previously: Tools for easy subtext extraction from text)

stepharong
In reply to this post by Peter Uhnak
Hi Peter


> Is there a build for the PharoLimbo, or do I have to compile it myself?

No, and it is incomplete.
I should convert the web site to a chapter. But we will see;
from time to time I get tired of doing this kind of boring job.

I will send you the pdf I have


