Tools for easy subtext extraction from text

Tools for easy subtext extraction from text

Peter Uhnak
Hi,

What tools are available for easier text extraction?

The input is unstructured text, but I want to extract a portion of it.

I am not looking for an engineered approach (writing a parser or something), but something that can be done quickly by hand (i.e. interactively).

For example, I have the string
str = ' Temperature 0 37C (98F) [0x25] (TMPIN0)'
Now I want to extract '0x25' from it and convert it into an integer.

In Ruby it is dead simple:
str[/\[(.*)\]/,1].hex # "=> 37" , or .to_i(16)

In Pharo I have to break my fingers first:
rx := '.*\[0x(.*)\].*' asRegex.
rx matches: str.
Integer readFrom: (rx subexpression: 2) base: 16 "=>37".

* I have to know that, to get a subexpression, I first have to call a match method, which sets the matcher's internal state
* I have to store the matcher in a variable to access the subexpression
* I need to explicitly use the global Integer as a conversion utility to convert hex to decimal
* I have to remove the 0x manually, even though it is a very common way of writing hex numbers
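For comparison, here is a minimal Python sketch of the kind of one-call helper the bullets above ask for. It searches, picks a capture group and converts the result, without keeping a matcher object around. The name `grab` and its parameters are made up for illustration and are not part of any library. Python's `int(..., 16)` happens to accept an optional `0x` prefix, so nothing has to be stripped by hand.

```python
import re
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def grab(text: str, pattern: str, group: int = 1,
         convert: Callable[[str], T] = str) -> Optional[T]:
    """Return capture `group` of the first match of `pattern` in `text`,
    run through `convert`, or None when nothing matches.

    Hypothetical helper: one call, no matcher to keep around,
    roughly like Ruby's str[/re/, 1].
    """
    match = re.search(pattern, text)
    return convert(match.group(group)) if match else None


def hex_int(digits: str) -> int:
    # int(..., 16) already tolerates an optional "0x"/"0X" prefix
    return int(digits, 16)


s = " Temperature 0 37C (98F) [0x25] (TMPIN0)"
value = grab(s, r"\[(.*)\]", convert=hex_int)
print(value)  # 37
```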


So the question is:
do we have a better way to do these things? And, as I've mentioned, the use case is interactive coding, where you often throw the code away when you are done; so it should be dead easy to write and use.

Thanks,
Peter



Re: Tools for easy subtext extraction from text

Henrik-Nergaard
You can use #match: and #upTo: on a ReadStream for easy extraction:
----------------------------------------
| text digits |
text := ' Temperature 0 37C (98F) [0x25] (TMPIN0)'.
"skip past '[0x', then read everything up to (not including) the closing bracket"
digits := text readStream match: '[0x'; upTo: $].
('16r', digits) asNumber "37"
----------------------------------------
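The stream approach can be sketched outside Pharo too. The Python below is a rough analogue built on a tiny hand-written `ReadStream` class. That class is hypothetical, not a standard library type. Its `match` and `up_to` methods loosely mimic Pharo's `ReadStream>>match:` and `upTo:`: first skip past a marker, then read up to a delimiter.

```python
class ReadStream:
    """Minimal read stream over a string (illustrative only)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def match(self, marker: str) -> bool:
        # Move just past the next occurrence of `marker`; if it is absent,
        # go to the end and answer False (roughly like Pharo's match:).
        idx = self.text.find(marker, self.pos)
        if idx < 0:
            self.pos = len(self.text)
            return False
        self.pos = idx + len(marker)
        return True

    def up_to(self, delimiter: str) -> str:
        # Answer everything before the next `delimiter` and skip past it;
        # without a delimiter, answer the rest of the text (like upTo:).
        idx = self.text.find(delimiter, self.pos)
        end = len(self.text) if idx < 0 else idx
        chunk = self.text[self.pos:end]
        self.pos = len(self.text) if idx < 0 else idx + len(delimiter)
        return chunk


stream = ReadStream(" Temperature 0 37C (98F) [0x25] (TMPIN0)")
stream.match("[0x")
digits = stream.up_to("]")
print(digits)           # 25
print(int(digits, 16))  # 37, the counterpart of ('16r', digits) asNumber
```

In plain Python the same two steps collapse to `text.partition('[0x')[2].partition(']')[0]`.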

Best regards,
Henrik


-----Original Message-----
From: Pharo-users [mailto:[hidden email]] On behalf of Peter Uhnak
Sent: 20 January 2017 16:16
To: [hidden email]
Subject: [Pharo-users] Tools for easy subtext extraction from text


Re: Tools for easy subtext extraction from text

kilon.alios
In reply to this post by Peter Uhnak

> In Ruby it is dead simple:
> str[/\[(.*)\]/,1].hex # "=> 37" , or .to_i(16)

and dead unreadable

Pharo way is both dead simple and dead readable

str:= '       Temperature 0           37C (98F) [0x25] (TMPIN0)'.

('16r',(text copyWithRegex: '(.*\[0x)|(\].*)'  matchesReplacedWith: '')) asNumber . 
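Here is the same "delete what you don't want" idea as a Python sketch, with `re.sub` standing in for `copyWithRegex:matchesReplacedWith:`. It assumes the text contains exactly one `[0x...]` group. If there are several, the greedy `.*\[0x` skips ahead to the last one.

```python
import re

text = " Temperature 0 37C (98F) [0x25] (TMPIN0)"

# Delete everything up to and including "[0x", as well as everything from "]"
# onwards; only the bare hex digits survive.
digits = re.sub(r"(.*\[0x)|(\].*)", "", text)
print(digits)           # 25

# Pharo prepends '16r' to build a radix literal; Python takes the base as an argument.
print(int(digits, 16))  # 37
```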

Re: Tools for easy subtext extraction from text

Henrik-Nergaard
In reply to this post by Peter Uhnak
>* I have to manually remove the 0x, even though it is a very common way of expressing hex numbers

Adding a few lines to NumberParser>>#nextNumber enables it to parse 0x...
-----------------------------------
((sourceStream peekFor: $r)) ifTrue: ["<base>r<integer>"
        ...
]
ifFalse: [
        (sourceStream peekFor: $x) ifTrue: [ "0x<integer>"
                (integerPart isZero and: [ numberOfTrailingZeroInIntegerPart = 1]) ifFalse: [
                        sourceStream skip: -1.
                        ^ self expected: 'one leading 0 before x'
                ].
                ^ self nextUnsignedIntegerBase: 16
        ]
].
----------------------------------
0x10. "16"
0xFF + 16rFF = (2 * 0xff). "true"
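Henrik's patch teaches the number parser a second hex spelling. As a standalone illustration, and not Pharo's `NumberParser`, here is a minimal Python function with the hypothetical name `parse_number`. It accepts decimal, Smalltalk radix notation (`16rFF`) and C-style `0x` hex. Like the patch, it only recognizes a lowercase `x` and insists on exactly one leading zero before it.

```python
def parse_number(token: str) -> int:
    """Parse '42', '16rFF' (Smalltalk radix) or '0xFF' (C-style hex)."""
    token = token.strip()

    base, r, digits = token.partition("r")
    if r and base.isdigit():
        # Smalltalk radix notation <base>r<digits>, e.g. 16rFF or 2r1010
        return int(digits, int(base))

    head, x, hex_digits = token.partition("x")
    if x:
        # C-style hex; as in the patch, exactly one leading 0 before the x
        if head != "0":
            raise ValueError(f"expected one leading 0 before x: {token!r}")
        return int(hex_digits, 16)

    return int(token, 10)


print(parse_number("0x10"))  # 16
print(parse_number("0xFF") + parse_number("16rFF") == 2 * parse_number("0xff"))  # True
```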

Best regards,
Henrik

-----Original Message-----
From: Pharo-users [mailto:[hidden email]] On behalf of Peter Uhnak
Sent: 20 January 2017 16:16
To: [hidden email]
Subject: [Pharo-users] Tools for easy subtext extraction from text


Re: Tools for easy subtext extraction from text

Denis Kudriashov
In reply to this post by Peter Uhnak
Hi.

2017-01-20 16:15 GMT+01:00 Peter Uhnak <[hidden email]>:
> In Ruby it is dead simple:
> str[/\[(.*)\]/,1].hex # "=> 37"

I always wonder why people think that is dead simple.
I use streams for such cases. It is a logical, readable and dead simple approach without crappy syntax. And with the Xtreams library it becomes much easier and more fun.

Re: Tools for easy subtext extraction from text

philippeback
In reply to this post by kilon.alios
Nice one. One fix: the assignment should be text := rather than str :=, since the expression uses text.

On Sat, Jan 21, 2017 at 10:07 AM, Dimitris Chloupis <[hidden email]> wrote:


Re: Tools for easy subtext extraction from text

Peter Uhnak
In reply to this post by Denis Kudriashov
On Sat, Jan 21, 2017 at 02:01:59PM +0100, Denis Kudriashov wrote:

> Hi.
>
> 2017-01-20 16:15 GMT+01:00 Peter Uhnak <[hidden email]>:
>
> > In Ruby it is dead simple:
> > str[/\[(.*)\]/,1].hex # "=> 37"
> >
>
> I always wondering when people think it is dead simple.
> I use streams for such cases. It is logical, readable and dead simple

I've never mentioned readability, because the code is throwaway.
I guess if you are not using regexes it could look odd, but as a Linux user I find it very natural; if I had to extract the information I would just pipe it through sed or grep.

I wouldn't use such a thing in code that I want to keep, and I said so explicitly.


> approach without crappy syntax. And with Xtreams library it become much
> more easy and fun

Are there any docs for Xtreams? I found several repositories, but none explain what Xtreams even is.

---

>
>> In Ruby it is dead simple:
>>
>
> and dead unreadable
>
> Pharo way is both dead simple and dead readable

Ditto as above. Readability was never the question. And if it were, then you just doubled the regex complexity and made the code more confusing by turning the problem upside down, due to the limited API.

Complaining about the compact syntax makes as much sense as complaining that `1+2` is too cryptic and should be written as `1 digitAdd: 2` (which you can do btw); the point of compactness is that when you know what you are doing you can save some time.

You can always write .match() instead of []; e.g. in Python:

int(re.split(r'\[(.*)\]', str)[1], 16)
int(re.search(r'\[(.*)\]', str).group(1), 16)

But my point was not about this particular problem but about the general one: I often find it much easier to preprocess data with standard Linux tools and then feed it to Pharo than to try to do the same in Pharo itself.

Peter


Re: Tools for easy subtext extraction from text

philippeback
I collected some content about that and wanted to do something with it but, yeah, it ended up on the back burner.

This is currently just an extract of the package comment or something like that.
https://github.com/philippeback/xstreamsdoc


Phil

On Sat, Jan 21, 2017 at 3:08 PM, Peter Uhnak <[hidden email]> wrote:

Re: Tools for easy subtext extraction from text

Tudor Girba-2
In reply to this post by Peter Uhnak
I use PetitParser for this purpose.

It provides:
- an incremental way to build a parser especially when you use its bounded sea abilities,
- a way to debug problems, and
- a reasonably readable outcome that can be extended to a more complicated parser.

For example, in your case, you can use:

string := 'Temperature 0 37C (98F) [0x25] (TMPIN0)'.
(('[' asPParser, ']' asPParser negate star flatten, ']' asPParser ==> #second) sea ==> #second)
        parse: string
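Here is a sketch that makes the "sea" idea concrete without PetitParser: a few tiny, hypothetical parser combinators in Python. `sea` skips "water" until an island parser succeeds and answers `[before, island, after]`, loosely modelled on (not copied from) PetitParser's sea. `action` plays the role of `==>`. All names are made up for illustration.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    def parse(text: str, pos: int):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def chars_except(stop: str) -> Parser:
    """Zero or more characters other than `stop`, flattened into one string
    (compare ']' asPParser negate star flatten)."""
    def parse(text: str, pos: int):
        end = pos
        while end < len(text) and text[end] != stop:
            end += 1
        return (text[pos:end], end)
    return parse


def seq(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return (values, pos)
    return parse


def action(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform a successful result (compare PetitParser's ==>)."""
    def parse(text: str, pos: int):
        result = p(text, pos)
        return None if result is None else (f(result[0]), result[1])
    return parse


def sea(island: Parser) -> Parser:
    """Skip water until `island` matches; answer [before, island, after]."""
    def parse(text: str, pos: int):
        for start in range(pos, len(text) + 1):
            result = island(text, start)
            if result is not None:
                value, end = result
                return ([text[pos:start], value, text[end:]], len(text))
        return None
    return parse


# '[' , (not ']')* flatten , ']'  ==> second, then wrapped in a sea ==> second
bracketed = action(seq(lit("["), chars_except("]"), lit("]")), lambda v: v[1])
extract = action(sea(bracketed), lambda v: v[1])

value, _ = extract("Temperature 0 37C (98F) [0x25] (TMPIN0)", 0)
print(value)           # 0x25
print(int(value, 16))  # 37
```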

Cheers,
Doru


> On Jan 21, 2017, at 3:08 PM, Peter Uhnak <[hidden email]> wrote:
--
www.tudorgirba.com
www.feenk.com

"Value is always contextual."






Re: Tools for easy subtext extraction from text

Denis Kudriashov
In reply to this post by philippeback

2017-01-21 17:09 GMT+01:00 [hidden email] <[hidden email]>:
> I collected some content about that and wanted to do something about it but, yeah, it got on the backburner.
>
> This is currently just the extract of the package comment or something like that.


And look here:
https://github.com/mkobetic/Xtreams
https://code.google.com/archive/p/xtreams/wikis


Re: Tools for easy subtext extraction from text

Peter Uhnak
On Sat, Jan 21, 2017 at 07:09:19PM +0100, Denis Kudriashov wrote:

> 2017-01-21 17:09 GMT+01:00 [hidden email] <[hidden email]>:
>
> > I collected some content about that and wanted to do something about it
> > but, yeah, it got on the backburner.
> >
> > This is currently just the extract of the package comment or something
> > like that.
> >
> > https://github.com/philippeback/xstreamsdoc
> >
>
> And look here:
>
> https://github.com/mkobetic/Xtreams

Ah, the wiki has several pages. I had only seen the welcome page and the uninformative README.

> https://code.google.com/archive/p/xtreams/wikis


Thanks!

Peter


Re: Tools for easy subtext extraction from text

philippeback
Actually the things I snatched are from there.

Phil

On Sat, Jan 21, 2017 at 7:19 PM, Peter Uhnak <[hidden email]> wrote:

Re: Tools for easy subtext extraction from text

philippeback
In reply to this post by Peter Uhnak

There is also this
https://github.com/SquareBracketAssociates/PharoLimbo/tree/master/Xtreams

On Sat, Jan 21, 2017 at 3:08 PM, Peter Uhnak <[hidden email]> wrote:

Re: Tools for easy subtext extraction from text

stepharong
Yes, I should finish converting everything.
I hope that in Pharo 7.0 we will be able to add an Xtreams-like library and remove the old streams,
but this is a large task.

stef


On Sat, 21 Jan 2017 21:06:34 +0100, [hidden email] <[hidden email]> wrote:


--
Using Opera's mail client: http://www.opera.com/mail/

Xtreams docs (previously: Tools for easy subtext extraction from text)

Peter Uhnak
In reply to this post by philippeback
Ah, I now see my googling issue: apparently no one knows whether the name is Xtreams or Xstreams.

If I google 'Pharo Xtreams' (which is the correct name it seems) I get fuckall,
but if I google 'Pharo Xstreams' I get the link as second result, just because someone misspelled the filename. -_-

Is there a build for the PharoLimbo, or do I have to compile it myself?

Peter


On Sat, Jan 21, 2017 at 09:06:34PM +0100, [hidden email] wrote:

> There is also this
>
> https://github.com/SquareBracketAssociates/PharoLimbo/tree/master/Xtreams

Re: Tools for easy subtext extraction from text

kilon.alios
In reply to this post by Peter Uhnak


> Ditto as above. Readability was never a question. And if it was, then you just doubled the regex complexity, and made the code more confusing by turning the problem upside down, due to the limited API.
>
> Complaining about the compact syntax makes as much sense as complaining that `1+2` is too cryptic and should be written as `1 digitAdd: 2` (which you can do btw); the point of compactness is that when you know what you are doing you can save some time.

Typing speed is pretty much the 1000th reason for a slowdown; the 1st is the ability to understand code, at least for me.

Ruby is a more flexible and better designed language, but I would pick Python over it any day; Python, by the way, beats even Pharo in readability because of its culture of how to write APIs.

On the subject of your issue, what I offered is not a workaround; it is standard regex syntax: exclusion and combination of matches. Nothing special.

I am using regex to parse Python code in Pharo, and for me at least the Pharo regex API rocks.

Re: Tools for easy subtext extraction from text

philippeback
Readability: the more you read a language, the better you can read it.

And whoever says Python is super readable can have a look at how BSON is handled in the MongoDB driver: https://github.com/mongodb/mongo-python-driver/tree/master/bson

Now go check the same in MongoTalk, where you can actually understand what it does and how it works,
without having to resort to C code every once in a while.

Check some Perl code from a few years ago and come back with terseness and readability arguments. Yeah, sure.

Phil


On Sat, Jan 21, 2017 at 11:16 PM, Dimitris Chloupis <[hidden email]> wrote:



Re: Tools for easy subtext extraction from text

Peter Uhnak
> Check some Perl code from a few years ago and come back with terseness and
> readability arguments. Yeah, sure.

I used Perl years ago. ;) When you use it actively, its very compact syntax is very cool once it becomes habitual; but of course Perl is a write-only language.

But we are ranting on. As I've mentioned, readability is not a factor when you know what you are doing: you quickly write it, you execute it, and you throw the code away, or rewrite it (rewrite != modify).

Peter


Re: Tools for easy subtext extraction from text

Guillermo Polito
In reply to this post by stepharong
Stef, we need to think about it carefully. Streams are used in the kernel for many tasks. Replacing them with a big framework would be a huge drawback for bootstrapping.

Alternatively, we could think of refactoring the kernel to not use streams, but so far this is not possible: the kernel uses the compiler and the code importer, which depend on streams for parsing.

We should plan :)

On Sat, Jan 21, 2017 at 10:07 PM, stepharong <[hidden email]> wrote:

Re: Tools for easy subtext extraction from text

Denis Kudriashov

2017-01-22 11:54 GMT+01:00 Guillermo Polito <[hidden email]>:
> Stef, we need to think about it carefully. Streams are used in the kernel for many tasks. Replacing them by a big framework will be a huge drawback for bootstrapping purposes.

I would not say that Xtreams is a bigger library than the current streams in the system. I measured it a bit:

"5 packages: core parts + file streams + socket streams"
ps := RPackageOrganizer default packages select: [ :each | each name beginsWith: 'Xtreams-' ]. 
ps sum: [ :each | each definedClasses size ] "45".
ps sum: [ :each | (each definedClasses sum: [ :c | c methods size ])
+ each extensionMethods size]  "585".

And current streams:

Stream package definedClasses size "13".
(Stream package definedClasses sum: [ :c | c methods size ])
+ Stream package extensionMethods size "304".
({AbstractBinaryFileStream. FileStream} flatCollect: #withAllSubclasses) size."6"
({AbstractBinaryFileStream. FileStream} flatCollect: #withAllSubclasses) sum: [ :c | c methods size ]."226"
 
SocketStream methods size "81"

So, in summary, the current streams come to ~600 methods (304 + 226 + 81 = 611), which is similar to Xtreams (585).
But the current streams may well be a much bigger code base: I did not take into account the compression part, encodings and others.

Anyway, completely replacing the current streams is a huge task. I doubt that we can go that way.
