Hi,
what are the tools available for easier text extraction?

The input is unstructured text, but I want to extract a portion of it. I am not looking for an engineered approach (writing a parser or something), but something that can be done quickly by hand (i.e. interactively).

For example, I have the string

    str = ' Temperature 0 37C (98F) [0x25] (TMPIN0)'

Now I want to extract '0x25' from it and convert it into an integer.

In Ruby it is dead simple:

    str[/\[(.*)\]/,1].hex # "=> 37" , or .to_i(16)

In Pharo I have to break my fingers first:

    rx := '.*\[0x(.*)\].*' asRegex.
    rx matches: str.
    Integer readFrom: (rx subexpression: 2) base: 16 "=>37".

* I have to know that to get a subexpression I have to match first to manipulate the internal state
* I have to store the matcher to access the subexpression
* I need to explicitly use the global variable Integer as a conversion utility to convert hex to dec
* I have to manually remove the 0x, even though it is a very common way of expressing hex numbers

So the question is: do we have a better way to do these things?

And as I've mentioned, the use case is interactive coding where you often throw the code away when you are done; so it should be dead easy to write and use.

Thanks,
Peter
You can use #match: and #upTo: on a ReadStream for easy extraction:

----------------------------------------
| text digits |
text := ' Temperature 0 37C (98F) [0x25] (TMPIN0)'.
digits := text readStream match: '[0x'; upTo: $].
('16r', digits) asNumber "37"
----------------------------------------

Best regards,
Henrik

-----Original message-----
From: Pharo-users [mailto:[hidden email]] On behalf of Peter Uhnak
Sent: 20 January 2017 16:16
To: [hidden email]
Subject: [Pharo-users] Tools for easy subtext extraction from text
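The same "skip to a marker, then read up to a delimiter" idea can be sketched outside Pharo too. What follows is a minimal Python sketch for comparison, not part of the original post; the helper name `between` is made up for illustration, and returning `None` when a marker is missing is a choice of this sketch rather than a statement about how the Pharo stream methods behave.

```python
def between(text, start, end):
    """Return the text between the first `start` marker and the next `end`.

    Returns None if either marker is missing (an assumption of this sketch).
    """
    # Skip everything up to and including the start marker.
    _, found, rest = text.partition(start)
    if not found:
        return None
    # Read up to (not including) the end delimiter.
    value, found, _ = rest.partition(end)
    return value if found else None


text = ' Temperature 0 37C (98F) [0x25] (TMPIN0)'
digits = between(text, '[0x', ']')
print(digits, int(digits, 16))  # prints: 25 37
```

Like the stream version, this needs no regex and reads left to right, which is arguably why it stays easy to follow for throwaway work.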
In reply to this post by Peter Uhnak
> In Ruby it is dead simple:

and dead unreadable.

The Pharo way is both dead simple and dead readable:

    str:= ' Temperature 0 37C (98F) [0x25] (TMPIN0)'.
    ('16r',(text copyWithRegex: '(.*\[0x)|(\].*)' matchesReplacedWith: '')) asNumber.
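The approach above inverts the problem: instead of capturing the wanted part, it deletes everything that is not wanted. For comparison, here is a minimal Python sketch of that same technique using the same pattern; the names `PATTERN`, `digits` and `value` are illustrative and not from the thread.

```python
import re

# Same pattern as in the post: either everything up to and including '[0x',
# or everything from ']' to the end of the string.
PATTERN = r'(.*\[0x)|(\].*)'

text = ' Temperature 0 37C (98F) [0x25] (TMPIN0)'

# Replace both kinds of match with nothing; only the hex digits remain.
digits = re.sub(PATTERN, '', text)
value = int(digits, 16)
print(digits, value)  # prints: 25 37
```

One thing to keep in mind with this style: when the pattern does not match at all, the substitution returns the input unchanged, so a malformed line is only noticed when the number conversion fails (if it fails).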
In reply to this post by Peter Uhnak
>* I have to manually remove the 0x, even though it is a very common way of expressing hex numbers
Adding a few lines to NumberParser>>#nextNumber enables it to parse 0x...

-----------------------------------
((sourceStream peekFor: $r))
    ifTrue: [ "<base>r<integer>"
        ... ]
    ifFalse: [
        (sourceStream peekFor: $x) ifTrue: [ "0x<integer>"
            (integerPart isZero and: [ numberOfTrailingZeroInIntegerPart = 1 ])
                ifFalse: [
                    sourceStream skip: -1.
                    ^ self expected: 'one leading 0 before x' ].
            ^ self nextUnsignedIntegerBase: 16 ] ].
----------------------------------

0x10. "16"
0xFF + 16rFF = (2 * 0xff). "true"

Best regards,
Henrik
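The rule encoded in the patch above (accept `0x<integer>` only when exactly one zero precedes the `x`) can be illustrated in isolation. The following is a rough Python sketch of that rule only, not a port of NumberParser; `HEX_LITERAL` and `parse_0x` are hypothetical names, and the error message is this sketch's own.

```python
import re

# Exactly one leading '0', then 'x', then one or more hex digits, then end.
HEX_LITERAL = re.compile(r'0x([0-9A-Fa-f]+)\Z')


def parse_0x(literal):
    """Parse a '0x...' literal, rejecting anything but a single leading zero."""
    m = HEX_LITERAL.match(literal)
    if m is None:
        raise ValueError('not a 0x<hex> literal with one leading zero: %r' % literal)
    return int(m.group(1), 16)


print(parse_0x('0x10'))                                  # prints: 16
print(parse_0x('0xFF') + 0xFF == 2 * parse_0x('0xff'))   # prints: True
```

Inputs such as `'00x10'` or `'x10'` raise `ValueError` here, mirroring the "one leading 0 before x" check in spirit.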
In reply to this post by Peter Uhnak
Hi.

2017-01-20 16:15 GMT+01:00 Peter Uhnak <[hidden email]>:

> In Ruby it is dead simple:
> str[/\[(.*)\]/,1].hex # "=> 37"

I always wonder why people think it is dead simple.
I use streams for such cases. It is a logical, readable and dead simple approach without crappy syntax. And with the Xtreams library it becomes much easier and more fun.
In reply to this post by kilon.alios
Nice one.

str:= --> text :=

On Sat, Jan 21, 2017 at 10:07 AM, Dimitris Chloupis <[hidden email]> wrote:
In reply to this post by Denis Kudriashov
On Sat, Jan 21, 2017 at 02:01:59PM +0100, Denis Kudriashov wrote:
> Hi.
>
> 2017-01-20 16:15 GMT+01:00 Peter Uhnak <[hidden email]>:
>
> > In Ruby it is dead simple:
> > str[/\[(.*)\]/,1].hex # "=> 37"
>
> I always wondering when people think it is dead simple.
> I use streams for such cases. It is logical, readable and dead simple

I've never mentioned readability, because the code is throwaway.
I guess if you are not using regexes it could look odd, but as a Linux user it is very casual; if I had to extract the information I would just pipe it through sed or grep.

I wouldn't use such a thing in code that I want to keep, but I explicitly mentioned that.

> approach without crappy syntax. And with Xtreams library it become much
> more easy and fun

Are there any docs for Xtreams? I found several repositories, but none explain what Xtreams even is.

---

> > In Ruby it is dead simple:
>
> and dead unreadable
>
> Pharo way is both dead simple and dead readable

Ditto as above. Readability was never a question. And if it was, then you just doubled the regex complexity, and made the code more confusing by turning the problem upside down, due to the limited API.

Complaining about the compact syntax makes as much sense as complaining that `1+2` is too cryptic and should be written as `1 digitAdd: 2` (which you can do btw); the point of compactness is that when you know what you are doing you can save some time.

You can always write .match() instead of []; e.g. in Python:

    int(re.split('\[(.*)\]', str)[1], 16)
    int(re.search('\[(.*)\]', str).group(1), 16)

But my point was not about this particular problem, but about the general one: I often find it much easier to preprocess data with standard Linux tools and then feed it to Pharo than to try to do the same in Pharo itself.

Peter
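For anyone who wants to paste the two Python one-liners from the message above into an interpreter, here is a self-contained version. It is only a sketch with two small liberties: the variable is called `line` instead of `str` so it does not shadow the built-in, and the patterns are raw strings so the backslashes are not treated as string escapes. It relies on `int(..., 16)` accepting the `0x` prefix.

```python
import re

line = ' Temperature 0 37C (98F) [0x25] (TMPIN0)'

# re.split with a capturing group keeps the captured text in the result list:
# [' Temperature 0 37C (98F) ', '0x25', ' (TMPIN0)']
via_split = int(re.split(r'\[(.*)\]', line)[1], 16)

# re.search returns a match object; group(1) is the bracket contents.
via_search = int(re.search(r'\[(.*)\]', line).group(1), 16)

print(via_split, via_search)  # prints: 37 37
```

Both variants raise an exception on a line without brackets (`IndexError` and `AttributeError` respectively), which is usually acceptable for throwaway code but worth knowing.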
I collected some content about that and wanted to do something about it but, yeah, it got put on the back burner.

This is currently just an extract of the package comment or something like that.

https://github.com/philippeback/xstreamsdoc

Phil

On Sat, Jan 21, 2017 at 3:08 PM, Peter Uhnak <[hidden email]> wrote:
> On Sat, Jan 21, 2017 at 02:01:59PM +0100, Denis Kudriashov wrote:
In reply to this post by Peter Uhnak
I use PetitParser for this purpose.
It provides:
- an incremental way to build a parser, especially when you use its bounded sea abilities,
- a way to debug problems, and
- a reasonably readable outcome that can be extended to a more complicated parser.

For example, in your case, you can use:

    string := 'Temperature 0 37C (98F) [0x25] (TMPIN0)'.
    (('[' asPParser, ']' asPParser negate star flatten, ']' asPParser ==> #second) sea ==> #second) parse: string

Cheers,
Doru

> On Jan 21, 2017, at 3:08 PM, Peter Uhnak <[hidden email]> wrote:

--
www.tudorgirba.com
www.feenk.com

"Value is always contextual."
In reply to this post by philippeback
2017-01-21 17:09 GMT+01:00 [hidden email] <[hidden email]>:
And look here:

https://github.com/mkobetic/Xtreams
https://code.google.com/archive/p/xtreams/wikis
On Sat, Jan 21, 2017 at 07:09:19PM +0100, Denis Kudriashov wrote:
> 2017-01-21 17:09 GMT+01:00 [hidden email] <[hidden email]>:
>
> > I collected some content about that and wanted to do something about it
> > but, yeah, it got on the backburner.
> >
> > This is currently just the extract of the package comment or something
> > like that.
> >
> > https://github.com/philippeback/xstreamsdoc
>
> And look here:
>
> https://github.com/mkobetic/Xtreams

Ah, the wiki has several pages. I just saw the welcome page and the uninformative readme.

> https://code.google.com/archive/p/xtreams/wikis

Thanks!

Peter
Actually the things I snatched are from there.

Phil

On Sat, Jan 21, 2017 at 7:19 PM, Peter Uhnak <[hidden email]> wrote:
> On Sat, Jan 21, 2017 at 07:09:19PM +0100, Denis Kudriashov wrote:
In reply to this post by Peter Uhnak
There is also this

https://github.com/SquareBracketAssociates/PharoLimbo/tree/master/Xtreams

On Sat, Jan 21, 2017 at 3:08 PM, Peter Uhnak <[hidden email]> wrote:
> On Sat, Jan 21, 2017 at 02:01:59PM +0100, Denis Kudriashov wrote:
Yes, I should finish converting everything. I hope that in Pharo 7.0 we will be able to add an Xtreams-like library and remove the old streams, but this is a large task.

Stef
In reply to this post by philippeback
Ah, I now see my googling issue: apparently no one knows whether the name is Xtreams or Xstreams.

If I google 'Pharo Xtreams' (which is the correct name, it seems) I get fuckall, but if I google 'Pharo Xstreams' I get the link as the second result, just because someone misspelled the filename. -_-

Is there a build for the PharoLimbo, or do I have to compile it myself?

Peter

On Sat, Jan 21, 2017 at 09:06:34PM +0100, [hidden email] wrote:
> There is also this
>
> https://github.com/SquareBracketAssociates/PharoLimbo/tree/master/Xtreams
In reply to this post by Peter Uhnak
> Ditto as above. Readability was never a question. And if it was, then you just doubled the regex complexity, and made the code more confusing by turning the problem upside down, due to the limited API.
>
> Complaining about the compact syntax makes as much sense as complaining that `1+2` is too cryptic and should be written as `1 digitAdd: 2` (which you can do btw); the point of compactness is that when you know what you are doing you can save some time.

Typing speed is pretty much the 1000th reason for a slowdown; the 1st is the ability to understand code, at least for me. Ruby is a more flexible and better designed language, but I would pick Python over it any day; Python, by the way, beats even Pharo in readability because of its culture of how to write APIs.

On the subject of your issue: what I offered is no workaround, it is standard regex syntax. Exclusion and combination of matches. Nothing special. I am using regex to parse Python code in Pharo, and for me at least the Pharo regex API rocks.
Readability: the more you read a language, the more you can read it.

Anyone saying Python is super readable can have a look at how BSON is dealt with in the MongoDB driver:

https://github.com/mongodb/mongo-python-driver/tree/master/bson

Now go check the same in MongoTalk, where you can actually understand what it does and how it works, without having to resort to C code every once in a while.

Check some Perl code from a few years ago and come back with terseness and readability arguments. Yeah, sure.

Phil

On Sat, Jan 21, 2017 at 11:16 PM, Dimitris Chloupis <[hidden email]> wrote:
> Check some Perl code from a few years ago and come back with terseness and
> readability arguments. Yeah, sure.

I used Perl years ago. ;) When you are using it actively, its very compact syntax is very cool once it becomes habitual; but of course Perl is a write-only language.

But we are ranting on; as I've mentioned, readability is not a factor when you know what you are doing: you quickly write it, you execute it, and you throw the code away, or rewrite it (rewrite != modify).

Peter
In reply to this post by stepharong
Stef, we need to think about it carefully. Streams are used in the kernel for many tasks. Replacing them with a big framework would be a huge drawback for bootstrapping purposes.

Alternatively, we could think of refactoring the kernel to not use streams, but so far this is not possible: the kernel uses the compiler and the code importer, which depend on parsing streams...

We should plan :)

On Sat, Jan 21, 2017 at 10:07 PM, stepharong <[hidden email]> wrote:
2017-01-22 11:54 GMT+01:00 Guillermo Polito <[hidden email]>:
I would not say that Xtreams is a bigger library than the current streams in the system. I measured it a bit:
And current streams:
So in summary, the current streams have ~600 methods, which is similar to Xtreams. But the current streams may be a much bigger code base, because I did not take into account the compression part, encodings and others.

Anyway, replacing the current streams completely is a huge task. I doubt that we can go that way.