Easier way to parse this?

Mariano Martinez Peck

Easier way to parse this?

Hi guys,

I wonder what the easiest way is to do the following. I have a string that may contain something like 'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'. I need to parse that string, find every place where text is surrounded by <code>SOMETHING</code>, extract the "SOMETHING" (in the example above, 'some funny lines'), execute it (this part is internal), and get the real string from the result (imagine that in this case the answer is 'SOME FUNNY LINES'). Finally, I need to replace that part of the original string with the result. So, given the input:

'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'

And given my specific domain logic transformation (in this example I assume a simple #asUppercase), I would like to get:

'this is a string with SOME FUNNY LINES and here is another HAHA'

I got it working with the lines below, but it is a hack and (I imagine) terribly slow.

So, does anyone have an idea how I can do this more simply or faster? Maybe some RB rewrite rule?

Thanks in advance

| dom originalString stringToBeAbleToParse xmlDocument replacements finalString |
replacements := Dictionary new.
originalString := 'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'.
"Wrap the input in a dummy root element so that it parses as a single XML document."
stringToBeAbleToParse := '<hack>', originalString, '</hack>'.
dom := XMLDOMParser on: stringToBeAbleToParse.
dom configuration isValidating: false.
xmlDocument := dom parseDocument.
(xmlDocument allElementsNamed: 'code') do: [ :aXMLElement |
    "Let's simulate my domain transformation logic as a simple #asUppercase"
    replacements
        at: aXMLElement asString
        put: ([ :code | code asUppercase ] value: aXMLElement nodes first asString) ].
finalString := originalString.
replacements keysAndValuesDo: [ :originalText :new |
    finalString := finalString copyReplaceAll: originalText with: new ].
finalString

Mariano
http://marianopeck.wordpress.com

alistairgrant

Re: Easier way to parse this?

Hi Mariano,

On Thu, May 12, 2016 at 04:50:29PM -0300, Mariano Martinez Peck wrote:

> So...anyone has an idea how can I do this simpler/faster? Maybe some RB
> re-write rule?

I don't think this is quite what you want, but it should be close enough
to get you started:

| str re |
str := 'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'.
re := '<code>([^<]*)</code>' asRegex.
re copy: str translatingMatchesUsing: [ :each | each asUppercase ].
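One way to get closer to the requested output, as a sketch only: the block given to #copy:translatingMatchesUsing: receives the whole matched substring, tags included, so the tags can be cut off by position before the transformation is applied. The offsets below are an assumption tied to the literal tags '<code>' (6 characters) and '</code>' (7 characters); they would need adjusting for any other delimiters.

```smalltalk
| str re |
str := 'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'.
re := '<code>([^<]*)</code>' asRegex.
re copy: str translatingMatchesUsing: [ :each |
    "each is the full match, e.g. '<code>haha</code>'.
     Drop the 6-character opening tag and the 7-character closing tag,
     then apply the stand-in transformation."
    (each copyFrom: 7 to: each size - 7) asUppercase ]
```

The pattern still uses [^<]*, so a '<' inside the code text would stop the match; that limitation is unchanged from the snippet above.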

HTH,
Alistair

Denis Kudriashov

Re: Easier way to parse this?

In reply to this post by Mariano Martinez Peck

Hi.

I always solve this kind of problem with streams. It is super easy, and much easier than regex (I hate regex). For your case it would be something like:

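| source in result code |
source := 'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'.
in := source readStream.
result := String streamContents: [ :out |
    [ in atEnd ] whileFalse: [
        "Copy everything before the next opening tag unchanged."
        out nextPutAll: (in upToAll: '<code>').
        "Read the tagged text, consuming the closing tag as well."
        code := in upToAll: '</code>'.
        "Write only the transformed text, so the tags disappear from the result."
        out nextPutAll: code asUppercase ] ].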
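If the real transformation is more than #asUppercase, the same loop can take it as a block. This is only a sketch: the names transform and replaceCode are made up for illustration, and the atEnd guard is an added assumption that the domain logic should not be run on an empty string when plain text follows the last closing tag.

```smalltalk
| transform replaceCode |
"Stand-in for the real domain logic."
transform := [ :code | code asUppercase ].
replaceCode := [ :source |
    | in |
    in := source readStream.
    String streamContents: [ :out |
        [ in atEnd ] whileFalse: [
            "Copy the plain text up to the next opening tag."
            out nextPutAll: (in upToAll: '<code>').
            "If the stream is exhausted, no opening tag was found, so skip the transformation."
            in atEnd ifFalse: [
                out nextPutAll: (transform value: (in upToAll: '</code>')) ] ] ] ].
replaceCode value: 'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'.
"--> 'this is a string with SOME FUNNY LINES and here is another HAHA'"
```

Like the loop above, this does not handle nested or unclosed tags; an unclosed <code> simply passes the rest of the string to the block.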

It could be much nicer with Xtreams, but I don't remember its API (maybe tomorrow I will).


Tudor Girba-2

Re: Easier way to parse this?

And here it is with PetitParser using the islands support. I think this is the nicest to read:

codeParser := ('<code>' asParser ,
    '</code>' asParser negate star flatten ,
    '</code>' asParser) ==> [ :t | t second asUppercase ].
parser := codeParser island star ==> [ :t | '' join: t flatten ].

originalString := 'this is a string with <code>some funny lines</code> and here is another <code>haha</code>'.
parser parse: originalString

"--> this is a string with SOME FUNNY LINES and here is another HAHA"

Cheers,
Doru

> On May 12, 2016, at 10:35 PM, Denis Kudriashov <[hidden email]> wrote:
>
> I always solve this kind of problem with streams. [...]

--
www.tudorgirba.com
www.feenk.com

"From an abstract enough point of view, any two things are similar."