[on] how to match partial regexes? how to match and delete string prefixes?

onierstrasz

Hi Folks,

Is there a way to ask a regex if it matches any part of a string?  
matches: wants an exact match, and matchesPrefix: clearly only matches  
a prefix.  The best I found was to abuse matchesIn: and check the size  
of the response, but this is not very clean.

Does anybody know if there is a better way?

normalize: url
        | normal re |
        ...
        re := '[^/]+\/\.\.\/' asRegex.
        [ (re matchesIn: normal) notEmpty ] whileTrue: [
                normal := re copy: normal replacingMatchesWith: '' ].
        ^ normal

Here I am collapsing ".." navigations within a URL against parent  
directories, as long as there is such a match. But  
copy:replacingMatchesWith: considers submatches, whereas matches: does  
not.  Bummer.

---

A second question: I would like to strip a string prefix if it  
matches.  Since this is a literal string, I should not need regexes.  
But I cannot find a nice existing method to do this.  With regexes I  
must escape all special regex chars:

localize: url
        "remove prefix if url is for this website"
        self aliasesAsRegexes do: [ :re |
                (re matchesPrefix: url)
                        ifTrue: [ ^ re copy: url replacingMatchesWith: '' ] ].
        ^ url

This is supposed to delete all known aliases for a given web site from  
the start of a url to turn them into local urls.

aliasesAsRegexes
        aliasesAsRegexes ifNil: [
        aliasesAsRegexes := aliases collect: [:alias |
                alias := '\/' asRegex copy: alias replacingMatchesWith: '\/'.
                alias := '\~' asRegex copy: alias replacingMatchesWith: '\~'.
                alias := '\:' asRegex copy: alias replacingMatchesWith: '\:'.
                alias := '\.' asRegex copy: alias replacingMatchesWith: '\.'.
                alias asRegex
                ]].
        ^ aliasesAsRegexes

Bleh!  I am ashamed to have to write such crappy code.

The Method Finder did not find me a cool method to do this:

'abra' . 'abracadabra' . 'cadabra'

Thanks for any hints.

- on

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners

Re: [on] how to match partial regexes? how to match and delete string prefixes?

Zulq Alam-2
Hi Oscar,

Oscar Nierstrasz wrote:
>
> Hi Folks,
>
> Is there a way to ask a regex if it matches any part of a string?  matches:
> wants an exact match, and matchesPrefix: clearly only matches a prefix.  
> The best I found was to abuse matchesIn: and check the size of the
> response, but this is not very clean.

Have a look at #search: as I think this will help you.
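
As a rough workspace sketch of how #search: might replace the matchesIn: test in your loop (assuming the standard Regex package, where #search: answers true if the regex matches anywhere in the string; the sample URL is made up):

```smalltalk
| re normal |
"Same pattern as in normalize: a path segment followed by '/../'."
re := '[^/]+\/\.\.\/' asRegex.
"Hypothetical input, not from the original post."
normal := 'site/a/b/c/../../d.html'.
"Each pass removes the matches it can find; removing one can expose another,
 so keep going until #search: finds nothing."
[ re search: normal ] whileTrue: [
    normal := re copy: normal replacingMatchesWith: '' ].
normal
```

With this input the first pass should leave 'site/a/b/../d.html' and the second 'site/a/d.html', after which #search: answers false and the loop stops.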

> ---
>
> A second question: I would like to strip a string prefix if it matches.  
> Since this is a literal string, I should not need regexes.  But I cannot
> find a nice existing method to do this.  With regexes I must escape all
> special regex chars:

I can't think of anything off the top of my head, but the great thing
about Smalltalk is that you can easily add what you need. For example,
you could use #beginsWith: when adding a method to String.

copyReplacePrefix: prefix with: replacement
   "Answer a copy with prefix replaced by replacement, or self if there is no such prefix."
   (self beginsWith: prefix)
     ifTrue: [^ replacement , (self copyFrom: prefix size + 1 to: self size)].
   ^ self

Then you will be able to do:

'abracadabra' copyReplacePrefix: 'abra' with: ''
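
The same idea could be applied to your localize: method without any regexes at all. This is only a sketch: it assumes the aliases instance variable holds plain Strings, and it does the copy inline with #copyFrom:to: so that it does not depend on any new String method.

```smalltalk
localize: url
    "Remove the prefix if url is for this website.
     Assumes aliases is a collection of literal Strings (no regex escaping needed)."
    aliases do: [ :alias |
        (url beginsWith: alias)
            ifTrue: [ ^ url copyFrom: alias size + 1 to: url size ] ].
    ^ url
```

Because the aliases are compared literally, the escaping done in aliasesAsRegexes should no longer be necessary with this approach.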

The next step would be to write a few unit tests and then attach both
to an enhancement request at http://bugs.squeak.org.
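
The tests might look roughly like this. It is a sketch only: the class name StringPrefixTest and its category are made up, and it assumes copyReplacePrefix:with: has been added to String and behaves as intended (replace the prefix when it matches, answer the receiver unchanged otherwise).

```smalltalk
"Hypothetical SUnit test class; the name and category are placeholders."
TestCase subclass: #StringPrefixTest
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'StringPrefix-Tests'

"Methods on StringPrefixTest:"

testMatchingPrefixIsRemoved
    self assert: ('abracadabra' copyReplacePrefix: 'abra' with: '') = 'cadabra'

testMatchingPrefixIsReplaced
    self assert: ('abracadabra' copyReplacePrefix: 'abra' with: 'X') = 'Xcadabra'

testNonMatchingPrefixLeavesStringAlone
    self assert: ('abracadabra' copyReplacePrefix: 'cad' with: '') = 'abracadabra'
```

The first test is the 'abra' . 'abracadabra' . 'cadabra' case from the original question; the third checks that a substring occurring later in the string is not treated as a prefix.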

>
> aliasesAsRegexes
>     aliasesAsRegexes ifNil: [
>     aliasesAsRegexes := aliases collect: [:alias |
>         alias := '\/' asRegex copy: alias replacingMatchesWith: '\/'.
>         alias := '\~' asRegex copy: alias replacingMatchesWith: '\~'.
>         alias := '\:' asRegex copy: alias replacingMatchesWith: '\:'.
>         alias := '\.' asRegex copy: alias replacingMatchesWith: '\.'.
>         alias asRegex
>         ]].
>     ^ aliasesAsRegexes
>

I don't think /, ~ or : have any special meaning so why do you need to
escape them? Even if you did, a better way might be to do:

alias
   copyWithRegex: '[/~:.]'
   matchesTranslatedUsing: [:ea | '\' , ea]

But if you're just replacing '.' then you can use:

alias copyReplaceAll: '.' with: '\.'

Hope this helps.

Zulq