Named subexpressions in RxParser


Named subexpressions in RxParser

Joerg Beekmann, DeepCove Labs (YVR)

Is it possible to tag a sub-expression with a name and then when there is a match extract the value associated with the tagged sub-expression?

 

Joerg



Re: Named subexpressions in RxParser

Randy Coulman

On 3/17/06, Joerg Beekmann <[hidden email]> wrote:

> Is it possible to tag a sub-expression with a name and then when there is a match extract the value associated with the tagged sub-expression?

I keep thinking I should come up with a nicer API for this, but here's what you need to do:

matcher := RxMatcher forString: 'a*([^a]*)a*([^a]*)a*'.
matcher matches: 'aaaabbbbaccaaa'. "print it yields true"
(1 to: matcher subexpressionCount) collect: [:index | matcher subexpression: index]. "print it"

Subexpression 1 is the entire matched string. The nested subexpressions (anything enclosed in $( and $)) start at index 2. So the above example has 3 sub-expressions: the first is the entire string, the second is 'bbbb', and the third is 'cc'.

Randy

RE: Named subexpressions in RxParser

Joerg Beekmann, DeepCove Labs (YVR)

Hi Randy

 

This is what we are doing now, and it works well when the person writing the regular expression is the same person who wrote the code that interprets the matches. In our case, however, we would like users to supply a regular expression that parses a line of a bank statement, so the developer does not know how many sub-expressions there will be or which one contains the value of interest. We get around this by insisting that the user’s expression contain 5 sub-expressions, with the value of interest in the third. Users then pad out their regular expressions with empty sub-expressions as required.

 

Allowing named sub-expressions would solve this problem nicely. Boris pointed out that Python does this and he was good enough to send me a suggested syntax based on the Python regular expression syntax:

 

Indexed sub-expression access:
=======================

matcher := RxMatcher forString:
    '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]+(:isDigit::isDigit:?)[ ]*,[ ]*(19|20)(:isDigit::isDigit:)'.
(matcher matches: 'Aug 6, 1996')
    ifTrue:
        [Array
            with: (matcher subexpression: 5)
            with: (matcher subexpression: 2)
            with: (matcher subexpression: 3)]
    ifFalse: ['no match']

 

Named sub-expression access:
=======================

matcher := RxMatcher forString:
    '(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]+(?P<day>:isDigit::isDigit:?)[ ]*,[ ]*(?P<year1>19|20)(?P<year2>:isDigit::isDigit:)'.
(matcher matches: 'Aug 6, 1996')
    ifTrue:
        [Array
            with: (matcher subexpression: #year2)
            with: (matcher subexpression: #month)
            with: (matcher subexpression: #day)]
    ifFalse: ['no match']
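
For comparison, here is roughly how the Python syntax this proposal borrows from behaves in Python's own `re` module. This is only a reference sketch, not RxParser code: the group names are copied from the example above, `[0-9]` stands in for `:isDigit:`, `fullmatch` is assumed to be the closest equivalent of `matches:`, and the helper name `parse_date` is made up for illustration.

```python
import re

# Python's named-group syntax is (?P<name>...). The names mirror the
# proposed RxMatcher example; [0-9] stands in for :isDigit:.
DATE_RX = re.compile(
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]+"
    r"(?P<day>[0-9][0-9]?)[ ]*,[ ]*"
    r"(?P<year1>19|20)(?P<year2>[0-9][0-9])"
)


def parse_date(text):
    """Hypothetical helper: return (year2, month, day), or 'no match'."""
    # fullmatch succeeds only if the whole string matches the pattern
    # (assumed here to be the closest analogue of matches: above).
    m = DATE_RX.fullmatch(text)
    if m is None:
        return "no match"
    # Values are looked up by name, so the caller never needs to know
    # how many groups the pattern has or in which order they appear.
    return (m.group("year2"), m.group("month"), m.group("day"))


print(parse_date("Aug 6, 1996"))
# Named groups keep their positional numbers as well.
print(DATE_RX.groupindex["day"])
```

Because lookups go by name, a user-supplied pattern could add or reorder other groups without the interpreting code changing, which is the property the padding convention described above works around. Note that Python numbers the whole match as group 0, whereas RxMatcher (per Randy's description) uses index 1 for it.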

 

 

