The Trunk: Regex-Core-ct.55.mcz


commits-2
Nicolas Cellier uploaded a new version of Regex-Core to project The Trunk:
http://source.squeak.org/trunk/Regex-Core-ct.55.mcz

==================== Summary ====================

Name: Regex-Core-ct.55
Author: ct
Time: 6 March 2020, 7:08:55.997601 pm
UUID: 4f76095b-f67f-4c41-afec-d936b7dfeecb
Ancestors: Regex-Core-eem.54

Implements positive lookaheads in Regular Expressions for Squeak

There were already some stubs and a bit of documentation. However, while negative lookaheads (such as 'q(?!u)' asRegex) already worked, positive lookaheads (such as 'q(?=u)' asRegex) never did.

- Fix erroneous parsing of positive lookahead syntax (the previous implementation tested lookahead for $! only after the sub-expression had been parsed, missing that #regex had already moved the parser past that character)
- Add a #positive argument to the construction messages for lookahead nodes/links (see RxsLookaround >> #dispatchTo: and others; these steps had been forgotten entirely)*
- In RxMatcher >> #matchAgainstLookahead:positive:nextLink:, actually honor the #positive argument
- Fix typos in documentation and category names

*Note: I decided to remove, rather than deprecate, the original construction messages for lookahead nodes/links. The reason is that, IMO, a default should never be a negative setting, because that is not what you would expect at first glance. Also, the link and node classes are an implementation detail of Regex-Core, so I think we do not need to move these methods into the Deprecated package. Please let me know if you agree.

Please review! Further information about lookaheads can be found here: https://www.regular-expressions.info/lookaround.html
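
A minimal workspace sketch of what the two example patterns above are meant to do, assuming the String>>asRegex and RxMatcher>>search: messages of Squeak's Regex package; the sample words 'quit' and 'qatar' are illustrative. The comments describe standard lookahead semantics; given the open bugs mentioned later in this thread, the output of Regex-Core-ct.55 itself may differ.

```smalltalk
"Workspace sketch: a positive and a negative lookahead side by side.
 Under standard lookahead semantics, 'q(?=u)' finds a q followed by u and
 'q(?!u)' finds a q not followed by u; in both cases the u is not consumed."
| positive negative |
positive := 'q(?=u)' asRegex.
negative := 'q(?!u)' asRegex.
#('quit' 'qatar') do: [:word |
	"search: answers whether the pattern matches anywhere in word"
	Transcript
		show: word;
		show: '  q(?=u): ', (positive search: word) printString;
		show: '  q(?!u): ', (negative search: word) printString;
		cr].
```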

=============== Diff against Regex-Core-eem.54 ===============

Item was removed:
- ----- Method: RxMatchOptimizer>>syntaxLookaround: (in category 'double dispatch') -----
- syntaxLookaround: lookaroundNode
- "Do nothing."!

Item was added:
+ ----- Method: RxMatchOptimizer>>syntaxLookaround:positive: (in category 'double dispatch') -----
+ syntaxLookaround: lookaroundNode positive: positive
+ "Do nothing."!

Item was removed:
- ----- Method: RxMatcher>>matchAgainstLookahead:nextLink: (in category 'matching') -----
- matchAgainstLookahead: lookahead nextLink: anRmxLink
-
- | position result |
- position := stream position.
- result := lookahead matchAgainst: self.
- stream position: position.
- result ifTrue: [ ^false ].
- ^anRmxLink matchAgainst: self!

Item was added:
+ ----- Method: RxMatcher>>matchAgainstLookahead:positive:nextLink: (in category 'matching') -----
+ matchAgainstLookahead: lookahead positive: positive nextLink: anRmxLink
+
+ | position result |
+ position := stream position.
+ result := lookahead matchAgainst: self.
+ stream position: position.
+ ^ result = positive and: [
+ anRmxLink matchAgainst: self]!

Item was removed:
- ----- Method: RxMatcher>>syntaxLookaround: (in category 'double dispatch') -----
- syntaxLookaround: lookaroundNode
- "Double dispatch from the syntax tree.
- Special link can handle lookarounds (look ahead, positive and negative)."
- | piece |
- piece := lookaroundNode piece dispatchTo: self.
- ^ RxmLookahead with: piece!

Item was added:
+ ----- Method: RxMatcher>>syntaxLookaround:positive: (in category 'double dispatch') -----
+ syntaxLookaround: lookaroundNode positive: positiveBoolean
+ "Double dispatch from the syntax tree.
+ Special link can handle lookarounds (look ahead, positive and negative)."
+ | piece |
+ piece := lookaroundNode piece dispatchTo: self.
+ ^ RxmLookahead with: piece positive: positiveBoolean!

Item was changed:
  ----- Method: RxParser>>lookAround (in category 'recursive descent') -----
  lookAround
+ "Parse a lookaround expression after: (?<lookaround>)
+ <lookaround> ::= !!<regex> | =<regex>"
+ | positive |
+ ('!!=' includes: lookahead) ifFalse: [
+ ^ self signalParseError: 'Invalid lookaround expression ?', lookahead asString].
+ positive := lookahead == $=.
- "Parse a lookaround expression after: (?<lookround>)
- <lookround> ::= !!<regex> | =<regex>"
- | lookaround |
- (lookahead == $!!
- or: [ lookahead == $=])
- ifFalse: [ ^ self signalParseError: 'Invalid lookaround expression ?', lookahead asString ].
  self next.
+ ^ RxsLookaround
+ with: self regex
+ positive: positive!
- lookaround := RxsLookaround with: self regex.
- lookahead == $!!
- ifTrue: [ lookaround beNegative ].
- ^ lookaround
- !

Item was changed:
  RxmLink subclass: #RxmLookahead
+ instanceVariableNames: 'lookahead positive'
- instanceVariableNames: 'lookahead'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Regex-Core'!
 
+ !RxmLookahead commentStamp: 'ct 3/6/2020 18:29' prior: 0!
+ Instance holds onto a lookahead which matches but does not consume anything.
- !RxmLookahead commentStamp: '<historical>' prior: 0!
- Instance holds onto a lookead which matches but does not consume anything.
 
+ Instance Variables
+ lookahead: <RxmLink>
+ positive: <Boolean>
+ !
- Instance variables:
- predicate <RxmLink>!

Item was removed:
- ----- Method: RxmLookahead class>>with: (in category 'instance creation') -----
- with: aPiece
-
- ^self new lookahead: aPiece!

Item was added:
+ ----- Method: RxmLookahead class>>with:positive: (in category 'instance creation') -----
+ with: aPiece positive: aBoolean
+
+ ^self new lookahead: aPiece positive: aBoolean!

Item was removed:
- ----- Method: RxmLookahead>>lookahead: (in category 'accessing') -----
- lookahead: anRxmLink
- lookahead := anRxmLink!

Item was added:
+ ----- Method: RxmLookahead>>lookahead:positive: (in category 'accessing') -----
+ lookahead: anRxmLink positive: aBoolean
+ lookahead := anRxmLink.
+ positive := aBoolean.!

Item was changed:
  ----- Method: RxmLookahead>>matchAgainst: (in category 'matching') -----
  matchAgainst: aMatcher
  "Match if the predicate block evaluates to true when given the
  current stream character as the argument."
 
+ ^aMatcher matchAgainstLookahead: lookahead positive: positive nextLink: next!
- ^aMatcher matchAgainstLookahead: lookahead nextLink: next!

Item was removed:
- ----- Method: RxsLookaround class>>with: (in category 'instance creation') -----
- with: anRsxPiece
- ^ self new
- initializePiece: anRsxPiece!

Item was added:
+ ----- Method: RxsLookaround class>>with:positive: (in category 'instance creation') -----
+ with: aRxsRegex positive: positiveBoolean
+ ^ self new
+ initializePiece: aRxsRegex
+ positive: positiveBoolean!

Item was changed:
+ ----- Method: RxsLookaround>>beNegative (in category 'initialize-release') -----
- ----- Method: RxsLookaround>>beNegative (in category 'initailize-release') -----
  beNegative
  positive := false!

Item was changed:
+ ----- Method: RxsLookaround>>bePositive (in category 'initialize-release') -----
- ----- Method: RxsLookaround>>bePositive (in category 'initailize-release') -----
  bePositive
  positive := true!

Item was changed:
  ----- Method: RxsLookaround>>dispatchTo: (in category 'accessing') -----
  dispatchTo: aBuilder
+ "Inform the matcher of the kind of the node, and it will do whatever it has to."
+ ^aBuilder syntaxLookaround: self positive: self positive!
- "Inform the matcher of the kind of the node, and it
- will do whatever it has to."
- ^aBuilder syntaxLookaround: self!

Item was added:
+ ----- Method: RxsLookaround>>initialize (in category 'initialize-release') -----
+ initialize
+
+ super initialize.
+ self bePositive.!

Item was removed:
- ----- Method: RxsLookaround>>initializePiece: (in category 'initailize-release') -----
- initializePiece: anRsxPiece
- super initialize.
- piece := anRsxPiece.!

Item was added:
+ ----- Method: RxsLookaround>>initializePiece:positive: (in category 'initialize-release') -----
+ initializePiece: anRsxPiece positive: positiveBoolean
+
+ piece := anRsxPiece.
+ positive := positiveBoolean.!

Item was added:
+ ----- Method: RxsLookaround>>positive (in category 'accessing') -----
+ positive
+
+ ^ positive!
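
The parsing fix is about ordering. In the new RxParser>>lookAround, the $! or $= flag is read from lookahead and stored in positive before self next and self regex move the parser on, whereas eem.54 inspected lookahead only after the sub-expression had been parsed. The workspace sketch below imitates this with a plain ReadStream; the stream, the input '=u)' and the variable names are illustrative assumptions, not RxParser's own code.

```smalltalk
"Workspace sketch (illustrative only, not RxParser's code): the lookaround
 flag must be read before the sub-expression is consumed."
| input flag positive body |
input := ReadStream on: '=u)'.
flag := input peek.                       "plays the role of RxParser's lookahead"
('!=' includes: flag)
	ifFalse: [^ self error: 'Invalid lookaround expression ?', flag asString].
positive := flag == $=.                   "capture the flag first, as ct.55 does"
input next.                               "skip the flag, like 'self next'"
body := WriteStream on: String new.
[input atEnd or: [input peek == $)]]
	whileFalse: [body nextPut: input next]. "stands in for 'self regex'"
"Now 'input peek' answers $), so a test such as 'input peek == $!'
 made at this point, as eem.54 did, can never succeed."
Transcript show: body contents, ' positive: ', positive printString; cr.
```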

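On the matching side, RxMatcher>>matchAgainstLookahead:positive:nextLink: restores the stream position and then continues only if the lookahead result equals the positive flag. The sketch below models that rule, and the eem.54 rule it replaces, with plain blocks; the block and argument names are hypothetical stand-ins for the messages named in the comment.

```smalltalk
"Workspace sketch of the decision rule (block names are hypothetical):
 lookaheadMatched  result of 'lookahead matchAgainst: self'
 positive          the flag stored in RxmLookahead
 rest              stands in for 'anRmxLink matchAgainst: self'"
| newRule oldRule |
"ct.55: continue only if the lookahead outcome equals the flag."
newRule := [:lookaheadMatched :positive :rest |
	lookaheadMatched = positive and: [rest value]].
"eem.54: the flag did not exist; every lookahead behaved as negative."
oldRule := [:lookaheadMatched :positive :rest |
	lookaheadMatched not and: [rest value]].
#(true false) do: [:lookaheadMatched |
	#(true false) do: [:positive |
		Transcript
			show: 'lookahead matched: ', lookaheadMatched printString;
			show: ', positive: ', positive printString;
			show: ', new rule: ', (newRule value: lookaheadMatched value: positive value: [true]) printString;
			show: ', old rule: ', (oldRule value: lookaheadMatched value: positive value: [true]) printString;
			cr]].
```

Since and: takes a block, the rest of the pattern is tried only when the lookahead condition holds, mirroring the short-circuit in the diff. The two rules agree whenever positive is false, which is why negative lookaheads already worked before this version.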


Re: The Trunk: Regex-Core-ct.55.mcz

Christoph Thiede

Thank you for reviewing and merging all of this, Nicolas!


Actually, these versions were still WIP, as noted in the inbox thread.

Did you not notice this, or did you consider it non-critical? :)

However, I guess it's not a big problem because this is not a regression.


I will fix the open bugs ASAP (unfortunately, it may take me a few weeks to find the time...)!


Best,

Christoph


From: Squeak-dev <[hidden email]> on behalf of [hidden email] <[hidden email]>
Sent: Friday, 8 May 2020 22:24:45
To: [hidden email]; [hidden email]
Subject: [squeak-dev] The Trunk: Regex-Core-ct.55.mcz
 

Carpe Squeak!