The Trunk: Regex-Core-ul.44.mcz

Levente Uzonyi uploaded a new version of Regex-Core to project The Trunk:
http://source.squeak.org/trunk/Regex-Core-ul.44.mcz

==================== Summary ====================

Name: Regex-Core-ul.44
Author: ul
Time: 28 August 2015, 2:21:49.191 pm
UUID: eb190b36-cf56-4381-97cb-58ef044c6416
Ancestors: Regex-Core-ul.43

RxMatcher:
- updated class comment
- renamed two class variables
- added some speedups for regular expressions that have no subexpressions
- #resetMarkerPositions is now clearly a piece extracted from #tryMatch
- rely on the stream being a stream of characters: when #next returns nil, the end of the stream has been reached

=============== Diff against Regex-Core-ul.43 ===============

Item was changed:
  Object subclass: #RxMatcher
+ instanceVariableNames: 'matcher ignoreCase startOptimizer stream markerPositions previousMarkerPositions markerCount lastResult firstTryMatch'
- instanceVariableNames: 'matcher ignoreCase startOptimizer stream markerPositions markerCount lastResult oldMarkerPositions firstTry'
  classVariableNames: 'Cr Lf'
  poolDictionaries: ''
  category: 'Regex-Core'!
 
+ !RxMatcher commentStamp: 'ul 8/28/2015 14:18' prior: 0!
- !RxMatcher commentStamp: 'Tbn 11/12/2010 23:13' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
  This is a recursive regex matcher. Not strikingly efficient, but simple. Also, keeps track of matched subexpressions.  The life cycle goes as follows:
 
  1. Initialization. Accepts a syntax tree (presumably produced by RxParser) and compiles it into a matcher built of other classes in this category.
 
  2. Matching. Accepts a stream or a string and returns a boolean indicating whether the whole stream or its prefix -- depending on the message sent -- matches the regex.
 
  3. Subexpression query. After a successful match, and before any other match, the matcher may be queried about the range of specific stream (string) positions that matched to certain parenthesized subexpressions of the original expression.
 
  Any number of queries may follow a successful match, and any number of matches may follow a successful initialization.
 
  Note that `matcher' is actually a sort of a misnomer. The actual matcher is a web of Rxm* instances built by RxMatcher during initialization. RxMatcher is just the interface facade of this network.  It is also a builder of it, and also provides a stream-like protocol to easily access the stream being matched.
 
  Instance variables:
+ matcher <RxmLink> The entry point into the actual matcher.
+ ignoreCase <Boolean> Whether the matching algorithm should ignore case (true) or be case sensitive (false).
+ startOptimizer <RxMatchOptimizer> An object which can quickly decide whether the next character can be the prefix of a match or not.
+ stream <Stream> The stream currently being matched against.
+ markerPositions <Array of: nil | Integer | OrderedCollection> Positions of markers' matches.
+ previousMarkerPositions <Array of: nil |  Integer | OrderedCollection> Positions of markers from the previous #tryMatch send.
+ markerCount <Integer> Number of markers.
+ lastResult <Boolean> Whether the latest match attempt succeeded or not.
+ firstTryMatch <Boolean> True if there hasn't been any send of #tryMatch during the current matching.!
- matcher <RxmLink> The entry point into the actual matcher.
- stream <Stream> The stream currently being matched against.
- markerPositions <Array of: Integer> Positions of markers' matches.
- markerCount <Integer> Number of markers.
- lastResult <Boolean> Whether the latest match attempt succeeded or not.
- lastChar <Character | nil> character last seen in the matcher stream!

Item was changed:
  ----- Method: RxMatcher>>copyStream:to:replacingMatchesWith: (in category 'match enumeration') -----
  copyStream: aStream to: writeStream replacingMatchesWith: aString
  "Copy the contents of <aStream> on the <writeStream>, except for the matches. Replace each match with <aString>."
 
  | searchStart matchStart matchEnd |
  stream := aStream.
+ firstTryMatch := true.
- self resetMarkerPositions.
  [searchStart := aStream position.
  self proceedSearchingStream: aStream] whileTrue:
  [matchStart := (self subBeginning: 1) last.
  matchEnd := (self subEnd: 1) last.
  aStream position: searchStart.
  searchStart to: matchStart - 1 do:
  [:ignoredPos | writeStream nextPut: aStream next].
  writeStream nextPutAll: aString.
  aStream position: matchEnd.
  "Be extra careful about successful matches which consume no input.
  After those, make sure to advance or finish if already at end."
  matchEnd = searchStart ifTrue:
  [aStream atEnd
  ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
  ifFalse: [writeStream nextPut: aStream next]]].
  aStream position: searchStart.
  [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]!

Item was changed:
  ----- Method: RxMatcher>>copyStream:to:translatingMatchesUsing: (in category 'match enumeration') -----
  copyStream: aStream to: writeStream translatingMatchesUsing: aBlock
  "Copy the contents of <aStream> on the <writeStream>, except for the matches. For each match, evaluate <aBlock> passing the matched substring as the argument.  Expect the block to answer a String, and write the answer to <writeStream> in place of the match."
 
  | searchStart matchStart matchEnd match |
  stream := aStream.
+ firstTryMatch := true.
- self resetMarkerPositions.
  [searchStart := aStream position.
  self proceedSearchingStream: aStream] whileTrue:
  [matchStart := (self subBeginning: 1) last.
  matchEnd := (self subEnd: 1) last.
  aStream position: searchStart.
  searchStart to: matchStart - 1 do:
  [:ignoredPos | writeStream nextPut: aStream next].
  match := (String new: matchEnd - matchStart + 1) writeStream.
  matchStart to: matchEnd - 1 do:
  [:ignoredPos | match nextPut: aStream next].
  writeStream nextPutAll: (aBlock value: match contents).
  "Be extra careful about successful matches which consume no input.
  After those, make sure to advance or finish if already at end."
  matchEnd = searchStart ifTrue:
  [aStream atEnd
  ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
  ifFalse: [writeStream nextPut: aStream next]]].
  aStream position: searchStart.
  [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]!
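
For orientation, the two enumeration methods above are normally reached through string-based wrappers. The following is a minimal sketch, assuming the Regex-Core package is loaded and that #asRegex, #copy:replacingMatchesWith: and #copy:translatingMatchesUsing: are available as in a stock Squeak image (none of these wrappers is touched by this commit):

```smalltalk
| matcher replaced translated |
"Assumption: String>>asRegex answers an RxMatcher."
matcher := 'b+' asRegex.
"Replace every match with a fixed string."
replaced := matcher copy: 'aabbbcc' replacingMatchesWith: 'X'.
"Replace every match with the result of a block applied to the matched text."
translated := matcher
	copy: 'aabbbcc'
	translatingMatchesUsing: [ :match | match asUppercase ].
```

Both sends reuse the same matcher, which is why each enumeration method sets firstTryMatch to true before it starts searching.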

Item was changed:
  ----- Method: RxMatcher>>initialize:ignoreCase: (in category 'initialize-release') -----
  initialize: syntaxTreeRoot ignoreCase: aBoolean
  "Compile thyself for the regex with the specified syntax tree.
  See comment and `building' protocol in this class and
  #dispatchTo: methods in syntax tree components for details
  on double-dispatch building.
  The argument is supposedly a RxsRegex."
 
  ignoreCase := aBoolean.
  self buildFrom: syntaxTreeRoot.
+ self initializeMarkerPositions.
  startOptimizer := RxMatchOptimizer new initialize: syntaxTreeRoot ignoreCase: aBoolean!

Item was added:
+ ----- Method: RxMatcher>>initializeMarkerPositions (in category 'initialize-release') -----
+ initializeMarkerPositions
+
+ markerPositions := Array new: markerCount.
+ previousMarkerPositions := Array new: markerCount.
+ 3 to: markerCount do: [ :index |
+ markerPositions at: index put: (OrderedCollection new: 1).
+ previousMarkerPositions at: index put: (OrderedCollection new: 1) ].!

Item was changed:
  ----- Method: RxMatcher>>isWordChar: (in category 'private') -----
  isWordChar: aCharacterOrNil
  "Answer whether the argument is a word constituent character:
  alphanumeric or _."
 
+ aCharacterOrNil ifNil: [ ^false ].
+ ^aCharacterOrNil isAlphaNumeric!
- ^aCharacterOrNil ~~ nil
- and: [aCharacterOrNil isAlphaNumeric]!

Item was changed:
  ----- Method: RxMatcher>>markerPositionAt:add: (in category 'privileged') -----
+ markerPositionAt: index add: position
- markerPositionAt: anIndex add: position
  "Remember position of another instance of the given marker."
 
+ index <= 2 ifTrue: [
+ markerPositions at: index put: position.
+ ^self ].
+ (markerPositions at: index) addLast: position!
- (markerPositions at: anIndex) addLast: position!

Item was changed:
  ----- Method: RxMatcher>>matchesStreamPrefix: (in category 'accessing') -----
  matchesStreamPrefix: theStream
  "Match thyself against a positionable stream."
 
  stream := theStream.
+ firstTryMatch := true.
- self resetMarkerPositions.
  ^self tryMatch!

Item was changed:
  ----- Method: RxMatcher>>proceedSearchingStream: (in category 'private') -----
  proceedSearchingStream: aStream
 
  | position |
+ [
+ position := aStream position.
+ self tryMatch ifTrue: [ ^true ].
+ (aStream position: position; next) ifNil: [
+ "Try match at the very stream end too!!"
+ ^self tryMatch ] ] repeat!
- position := aStream position.
- [aStream atEnd] whileFalse:
- [self tryMatch ifTrue: [^true].
- aStream position: position; next.
- position := aStream position].
- "Try match at the very stream end too!!"
- self tryMatch ifTrue: [^true].
- ^false!
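
The new loop relies on #next answering nil at the end of a character stream, so no separate #atEnd test is needed. The following is a minimal sketch of that idiom outside the matcher, assuming a Squeak ReadStream over a String; the variable names are illustrative only:

```smalltalk
| stream count |
stream := ReadStream on: 'abc'.
count := 0.
"Loop until #next answers nil. This is only safe because a stream
of characters can never contain nil as a regular element."
[ stream next
	ifNil: [ false ]
	ifNotNil: [ :char | count := count + 1. true ] ] whileTrue.
```

The idiom would not be reliable for a stream over a collection that may legitimately hold nil.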

Item was changed:
  ----- Method: RxMatcher>>resetMarkerPositions (in category 'private') -----
  resetMarkerPositions
+ "Reset the marker positions. This method should only be sent from #tryMatch. When this is after the first #tryMatch send, then the marker positions must be swapped."
- "This method should be sent before the first #tryMatch send."
 
+ firstTryMatch
+ ifTrue: [ firstTryMatch := false ]
+ ifFalse: [
+ | temp |
+ temp := previousMarkerPositions.
+ previousMarkerPositions := markerPositions.
+ markerPositions := temp ].
+ markerPositions
+ at: 1 put: nil;
+ at: 2 put: nil.
+ 3 to: markerCount do: [ :index |
+ (markerPositions at: index) resetTo: 1 ]!
- firstTry := true.
- markerPositions ifNotNil: [
- markerPositions do: [ :each | each resetTo: 1 ] ]!
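
The swap in #resetMarkerPositions amounts to double buffering: two preallocated arrays trade places on each attempt, and the recycled one is emptied in place. The following is a rough sketch of the idea with hypothetical local variables standing in for the instance variables; it is not the matcher's actual code:

```smalltalk
| current previous temp duplicate |
"Stand-ins for markerPositions and previousMarkerPositions."
current := Array with: (OrderedCollection with: 2) with: (OrderedCollection with: 5).
previous := Array with: (OrderedCollection new: 1) with: (OrderedCollection new: 1).
"Swap the buffers instead of allocating new ones."
temp := previous.
previous := current.
current := temp.
"Empty the recycled collections with the same send the commit uses."
current do: [ :each | each resetTo: 1 ].
"Pretend the next attempt recorded the same positions again."
(current at: 1) addLast: 2.
(current at: 2) addLast: 5.
"Equal buffers are what #tryMatch treats as a duplicate match."
duplicate := previous hasEqualElements: current.
```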

Item was changed:
  ----- Method: RxMatcher>>searchStream: (in category 'accessing') -----
  searchStream: aStream
  "Search the stream for occurrence of something matching myself.
  After the search has occurred, stop positioned after the end of the
  matched substring. Answer a Boolean indicating success."
 
  | position |
  stream := aStream.
  position := aStream position.
+ firstTryMatch := true.
- self resetMarkerPositions.
  [aStream atEnd] whileFalse:
  [self tryMatch ifTrue: [^true].
  aStream position: position; next.
  position := aStream position].
  "Try match at the very stream end too!!"
  ^self tryMatch!

Item was changed:
  ----- Method: RxMatcher>>subBeginning: (in category 'accessing') -----
  subBeginning: subIndex
 
+ subIndex = 1 ifTrue: [
+ (markerPositions at: 1)
+ ifNil: [ ^#()]
+ ifNotNil: [ :mp | ^{ mp } ] ].
  ^markerPositions at: subIndex * 2 - 1!

Item was changed:
  ----- Method: RxMatcher>>subEnd: (in category 'accessing') -----
  subEnd: subIndex
 
+ subIndex = 1 ifTrue: [
+ (markerPositions at: 2)
+ ifNil: [ ^#()]
+ ifNotNil: [ :mp | ^{ mp } ] ].
  ^markerPositions at: subIndex * 2!
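
With this change, the whole-match markers (subexpression 1) are stored as plain integers and wrapped in an Array only when queried, so callers that send #last to the answer keep working. The following is a minimal usage sketch, assuming Regex-Core is loaded and that #asRegex and #search: behave as in a stock Squeak image:

```smalltalk
| matcher start end |
matcher := 'b+' asRegex.
(matcher search: 'aabbbcc') ifTrue: [
	"Both answers are collections of stream positions; #last picks the most recent one."
	start := (matcher subBeginning: 1) last.
	end := (matcher subEnd: 1) last ].
```

The positions are zero-based stream positions, which is how the enumeration methods in this diff use them when repositioning the stream.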

Item was changed:
  ----- Method: RxMatcher>>tryMatch (in category 'private') -----
  tryMatch
  "Match thyself against the current stream."
 
+ | wasFirstTryMatch |
+ wasFirstTryMatch := firstTryMatch.
+ self resetMarkerPositions.
- | newMarkerPositions wasFirstTry |
- wasFirstTry := firstTry.
- firstTry := false.
- newMarkerPositions := oldMarkerPositions.
- oldMarkerPositions := markerPositions.
- markerPositions := newMarkerPositions.
- markerPositions
- ifNil: [
- markerPositions := Array new: markerCount.
- 1 to: markerCount do: [ :i |
- "There are usually 0 or 1 objects to store."
- markerPositions at: i put: (OrderedCollection new: 2) ] ]
- ifNotNil: [
- 1 to: markerCount do: [ :i |
- (markerPositions at: i) resetTo: 1 ] ].
  lastResult := startOptimizer
+ ifNil: [ matcher matchAgainst: self ]
- ifNil: [ matcher matchAgainst: self]
  ifNotNil: [ (startOptimizer canStartMatch: stream peek in: self) and: [ matcher matchAgainst: self ] ].
  "check for duplicates"
  lastResult ifFalse: [ ^false ].
+ wasFirstTryMatch ifTrue: [ ^true ].
+ (previousMarkerPositions hasEqualElements: markerPositions) ifFalse: [ ^true ].
- wasFirstTry ifTrue: [ ^true ].
- (oldMarkerPositions hasEqualElements: markerPositions) ifFalse: [ ^true ].
  "this is a duplicate match"
  ^ lastResult := false!