The Trunk: Regex-Core-ul.43.mcz


commits-2
Levente Uzonyi uploaded a new version of Regex-Core to project The Trunk:
http://source.squeak.org/trunk/Regex-Core-ul.43.mcz

==================== Summary ====================

Name: Regex-Core-ul.43
Author: ul
Time: 23 August 2015, 11:52:59.334 pm
UUID: 7d13e585-1608-426d-8849-1c46a09bc5cf
Ancestors: Regex-Core-ul.42

A few more tweaks here and there:
- keep all marker positions in RxMatcher
- optimize the case of a single-element character set in RxsCharSet>>enumerablePartPredicateIgnoringCase:
- avoid collection copies in RxMatchOptimizer when possible

=============== Diff against Regex-Core-ul.42 ===============

Item was added:
+ ----- Method: RxMatchOptimizer>>addNonPrefixes: (in category 'private') -----
+ addNonPrefixes: aSet
+
+ ^nonPrefixes
+ ifNil: [ nonPrefixes := aSet ]
+ ifNotNil: [ nonPrefixes addAll: aSet ]!

Item was added:
+ ----- Method: RxMatchOptimizer>>addPrefixes: (in category 'private') -----
+ addPrefixes: aSet
+
+ ^prefixes
+ ifNil: [ prefixes := aSet ]
+ ifNotNil: [ prefixes addAll: aSet ]!
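
The two added methods adopt the incoming set when nothing has been stored yet, and merge into the existing collection otherwise. The workspace sketch below mimics that pattern with a local variable `prefixes` and a block `addPrefixes`, both stand-ins introduced only for illustration. It shows one consequence worth keeping in mind: the first set is adopted without a copy, so later additions also change the caller's object. Whether that matters depends on how the caller uses the set afterwards, which this diff does not show.

```smalltalk
"Workspace sketch. 'prefixes' stands in for the instance variable of
RxMatchOptimizer; 'addPrefixes' is a hypothetical block mirroring #addPrefixes:."
| prefixes first second addPrefixes |
first := Set withAll: 'ab'.
second := Set withAll: 'bc'.
addPrefixes := [ :aSet |
	prefixes
		ifNil: [ prefixes := aSet ]
		ifNotNil: [ prefixes addAll: aSet ] ].
addPrefixes value: first.	"No copy is made: prefixes is the same object as first."
addPrefixes value: second.	"Merges into that object, so first grows as well."
prefixes == first.	"true"
first size.	"3, the elements are $a $b $c"
```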

Item was changed:
  ----- Method: RxMatchOptimizer>>syntaxCharSet: (in category 'double dispatch') -----
  syntaxCharSet: charSetNode
  "All these (or none of these) characters is the prefix."
 
  (charSetNode enumerableSetIgnoringCase: ignoreCase) ifNotNil: [ :enumerableSet |
  charSetNode isNegated
+ ifTrue: [ self addNonPrefixes: enumerableSet ]
+ ifFalse: [ self addPrefixes: enumerableSet ] ].
- ifTrue: [ enumerableSet do: [ :each | self addNonPrefix: each ] ]
- ifFalse: [ enumerableSet do: [ :each | self addPrefix: each ] ] ].
 
  charSetNode predicates ifNotNil: [ :charsetPredicates |
  charSetNode isNegated
  ifTrue: [
  charsetPredicates do: [ :each | self addNonPredicate: each ] ]
  ifFalse: [
  charsetPredicates do: [ :each | self addPredicate: each ] ] ]!

Item was changed:
  Object subclass: #RxMatcher
+ instanceVariableNames: 'matcher ignoreCase startOptimizer stream markerPositions markerCount lastResult oldMarkerPositions firstTry'
- instanceVariableNames: 'matcher ignoreCase startOptimizer stream markerPositions markerCount lastResult oldMarkerPositions'
  classVariableNames: 'Cr Lf'
  poolDictionaries: ''
  category: 'Regex-Core'!
 
  !RxMatcher commentStamp: 'Tbn 11/12/2010 23:13' prior: 0!
  -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
  --
  This is a recursive regex matcher. Not strikingly efficient, but simple. Also, keeps track of matched subexpressions.  The life cycle goes as follows:
 
  1. Initialization. Accepts a syntax tree (presumably produced by RxParser) and compiles it into a matcher built of other classes in this category.
 
  2. Matching. Accepts a stream or a string and returns a boolean indicating whether the whole stream or its prefix -- depending on the message sent -- matches the regex.
 
  3. Subexpression query. After a successful match, and before any other match, the matcher may be queried about the range of specific stream (string) positions that matched to certain parenthesized subexpressions of the original expression.
 
  Any number of queries may follow a successful match, and any number or matches may follow a successful initialization.
 
  Note that `matcher' is actually a sort of a misnomer. The actual matcher is a web of Rxm* instances built by RxMatcher during initialization. RxMatcher is just the interface facade of this network.  It is also a builder of it, and also provides a stream-like protocol to easily access the stream being matched.
 
  Instance variables:
  matcher <RxmLink> The entry point into the actual matcher.
  stream <Stream> The stream currently being matched against.
  markerPositions <Array of: Integer> Positions of markers' matches.
  markerCount <Integer> Number of markers.
  lastResult <Boolean> Whether the latest match attempt succeeded or not.
  lastChar <Character | nil> character last seen in the matcher stream!

Item was changed:
  ----- Method: RxMatcher>>copyStream:to:replacingMatchesWith: (in category 'match enumeration') -----
  copyStream: aStream to: writeStream replacingMatchesWith: aString
  "Copy the contents of <aStream> on the <writeStream>, except for the matches. Replace each match with <aString>."
 
  | searchStart matchStart matchEnd |
  stream := aStream.
+ self resetMarkerPositions.
- oldMarkerPositions := markerPositions := nil.
  [searchStart := aStream position.
  self proceedSearchingStream: aStream] whileTrue:
  [matchStart := (self subBeginning: 1) last.
  matchEnd := (self subEnd: 1) last.
  aStream position: searchStart.
  searchStart to: matchStart - 1 do:
  [:ignoredPos | writeStream nextPut: aStream next].
  writeStream nextPutAll: aString.
  aStream position: matchEnd.
  "Be extra careful about successful matches which consume no input.
  After those, make sure to advance or finish if already at end."
  matchEnd = searchStart ifTrue:
  [aStream atEnd
  ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
  ifFalse: [writeStream nextPut: aStream next]]].
  aStream position: searchStart.
  [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]!
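
As a rough usage sketch, the method above can be driven directly from a workspace. The pattern and input string are arbitrary examples, and `asRegex` is assumed to be available as the String extension that ships with Regex-Core.

```smalltalk
"Workspace sketch: replace every run of $a with a dash."
| matcher out |
matcher := 'a+' asRegex.
out := WriteStream on: String new.
matcher
	copyStream: 'xaay' readStream
	to: out
	replacingMatchesWith: '-'.
Transcript show: out contents; cr.
```

With this change the method sends #resetMarkerPositions on entry instead of setting both position arrays to nil, so a matcher used this way keeps its marker collections between calls.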

Item was changed:
  ----- Method: RxMatcher>>copyStream:to:translatingMatchesUsing: (in category 'match enumeration') -----
  copyStream: aStream to: writeStream translatingMatchesUsing: aBlock
  "Copy the contents of <aStream> on the <writeStream>, except for the matches. For each match, evaluate <aBlock> passing the matched substring as the argument.  Expect the block to answer a String, and write the answer to <writeStream> in place of the match."
 
  | searchStart matchStart matchEnd match |
  stream := aStream.
+ self resetMarkerPositions.
- oldMarkerPositions := markerPositions := nil.
  [searchStart := aStream position.
  self proceedSearchingStream: aStream] whileTrue:
  [matchStart := (self subBeginning: 1) last.
  matchEnd := (self subEnd: 1) last.
  aStream position: searchStart.
  searchStart to: matchStart - 1 do:
  [:ignoredPos | writeStream nextPut: aStream next].
  match := (String new: matchEnd - matchStart + 1) writeStream.
  matchStart to: matchEnd - 1 do:
  [:ignoredPos | match nextPut: aStream next].
  writeStream nextPutAll: (aBlock value: match contents).
  "Be extra careful about successful matches which consume no input.
  After those, make sure to advance or finish if already at end."
  matchEnd = searchStart ifTrue:
  [aStream atEnd
  ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
  ifFalse: [writeStream nextPut: aStream next]]].
  aStream position: searchStart.
  [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]!

Item was changed:
  ----- Method: RxMatcher>>matchesStreamPrefix: (in category 'accessing') -----
  matchesStreamPrefix: theStream
  "Match thyself against a positionable stream."
 
  stream := theStream.
+ self resetMarkerPositions.
- oldMarkerPositions := markerPositions := nil.
  ^self tryMatch!

Item was added:
+ ----- Method: RxMatcher>>resetMarkerPositions (in category 'private') -----
+ resetMarkerPositions
+ "This method should be sent before the first #tryMatch send."
+
+ firstTry := true.
+ markerPositions ifNotNil: [
+ markerPositions do: [ :each | each resetTo: 1 ] ]!

Item was changed:
  ----- Method: RxMatcher>>searchStream: (in category 'accessing') -----
  searchStream: aStream
  "Search the stream for occurrence of something matching myself.
  After the search has occurred, stop positioned after the end of the
  matched substring. Answer a Boolean indicating success."
 
  | position |
  stream := aStream.
  position := aStream position.
+ self resetMarkerPositions.
- oldMarkerPositions := markerPositions := nil.
  [aStream atEnd] whileFalse:
  [self tryMatch ifTrue: [^true].
  aStream position: position; next.
  position := aStream position].
  "Try match at the very stream end too!!"
  ^self tryMatch!

Item was changed:
  ----- Method: RxMatcher>>tryMatch (in category 'private') -----
  tryMatch
  "Match thyself against the current stream."
 
+ | newMarkerPositions wasFirstTry |
+ wasFirstTry := firstTry.
+ firstTry := false.
- | newMarkerPositions |
  newMarkerPositions := oldMarkerPositions.
  oldMarkerPositions := markerPositions.
  markerPositions := newMarkerPositions.
  markerPositions
  ifNil: [
  markerPositions := Array new: markerCount.
  1 to: markerCount do: [ :i |
  "There are usually 0 or 1 objects to store."
  markerPositions at: i put: (OrderedCollection new: 2) ] ]
  ifNotNil: [
  1 to: markerCount do: [ :i |
  (markerPositions at: i) resetTo: 1 ] ].
  lastResult := startOptimizer
  ifNil: [ matcher matchAgainst: self]
  ifNotNil: [ (startOptimizer canStartMatch: stream peek in: self) and: [ matcher matchAgainst: self ] ].
  "check for duplicates"
  lastResult ifFalse: [ ^false ].
+ wasFirstTry ifTrue: [ ^true ].
- oldMarkerPositions ifNil: [ ^true ].
  (oldMarkerPositions hasEqualElements: markerPositions) ifFalse: [ ^true ].
  "this is a duplicate match"
  ^ lastResult := false!
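
Since the marker arrays are now kept between searches, a nil oldMarkerPositions no longer signals the first attempt, which appears to be why the firstTry flag was introduced. The sketch below reuses a single matcher for several searches, the situation in which the retained arrays are recycled. It assumes the existing `search:` and `subexpression:` protocol of RxMatcher; the pattern and inputs are made up for illustration.

```smalltalk
"Workspace sketch: one matcher, several independent searches."
| matcher |
matcher := '(a+)(b*)' asRegex.
#('xaab' 'aaa' 'zzz') do: [ :each |
	(matcher search: each)
		ifTrue: [ Transcript show: each, ' -> ', (matcher subexpression: 1); cr ]
		ifFalse: [ Transcript show: each, ' -> no match'; cr ] ]
```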

Item was changed:
  ----- Method: RxsCharSet>>enumerablePartPredicateIgnoringCase: (in category 'privileged') -----
  enumerablePartPredicateIgnoringCase: aBoolean
 
+ | set |
+ set := (self enumerableSetIgnoringCase: aBoolean) ifNil: [ ^nil ].
+ set size = 1 ifTrue: [
+ | p |
+ p := set anyOne.
+ negated ifTrue: [ ^[ :character | character ~~ p ] ].
+ ^[ :character | character == p ] ].
+ negated ifTrue: [ ^[ :char | (set includes: char) not ] ].
+ ^[ :char | set includes: char ]!
- | enumeration |
- enumeration := (self enumerableSetIgnoringCase: aBoolean) ifNil: [ ^nil ].
- negated ifTrue: [ ^[ :char | (enumeration includes: char) not ] ].
- ^[ :char | enumeration includes: char ]!
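
The single-element branch replaces a set lookup with an identity comparison. A minimal workspace sketch of the two predicate shapes follows. It assumes that Characters with the same code point are identical objects, which holds where characters are immediate values. The names `fast` and `general` are illustrative only.

```smalltalk
"Workspace sketch: identity test versus set membership for a one-element set."
| set single fast general |
set := Set with: $a.
single := set anyOne.
fast := [ :character | character == single ].
general := [ :char | set includes: char ].
'abc' do: [ :c |
	Transcript
		show: c asString;
		show: ' fast: '; show: (fast value: c) printString;
		show: ' general: '; show: (general value: c) printString;
		cr ]
```

Both blocks should answer the same Boolean for every character under that assumption; the identity version simply skips the hashing done by `includes:`.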