Levente Uzonyi uploaded a new version of Regex-Core to project The Trunk:
http://source.squeak.org/trunk/Regex-Core-ul.45.mcz

==================== Summary ====================

Name: Regex-Core-ul.45
Author: ul
Time: 25 September 2015, 10:14:55.741 am
UUID: 0b7b582a-5091-43e4-95f2-981f236e991c
Ancestors: Regex-Core-ul.44

- Allow escaping any character in a character set.
- Use RxsCharacter instead of RxsPredicate for single character escapes like \r, \n, etc.
- Use nil instead of #epsilon for the extremal stream element in RxParser.
- RxParser >> #next returns the value of lookahead. Use it where it makes sense.
- RxsPredicate's class variables' initialization is thread-safe.
- Reinitialize EscapedLetterSelectors in the postscript.

=============== Diff against Regex-Core-ul.44 ===============

Item was changed:
  ----- Method: RxCharSetParser>>parseEscapeChar (in category 'parsing') -----
  parseEscapeChar

      self match: $\.
+     elements add: ((RxsPredicate forEscapedLetter: lookahead)
+         ifNil: [ RxsCharacter with: lookahead ]).
-     $- == lookahead
-         ifTrue: [elements add: (RxsCharacter with: $-)]
-         ifFalse: [elements add: (RxsPredicate forEscapedLetter: lookahead)].
      self next!

Item was changed:
  ----- Method: RxParser>>atom (in category 'recursive descent') -----
  atom
      "An atom is one of a lot of possibilities, see below."

      | atom |
+     (lookahead == nil
-     (lookahead == #epsilon
      or: [ lookahead == $|
      or: [ lookahead == $)
      or: [ lookahead == $*
      or: [ lookahead == $+
      or: [ lookahead == $? ]]]]])
          ifTrue: [ ^RxsEpsilon new ].

      lookahead == $(
      ifTrue: [
          "<atom> ::= '(' <regex> ')' "
          self match: $(.
          atom := self regex.
          self match: $).
          ^atom ].

      lookahead == $[
      ifTrue: [
          "<atom> ::= '[' <characterSet> ']' "
          self match: $[.
          atom := self characterSet.
          self match: $].
          ^atom ].

      lookahead == $:
      ifTrue: [
          "<atom> ::= ':' <messagePredicate> ':' "
          self match: $:.
          atom := self messagePredicate.
          self match: $:.
          ^atom ].

      lookahead == $.
      ifTrue: [
          "any non-whitespace character"
          self next.
          ^RxsContextCondition new beAny].

      lookahead == $^
      ifTrue: [
          "beginning of line condition"
          self next.
          ^RxsContextCondition new beBeginningOfLine].

      lookahead == $$
      ifTrue: [
          "end of line condition"
          self next.
          ^RxsContextCondition new beEndOfLine].

      lookahead == $\
      ifTrue: [
          "<atom> ::= '\' <character>"
+         self next ifNil: [ self signalParseError: 'bad quotation' ].
+         (BackslashConstants includesKey: lookahead) ifTrue: [
+             atom := RxsCharacter with: (BackslashConstants at: lookahead).
+             self next.
+             ^atom].
-         self next.
-         lookahead == #epsilon
-             ifTrue: [ self signalParseError: 'bad quotation' ].
-         (BackslashConstants includesKey: lookahead)
-             ifTrue: [
-                 atom := RxsCharacter with: (BackslashConstants at: lookahead).
-                 self next.
-                 ^atom].
          self ifSpecial: lookahead
              then: [:node | self next. ^node]].

      "If passed through the above, the following is a regular character."
      atom := RxsCharacter with: lookahead.
      self next.
      ^atom!

Item was changed:
  ----- Method: RxParser>>branch (in category 'recursive descent') -----
  branch
      "<branch> ::= e | <piece> <branch>"

      | piece branch |
      piece := self piece.
+     (lookahead == nil
-     (lookahead == #epsilon
      or: [ lookahead == $|
      or: [ lookahead == $) ]])
          ifTrue: [ branch := nil ]
          ifFalse: [ branch := self branch ].
      ^RxsBranch new
          initializePiece: piece
          branch: branch!

Item was changed:
  ----- Method: RxParser>>inputUpTo:errorMessage: (in category 'private') -----
  inputUpTo: aCharacter errorMessage: aString
      "Accumulate input stream until <aCharacter> is encountered
      and answer the accumulated chars as String, not including
      <aCharacter>. Signal error if end of stream is encountered,
      passing <aString> as the error description."

      | accumulator |
      accumulator := WriteStream on: (String new: 20).
+     [ lookahead == aCharacter or: [lookahead == nil ] ]
-     [ lookahead == aCharacter or: [lookahead == #epsilon] ]
          whileFalse: [
              accumulator nextPut: lookahead.
              self next].
+     lookahead ifNil: [ self signalParseError: aString ].
-     lookahead == #epsilon
-         ifTrue: [ self signalParseError: aString ].
      ^accumulator contents!

Item was changed:
  ----- Method: RxParser>>inputUpTo:nestedOn:errorMessage: (in category 'private') -----
  inputUpTo: aCharacter nestedOn: anotherCharacter errorMessage: aString
      "Accumulate input stream until <aCharacter> is encountered
      and answer the accumulated chars as String, not including
      <aCharacter>. Signal error if end of stream is encountered,
      passing <aString> as the error description."

      | accumulator nestLevel |
      accumulator := WriteStream on: (String new: 20).
      nestLevel := 0.
+     [ lookahead == aCharacter and: [ nestLevel = 0 ] ] whileFalse: [
+         lookahead ifNil: [ self signalParseError: aString ].
+         lookahead == $\
+             ifTrue: [
+                 self next ifNil: [ self signalParseError: aString ].
+                 BackslashConstants
+                     at: lookahead
+                     ifPresent: [ :unescapedCharacter | accumulator nextPut: unescapedCharacter ]
+                     ifAbsent: [
+                         accumulator
+                             nextPut: $\;
+                             nextPut: lookahead ] ]
+             ifFalse: [
+                 accumulator nextPut: lookahead.
+                 lookahead == anotherCharacter ifTrue: [ nestLevel := nestLevel + 1 ].
+                 lookahead == aCharacter ifTrue: [ nestLevel := nestLevel - 1 ] ].
+         self next ].
-     [lookahead == aCharacter and: [nestLevel = 0]] whileFalse:
-         [#epsilon == lookahead ifTrue: [self signalParseError: aString].
-         accumulator nextPut: lookahead.
-         lookahead == anotherCharacter ifTrue: [nestLevel := nestLevel + 1].
-         lookahead == aCharacter ifTrue: [nestLevel := nestLevel - 1].
-         self next].
      ^accumulator contents!

Item was changed:
  ----- Method: RxParser>>inputUpToAny:errorMessage: (in category 'private') -----
  inputUpToAny: aDelimiterString errorMessage: aString
      "Accumulate input stream until any character from <aDelimiterString> is encountered
      and answer the accumulated chars as String, not including the matched characters from the
      <aDelimiterString>. Signal error if end of stream is encountered,
      passing <aString> as the error description."

      | accumulator |
      accumulator := WriteStream on: (String new: 20).
+     [ lookahead == nil or: [ aDelimiterString includes: lookahead ] ]
-     [ lookahead == #epsilon or: [ aDelimiterString includes: lookahead ] ]
          whileFalse: [
              accumulator nextPut: lookahead.
              self next ].
+     lookahead ifNil: [ self signalParseError: aString ].
-     lookahead == #epsilon
-         ifTrue: [ self signalParseError: aString ].
      ^accumulator contents!

Item was changed:
  ----- Method: RxParser>>next (in category 'private') -----
  next
      "Advance the input storing the just read character
      as the lookahead."

+     ^lookahead := input next!
-     lookahead := input next ifNil: [ #epsilon ]!

Item was changed:
  ----- Method: RxParser>>parseStream: (in category 'accessing') -----
  parseStream: aStream
      "Parse an input from a character stream <aStream>.
      On success, answers an RxsRegex -- parse tree root.
      On error, raises `RxParser syntaxErrorSignal' with the current
      input stream position as the parameter."

      | tree |
      input := aStream.
+     self next.
-     lookahead := nil.
-     self match: nil.
      tree := self regex.
+     self match: nil.
-     self match: #epsilon.
      ^tree!

Item was changed:
  ----- Method: RxParser>>regex (in category 'recursive descent') -----
  regex
      "<regex> ::= e | <branch> `|' <regex>"

      | branch regex |
      branch := self branch.
+     (lookahead == nil
-     (lookahead == #epsilon
      or: [ lookahead == $) ])
          ifTrue: [ regex := nil ]
          ifFalse: [
              self match: $|.
              regex := self regex ].
      ^RxsRegex new initializeBranch: branch regex: regex!

Item was changed:
  ----- Method: RxsPredicate class>>forEscapedLetter: (in category 'instance creation') -----
  forEscapedLetter: aCharacter
+     "Return a predicate instance for the given character, or nil if there's no such predicate."

+     ^EscapedLetterSelectors
+         at: aCharacter
+         ifPresent: [ :selector | self new perform: selector ]!
-     ^self new perform:
-         (EscapedLetterSelectors
-             at: aCharacter
-             ifAbsent: [RxParser signalSyntaxException: 'bad backslash escape'])!

Item was changed:
  ----- Method: RxsPredicate class>>initializeEscapedLetterSelectors (in category 'class initialization') -----
  initializeEscapedLetterSelectors
      "self initializeEscapedLetterSelectors"

+     EscapedLetterSelectors := Dictionary new
-     | newEscapedLetterSelectors |
-     newEscapedLetterSelectors := Dictionary new
          at: $w put: #beWordConstituent;
          at: $W put: #beNotWordConstituent;
          at: $d put: #beDigit;
          at: $D put: #beNotDigit;
          at: $s put: #beSpace;
          at: $S put: #beNotSpace;
+         yourself!
-         at: $\ put: #beBackslash;
-         at: $r put: #beCarriageReturn;
-         at: $n put: #beLineFeed;
-         at: $t put: #beTab;
-         yourself.
-     EscapedLetterSelectors := newEscapedLetterSelectors!

Item was changed:
  ----- Method: RxsPredicate class>>initializeNamedClassSelectors (in category 'class initialization') -----
  initializeNamedClassSelectors
      "self initializeNamedClassSelectors"

+     NamedClassSelectors := Dictionary new
-     (NamedClassSelectors := Dictionary new)
          at: 'alnum' put: #beAlphaNumeric;
          at: 'alpha' put: #beAlphabetic;
          at: 'cntrl' put: #beControl;
          at: 'digit' put: #beDigit;
          at: 'graph' put: #beGraphics;
          at: 'lower' put: #beLowercase;
          at: 'print' put: #bePrintable;
          at: 'punct' put: #bePunctuation;
          at: 'space' put: #beSpace;
          at: 'upper' put: #beUppercase;
+         at: 'xdigit' put: #beHexDigit;
+         yourself!
-         at: 'xdigit' put: #beHexDigit!

Item was removed:
- ----- Method: RxsPredicate>>beBackslash (in category 'initialize-release') -----
- beBackslash
-
-     self beCharacter: $\!

Item was removed:
- ----- Method: RxsPredicate>>beCarriageReturn (in category 'initialize-release') -----
- beCarriageReturn
-
-     self beCharacter: Character cr!

Item was removed:
- ----- Method: RxsPredicate>>beLineFeed (in category 'initialize-release') -----
- beLineFeed
-
-     self beCharacter: Character lf!

Item was removed:
- ----- Method: RxsPredicate>>beTab (in category 'initialize-release') -----
- beTab
-
-     self beCharacter: Character tab!

Item was changed:
+ (PackageInfo named: 'Regex-Core') postscript: 'RxsPredicate initializeEscapedLetterSelectors.'!
- (PackageInfo named: 'Regex-Core') postscript: 'RxsPredicate initializeEscapedLetterSelectors'!
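
For anyone who wants to poke at these changes from a workspace, here is a minimal sketch. It assumes Regex-Core-ul.45 is loaded and uses only the usual String>>matchesRegex: and RxParser>>parse: entry points. The sample strings and patterns are made up for illustration, and the comments describe what the diff suggests should happen, not results captured from a run.

```smalltalk
"1. Escaping an arbitrary character inside a character set:
 \] is meant to be read as a literal $] instead of closing the set."
'a]b' matchesRegex: 'a[\]]b'.

"2. \- is meant to keep working as a literal dash inside a set,
 now through the generic fallback to RxsCharacter."
'a-b' matchesRegex: '[a\-b]+'.

"3. Single character escapes such as \t go through BackslashConstants
 and end up as RxsCharacter nodes."
(String with: Character tab) matchesRegex: '\t'.

"4. A trailing backslash hits the 'bad quotation' branch in RxParser>>atom.
 Catching Error is a deliberately broad assumption made here,
 to avoid naming a specific exception class."
[ RxParser new parse: 'abc\' ]
    on: Error
    do: [ :error | error messageText ].
```

Evaluate each expression with "print it". The first three are expected to answer a Boolean, and the last one the error description.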