Issue status update for
http://smalltalk.gnu.org/node/835 Post a follow up: http://smalltalk.gnu.org/project/comments/add/835 Project: GNU Smalltalk Version: <none> Component: Base classes Category: bug reports Priority: normal Assigned to: Unassigned Reported by: sergio Updated by: sergio Status: active Small code will be best description: * !/usr/local/bin/gst -f data := 'word1 "" word3'. " next regex catches whatever is between quotes; when there is something it works; when there is not - later attempt to access matches at: 2 throws " (data =~ '(\w+) "([^"]*)" (\w+)') ifMatched: [ :matches | | w2 mm | Transcript show: 'Matches size is: ', matches size asString ; nl ; flush. " w2 := matches at: 2. // will throw " " mm := matches asArray. // will throw as it internally calls at: " " next cycle will throw on 2nd iteration: " 1 to: matches size do: [ :n | Transcript show: 'Matches at: ' , n asString , ' is: [' , (matches at: n) , ']' ; nl ; flush ] ]. Output is: $ ./test1 Matches size is: 3 Matches at: 1 is: [word1] Object: Interval new "<0x2aaaac9b7870>" error: Invalid index 1: index out of range SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254) SystemExceptions.IndexOutOfRange class>>signalOn:withIndex: (SysExcept.st:660) Interval>>first (Interval.st:245) Kernel.MatchingRegexResults>>at: (Regex.st:382) optimized [] in UndefinedObject>>executeStatements (test1:21) Kernel.MatchingRegexResults>>ifNotMatched:ifMatched: (Regex.st:322) Kernel.MatchingRegexResults(RegexResults)>>ifMatched: (Regex.st:188) UndefinedObject>>executeStatements (test1:7) I believe it is regress, it worked for me in gst that was... some years ago, perhaps in Debian 6. I can not say what version it was. _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk |
On Fri, Dec 06, 2013 at 03:47:24PM -0700, sergio wrote:
Hi! data := 'word1 "" word3'. (data =~ '(\w+) "([^"]*)" (\w+)') printString is already enough to re-produce the issue. For some reason registers contains three intervals and the second one is an empty one. st> (match instVarNamed: #registers) second inspect An instance of Interval start: 8 stop: 7 step: 1 contents: [ ] Interval() I have not worked much with the regexp code. The issue already appears with GST 3.2. Could you check if you can find a working version of GST? thanks holger _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk |
On Sat, Dec 07, 2013 at 07:10:08PM +0100, Holger Hans Peter Freyther wrote:
> I have not worked much with the regexp code. The issue already appears > with GST 3.2. Could you check if you can find a working version of GST? I installed lenny (GST 3.0) and it already has the bug and on etch (GST 2.X) the code doesn't work like this. :) I don't really know the regexp code but I will have a look right now. Could you please provide me with your expectation for matches 1, 2 and 3? holger _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk |
On Sat, Dec 07, 2013 at 10:32:58PM +0100, Holger Hans Peter Freyther wrote:
> I don't really know the regexp code but I will have a look right now. > Could you please provide me with your expectation for matches 1, 2 and 3? Attached is a work-around + test-case. Could you have a look and give it a try? holger _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk 0001-kernel-Fix-IndexOutOfRange-with-empty-matches-in-the.patch (5K) Download Attachment |
In reply to this post by Holger Freyther
Hi
07.12.2013, 22:10, "Holger Hans Peter Freyther" <[hidden email]>: > On Fri, Dec 06, 2013 at 03:47:24PM -0700, sergio wrote: > data := 'word1 "" word3'. > (data =~ '(\w+) "([^"]*)" (\w+)') printString > > is already enough to re-produce the issue. For some reason registers > contains three intervals and the second one is an empty one. Yes, they should contain three. In bug report I said inaccurately, sorry. This regex should catch three pieces of the string, just for example. There are no any problems with 1st and 3rd ones, and the 2nd is also, I believe, captured as expected. But we can not access any captured fragment, if it is an empty string - attempt to access leads to an error throwing... If another input string is passed for example: 'word1 "word2" word3', regex will catch and return substrings: 'word1', 'word2' (without double-quotes) and 'word3'. That is o`key, and there is no throwing of errors on access in such case. Problem only with empty captured string. As far as I understand, aString =~ operation returns subclass of RegexResults, if =~ "matches" than an instance of Kernel.MatchingRegexResults class is returned. Kernel.MatchingRegexResults >> printOn: method defined there internally does: (some source lines skipped) 1 to: self size do: [:each | aStream nextPut: ch; print: (self at: each). ch := $,]. So "self at: " is called here too. I tried to "guess a fix" for issue (half-blindly, as I am novice to Smalltalk and I did not look deeply into sources and those written-in-C parts called by regex-related methods), I had edited Kernel.MatchingRegexResults >> at: a bit. Originally (for 3.2.5) method looks like: at: anIndex [ <category: 'accessing'> | reg text | anIndex = 0 ifTrue: [^self match]. cache isNil ifTrue: [cache := Array new: registers size]. (cache at: anIndex) isNil ifTrue: [reg := registers at: anIndex. text := reg isNil ifTrue: [nil] ifFalse: [self subject copyFrom: reg first to: reg last]. cache at: anIndex put: text]. ^cache at: anIndex ] I had changed it to be: at: anIndex [ <category: 'accessing'> | reg text | anIndex = 0 ifTrue: [^self match]. cache isNil ifTrue: [cache := Array new: registers size]. (cache at: anIndex) isNil ifTrue: [reg := registers at: anIndex. text := reg isNil ifTrue: [nil] ifFalse: [ reg isEmpty " <<< changed here " ifFalse: [self subject copyFrom: reg first to: reg last] ifTrue: [ '' ] ]. cache at: anIndex put: text]. ^cache at: anIndex ] It seems, after this change an issue has gone - I can access empty sub-matches. By the way, another issue I've found - method String >> escapeRegex does not escape curly braces {}, while I suppose it should do. (Unfortunately http://smalltalk.gnu.org/node/add/project-issue server is terribly slow and most of times I got just "service unavailable" trying to submit issue during last three days or so.) Excuse me for slow response. P.S. I have received your last message now, I see you are offering similar fix, but as I understand, your code will return nil in case of empty-string-match (?). I suppose it should return empty string, isn't it? I'll try to apply your patch, if I succeed in that I'll let you know (tomorrow). Sorry for my english also. :) -- /sergio _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Holger Freyther
08.12.2013, 23:30, "Holger Hans Peter Freyther" <[hidden email]>:
> On Sat, Dec 07, 2013 at 10:32:58PM +0100, Holger Hans Peter Freyther wrote: > > Attached is a work-around + test-case. Could you have a look and give it > a try? I guess I need a version from git to try it? As for 3.2.5 only kernel/Regex.st part of patch can be applied to. Yes, it works. But I still think the empty string as return is more consistent design. Here is an example code demonstrating when captured sub-matches contain nils: 'Mary Jane Emily' =~ '(\w+)( \w+)?( \w+)?( \w+)?( \w+)?' ifMatched: [ :mm | Transcript show: 'Matches size: ' , mm size asString ; nl. 1 to: mm size do: [ :n | Transcript show: 'Matches at: ' , n asString , ' is: ' , (mm at: n) printString ; nl ] ]. Here all the groups in braces except the first one are "optional". 2nd and 3rd match to Jane and Emily, 4th and 5th stay "unused". Output: Matches size: 5 Matches at: 1 is: 'Mary' Matches at: 2 is: ' Jane' Matches at: 3 is: ' Emily' Matches at: 4 is: nil Matches at: 5 is: nil That is difference. For "optional catcher" that has NO match and stay unused nil is returned. If empty sequence is successfully matched empty sequence is returned. It's just my vision though. -- /sergio _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk |
It is possible to have a non nil Interval that is empty. Make
the code work on isNil or isEmpty to avoid the exception. Fixes: SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254) SystemExceptions.IndexOutOfRange class>>signalOn:withIndex: (SysExcept.st:660) Interval>>first (Interval.st:245) Kernel.MatchingRegexResults>>at: (Regex.st:382) 2013-12-08 Holger Hans Peter Freyther <[hidden email]> * kernel/Regex.st: Check for isEmpty of the Interval before trying to use it. 2013-12-08 Holger Hans Peter Freyther <[hidden email]> * kernel/RegexpTests.st: Add tests for Regexp. --- ChangeLog | 5 +++ kernel/Regex.st | 7 +++-- packages/kernel-tests/ChangeLog | 4 +++ packages/kernel-tests/Makefile.frag | 2 +- packages/kernel-tests/kernel/RegexpTests.st | 48 +++++++++++++++++++++++++++++ packages/kernel-tests/package.xml | 2 ++ 6 files changed, 65 insertions(+), 3 deletions(-) create mode 100644 packages/kernel-tests/kernel/RegexpTests.st diff --git a/ChangeLog b/ChangeLog index 68c511a..7b6f5e8 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +2013-12-08 Holger Hans Peter Freyther <[hidden email]> + + * kernel/Regex.st: Check for isEmpty of the Interval before + trying to use it. + 2013-12-11 Lee Duhem <[hidden email]> * kernel/MappedColl.st: Fix variable name typos in reject: and diff --git a/kernel/Regex.st b/kernel/Regex.st index 91a4ab3..5bc1046 100644 --- a/kernel/Regex.st +++ b/kernel/Regex.st @@ -377,9 +377,12 @@ RegexResults subclass: MatchingRegexResults [ (cache at: anIndex) isNil ifTrue: [reg := registers at: anIndex. - text := reg isNil + text := reg isNil ifTrue: [nil] - ifFalse: [self subject copyFrom: reg first to: reg last]. + ifFalse: [ + reg isEmpty + ifTrue: [''] + ifFalse: [self subject copyFrom: reg first to: reg last]]. cache at: anIndex put: text]. ^cache at: anIndex ] diff --git a/packages/kernel-tests/ChangeLog b/packages/kernel-tests/ChangeLog index ce2e127..532f991 100644 --- a/packages/kernel-tests/ChangeLog +++ b/packages/kernel-tests/ChangeLog @@ -1,3 +1,7 @@ +2013-12-08 Holger Hans Peter Freyther <[hidden email]> + + * kernel/RegexpTests.st: Add tests for Regexp. + 2013-12-13 Holger Hans Peter Freyther <[hidden email]> * kernel/MappedCollectionTests.st: Add tests for MappedCollection. diff --git a/packages/kernel-tests/Makefile.frag b/packages/kernel-tests/Makefile.frag index 1d19d61..e44befc 100644 --- a/packages/kernel-tests/Makefile.frag +++ b/packages/kernel-tests/Makefile.frag @@ -1,5 +1,5 @@ Kernel-Tests_FILES = \ -packages/kernel-tests/ChangeLog packages/kernel-tests/kernel/CompiledMethodTests.st packages/kernel-tests/kernel/ContextPartTests.st packages/kernel-tests/kernel/MappedCollectionTests.st packages/kernel-tests/kernel/ObjectTests.st +packages/kernel-tests/ChangeLog packages/kernel-tests/kernel/CompiledMethodTests.st packages/kernel-tests/kernel/ContextPartTests.st packages/kernel-tests/kernel/MappedCollectionTests.st packages/kernel-tests/kernel/ObjectTests.st packages/kernel-tests/kernel/RegexpTests.st $(Kernel-Tests_FILES): $(srcdir)/packages/kernel-tests/stamp-classes: $(Kernel-Tests_FILES) touch $(srcdir)/packages/kernel-tests/stamp-classes diff --git a/packages/kernel-tests/kernel/RegexpTests.st b/packages/kernel-tests/kernel/RegexpTests.st new file mode 100644 index 0000000..a239756 --- /dev/null +++ b/packages/kernel-tests/kernel/RegexpTests.st @@ -0,0 +1,48 @@ +TestCase subclass: TestRegexp [ + + testEmptyMatch [ + | data match | + + data := '""'. + match := (data =~ '"([^"]*)"'). + + "Check that it has matched" + self assert: match class equals: Kernel.MatchingRegexResults. + self assert: match matched. + self assert: match size = 1. + self assert: (match at: 1) equals: ''. + + "Check if an exception is thrown" + self shouldnt: [match printString] raise: Exception. + ] + + testMatch [ + | data match | + + data := '"A"'. + match := (data =~ '"([^"]*)"'). + + "Check that it has matched" + self assert: match class equals: Kernel.MatchingRegexResults. + self assert: match matched. + self assert: match size = 1. + self assert: (match at: 1) equals: 'A'. + + "Check if an exception is thrown" + self shouldnt: [match printString] raise: Exception. + ] + + testOptionalMatch [ + | match | + match := 'Mary Jane Emily' =~ '(\w+)( \w+)?( \w+)?( \w+)?( \w+)?'. + + self assert: match class equals: Kernel.MatchingRegexResults. + self assert: match matched. + self assert: match size = 5. + self assert: (match at: 1) equals: 'Mary'. + self assert: (match at: 2) equals: ' Jane'. + self assert: (match at: 3) equals: ' Emily'. + self assert: (match at: 4) equals: nil. + self assert: (match at: 5) equals: nil. + ] +] diff --git a/packages/kernel-tests/package.xml b/packages/kernel-tests/package.xml index 7699d23..18681bb 100644 --- a/packages/kernel-tests/package.xml +++ b/packages/kernel-tests/package.xml @@ -6,10 +6,12 @@ <sunit>TestContextPart</sunit> <sunit>TestMappedCollection</sunit> <sunit>TestObject</sunit> + <sunit>TestRegexp</sunit> <filein>kernel/CompiledMethodTests.st</filein> <filein>kernel/ContextPartTests.st</filein> <filein>kernel/MappedCollectionTests.st</filein> <filein>kernel/ObjectTests.st</filein> + <filein>kernel/RegexpTests.st</filein> </test> <file>ChangeLog</file> -- 1.8.4.rc3 _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by sergio-2
Issue status update for
http://smalltalk.gnu.org/project/issue/835 Post a follow up: http://smalltalk.gnu.org/project/comments/add/835 Project: GNU Smalltalk Version: <none> Component: Base classes Category: bug reports Priority: normal Assigned to: Unassigned Reported by: sergio Updated by: mbratch Status: active This is still a problem. Here's my related example: st> m := 'a=' =~ '(.*?)=(.*)' Object: Interval new "<-0x4ce2bdf0>" error: Invalid index 1: index out of range SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254) SystemExceptions.IndexOutOfRange class>>signalOn:withIndex: (SysExcept.st:660) Interval>>first (Interval.st:245) Kernel.MatchingRegexResults>>at: (Regex.st:382) Kernel.MatchingRegexResults>>printOn: (Regex.st:305) Kernel.MatchingRegexResults(Object)>>printString (Object.st:534) Kernel.MatchingRegexResults(Object)>>printNl (Object.st:571) The regex `(.*?)=(.*)' should match `'a='` without an exception, and then `m at: 2` should yield `nil`. Full problem description is posted at Stackoverflow.com [1]. [1] http://stackoverflow.com/questions/31116594/gst-regular-expression-misma... _______________________________________________ help-smalltalk mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-smalltalk |
Free forum by Nabble | Edit this page |