[bug] (a RegexResults) at: n throws error if captured at n is empty string

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[bug] (a RegexResults) at: n throws error if captured at n is empty string

sergio-2
Issue status update for
http://smalltalk.gnu.org/node/835
Post a follow up:
http://smalltalk.gnu.org/project/comments/add/835

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
 Assigned to:  Unassigned
 Reported by:  sergio
 Updated by:   sergio
 Status:       active

Small code will be best description:


    * !/usr/local/bin/gst -f


 data := 'word1 "" word3'.

 " next regex catches whatever is between quotes;
   when there is something it works; when there is not - later
   attempt to access matches at: 2 throws "
 
 (data =~ '(\w+) "([^"]*)" (\w+)')
   ifMatched: [ :matches |  
     | w2 mm |
     Transcript
       show: 'Matches size is: ', matches size asString ;  
       nl ;  
       flush.
     
     " w2 := matches at: 2.    // will throw "
     " mm := matches asArray.  // will throw as it internally calls at:
"
     
     " next cycle will throw on 2nd iteration: "
     1 to: matches size do: [ :n |
       Transcript show: 'Matches at: ' , n asString , ' is: ['
           , (matches at: n) , ']' ;  nl ;  flush ] ].

Output is:

$ ./test1
Matches size is: 3
Matches at: 1 is: [word1]
Object: Interval new "<0x2aaaac9b7870>" error: Invalid index 1: index
out of range
SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254)
SystemExceptions.IndexOutOfRange class>>signalOn:withIndex:
(SysExcept.st:660)
Interval>>first (Interval.st:245)
Kernel.MatchingRegexResults>>at: (Regex.st:382)
optimized [] in UndefinedObject>>executeStatements (test1:21)
Kernel.MatchingRegexResults>>ifNotMatched:ifMatched: (Regex.st:322)
Kernel.MatchingRegexResults(RegexResults)>>ifMatched: (Regex.st:188)
UndefinedObject>>executeStatements (test1:7)

I believe it is regress, it worked for me in gst that was... some years
ago, perhaps in Debian 6. I can not say what version it was.



_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [bug] (a RegexResults) at: n throws error if captured at n is empty string

Holger Freyther
On Fri, Dec 06, 2013 at 03:47:24PM -0700, sergio wrote:

Hi!


 data := 'word1 "" word3'.
 (data =~ '(\w+) "([^"]*)" (\w+)') printString

is already enough to re-produce the issue. For some reason registers
contains three intervals and the second one is an empty one.

 st> (match instVarNamed: #registers) second inspect
 An instance of Interval
   start: 8
   stop: 7
   step: 1
   contents: [
   ]
 Interval()

I have not worked much with the regexp code. The issue already appears
with GST 3.2. Could you check if you can find a working version of GST?


thanks
        holger

_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [bug] (a RegexResults) at: n throws error if captured at n is empty string

Holger Freyther
On Sat, Dec 07, 2013 at 07:10:08PM +0100, Holger Hans Peter Freyther wrote:

> I have not worked much with the regexp code. The issue already appears
> with GST 3.2. Could you check if you can find a working version of GST?

I installed lenny (GST 3.0) and it already has the bug and on etch
(GST 2.X) the code doesn't work like this. :)

I don't really know the regexp code but I will have a look right now.
Could you please provide me with your expectation for matches 1, 2 and 3?

holger


_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [bug] (a RegexResults) at: n throws error if captured at n is empty string

Holger Freyther
On Sat, Dec 07, 2013 at 10:32:58PM +0100, Holger Hans Peter Freyther wrote:

> I don't really know the regexp code but I will have a look right now.
> Could you please provide me with your expectation for matches 1, 2 and 3?

Attached is a work-around + test-case. Could you have a look and give it
a try?

holger

_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk

0001-kernel-Fix-IndexOutOfRange-with-empty-matches-in-the.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [bug] (a RegexResults) at: n throws error if captured at n is empty string

sergio-2
In reply to this post by Holger Freyther
Hi

07.12.2013, 22:10, "Holger Hans Peter Freyther" <[hidden email]>:
> On Fri, Dec 06, 2013 at 03:47:24PM -0700, sergio wrote:
>  data := 'word1 "" word3'.
>  (data =~ '(\w+) "([^"]*)" (\w+)') printString
>
> is already enough to re-produce the issue. For some reason registers
> contains three intervals and the second one is an empty one.

Yes, they should contain three. In bug report I said inaccurately, sorry. This regex should catch three pieces of the string, just for example.
There are no any problems with 1st and 3rd ones, and the 2nd is also, I believe, captured as expected. But we can not access any
captured fragment, if it is an empty string - attempt to access leads to an error throwing...
If another input string is passed for example: 'word1 "word2" word3', regex will catch and return substrings: 'word1', 'word2' (without double-quotes) and 'word3'. That is o`key, and there is no throwing of errors on access in such case. Problem only with empty captured string.

As far as I understand, aString =~ operation returns subclass of RegexResults, if =~ "matches" than an instance of Kernel.MatchingRegexResults class is returned.
Kernel.MatchingRegexResults >> printOn: method defined there internally does:

(some source lines skipped)
1 to: self size
            do:
                [:each |
                aStream
                    nextPut: ch;
                    print: (self at: each).
                ch := $,].

So "self at: " is called here too.

I tried to "guess a fix" for issue (half-blindly, as I am novice to Smalltalk and I did not look deeply into sources and those written-in-C parts called by regex-related methods), I had edited Kernel.MatchingRegexResults >> at: a bit.
Originally (for 3.2.5) method looks like:

    at: anIndex [
        <category: 'accessing'>
        | reg text |
        anIndex = 0 ifTrue: [^self match].
        cache isNil ifTrue: [cache := Array new: registers size].
        (cache at: anIndex) isNil
            ifTrue:
                [reg := registers at: anIndex.
                text := reg isNil
                            ifTrue: [nil]
                            ifFalse: [self subject copyFrom: reg first to: reg last].
                cache at: anIndex put: text].
        ^cache at: anIndex
    ]

I had changed it to be:

    at: anIndex [
        <category: 'accessing'>
        | reg text |
        anIndex = 0 ifTrue: [^self match].
        cache isNil ifTrue: [cache := Array new: registers size].
        (cache at: anIndex) isNil
            ifTrue:
                [reg := registers at: anIndex.
                text := reg isNil
                            ifTrue: [nil]
                            ifFalse: [
                                reg isEmpty      " <<< changed here "
                                        ifFalse: [self subject copyFrom: reg first to: reg last]
                                        ifTrue: [ '' ] ].
                cache at: anIndex put: text].
        ^cache at: anIndex
    ]

It seems, after this change an issue has gone - I can access empty sub-matches.



By the way, another issue I've found - method String >> escapeRegex does not escape curly braces {}, while I suppose it should do.
(Unfortunately http://smalltalk.gnu.org/node/add/project-issue server is terribly slow and most of times I got just "service unavailable" trying to submit issue during last three days or so.)

Excuse me for slow response.

P.S.
I have received your last message now, I see you are offering similar fix, but as I understand, your code will return nil in case of empty-string-match (?). I suppose it should return empty string, isn't it?
I'll try to apply your patch, if I succeed in that I'll let you know (tomorrow).

Sorry for my english also. :)

--
/sergio

_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [bug] (a RegexResults) at: n throws error if captured at n is empty string

sergio-2
In reply to this post by Holger Freyther
08.12.2013, 23:30, "Holger Hans Peter Freyther" <[hidden email]>:
> On Sat, Dec 07, 2013 at 10:32:58PM +0100, Holger Hans Peter Freyther wrote:
>
> Attached is a work-around + test-case. Could you have a look and give it
> a try?

I guess I need a version from git to try it? As for 3.2.5 only kernel/Regex.st part of patch can be applied to. Yes, it works.


But I still think the empty string as return is more consistent design.

Here is an example code demonstrating when captured sub-matches contain nils:

  'Mary Jane Emily' =~ '(\w+)( \w+)?( \w+)?( \w+)?( \w+)?' ifMatched: [ :mm |
    Transcript show: 'Matches size: ' , mm size asString ;  nl.
    1 to: mm size do: [ :n | Transcript show: 'Matches at: ' , n asString , ' is: ' , (mm at: n) printString ;  nl ] ].

Here all the groups in braces except the first one are "optional". 2nd and 3rd match to Jane and Emily, 4th and 5th stay "unused".
Output:

Matches size: 5
Matches at: 1 is: 'Mary'
Matches at: 2 is: ' Jane'
Matches at: 3 is: ' Emily'
Matches at: 4 is: nil
Matches at: 5 is: nil

That is difference. For "optional catcher" that has NO match and stay unused nil is returned. If empty sequence is successfully matched empty sequence is returned. It's just my vision though.

--
/sergio

_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

[PATCH] kernel: Fix IndexOutOfRange with empty matches in the regexp

Holger Freyther
It is possible to have a non nil Interval that is empty. Make
the code work on isNil or isEmpty to avoid the exception.

Fixes:
SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254)
SystemExceptions.IndexOutOfRange class>>signalOn:withIndex: (SysExcept.st:660)
Interval>>first (Interval.st:245)
Kernel.MatchingRegexResults>>at: (Regex.st:382)

2013-12-08  Holger Hans Peter Freyther  <[hidden email]>

        * kernel/Regex.st: Check for isEmpty of the Interval before
        trying to use it.

2013-12-08  Holger Hans Peter Freyther <[hidden email]>

        * kernel/RegexpTests.st: Add tests for Regexp.
---
 ChangeLog                                   |  5 +++
 kernel/Regex.st                             |  7 +++--
 packages/kernel-tests/ChangeLog             |  4 +++
 packages/kernel-tests/Makefile.frag         |  2 +-
 packages/kernel-tests/kernel/RegexpTests.st | 48 +++++++++++++++++++++++++++++
 packages/kernel-tests/package.xml           |  2 ++
 6 files changed, 65 insertions(+), 3 deletions(-)
 create mode 100644 packages/kernel-tests/kernel/RegexpTests.st

diff --git a/ChangeLog b/ChangeLog
index 68c511a..7b6f5e8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2013-12-08  Holger Hans Peter Freyther  <[hidden email]>
+
+ * kernel/Regex.st: Check for isEmpty of the Interval before
+ trying to use it.
+
 2013-12-11  Lee Duhem  <[hidden email]>
 
  * kernel/MappedColl.st:  Fix variable name typos in reject: and
diff --git a/kernel/Regex.st b/kernel/Regex.st
index 91a4ab3..5bc1046 100644
--- a/kernel/Regex.st
+++ b/kernel/Regex.st
@@ -377,9 +377,12 @@ RegexResults subclass: MatchingRegexResults [
  (cache at: anIndex) isNil
     ifTrue:
  [reg := registers at: anIndex.
- text := reg isNil
+ text := reg isNil
     ifTrue: [nil]
-    ifFalse: [self subject copyFrom: reg first to: reg last].
+    ifFalse: [
+ reg isEmpty
+ ifTrue: ['']
+ ifFalse: [self subject copyFrom: reg first to: reg last]].
  cache at: anIndex put: text].
  ^cache at: anIndex
     ]
diff --git a/packages/kernel-tests/ChangeLog b/packages/kernel-tests/ChangeLog
index ce2e127..532f991 100644
--- a/packages/kernel-tests/ChangeLog
+++ b/packages/kernel-tests/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-08  Holger Hans Peter Freyther  <[hidden email]>
+
+ * kernel/RegexpTests.st: Add tests for Regexp.
+
 2013-12-13  Holger Hans Peter Freyther  <[hidden email]>
 
  * kernel/MappedCollectionTests.st: Add tests for MappedCollection.
diff --git a/packages/kernel-tests/Makefile.frag b/packages/kernel-tests/Makefile.frag
index 1d19d61..e44befc 100644
--- a/packages/kernel-tests/Makefile.frag
+++ b/packages/kernel-tests/Makefile.frag
@@ -1,5 +1,5 @@
 Kernel-Tests_FILES = \
-packages/kernel-tests/ChangeLog packages/kernel-tests/kernel/CompiledMethodTests.st packages/kernel-tests/kernel/ContextPartTests.st packages/kernel-tests/kernel/MappedCollectionTests.st packages/kernel-tests/kernel/ObjectTests.st
+packages/kernel-tests/ChangeLog packages/kernel-tests/kernel/CompiledMethodTests.st packages/kernel-tests/kernel/ContextPartTests.st packages/kernel-tests/kernel/MappedCollectionTests.st packages/kernel-tests/kernel/ObjectTests.st packages/kernel-tests/kernel/RegexpTests.st
 $(Kernel-Tests_FILES):
 $(srcdir)/packages/kernel-tests/stamp-classes: $(Kernel-Tests_FILES)
  touch $(srcdir)/packages/kernel-tests/stamp-classes
diff --git a/packages/kernel-tests/kernel/RegexpTests.st b/packages/kernel-tests/kernel/RegexpTests.st
new file mode 100644
index 0000000..a239756
--- /dev/null
+++ b/packages/kernel-tests/kernel/RegexpTests.st
@@ -0,0 +1,48 @@
+TestCase subclass: TestRegexp [
+
+    testEmptyMatch [
+        | data match |
+
+ data := '""'.
+ match := (data =~ '"([^"]*)"').
+
+        "Check that it has matched"
+        self assert: match class equals: Kernel.MatchingRegexResults.
+        self assert: match matched.
+        self assert: match size = 1.
+        self assert: (match at: 1) equals: ''.
+
+        "Check if an exception is thrown"
+        self shouldnt: [match printString] raise: Exception.
+    ]
+
+    testMatch [
+        | data match |
+
+ data := '"A"'.
+ match := (data =~ '"([^"]*)"').
+
+        "Check that it has matched"
+        self assert: match class equals: Kernel.MatchingRegexResults.
+        self assert: match matched.
+        self assert: match size = 1.
+        self assert: (match at: 1) equals: 'A'.
+
+        "Check if an exception is thrown"
+        self shouldnt: [match printString] raise: Exception.
+    ]
+
+    testOptionalMatch [
+ | match |
+        match := 'Mary Jane Emily' =~ '(\w+)( \w+)?( \w+)?( \w+)?( \w+)?'.
+
+ self assert: match class equals: Kernel.MatchingRegexResults.
+ self assert: match matched.
+ self assert: match size = 5.
+ self assert: (match at: 1) equals: 'Mary'.
+ self assert: (match at: 2) equals: ' Jane'.
+ self assert: (match at: 3) equals: ' Emily'.
+ self assert: (match at: 4) equals: nil.
+ self assert: (match at: 5) equals: nil.
+    ]
+]
diff --git a/packages/kernel-tests/package.xml b/packages/kernel-tests/package.xml
index 7699d23..18681bb 100644
--- a/packages/kernel-tests/package.xml
+++ b/packages/kernel-tests/package.xml
@@ -6,10 +6,12 @@
    <sunit>TestContextPart</sunit>
    <sunit>TestMappedCollection</sunit>
    <sunit>TestObject</sunit>
+   <sunit>TestRegexp</sunit>
    <filein>kernel/CompiledMethodTests.st</filein>
    <filein>kernel/ContextPartTests.st</filein>
    <filein>kernel/MappedCollectionTests.st</filein>
    <filein>kernel/ObjectTests.st</filein>
+   <filein>kernel/RegexpTests.st</filein>
   </test>
 
   <file>ChangeLog</file>
--
1.8.4.rc3


_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk
Reply | Threaded
Open this post in threaded view
|

Re: [bug] aRegexResults at: n throws error if the captured at n is empty string

mbratch
In reply to this post by sergio-2
Issue status update for
http://smalltalk.gnu.org/project/issue/835
Post a follow up:
http://smalltalk.gnu.org/project/comments/add/835

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
 Assigned to:  Unassigned
 Reported by:  sergio
 Updated by:   mbratch
 Status:       active

This is still a problem. Here's my related example:

st> m := 'a=' =~ '(.*?)=(.*)'
Object: Interval new "<-0x4ce2bdf0>" error: Invalid index 1: index out
of range
SystemExceptions.IndexOutOfRange(Exception)>>signal (ExcHandling.st:254)
SystemExceptions.IndexOutOfRange class>>signalOn:withIndex:
(SysExcept.st:660)
Interval>>first (Interval.st:245)
Kernel.MatchingRegexResults>>at: (Regex.st:382)
Kernel.MatchingRegexResults>>printOn: (Regex.st:305)
Kernel.MatchingRegexResults(Object)>>printString (Object.st:534)
Kernel.MatchingRegexResults(Object)>>printNl (Object.st:571)


The regex `(.*?)=(.*)' should match `'a='` without an exception, and
then `m at: 2` should yield `nil`.

Full problem description is posted at Stackoverflow.com [1].
[1]
http://stackoverflow.com/questions/31116594/gst-regular-expression-misma...



_______________________________________________
help-smalltalk mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-smalltalk