Bug in Regex?

Mariano Martinez Peck
Hi guys, 

Look at this case:

`'25' matchesRegex: '([0-9]|[1-9][0-9])' -> false`
`'25' matchesRegex: '([1-9][0-9]|[0-9])' -> true`

That is, ( a | b ) is not equal to ( b | a ): "a or b" does not behave the same as "b or a". Both expressions should describe the range 0 to 99.

I don't understand why the first expression answers false. Even Pharo 1.1 answers false (so this is very, very old behavior). I checked other dialects and they answer true. I then checked here [1] and it also finds a match.

So... do you see a rational explanation, or does this sound like a bug?
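
For readers who want to repeat the cross-check outside Smalltalk, here is a minimal sketch in Python. It assumes that `re.fullmatch` is a fair stand-in for `matchesRegex:`, since both require the whole string to match. The helper name `matches_regex` is made up for this illustration and is not part of any library.

```python
import re


def matches_regex(text: str, pattern: str) -> bool:
    """Whole-string match, intended as an analogue of Pharo's matchesRegex:."""
    return re.fullmatch(pattern, text) is not None


# The two orderings from the post above.
short_first = r"([0-9]|[1-9][0-9])"
long_first = r"([1-9][0-9]|[0-9])"

print(matches_regex("25", short_first))  # True
print(matches_regex("25", long_first))   # True
```

A backtracking engine tries `[0-9]` first, fails to reach the end of the string, and then falls back to `[1-9][0-9]`, so the order of the branches should not change the answer.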


--

Re: [Glass] Bug in Regex?

Max Leske

Hi Mariano,

Yes, there is indeed such a bug (unless it has been fixed in an update to VBRegex, that is). The bug is simple to work around, though: all you have to do is sort your branch (|) terms by length. Here's the comment I've written for a regex-generating method I use:

" The sorting is very important because VB-Regex aborts early on branches.
Example: 'bl' matchesRegex: 'b|bl' --> false
The solution is to sort by length (longest first), then alphabetically, with the short group optimization at the end. "
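
The ordering rule in that comment can be sketched as follows. This is only an illustration in Python, not Max's Smalltalk method: the function name `build_alternation` is hypothetical, it handles literal terms only, and it leaves out the "short group optimization" because the thread does not describe it.

```python
import re


def build_alternation(terms):
    """Join literal terms into one alternation: longest first, then alphabetically."""
    # Sort key: negative length puts longer terms first; ties fall back to the term itself.
    ordered = sorted(set(terms), key=lambda term: (-len(term), term))
    # re.escape keeps regex metacharacters in a literal term from being interpreted.
    return "|".join(re.escape(term) for term in ordered)


pattern = build_alternation(["b", "bl"])
print(pattern)  # bl|b
```

Sorting by text length only makes sense for literal terms. For branches such as `[0-9]` versus `[1-9][0-9]`, the length of the pattern text says nothing reliable about the length of what it matches, so the order has to be chosen by hand.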


Cheers,
Max


On 21 Feb 2019, at 3:33, Mariano Martinez Peck wrote:

> On Wed, Feb 20, 2019 at 5:56 PM Esteban Maringolo via Glass <[hidden email]> wrote:
>
>> What a good case to have GToolkit visualizations help debugging this RX tree ;)
>
> Oh yeah... And look Doru, they have some stuff on the top right "Explanation" panel: https://regex101.com/r/MqVXz8/1


--

Re: [Glass] Bug in Regex?

Mariano Martinez Peck
Vassili Bykov Hi Max,

It looks like it was fixed in a later version of VBRegex. I was able to extract the two small changes needed to make it work. Do you know if there is an issue already open for this, so that I can submit the changes?

BTW, I am trying to understand the history of Regex. It's clear that the original implementation was done by Vassili Bykov, that the packages were called VBRegex, and that they lived in VW under the Public Store. I guess that, at some point, the code was ported to Pharo. In Pharo 1.1 the packages are still called 'VB-Regex', but in recent versions the package is called "Regex".

What I cannot confirm is which version was ported from VW to Pharo (or maybe to Squeak, before Pharo forked). All I could find is a class comment on RxMatcher saying "-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov". As you can see, that is from 1999. VW has newer versions, from 1.2.x to 1.4.x, with the latest commit dating from 2014.

The particular problem we are discussing in this thread was fixed in 1.3.1.

So I am not sure whether anyone ever tried porting a newer version to Pharo. It doesn't seem to be the case.

Does anyone know a bit more about the history here?

Thanks 


--

Re: [Glass] Bug in Regex?

Max Leske
On 21 Feb 2019, at 13:40, Mariano Martinez Peck wrote:

> Vassili Bykov Hi Max,
>
> It looks like it was fixed in a later version of VBRegex. I was able to extract the two small changes needed to make it work. Do you know if there is an issue already open for this, so that I can submit the changes?
>
> BTW, I am trying to understand the history of Regex. It's clear that the original implementation was done by Vassili Bykov, that the packages were called VBRegex, and that they lived in VW under the Public Store. I guess that, at some point, the code was ported to Pharo. In Pharo 1.1 the packages are still called 'VB-Regex', but in recent versions the package is called "Regex".

The package name changed between Pharo 1.1.2 and Pharo 1.2.2, as far as
I can tell from the images I have on my machine.

> What I cannot confirm is which version was ported from VW to Pharo (or maybe to Squeak, before Pharo forked). All I could find is a class comment on RxMatcher saying "-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov". As you can see, that is from 1999. VW has newer versions, from 1.2.x to 1.4.x, with the latest commit dating from 2014.

The oldest image I have is Squeak 3.9 of 7 November 2006 (update 7067), and it already contains VBRegex.

>
> The particular problem we are discussing in this thread was fixed in 1.3.1.
>
> So I am not sure whether anyone ever tried porting a newer version to Pharo. It doesn't seem to be the case.

It would certainly be great to get those fixes!




Re: [Glass] Bug in Regex?

Mariano Martinez Peck
Hi Max,

I opened this case [1] with the explanation and the fix. If you can give it a try, that would be great. I would also like to improve the test I wrote, so if you have more edge cases or ideas for assertions, let me know.
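
As a starting point for such edge cases, here is a small sketch in Python of an order-independence check. It is not the test attached to the issue; the candidate strings and the helper name `disagreements` are assumptions made for illustration, and Python's `re.fullmatch` stands in for `matchesRegex:`.

```python
import re

# Both orderings of the alternation discussed in this thread.
PATTERNS = (r"([0-9]|[1-9][0-9])", r"([1-9][0-9]|[0-9])")


def disagreements(candidates):
    """Return the candidates on which the two orderings give different answers."""
    differing = []
    for text in candidates:
        answers = {re.fullmatch(p, text) is not None for p in PATTERNS}
        if len(answers) > 1:
            differing.append(text)
    return differing


in_range = [str(n) for n in range(100)]            # "0" .. "99" must match
out_of_range = ["", "00", "05", "100", "-1", "2a"]  # must be rejected

assert all(re.fullmatch(p, t) for p in PATTERNS for t in in_range)
assert not any(re.fullmatch(p, t) for p in PATTERNS for t in out_of_range)
print(disagreements(in_range + out_of_range))  # []
```

The same idea should carry over to an SUnit test: assert every string from '0' to '99' against both orderings, and add a few near misses (empty string, leading zero, three digits) that both orderings must reject.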


Cheers, 


--

Re: [Glass] Bug in Regex?

Tim Mackinnon
Nice bit of detective work!

--

Re: [Glass] Bug in Regex?

Max Leske
On 21 Feb 2019, at 16:25, Mariano Martinez Peck wrote:

> Hi Max,
>
> I opened this case [1] with the explanation and the fix. If you can give it
> a try, that would be great. I would also like to improve the test I wrote,
> so if you have more edge cases or ideas I can assert, let me know.

Great. I'll take a look when I get a chance.

>
> [1] https://github.com/pharo-project/pharo/issues/2679