Support Regular Expressions in Trunk


Support Regular Expressions in Trunk

marcel.taeumel
Hi, there!

Do we want to support and maintain regular expressions in trunk?

http://www.squeaksource.com/Regex.html

Best,
Marcel

Re: Support Regular Expressions in Trunk

David T. Lewis
On Tue, May 12, 2015 at 01:31:14AM -0700, marcel.taeumel wrote:
> Hi, there!
>
> Do we want to support and maintain regular expressions in trunk?
>
> http://www.squeaksource.com/Regex.html
>

The latest version loads and runs fine in Squeak, and the tests pass.
Why does it need to be in trunk?

Dave



Re: Support Regular Expressions in Trunk

marcel.taeumel
Not sure. Just thinking about normal programming activities. Squeak's Strings can only do simple things. There may be some wildcard matching hidden somewhere, not sure. It just feels like regular expressions should be supported out of the box.
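
To make the gap between wildcard matching and regular expressions concrete, here is a minimal sketch. It is written in Python purely for illustration and does not show Squeak's String API or the Regex package; the helper names, file names, and pattern are made up for the example.

```python
import fnmatch
import re


def wildcard_match(pattern: str, text: str) -> bool:
    """Glob-style match: '*' is any run of characters, '?' is exactly one."""
    return fnmatch.fnmatchcase(text, pattern)


def regex_match(pattern: str, text: str) -> bool:
    """Regular expression match that must cover the whole string."""
    return re.fullmatch(pattern, text) is not None


# Hypothetical file names, used only for this illustration.
file_names = ["Regex-Core.mcz", "Regex-Tests.mcz", "notes.txt"]

# A wildcard is enough for "anything ending in .mcz".
mcz_files = [name for name in file_names if wildcard_match("*.mcz", name)]

# "A name, a dot, a numeric version, then .mcz" needs character classes
# and repetition, which plain wildcards cannot express.
VERSION_PATTERN = r"[A-Za-z-]+\.\d+\.mcz"
versioned = regex_match(VERSION_PATTERN, "VB-Regex-nice.20.mcz")
unversioned = regex_match(VERSION_PATTERN, "VB-Regex-nice.mcz")
```

The wildcard handles the "ends with" case, while constraints such as "digits only" are where a regular expression engine earns its place.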

Best,
Marcel

Re: Support Regular Expressions in Trunk

Levente Uzonyi-2
In reply to this post by marcel.taeumel
Definitely. I've been playing with the idea - while I was working on my
still unreleased DNS package - but I came to the conclusion that the VM
side has to be cleaned first.
The plugin should use the platform's support library, instead of some
~10-year-old version.
I was also playing with the idea of using the RE2 engine instead of PCRE,
but it's probably a lot more work to make it happen.

Levente

On Tue, 12 May 2015, marcel.taeumel wrote:

> Hi, there!
>
> Do we want to support and maintain regular expressions in trunk?
>
> http://www.squeaksource.com/Regex.html
>
> Best,
> Marcel


Re: Support Regular Expressions in Trunk

Hans-Martin Mosner
Am 12.05.2015 17:07, schrieb Levente Uzonyi:
> Definitely. I've been playing with the idea - while I was working on
> my still unreleased DNS package - but I came to the conclusion that
> the VM side has to be cleaned first.
> The plugin should use the platform's support library, instead of some
> ~10 years old version.
Please, no! It's much easier to work with some old library that has
consistent behaviour everywhere than it is to find and work around the
differences between platform libraries e.g. on Windows, Linux, and some
regular UNIX variant. At work I need to do that with VA Smalltalk code
page conversion (UTF-8 to ISO8859L1 and back) - code that works on the
client doesn't work on the server, so there's a special case and
workaround needed to get everything right.
The only acceptable option if you really want to use an external library
would be to include a standard regex C library with well-defined
functionality in the VM.

Cheers,
Hans-Martin


Re: Support Regular Expressions in Trunk

timrowledge
I love the idea of well-integrated expressions etc being available for string handling and in tools. The tricky bit is that ‘well-integrated’.

If at all possible it should be done without any extra primitives. Unless we have a convincing use case for such string handling needing to be frequently done and very fast I don’t see much value in adding more C code. With Cog and SISTA making the system so much faster we have to consider the possibility that doing all the marshalling, converting, stack fapping, c-code calling, returning, re-converting may take more time than leaving it to Smalltalk.
I suppose the counter-argument would be that using a standard library guarantees correct results. Except of course we have a lot of experience with supposedly standard libraries being utterly screwed. Jpeg libraries, anyone? ALSA?

As for the actual expression grammar, I imagine most people would be assuming the unix regexp form which has the advantage that a lot of people are familiar with it. And the disadvantage that a very large number of people are not.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Went to the dentist to have his cranial cavity filled.




Re: Support Regular Expressions in Trunk

David T. Lewis
On Tue, May 12, 2015 at 10:27:20AM -0700, tim Rowledge wrote:
> I love the idea of well-integrated expressions etc being available for string handling and in tools. The tricky bit is that ‘well-integrated’.
>
> If at all possible it should be done without any extra primitives. Unless we have a convincing use case for such string handling needing to be frequently done and very fast I don’t see much value in adding more C code. With Cog and SISTA making the system so much faster we have to consider the possibility that doing all the marshalling, converting, stack fapping, c-code calling, returning, re-converting may take more time than leaving it to Smalltalk.
> I suppose the counter-argument would be that using a standard library guarantees correct results. Except of course we have a lot of experience with supposedly standard libraries being utterly screwed. Jpeg libraries, anyone? ALSA?
>
> As for the actual expression grammar, I imagine most people would be assuming the unix regexp form which has the advantage that a lot of people are familiar with it. And the disadvantage that a very large number of people are not.
>

There are two distinct packages.

The package at www.squeaksource.com/Regex is the Vassili Bykov implementation.
It requires no plugin support or special primitives.

-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
-- See `documentation' protocol of RxParser class for user's guide.

The PCRE package is Andrew Greenberg's Perl compatible regular expressions.
This uses the RePlugin plugin that can be found in the VMMaker package.
RePlugin is an internal plugin in the Unix VMs (interpreter/cog/spur), I'm
not sure about other platforms or SqueakJS.

  http://wiki.squeak.org/squeak/558

Dave
 


Re: Support Regular Expressions in Trunk

Tobias Pape

On 13.05.2015, at 00:17, David T. Lewis <[hidden email]> wrote:

> On Tue, May 12, 2015 at 10:27:20AM -0700, tim Rowledge wrote:
>> I love the idea of well-integrated expressions etc being available for string handling and in tools. The tricky bit is that ‘well-integrated’.
>>
>> If at all possible it should be done without any extra primitives. Unless we have a convincing use case for such string handling needing to be frequently done and very fast I don’t see much value in adding more C code. With Cog and SISTA making the system so much faster we have to consider the possibility that doing all the marshalling, converting, stack fapping, c-code calling, returning, re-converting may take more time than leaving it to Smalltalk.
>> I suppose the counter-argument would be that using a standard library guarantees correct results. Except of course we have a lot of experience with supposedly standard libraries being utterly screwed. Jpeg libraries, anyone? ALSA?
>>
>> As for the actual expression grammar, I imagine most people would be assuming the unix regexp form which has the advantage that a lot of people are familiar with it. And the disadvantage that a very large number of people are not.
>>
>
> There are two distinct packages.
>
> The package at www.squeaksource.com/Regex is the Vassili Bykov implementation.

To quote from that page:

NB.
For Pharo users, this package is maintained as part of the core images. For Squeak users, the Pharo repositories might have recent snapshots that weren't actively tested on other branches of Squeak (yet).

Just to note that.


> It requires no plugin support or special primitives.
>
> -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
> -- See `documentation' protocol of RxParser class for user's guide.
>
> The PCRE package is Andrew Greenberg's Perl compatible regular expressions.
> This uses the RePlugin plugin that can be found in the VMMaker package.
> RePlugin is an internal plugin in the Unix VMs (interpreter/cog/spur), I'm
> not sure about other platforms or SqueakJS.
>
> http://wiki.squeak.org/squeak/558
>
> Dave




Re: Support Regular Expressions in Trunk

David T. Lewis
In reply to this post by David T. Lewis
On Tue, May 12, 2015 at 06:17:09PM -0400, David T. Lewis wrote:

> On Tue, May 12, 2015 at 10:27:20AM -0700, tim Rowledge wrote:
> > I love the idea of well-integrated expressions etc being available for string handling and in tools. The tricky bit is that ‘well-integrated’.
> >
> > If at all possible it should be done without any extra primitives. Unless we have a convincing use case for such string handling needing to be frequently done and very fast I don’t see much value in adding more C code. With Cog and SISTA making the system so much faster we have to consider the possibility that doing all the marshalling, converting, stack fapping, c-code calling, returning, re-converting may take more time than leaving it to Smalltalk.
> > I suppose the counter-argument would be that using a standard library guarantees correct results. Except of course we have a lot of experience with supposedly standard libraries being utterly screwed. Jpeg libraries, anyone? ALSA?
> >
> > As for the actual expression grammar, I imagine most people would be assuming the unix regexp form which has the advantage that a lot of people are familiar with it. And the disadvantage that a very large number of people are not.
> >
>
> There are two distinct packages.
>
> The package at www.squeaksource.com/Regex is the Vassili Bykov implementation.
> It requires no plugin support or special primitives.
>
> -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
> -- See `documentation' protocol of RxParser class for user's guide.
>
> The PCRE package is Andrew Greenberg's Perl compatible regular expressions.
> This uses the RePlugin plugin that can be found in the VMMaker package.
> RePlugin is an internal plugin in the Unix VMs (interpreter/cog/spur), I'm
> not sure about other platforms or SqueakJS.
>
>   http://wiki.squeak.org/squeak/558
>

Actually there are probably more than just two. SqueakMap also has BRegexp:

  Description:
    BRegexp is Perl5 compatible regular expression library for Squeak.
    -Supports multilingualized Squeak.
    -Supports only Windows environment.

  Home page:
    http://kminami.fc2web.com/Squeak/Goodies/BRegexp/index.html

SqueakMap has the Andrew Greenberg PCRE as "Regular Expression Plugin"
which seems to load despite an error that pops up in the SAR postscript
processing.

I do not see an entry in SqueakMap for the Vassili Bykov Regex package.

I am fairly sure that regex enthusiasts will have excellent reasons for
favoring the Perl compatible PCRE in some cases (fast, powerful, well
documented), and equally excellent reasons to favor the Vassili Bykov
implementation in other cases (portable, all Smalltalk, works in SqueakJS).

So it seems to me that both of these excellent packages deserve good
entries in SqueakMap to make them easy to find, and easy to load in
the latest Squeak images. If that is done, then I cannot think of any
reason that either one of them would need to be maintained within trunk.

Or to say it another way: If these packages are not important enough to
make them easily loadable from SqueakMap, then they are not very important.
If they are not very important, then we should not put them into trunk.
So please make them loadable :-)

Dave



Re: Support Regular Expressions in Trunk

Levente Uzonyi-2


On Tue, 12 May 2015, David T. Lewis wrote:

> On Tue, May 12, 2015 at 06:17:09PM -0400, David T. Lewis wrote:
>> On Tue, May 12, 2015 at 10:27:20AM -0700, tim Rowledge wrote:
>>> I love the idea of well-integrated expressions etc being available for string handling and in tools. The tricky bit is that ‘well-integrated’.
>>>
>>> If at all possible it should be done without any extra primitives. Unless we have a convincing use case for such string handling needing to be frequently done and very fast I don’t see much value in adding more C code. With Cog and SISTA making the system so much faster we have to consider the possibility that doing all the marshalling, converting, stack fapping, c-code calling, returning, re-converting may take more time than leaving it to Smalltalk.
>>> I suppose the counter-argument would be that using a standard library guarantees correct results. Except of course we have a lot of experience with supposedly standard libraries being utterly screwed. Jpeg libraries, anyone? ALSA?
>>>
>>> As for the actual expression grammar, I imagine most people would be assuming the unix regexp form which has the advantage that a lot of people are familiar with it. And the disadvantage that a very large number of people are not.
>>>
>>
>> There are two distinct packages.
>>
>> The package at www.squeaksource.com/Regex is the Vassili Bykov implementation.
>> It requires no plugin support or special primitives.
>>
>> -- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
>> -- See `documentation' protocol of RxParser class for user's guide.
>>
>> The PCRE package is Andrew Greenberg's Perl compatible regular expressions.
>> This uses the RePlugin plugin that can be found in the VMMaker package.
>> RePlugin is an internal plugin in the Unix VMs (interpreter/cog/spur), I'm
>> not sure about other platforms or SqueakJS.
>>
>>   http://wiki.squeak.org/squeak/558
>>
>
> Actually there are probably more than just two. SqueakMap also has BRegexp:
>
>  Description:
>    BRegexp is Perl5 compatible regular expression library for Squeak.
>    -Supports multilingualized Squeak.
>    -Supports only Windows environment.
>
>  Home page:
>    http://kminami.fc2web.com/Squeak/Goodies/BRegexp/index.html
>
> SqueakMap has the Andrew Greenberg PCRE as "Regular Expression Plugin"
> which seems to load despite an error that pops up in the SAR postscript
> processing.
>
> I do not see an entry in SqueakMap for the Vassili Bykov Regex package.
>
> I am fairly sure that regex enthusiasts will have excellent reasons for
> favoring the Perl compatible PCRE in some cases (fast, powerful, well
> documented), and equally excellent reasons to favor the Vassili Bykov
> implementation in other cases (portable, all Smalltalk, works in SqueakJS).
>
> So it seems to me that both of these excellent packages deserve good
> entries in SqueakMap to make them easy to find, and easy to load in
> the latest Squeak images. If that is done, then I cannot think of any
> reason that either one of them would need to be maintained within trunk.

If the goal is to use regular expressions in Trunk, then those packages
have to be in Trunk, even if they are not maintained there.

Levente

>
> Or to say it another way: If these packages are not important enough to
> make them easily loadable from SqueakMap, then they are not very important.
> If they are not very important, then we should not put them into trunk.
> So please make them loadable :-)
>
> Dave
>
>
>


Re: Support Regular Expressions in Trunk

Levente Uzonyi-2
In reply to this post by Hans-Martin Mosner
On Tue, 12 May 2015, Hans-Martin Mosner wrote:

> Am 12.05.2015 17:07, schrieb Levente Uzonyi:
>> Definitely. I've been playing with the idea - while I was working on
>> my still unreleased DNS package - but I came to the conclusion that
>> the VM side has to be cleaned first.
>> The plugin should use the platform's support library, instead of some
>> ~10 years old version.
> Please, no! It's much easier to work with some old library that has
> consistent behaviour everywhere than it is to find and work around the
> differences between platform libraries e.g. on Windows, Linux, and some
> regular UNIX variant. At work I need to do that with VA Smalltalk code page
> conversion (UTF-8 to ISO8859L1 and back) - code that works on the client
> doesn't work on the server, so there's a special case and workaround needed
> to get everything right.
> The only acceptable option if you really want to use an external library
> would be to include a standard regex C library with well-defined
> functionality in the VM.

Are you saying that the PCRE library behaves differently on different
platforms?

Levente

>
> Cheers,
> Hans-Martin
>
>


Re: Support Regular Expressions in Trunk

Nicolas Cellier
Note that, at a time when I thought that VB-Regex would be a good candidate for inclusion in trunk,
I put some updates in the inbox, backporting some corrections that were made in the VisualWorks or Pharo branches:
http://source.squeak.org/inbox/VB-Regex-nice.20.mcz

The Pharo version has evolved since then, with more functionality...

2015-05-13 3:02 GMT+02:00 Levente Uzonyi <[hidden email]>:
> On Tue, 12 May 2015, Hans-Martin Mosner wrote:
>
>> Am 12.05.2015 17:07, schrieb Levente Uzonyi:
>>> Definitely. I've been playing with the idea - while I was working on
>>> my still unreleased DNS package - but I came to the conclusion that
>>> the VM side has to be cleaned first.
>>> The plugin should use the platform's support library, instead of some
>>> ~10 years old version.
>> Please, no! It's much easier to work with some old library that has consistent behaviour everywhere than it is to find and work around the differences between platform libraries e.g. on Windows, Linux, and some regular UNIX variant. At work I need to do that with VA Smalltalk code page conversion (UTF-8 to ISO8859L1 and back) - code that works on the client doesn't work on the server, so there's a special case and workaround needed to get everything right.
>> The only acceptable option if you really want to use an external library would be to include a standard regex C library with well-defined functionality in the VM.
>
> Are you saying that the PCRE library behaves differently on different platforms?
>
> Levente
>
>> Cheers,
>> Hans-Martin







Re: Support Regular Expressions in Trunk

Hans-Martin Mosner
In reply to this post by Levente Uzonyi-2
Am 13.05.2015 03:02, schrieb Levente Uzonyi:
> On Tue, 12 May 2015, Hans-Martin Mosner wrote:
>
>> Am 12.05.2015 17:07, schrieb Levente Uzonyi:
...

>>> The plugin should use the platform's support library, instead of some
>>> ~10 years old version.
>> Please, no! It's much easier to work with some old library that has
>> consistent behaviour everywhere than it is to find and work around the
>> differences between platform libraries e.g. on Windows, Linux, and
>> some regular UNIX variant. At work I need to do that with VA Smalltalk
>> code page conversion (UTF-8 to ISO8859L1 and back) - code that works
>> on the client doesn't work on the server, so there's a special case
>> and workaround needed to get everything right.
>> The only acceptable option if you really want to use an external
>> library would be to include a standard regex C library with
>> well-defined functionality in the VM.
>
> Are you saying that the PCRE library behaves differently on different
> platforms?
>

Nope, PCRE should behave consistently across platforms if the same
version is installed on those platforms (although I'd be somewhat unsure
about its handling of line ending conventions and unicode/different
character sets on different platforms.)
But I was responding to "platform's support library" which might be
anything from PCRE through some POSIX compatible regex (from the GNU C
library or some other implementation) to anything else that might be
there (see http://unixpapa.com/incnote/regex.html for a list of
possibilities).
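
One well-known way regex flavours can disagree is alternation: POSIX-style engines return the leftmost-longest match, while Perl-compatible engines return the first alternative that matches at the leftmost position. This particular difference is my choice of example and is not named in the thread. The sketch below is in Python, whose `re` module follows the Perl-style rule; the POSIX rule is only simulated, for literal alternatives, by a hypothetical helper.

```python
import re


def perl_style_first(alternatives, text):
    """Leftmost-first: at the leftmost match position, the first listed
    alternative that matches wins (this is how Python's re behaves)."""
    pattern = "|".join(re.escape(alt) for alt in alternatives)
    match = re.search(pattern, text)
    return match.group(0) if match else None


def posix_style_longest(alternatives, text):
    """Leftmost-longest: at the leftmost match position, the longest
    matching alternative wins. Simulated here for literal strings only."""
    for start in range(len(text)):
        candidates = [alt for alt in alternatives if text.startswith(alt, start)]
        if candidates:
            return max(candidates, key=len)
    return None


# The same pattern and input, two different answers.
alternatives = ["a", "ab"]
first_result = perl_style_first(alternatives, "xab")
longest_result = posix_style_longest(alternatives, "xab")
```

Code that depends on which substring is returned would therefore behave differently depending on which library the platform happens to provide.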

Cheers,
Hans-Martin


Re: Support Regular Expressions in Trunk

Tobias Pape
In reply to this post by Nicolas Cellier

On 13.05.2015, at 08:23, Nicolas Cellier <[hidden email]> wrote:

> Note that at a time when I thought that VB-Regex would be a good candidate for inclusion in trunk,
> I put some updates in the inbox backporting some corrections that happened in Visualworks or Pharo branches
> http://source.squeak.org/inbox/VB-Regex-nice.20.mcz
>
> The Pharo version has evolved since with more functionalities...

Any reason not to include that version?
(whether now[1] or after the release)

Best regards
-Tobias

[1]: but I thought we had feature freeze?


> 2015-05-13 3:02 GMT+02:00 Levente Uzonyi <[hidden email]>:
>> On Tue, 12 May 2015, Hans-Martin Mosner wrote:
>>
>>> Am 12.05.2015 17:07, schrieb Levente Uzonyi:
>>>> Definitely. I've been playing with the idea - while I was working on
>>>> my still unreleased DNS package - but I came to the conclusion that
>>>> the VM side has to be cleaned first.
>>>> The plugin should use the platform's support library, instead of some
>>>> ~10 years old version.
>>> Please, no! It's much easier to work with some old library that has consistent behaviour everywhere than it is to find and work around the differences between platform libraries e.g. on Windows, Linux, and some regular UNIX variant. At work I need to do that with VA Smalltalk code page conversion (UTF-8 to ISO8859L1 and back) - code that works on the client doesn't work on the server, so there's a special case and workaround needed to get everything right.
>>> The only acceptable option if you really want to use an external library would be to include a standard regex C library with well-defined functionality in the VM.
>>
>> Are you saying that the PCRE library behaves differently on different platforms?
>>
>> Levente
>>
>>> Cheers,
>>> Hans-Martin









Re: Support Regular Expressions in Trunk

marcel.taeumel
Sure. This discussion is beyond 4.6/5.0 :)

Best,
Marcel

Re: Support Regular Expressions in Trunk

Levente Uzonyi-2
In reply to this post by Hans-Martin Mosner
On Wed, 13 May 2015, Hans-Martin Mosner wrote:

> Am 13.05.2015 03:02, schrieb Levente Uzonyi:
>> On Tue, 12 May 2015, Hans-Martin Mosner wrote:
>>
>>> Am 12.05.2015 17:07, schrieb Levente Uzonyi:
> ...
>>>> The plugin should use the platform's support library, instead of some
>>>> ~10 years old version.
>>> Please, no! It's much easier to work with some old library that has
>>> consistent behaviour everywhere than it is to find and work around the
>>> differences between platform libraries e.g. on Windows, Linux, and some
>>> regular UNIX variant. At work I need to do that with VA Smalltalk code
>>> page conversion (UTF-8 to ISO8859L1 and back) - code that works on the
>>> client doesn't work on the server, so there's a special case and
>>> workaround needed to get everything right.
>>> The only acceptable option if you really want to use an external library
>>> would be to include a standard regex C library with well-defined
>>> functionality in the VM.
>>
>> Are you saying that the PCRE library behaves differently on different
>> platforms?
>>
>
> Nope, PCRE should behave consistently across platforms if the same version is
> installed on those platforms (although I'd be somewhat unsure about its
> handling of line ending conventions and unicode/different character sets on
> different platforms.)
> But I was responding to "platform's support library" which might be anything
> from PCRE through some POSIX compatible regex (from the GNU C library or some
> other implementation) to anything else that might be there (see
> http://unixpapa.com/incnote/regex.html for a list of possibilities).

The RePlugin is a wrapper for the PCRE library, so what I'm saying is that
we should use the system's installed PCRE library (or statically link an
up-to-date version), instead of a really old fork of it.
The PCRE API won't change anymore, since PCRE2 is here.

Levente

>
> Cheers,
> Hans-Martin
>
>


Re: Support Regular Expressions in Trunk

David T. Lewis
> On Wed, 13 May 2015, Hans-Martin Mosner wrote:
>
>> Am 13.05.2015 03:02, schrieb Levente Uzonyi:
>>> On Tue, 12 May 2015, Hans-Martin Mosner wrote:
>>>
>>>> Am 12.05.2015 17:07, schrieb Levente Uzonyi:
>> ...
>>>>> The plugin should use the platform's support library, instead of some
>>>>> ~10 years old version.
>>>> Please, no! It's much easier to work with some old library that has
>>>> consistent behaviour everywhere than it is to find and work around the
>>>> differences between platform libraries e.g. on Windows, Linux, and
>>>> some
>>>> regular UNIX variant. At work I need to do that with VA Smalltalk code
>>>> page conversion (UTF-8 to ISO8859L1 and back) - code that works on the
>>>> client doesn't work on the server, so there's a special case and
>>>> workaround needed to get everything right.
>>>> The only acceptable option if you really want to use an external
>>>> library
>>>> would be to include a standard regex C library with well-defined
>>>> functionality in the VM.
>>>
>>> Are you saying that the PCRE library behaves differently on different
>>> platforms?
>>>
>>
>> Nope, PCRE should behave consistently across platforms if the same
>> version is
>> installed on those platforms (although I'd be somewhat unsure about its
>> handling of line ending conventions and unicode/different character sets
>> on
>> different platforms.)
>> But I was responding to "platform's support library" which might be
>> anything
>> from PCRE through some POSIX compatible regex (from the GNU C library or
>> some
>> other implementation) to anything else that might be there (see
>> http://unixpapa.com/incnote/regex.html for a list of possibilities).
>
> The RePlugin is a wrapper for the PCRE library, so what I'm saying is that
> we should use the system's installed PCRE library (or statically link an
> up-to-date version), instead of a really old fork of it.
> The PCRE API won't change anymore, since PCRE2 is here.
>

Yes, this is important for Linux distro maintainers.

  http://bugs.squeak.org/view.php?id=7539

Dave