Hi, there!
Do we want to support and maintain regular expressions in trunk? http://www.squeaksource.com/Regex.html Best, Marcel |
On Tue, May 12, 2015 at 01:31:14AM -0700, marcel.taeumel wrote:
> Hi, there!
>
> Do we want to support and maintain regular expressions in trunk?
>
> http://www.squeaksource.com/Regex.html

The latest version loads and runs fine in Squeak, and the tests pass. Why does it need to be in trunk?

Dave |
Not sure. Just thinking about normal programming activities. Squeak's Strings can only do simple things. There may be some wildcard matching hidden, not sure. It just feels like regular expressions should be supported out-of-the-box.
Best, Marcel |
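For readers who have not used either facility, here is a minimal workspace sketch of the difference alluded to above. It assumes a Squeak image with the Regex package from the link above loaded. `match:` is the built-in wildcard matcher on String, while `matchesRegex:` and `allRegexMatches:` are taken to be String extensions provided by that package (the `documentation` protocol of RxParser is the authoritative list). The sample strings are made up for illustration.

```smalltalk
"Built-in wildcard matching: the receiver is the pattern.
 * matches any sequence of characters, # matches any single character."
'*.html' match: 'Regex.html'.          "true"
'Reg##.html' match: 'Regex.html'.      "true"

"Assumes the Regex package is loaded: whole-string match against a regular expression."
'Regex.html' matchesRegex: '[A-Za-z]+\.html'.    "true"
'Regex.html' matchesRegex: '[0-9]+'.             "false"

"Collect every substring that matches; inspect the result in a workspace."
'Squeak 4.6 and 5.0' allRegexMatches: '[0-9]\.[0-9]'.
```

The wildcard form only knows `*` and `#`, so anything like character classes or repetition counts needs the regex selectors.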
In reply to this post by marcel.taeumel
Definitely. I've been playing with the idea - while I was working on my still unreleased DNS package - but I came to the conclusion that the VM side has to be cleaned first. The plugin should use the platform's support library, instead of some ~10 years old version. I was also playing with the idea of using the RE2 engine instead of PCRE, but it's probably a lot more work to make it happen.

Levente |
On 12.05.2015 17:07, Levente Uzonyi wrote:
> Definitely. I've been playing with the idea - while I was working on
> my still unreleased DNS package - but I came to the conclusion that
> the VM side has to be cleaned first.
> The plugin should use the platform's support library, instead of some
> ~10 years old version.

Please, no! It's much easier to work with some old library that has consistent behaviour everywhere than it is to find and work around the differences between platform libraries e.g. on Windows, Linux, and some regular UNIX variant. At work I need to do that with VA Smalltalk code page conversion (UTF-8 to ISO8859L1 and back) - code that works on the client doesn't work on the server, so there's a special case and workaround needed to get everything right.

The only acceptable option if you really want to use an external library would be to include a standard regex C library with well-defined functionality in the VM.

Cheers,
Hans-Martin |
I love the idea of well-integrated expressions etc being available for string handling and in tools. The tricky bit is that ‘well-integrated’.
If at all possible it should be done without any extra primitives. Unless we have a convincing use case for such string handling needing to be frequently done and very fast I don’t see much value in adding more C code. With Cog and SISTA making the system so much faster we have to consider the possibility that doing all the marshalling, converting, stack fapping, c-code calling, returning, re-converting may take more time than leaving it to Smalltalk.

I suppose the counter-argument would be that using a standard library guarantees correct results. Except of course we have a lot of experience with supposedly standard libraries being utterly screwed. Jpeg libraries, anyone? ALSA?

As for the actual expression grammar, I imagine most people would be assuming the unix regexp form which has the advantage that a lot of people are familiar with it. And the disadvantage that a very large number of people are not.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Went to the dentist to have his cranial cavity filled. |
On Tue, May 12, 2015 at 10:27:20AM -0700, tim Rowledge wrote:
> If at all possible it should be done without any extra primitives.

There are two distinct packages.

The package at www.squeaksource.com/Regex is the Vassili Bykov implementation. It requires no plugin support or special primitives.

-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
-- See `documentation' protocol of RxParser class for user's guide.

The PCRE package is Andrew Greenberg's Perl compatible regular expressions. This uses the RePlugin plugin that can be found in the VMMaker package. RePlugin is an internal plugin in the Unix VMs (interpreter/cog/spur), I'm not sure about other platforms or SqueakJS.

http://wiki.squeak.org/squeak/558

Dave |
On 13.05.2015, at 00:17, David T. Lewis <[hidden email]> wrote:
> The package at www.squeaksource.com/Regex is the Vassili Bykov implementation.

To quote from there:

NB. For Pharo users, this package is maintained as part of the core images. For Squeak users, the Pharo repositories might have recent snapshots that weren't actively tested on other branches of Squeak (yet).

Just to note that. |
In reply to this post by David T. Lewis
On Tue, May 12, 2015 at 06:17:09PM -0400, David T. Lewis wrote:
> There are two distinct packages.

Actually there are probably more than just two. SqueakMap also has BRegexp:

Description:
BRegexp is Perl5 compatible regular expression library for Squeak.
-Supports multilingualized Squeak.
-Supports only Windows environment.

Home page:
http://kminami.fc2web.com/Squeak/Goodies/BRegexp/index.html

SqueakMap has the Andrew Greenberg PCRE as "Regular Expression Plugin" which seems to load despite an error that pops up in the SAR postscript processing.

I do not see an entry in SqueakMap for the Vassili Bykov Regex package.

I am fairly sure that regex enthusiasts will have excellent reasons for favoring the Perl compatible PCRE in some cases (fast, powerful, well documented), and equally excellent reasons to favor the Vassili Bykov implementation in other cases (portable, all Smalltalk, works in SqueakJS).

So it seems to me that both of these excellent packages deserve good entries in SqueakMap to make them easy to find, and easy to load in the latest Squeak images. If that is done, then I cannot think of any reason that either one of them would need to be maintained within trunk.

Or to say it another way: If these packages are not important enough to make them easily loadable from SqueakMap, then they are not very important. If they are not very important, then we should not put them into trunk.

So please make them loadable :-)

Dave |
On Tue, 12 May 2015, David T. Lewis wrote:
> So it seems to me that both of these excellent packages deserve good
> entries in SqueakMap to make them easy to find, and easy to load in
> the latest Squeak images. If that is done, then I cannot think of any
> reason that either one of them would need to be maintained within trunk.

If the goal is to use regular expressions in Trunk, then those packages have to be in Trunk, even if they are not maintained there.

Levente |
In reply to this post by Hans-Martin Mosner
On Tue, 12 May 2015, Hans-Martin Mosner wrote:
> Please, no! It's much easier to work with some old library that has
> consistent behaviour everywhere than it is to find and work around the
> differences between platform libraries e.g. on Windows, Linux, and some
> regular UNIX variant.

Are you saying that the PCRE library behaves differently on different platforms?

Levente |
Note that, at a time when I thought that VB-Regex would be a good candidate for inclusion in trunk, I put some updates in the inbox, backporting some corrections that happened in the VisualWorks or Pharo branches:

http://source.squeak.org/inbox/VB-Regex-nice.20.mcz

The Pharo version has evolved since, with more functionality. |
In reply to this post by Levente Uzonyi-2
On 13.05.2015 03:02, Levente Uzonyi wrote:
> Are you saying that the PCRE library behaves differently on different
> platforms?

Nope, PCRE should behave consistently across platforms if the same version is installed on those platforms (although I'd be somewhat unsure about its handling of line ending conventions and Unicode/different character sets on different platforms.)

But I was responding to "platform's support library" which might be anything from PCRE through some POSIX compatible regex (from the GNU C library or some other implementation) to anything else that might be there (see http://unixpapa.com/incnote/regex.html for a list of possibilities).

Cheers,
Hans-Martin |
In reply to this post by Nicolas Cellier
On 13.05.2015, at 08:23, Nicolas Cellier <[hidden email]> wrote:
Any reason not to include that version? (whether now[1] or after the release)

Best regards
-Tobias

[1]: but I thought we had feature freeze?
|
Sure. This discussion is beyond 4.6/5.0 :)
Best, Marcel |
In reply to this post by Hans-Martin Mosner
On Wed, 13 May 2015, Hans-Martin Mosner wrote:
> But I was responding to "platform's support library" which might be anything
> from PCRE through some POSIX compatible regex (from the GNU C library or some
> other implementation) to anything else that might be there (see
> http://unixpapa.com/incnote/regex.html for a list of possibilities).

The RePlugin is a wrapper for the PCRE library, so what I'm saying is that we should use the system's installed PCRE library (or statically link an up-to-date version), instead of a really old fork of it. The PCRE API won't change anymore, since PCRE2 is here.

Levente |
> The RePlugin is a wrapper for the PCRE library, so what I'm saying is that
> we should use the system's installed PCRE library (or statically link an
> up-to-date version), instead of a really old fork of it.
> The PCRE API won't change anymore, since PCRE2 is here.

Yes, this is important for Linux distro maintainers.

http://bugs.squeak.org/view.php?id=7539

Dave |