Hi,
I just noticed that Pharo's regexes do not understand non-greedy matches. A regex engine on par with PCRE is kind of essential, and not having '.*?' be a parseable, working regex is a bummer. Are there any more powerful regex engines around for Pharo? I could not find any.

Cheers,
Manuel
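For readers unfamiliar with the distinction, here is a small sketch of greedy versus non-greedy matching. It uses Python's standard `re` module purely as an illustration, not Pharo's Regex package; the sample string and variable names are made up for the example. The last pattern shows the usual workaround with a negated character class, which gives the same result on this input; whether Pharo's engine accepts it is an assumption that has not been verified here.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: '.*' grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.*>", text)

# Non-greedy: '.*?' stops at the first '>' that lets the match succeed.
lazy = re.findall(r"<.*?>", text)

# Workaround without '?': forbid '>' inside the tag instead.
negated = re.findall(r"<[^>]*>", text)

print(greedy)
print(lazy)
print(negated)
```

The negated-class form only works when the terminator is a single character, which is one reason real non-greedy quantifiers are hard to do without.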
Hi,
Yes, Pharo's regex implementation is very naive. We will be moving to a PCRE binding to match outside-world standards, but we have not had the time to work on it :(

Esteban

> On 5 Feb 2019, at 00:27, Manuel Leuenberger <[hidden email]> wrote:
>
> Hi,
>
> I just noticed that the Pharo regexes do not understand non-greedy matches. A regex engine to be PCRE is kind of essential, not having '.*?' to be a parseable and working regex is a bummer. Are there any more powerful regex engines around for Pharo? I could not find any.
>
> Cheers,
> Manuel
The regex engine inside SmaCC allows for non-greedy REs, but it is integrated as the first stage of a parser, not as an independent RE engine.

Regards,
Thierry

On Tue, 5 Feb 2019 at 08:34, Esteban Lorenzano <[hidden email]> wrote:
> Hi,
>
> Yes, Pharo regex implementation is very naive.
> We will be moving to a PCRE binding to match outside world standards but we have not had the time to work on it :(
>
> Esteban
In reply to this post by EstebanLM
Please DON'T move to PCRE.

"Outside world standards"? There are so many.

There are two important things to know about PCRE: (1) it is a popular open source regexp library for Perl-style regexps, (2) because of that, it is prone to truly horrendous performance problems.

There are alternatives, such as re2, which are not subject to PCRE's intrinsic performance pathologies. As it happens, re2 supports *? +? and ??.
I am not advocating for PCRE in particular; I just need a regex engine that is just as powerful. I guess re2 serves that purpose, although I haven't used it myself (knowingly). Looking at https://github.com/google/re2/wiki/WhyRE2, re2 actually seems to be a good target. The claim that "match time is linear in the length of the input string" sounds like a really nice property.
In reply to this post by EstebanLM
We could also update the Pharo version from the original VW repository, if the current license is appropriate. I think it covers the missing parts.
In reply to this post by EstebanLM
Still, there are advantages to an in-image solution. I can't say this enough: these external lib dependencies pose their own problems ...

> On 5 Feb 2019, at 08:33, Esteban Lorenzano <[hidden email]> wrote:
>
> Hi,
>
> Yes, Pharo regex implementation is very naive.
> We will be moving to a PCRE binding to match outside world standards but we have not had the time to work on it :(
>
> Esteban
In reply to this post by Richard O'Keefe
On Wed, Feb 06, 2019 at 12:26:00AM +1300, Richard O'Keefe wrote:
> Please DON'T move to PCRE.
> "Outside world standards"? There are so many.
> There are two important things to know about
> PCRE: (1) it is a popular open source regexp
> library for Perl-style regexps, (2) because of
> that, it is prone to truly horrendous performance
> problems. There are alternatives, such as re2,
> https://github.com/google/re2 ,
> which are not subject to PCRE's intrinsic
> performance pathologies. As it happens, re2
> supports *? +? and ??.

Can you share some examples of PCRE's bad performance?

Pierce
In reply to this post by Sven Van Caekenberghe-2
> On 05.02.2019, at 16:16, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Still, there are advantages to an in-image solution, can't says this enough, these external lib dependencies pose their own problems ...

+1
In reply to this post by Pierce Ng-3
PCRE has exponential worst-case time. See for example but searching for "PCRE exponential time" or "worst case" will find more.

That's not the problem. The problem is that it isn't *obvious* which regexps are safe, and that people are taught that regular expressions can be matched in linear time, which is sort of the point of them. But PCRE patterns *aren't* regular expressions.
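To make the exponential worst case concrete, here is a toy sketch in Python. It is not PCRE's code and makes no claim about PCRE's actual optimizations; it hand-codes a naive backtracking matcher for the single pattern `(a+)+$` and counts how many times it is entered, next to a single-pass check for the same language. The function names and the step counters are invented for this illustration.

```python
def backtracking_steps(text):
    """Naive backtracking match of (a+)+$ against text; returns (matched, calls)."""
    steps = 0

    def match_groups(pos):
        # Try to match one or more groups of 'a's from pos up to the end of text.
        nonlocal steps
        steps += 1
        # Length of the run of 'a' starting at pos.
        run = 0
        while pos + run < len(text) and text[pos + run] == "a":
            run += 1
        # Greedy first: try the longest group, then back off one 'a' at a time.
        for k in range(run, 0, -1):
            end = pos + k
            if end == len(text) or match_groups(end):
                return True
        return False

    matched = match_groups(0)
    return matched, steps


def linear_steps(text):
    """Single left-to-right pass for the same language; returns (matched, chars read)."""
    steps = 0
    seen_a = False
    for ch in text:
        steps += 1
        if ch != "a":
            return False, steps
        seen_a = True
    return seen_a, steps


for n in (5, 10, 15):
    subject = "a" * n + "b"  # almost matches, which forces full backtracking
    print(n, backtracking_steps(subject), linear_steps(subject))
```

On a failing subject of n 'a's followed by 'b', the backtracking version is entered 2^n times while the single pass reads n + 1 characters. That gap is the kind of pathology described above, and nothing in the pattern's appearance warns about it.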
In reply to this post by NorbertHartl
On Tue, Feb 05, 2019 at 07:25:07PM +0100, Norbert Hartl wrote:
> > On 05.02.2019, at 16:16, Sven Van Caekenberghe <[hidden email]> wrote:
> > Still, there are advantages to an in-image solution, can't says this
> > enough, these external lib dependencies pose their own problems ...
> +1

Libraries like libgit2, libssh2 and quite a few more are already a core part of Pharo, so I'd say that, philosophically, we might as well go all in and make FFI to external libraries an intrinsic part of computing with Pharo, not just of developing Pharo itself. The more people use UFFI, the better it will become.
If cre2 meets the standard and the community agrees, I do not care, to be honest. I just mentioned PCRE because it is the “de facto standard” and we are tired of not being compatible :)

About in-image solutions in general: yes, they are better than relying on a solution like that. Also about in-image solutions in general: they are harder to maintain. We are a small community and we cannot afford to have everything we need implemented in-image. Using an FFI solution is a perfectly valid way, and IMO it is often preferable because of this maintainability issue. Of course, it does not apply to all cases with the same intensity :)

I don’t know the cost of maintaining a regex lib in-image. I know that at the moment it is infinite, because there is no one doing it (and I know I do not have the time). So we stay for years with a suboptimal solution :(

Anyway, I want to point out some things that I (in my “architect” role, or whatever it is I do here) always think about:

1- Is this solution state of the art?
2- Is it maintained, or can the cost of maintaining it be absorbed?
2.1- Who maintains it?
2.2- Will this come back to bite us if the maintainers leave?
3- Is it well tested?
4- Is it well documented?

People often forget that making libraries is a commitment to your user community; it is a lot more than “I do this and then I forget”. Of course you can proceed like that (and not a few of my own projects are use-and-throw-away), but we as “pharo makers” need to give this priority over other variables.

Notice that “performance” is not in this list. I care a lot about performance, but I care a lot more about the other points.

Now, that does not mean we do not make decisions we later regret (we are humans, after all… and this is a learning process).

Cheers,
Esteban

> On 6 Feb 2019, at 02:38, Pierce Ng <[hidden email]> wrote:
>
> Libraries like libgit2, libssh2 and quite a few more are already a core
> part of Pharo, so I'd say philosophically might as well go all in to
> make FFI to external libraries an intrinsic part of computing with
> Pharo, not just for developing Pharo itself. The more people use UFFI,
> the better it will become.
In reply to this post by Pierce Ng-3
I would always prefer an in-image regex engine, but creating a fully fledged engine with all the fancy lookarounds, named/non-capturing groups, non-greedy matches, Unicode support, etc. sounds like a six-month full-time project. Any volunteers? ;)

I am all for a pragmatic approach: if there is a solution that allows reusing Smalltalk streams and strings without much memory copying, combined with a library dependency that is battle-proven and maintained by capable people, why not? Funnily enough, PCRE already seems to be used in the RePlugin (https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/src/plugins/RePlugin/RePlugin.c#L356). Why is this not used in the image? Am I missing something?

Regarding third-party dependencies, my only beef is that I do not particularly like that a copy of them is distributed and linked together with the VM. This makes it very portable, but I do not need another copy of a library that I have already installed on my system (git/SSH/SSL/png/FreeType/Cairo). On Linux some libs (Cairo/FreeType/PNG) are already linked to system libs. I would like to have Pharo as an installable package in apt and MacPorts/brew with declared dependencies, which would make it even more lightweight. But that is only marginally related to this thread.
As an in-image solution, refactoring the SmaCC scanner into a standalone regex engine might be a pretty efficient way to gain more RE features, but I cannot really judge how big this effort would be.
In reply to this post by Manuel Leuenberger
Manuel Leuenberger wrote
> An in-image regex engine would always be preferable by me, but... I am all
> for a pragmatic approach

I feel similarly: Ideally I would love everything in image, but given limited manpower it seems wise to leverage outside libs for standard things so that we can devote our focus to blue plane invention. Some fantasy future moment when we have already conquered the world and have nothing to do would be a perfect time to circle back and reimplement the delegated tasks in image.

-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html