[poll] regex literals


Paolo Bonzini-2
I'm thinking of adding regex literals to GNU Smalltalk.  The only syntax
I found that would work is ##/regex/.  /regex/ wouldn't work for the old
syntax, because the lexer has no way to understand that the / in this
example

     a: b
         /regex/ printNl

starts a regex and is not a division operator.  It would work in the new
syntax (after one of [ ( { ^ . keyword: identifier binary-message, and
maybe a few more I forgot, / would start a regex, otherwise it would be
a division operator), but I don't like to add a feature that cannot be
ported to other Smalltalks.

What do you think?  Right now I'm more for "no" or "not yet", but I'm
open to discussion.

Paolo


_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Re: [poll] regex literals

Stefan Schmiedl
On Wed, 13 Feb 2008 09:58:40 +0100
Paolo Bonzini <[hidden email]> wrote:

> I'm thinking of adding regex literals to GNU Smalltalk.  The only syntax
> I found that would work is ##/regex/.  
> ...
> What do you think?  Right now I'm more for "no" or "not yet", but I'm
> open to discussion.

One thing I've seen with locale describing symbols in VisualWorks is
#"de_de.UTF-8", so going along with this approach something like
#/.../ makes sense.

s.



Re: [poll] regex literals

Paolo Bonzini-2

>> I'm thinking of adding regex literals to GNU Smalltalk.  The only syntax
>> I found that would work is ##/regex/.  
>> ...
>> What do you think?  Right now I'm more for "no" or "not yet", but I'm
>> open to discussion.
>
> One thing I've seen with locale describing symbols in VisualWorks is
> #"de_de.UTF-8"

Yes, that's #'de_de.UTF-8'.  It's supported in GNU Smalltalk too, for
"weird" symbols that are not valid Smalltalk message names.

> , so going along with this approach something like
> #/.../ makes sense.

Two hashes because #/ is valid Smalltalk.
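A minimal illustration of why the single hash is already taken: #/ is an ordinary symbol literal, namely the selector of the division message, so it can be printed or performed like any other symbol.

```smalltalk
"#/ is a plain symbol literal, so #/regex/ could not be told apart from it."
#/ printNl.

"Performing the symbol sends the division message; this answers 2."
(6 perform: #/ with: 3) printNl.
```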

Paolo



Re: [poll] regex literals

Stefan Schmiedl
On Wed, 13 Feb 2008 10:38:35 +0100
Paolo Bonzini <[hidden email]> wrote:

> > One thing I've seen with locale describing symbols in VisualWorks is
> > #"de_de.UTF-8"
>
> Yes, that's #'de_de.UTF-8'.  

Obviously, I need Smalltalk syntax coloring for my mail client :-)

s.



Re: [poll] regex literals

Paolo Bonzini-2
In reply to this post by Paolo Bonzini-2
Tony Garnock-Jones wrote:
> Paolo Bonzini wrote:
>> I'm thinking of adding regex literals to GNU Smalltalk.
>
> I'd be against this.
>
>  'a.*b' asRegex
>
> to me seems better, and doesn't require any lexer/parser changes.

It's also slower, which is why as of today 'a.*b' works even without
sending #asRegex.

However, *always* treating string literals as regexes is going to give
problems in the long term.  In particular, it would break with another
extension that I was thinking about:

     #(1 3 2 6 5 4) select: #odd => #(1 3 5)
     #(1 12 2) select: (1 to: 10) => #(1 2)
     #('foo' 'bar') select: ##/f./ => #('foo')

This would be quite easily implemented (#select: would send a new
message to its argument, e.g. #~, instead of #value:).  If regexes were
implemented simply as strings, however, there would be a conflict
between the Collection example (second) and the regex example (third):

     'foo' select: 'aeiouy' => 'oo'
     #('foo') select: 'f.' => cannot make it return 'foo' as I'd like!

That's why in this case, simply using string literals as regexes
wouldn't work.  You would need to specify #asRegex to get the desired
behavior.

As I said, I'm also thinking "no"/"not yet".  It's not paramount: older
code would be unaffected, and I could start implementing the above
(which is not happening any time soon) and then see whether it is a
problem.  There just *might* be one.

Paolo



Re: [poll] regex literals

Tony Garnock-Jones-2
Paolo Bonzini wrote:
> It's also slower, which is why as of today 'a.*b' works even without
> sending #asRegex.

Slower because of repeated sends of asRegex?

I'd rather see new syntax for compile-time evaluation in literal
position, instead of specialised syntax for regex literals.

##('a.*b' asRegex)

> However, *always* treating string literals as regexes is going to give
> problems in the long term.

Agreed. Type-punning is often not a great idea.

Regards,
  Tony




Re: [poll] regex literals

Paolo Bonzini-2

>> It's also slower, which is why as of today 'a.*b' works even without
>> sending #asRegex.
>
> Slower because of repeated sends of asRegex?

Yes.  Or just because 1 send is already more than 0!

> I'd rather see new syntax for compile-time evaluation in literal
> position, instead of specialised syntax for regex literals.
>
> ##('a.*b' asRegex)

A bit verbose but yes, it is a possibility if performance is a concern.
  And it works now.
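
For illustration, a minimal sketch of how the ##( ) form could be used inside a method so that the regex is built once, when the method is compiled, rather than on every call.  The selector #looksLikeAB is made up for this example.

```smalltalk
String extend [
    looksLikeAB [
        "Hypothetical helper.  The ##( ) expression is evaluated when
         this method is compiled, so #asRegex is not sent on each call."
        ^self ~ ##('a.*b' asRegex)
    ]
]

'axxb' looksLikeAB printNl.
'xyz' looksLikeAB printNl.
```

The first expression should answer true and the second false; the only difference from writing 'a.*b' asRegex inline is when the conversion happens.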

Paolo



Re: [poll] regex literals

Tony Garnock-Jones-2
Paolo Bonzini wrote:
> A bit verbose but yes, it is a possibility if performance is a concern.
>  And it works now.

Sorry? There's existing compile-time-eval syntax? Cool!

Tony




Re: [poll] regex literals

S11001001
In reply to this post by Paolo Bonzini-2
Paolo Bonzini <[hidden email]> writes:
> However, *always* treating string literals as regexes is going to give
> problems in the long term.  In particular, it would break with another
> extension that I was thinking about:
>
>     #(1 3 2 6 5 4) select: #odd => #(1 3 5)

This is sort of in the Presource test suite:

#(1 3 2 6 5 4) select: #odd sendingBlock
  -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
  => #(1 3 5)

>     #(1 12 2) select: (1 to: 10) => #(1 2)

I would not use that :)

>     #('foo' 'bar') select: ##/f./ => #('foo')
>
> This would be quite easily implemented (#select: would send a new
> message to its argument, e.g. #~, instead of #value:).

I would rather have a generalization of the sendingBlock protocol to
send explicitly, perhaps with this extension (because with literals,
there's no chance for confusion):

Eval [NoCandy.MyCodeMindset installIn: Namespace current]

NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
    <pool: NoCandy.Presrc>      "eh?"

    "obviously you would memoize this result"
    SelectLiteralBlocks class >> inlinableActions [
        "since, for all these cases, the standard #select: semantics
         *obviously* aren't useful"
        ^{'`@x to: `@y' -> '[:`g1 | `g1 between: `@x and: `@y]'
          -> [:m |
              ((m atAll: #('`@x' '`@y')) allSatisfy: [:each |
                   each isLiteral and: [each value isInteger]])
                  ifTrue: [{'`g1' -> self newVariable}]].
          '`@x' -> '[:`g1 | `g1 `sel]'
          -> [:m | | sel |
              sel := m at: '`@x'.
              {sel isLiteral.
               sel value isSymbol.
               sel value numArgs = 0}
                  condEvery ifTrue: [{'`g1' -> self newVariable.
                                      #'`sel' -> sel value}]].
          '`@x' -> '[:`g1 | `g1 ~ `@x]'
          -> [:m | | x |
              x := m at: '`@x'.
              (x isLiteral and: [x value isRegex])
                  ifTrue: [{'`g1' -> self newVariable}]].
          } collect: [:triplet |
             {CodeTemplate fromExpr: triplet key key.
              CodeTemplate fromExpr: triplet key value.
              triplet value}]
    ]

    expandMessage: sel to: rcv withArguments: args [
        | filter |
        filter := args first.
        self class inlinableActions do: [:triplet | | match expand test |
            match := triplet first. expand := triplet second.
              test := triplet third.
            (match match: filter) ifNotNil: [:pm |
                (test value: pm) ifNotNil: [:xtn |
                    xtn do: [:each | pm add: each].
                    ^STInST.RBMessageNode
                        receiver: rcv
                        selector: sel
                        arguments: {expand expand: pm}]]].
        ^self forgoExpansion
    ]
]

#(1 12 2) select: (1 to: 10)
  -| #(1 12 2) select: [:gensym | gensym between: 1 and: 10]
  => #(1 2)

On a side note, with Unicode, #∋ would be a good name for #~, or maybe
#includes: :)

--
But you know how reluctant paranormal phenomena are to reveal
themselves when skeptics are present. --Robert Sheaffer, SkI 9/2003



Re: [poll] regex literals

Paolo Bonzini-2

> This is sort of in the Presource test suite:
>
> #(1 3 2 6 5 4) select: #odd sendingBlock
>   -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
>   => #(1 3 5)
>
>>     #(1 12 2) select: (1 to: 10) => #(1 2)
>
> I would not use that :)

Note that it's just a special case of Collections:

        'foobar' select: 'aeiou' => 'ooa'

In fact, "#(1 1.2 2) select: (1 to: 10)" would *not* include 1.2 in the
result.

My desire is to allow the common idea of "select: #odd" without
implementing Symbol>>#value:.  I see no need to implement #sendingBlock
(all this IMHO of course) if you reason that:

1) right now, #select: and #collect: have the same "protocol" for the
argument, but the two are very different.  In the case of
#select:/#reject: the argument should return true/false for any
collection; for #collect: instead the argument should return an object
in the same domain as the source.

Taking an extreme position: #value: is the most overloaded method in
Smalltalk and the less you use it, the better. :-)  (Because then you
can achieve more polymorphism and more DWIM).

2) therefore, I decide that #select: (and #reject:) accept a different
thing than a block, a "predicate".  A predicate can be a unary block of
course, but also a symbol, a regex, a collection, ...  I chose #~ as the
message that the predicate protocol would implement because it's what we
use for regexes, but it's not necessary to implement it with that name
(also because we currently have "aString ~ aRegex", not the other way
round).

3) the same could apply to #collect:, but with a *different* message to
emphasize that the argument is not a "predicate", it is an "xyz" (name
to be decided :-) I didn't find any good one).  I don't have very strong
ideas on what to call the message, but it also could apply to symbols,
regexes and collections: for example

   #('1.2' '3.4') collect: #allButLast => #('1.' '3.')

   #('1.2' '3.4') collect: '^.*\.' asRegex => #('1.' '3.')
   #('1.2' '3.4') collect: '\.(.*)' asRegex => #('2' '4')

   #('foo' 'bar') collect: #(1 3) => #('fo' 'br')
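
The predicate idea from point 2) can be sketched without touching #select: itself.  This is only an illustration under stated assumptions: #accepts: stands in for the #~ message discussed above and #selectBy: for the extended #select:.  Both selectors are made up for this example and are not part of GNU Smalltalk.

```smalltalk
BlockClosure extend [
    accepts: anObject [
        "A unary block is a predicate: just evaluate it."
        ^self value: anObject
    ]
]

Collection extend [
    accepts: anObject [
        "A collection accepts exactly its own elements."
        ^self includes: anObject
    ]

    selectBy: aPredicate [
        "Hypothetical counterpart of #select: taking any predicate."
        ^self select: [:each | aPredicate accepts: each]
    ]
]

Symbol extend [
    accepts: anObject [
        "Symbols are collections too, so this overrides the method above."
        ^anObject perform: self
    ]
]

Regex extend [
    accepts: aString [
        "Reuse the existing 'aString ~ aRegex' test."
        ^aString ~ self
    ]
]

(#(1 3 2 6 5 4) selectBy: #odd) printNl.
(#(1 12 2) selectBy: (1 to: 10)) printNl.
('foobar' selectBy: 'aeiou') printNl.
(#('foo' 'bar') selectBy: 'f.' asRegex) printNl.
```

Under these definitions the four expressions select #(1 3 5), #(1 2), 'ooa' and #('foo') respectively.  A plain string such as 'f.' is treated as a collection of characters, not as a pattern, which is the conflict described earlier in the thread and the reason for the explicit #asRegex in the last line.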

> NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
>     <pool: NoCandy.Presrc>      "eh?"

You mean <import: ...> here?

> On a side note, with Unicode, #∋ would be a good name for #~, or maybe
> #includes: :)

Now what Unicode symbols would be binary messages, and which would be
okay for identifiers/keywords?  :-)

Paolo

