I'm thinking of adding regex literals to GNU Smalltalk. The only syntax
I found that would work is ##/regex/.

/regex/ wouldn't work for the old syntax, because the lexer has no way
to understand that the / in this example

    a: b /regex/ printNl

starts a regex and is not a division operator. It would work in the new
syntax (after one of [ ( { ^ . keyword: identifier binary-message, and
maybe a few more I forgot, / would start a regex; otherwise it would be
a division operator), but I don't like adding a feature that cannot be
ported to other Smalltalks.

What do you think? Right now I'm leaning towards "no" or "not yet", but
I'm open to discussion.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

On Wed, 13 Feb 2008 09:58:40 +0100
Paolo Bonzini <[hidden email]> wrote:

> I'm thinking of adding regex literals to GNU Smalltalk. The only syntax
> I found that would work is ##/regex/.
> ...
> What do you think? Right now I'm more for "no" or "not yet", but I'm
> open to discussion.

One thing I've seen with locale describing symbols in VisualWorks is
#"de_de.UTF-8", so going along with this approach something like
#/.../ makes sense.

s.

>> I'm thinking of adding regex literals to GNU Smalltalk. The only syntax
>> I found that would work is ##/regex/.
>> ...
>> What do you think? Right now I'm more for "no" or "not yet", but I'm
>> open to discussion.
>
> One thing I've seen with locale describing symbols in VisualWorks is
> #"de_de.UTF-8"

Yes, that's #'de_de.UTF-8'. It's supported in GNU Smalltalk too, for
"weird" symbols that are not valid Smalltalk message names.

> , so going along with this approach something like
> #/.../ makes sense.

Two hashes are needed because #/ is already valid Smalltalk (the symbol
for the binary selector /).

Paolo

On Wed, 13 Feb 2008 10:38:35 +0100
Paolo Bonzini <[hidden email]> wrote:

> > One thing I've seen with locale describing symbols in VisualWorks is
> > #"de_de.UTF-8"
>
> Yes, that's #'de_de.UTF-8'.

Obviously, I need Smalltalk syntax coloring for my mail client :-)

s.

Tony Garnock-Jones wrote:
> Paolo Bonzini wrote:
>> I'm thinking of adding regex literals to GNU Smalltalk.
>
> I'd be against this.
>
>     'a.*b' asRegex
>
> to me seems better, and doesn't require any lexer/parser changes.

It's also slower, which is why as of today 'a.*b' works even without
sending #asRegex.

However, *always* treating string literals as regexes is going to give
problems in the long term. In particular, it would break with another
extension that I was thinking about:

    #(1 3 2 6 5 4) select: #odd       => #(1 3 5)
    #(1 12 2) select: (1 to: 10)      => #(1 2)
    #('foo' 'bar') select: ##/f./     => #('foo')

This would be quite easily implemented (#select: would send a new
message to its argument, e.g. #~, instead of #value:). If regexes were
implemented simply as strings, however, there would be a conflict
between the Collection example (second) and the regex example (third):

    'foo' select: 'aeiouy'    => 'oo'
    #('foo') select: 'f.'     => cannot make it return #('foo') as I'd like!

That's why in this case, simply using string literals as regexes
wouldn't work. You would need to specify #asRegex to get the desired
behavior.

As I said, I'm also thinking "no"/"not yet". It's not paramount: older
code would be unaffected, and I could start implementing the above
(which is not happening any time soon), and then see if it is a problem.
Just, there *might* be one.

Paolo

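An illustrative sketch of the predicate idea in the message above, written in GNU Smalltalk's bracket syntax. It is not code from the thread. The selectors #accepts: and #selectBy: are hypothetical names chosen so that the existing #select: and the existing String>>#~ stay untouched; the proposal itself would change #select: and might use #~ instead.

```smalltalk
"Hypothetical predicate protocol: every kind of predicate answers
 true or false for a single candidate element."
BlockClosure extend [
    accepts: anObject [ ^self value: anObject ]
]

Collection extend [
    "A collection accepts the elements it includes."
    accepts: anObject [ ^self includes: anObject ]
]

Symbol extend [
    "A unary selector accepts the objects that answer true to it."
    accepts: anObject [ ^anObject perform: self ]
]

Regex extend [
    "A regex accepts the strings it matches (uses the existing String>>#~)."
    accepts: aString [ ^aString ~ self ]
]

Collection extend [
    "Hypothetical stand-in for a predicate-aware #select:."
    selectBy: aPredicate [
        ^self select: [:each | aPredicate accepts: each]
    ]
]

(#(1 3 2 6 5 4) selectBy: #odd) printNl.            "=> #(1 3 5)"
(#(1 12 2) selectBy: (1 to: 10)) printNl.           "=> #(1 2)"
(#('foo' 'bar') selectBy: 'f.' asRegex) printNl.    "=> #('foo')"
('foo' selectBy: 'aeiouy') printNl.                 "=> 'oo'"
```

The trailing comments give the resulting values in the thread's => notation, not the exact text that printNl writes. Under these assumptions a plain string such as 'f.' is still treated as a collection of characters, so #('foo') selectBy: 'f.' answers an empty array; that is the conflict the message describes, and it is why the regex has to be a distinct object (here via #asRegex).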
Paolo Bonzini wrote:
> It's also slower, which is why as of today 'a.*b' works even without
> sending #asRegex.

Slower because of repeated sends of asRegex? I'd rather see new syntax
for compile-time evaluation in literal position, instead of specialised
syntax for regex literals.

    ##('a.*b' asRegex)

> However, *always* treating string literals as regexes is going to give
> problems in the long term.

Agreed. Type-punning is often not a great idea.

Regards,
  Tony

>> It's also slower, which is why as of today 'a.*b' works even without
>> sending #asRegex.
>
> Slower because of repeated sends of asRegex?

Yes. Or just because 1 send is already more than 0!

> I'd rather see new syntax for compile-time evaluation in literal
> position, instead of specialised syntax for regex literals.
>
>     ##('a.*b' asRegex)

A bit verbose but yes, it is a possibility if performance is a concern.
And it works now.

Paolo

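A minimal sketch of how the ##( ) form discussed above might be used inside a method, assuming GNU Smalltalk's bracket syntax. The class LineFilter and the selector #matchesPattern: are hypothetical names used only for illustration.

```smalltalk
"Hypothetical class, used only to show ##( ) inside a method body."
Object subclass: LineFilter [
    matchesPattern: aString [
        "The ##( ) expression is evaluated when the method is compiled,
         so no #asRegex send happens each time the method runs."
        ^aString ~ ##('a.*b' asRegex)
    ]
]

(LineFilter new matchesPattern: 'xxaxxbxx') printNl.    "=> true"
```

Compared with a dedicated ##/regex/ literal this costs a few more characters, but it needs no lexer change and works for any expression, not only regexes.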
Paolo Bonzini wrote:
> A bit verbose but yes, it is a possibility if performance is a concern.
> And it works now.

Sorry? There's existing compile-time-eval syntax? Cool!

Tony

Paolo Bonzini <[hidden email]> writes:
> However, *always* treating string literals as regexes is going to give
> problems in the long term. In particular, it would break with another
> extension that I was thinking about:
>
>     #(1 3 2 6 5 4) select: #odd => #(1 3 5)

This is sort of in the Presource test suite:

    #(1 3 2 6 5 4) select: #odd sendingBlock
    -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
    => #(1 3 5)

>     #(1 12 2) select: (1 to: 10) => #(1 2)

I would not use that :)

>     #('foo' 'bar') select: ##/f./ => #('foo')
>
> This would be quite easily implemented (#select: would send a new
> message to its argument, e.g. #~, instead of #value:).

I would rather have a generalization of the sendingBlock protocol to
send explicitly, perhaps with this extension (because with literals,
there's no chance for confusion):

    Eval [NoCandy.MyCodeMindset installIn: Namespace current]

    NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
        <pool: NoCandy.Presrc> "eh?"

        "obviously you would memoize this result"
        SelectLiteralBlocks class >> inlinableActions [
            "since, for all these cases, the standard #select: semantics
             *obviously* aren't useful"
            ^{'`@x to: `@y' -> '[:`g1 | `g1 between: `@x and: `@y]'
                  -> [:m |
                      ((m atAll: #('`@x' '`@y'))
                          allSatisfy: [:each |
                              each isLiteral and: [each value isInteger]])
                          ifTrue: [{'`g1' -> self newVariable}]].
              '`@x' -> '[:`g1 | `g1 `sel]'
                  -> [:m | | sel |
                      sel := m at: '`@x'.
                      {sel isLiteral.
                       sel value isSymbol.
                       sel value numArgs = 0} condEvery
                          ifTrue: [{'`g1' -> self newVariable.
                                    #'`sel' -> sel value}]].
              '`@x' -> '[:`g1 | `g1 ~ `@x]'
                  -> [:m | | x |
                      x := m at: '`@x'.
                      (x isLiteral and: [x value isRegex])
                          ifTrue: [{'`g1' -> self newVariable}]].
             } collect: [:triplet |
                 {CodeTemplate fromExpr: triplet key key.
                  CodeTemplate fromExpr: triplet key value.
                  triplet value}]
        ]

        expandMessage: sel to: rcv withArguments: args [
            | filter |
            filter := args first.
            self class inlinableActions do: [:triplet |
                | match expand test |
                match := triplet first.
                expand := triplet second.
                test := triplet third.
                (match match: filter) ifNotNil: [:pm |
                    (test value: pm) ifNotNil: [:xtn |
                        xtn do: [:each | pm add: each].
                        ^STInST.RBMessageNode
                            receiver: rcv
                            selector: sel
                            arguments: {expand expand: pm}]]].
            ^self forgoExpansion
        ]
    ]

    #(1 12 2) select: (1 to: 10)
    -| #(1 12 2) select: [:gensym | gensym between: 1 and: 10]
    => #(1 2)

On a side note, with Unicode, #∋ would be a good name for #~, or maybe
#includes: :)

-- 
But you know how reluctant paranormal phenomena are to reveal
themselves when skeptics are present. --Robert Sheaffer, SkI 9/2003

> This is sort of in the Presource test suite:
>
>     #(1 3 2 6 5 4) select: #odd sendingBlock
>     -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
>     => #(1 3 5)
>
>>     #(1 12 2) select: (1 to: 10) => #(1 2)
>
> I would not use that :)

Note that it's just a special case of Collections:

    'foobar' select: 'aeiou'    => 'ooa'

In fact, "#(1 1.2 2) select: (1 to: 10)" would *not* include 1.2 in the
result.

My desire is to allow the common idea of "select: #odd" without
implementing Symbol>>#value:. I see no need to implement #sendingBlock
(all this IMHO of course) if you reason that:

1) right now, #select: and #collect: have the same "protocol" for the
argument, but the two are very different. In the case of
#select:/#reject: the argument should return true/false for any
collection; for #collect: instead the argument should return an object
in the same domain as the source.

Taking an extreme position: #value: is the most overloaded method in
Smalltalk and the less you use it, the better. :-) (Because then you
can achieve more polymorphism and more DWIM).

2) therefore, I decide that #select: (and #reject:) accept a different
thing than a block, a "predicate". A predicate can be a unary block of
course, but also a symbol, a regex, a collection, ... I chose #~ as the
message that the predicate protocol would implement because it's what
we use for regexes, but it's not necessary to implement it with that
name (also because we currently have "aString ~ aRegex", not the other
way round).

3) the same could apply to #collect:, but with a *different* message to
emphasize that the argument is not a "predicate", it is an "xyz" (name
to be decided :-) I didn't find any good one). I don't have very strong
ideas on what to call the message, but it also could apply to symbols,
regexes and collections: for example

    #('1.2' '3.4') collect: #allButLast         => #('1.' '3.')
    #('1.2' '3.4') collect: '^.*\.' asRegex     => #('1.' '3.')
    #('1.2' '3.4') collect: '\.(.*)' asRegex    => #('2' '4')
    #('foo' 'bar') collect: #(1 3)              => #('fo' 'br')

> NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
>     <pool: NoCandy.Presrc> "eh?"

You mean <import: ...> here?

> On a side note, with Unicode, #∋ would be a good name for #~, or maybe
> #includes: :)

Now what Unicode symbols would be binary messages, and which would be
okay for identifiers/keywords? :-)

Paolo