how to rewrite this to use a stream

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

how to rewrite this to use a stream

Roelof
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Reply | Threaded
Open this post in threaded view
|

Re: how to rewrite this to use a stream

Richard Sargent
Administrator
It's bad advice in this case, but correctly indicates there is a problem with your code. See below.

On Thu, Dec 13, 2018 at 8:13 AM Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
Constructing a pattern and pattern matching is a heavy-handed approach to find out if a word contains a substring.
I'm not familiar with Pharo's methods, but at the least String inherits #indexOfSubCollection:startingAt: and probably a number of others. There is also a good chance String inherits or implements a more specific method that will test for the presence of a sub-string.

A good piece of advice I once received is (paraphrased as): Spend more time reading and less time writing. :-)
The deeper a class hierarchy, the greater the chance that the exact thing you need has already been implemented in the leaf class or its abstraction.

                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Reply | Threaded
Open this post in threaded view
|

Re: how to rewrite this to use a stream

Roelof
there is a method but then I ran into another warning

FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word |
            [ (forbiddenWords detect: [ :forbidden | word includesAll: forbidden ]) = true ]
                on: NotFound
                do: [ true ] ].
    ^ result

then I see a unnecearry = true but if I deleted that part the code does not what I was intented to do.

Roelof



Op 13-12-2018 om 18:15 schreef Richard Sargent:
It's bad advice in this case, but correctly indicates there is a problem with your code. See below.

On Thu, Dec 13, 2018 at 8:13 AM Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
Constructing a pattern and pattern matching is a heavy-handed approach to find out if a word contains a substring.
I'm not familiar with Pharo's methods, but at the least String inherits #indexOfSubCollection:startingAt: and probably a number of others. There is also a good chance String inherits or implements a more specific method that will test for the presence of a sub-string.

A good piece of advice I once received is (paraphrased as): Spend more time reading and less time writing. :-)
The deeper a class hierarchy, the greater the chance that the exact thing you need has already been implemented in the leaf class or its abstraction.

                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof


Reply | Threaded
Open this post in threaded view
|

Re: how to rewrite this to use a stream

Richard Sargent
Administrator
On Thu, Dec 13, 2018 at 12:44 PM Roelof Wobben <[hidden email]> wrote:
there is a method but then I ran into another warning

FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word |
            [ (forbiddenWords detect: [ :forbidden | word includesAll: forbidden ]) = true ]
                on: NotFound
                do: [ true ] ].

There are a number of things you can do, such as the variations on detect , for example. But, even those keep you from the answer. The question you are trying to answer is which words satisfy the condition that they don't include any forbidden word as a sub-string.

There are other variations than detect for that kind of question. Try examining the API in Collection for any which suggest a correspondence, and if not there, check subclasses in turn down to the classes of the collections you are working with. And don't forget that #select: isn't the only way to extract a subset of a collection. Look around for others.

[I'm deliberately not giving you the answer, because I think you will benefit from learning how to work effectively in the environment.]


    ^ result

then I see a unnecearry = true but if I deleted that part the code does not what I was intented to do.

Roelof



Op 13-12-2018 om 18:15 schreef Richard Sargent:
It's bad advice in this case, but correctly indicates there is a problem with your code. See below.

On Thu, Dec 13, 2018 at 8:13 AM Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
Constructing a pattern and pattern matching is a heavy-handed approach to find out if a word contains a substring.
I'm not familiar with Pharo's methods, but at the least String inherits #indexOfSubCollection:startingAt: and probably a number of others. There is also a good chance String inherits or implements a more specific method that will test for the presence of a sub-string.

A good piece of advice I once received is (paraphrased as): Spend more time reading and less time writing. :-)
The deeper a class hierarchy, the greater the chance that the exact thing you need has already been implemented in the leaf class or its abstraction.

                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof


Reply | Threaded
Open this post in threaded view
|

Re: how to rewrite this to use a stream

Richard O'Keefe
In reply to this post by Roelof
Starting from the top:\
(0) It's not *wrong* to start a selector with a capital letter, but
    it is certainly unusual.
(1) Intention-revealing names are good.  That's why this name is bad.
    It does not *delete* anything.
(2) The comment again says "delete" but nothing is deleted.  Comments
    that mislead are worse than no comments.
(3) An issue I have seen over and over in student code (in other
    programming languages) is writing regular expressions as string
    literals where they are used, so that every time you try a match
    the regular expression gets recompiled, over and over again.
    You're not exactly doing that, but you *are* rebuilding patterns
    over and over again.  It would make more sense to hold a
    forbiddenPatterns collection.
(4) You are saying "(coll reject: [..test..]) notEmpty", building an
    entire collection just to ask whether it has any members.  What
    you want to know is "are there any elements of coll that do not
    satisfy the test", which can be expressed as
    (coll allSatisfy: [..test..]) not.  (Sadly there is no
    #notAllSatisfy:, although #notAnySatisfy: *does* exist under the
    name #noneSatisfy:.
(5) #match: is a highly risky way to look for a substring.  What if
    you want to look for a literal # or * ?  What if you need to
    look for 'Boring' but not 'boring'?  ("Boring" is a perfectly
    good surname.)
(6) You really don't need the "result" variable.

    allowedWords
      "Answer those words which do not contain any forbiddenWord
       as a substring."
      ^words select: [:each |
         forbiddenWords noneSatisfy: [:forbidden |
            <<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods
    for String.  You will find several candidate methods.
(8) Of course, if there are W words and F forbidden words, in the
    worst case you'll do W*F substring inclusion tests.  This is not
    good.  "Multiple pattern matching" is a much-studied problem,
    see for example Wu and Manber's paper
    There is more recent work also.

    Does anyone know whether there's a multiple pattern algorithm for
    Pharo?  One could always fall back on ternary search trees...



On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Reply | Threaded
Open this post in threaded view
|

Re: how to rewrite this to use a stream

philippeback
Thx for those reviews.

And interesting paper.

Phil

On Sat, Dec 15, 2018, 04:20 Richard O'Keefe <[hidden email] wrote:
Starting from the top:\
(0) It's not *wrong* to start a selector with a capital letter, but
    it is certainly unusual.
(1) Intention-revealing names are good.  That's why this name is bad.
    It does not *delete* anything.
(2) The comment again says "delete" but nothing is deleted.  Comments
    that mislead are worse than no comments.
(3) An issue I have seen over and over in student code (in other
    programming languages) is writing regular expressions as string
    literals where they are used, so that every time you try a match
    the regular expression gets recompiled, over and over again.
    You're not exactly doing that, but you *are* rebuilding patterns
    over and over again.  It would make more sense to hold a
    forbiddenPatterns collection.
(4) You are saying "(coll reject: [..test..]) notEmpty", building an
    entire collection just to ask whether it has any members.  What
    you want to know is "are there any elements of coll that do not
    satisfy the test", which can be expressed as
    (coll allSatisfy: [..test..]) not.  (Sadly there is no
    #notAllSatisfy:, although #notAnySatisfy: *does* exist under the
    name #noneSatisfy:.
(5) #match: is a highly risky way to look for a substring.  What if
    you want to look for a literal # or * ?  What if you need to
    look for 'Boring' but not 'boring'?  ("Boring" is a perfectly
    good surname.)
(6) You really don't need the "result" variable.

    allowedWords
      "Answer those words which do not contain any forbiddenWord
       as a substring."
      ^words select: [:each |
         forbiddenWords noneSatisfy: [:forbidden |
            <<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods
    for String.  You will find several candidate methods.
(8) Of course, if there are W words and F forbidden words, in the
    worst case you'll do W*F substring inclusion tests.  This is not
    good.  "Multiple pattern matching" is a much-studied problem,
    see for example Wu and Manber's paper
    There is more recent work also.

    Does anyone know whether there's a multiple pattern algorithm for
    Pharo?  One could always fall back on ternary search trees...



On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Reply | Threaded
Open this post in threaded view
|

Re: how to rewrite this to use a stream

Roelof
Thanks for the reviews.

I have change my code a little bit.
I now check for 1 string and at the end I check for every word the three filters

So I looks now like this:

checkForbiddenParts: word
    "checks if a word contains some forbidden parts"

    ^ forbiddenWords anySatisfy: [ :forbidden | word includesSubstring: forbidden ]


checkLessThen3Vowels: aString
    "checks if a words contains less then 3 vowels"

    ^ (aString count: #isVowel) < 3


checkOverLapping: aString
    "checks if a string has two overlapping characters"

    | result |
    result := aString asArray overlappingPairsCollect: [ :a :b | a ~= b ].
    ^ result includes: true

isNice: aCollection
    "checks if a word is nice according to the 3 filters"

    | answer |
    answer := aCollection
        reject: [ :word |
            (FindNiceStrings new checkForbiddenParts: word)
                | (FindNiceStrings new checkLessThen3Vowels: word)
                | (FindNiceStrings new checkOverLapping: word) not ].
    ^ answer size


But somewhere must be a bug because the answer the code given in too high.

So I think I for a few hours/days busy to find out where the bug is.

here are the 3 filters described :

A nice string is one with all of the following properties:

  • It contains at least three vowels (aeiou only), like aei, xazegov, or aeiouaeiouaeiou.
  • It contains at least one letter that appears twice in a row, like xx, abcdde (dd), or aabbccdd (aa, bb, cc, or dd).
  • It does not contain the strings ab, cd, pq, or xy, even if they are part of one of the other requirements.

Roelof


Op 15-12-2018 om 09:37 schreef [hidden email]:
Thx for those reviews.

And interesting paper.

Phil

On Sat, Dec 15, 2018, 04:20 Richard O'Keefe <[hidden email] wrote:
Starting from the top:\
(0) It's not *wrong* to start a selector with a capital letter, but
    it is certainly unusual.
(1) Intention-revealing names are good.  That's why this name is bad.
    It does not *delete* anything.
(2) The comment again says "delete" but nothing is deleted.  Comments
    that mislead are worse than no comments.
(3) An issue I have seen over and over in student code (in other
    programming languages) is writing regular expressions as string
    literals where they are used, so that every time you try a match
    the regular expression gets recompiled, over and over again.
    You're not exactly doing that, but you *are* rebuilding patterns
    over and over again.  It would make more sense to hold a
    forbiddenPatterns collection.
(4) You are saying "(coll reject: [..test..]) notEmpty", building an
    entire collection just to ask whether it has any members.  What
    you want to know is "are there any elements of coll that do not
    satisfy the test", which can be expressed as
    (coll allSatisfy: [..test..]) not.  (Sadly there is no
    #notAllSatisfy:, although #notAnySatisfy: *does* exist under the
    name #noneSatisfy:.
(5) #match: is a highly risky way to look for a substring.  What if
    you want to look for a literal # or * ?  What if you need to
    look for 'Boring' but not 'boring'?  ("Boring" is a perfectly
    good surname.)
(6) You really don't need the "result" variable.

    allowedWords
      "Answer those words which do not contain any forbiddenWord
       as a substring."
      ^words select: [:each |
         forbiddenWords noneSatisfy: [:forbidden |
            <<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods
    for String.  You will find several candidate methods.
(8) Of course, if there are W words and F forbidden words, in the
    worst case you'll do W*F substring inclusion tests.  This is not
    good.  "Multiple pattern matching" is a much-studied problem,
    see for example Wu and Manber's paper
    There is more recent work also.

    Does anyone know whether there's a multiple pattern algorithm for
    Pharo?  One could always fall back on ternary search trees...



On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof


Reply | Threaded
Open this post in threaded view
|

Re: how to rewrite this to use a stream

Roelof
And found the culprit after some good night sleep.

But any feedback is appreiciated.

Roelof


Op 15-12-2018 om 10:34 schreef Roelof Wobben:
Thanks for the reviews.

I have change my code a little bit.
I now check for 1 string and at the end I check for every word the three filters

So I looks now like this:

checkForbiddenParts: word
    "checks if a word contains some forbidden parts"

    ^ forbiddenWords anySatisfy: [ :forbidden | word includesSubstring: forbidden ]


checkLessThen3Vowels: aString
    "checks if a words contains less then 3 vowels"

    ^ (aString count: #isVowel) < 3


checkOverLapping: aString
    "checks if a string has two overlapping characters"

    | result |
    result := aString asArray overlappingPairsCollect: [ :a :b | a ~= b ].
    ^ result includes: true

isNice: aCollection
    "checks if a word is nice according to the 3 filters"

    | answer |
    answer := aCollection
        reject: [ :word |
            (FindNiceStrings new checkForbiddenParts: word)
                | (FindNiceStrings new checkLessThen3Vowels: word)
                | (FindNiceStrings new checkOverLapping: word) not ].
    ^ answer size


But somewhere must be a bug because the answer the code given in too high.

So I think I for a few hours/days busy to find out where the bug is.

here are the 3 filters described :

A nice string is one with all of the following properties:

  • It contains at least three vowels (aeiou only), like aei, xazegov, or aeiouaeiouaeiou.
  • It contains at least one letter that appears twice in a row, like xx, abcdde (dd), or aabbccdd (aa, bb, cc, or dd).
  • It does not contain the strings ab, cd, pq, or xy, even if they are part of one of the other requirements.

Roelof


Op 15-12-2018 om 09:37 schreef [hidden email]:
Thx for those reviews.

And interesting paper.

Phil

On Sat, Dec 15, 2018, 04:20 Richard O'Keefe <[hidden email] wrote:
Starting from the top:\
(0) It's not *wrong* to start a selector with a capital letter, but
    it is certainly unusual.
(1) Intention-revealing names are good.  That's why this name is bad.
    It does not *delete* anything.
(2) The comment again says "delete" but nothing is deleted.  Comments
    that mislead are worse than no comments.
(3) An issue I have seen over and over in student code (in other
    programming languages) is writing regular expressions as string
    literals where they are used, so that every time you try a match
    the regular expression gets recompiled, over and over again.
    You're not exactly doing that, but you *are* rebuilding patterns
    over and over again.  It would make more sense to hold a
    forbiddenPatterns collection.
(4) You are saying "(coll reject: [..test..]) notEmpty", building an
    entire collection just to ask whether it has any members.  What
    you want to know is "are there any elements of coll that do not
    satisfy the test", which can be expressed as
    (coll allSatisfy: [..test..]) not.  (Sadly there is no
    #notAllSatisfy:, although #notAnySatisfy: *does* exist under the
    name #noneSatisfy:.
(5) #match: is a highly risky way to look for a substring.  What if
    you want to look for a literal # or * ?  What if you need to
    look for 'Boring' but not 'boring'?  ("Boring" is a perfectly
    good surname.)
(6) You really don't need the "result" variable.

    allowedWords
      "Answer those words which do not contain any forbiddenWord
       as a substring."
      ^words select: [:each |
         forbiddenWords noneSatisfy: [:forbidden |
            <<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods
    for String.  You will find several candidate methods.
(8) Of course, if there are W words and F forbidden words, in the
    worst case you'll do W*F substring inclusion tests.  This is not
    good.  "Multiple pattern matching" is a much-studied problem,
    see for example Wu and Manber's paper
    There is more recent work also.

    Does anyone know whether there's a multiple pattern algorithm for
    Pharo?  One could always fall back on ternary search trees...



On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof