Smalltalk › Pharo › Pharo Smalltalk Users

how to rewrite this to use a stream

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

8 messages Options

Roelof

how to rewrite this to use a stream

Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Richard Sargent

Re: how to rewrite this to use a stream

Administrator

It's bad advice in this case, but correctly indicates there is a problem with your code. See below.

On Thu, Dec 13, 2018 at 8:13 AM Roelof Wobben <[hidden email]> wrote:

Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]

Constructing a pattern and pattern matching is a heavy-handed approach to find out if a word contains a substring.

I'm not familiar with Pharo's methods, but at the least String inherits #indexOfSubCollection:startingAt: and probably a number of others. There is also a good chance String inherits or implements a more specific method that will test for the presence of a sub-string.

A good piece of advice I once received is (paraphrased as): Spend more time reading and less time writing. :-)

The deeper a class hierarchy, the greater the chance that the exact thing you need has already been implemented in the leaf class or its abstraction.


                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Roelof

Re: how to rewrite this to use a stream

there is a method but then I ran into another warning

FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word |
            [ (forbiddenWords detect: [ :forbidden | word includesAll: forbidden ]) = true ]
                on: NotFound
                do: [ true ] ].
    ^ result

then I see a unnecearry = true but if I deleted that part the code does not what I was intented to do.

Roelof

Op 13-12-2018 om 18:15 schreef Richard Sargent:

It's bad advice in this case, but correctly indicates there is a problem with your code. See below.

On Thu, Dec 13, 2018 at 8:13 AM Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
Constructing a pattern and pattern matching is a heavy-handed approach to find out if a word contains a substring.

I'm not familiar with Pharo's methods, but at the least String inherits #indexOfSubCollection:startingAt: and probably a number of others. There is also a good chance String inherits or implements a more specific method that will test for the presence of a sub-string.

A good piece of advice I once received is (paraphrased as): Spend more time reading and less time writing. :-)

The deeper a class hierarchy, the greater the chance that the exact thing you need has already been implemented in the leaf class or its abstraction.
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Richard Sargent

Re: how to rewrite this to use a stream

Administrator

On Thu, Dec 13, 2018 at 12:44 PM Roelof Wobben <[hidden email]> wrote:

there is a method but then I ran into another warning

FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word |
            [ (forbiddenWords detect: [ :forbidden | word includesAll: forbidden ]) = true ]
                on: NotFound
                do: [ true ] ].

There are a number of things you can do, such as the variations on detect , for example. But, even those keep you from the answer. The question you are trying to answer is which words satisfy the condition that they don't include any forbidden word as a sub-string.

There are other variations than detect for that kind of question. Try examining the API in Collection for any which suggest a correspondence, and if not there, check subclasses in turn down to the classes of the collections you are working with. And don't forget that #select: isn't the only way to extract a subset of a collection. Look around for others.

[I'm deliberately not giving you the answer, because I think you will benefit from learning how to work effectively in the environment.]

^ result

then I see a unnecearry = true but if I deleted that part the code does not what I was intented to do.

Roelof

Op 13-12-2018 om 18:15 schreef Richard Sargent:
It's bad advice in this case, but correctly indicates there is a problem with your code. See below.

On Thu, Dec 13, 2018 at 8:13 AM Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
Constructing a pattern and pattern matching is a heavy-handed approach to find out if a word contains a substring.

I'm not familiar with Pharo's methods, but at the least String inherits #indexOfSubCollection:startingAt: and probably a number of others. There is also a good chance String inherits or implements a more specific method that will test for the presence of a sub-string.

A good piece of advice I once received is (paraphrased as): Spend more time reading and less time writing. :-)

The deeper a class hierarchy, the greater the chance that the exact thing you need has already been implemented in the leaf class or its abstraction.
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Richard O'Keefe

Re: how to rewrite this to use a stream

In reply to this post by Roelof

Starting from the top:\

(0) It's not *wrong* to start a selector with a capital letter, but

    it is certainly unusual.

(1) Intention-revealing names are good.  That's why this name is bad.

    It does not *delete* anything.

(2) The comment again says "delete" but nothing is deleted.  Comments

    that mislead are worse than no comments.

(3) An issue I have seen over and over in student code (in other

    programming languages) is writing regular expressions as string

    literals where they are used, so that every time you try a match

    the regular expression gets recompiled, over and over again.

    You're not exactly doing that, but you *are* rebuilding patterns

    over and over again.  It would make more sense to hold a

    forbiddenPatterns collection.

(4) You are saying "(coll reject: [..test..]) notEmpty", building an

    entire collection just to ask whether it has any members.  What

    you want to know is "are there any elements of coll that do not

    satisfy the test", which can be expressed as

    (coll allSatisfy: [..test..]) not.  (Sadly there is no

    #notAllSatisfy:, although #notAnySatisfy: *does* exist under the

    name #noneSatisfy:.

(5) #match: is a highly risky way to look for a substring.  What if

    you want to look for a literal # or * ?  What if you need to

    look for 'Boring' but not 'boring'?  ("Boring" is a perfectly

    good surname.)

(6) You really don't need the "result" variable.

    allowedWords

      "Answer those words which do not contain any forbiddenWord

       as a substring."

      ^words select: [:each |

         forbiddenWords noneSatisfy: [:forbidden |

            <<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods

    for String.  You will find several candidate methods.

(8) Of course, if there are W words and F forbidden words, in the

    worst case you'll do W*F substring inclusion tests.  This is not

    good.  "Multiple pattern matching" is a much-studied problem,

    see for example Wu and Manber's paper

    http://web.archive.org/web/20130203031412/http://www.inf.fu-berlin.de/inst/ag-bio-expired/FILES/ROOT/Teaching/Lectures/WS0506/WeiterfThemenBioinf/papers/FastAlgorithmMultiPatternSearching-WuManbers-TR94.pdf

    There is more recent work also.

    Does anyone know whether there's a multiple pattern algorithm for

    Pharo?  One could always fall back on ternary search trees...

On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:

Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

philippeback

Re: how to rewrite this to use a stream

Thx for those reviews.

And interesting paper.

Phil

On Sat, Dec 15, 2018, 04:20 Richard O'Keefe <[hidden email] wrote:

Starting from the top:\
(0) It's not *wrong* to start a selector with a capital letter, but
it is certainly unusual.
(1) Intention-revealing names are good. That's why this name is bad.
It does not *delete* anything.
(2) The comment again says "delete" but nothing is deleted. Comments
that mislead are worse than no comments.
(3) An issue I have seen over and over in student code (in other
programming languages) is writing regular expressions as string
literals where they are used, so that every time you try a match
the regular expression gets recompiled, over and over again.
You're not exactly doing that, but you *are* rebuilding patterns
over and over again. It would make more sense to hold a
forbiddenPatterns collection.
(4) You are saying "(coll reject: [..test..]) notEmpty", building an
entire collection just to ask whether it has any members. What
you want to know is "are there any elements of coll that do not
satisfy the test", which can be expressed as
(coll allSatisfy: [..test..]) not. (Sadly there is no
#notAllSatisfy:, although #notAnySatisfy: *does* exist under the
name #noneSatisfy:.
(5) #match: is a highly risky way to look for a substring. What if
you want to look for a literal # or * ? What if you need to
look for 'Boring' but not 'boring'? ("Boring" is a perfectly
good surname.)
(6) You really don't need the "result" variable.

allowedWords
"Answer those words which do not contain any forbiddenWord
as a substring."
^words select: [:each |
forbiddenWords noneSatisfy: [:forbidden |
<<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods
for String. You will find several candidate methods.
(8) Of course, if there are W words and F forbidden words, in the
worst case you'll do W*F substring inclusion tests. This is not
good. "Multiple pattern matching" is a much-studied problem,
see for example Wu and Manber's paper
http://web.archive.org/web/20130203031412/http://www.inf.fu-berlin.de/inst/ag-bio-expired/FILES/ROOT/Teaching/Lectures/WS0506/WeiterfThemenBioinf/papers/FastAlgorithmMultiPatternSearching-WuManbers-TR94.pdf
There is more recent work also.

Does anyone know whether there's a multiple pattern algorithm for
Pharo? One could always fall back on ternary search trees...
On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Roelof

Re: how to rewrite this to use a stream

Thanks for the reviews.

I have change my code a little bit.
I now check for 1 string and at the end I check for every word the three filters

So I looks now like this:

checkForbiddenParts: word
    "checks if a word contains some forbidden parts"

    ^ forbiddenWords anySatisfy: [ :forbidden | word includesSubstring: forbidden ]

checkLessThen3Vowels: aString
    "checks if a words contains less then 3 vowels"

    ^ (aString count: #isVowel) < 3

checkOverLapping: aString
    "checks if a string has two overlapping characters"

    | result |
    result := aString asArray overlappingPairsCollect: [ :a :b | a ~= b ].
    ^ result includes: true

isNice: aCollection
    "checks if a word is nice according to the 3 filters"

    | answer |
    answer := aCollection
        reject: [ :word |
            (FindNiceStrings new checkForbiddenParts: word)
                | (FindNiceStrings new checkLessThen3Vowels: word)
                | (FindNiceStrings new checkOverLapping: word) not ].
    ^ answer size

But somewhere must be a bug because the answer the code given in too high.

So I think I for a few hours/days busy to find out where the bug is.

here are the 3 filters described :

A nice string is one with all of the following properties:

It contains at least three vowels (aeiou only), like aei, xazegov, or aeiouaeiouaeiou.
It contains at least one letter that appears twice in a row, like xx, abcdde (dd), or aabbccdd (aa, bb, cc, or dd).
It does not contain the strings ab, cd, pq, or xy, even if they are part of one of the other requirements.

Roelof

Op 15-12-2018 om 09:37 schreef [hidden email]:

Thx for those reviews.

And interesting paper.

Phil
On Sat, Dec 15, 2018, 04:20 Richard O'Keefe <[hidden email] wrote:
Starting from the top:\

(0) It's not *wrong* to start a selector with a capital letter, but

it is certainly unusual.

(1) Intention-revealing names are good. That's why this name is bad.

It does not *delete* anything.

(2) The comment again says "delete" but nothing is deleted. Comments

that mislead are worse than no comments.

(3) An issue I have seen over and over in student code (in other

programming languages) is writing regular expressions as string

literals where they are used, so that every time you try a match

the regular expression gets recompiled, over and over again.

You're not exactly doing that, but you *are* rebuilding patterns

over and over again. It would make more sense to hold a

forbiddenPatterns collection.

(4) You are saying "(coll reject: [..test..]) notEmpty", building an

entire collection just to ask whether it has any members. What

you want to know is "are there any elements of coll that do not

satisfy the test", which can be expressed as

(coll allSatisfy: [..test..]) not. (Sadly there is no

#notAllSatisfy:, although #notAnySatisfy: *does* exist under the

name #noneSatisfy:.

(5) #match: is a highly risky way to look for a substring. What if

you want to look for a literal # or * ? What if you need to

look for 'Boring' but not 'boring'? ("Boring" is a perfectly

good surname.)

(6) You really don't need the "result" variable.

allowedWords

"Answer those words which do not contain any forbiddenWord

as a substring."

^words select: [:each |

forbiddenWords noneSatisfy: [:forbidden |

<<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods

for String. You will find several candidate methods.

(8) Of course, if there are W words and F forbidden words, in the

worst case you'll do W*F substring inclusion tests. This is not

good. "Multiple pattern matching" is a much-studied problem,

see for example Wu and Manber's paper

http://web.archive.org/web/20130203031412/http://www.inf.fu-berlin.de/inst/ag-bio-expired/FILES/ROOT/Teaching/Lectures/WS0506/WeiterfThemenBioinf/papers/FastAlgorithmMultiPatternSearching-WuManbers-TR94.pdf

There is more recent work also.

Does anyone know whether there's a multiple pattern algorithm for

Pharo? One could always fall back on ternary search trees...
On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof

Roelof

Re: how to rewrite this to use a stream

And found the culprit after some good night sleep.

But any feedback is appreiciated.

Roelof

Op 15-12-2018 om 10:34 schreef Roelof Wobben:

Thanks for the reviews.

I have change my code a little bit.
I now check for 1 string and at the end I check for every word the three filters

So I looks now like this:

checkForbiddenParts: word
    "checks if a word contains some forbidden parts"

    ^ forbiddenWords anySatisfy: [ :forbidden | word includesSubstring: forbidden ]

checkLessThen3Vowels: aString
    "checks if a words contains less then 3 vowels"

    ^ (aString count: #isVowel) < 3

checkOverLapping: aString
    "checks if a string has two overlapping characters"

    | result |
    result := aString asArray overlappingPairsCollect: [ :a :b | a ~= b ].
    ^ result includes: true

isNice: aCollection
    "checks if a word is nice according to the 3 filters"

    | answer |
    answer := aCollection
        reject: [ :word |
            (FindNiceStrings new checkForbiddenParts: word)
                | (FindNiceStrings new checkLessThen3Vowels: word)
                | (FindNiceStrings new checkOverLapping: word) not ].
    ^ answer size

But somewhere must be a bug because the answer the code given in too high.

So I think I for a few hours/days busy to find out where the bug is.

here are the 3 filters described :

A nice string is one with all of the following properties:

It contains at least three vowels (aeiou only), like aei, xazegov, or aeiouaeiouaeiou.

It contains at least one letter that appears twice in a row, like xx, abcdde (dd), or aabbccdd (aa, bb, cc, or dd).

It does not contain the strings ab, cd, pq, or xy, even if they are part of one of the other requirements.

Roelof

Op 15-12-2018 om 09:37 schreef [hidden email]:
Thx for those reviews.

And interesting paper.

Phil
On Sat, Dec 15, 2018, 04:20 Richard O'Keefe <[hidden email] wrote:
Starting from the top:\

(0) It's not *wrong* to start a selector with a capital letter, but

it is certainly unusual.

(1) Intention-revealing names are good. That's why this name is bad.

It does not *delete* anything.

(2) The comment again says "delete" but nothing is deleted. Comments

that mislead are worse than no comments.

(3) An issue I have seen over and over in student code (in other

programming languages) is writing regular expressions as string

literals where they are used, so that every time you try a match

the regular expression gets recompiled, over and over again.

You're not exactly doing that, but you *are* rebuilding patterns

over and over again. It would make more sense to hold a

forbiddenPatterns collection.

(4) You are saying "(coll reject: [..test..]) notEmpty", building an

entire collection just to ask whether it has any members. What

you want to know is "are there any elements of coll that do not

satisfy the test", which can be expressed as

(coll allSatisfy: [..test..]) not. (Sadly there is no

#notAllSatisfy:, although #notAnySatisfy: *does* exist under the

name #noneSatisfy:.

(5) #match: is a highly risky way to look for a substring. What if

you want to look for a literal # or * ? What if you need to

look for 'Boring' but not 'boring'? ("Boring" is a perfectly

good surname.)

(6) You really don't need the "result" variable.

allowedWords

"Answer those words which do not contain any forbiddenWord

as a substring."

^words select: [:each |

forbiddenWords noneSatisfy: [:forbidden |

<<each includes forbidden as a substring>>]]

(7) You can find out how to do the <<part>> by looking at the methods

for String. You will find several candidate methods.

(8) Of course, if there are W words and F forbidden words, in the

worst case you'll do W*F substring inclusion tests. This is not

good. "Multiple pattern matching" is a much-studied problem,

see for example Wu and Manber's paper

http://web.archive.org/web/20130203031412/http://www.inf.fu-berlin.de/inst/ag-bio-expired/FILES/ROOT/Teaching/Lectures/WS0506/WeiterfThemenBioinf/papers/FastAlgorithmMultiPatternSearching-WuManbers-TR94.pdf

There is more recent work also.

Does anyone know whether there's a multiple pattern algorithm for

Pharo? One could always fall back on ternary search trees...
On Fri, 14 Dec 2018 at 05:13, Roelof Wobben <[hidden email]> wrote:
Hello, 

I have this code :  

 FindAndDeleteWordsWithForbiddenParts
    "deletes all the words that contain forbidden parts"

    | result |
    result := words
        select: [ :word | 
            [ (forbiddenWords reject: [ :forbidden | '*' , forbidden , '*' match: word ]) isNotEmpty ]
                on: NotFound
                do: [ false ] ].
    ^ result


but I see warnings that I have to use a stream instead of string concentation.

Anyone hints how to do so ? 

Roelof