Smalltalk › Pharo › Pharo Smalltalk Users

Can it do this way ?

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

24 messages Options

Pharo Smalltalk Users mailing list

Sep 08, 2020; 6:30am

Re: Can it do this way ?

914 posts

Op 8-9-2020 om 04:22 schreef Richard O'Keefe:

There are two quite different questions.

(1) Where may dashes occur in a real ISBN-10?

(2) What does Exercism require in the specification and check in the

test cases?

For (1) the rules are

Each ISBN consists of 5 elements with each section being separated by spaces or hyphens. Three of the five elements may be of varying length:

Prefix element – currently this can only be either 978 or 979. It is always 3 digits in length

Registration group element – this identifies the particular country, geographical region, or language area participating in the ISBN system. This element may be between 1 and 5 digits in length

Registrant element - this identifies the particular publisher or imprint. This may be up to 7 digits in length

Publication element – this identifies the particular edition and format of a specific title. This may be up to 6 digits in length

Check digit – this is always the final single digit that mathematically validates the rest of the number. It is calculated using a Modulus 10 system with alternate weights of 1 and 3.

An ISBN-10 does not have the three-digit prefix. So we have

[0-9]{1,5}   -- prefix

[0-9]{1,7}   -- registrant

[0-9]{1,6}   -- publication

[0-9X]

      -- check digit

As an examplw, "Standard C++ IOStreams and Locales" by Langer & Kreft has

ISBN-10 0-201-18395-1

ISBN-13 9780201183955

so I shall assume the separators are optional.

/^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/

Of course the elements cannot all have their maximum length at the same

time. In AWK I would write

   x = a_putative_ISBN_10

   y = x

   gsub(/[- ]+/, "", y)

   if (x ~ /^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/ \

    && y ~ /^[0-9]{9,9}[0-9X]$/ \

   ) {

       x *might* be valid, we still need to check the checksum

   }

For (2), there appear to be no restrictions on where dashes may occur

or how many: "These may be communicated with or without hyphens".

Exercism doesn't allow spaces.

Regular expressions are elegant in their own way, BUT for this problem

they are (a) excessive, (b) inefficient, and (c) insufficient.

   digit count := 0.

   check sum := 0.

   for each character c of the string

       if c is not a hyphen then

           if c is a digit then

               digit value := c's value as a digit

           else if c is X and digit count = 9 then

               digit value := 10

           else

               return false.

           digit count := digit count + 1.

           if digit count > 10 then return false.

           check sum := (11 - digit count) * digit value + check sum.

    return check sum mod 11 is zero.

Part of the insight here is "don't DO it, just PRETEND you did."

That is, instead of copying the string without the hyphens,

just ignore the hyphens as they turn up.

Another part is "if you are only going to use it once, don't store it."

That is, we need a digit's value just once, in the update to check sum,

so we should compute it just before we need it, not store it.

Now the pseudo-code above is classic sequential imperative coding.

Classic functional coding does something like

   let no_dashes = filter (/= '-') (explode string) in

   length no_dashes = 10 and

   let check = last no_dashes in

   (is_digit check or check = 'X') and

   all is_digit (take 9 no_dashes) and

   let xval c = if x = 'X' then 10 else digit_value c in

   dot (map xval no_dashes) [10,9..1]) mod 11 = 0

This pseudo-code translates nicely to Smalltalk too.

You might want to add

SequenceableCollection>>

with: other inject: initial into: aBlock

    |r|

    r := initial.

    self with: other do: [:x :y |

      r := aBlock value: r value: x value: y].

    ^r

dot: other

    ^self with: other inject: 0 into: [:acc :x :y | x*y + acc]

(These methods are so obvious that it would be absurd to claim

any intellectual property rights to them.)

I also have "fusion" methods like

SequenceableCollection>>

from: start to: finish allSatisfy: testBlock

self from: start to: finish do: [:each |

    (aBlock value: each) ifFalse: [^false]].

^true

so that ((seq copyFrom: a to: z) allSatisfy: blk)

can be done as (seq from: a to: z allSatisfy: blk)

without making a copy.

Fusion methods are useful because Smalltall compilers

don't work as hard at eliminating intermediate data

structures as functional language compilers. (Having

other priorities.)

   (isdigit (last no_dashes

           return false if digit count > 10.

... [show rest of quote]

Thanks for letting me see this.
But still I wonder if this is really the OOP way and if that function does more then 1 thing.
It looks to me that its iterating trrough the string. Calculating the crc and checking it.
I learned that it is a good thing that a function and a class does only 1 thing.

Roelof

Steffen Märcker

Sep 08, 2020; 8:31am

Re: Can it do this way ?

381 posts

Hi Richard and Roelof,

thanks for your comprehensive answer. I brought up Regex only to point out alternative solutions. Another one is the following using transducers, where Tee works like the tee command from the command line.

IsbnValidator>>isValidIsbn: aString

| length countChars separators getSeparators lastChar getLastChar filterDigits computeCheckSum checkSum |

"Count number of characters"

length := 0.

countChars := [:count :char | length := length + 1] init: length.

"Get non-digit characters"

separators := Set new

getSeparators := separators <~ #isDigit remove.

"Get last character"

lastChar := nil.

getLastChar := [:prev :char | lastChar := char ] init: lastChar.

"Get digits"

filterDigits := #isDigit filter.

"Calculate check sum"

computeCheckSum := ([:sum :index :digit | sum + index * digit value] init: 0) completing: #\\ .

"Compute"

checkSum := aString

transduce: (Tee to: countChars) * (Tee to: getSeparators) * (Tee to: getLastChar) * filterDigits

reduce: computeCheckSum

init: 0.

"Check validity"

^((length = 10 or: [length = 13])

and: [separators = Set with: $-])

and: [checkSum = (lastChar = $X ifTrue: [10] ifFalse: [lastChar value])]

Kind regards,

Steffen

Am .09.2020, 08:30 Uhr, schrieb Roelof Wobben via Pharo-users <[hidden email]>:

Op 8-9-2020 om 04:22 schreef Richard O'Keefe:

There are two quite different questions.

(1) Where may dashes occur in a real ISBN-10?

(2) What does Exercism require in the specification and check in the

test cases?

For (1) the rules are

Each ISBN consists of 5 elements with each section being separated by spaces or hyphens. Three of the five elements may be of varying length:

Prefix element – currently this can only be either 978 or 979. It is always 3 digits in length

Registration group element – this identifies the particular country, geographical region, or language area participating in the ISBN system. This element may be between 1 and 5 digits in length

Registrant element - this identifies the particular publisher or imprint. This may be up to 7 digits in length

Publication element – this identifies the particular edition and format of a specific title. This may be up to 6 digits in length

Check digit – this is always the final single digit that mathematically validates the rest of the number. It is calculated using a Modulus 10 system with alternate weights of 1 and 3.

An ISBN-10 does not have the three-digit prefix. So we have

[0-9]{1,5}   -- prefix

[0-9]{1,7}   -- registrant

[0-9]{1,6}   -- publication

[0-9X]

      -- check digit

As an examplw, "Standard C++ IOStreams and Locales" by Langer & Kreft has

ISBN-10 0-201-18395-1

ISBN-13 9780201183955

so I shall assume the separators are optional.

/^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/

Of course the elements cannot all have their maximum length at the same

time. In AWK I would write

   x = a_putative_ISBN_10

   y = x

   gsub(/[- ]+/, "", y)

   if (x ~ /^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/ \

    && y ~ /^[0-9]{9,9}[0-9X]$/ \

   ) {

       x *might* be valid, we still need to check the checksum

   }

For (2), there appear to be no restrictions on where dashes may occur

or how many: "These may be communicated with or without hyphens".

Exercism doesn't allow spaces.

Regular expressions are elegant in their own way, BUT for this problem

they are (a) excessive, (b) inefficient, and (c) insufficient.

   digit count := 0.

   check sum := 0.

   for each character c of the string

       if c is not a hyphen then

           if c is a digit then

               digit value := c's value as a digit

           else if c is X and digit count = 9 then

               digit value := 10

           else

               return false.

           digit count := digit count + 1.

           if digit count > 10 then return false.

           check sum := (11 - digit count) * digit value + check sum.

    return check sum mod 11 is zero.

Part of the insight here is "don't DO it, just PRETEND you did."

That is, instead of copying the string without the hyphens,

just ignore the hyphens as they turn up.

Another part is "if you are only going to use it once, don't store it."

That is, we need a digit's value just once, in the update to check sum,

so we should compute it just before we need it, not store it.

Now the pseudo-code above is classic sequential imperative coding.

Classic functional coding does something like

   let no_dashes = filter (/= '-') (explode string) in

   length no_dashes = 10 and

   let check = last no_dashes in

   (is_digit check or check = 'X') and

   all is_digit (take 9 no_dashes) and

   let xval c = if x = 'X' then 10 else digit_value c in

   dot (map xval no_dashes) [10,9..1]) mod 11 = 0

This pseudo-code translates nicely to Smalltalk too.

You might want to add

SequenceableCollection>>

with: other inject: initial into: aBlock

    |r|

    r := initial.

    self with: other do: [:x :y |

      r := aBlock value: r value: x value: y].

    ^r

dot: other

    ^self with: other inject: 0 into: [:acc :x :y | x*y + acc]

(These methods are so obvious that it would be absurd to claim

any intellectual property rights to them.)

I also have "fusion" methods like

SequenceableCollection>>

from: start to: finish allSatisfy: testBlock

self from: start to: finish do: [:each |

    (aBlock value: each) ifFalse: [^false]].

^true

so that ((seq copyFrom: a to: z) allSatisfy: blk)

can be done as (seq from: a to: z allSatisfy: blk)

without making a copy.

Fusion methods are useful because Smalltall compilers

don't work as hard at eliminating intermediate data

structures as functional language compilers. (Having

other priorities.)

   (isdigit (last no_dashes

           return false if digit count > 10.



... [show rest of quote]

Thanks for letting me see this.
But still I wonder if this is really the OOP way and if that function does more then 1 thing.
It looks to me that its iterating trrough the string. Calculating the crc and checking it.
I learned that it is a good thing that a function and a class does only 1 thing.

Roelof

... [show rest of quote]

Pharo Smalltalk Users mailing list

Sep 08, 2020; 11:34am

Re: Can it do this way ?

914 posts

In reply to this post by Pharo Smalltalk Users mailing list

Op 8-9-2020 om 08:30 schreef Roelof Wobben:

Op 8-9-2020 om 04:22 schreef Richard O'Keefe:

There are two quite different questions.

(1) Where may dashes occur in a real ISBN-10?

(2) What does Exercism require in the specification and check in the

test cases?

For (1) the rules are

Each ISBN consists of 5 elements with each section being separated by spaces or hyphens. Three of the five elements may be of varying length:

Prefix element – currently this can only be either 978 or 979. It is always 3 digits in length

Registration group element – this identifies the particular country, geographical region, or language area participating in the ISBN system. This element may be between 1 and 5 digits in length

Registrant element - this identifies the particular publisher or imprint. This may be up to 7 digits in length

Publication element – this identifies the particular edition and format of a specific title. This may be up to 6 digits in length

Check digit – this is always the final single digit that mathematically validates the rest of the number. It is calculated using a Modulus 10 system with alternate weights of 1 and 3.

An ISBN-10 does not have the three-digit prefix. So we have

[0-9]{1,5}   -- prefix

[0-9]{1,7}   -- registrant

[0-9]{1,6}   -- publication

[0-9X]

      -- check digit

As an examplw, "Standard C++ IOStreams and Locales" by Langer & Kreft has

ISBN-10 0-201-18395-1

ISBN-13 9780201183955

so I shall assume the separators are optional.

/^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/

Of course the elements cannot all have their maximum length at the same

time. In AWK I would write

   x = a_putative_ISBN_10

   y = x

   gsub(/[- ]+/, "", y)

   if (x ~ /^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/ \

    && y ~ /^[0-9]{9,9}[0-9X]$/ \

   ) {

       x *might* be valid, we still need to check the checksum

   }

For (2), there appear to be no restrictions on where dashes may occur

or how many: "These may be communicated with or without hyphens".

Exercism doesn't allow spaces.

Regular expressions are elegant in their own way, BUT for this problem

they are (a) excessive, (b) inefficient, and (c) insufficient.

   digit count := 0.

   check sum := 0.

   for each character c of the string

       if c is not a hyphen then

           if c is a digit then

               digit value := c's value as a digit

           else if c is X and digit count = 9 then

               digit value := 10

           else

               return false.

           digit count := digit count + 1.

           if digit count > 10 then return false.

           check sum := (11 - digit count) * digit value + check sum.

    return check sum mod 11 is zero.

Part of the insight here is "don't DO it, just PRETEND you did."

That is, instead of copying the string without the hyphens,

just ignore the hyphens as they turn up.

Another part is "if you are only going to use it once, don't store it."

That is, we need a digit's value just once, in the update to check sum,

so we should compute it just before we need it, not store it.

Now the pseudo-code above is classic sequential imperative coding.

Classic functional coding does something like

   let no_dashes = filter (/= '-') (explode string) in

   length no_dashes = 10 and

   let check = last no_dashes in

   (is_digit check or check = 'X') and

   all is_digit (take 9 no_dashes) and

   let xval c = if x = 'X' then 10 else digit_value c in

   dot (map xval no_dashes) [10,9..1]) mod 11 = 0

This pseudo-code translates nicely to Smalltalk too.

You might want to add

SequenceableCollection>>

with: other inject: initial into: aBlock

    |r|

    r := initial.

    self with: other do: [:x :y |

      r := aBlock value: r value: x value: y].

    ^r

dot: other

    ^self with: other inject: 0 into: [:acc :x :y | x*y + acc]

(These methods are so obvious that it would be absurd to claim

any intellectual property rights to them.)

I also have "fusion" methods like

SequenceableCollection>>

from: start to: finish allSatisfy: testBlock

self from: start to: finish do: [:each |

    (aBlock value: each) ifFalse: [^false]].

^true

so that ((seq copyFrom: a to: z) allSatisfy: blk)

can be done as (seq from: a to: z allSatisfy: blk)

without making a copy.

Fusion methods are useful because Smalltall compilers

don't work as hard at eliminating intermediate data

structures as functional language compilers. (Having

other priorities.)

   (isdigit (last no_dashes

           return false if digit count > 10.



... [show rest of quote]

Thanks for letting me see this.
But still I wonder if this is really the OOP way and if that function does more then 1 thing.
It looks to me that its iterating trrough the string. Calculating the crc and checking it.
I learned that it is a good thing that a function and a class does only 1 thing.

Roelof

... [show rest of quote]

I learned on other oop languages that for mainibility it's needed that a class does one thing and a method does also one thing.
So when software changes , you can easily make the changes. Its I think called SOLID and I like that idea.
So I try to make that also work in Pharo but it seems not so important. It's look like me that getting it work is more important.

Roelof

Richard Sargent

Sep 08, 2020; 2:21pm

Re: Can it do this way ?

Administrator

946 posts

On Tue, Sep 8, 2020, 04:35 Roelof Wobben via Pharo-users <[hidden email]> wrote:

Op 8-9-2020 om 08:30 schreef Roelof Wobben:

Op 8-9-2020 om 04:22 schreef Richard O'Keefe:

There are two quite different questions.

(1) Where may dashes occur in a real ISBN-10?

(2) What does Exercism require in the specification and check in the

test cases?

For (1) the rules are

Each ISBN consists of 5 elements with each section being separated by spaces or hyphens. Three of the five elements may be of varying length:

Prefix element – currently this can only be either 978 or 979. It is always 3 digits in length

Registration group element – this identifies the particular country, geographical region, or language area participating in the ISBN system. This element may be between 1 and 5 digits in length

Registrant element - this identifies the particular publisher or imprint. This may be up to 7 digits in length

Publication element – this identifies the particular edition and format of a specific title. This may be up to 6 digits in length

Check digit – this is always the final single digit that mathematically validates the rest of the number. It is calculated using a Modulus 10 system with alternate weights of 1 and 3.

An ISBN-10 does not have the three-digit prefix. So we have

[0-9]{1,5}   -- prefix

[0-9]{1,7}   -- registrant

[0-9]{1,6}   -- publication

[0-9X]

      -- check digit

As an examplw, "Standard C++ IOStreams and Locales" by Langer & Kreft has

ISBN-10 0-201-18395-1

ISBN-13 9780201183955

so I shall assume the separators are optional.

/^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/

Of course the elements cannot all have their maximum length at the same

time. In AWK I would write

   x = a_putative_ISBN_10

   y = x

   gsub(/[- ]+/, "", y)

   if (x ~ /^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/ \

    && y ~ /^[0-9]{9,9}[0-9X]$/ \

   ) {

       x *might* be valid, we still need to check the checksum

   }

For (2), there appear to be no restrictions on where dashes may occur

or how many: "These may be communicated with or without hyphens".

Exercism doesn't allow spaces.

Regular expressions are elegant in their own way, BUT for this problem

they are (a) excessive, (b) inefficient, and (c) insufficient.

   digit count := 0.

   check sum := 0.

   for each character c of the string

       if c is not a hyphen then

           if c is a digit then

               digit value := c's value as a digit

           else if c is X and digit count = 9 then

               digit value := 10

           else

               return false.

           digit count := digit count + 1.

           if digit count > 10 then return false.

           check sum := (11 - digit count) * digit value + check sum.

    return check sum mod 11 is zero.

Part of the insight here is "don't DO it, just PRETEND you did."

That is, instead of copying the string without the hyphens,

just ignore the hyphens as they turn up.

Another part is "if you are only going to use it once, don't store it."

That is, we need a digit's value just once, in the update to check sum,

so we should compute it just before we need it, not store it.

Now the pseudo-code above is classic sequential imperative coding.

Classic functional coding does something like

   let no_dashes = filter (/= '-') (explode string) in

   length no_dashes = 10 and

   let check = last no_dashes in

   (is_digit check or check = 'X') and

   all is_digit (take 9 no_dashes) and

   let xval c = if x = 'X' then 10 else digit_value c in

   dot (map xval no_dashes) [10,9..1]) mod 11 = 0

This pseudo-code translates nicely to Smalltalk too.

You might want to add

SequenceableCollection>>

with: other inject: initial into: aBlock

    |r|

    r := initial.

    self with: other do: [:x :y |

      r := aBlock value: r value: x value: y].

    ^r

dot: other

    ^self with: other inject: 0 into: [:acc :x :y | x*y + acc]

(These methods are so obvious that it would be absurd to claim

any intellectual property rights to them.)

I also have "fusion" methods like

SequenceableCollection>>

from: start to: finish allSatisfy: testBlock

self from: start to: finish do: [:each |

    (aBlock value: each) ifFalse: [^false]].

^true

so that ((seq copyFrom: a to: z) allSatisfy: blk)

can be done as (seq from: a to: z allSatisfy: blk)

without making a copy.

Fusion methods are useful because Smalltall compilers

don't work as hard at eliminating intermediate data

structures as functional language compilers. (Having

other priorities.)

   (isdigit (last no_dashes

           return false if digit count > 10.



... [show rest of quote]

Thanks for letting me see this.
But still I wonder if this is really the OOP way and if that function does more then 1 thing.
It looks to me that its iterating trrough the string. Calculating the crc and checking it.
I learned that it is a good thing that a function and a class does only 1 thing.

Roelof

... [show rest of quote]

I learned on other oop languages that for mainibility it's needed that a class does one thing and a method does also one thing.
So when software changes , you can easily make the changes. Its I think called SOLID and I like that idea.
So I try to make that also work in Pharo but it seems not so important. It's look like me that getting it work is more important.

... [show rest of quote]

I don't think that is the case here. Identify the objects involved in an ISBN string. I see String and Character. Once parsed, you could have an ISBN itself, with the elements cited by Richard O'keefe. But the exercise isn't about that, but only about the validation.

There is very little behaviour that one could add to String and Character that would help in this exercise and that would be appropriate to their respective roles.

Arguably, you could solve this exercise by creating a parser. I think that would be a lot like using a 10 pound sledge hammer to install a thumbtack.

Roelof