--- Regarding the specification of floating point numbers with radix ~= 10, and in particular the question raised by James Foster (GemStone). Summarizing the comments from James Foster, Nicolas Collier, Prof Stef Ducasse; followed by some arguments with a lot of hand waving, in order to agree with and support what Nicolas said at the outset: this deserves a new syntax. ---
Question: How should it work?
The literal 16rFF is 255 (aSmallInteger). Pharo permits lowercase hexadecimal digits, so the literal 16rff is also taken to be 255.
The literal 1.23e3 is 1230.0 (aSmallDouble in GemStone). Pharo also permits floating point numbers to have a radix, so that e.g. both 2r1.111e3 and 2r1111 are taken to be 15.
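As a worked check of the arithmetic above, here is a minimal Python sketch (not Pharo's parser; the function name and its parameters are hypothetical). It assumes the exponent is a decimal integer that scales the mantissa by a power of the radix, which is what the 2r1.111e3 example implies.

```python
from fractions import Fraction


def radix_literal_value(radix, whole, fraction="", exponent=0):
    """Exact value of <radix>r<whole>.<fraction>e<exponent>.

    Assumption: the exponent is a decimal integer and scales by the radix.
    """
    # Read the whole and fractional digits as one integer in the given radix,
    # then shift the radix point back and apply the exponent in one step.
    digits = int(whole + fraction, radix)
    scale = Fraction(radix) ** (exponent - len(fraction))
    return digits * scale


print(radix_literal_value(2, "1", "111", 3))   # 15   (2r1.111e3)
print(radix_literal_value(2, "1111"))          # 15   (2r1111)
print(radix_literal_value(10, "1", "23", 3))   # 1230 (1.23e3)
```

Exact fractions are used only to keep the check free of rounding; a real implementation would produce a Float.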
This makes certain grammars for numbers ambiguous, because upon encountering either an $E or an $e during the parse of a number, we find two possible interpretations:
Is this a hexadecimal digit or an exponent marker?
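To make the two readings concrete, here is a small Python sketch (again not any dialect's actual parser; both function names are hypothetical). It takes the text after the `r` and evaluates it both ways, handling only unsigned integer mantissas and assuming a decimal exponent that scales by the radix.

```python
def read_as_digits(radix, text):
    """Treat every character, including e/E, as a digit of the given radix."""
    return int(text, radix)


def read_as_exponent(radix, text):
    """Treat the first e/E as an exponent marker: mantissa * radix ** exponent."""
    mantissa, _, exponent = text.lower().partition("e")
    return int(mantissa, radix) * radix ** int(exponent)


# The same characters after "16r" yield two different numbers.
print(read_as_digits(16, "1e3"))    # 483
print(read_as_exponent(16, "1e3"))  # 4096
```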
James identified four possible solutions:
1) Distinguish by letter case
uppercase $E is a hexadecimal digit; lowercase $e is a marker signifying an exponent.
2) Allow exponents on base ten numbers only
3) Distinguish by range
radix >= 15 ifTrue: [$e is the hexadecimal digit ] ifFalse: [$e is a marker signifying exponent]
4) Develop a new syntax for floats that does not use either the letter $e or the letter $E to mark an exponent
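The first three solutions can be sketched as a single function with a policy switch. This is an illustrative Python sketch only, under the same assumptions as before (unsigned integer mantissa, decimal exponent scaling by the radix); the function name and the policy numbering as a parameter are hypothetical. Solution 4 is not modelled because no concrete marker has been proposed.

```python
def parse_radix_literal(literal, policy):
    """Evaluate an unsigned '<radix>r<digits>[e<exponent>]' literal.

    policy 1: uppercase E is a digit, lowercase e marks an exponent
    policy 2: exponents are allowed on base ten only
    policy 3: e/E is a digit when radix >= 15, otherwise an exponent marker
    """
    radix_text, _, body = literal.partition("r")
    radix = int(radix_text)

    # Locate the exponent marker, if the chosen policy allows one here.
    if policy == 1:
        marker_at = body.find("e")
    elif policy == 2:
        marker_at = body.lower().find("e") if radix == 10 else -1
    elif policy == 3:
        marker_at = body.lower().find("e") if radix < 15 else -1
    else:
        raise ValueError("unknown policy")

    if marker_at < 0:
        # No marker: every character must be a digit of the radix,
        # otherwise int() raises ValueError.
        return int(body, radix)
    mantissa, exponent = body[:marker_at], body[marker_at + 1:]
    return int(mantissa, radix) * radix ** int(exponent)
```

In this sketch `parse_radix_literal("16r1e3", 1)` gives 4096 while `parse_radix_literal("16r1e3", 3)` gives 483, so the same source text changes value depending on which rule the reader applies.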
Many implementations of Smalltalk use solution #1, whereas Pharo currently uses solution #3.
The result is that any number expressed in other dialects will port to Pharo without issue, while certain expressions for numbers that are recognized in Pharo become ambiguous when ported to other dialects.
The most practical “fix” is for Pharo to adopt the more ‘popular’ solution.
<tl;dr><outcry> Even so, I submit that this is not the right thing to do. But only because it is a hack, on top of a hack, on top of a design error that went unnoticed for far too long, and really ought to be corrected, for a number of reasons, viz.:
1) The root cause of the problem is an ambiguous grammar.
2) This ambiguity is unique to Smalltalk. It does not occur in any other language, as far as I know.
3) The source of the ambiguity is the design decision which introduced a consistent syntax for expressing numbers in different bases by directly specifying the desired radix, instead of choosing from the very limited sets of special cases provided by other languages.
One is usually limited to binary, octal, decimal, and hexadecimal, with a unique syntactic form required for expression in each base. We get B’01’ for binary, \001 for octal, #01 or %01 for hex, and the unadorned 1, left for (the most privileged) decimal form.
Introducing a consistent form for specifying alternate bases was itself a great design decision.
At the same time, however, a change was introduced which impacted some very long standing properties of numeric representations, and the effect of that change was perhaps not fully considered.
As we go about the task of correcting such a latent error, we should take enough time to more fully consider the particulars that brought us here.
4) Smalltalk is a spectacularly consistent design and a spectacularly consistent language to work in. Increasing the consistency of such a language is arguably the right thing to do at every opportunity. This is not always the most popular thing to do—but it is usually the most honorable. And the most useful, in the long term. Practically speaking.
5) Of the four solutions (or cases), we can arguably eliminate three.
1) Distinguish by letter case
uppercase $E is a hexadecimal digit; lowercase $e is a marker signifying an exponent.
2) Allow exponents on base ten numbers only
3) Distinguish by range
radix >= 15 ifTrue: [$e is the hexadecimal digit ] ifFalse: [$e is a marker signifying exponent]
4) Develop a new syntax for floats that does not use either the letter $e or the letter $E to mark an exponent
In other domains and languages, numerical values are specified with digits only. In such contexts, using a letter as a syntactic marker is reasonable.
Once we adopt the specifiable-radix form (radix)r(digits) in which numerical values are expressed using digits AND letters, it becomes far less reasonable to use letters as a marker.
Case (1):
Differentiation based on the case of letters is fine where the use of letters is pervasive and capitalization is itself generically meaningful, e.g. certain shorthand notations used in regular expressions (%h signifying a match of any lowercase hexadecimal, with %H signifying a match of any uppercase hexadecimal).
By contrast, using capitalization to distinguish ‘a value’ from ‘a syntactic marker’ is a very poor use of character classes, of pixels, and of synaptic gaps, because the association is made without mnemonic support of any kind. Such ‘rules’ require rote memorization, i.e. perfect match of an arbitrary fact. The hidden assumption, that any ‘skill’ involved is both ubiquitous and evenly distributed, is, alas, unfounded.
Case (2):
Disallowing exponents for all bases other than 10 is a) inconsistent, b) contrary to the point of consistently specifying the desired radix, and c) lazy.
Case (3):
Differentiating meaning based on a particular range of values makes for a great explanation of the ‘discovered’ effect, but is somewhat frightening to consider using *on purpose*. If we were to adopt anything of this ilk, a better crossover of ranges would be:
radix <= 10: Values are confined to the set of digits (ascii 16r30-16r39), so $e and $E are exponent markers. See also $s, $d, and $q.
radix >= 11: Letters are required for use as extended values as determined by the radix. We cannot imagine using non-letter characters for this case. Therefore, no exponents for bases above 10.
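For comparison, here is a small Python sketch of how this crossover differs from solution 3. It is only one reading of the rule: the function names are hypothetical, and the "neither" result for radices 11 to 14 is an inference (E is not yet a digit there, and exponents are ruled out above 10), not something stated explicitly.

```python
def role_under_solution_3(radix):
    """Solution 3: e/E is a digit from radix 15 up, otherwise an exponent marker."""
    return "digit" if radix >= 15 else "exponent marker"


def role_under_crossover(radix):
    """Alternative crossover: exponents only for radix <= 10."""
    if radix <= 10:
        return "exponent marker"
    if radix >= 15:
        return "digit"
    # Assumption: for radix 11..14, E is not a digit and exponents are disallowed.
    return "neither"


# Radices (2..36) where the two rules disagree about the role of e/E.
differences = [r for r in range(2, 37)
               if role_under_solution_3(r) != role_under_crossover(r)]
print(differences)  # [11, 12, 13, 14]
```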
This leaves us with solution (4), create a new syntax for marking the exponent. Because the other solutions are hacks. Practical—sure. But hacks, nonetheless. Abominable.
7) As Nicolas pointed out, this issue deserves a new syntax.
The moment we adopted the specifiable-radix solution, we needed to also abandon the use of the letter $e for marking exponents.
Now is our chance to make it right.
Thanks Jim, we will have to take the time to check.
Now it is also important to understand the usage, because adding a new syntax for 0.01% of the cases is not super.
We will carefully read your analysis and come back to it.
In reply to this post by jas
You’ve done an excellent job of summarizing the issues and providing strong arguments.
For my part, while consistency is important, simplicity is also important. To that end, a new syntax would need to bring a lot of value to justify the additional cognitive load. So unless you can get `^` to act as the exponent prefix, I’d be skeptical of a new syntax. For me, the idea that an Integer (and not a Float) can have a radix is sufficient and consistent.
Very nice argumentation Jim :)
Is there any urgency to fix that?
Given that hardly any source will use floats with an alternate radix (except maybe some SUnit TestCase),
isn't it more a theoretical than a practical problem?
On Tue, Sept. 10, 2019 at 16:03, James Foster <[hidden email]> wrote: