Blair and/or Andy,
I just spent quite a chunk o' time tracking down a problem in Pocket Smalltalk that was rather hard to find. I won't go into the details of the problem, but it was the result of a change in the #isHexDigit method in class Character. In Dolphin 2.1 upper as well as lower case letters A..F were considered valid hex. 3.0 and forward only accept upper case. I was just wondering what was the reason behind this change. Is it something to do with ANSI ST, or something else?

The 2.1 version defined it thusly:

    isHexDigit
        ^CRTLibrary default iswxdigit: self

while 3.0 and above have this:

    isHexDigit
        "Answer whether the receiver is a valid Smalltalk hexadecimal digit (i.e. digits and the uppercase characters A through F)."
        ^self isDigit or: [self codePoint >= ##($A codePoint) and: [self codePoint <= ##($F codePoint)]]

Just curious.

Joey

--
-- Sun Certified Java2 Programmer
-- Political Rants: www.joeygibson.com
-- My Pocket Smalltalk Stuff: www.joeygibson.com/st
--
-- "We thought about killin' him, but we kinda
-- hated to go that far...."
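For readers who want to see the difference between the two definitions side by side, here is a minimal Python sketch. It is an illustration only, not Dolphin code: both function names are made up for this example, the first mirrors the 3.0 method under the assumption that `isDigit` means ASCII 0-9, and the second only approximates what `iswxdigit` accepts in the plain "C" locale (the real CRT call may accept more, depending on locale).

```python
import string


def is_hex_digit_v3(ch):
    """Hypothetical mirror of the 3.0 method: ASCII digit or uppercase A..F."""
    if len(ch) != 1:
        return False
    return ch in string.digits or "A" <= ch <= "F"


def is_hex_digit_c_locale(ch):
    """Rough stand-in for the 2.1 behaviour, assuming the plain "C" locale."""
    if len(ch) != 1:
        return False
    return ch in string.hexdigits  # 0-9, a-f, A-F


# Characters on which the two definitions disagree.
disagree = [c for c in string.printable
            if is_hex_digit_v3(c) != is_hex_digit_c_locale(c)]
print(disagree)
```

Code that relied on the 2.1 behaviour to read lowercase hex will see exactly the characters in `disagree` start failing the test.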
Joey,
I'm not sure if Blair is around this week, so here's a reply he posted on the very subject, in reference to a problem with Pocket Smalltalk raised by a slightly disgruntled Steve Harris.

~-~-~-~-
From Blair 24/1/2000

Sorry if this caused a problem for you. The reason for the change (in case it makes you feel any better) is that:

a) #isHexDigit is supposed to report only those characters which are valid hex digits in Smalltalk syntax (it is intended for use by the scanner), and the lowercase letters a..f are not valid as hex digits in Smalltalk.

b) The CRT library call also counts various accented characters and odd digit symbols as hex digits - try evaluating:
    Character allInstances select: [:c | c isHexDigit]
in 2.1, and you will see what I mean!
In reply to this post by Ian Bartholomew
"Ian Bartholomew" <[hidden email]> wrote in message
news:92c5os$6gb48$[hidden email]...
> Joey,
>
> I'm not sure if Blair is around this week so here's a reply he posted on the
> very subject, in reference to a problem with pocketSmalltalk raised by a
> slightly disgruntled Steve Harris.
>
> ~-~-~-~-
> From Blair 24/1/2000
>
> Sorry if this caused a problem for you. The reason for the change (in case
> it makes you feel any better) is that:
>
> a) #isHexDigit is supposed to report only those characters which are valid
> hex digits in Smalltalk syntax (it is intended for use by the scanner), and
> the lowercase letters a..f are not valid as hex digits in Smalltalk.

Hmm...

I learn something new all the time ;-(

Obviously, that is news to me. We've always allowed them in the "0x" and "0X" numeric form. In QKS Smalltalk v1-v1.X there were restrictions against their use in "<base>r" radix prefixed numeric forms; as of v2.0-v3 those restrictions were lifted.

I don't know what the "official" rationale Dolphin (Blair) is referring to, but I can explain some technical issues that may have led to some Smalltalk dialects concluding that lowercase letters are not valid for hex digits. If the "official" reference is the ANSI standard, then I would take it with a grain of salt.

If you consider numeric encoding forms there are some potentially ambiguous cases, and some outright conflicting problems that can occur without some restrictions or special case rules regarding lowercase character usage in radix-based numeric forms.

Here are some of the supported numeric forms (from QKS Smalltalk):

0x...       - base 16 [0-9,A-Z,a-z]
0X...       - base 16 [0-9,A-Z,a-z]
0b...       - base 2 [0-1]
0B...       - base 2 [0-1]
<nn>s...    - ScaledDecimal [0-9]
<base>r...  - [0-(<base>-1)] max possible [0-9,A-Z]
<nn>e...    - one of a number of float forms
<nn>f...    - one of a number of float forms
<nn>g...    - one of a number of float forms
<NN>j       - "j" indicates imaginary part of a complex number
<NN>i       - "i" indicates imaginary part of a complex number

The "r" character is a radix delimiter.
The "e,f,g" characters are delimiters in <Floats>.
The "s" character is a <ScaledDecimal> tag/marker and delimiter.
The "i,j" character is (a message) recognized as a tag/marker in a <Complex>.
The <nn> form includes the optional "." decimal and any subsequent digits.
The <NN> form means any valid numeric form.

QKS Smalltalk v1-v1.X used to allow floats or any number to be expressed using the radix prefix form. To support this generalization requires disallowing lowercase characters for all numbers with a "<base>r" prefix.

In QKS v2 or possibly as late as QKS v3 (1996), I can't remember for sure, the tokenizer was modified to allow upper and lowercase digits and support for a "." decimal. That meant disallowing recognition of "e,f,g,s,..." in radix prefixed numeric forms. This change was made because the "e,f,g,s,..." forms were (practically useless) never used with radix notation, whereas lowercase digits were often desirable.

The ability to declare "." decimal <Float> forms in a radix notation was fully removed in QKS v4/SmallScript -- because the use of radix based <Float> forms is not useful and its presence represents a "lingering" partial support of the original generalization.

I.e., one could write a number like:

16r7E1  <- Notice some possible problems if lowercase was allowed?

16r7e1  Is this 7.0e1 meaning a <Float>?
        or is it 0x7E1 meaning a <SmallInteger>?

35r2s3  Is this 2.0s3 meaning a <ScaledDecimal>?
        or is it 3433 a <SmallInteger>?

If you allow the "<base>r" prefix to be applied to any numeric form, then any subsequent digits need to be restricted to uppercase letters. If the "<base>r" prefix is restricted to <Integer> forms then that restriction is not required.
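The ambiguity described above can be made concrete with a small Python sketch. This is not QKS tokenizer code: `readings`, `parse_radix_digits` and `digit_value` are hypothetical helpers, and the sketch only lists which interpretations a token admits if lowercase letters may be both digits and the "e,f,g,s" markers. It deliberately does not compute a float or scaled value, since how the exponent is applied is dialect-specific.

```python
import string

# '0'..'9' then 'A'..'Z' give digit values 0..35.
DIGITS = string.digits + string.ascii_uppercase


def digit_value(ch):
    """Value of one radix digit, folding lowercase to uppercase."""
    return DIGITS.index(ch.upper())  # raises ValueError if not a digit


def parse_radix_digits(digits, base):
    """Interpret a run of digits in the given base."""
    value = 0
    for ch in digits:
        d = digit_value(ch)
        if d >= base:
            raise ValueError(f"digit {ch!r} out of range for base {base}")
        value = value * base + d
    return value


def readings(token):
    """List every interpretation of a '<base>r...' token (hypothetical helper)."""
    base_text, body = token.split("r", 1)
    base = int(base_text)
    found = []
    # Reading 1: the whole body is digits, lowercase allowed.
    try:
        found.append(("integer", parse_radix_digits(body, base)))
    except ValueError:
        pass
    # Reading 2: a lowercase e/f/g/s separates a mantissa from an exponent/scale.
    for i, ch in enumerate(body):
        if ch in "efgs" and 0 < i < len(body) - 1:
            head, tail = body[:i], body[i + 1:]
            if not tail.isdigit():
                continue
            try:
                mantissa = parse_radix_digits(head, base)
            except ValueError:
                continue
            kind = "scaled" if ch == "s" else "float"
            found.append((kind, mantissa, int(tail)))
    return found


for tok in ("16r7E1", "16r7e1", "35r2s3"):
    print(tok, readings(tok))
```

Only the uppercase token has a single reading; the two lowercase tokens each come back with an integer reading and a float or scaled reading, which is the conflict a tokenizer has to rule out one way or the other.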
I'm guessing that Dolphin Smalltalk doesn't support 0x, 0X forms, and does allow the "<base>r" prefix to be applied to any numeric form.

Personally, being able to use upper and lowercase hex digits is very convenient. Especially when working with documentation or source from other languages, or needing to code in multiple languages at the same time.

I should mention that QKS Smalltalk also supports "prefix" operator messages. This was done to both enhance and address some other issues in Smalltalk regarding numerics/precedence and sign/processing.

"-" - unary prefix message mapped to "negate"
"+" - unary prefix message mapped to "yourself"
"~" - unary prefix message mapped to "complement"
"!" - unary prefix message mapped to "not"

QKS Smalltalk compilers have always performed constant folding, and as part of doing so they recognized certain messages when applied to literals.

So "-(1+3)" would actually generate opcodes for <-4>.
"!(1+3)" would generate opcodes for <false>.
"~(0xF | SOME_LITERAL_CONST)" where SOME_LITERAL_CONST == 0x80 would generate opcodes for <0xFFFFFF70>.

> b) The CRT library call also counts various accented characters and odd
> digit symbols as hex digits - try evaluating:
> Character allInstances select: [:c | c isHexDigit]
> in 2.1, and you will see what I mean!

Ahh. I think I understand why...

It is likely that Dolphin 2.1 was using the Win32 code-point mapping function for "POSIX (LC_CTYPE) 1 character-typing". And then applying the tag mask "C1_XDIGIT". Which is really just a Microsoft-specific version of Unicode/CodePage code-point mapping facilities.

If you're trying to be portable then you don't want to rely on them -- which may explain some changes in Dolphin 4?; my solution was to build my own equivalent routines for v3 of QKS' AOS Platform to enable portability.

The QKS Smalltalk compilers have always been both encoding and font aware.
I.e., you could compile styled string source and the compiler not only understood and preserved the encoding, it also understood and preserved the font and face/style run information. So the compiler needed rich character processing facilities to support Unicode symbols as binary selector characters, etc.

In v4 (SmallScript), the font and face/style run processing mechanism for source code was changed. It no longer pays any attention to style information contained in the <StyleRuns> of <Text/StyledString> source when compiling. Rather, it now treats source input as encoded character streams where it recognizes XML and HTML sequences in comments and strings -- which actually allows a richer and more portable set of extensible text/style annotation constructs.

-- Dave Simmons [www.qks.com / www.smallscript.com]
"Effectively solving a problem begins with how you express it."
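Going back to the constant-folding examples earlier in this post, the arithmetic can be checked with a short Python sketch. It is only a check of the three quoted results, not a model of the QKS compiler: `PREFIX_OPS` and `fold_prefix` are hypothetical names, the 32-bit width for "~" is an assumption inferred from the quoted result 0xFFFFFF70, and treating "!" as Python's truthiness test is likewise an assumption that merely reproduces the quoted <false>.

```python
# Hypothetical table of prefix operators applied to an already-folded literal.
PREFIX_OPS = {
    "-": lambda v: -v,               # "negate"
    "+": lambda v: v,                # "yourself"
    "~": lambda v: ~v & 0xFFFFFFFF,  # "complement", assuming a 32-bit word
    "!": lambda v: not v,            # "not", assuming Python truthiness
}


def fold_prefix(op, literal):
    """Apply one prefix operator to a literal at 'compile time'."""
    return PREFIX_OPS[op](literal)


SOME_LITERAL_CONST = 0x80

print(fold_prefix("-", 1 + 3))
print(fold_prefix("!", 1 + 3))
print(hex(fold_prefix("~", 0xF | SOME_LITERAL_CONST)))
```

The inner expressions such as `1 + 3` are evaluated by Python before `fold_prefix` sees them, which stands in for the folding of the operand itself.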
In reply to this post by Ian Bartholomew
hehehe,
That is putting it mildly ;-)

steve

In article <92c5os$6gb48$[hidden email]>,
"Ian Bartholomew" <[hidden email]> wrote:
> Joey,
>
> I'm not sure if Blair is around this week so here's a reply he posted on the
> very subect, in reference to a problem with pocketSmalltalk raised by a
> slightly disgruntled Steve Harris.
>
> ~-~-~-~-
> From Blair 24/1/2000
>
> Sorry if this caused a problem for you. The reason for the change (in case
> it makes you feel any better) is that:
>
> a) #isHexDigit is supposed to report only those characters which are valid
> hex digits in Smalltalk syntax (it is intended for use by the scanner), and
> the lowercase letters a..f are not valid as hex digits in Smalltalk.
> b) The CRT library call also counts various accented characters and odd
> digit symbols as hex digits - try evaluating:
> Character allInstances select: [:c | c isHexDigit]
> in 2.1, and you will see what I mean!
In reply to this post by David Simmons
Dave
You wrote in message news:i8m26.45360$[hidden email]...
> > a) #isHexDigit is supposed to report only those characters which are valid
> > hex digits in Smalltalk syntax (it is intended for use by the scanner), and
> > the lowercase letters a..f are not valid as hex digits in Smalltalk.
>
> Hmm...
>
> I learn something new all the time ;-(
>
> Obviously, that is news to me.
> ...
> I don't know what the "official" rationale Dolphin (Blair) is referring to,
> but I can explain some technical issues that may have led to some Smalltalk
> dialects concluding that lowercase letters are not valid for hex digits. If
> the "official" reference is the ANSI standard, then I would take it with a
> grain of salt.

One could certainly extend the syntax to accept lower-case alphabetic hex digits if one wishes (and I think we probably did originally), but it isn't standard Smalltalk, at least by any of the known standards we refer to:

1) ANSI NCITS 319-1998, Section 3.5.6, p27.

    integer ::= decimalInteger | radixInteger.
    decimalInteger ::= digits
    digits ::= digit+
    radixInteger ::= radixSpecifier 'r' radixDigits
    radixSpecifier ::= digits
    radixDigits ::= (digit | uppercaseAlphabetic)+

(the radix is restricted to the range 2..36 inclusive).

2) The IBM Common Base red book also restricts radix digits to decimals and uppercase letters.

3) I seem to remember the Blue Book being the same, but I no longer have a copy.

So that's the "official" rationale.

>...
> I'm guessing that Dolphin Smalltalk doesn't support 0x, 0X forms,

Yup.

>...and does
> allow the "<base>r" prefix to be applied to any numeric form.

Nope, that's not standard either: a reader of the ANSI standard will note that only integers can have a radix prefix.

I'd sometimes prefer it if lower-case hex digits were accepted myself, but it is one of those restrictions I don't find bothersome enough to mutiny over - it's "just how it is". Portability at the code-transport level is more important to me. If we all agree that lower-case should be acceptable for hex digits (I note that VW does now accept such too), then I'm happy to go along with it.

With regards to the other syntax enhancements you mention in your post, I have a suspicion we may have a different attitude to language extensions, Dave :-)

Regards

Blair
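As an illustration of the radixInteger production quoted from the ANSI standard above, here is a minimal Python sketch of a reader that accepts only that form. It is not Dolphin's scanner: `parse_radix_integer` is a hypothetical name, and the check that each digit is smaller than the radix is an assumption of this sketch, since the quoted grammar itself only restricts the character set and the 2..36 range.

```python
import re

# radixInteger ::= radixSpecifier 'r' radixDigits
# radixDigits  ::= (digit | uppercaseAlphabetic)+
RADIX_INTEGER = re.compile(r"([0-9]+)r([0-9A-Z]+)")


def parse_radix_integer(text):
    """Parse '<radix>r<digits>' with uppercase-only digits (hypothetical helper)."""
    match = RADIX_INTEGER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a radix integer: {text!r}")
    radix = int(match.group(1))
    if not 2 <= radix <= 36:
        raise ValueError(f"radix {radix} outside 2..36")
    value = 0
    for ch in match.group(2):
        digit = int(ch, 36)  # '0'..'9' -> 0..9, 'A'..'Z' -> 10..35
        if digit >= radix:   # assumption: digits must be below the radix
            raise ValueError(f"digit {ch!r} out of range for radix {radix}")
        value = value * radix + digit
    return value


print(parse_radix_integer("16r7E1"))
```

Under this grammar `16r7e1` is simply rejected, so the float-versus-integer question raised earlier in the thread never comes up; the cost is that lowercase hex copied from other languages has to be uppercased first.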