Unicode method returns "?"

Unicode method returns "?"

jrm
The problem I am trying to debug occurs in the method #unescapeUnicode while parsing data from a JSON stream at a web address, using the JSON package tonyg.39. I have decomposed the problem into a repeatable test case:

Unicode value: (Integer readFrom: '2019' readStream base: 16) 
>> I expect to see a right single quotation mark, but a question mark is returned. I have tried to debug this statement, but I don't understand the code. I also get question marks when I use other decimal numbers: 2000 to: 2030 do: [:each | Transcript show: (Unicode charFromUnicode: each)]


My workaround is to scan results for question marks and replace them.

I get the same results in 64-bit and 32-bit images.

Squeak5.1
latest update: #16549
Current Change Set: WorkSpace
Image format 68021 (64 bit)

Squeak5.0
latest update: #15113
Current Change Set: BBCR_Dev
Image format 6521 (32 bit)

-jrm


Re: Unicode method returns "?"

Bert Freudenberg

On Sat 25. Nov 2017 at 01:38, John-Reed Maffeo <[hidden email]> wrote:
The problem I am trying to debug occurs in the method #unescapeUnicode while parsing data from a JSON stream at a web address, using the JSON package tonyg.39. I have decomposed the problem into a repeatable test case:

Unicode value: (Integer readFrom: '2019' readStream base: 16)
>> I expect to see a right single quotation mark, but a question mark is returned. I have tried to debug this statement, but I don't understand the code. I also get question marks when I use other decimal numbers: 2000 to: 2030 do: [:each | Transcript show: (Unicode charFromUnicode: each)]

Are you sure the font you are using to display those characters does have glyphs for them? Otherwise they will be displayed as question marks.

- Bert -


Re: Unicode method returns "?"

timrowledge
In reply to this post by jrm
In a trunk image I get a char value 8217 - which is 16r2019, so correct - but it renders as a ?. So what we’re almost certainly seeing is a failure of the rendering process to find a font glyph for the relevant character. Given the large number of chars needed, that doesn’t surprise me a lot. The rendering does its little dance, finds that there is no glyph in the StrikeFont, and delegates to the backup font (a FixedFaceFont in the typical case), which has a little think and uses #displayErrorOn:length:at:kern:baselineY: to display the ‘substitutionCharacter’, which is ASCII 63, or ‘?’.
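
A minimal Workspace sketch of the check described above, assuming a Squeak 5.x image: the character object carries the right code point even when the font draws a placeholder. The variable name ch is illustrative only.

```smalltalk
| ch |
"Build the character the same way as in the original test case."
ch := Unicode value: 16r2019.

"Print-it each line below to inspect the value rather than the glyph."
ch charCode.                        "the code point as a decimal integer"
ch charCode printStringBase: 16.    "the same code point in hexadecimal"
ch = $?.                            "false when only the glyph is missing"
```

If the last expression answers false, the "?" on screen is a drawing fallback and the character itself is not a question mark.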

The only way you’re likely to get the glyph you want would be to use Pango rendering via the UnicodePlugin (I think!). I know that can work because it’s what I used for nice fonts on the Pi for NuScratch, but whether it actually works or is even generated by default for other platforms I can’t say. The NuScratch code is all on SqueakSource if you want to dig into it to find usages of UnicodePlugin etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Homeopathy: Logic diluted, to make it stronger....


Re: Unicode method returns "?"

jrm
In reply to this post by Bert Freudenberg
Bert, Tim, 
Thanks, I think I understand the problem now. 
A. The default Squeak font does not contain a glyph for the value which was used to create the character. 
B. The question mark character is used as a placeholder for the missing glyph. 
C. There may be a solution using the UnicodePlugin (which is not available on Mac, per Tim's note on the VM list: "Adding build of ScratchPlugin & UnicodePlugin", Apr 18, 2017; 1:07pm).
D. The NuScratch code is in the "Improved Scratch 1.4 as used on Raspberry Pi" project at http://www.squeaksource.com/NuScratch.html and may point in the direction of an ultimate solution.

I think I will just hack an application-specific solution for now. The biggest reason that $? is a problem for me is that a random ? in OSProcess parameter strings causes the VM to lock up. (The string I am using from the JSON data is embedded in a parameter of a command I execute using OSProcess.)
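
One possible shape for such an application-specific workaround, as a sketch only: map the typographic apostrophe to its ASCII counterpart and replace any other non-ASCII character with a placeholder, so that the string handed to the external command is plain ASCII. The names raw and ascii, the sample text, and the choice of $_ as the placeholder are assumptions, not part of the JSON package or OSProcess, and whether this avoids the lockup has not been established here.

```smalltalk
| raw ascii |
"Hypothetical sample input containing U+2019 (right single quotation mark)."
raw := 'It' , (String with: (Character value: 16r2019)) , 's'.

"Copy character by character into a plain (byte) String."
ascii := String streamContents: [:out |
	raw do: [:c |
		c charCode = 16r2019
			ifTrue: [out nextPut: $']
			ifFalse: [
				c charCode > 127
					ifTrue: [out nextPut: $_]   "assumed placeholder"
					ifFalse: [out nextPut: c]]]].
```

Testing the code point, rather than scanning for $?, avoids touching question marks that were genuinely present in the data.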

jrm







Re: Unicode method returns "?"

Bert Freudenberg
In reply to this post by timrowledge
On Sat, Nov 25, 2017 at 2:12 AM, tim Rowledge <[hidden email]> wrote:
In a trunk image I get a char value 8217 - which is 16r2019, so correct - but it renders as a ?. So what we’re almost certainly seeing is a failure of the rendering process to find a font glyph for the relevant character. Given the large number of chars needed, that doesn’t surprise me a lot. The rendering does its little dance, finds that there is no glyph in the StrikeFont, and delegates to the backup font (a FixedFaceFont in the typical case), which has a little think and uses #displayErrorOn:length:at:kern:baselineY: to display the ‘substitutionCharacter’, which is ASCII 63, or ‘?’.

Yep.

The only way you’re likely to get the glyph you want would be to use Pango rendering via the UnicodePlugin (I think!).

Not the only way. You just need to install a font with all the glyphs you need:

Inline image 2

As you can see, it properly shows the single quote for hex character 2019, and supports Cyrillic (and Greek, etc., too).

For this, I simply downloaded Fira Sans https://fonts.google.com/download?family=Fira%20Sans and dropped the unzipped FiraSans-Regular.ttf into Squeak, and chose "Install ttf style". Then in the Workspace, switch the font to Fira Sans.

That said, our standard font renderer does not know how to deal with ligatures, RTL, etc, so if you want to support scripts like Arabic or Devanagari, you indeed need to use a plugin. Scratch uses UnicodePlugin for rendering, and Etoys on OLPC uses RomePlugin's pango paragraph renderer.

- Bert -



Re: Unicode method returns "?"

Bert Freudenberg
Oh, and I just noticed that this is a more succinct way to show a character range:

16r20 asCharacter to: 16r2400 asCharacter

(in Fira Sans this actually shows some interesting symbols...)
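
For comparison, a hedged variant of the Transcript loop from the original post with the bounds written in hexadecimal. Decimal 2000 to 2030 is 16r7D0 to 16r7EE, not the General Punctuation block that contains 16r2019, so the 16r prefix matters. This is a sketch that assumes the Transcript's font has the glyphs; otherwise it will still show question marks.

```smalltalk
"Show U+2000 through U+2030; bounds use the 16r prefix for hex."
16r2000 to: 16r2030 do: [:code |
	Transcript show: (Unicode value: code) asString].
Transcript cr.
```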

- Bert -