Hi,
On the way to localisation I tried to fix my ouUO characters, but was not able to. Any suggestion would be appreciated.

If I type ouUO in a workspace then I get õûÕÛ displayed, regardless of the Workspace>>Option>>Text setting. (This seems to be the Windows Western interpretation instead of the Windows Central Europe one.) If I copy uoUO (from Winword, for example) and paste it into a workspace, then I get ouOU. (Well, similar, but without the small "-s above the letters.)

What I have tried so far, following the earlier suggestions:

It seems that in my case it is not about double-byte or Unicode encodings, but about switching the Windows ANSI code page between Windows Western and Windows Central Europe (i.e. from CP 1252 to 1250). My ouOU stays within the single-byte range; the bytes are 245 251 213 219 ("õûÕÛ") respectively.

0.) The ouOU works fine in D5. There we do not have this problem. AND it works fine in the basic text presenter (as you can see below). Only Scintilla seems to have difficulties with the simple ANSI code pages...

1.) Davorin's follow-up post:

"i solved my own problem. window 2000 looks at 'system locale' when converting non-unicode text, (and changing that system locale is the first operation i found that required a reboot). after changing system locale, croatian characters are displayed ok."

I think that was in pre-D6 times, because I have set my Windows (XP) locale and keyboard to Hungarian and restarted the machine. The result is the same. If I type ouUO in a workspace then I get õûÕÛ displayed. If I copy uoUO (from Winword, for example) and paste it into a workspace, then I get ouOU.

2.) Then I checked Chris' suggestion. I tried:

    bytes := #[ 245 251 213 219 "õûÕÛ" ].
    tm := bytes asString asValue.
    tp := TextPresenter show: 'Scintilla view' on: tm.
    sci := tp view.

And got õûÕÛ. Strangely enough, the "basic" TextPresenter displays fine. If I do

    tp1 := TextPresenter showOn: tm.

then I get the correct ouOU displayed.
And even if I show a basic TextPresenter on the strange Scintilla characters:

    tp2 := TextPresenter showOn: 'õûÕÛ'.

then I also get the correct ouOU displayed. This gives hope for the runtime!

3.) With Chris' "fixed" version of SciLexer.dll the result is the same. Then I added to KernelLibrary:

    getACP
        "Retrieves the current ANSI code-page identifier for the system."
        <stdcall: dword GetACP>
        ^self invalidCall

And evaluated

    KernelLibrary default getACP.

It is 1250.

4.) "If it answers 0, then Scintilla is just leaving the formatting to Windows, which /should/ get it right"

    sci sciSetCodePage: 0.

After setting the code page to 0, the result is the same.

5.) Then, following Blair, I inserted

    view sciSetCodePage: KernelLibrary default getACP

into SmalltalkWorkspace>>applyOptions. The result is the same.

Now: the õûÕÛ displayed by Scintilla changes if I change the Scintilla code page with #sciSetCodePage:, like

    sci sciSetCodePage: 1251.

In the Advanced tab of the Windows regional settings all code page conversions are enabled, from 1250 (ANSI Central Europe) to 1258 (ANSI OEM - Viet Nam):

    1250 (Central Europe) gives õûÕÛ
    1251 (Cyrillic) gives some Russian characters
    1252 (Latin 1) seems to display the same as 1250
    1253 (Greek) displays something else
    and so on.

I tried every value from 1250 to 1258, but none of them gives me the correct display, although according to the considerations above at least 0 or 1250 should.

Well, I can live with this if necessary, because the "wrongly" displayed characters in the method source seem to be displayed fine in the basic presenters. It would just be nice to see the same text in the method as is displayed at runtime.

Any idea or suggestion: why does Scintilla behave in such a strange way with simple ANSI code pages, and how could I get the correct display?

Thanks in advance,
Janos
Yeah, well...
it seems Google itself has problems with displaying some ANSI CP 1250 characters. So, for the őűŐŰ above:

    ő is "Latin Small letter O with double acute"
    ű is "Latin Small letter U with double acute"
    Ő is "Latin Capital letter O with double acute"
    Ű is "Latin Capital letter U with double acute"

Life is good!
Janos
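To make the byte-level picture concrete, here is a small sketch in Python (used only because it is easy to run; it is not Dolphin code). It decodes the four bytes from the first post under both code pages, using Python's standard `cp1250` and `cp1252` codecs. The variable names are just for illustration.

```python
import unicodedata

# The four bytes quoted in the first post: 245 251 213 219
raw = bytes([245, 251, 213, 219])

# The same bytes read as Windows Western (1252) and Central Europe (1250)
as_1252 = raw.decode("cp1252")
as_1250 = raw.decode("cp1250")

print("cp1252:", as_1252)
print("cp1250:", as_1250)

# Show which Unicode characters the cp1250 reading corresponds to
for ch in as_1250:
    print(hex(ord(ch)), unicodedata.name(ch))
```

The bytes never change; only the code page used to interpret them does, which is consistent with the symptom of seeing tilde and circumflex accents where double acutes were typed.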
Janos,
> it seems Google itself has problems with displaying some ANSI CP 1250
> characters.

Not only Google, but many or most newsreaders too. Thanks for the clarification.

When those characters were rendered in my newsreader, it interpreted the bytes #[16rF5 16rFB 16rD5 16rDB] as

    o-tilde u-circumflex O-tilde U-circumflex

which is correct for code page 1252. In code page 1250 the same bytes should be interpreted as:

    o-double-acute u-double-acute O-double-acute U-double-acute

It's difficult to see the difference unless you use a large font (or set #zoomLevel: to something like 20).

I'm not absolutely sure, but I /think/ that the problem you are experiencing is that when you paste the ouOU-with-double-acute-accents string into a workspace, the accents don't show up, even though it /ought/ to work (because your Windows system code page is 1250). If not then the rest of this might not be much help ;-)

I think I know roughly what's going wrong. It seems to be a bug in the way that Dolphin handles Scintilla/Windows charset identifiers (as distinct from code page identifiers -- Redmond work /hard/ to make this stuff confusing). The rest of this is aimed as much at OA, or at least Blair, as yourself ;-)

When you paste your string into the workspace, Dolphin instructs Scintilla to paste the current clipboard text. Scintilla gets the text off the clipboard (as Unicode) and attempts to convert it to something it can insert into its text buffer. This is where the first odd thing happens. Scintilla uses the #characterSet of the current text style in preference to the code page of the control itself (I don't know why it does that). So if the #characterSet is not SC_CHARSET_DEFAULT -- which has value 1, not 0 as you might imagine -- then the control will attempt to convert the Unicode text from the clipboard into the code page corresponding to that charset. If the charset /is/ SC_CHARSET_DEFAULT then the control converts the Unicode into its own configured code page.
Scintilla's own default for the charset of each text style is SC_CHARSET_DEFAULT, so it normally works correctly -- and that's why an ordinary TextPresenter with a Scintilla View works correctly. But when Dolphin defines ScintillaTextStyles for use in a workspace (or anywhere else), the code is all set up to default to nil, which ends up being SC_CHARSET_ANSI (=0). So if Dolphin has set up a text style for the control, then it will tell Scintilla to use a charset of 0. Scintilla does as it is asked, and the result is that all text which is pasted into a styled ScintillaView is converted (as far as possible) into ASCII before being displayed. In the case of your string, Windows will convert it to ouOU (with no accents), which is why the accents disappear when you paste into a workspace. If the characters have no near equivalent then you'll just end up with a lot of ????s.

By way of a temporary hack to see if I could fix it (not a serious attempt at a fix, just to find out if I'd interpreted the problem correctly), I made a few changes to my image. I very definitely am /not/ recommending these as fixes for your image, but they may help confirm that we are seeing the same problem, and may also help Blair a little bit.

I changed a few methods to stop ScintillaTextStyle defaulting to SC_CHARSET_ANSI.

===============
initialize
    super initialize.
    flags := 0.
    characterSet := SC_CHARSET_DEFAULT.
===============
characterSet
    ^characterSet ifNil: [SC_CHARSET_DEFAULT].
===============

And in ScintillaView I changed #buildDefaultStyle to use SC_CHARSET_DEFAULT instead of SC_CHARSET_ANSI.

Unfortunately that still didn't quite fix it. It turns out that the logic in ScintillaTextStyle>>mergeFont: will always overwrite the style's #characterSet with the #characterSet from the Font. Unfortunately, that seems to be 0 in all the cases I tried, and that's exactly what we don't want. I don't really know what Windows is doing here, I suspect that the value is simply wrong.
In any case, I just commented out the line:

    characterSet ifNil: [characterSet := aFont characterSet]

And then new workspaces pasted text correctly !

    -- chris
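The loss of accents described above can be imitated outside Dolphin. The following Python sketch is only an approximation: Windows uses its own best-fit conversion tables when converting Unicode to a code page, whereas the hypothetical helper `strip_accents` below simply decomposes each character and drops the combining marks. It is meant to show the shape of the effect, not Scintilla's or Windows' actual code path.

```python
import unicodedata

text = "\u0151\u0171\u0150\u0170"  # o, u, O, U with double acute accents

def strip_accents(s):
    """Rough stand-in for a best-fit conversion: decompose the text,
    then drop the combining accent marks. (Illustrative helper only.)"""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Converting to the matching code page keeps every character:
print(list(text.encode("cp1250")))

# Converting to the Western code page cannot represent them;
# a strict replacement policy yields question marks:
print(text.encode("cp1252", errors="replace"))

# An accent-dropping conversion yields the plain letters:
print(strip_accents(text))
```

The first conversion gives back the original four bytes, while the other two show the two failure modes mentioned in the post: question marks when there is no equivalent, and bare letters when a near equivalent is substituted.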
"Chris Uppal" <[hidden email]> wrote in message
news:448077a1$0$656$[hidden email]...
>...
> Unfortunately that still didn't quite fix it. It turns out that the logic in
> ScintillaTextStyle>>mergeFont: will always overwrite the style's #characterSet
> with the #characterSet from the Font. Unfortunately, that seems to be 0 in all
> the cases I tried, and that's exactly what we don't want. I don't really know
> what Windows is doing here, I suspect that the value is simply wrong.

That's odd. It works fine for me (i.e. if I execute 'Font choose', choose a font, and set the character set to Central European, then I do get back 238). This is on XP SP2.

>...In any
> case, I just commented out the line:
>
> characterSet ifNil: [characterSet := aFont characterSet]
>
> And then new workspaces pasted text correctly !

I don't think that is right though. It should be respecting the character set of the Font you set. There is no way to choose "default" in the font dialog, so I'm afraid we end up shipping with the western (ANSI) character set anyway. Maybe we could set it up programmatically so it will work for more people out of the box.

In any case Font characterSet should be returning the correct value (the Scintilla constants are defined to have the same values as the Windows constants, although Scintilla defines a few extra over and above those in gdi32.h), so if you change the workspace default font character set it should work. Indeed I find that if I just change the default workspace font to use the Central European character set, both direct character entry and copy&paste from Wordpad work fine. This was tested in a fresh 6.02 install without any patches (i.e. without any previously discussed internationalisation improvements).

Regards
Blair
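Since charset identifiers (such as 238) and code page identifiers (such as 1250) are easy to mix up, here is a small Python lookup sketch relating the two. The table is a partial one, written from the Windows GDI charset constants as commonly documented; the names `CHARSET_TO_CODEPAGE` and `codepage_for_charset` are made up for this illustration and do not exist in Dolphin, Scintilla, or Windows.

```python
# Partial map from Windows font charset identifiers to ANSI code pages.
# Assumption: values follow the usual GDI constants (ANSI_CHARSET = 0,
# EASTEUROPE_CHARSET = 238, and so on). Not an exhaustive list.
CHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI_CHARSET (Western)
    161: 1253,  # GREEK_CHARSET
    162: 1254,  # TURKISH_CHARSET
    186: 1257,  # BALTIC_CHARSET
    204: 1251,  # RUSSIAN_CHARSET (Cyrillic)
    238: 1250,  # EASTEUROPE_CHARSET (Central European)
}

DEFAULT_CHARSET = 1  # means "decide from the system locale", no fixed code page

def codepage_for_charset(charset):
    """Return the code page for a charset id, or None when the charset
    has no fixed code page in this partial table."""
    return CHARSET_TO_CODEPAGE.get(charset)

print(codepage_for_charset(238))
print(codepage_for_charset(0))
print(codepage_for_charset(DEFAULT_CHARSET))
```

Read this way, the 238 that the font dialog answers for Central European corresponds to code page 1250, and a charset of 0 corresponds to the Western code page 1252.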
Blair,
> Indeed I find that if I just change the default workspace font to use the
> Central European character set, that both direct character entry and
> copy&paste from Wordpad work fine.

How do you set it? By

    Font characterSet: 1250

or somehow else?

Thank you,
Janos
Blair,
Blair McGlashan wrote:
> Indeed I find that if I just change the default workspace font to use the
> Central European character set, that both direct character entry and
> copy&paste from Wordpad work fine.

How do you change the workspace default font character set?

Thank you,
Janos
"Janos" <[hidden email]> wrote in message
news:[hidden email]...
> Blair,
>
> Blair McGlashan wrote:
>> Indeed I find that if I just change the default workspace font to use the
>> Central European character set, that both direct character entry and
>> copy&paste from Wordpad work fine.
>
> How do you change the workspace default font character set?

It's in Dolphin Options. You can either navigate to this from the system launcher window, or in an open workspace invoke Tools/Options/Inspect. You'll get an inspector. In that inspector double-click the defaultFont node in the tree. You'll get a common font dialog in which you can choose the font, including the character set.

Regards
Blair
Blair,
Wow. So simple it is!!!!!!!!!!!! And it works fine!

Then:

"I'm afraid we end up shipping with the western (ANSI) character set anyway."

I do not think it is really necessary to deliver Dolphin with anything other than Western (ANSI). (Well, we are not speaking here about Asian languages; they are another story, already discussed in another entry.) You can change it any time if you need to. Great!

Many thanks,
Janos
Blair,
> That's odd. It works fine for me (i.e. if I execute 'Font choose', choose
> a font, and set the character set to Central European, then I do get back
> 238).

That would probably work for me too; I hadn't noticed the "script" option in the font dialog.

Janos seems happy, so maybe the issue is closed, but it still seems (to me) that there's something wrong here. I just don't understand what the "script" attribute of a font is supposed to mean. There seems to be very little about the corresponding field in LOGFONT in MSDN. I know very little about pre-Unicode internationalisation, but it seems plausible that the charset field is some sort of archaic holdover which has no meaning now. Fonts don't have charsets or code-pages any more do they ? I thought the underlying machinery worked in Unicode.

In this instance, if Dolphin and Scintilla both attempt to honour that field, then the control inevitably ends up discarding information (unless the field happens to be set to a value which matches the control's idea of the document code page). Unless the charset field does something real that I don't know about (perfectly possible), I would have thought that it would be more correct to force it into correspondence with the target document's code page, or -- since that isn't trivial afaik -- to force it always to CHARSET_DEFAULT, which Scintilla effectively ignores.

As I say, I'm far from certain about this, but that's how it seems to me today.

    -- chris