Unicode Support

Sergei Gnezdov-4

Unicode Support

Recently I tried to read contents of the drive on my machine with
Dolphin XP. I found that it does not display Russian letters correctly.
I am not sure where exactly the problem starts. Is it in external
interfacing? Any solutions?

The following code demonstrates a problem:

c := OrderedCollection new.
File
    forAll: '*'
    in: 'C:\Russian'
    do: [:each | each fileName ~= '..' ifTrue: [c addLast: each fileName]].
c inspect

By the way, is there any better way to go through the directory structure?
Not that it is better, but I use recursion in C# or Java.

Thank you

Blair McGlashan-3

Re: Unicode Support

"Sergei Gnezdov" <[hidden email]> wrote in message
news:1095479764.VwjC0ADW8T4FF4BmTgTw4g@teranews...
> Recently I tried to read contents of the drive on my machine with Dolphin
> XP. I found that it does not display Russian letters correctly. I am not
> sure where exactly the problem starts. Is it in external interfacing?
> Any solutions?

I think it probably has to do with the default font saved down into the
RichEdit control used as the source editor. This thread from
comp.lang.smalltalk.dolphin may help:

http://groups.google.co.uk/groups?hl=en&lr=&ie=UTF-8&selm=aq8j61%24dof2%241%40as201.hinet.hr

To change the default workspace font, double-click the 'User Preferences'
item in the main system launcher window. Locate 'Workspace' at the end of
the list, and expand the tree node. You can then double-click the
defaultFont aspect and change the font's script appropriately.

> ...
> By the way, is there any better way to go through the directory structure?
> Not that it is better, but I use recursion in C# or Java.

We don't provide a comprehensive object model for the file system in
Dolphin. See however the method #example2 on the class side of
AXTypeLibraryAnalyzer. If you try to run the example, however, it will fail
if you are running Windows XP since the example assumes that the system
directory will be WINNT on NT class systems (this is easily corrected). If
you do a search in Google groups for 'comp.lang.smalltalk.dolphin
IFileSystem' it will bring back a number of postings that may be helpful.

Regards

Blair

Chris Uppal-3

Re: Unicode Support

Blair McGlashan wrote:

> > Recently I tried to read contents of the drive on my machine with
> > Dolphin XP. I found that it does not display Russian letters correctly.
> > I am not sure where exactly the problem starts. Is it in external
> > interfacing? Any solutions?
>
> I think it probably has to do with the default font saved down into the
> RichEdit control used as the source editor.

More than that, I think. As far as I can tell, the underlying Windows
findFirstFile/findNextFile stuff (or whatever it's called) is answering data
where the characters in filenames that cannot be represented in 8 bits are
replaced by $?. That's to say that the actual bytes pointed to by the
WIN32_FIND_DATA have 63 as their value.
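
The substitution described above can be modelled outside Dolphin. The following is only a rough sketch in Python, not Dolphin code: it assumes Python's code-page codecs behave like the Windows conversion for this purpose, and both the helper `ansi_bytes` and the file name are made up for illustration.

```python
# Rough model of what an 8-bit ("ANSI") file API hands back: characters the
# active code page cannot represent come out as '?' (byte value 63).
# Assumption: Python's codecs stand in for the Windows conversion here.

def ansi_bytes(name: str, code_page: str) -> bytes:
    """Encode name in an 8-bit code page, substituting '?' where needed."""
    return name.encode(code_page, errors="replace")

name = "привет.txt"  # hypothetical Cyrillic file name

western = ansi_bytes(name, "cp1252")   # Western European code page
cyrillic = ansi_bytes(name, "cp1251")  # Cyrillic code page

print(list(western))   # Cyrillic letters are lost, replaced by 63
print(list(cyrillic))  # every letter keeps a distinct byte value
```

Once the name has been reduced to 63s the original characters cannot be recovered from the bytes, which is why a font change alone would not help in that case.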

Sergei, you /might/ be able to use the #cAlternateFileName from the same data
(the old DOS-style name), but it does depend on what you are trying to do.

I did take a look at what would be involved in duplicating Dolphin's file
enumeration using the "wide" API, and it looks doable with some work. More
work than I fancied just for an experiment, though. Of course, once you have
defined wide versions of the existing code, then you'd still have problems if
you need, say, to display a list of those filenames to a user.

-- chris

Blair McGlashan-3

Re: Unicode Support

"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...

> Blair McGlashan wrote:
>
>> > Recently I tried to read contents of the drive on my machine with
>> > Dolphin XP. I found that it does not display Russian letters correctly.
>> > I am not sure where exactly the problem starts. Is it in external
>> > interfacing? Any solutions?
>>
>> I think it probably has to do with the default font saved down into the
>> RichEdit control used as the source editor.
>
> More than that, I think. As far as I can tell, the underlying Windows
> findFirstFile/findNextFile stuff (or whatever it's called) is answering
> data where the characters in filenames that cannot be represented in
> 8 bits are replaced by $?. That's to say that the actual bytes pointed
> to by the WIN32_FIND_DATA have 63 as their value.
>...

Are wide characters actually needed for Russian though? I don't know, just
asking.

Regards

Blair

Chris Uppal-3

Re: Unicode Support

Blair McGlashan wrote:

> Are wide characters actually needed for Russian though? I don't know, just
> asking.

Good question, I'm not sure. It may only be that the small-caps characters I
used for the test, which I found on some random Russian web page, may use
code-points that are outside the range normally used by a Russian Windows
installation. Or maybe Windows recognises that my machine is basically English
(albeit with a fair amount of "foreign" language support installed) and is
unable to represent the name in 8 bits using /my/ code page, but would have
been able to do it on a Russian installation.

Anyway, the following ugly hack of a loop will dump the byte representation of
the names of all the files in a folder to the transcript. It may help Sergei
isolate the problem:

File for: '*' in: 'C:\Temp\' do:
    [:each | | name altName addr len bytes |
    name := each cFileName.
    altName := each cAlternateFileName.
    "cFileName starts 44 bytes into the WIN32_FIND_DATA structure"
    addr := each bytes yourAddress + 44.
    len := name size.
    "Copy the raw bytes of the name, bypassing any String conversion"
    bytes := ByteArray fromAddress: addr length: len.
    Transcript
        display: altName; display: ': ';
        print: name; display: ' = ';
        print: bytes; cr].

Which, on my machine, writes:

: '.' = #[46]
: '..' = #[46 46]
AAATXT~1.BZ2: 'aaa.txt.bz2' = #[97 97 97 46 116 120 116 46 98 122 50]
...
XYSTXT~1.BZ2: 'xys.txt.bz2' = #[120 121 115 46 116 120 116 46 98 122 50]
221B~1.TXT: '???????????????.txt' = #[63 63 63 63 63 63 63 63 63 63 63 63 63 63
63 46 116 120 116]
3572~1.TXT: '???? ????????.txt' = #[63 63 63 63 32 63 63 63 63 63 63 63 63 46
116 120 116]
2C05~1.TXT: '????????????.txt' = #[63 63 63 63 63 63 63 63 63 63 63 63 46 116
120 116]

to my Transcript. Notice that the last three files have names that are mostly
made up out of $?, and that that is what Windows has supplied in the raw byte
data. (Those files have names created by cut-and-pasting a random string from
a Russian, Urdu, and Japanese website respectively. They do display correctly
in Explorer). Sergei, if you try that and the byte data isn't all 63s, then
you probably only have a display issue, if not then you have a more difficult
problem to deal with. I'd be interested to know which.

BTW, it's a little disturbing that some of the entries don't have "alternate"
filenames. I know you can turn that off, but I'm surprised to find that
Windows hasn't generated alternate names for all the files, even though that
feature is deliberately left turned on on this box :-(

-- chris

Chris Uppal-3

Re: Unicode Support

A follow-up:

> Or maybe Windows recognises that my
> machine is basically English (albeit with a fair amount of "foreign"
> language support installed) and is unable to represent the name in 8 bits
> using /my/ code page, but would have been able to do it on a Russian
> installation.

I got interested enough to take a risk and reset my system (after all, I only
spent /five hours/ yesterday getting Windows bloody Update to work, so
what's a little more messing around going to hurt). It seems that the
speculation is correct.

I switched my machine to use a Cyrillic code page -- at least, that's what I
assume "Control Panel / Regional and Language Settings / Advanced / Language
for non-Unicode programs" means ("This system setting enables non-Unicode
programs to display menus and dialogs in their native language. It does not
affect Unicode programs, but it does apply to all users of this computer.") I
chose "Serbian (Cyrillic)" arbitrarily, and rebooted. (I hadn't changed
anything else, all the Windows menus etc were still in English.)

After doing that, my loop (see earlier post) was producing meaningful byte
values for the Cyrillic filename, and -- somewhat to my surprise, since I
hadn't told Dolphin about the change -- even displayed correctly in the
Transcript. The Urdu and Japanese filenames were still coming back as all 63s.
(So naturally I tried changing that setting to Urdu then Japanese too, both
worked the same way).
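
The pattern observed here (each name survives only under a matching code page) can be reproduced in a small sketch. It is written in Python rather than Dolphin, and it rests on assumptions: Python's `cp1251`, `cp1256` and `cp932` codecs stand in for the Cyrillic, Arabic-script and Japanese settings, the sample strings are invented, and an Arabic-script word stands in for the Urdu name. `representable` is a hypothetical helper.

```python
# Which 8-bit (or double-byte) code page can represent which name?
# Assumption: Python's codecs approximate the Windows code pages.

def representable(name: str, code_page: str) -> bool:
    """Answer True if every character of name exists in code_page."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample names, one per script.
names = {
    "Russian": "привет",
    "Arabic-script": "سلام",
    "Japanese": "日本語",
}
code_pages = ["cp1252", "cp1251", "cp1256", "cp932"]

for label, name in names.items():
    usable = [cp for cp in code_pages if representable(name, cp)]
    print(label, usable)
```

None of the three sample names is representable in `cp1252`, the Western European code page, which matches the all-63s result on an English-configured machine.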

So now I'm back with a proper British computer again, and the only lasting
consequence of my rashness seems that Outlook Express's flashing text cursor
now has a little flag at the top. Anyone know how to fix that ;-) ?

-- chris

Sergei Gnezdov-4

Re: Unicode Support

In reply to this post by Blair McGlashan-3

> Are wide characters actually needed for Russian though? I don't know, just
> asking.

I am probably reiterating what Chris Uppal found out already.

Russian characters are traditionally represented with 8 bits. Normally, a
non-English system is configured to use a Cyrillic code page, or whatever
else they call it. Traditionally Russian letters are encoded in the range
128-255.

I assume that a Russian computer should not have any problem, because of
the 8-bit length.

The newest systems (Windows 2K, XP) seem to be capable of displaying
Russian (and many other languages) even if Encoding is not enabled. I
assume that they store file names in Unicode.
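
As a small illustration of the difference between the two representations, here is a sketch in Python (not Dolphin). It assumes Python's `cp1251` codec matches the Windows Cyrillic code page, and the sample word is arbitrary.

```python
# Compare an 8-bit Cyrillic encoding with the 16-bit form used by the
# "wide" Windows API. Assumption: Python's cp1251 codec matches the
# Windows Cyrillic code page.

text = "Привет"  # six Cyrillic letters, chosen arbitrarily

single_byte = text.encode("cp1251")  # one byte per letter
wide = text.encode("utf-16-le")      # two bytes per letter

print(len(single_byte), len(wide))
print(all(128 <= b <= 255 for b in single_byte))
```

Each letter takes one byte in the upper half of the 8-bit range under cp1251 and two bytes in UTF-16, so a program limited to 8-bit strings sees the letters only when the system code page happens to be a Cyrillic one.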

...

I prefer not to enable Russian encoding, because it has some negative
font effects (font sizes are not always the same). Chris Uppal found
another one of such problems :(

Unicode is not a big deal to me. It is just that Unicode support is
taken for granted these days (C#, Java).

Thank you

Panu Viljamaa-3

Re: Unicode Support

Sergei Gnezdov wrote:
> Unicode is not a big deal to me. It is just that Unicode support is
> taken for granted these days (C#, Java).

I agree that Unicode is important from the point of view of
compatibility with other environments.

Thanks
-Panu Viljamaa

Dmitry Zamotkin-5

Re: Unicode Support

In reply to this post by Sergei Gnezdov-4

Hello Sergei,

> Recently I tried to read contents of the drive on my machine with
> Dolphin XP. I found that it does not display Russian letters correctly.
> I am not sure where exactly the problem starts. Is it in external
> interfacing? Any solutions?

1. Change the script of the font in "User preferences>Workspace>defaultFont"
and possibly in "User preferences>Development System>defaultFont" to
Cyrillic.
2. Find key \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage.
3. Change all values named 1250-1258 to "c_1251.nls"
4. Reboot, everything should be all right.

Dmitry Zamotkin