Re: Cuis Digest, Vol 9, Issue 15

Guido Stepken

"Orthogonality" is one of Smalltalk's most praised features. All functions are "first class" and can be combined with all other functions! ;-)

On 07.02.2013 at 16:35, <[hidden email]> wrote:
Send Cuis mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Cuis digest..."


Today's Topics:

   1. Re: Unicode in Cuis (Janko Mivšek)
   2. Re: About adding a Unicode handling porting layer (H. Hirzel)
   3. (Minimal) requirements for Unicode support? (H. Hirzel)
   4. Re: [squeak-dev]  [Ann] Cuis 4.1 is released (H. Hirzel)
   5. How to center a submorph in its owner morph? (H. Hirzel)
   6. Re: Dropdown list (H. Hirzel)


----------------------------------------------------------------------

Message: 1
Date: Thu, 07 Feb 2013 13:57:28 +0100
From: Janko Mivšek <[hidden email]>
To: Discussion of Cuis Smalltalk <[hidden email]>
Subject: Re: [Cuis] Unicode in Cuis
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=UTF-8

As I understand:

UTF-32 is actually no encoding, that's for me 'plain Unicode'. You need
full 32 bits for Japanese letters and because Yoshiki Oshima is
Japanese, that's why Squeak has 32bit WideString.

UTF16 is no encoding for alphabets like most East European Latin,
Cyrillic, Greek, ... but starts encoding for Japanese, Chinese, ... The
16-bit TwoByteString in VW represents that.

UTF8 is no encoding for ASCII and few ISO8859 charsets, but starts
encoding for other alphabets. 8bit ByteString is enough for that.
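
As a cross-check of the sizes discussed above, here is a small Python sketch (not Cuis code; the helper name `encoded_sizes` and the sample characters are chosen only for illustration) that counts the bytes each encoding form needs for a single character. It suggests that common CJK characters still fit in one 16-bit unit, and that only characters outside the Basic Multilingual Plane need two.

```python
# Illustrative only: bytes needed per character in each Unicode encoding form.
SAMPLES = {
    "A": "ASCII letter",
    "\u0161": "Latin small s with caron (U+0161)",
    "\u042f": "Cyrillic capital Ya (U+042F)",
    "\u65e5": "CJK ideograph (U+65E5)",
    "\U0001F600": "emoji outside the Basic Multilingual Plane (U+1F600)",
}

def encoded_sizes(ch):
    """Return (utf8, utf16, utf32) byte counts for one character, without a BOM."""
    codecs = ("utf-8", "utf-16-le", "utf-32-le")
    return tuple(len(ch.encode(codec)) for codec in codecs)

if __name__ == "__main__":
    for ch, description in SAMPLES.items():
        print(f"U+{ord(ch):04X} {description}: {encoded_sizes(ch)}")
```

The little-endian codec names are used so that Python does not prepend a byte order mark, which would otherwise inflate the counts.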

So, if you use plain ASCII and some ISO8859 charsets, a ByteString is
enough. As soon as at least one character above 255 is added, the complete
string is automatically converted to TwoByteString in VW, or to WideString in
Squeak. That way you preserve all string manipulations (#at:put:, #size),
but at the cost of less efficient memory use.
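
The automatic widening described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not how VW or Squeak implement it: `narrowest_width` and `store` are hypothetical names, and the three widths merely stand in for ByteString, TwoByteString and WideString.

```python
def narrowest_width(text):
    """Bytes per element needed so that every code point in text fits one slot."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1  # a ByteString-like store is enough
    if highest <= 0xFFFF:
        return 2  # a TwoByteString-like store is needed
    return 4      # a WideString-like store is needed

def store(text):
    """Pack text into fixed-width slots, so element i starts at offset i * width."""
    width = narrowest_width(text)
    packed = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, packed
```

Because every element has the same width, indexed access stays a constant-time offset calculation; the price is that a single wide character widens the whole string.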

In VW a Character is subclass of Magnitude and seems to be 32bit by
default. So no problem supporting all Unicode characters/code points.

In Squeak/Pharo Character is also a subclass of Magnitude with
additional instvar #value, which seems to be 32bit as well? Note also
class vars CharacterTable, DigitValues ...

Best regards
Janko




On 07. 02. 2013 13:21, Juan Vuletich wrote:
> On 2/7/2013 9:09 AM, Angel Java Lopez wrote:
>> Why UTF-16?
>>
>> - Quick access, at:, at:put:, the most usual UNICODE characters are
>> directly mapped to 2 bytes
>>
>
> Not if you support full Unicode. So, we'd need to add a flag to each
> instance to say "I only use 2 byte chars", to avoid the cost of scanning
> the whole instance every time, to know, as that would kill the "quick
> access" argument. In any case, and as just said in this thread, the main
> question to answer is "does this issue really matter at all?"
>
>> - If some string has an out-of-band character, it can be easily marked as special
>
> That's when things start to get messy.
>
>> - This is (AFAIK) the preferred internal representation in .NET, Java,
>> and most C++ implementations of wchar_t (please confirm). So, by current
>> standards, memory is not a problem for string representations.
>
> I _don't_ care about the internal representation others use. I just care
> about which one is best.
>
>> In practice, UTF-16 is plain UNICODE for most cases
>>
>
> Cheers,
> Juan Vuletich
>
>> On Thu, Feb 7, 2013 at 9:02 AM, Juan Vuletich <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>     Hi Angel,
>>
>>
>>     On 2/7/2013 7:07 AM, Angel Java Lopez wrote:
>>>     Hi people!
>>>
>>>     I just found:
>>>     http://www.cprogramming.com/tutorial/unicode.html
>>>
>>>     where similar pros/cons were discussed
>>>
>>>     and
>>>     http://msdn.microsoft.com/en-us/library/windows/desktop/dd374061(v=vs.85).aspx
>>>     <http://msdn.microsoft.com/en-us/library/windows/desktop/dd374061%28v=vs.85%29.aspx>
>>>
>>>     Well, AFAIK, Unicode UTF-16 was the way adopted by C and others
>>>     to represent Unicode strings in memory. According to the first
>>>     article, a direct representation of any Unicode character should
>>>     be 4 bytes. But UTF-16 is the encoding adopted to:
>>>     - avoid wasting space
>>>     - give easy access to a character
>>>     I found too
>>>     http://www.evanjones.ca/unicode-in-c.html
>>>     Contrary to popular belief, it is possible for a Unicode
>>>     character to require multiple 16-bit values.
>>>
>>>     What format do you use for data that goes in and out of your
>>>     software, and what format do you use internally?
>>>
>>>     Then, the author points to another link
>>>     http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode
>>>     Inside your software, store text as UTF-8 or UTF-16; that is to
>>>     say, pick one of the two and stick with it.
>>>
>>>     Another interesting post by tim bray
>>>     http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
>>>
>>
>>     Thanks for the links.
>>
>>
>>>     My guess:
>>>
>>>     - Cuis Smalltalk could support both internal representations,
>>>     UTF-8 AND UTF-16. A WideString class could be written. And I
>>>     guess, a WideChar. Their protocols should be the same as their
>>>     "byte" counterparts. What am I missing: what are the problems with
>>>     this approach? All problems should be in the internal implementation
>>>     of WideString and WideChar, and in the points where the code
>>>     should decide:
>>>
>>>     - a new string is needed, what kind of?
>>
>>     One that can store Unicode. Candidates are UTF-8 and UTF-32.
>>
>>
>>>     - I have a string object (String or WideString). I need to
>>>     encode it to store it outside memory. What kind of encoding?
>>
>>     That's easy to answer: UTF-8.
>>
>>
>>>     Yes, it could be a lot of work, but it's the way adopted by many
>>>     languages, libraries, technologies:
>>
>>     I don't mean to sound harsh, but we don't decide on statistics. If
>>     we did that, we would not be doing Smalltalk :) .
>>
>>
>>>     a clear separation between internal representation and encoding. The
>>>     only twist, Smalltalk-like, is the possibility of having TWO
>>>     internal representations using Smalltalk capabilities.
>>>
>>>     The other way: bite the bullet, ONE internal representation,
>>>     UTF-16, all encapsulated in String (I guess it is the strategy
>>>     adopted by Java and .NET). And have new methods to indicate the
>>>     encoding where needed (I guess in I/O, serialization), with a
>>>     default encoding (UTF-8?)
>>>
>>
>>     UTF-16? Why? I'd rather choose UTF-8 or UTF-32.
>>
>>
>>>     I don't understand why webservers should convert back Strings
>>>     into UTF-8.  No encoding for text response in http protocol? I
>>>     read http://en.wikipedia.org/wiki/Character_encodings_in_HTML#Specifying_the_document.27s_character_encoding
>>>
>>
>>     Serving text in UTF-8 allows using full Unicode web content, and
>>     minimizes compatibility risks.
>>
>>
>>>     AFAIK, everything related to string I/O has specified encoding in
>>>     some way, these days.
>>>
>>>     Angel "Java" Lopez
>>>     @ajlopez
>>>
>>
>>     Cheers,
>>     Juan Vuletich
>>
>>
>>>
>>>     On Wed, Feb 6, 2013 at 11:59 PM, Juan Vuletich
>>>     <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>>         Hi Folks,
>>>
>>>         I wasn't able to jump into the recent discussion earlier, but
>>>         I've been thinking a bit about all this. This
>>>         http://en.wikipedia.org/wiki/UTF8 made me realize that using
>>>         UTF8 internally to represent Unicode strings is a way to
>>>         minimize required changes to existing software.
>>>
>>>         Hannes, your proposal for representing WideStrings as
>>>         variableWordSubclass is like
>>>         http://en.wikipedia.org/wiki/UTF-32, right? And if I remember
>>>         correctly, it is what Squeak uses.
>>>
>>>         I ended sketching this to compare alternatives:
>>>
>>>
>>>         ISO 8859-15
>>>         ========
>>>            pros:
>>>            -------
>>>               - We already have a lot of specific primitives in the VM
>>>               - Efficient use of memory
>>>               - Very fast #at: and #at:put:
>>>            cons:
>>>            -------
>>>               - Can only represent Latin alphabets and not full Unicode
>>>
>>>         Unicode UTF-8 (in a new variableByteSubclass)
>>>         ========
>>>            pros:
>>>            -------
>>>               - We could reuse many existing String primitives
>>>               - It was created to allow existing code deal with
>>>         Unicode with minimum changes
>>>               - Efficient use of memory
>>>            cons:
>>>            -------
>>>               - Does not allow #at:put:
>>>               - #at: is very slow, O(n) instead of O(1)
>>>
>>>         Unicode UTF-32 (in a new variableWordSubclass)
>>>         ========
>>>            pros:
>>>            -------
>>>               - Very fast #at: and #at:put: (although I guess we don't
>>>         use this much... especially #at:put:, as Strings are usually
>>>         regarded as immutable)
>>>            cons:
>>>            --------
>>>               - very inefficient use of memory (4 bytes for every
>>>         character!)
>>>               - doesn't take advantage of existing String primitives
>>>         (sloooow)
>>>
>>>
>>>         I think that for some time, our main Character/String
>>>         representation should be ISO 8859-15. Switching to full
>>>         Unicode would require a lot of work in the paragraph layout
>>>         and text rendering engines. But we can start building a
>>>         Unicode representation that could be used for webapps.
>>>
>>>         So I suggest:
>>>
>>>         - Build an Utf8String variableByteSubclass and Utf8Character.
>>>         Try to use String primitives. Do not include operations such
>>>         as #at:, to discourage their use.
>>>
>>>         - Make conversion to/from ISO 8859-15 lossless, by a good
>>>         codification of CodePoints in regular Strings (not unlike
>>>         Hannes' recent contribution).
>>>
>>>         - Use this for the Clipboard. This is pretty close to what we
>>>         have now, but we'd allow using an external Unicode textEditor
>>>         to build any Unicode text, and by copying and pasting into a
>>>         String literal in Cuis code, we'd have something that can be
>>>         converted back into UTF-8 without losing any content.
>>>
>>>         - Web servers should convert back Strings into UTF-8. This
>>>         would let us handle and serve content using full Unicode,
>>>         without needing to wait until our tools can display it properly.
>>>
>>>         - Use this when editing external files, if they happen to be
>>>         in UTF-8 (there are good heuristics for determining this). On
>>>         save, we offer the option to save as UTF-8 or ISO 8859-15.
>>>
>>>         - We can start adapting the environment to avoid using the
>>>         String protocols that are a problem for UTF-8 (i.e. #at:,
>>>         #at:put: and related). This would ease an eventual migration
>>>         to using only UTF-8 for everything.
>>>
>>>         What do you think? Do we have a plan?
>>>
>>>         Cheers,
>>>         Juan Vuletich
>>>
>>>         _______________________________________________
>>>         Cuis mailing list
>>>         [hidden email] <mailto:[hidden email]>
>>>         http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
>
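
The indexing trade-off in the quoted comparison (UTF-8 #at: is O(n), UTF-32 #at: is O(1)) can be illustrated with a short Python sketch. It is not Cuis code: the function names are hypothetical, and indices are zero-based here, unlike Smalltalk's one-based #at:.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the character at a zero-based index by scanning UTF-8 from the start."""
    if index < 0:
        raise IndexError(index)
    seen = -1      # index of the most recently started character
    start = None   # byte offset where the wanted character begins
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a character starts here
            seen += 1
            if seen == index:
                start = offset
            elif seen == index + 1:
                return data[start:offset].decode("utf-8")
    if start is None:
        raise IndexError(index)
    return data[start:].decode("utf-8")  # the wanted character was the last one

def utf32_char_at(data: bytes, index: int) -> str:
    """Return the character at a zero-based index by jumping to byte offset 4 * index."""
    chunk = data[4 * index:4 * index + 4]
    if index < 0 or len(chunk) != 4:
        raise IndexError(index)
    return chunk.decode("utf-32-le")
```

The UTF-8 version has to walk every byte before the requested character because characters have variable length, which is the cost that motivates discouraging #at: on a UTF-8 string class; the UTF-32 version pays for its direct jump with four bytes per character.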

--
Janko Mivšek
IT Consultant
Eranova d.o.o.
Ljubljana, Slovenia
www.eranova.si
tel:  01 514 22 55
fax:  01 514 22 56
gsm: 031 674 565



------------------------------

Message: 2
Date: Thu, 7 Feb 2013 14:46:57 +0000
From: "H. Hirzel" <[hidden email]>
To: Discussion of Cuis Smalltalk <[hidden email]>
Subject: Re: [Cuis] About adding a Unicode handling porting layer
Message-ID:
        <CAGQxfVh9aaENH1+LzfqRobkvdOTF=CCrBrjYuC_hiem=[hidden email]>
Content-Type: text/plain; charset=UTF-8

Hello Juan,

Good to see you writing how to add Unicode support to Cuis.

Yes, we want to keep things elegant and easy to understand. In fact
the code for the current implementation (Cuis 4.1) of Character /
String is pleasant to read. It is well worked out and I have the
impression that it has more features than Squeak or Pharo. For example
it has functions for word lists and spell checking.

At the moment I think we should not jump too quickly to conclusions. What
we need is maybe a few hooks in Cuis to help add Unicode support in
external libraries.

Maybe even competing implementations so that we can compare and decide
what is best.

The current implementations of Character and String are well worked
out. I'd like to keep the code as it is, as much as possible.

--Hannes

On 2/7/13, Juan Vuletich <[hidden email]> wrote:
> Thanks Folks!
>
> Cheers,
> Juan Vuletich
>
> On 2/6/2013 10:07 PM, Ken Dickey wrote:
>> On Wed, 6 Feb 2013 12:21:56 +0000
>> "H. Hirzel"<[hidden email]>  wrote:
>>
>>> The reason why it was not ported by Juan is that he wanted to focus on
>>> Morphic and leave out some complex subsystems like Unicode support,
>>> Monticello and others.
>> I'd just like to echo these sentiments. I think moving Morphic forward is the
>> highest value.
>>
>> IMHO, anything we can do to help and/or not hinder Juan is goodness.
>>
>> Cheers,
>> -KenD



------------------------------

Message: 3
Date: Thu, 7 Feb 2013 15:01:24 +0000
From: "H. Hirzel" <[hidden email]>
To: [hidden email]
Subject: [Cuis] (Minimal) requirements for Unicode support?
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset=UTF-8

Hello Juan, Angel, Janko, Germán, Kean and others.

Thank you for sharing ideas and engaging in this discussion.

As we are now discussing possible Unicode support on a conceptual
level, the question arises:

    What are the minimal requirements for Unicode support for Cuis?

Let me state how I see it (a view from the outside, no implementation issues):


## First of all surely


Support for reading and writing UTF-8 encoded files.
To a certain extent this is the case as of now. But it should be improved.

This may or may not mean dropping ISO8859-15 completely as a file encoding format.
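
One possible shape for this first requirement, sketched in Python rather than Smalltalk: try strict UTF-8 first and fall back to ISO 8859-15 when the bytes are not valid UTF-8. The function names `decode_text` and `encode_text` are hypothetical, and the fallback rule is only an assumed heuristic, not the behaviour of Cuis.

```python
def decode_text(data: bytes):
    """Decode file contents and report which encoding was used."""
    try:
        # Strict decoding rejects byte sequences that are not well-formed UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: treat anything else as ISO 8859-15.
        return data.decode("iso8859-15"), "iso8859-15"

def encode_text(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text for saving; raises UnicodeEncodeError if the target cannot hold it."""
    return text.encode(encoding)
```

Pure ASCII content is valid in both encodings, so the heuristic only has to decide once bytes above 127 appear. Saving to ISO 8859-15 fails for text outside that charset, which is one argument for making UTF-8 the default on output.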





## Secondly

It should be possible to port web libraries like Swazoo, WebClient,
Aida, Zinc, Altitude and others without problems. This does not
necessarily mean that the files appear 'nicely' in Cuis.


## Thirdly

Unicode support for display in Cuis. Probably still the major European
languages plus special symbols. More or less the state as is plus a
few more symbols.


##  Fourthly

Having a foundation that people who want to add additional support
(e.g. Korean, Russian, Japanese) can go ahead and implement
add-on-libraries.



##  Focus

We should focus on 1) and 2) first.

As of now I think I am well on the way to supporting this with an external
library. As for the changes in Cuis as such, I think we should be
careful not to add too much complexity.
I prefer Cuis to remain lean.


## CONCLUSION

Let us come to an agreement on

a) what the minimal support for Unicode should be.

b) what is the task of Juan to add into Cuis and what should reside in
external libraries.


In the meantime I'll update the notes

   https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

here describing what the current Unicode support is.


Kind regards
Hannes



------------------------------

Message: 4
Date: Thu, 7 Feb 2013 15:28:04 +0000
From: "H. Hirzel" <[hidden email]>
To: The general-purpose Squeak developers list
        <[hidden email]>, [hidden email]
Cc: Discussion of Cuis Smalltalk <[hidden email]>
Subject: Re: [Cuis] [squeak-dev]  [Ann] Cuis 4.1 is released
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset=UTF-8

On 2/7/13, David T. Lewis <[hidden email]> wrote:
> On Thu, Feb 07, 2013 at 09:10:25AM -0300, Juan Vuletich (mail lists) wrote:
>> Hi Martin,
>>
>> Quoting Martin Kuball <[hidden email]>:
>>
>> >On Wednesday 12 December 2012, Juan Vuletich (mail lists) wrote:
>> >>Hi Folks,
>> >>
>> >>Cuis 4.1 is available at http://www.jvuletich.org/Cuis/Index.html .
>> >>Biggest news is in the Morph hierarchy. Ivars 'bounds' and
>> >>'fullBounds' are gone! All coordinates are now Float and relative to
>> >>the owner morph. This is part of the transition to Morphic 3. The
>> >>drawing engine is still BitBlt and the UI is not scalable yet, but
>> >>Morphic 3 is now much closer.
>> >>
>> >>Cheers,
>> >>Juan Vuletich
>> >>
>> >
>> >I tried Cuis 4.1 but was not able to get it to work. I used the
>> >latest vm from debian unstable: 4.10.2.2614-1_amd64. When I open the
>> >image everything seems to be fine. Although I'm not sure if the
>> >background is supposed to be black? But when I move windows around
>> >or open popups the screen is not redrawn and it gets cluttered with
>> >parts of the windows and the popups (see attached image). Any idea
>> >what's causing this? The squeak images work just fine.
>> >
>> >Martin
>> >
>>
>> The problems you see are due to bugs in BitBlt that I fixed about 2
>> years ago. You need a newer VM.
>>
>> As we recommend at http://www.jvuletich.org/Cuis/Index.html , use the
>> latest Cog VM for your platform if possible. Right now, that would be
>> http://www.mirandabanda.org/files/Cog/VM/VM.r2678/coglinux.tgz . If
>> that doesn't fit your system, you need to find a relatively recent VM
>> with the fixed BitBlt.
>>
>> Maybe someone can give more specific advice, as I'm not a Linuxer...
>>
>
> Squeak VMs are at http://squeakvm.org/index.html, which has links to
> download the standard interpreter VM at http://squeakvm.org/unix/ and
> Cog VM at http://www.mirandabanda.org/files/Cog/VM/. Cuis will work
> with any of these VMs.
>
> I'm not sure who is maintaining the Debian distribution, but apparently
> it is broken.

Maybe Martin Kuball just needs to use the latest Cuis version (minor version).

    https://github.com/jvuletich

Juan indeed did some fixes this January regarding artifacts remaining
on the display.  The updates are available on github (including
ready-made images).

   https://github.com/jvuletich/Cuis/tree/master/Cuis4WithLatestUpdates

and in particular

    https://github.com/jvuletich/Cuis/blob/master/Cuis4WithLatestUpdates/Cuis4.1-1576.zip

That might be enough to bring Cuis4.1 into action on Debian. As Germán
Arduino just noted, it works fine for him on Lubuntu.

Regards
--Hannes




> Dave
>
>
>



------------------------------

Message: 5
Date: Thu, 7 Feb 2013 15:32:16 +0000
From: "H. Hirzel" <[hidden email]>
To: [hidden email]
Subject: [Cuis] How to center a submorph in its owner morph?
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset=UTF-8

Hello

I want to add an instance of ImageMorph to an instance of Morph and
center it there.

How do I do that?

I have done experiments with newRow and newColumn layout, see
  https://github.com/hhzl/Cuis-Add-Ons

But I do not have examples yet of centering.

Thank you for the answer in advance.

Hannes



------------------------------

Message: 6
Date: Thu, 7 Feb 2013 15:35:31 +0000
From: "H. Hirzel" <[hidden email]>
To: Discussion of Cuis Smalltalk <[hidden email]>
Subject: Re: [Cuis] Dropdown list
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset=UTF-8

Hello Juan

Good to know there is a drop-down list in the StyledTextEditor library for 4.0.

    https://github.com/bpieber/Cuis-StyledTextEditor

I assume I can extract it from there, as it is probably fairly
independent code. Right?

Regards
Hannes

On 2/7/13, Juan Vuletich <[hidden email]> wrote:
> Hi Hannes,
>
> There is one in StyledTextEditor, but this still doesn't run in 4.1...
>
> Cheers,
> Juan Vuletich
>
> On 2/2/2013 4:33 PM, H. Hirzel wrote:
>> Hello
>>
>> Which class of Morph shall I use to implement a dropdown list in Morphic?
>> So far I did not find anything suitable.
>>
>> Regards
>> Hannes
>>



------------------------------


End of Cuis Digest, Vol 9, Issue 15
***********************************
