[ANN] The Pharo Unicode Project

20 messages

Sven Van Caekenberghe-2

[ANN] The Pharo Unicode Project

Hi,

In Pharo we can deal with and represent any Unicode character and string, but there are still some important pieces of functionality missing.

The goal of the Pharo Unicode project is to gradually improve and expand Unicode support in Pharo. We started in December last year to lay the foundation and are now ready to go public with what we built.

The first delivery is an implementation of Unicode Normalization, together with an implementation of the Unicode Character Database.

Please read the following article for more information (the appendix explains how to get the code).

An Implementation of Unicode Normalisation

Streaming NFC, NFD, NFKC & NFKD, normalization QC and normalization preserving concatenation.

https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43#.qmy18gky0

The development branch also contains a work-in-progress implementation of Unicode Collation.

Much work remains to be done and contributions are more than welcome.

Sven & Henrik

Max Leske

Re: [ANN] The Pharo Unicode Project

Good stuff guys!

Esteban A. Maringolo

Re: [ANN] The Pharo Unicode Project

I read the whole article; it seems like a tricky, not to say hard,
subject. The article is very detailed and well written, though.

What is the rationale behind embracing such a challenging feature as
supporting Unicode?

Regards!

Esteban A. Maringolo

Sven Van Caekenberghe-2

Re: [ANN] The Pharo Unicode Project

> On 17 Feb 2016, at 19:56, Esteban A. Maringolo <[hidden email]> wrote:
>
> I read the whole article; it seems like a tricky, not to say hard, subject.

Yes. The Unicode specs are big and complex. Step one is to read & understand them, at least well enough to find your way. Doing some implementation is not too hard, but getting 100% scores on the very extensive test suites was/is quite hard.

> The article is very detailed and well written though.

Thx.

> What is the rationale behind embracing such a challenging feature as supporting Unicode?

Unicode is the de facto standard for internationalisation of computer software. Any serious platform has to tackle (big parts of) Unicode.

Henrik Sperre Johansen

Re: [ANN] The Pharo Unicode Project

In reply to this post by Esteban A. Maringolo

Because, who doesn't want to spend an evening with a workspace like this:

http://pastebin.com/gdxCrTgM

trying to figure out why some of the last 7 out of 189k official collate tests fail?

Seriously though, personally I was just tired of having to answer "Sort of, but not really" every time the question of Unicode support is raised, specifically in the last discussion that led nowhere back in December. Better to show in code what you think a decent implementation might look like. Sven made it easier to get rolling when he wrote import code for many of the data tables needed; parsing text is one of the things I dislike most...

Cheers,

Henry

P.S. Unicode Collation is more of an intellectual exercise. For practical collation you need the locale tailorings from CLDR, which is kept in a, frankly, abominable format. The object model / actual collate algorithms should carry over nicely though, so it's not a complete waste.

Ben Coman

Re: [ANN] The Pharo Unicode Project

In reply to this post by Sven Van Caekenberghe-2

This is really great. Thanks Sven & Henrik.

> The right pane shows a large glyph view of the character involved.

This is very cool !!
Does it rely on font support, or is it bitmap or SVG based?
Maybe it could dynamically download SVG data as needed from somewhere
like here...
http://www.fileformat.info/info/unicode/char/00f6/index.htm

> We also added an extension to our environment’s search system to allow you to look up characters by name:

Is there a #unicode filter for Spotter?

check repeated phrase...
> and replace it with and replace it with

I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
support in other programming languages (though circa 2011)
https://www.azabani.com/pages/gbu/
and this sparked a thought to wonder if the consortium would consider
it worthwhile paying someone external (like that author) with broad
cross platform Unicode experience to consult on getting Pharo to a
pragmatic point where we look attractive in such comparisons, which
may also provide a side-channel advertisement for Pharo when such
comparisons are presented at conferences (??)

cheers -ben

Tudor Girba-2

Re: [ANN] The Pharo Unicode Project

In reply to this post by Sven Van Caekenberghe-2

Thank you very much!

This is an important project. It would be great if others would join you.

Cheers,
Doru

--
www.tudorgirba.com
www.feenk.com

"Beauty is where we see it."

Tudor Girba-2

Re: [ANN] The Pharo Unicode Project

In reply to this post by Ben Coman

Hi,

> On Feb 18, 2016, at 2:05 AM, Ben Coman <[hidden email]> wrote:
>
> On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>> ...
>
>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>
> Is there a #unicode filter for Spotter?

Any category can be used as a filter. So, if the category is called #'Unicode Character', you can use #unicode as a filter.

Cheers,
Doru

--
www.tudorgirba.com
www.feenk.com

"We cannot reach the flow of things unless we let go."

Sven Van Caekenberghe-2

Re: [ANN] The Pharo Unicode Project

> On 18 Feb 2016, at 06:30, Tudor Girba <[hidden email]> wrote:
>
> Hi,
>
>> On Feb 18, 2016, at 2:05 AM, Ben Coman <[hidden email]> wrote:
>>
>> On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>> ...
>>
>>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>>
>> Is there a #unicode filter for Spotter?
>
> Any category can be used as a filter. So, if the category is called #'Unicode Character', you can use #unicode as a filter.

Ah, I love it when you get things for free !

Sven Van Caekenberghe-2

Re: [ANN] The Pharo Unicode Project

In reply to this post by Ben Coman

Ben,

> On 18 Feb 2016, at 02:05, Ben Coman <[hidden email]> wrote:
>
> This is really great. Thanks Sven & Henrik.

You're welcome.

>> The right pane shows a large glyph view of the character involved.
>
> This is very cool !!
> Does it rely on font support, or is it bitmap or SVG based?

Currently, we just use what Pharo can do (it thus depends on your font and even then the layout is often wrong in special cases, something we need to fix as well).

> Maybe it could dynamically download SVG data as needed from somewhere
> like here...
> http://www.fileformat.info/info/unicode/char/00f6/index.htm

Wow, I was looking for something like that, but I didn't know it existed.

Can we somehow directly use SVG in Pharo ?

On this page [http://www.fileformat.info/info/unicode/index.htm] they say that they got all their info from unicode.org but I can't seem to find the SVG data there. You seem to be good at finding stuff ;-)

In any case, it would be very cool to have something like that as a fallback.

>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>
> Is there a #unicode filter for Spotter?

That comes for free.

> check repeated phrase...
>> and replace it with and replace it with

Fixed, thanks.

> I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
> support in other programming languages (though circa 2011)
> https://www.azabani.com/pages/gbu/

Reading it.

> and this sparked a thought to wonder if the consortium would consider
> it worthwhile paying someone external (like that author) with broad
> cross platform Unicode experience to consult on getting Pharo to a
> pragmatic point where we look attractive in such comparisons, which
> may also provide a side-channel advertisement for Pharo when such
> comparisons are presented at conferences (??)

Maybe. I don't know if we should spend (marketing) money like that, but who knows.

> cheers -ben

Sven

Sven Van Caekenberghe-2

Re: [ANN] The Pharo Unicode Project

> On 18 Feb 2016, at 15:52, Sven Van Caekenberghe <[hidden email]> wrote:
>
>> I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
>> support in other programming languages (though circa 2011)
>> https://www.azabani.com/pages/gbu/
>
> Reading it.

I read it. So, yes, we still have a long way to go (not that I didn't know, but still).

>> and this sparked a thought to wonder if the consortium would consider
>> it worthwhile paying someone external (like that author) with broad
>> cross platform Unicode experience to consult on getting Pharo to a
>> pragmatic point where we look attractive in such comparisons, which
>> may also provide a side-channel advertisement for Pharo when such
>> comparisons are presented at conferences (??)
>
> Maybe. I don't know if we should spend (marketing) money like that, but who knows.

We are too far off; it will take a while to get to a decent level.

Peter Uhnak

Re: [ANN] The Pharo Unicode Project

> Can we somehow directly use SVG in Pharo ?

Through Cairo/Athens (which is a vector graphics engine).

Look at the Athens-SVG package. It is part of the Pharo/Athens repository (which is in the image), but the package itself is not loaded by default.

Athens-SVG should be able to take an SVG file and produce Athens drawing instructions, although I haven't used it in a while so I don't remember the details.

Or you can look at Roassal2, which supports drawing shapes described with an SVG path (RTSVGPath).

Peter

Ben Coman

Re: [ANN] The Pharo Unicode Project

In reply to this post by Sven Van Caekenberghe-2

On Thu, Feb 18, 2016 at 10:52 PM, Sven Van Caekenberghe <[hidden email]> wrote:

> Wow, I was looking for something like that, but I didn't know it existed.
>
> Can we somehow directly use SVG in Pharo ?
>
> On this page [http://www.fileformat.info/info/unicode/index.htm] they say that they got all their info from unicode.org but I can't seem to find the SVG data there. You seem to be good at finding stuff ;-)

Ah-har! A challenge! But I did not succeed. The Unicode Code Charts
[1] seem to be available only in PDF form (e.g. [2]). Indeed, it says "The
fonts and font data used in production of the Unicode Standard may not
be extracted, or used in any way in any product or publication,
without permission or license granted by the typeface owner(s)."

[1] http://unicode.org/charts/About.html
[2] http://unicode.org/charts/PDF/U0100.pdf

Though I discovered a few other interesting resources...

Fallback font
https://en.wikipedia.org/wiki/Fallback_font

Last Resort Font
http://www.unicode.org/policies/lastresortfont_eula.html

Scriptsource - Unicode Character Browsing
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=tubkvb6y8f

Javascript Unibook 4.0 Character Browser
http://www.soldin.de/about/2003-js_unibook/

decodeunicode
http://www.decodeunicode.org/

HTH, cheers -ben

Sven Van Caekenberghe-2

Re: [ANN] The Pharo Unicode Project

> On 18 Feb 2016, at 18:13, Ben Coman <[hidden email]> wrote:
>
> Ah-har! A challenge! But I did not succeed.

I thought you would write a complete solution in Pharo, oh well ;-)

> The Unicode Code Charts
> [1] seem available only in PDF form (e.g. [2]) . Indeed it says "The
> fonts and font data used in production of the Unicode Standard may not
> be extracted, or used in any way in any product or publication,
> without permission or license granted by the typeface owner(s)."
>
> [1] http://unicode.org/charts/About.html
> [2] http://unicode.org/charts/PDF/U0100.pdf
>
> Though I discovered a few other interesting resources...
>
> Fallback font
> https://en.wikipedia.org/wiki/Fallback_font
>
> Last Resort Font
> http://www.unicode.org/policies/lastresortfont_eula.html
>
> Scriptsource - Unicode Character Browsing
> http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=tubkvb6y8f
>
> Javascript Unibook 4.0 Character Browser
> http://www.soldin.de/about/2003-js_unibook/
>
> decodeunicode
> http://www.decodeunicode.org/

Yeah, good links.

This site seems to be able to generate PNGs for each glyph:

http://r12a.github.io/uniview/

However, the URLs are not straightforward and are probably not meant to be used as a web service.

Maybe this is usable:

https://en.wikipedia.org/wiki/GNU_Unifont

Sven Van Caekenberghe-2

Re: [ANN] The Pharo Unicode Project

Here is a quick & dirty version without error checking:

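| codePoint ucd name hex url |
"Pick a code point to look up, e.g. 180 (ACUTE ACCENT) or 8491 (ANGSTROM SIGN)."
codePoint := 8491.
"Look up its record in the Unicode Character Database."
ucd := codePoint unicodeCharacterData.
"Build the name part of the URL: lowercase, with underscores instead of spaces."
name := $_ join: (Character space split: ucd name asLowercase).
"Format the code point as four hexadecimal digits."
hex := String streamContents: [ :out | codePoint printOn: out base: 16 nDigits: 4 ].
url := 'http://www.fileformat.info/info/unicode/char/{1}/{2}.png' format: { hex. name }.
"Fetch and decode the PNG; there is no error checking here."
ZnEasy getPng: url.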

They seem to be adding a watermark if you do it like this, but not all the time.

> On 18 Feb 2016, at 20:24, Sven Van Caekenberghe <[hidden email]> wrote:
>
>>
>> On 18 Feb 2016, at 18:13, Ben Coman <[hidden email]> wrote:
>>
>> On Thu, Feb 18, 2016 at 10:52 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>> Ben,
>>>
>>>> On 18 Feb 2016, at 02:05, Ben Coman <[hidden email]> wrote:
>>>>
>>>> On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>>>> Hi,
>>>>>
>>>>> In Pharo we can deal with and represent any Unicode character and string, but there are still some important pieces of functionality missing.
>>>>>
>>>>> The goal of the Pharo Unicode project is to gradually improve and expand Unicode support in Pharo. We started in December last year to lay the foundation and are now ready to go public with what we built.
>>>>>
>>>>> The first delivery is an implementation of Unicode Normalization, together with an implementation of the Unicode Character Database.
>>>>>
>>>>> Please read the following article for more information (the appendix explains how to get the code).
>>>>>
>>>>> An Implementation of Unicode Normalisation
>>>>>
>>>>> Streaming NFC, NFD, NFKC & NFKD, normalization QC and normalization preserving concatenation.
>>>>>
>>>>> https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43#.qmy18gky0
>>>>>
>>>>> The development branch also contains a work in progress implementation of Unicode Collation.
>>>>>
>>>>> Much work remains to be done and contributions are more than welcome.
>>>>>
>>>>> Sven & Henrik
>>>>
>>>> This is really great. Thanks Sven & Henrik.
>>>
>>> You're welcome.
>>>
>>>>> The right pane shows a large glyph view of the character involved.
>>>>
>>>> This is very cool !!
>>>> Does it rely on font support, or is it bitmap or SVG based?
>>>
>>> Currently, we just use what Pharo can do (it thus depends on your font and even then the layout is often wrong in special cases, something we need to fix as well).
>>>
>>>> Maybe it could dynamically download SVG data as needed from somewhere
>>>> like here...
>>>> http://www.fileformat.info/info/unicode/char/00f6/index.htm
>>>
>>> Wow, I was looking for something like that, but I didn't know it existed.
>>>
>>> Can we somehow directly use SVG in Pharo ?
>>>
>>> On this page [http://www.fileformat.info/info/unicode/index.htm] they say that they got all their info from unicode.org but I can't seem to find the SVG data there. You seem to be good in finding stuff ;-)
>>
>> aH-Har! A challenge! but I did not succeed.
>
> I thought you would write a complete solution in Pharo, oh well ;-)
>
>> The Unicode Code Charts
>> [1] seem available only in PDF form (e.g. [2]) . Indeed it says "The
>> fonts and font data used in production of the Unicode Standard may not
>> be extracted, or used in any way in any product or publication,
>> without permission or license granted by the typeface owner(s)."
>>
>> [1] http://unicode.org/charts/About.html
>> [2] http://unicode.org/charts/PDF/U0100.pdf
>>
>> Though I discovered a few other interesting resources...
>>
>> Fallback font
>> https://en.wikipedia.org/wiki/Fallback_font
>>
>> Last Resort Font
>> http://www.unicode.org/policies/lastresortfont_eula.html
>>
>> Scriptsource - Unicode Character Browsing
>> http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=tubkvb6y8f
>>
>> Javascript Unibook 4.0 Character Browser
>> http://www.soldin.de/about/2003-js_unibook/
>>
>> decodeunicode
>> http://www.decodeunicode.org/
>
> Yeah, good links.
>
> This site seems to be able to generate PNGs for each glyph:
>
> http://r12a.github.io/uniview/
>
> Though the URLs are not straightforward and are probably not meant to be used as a web service.
>
> Maybe this is usable:
>
> https://en.wikipedia.org/wiki/GNU_Unifont
>
>> HTH, cheers -ben
>>
>>>
>>> In any case, it would be very cool to have something like that as a fall back.
>>>
>>>>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>>>>
>>>> Is there a #unicode filter for Spotter?
>>>
>>> That comes for free.
>>>
>>>> check repeated phrase...
>>>>> and replace it with and replace it with
>>>
>>> Fixed, thanks.
>>>
>>>> I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
>>>> support in other programming languages (though circa 2011)
>>>> https://www.azabani.com/pages/gbu/
>>>
>>> Reading it.
>>>
>>>> and this sparked a thought to wonder if the consortium would consider
>>>> it worthwhile paying someone external (like that author) with broad
>>>> cross platform Unicode experience to consult on getting Pharo to a
>>>> pragmatic point where we look attractive in such comparisons, which
>>>> may also provide a side-channel advertisement for Pharo when such
>>>> comparisons are presented at conferences (??)
>>>
>>> Maybe, I don't know if we should spend (marketing) money like that, but who knows.
>>>
>>>> cheers -ben
>>>
>>> Sven

stepharo

Re: [ANN] The Pharo Unicode Project

In reply to this post by Sven Van Caekenberghe-2

Super cool!

Stef

On 17/2/16 10:16, Sven Van Caekenberghe wrote:

stepharo

Re: [ANN] The Pharo Unicode Project

In reply to this post by Sven Van Caekenberghe-2

I ***LOVE*** the documentation :)

On 17/2/16 10:16, Sven Van Caekenberghe wrote:

stepharo

Re: [ANN] The Pharo Unicode Project

In reply to this post by Sven Van Caekenberghe-2

I deeply appreciate your effort!

On 17/2/16 23:27, Sven Van Caekenberghe wrote:

>> On 17 Feb 2016, at 19:56, Esteban A. Maringolo <[hidden email]> wrote:
>>
>> I read the whole article; it seems like a tricky, not to say hard, subject.
> Yes. The Unicode specs are big and complex. Step one is to read & understand them, at least well enough to find your way. Doing some implementation is not too hard, but getting 100% scores on the very extensive test suites was/is quite hard.
>
>> The article is very detailed and well written though.
> Thx.
>
>> What is the rationale behind embracing such a challenging feature like supporting Unicode?
> Unicode is the de facto standard for internationalisation of computer software. Any serious platform has to tackle (big parts of) Unicode.
>
>> Regards!
>>
>>
>> Esteban A. Maringolo
>>
>>
>> 2016-02-17 15:24 GMT-03:00 Max Leske <[hidden email]>:
>>> Good stuff guys!
>
>

Mariano Martinez Peck

Re: [ANN] The Pharo Unicode Project

In reply to this post by stepharo

Wow guys. Unbelievable. Amazing. Coming from you and Henrik, I really don't expect anything other than this :)

Sven, I also love your Medium posts.

Thanks!!!

On Thu, Feb 18, 2016 at 6:22 PM, stepharo <[hidden email]> wrote:

I ***LOVE*** the documentation :)

Mariano
http://marianopeck.wordpress.com

Sven Van Caekenberghe-2

Re: [ANN] The Pharo Unicode Project

Thank you, Stef & Mariano.

I must say that this is a case of positive feedback: the effort of many people (both of you, but many, many others as well) is infectious, and it stimulates me to do my best too.

So let me just say the same: thank you all for making Pharo what it is today.

Sven

> On 19 Feb 2016, at 02:02, Mariano Martinez Peck <[hidden email]> wrote:
>
> Wow guys. Unbelievable. Amazing. Coming from you and Henrik, I really don't expect anything other than this :)
> Sven, I also love your Medium posts.
>
> Thanks!!!
>
> On Thu, Feb 18, 2016 at 6:22 PM, stepharo <[hidden email]> wrote:
> I ***LOVE*** the documentation :)
>
> --
> Mariano
> http://marianopeck.wordpress.com