[ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2
Hi,

In Pharo we can deal with and represent any Unicode character and string, but there are still some important pieces of functionality missing.

The goal of the Pharo Unicode project is to gradually improve and expand Unicode support in Pharo. We started in December last year to lay the foundation and are now ready to go public with what we built.

The first delivery is an implementation of Unicode Normalization, together with an implementation of the Unicode Character Database.

Please read the following article for more information (the appendix explains how to get the code).

  An Implementation of Unicode Normalisation

  Streaming NFC, NFD, NFKC & NFKD, normalization QC and normalization preserving concatenation.

  https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43#.qmy18gky0
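
  As a small illustration of why normalization matters (a sketch using only core Pharo classes, not the project's own API): the same letter can be written as one precomposed code point or as a base letter plus a combining mark, and the two strings do not compare equal until both are brought to a common form such as NFC or NFD.

```smalltalk
| composed decomposed |
"U+00F6 LATIN SMALL LETTER O WITH DIAERESIS as a single code point."
composed := String with: (Character value: 16rF6).
"The same letter written as o followed by U+0308 COMBINING DIAERESIS."
decomposed := String with: $o with: (Character value: 16r308).
"Both denote the same letter, yet they differ as code point sequences."
composed size.            "1"
decomposed size.          "2"
composed = decomposed.    "false"
```

  NFC would map both strings to the single code point form, NFD to the two code point form.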

The development branch also contains a work in progress implementation of Unicode Collation.

Much work remains to be done and contributions are more than welcome.

Sven & Henrik

Re: [ANN] The Pharo Unicode Project

Max Leske
Good stuff guys!

> On 17 Feb 2016, at 10:16, Sven Van Caekenberghe <[hidden email]> wrote:
> ...

Re: [ANN] The Pharo Unicode Project

Esteban A. Maringolo
I read the whole article. It seems like a tricky, not to say hard,
subject. The article is very detailed and well written though.

What is the rationale behind embracing such a challenging feature like
supporting Unicode?

Regards!


Esteban A. Maringolo


2016-02-17 15:24 GMT-03:00 Max Leske <[hidden email]>:

> ...

Re: [ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2

> On 17 Feb 2016, at 19:56, Esteban A. Maringolo <[hidden email]> wrote:
>
> I read the whole Article, seems like a tricky, to not say hard, subject.

Yes. The Unicode specs are big and complex. Step one is to read & understand them, at least well enough to find your way. Doing some implementation is not too hard, but getting 100% scores on the very extensive test suites was/is quite hard.

> The article is very detailed and well written though.

Thx.

> What is the rationale behind embracing such a challenging feature like supporting Unicode?

Unicode is the de facto standard for internationalisation of computer software. Any serious platform has to tackle (big parts of) Unicode.


Re: [ANN] The Pharo Unicode Project

Henrik Sperre Johansen
In reply to this post by Esteban A. Maringolo
Because, who doesn't want to spend an evening with a workspace like this:
trying to figure out why some of the last 7 out of 189k official collate tests fail?

Seriously though, personally I was just tired of having to answer "Sort of, but not really" every time the question of Unicode support is raised, specifically after the last discussion, which led nowhere back in December. Better to show in code what you think a decent implementation might look like. Sven made it easier to get rolling when he wrote import code for many of the data tables needed; parsing text is one of the things I dislike most...

Cheers,
Henry

P.S. Unicode Collation is more of an intellectual exercise. For practical collation you need the locale tailorings from CLDR, which is kept in a, frankly, abominable format. The object model / actual collate algorithms should carry over nicely though, so it's not a complete waste. 


Re: [ANN] The Pharo Unicode Project

Ben Coman
In reply to this post by Sven Van Caekenberghe-2
On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:

> ...

This is really great. Thanks Sven & Henrik.


> The right pane shows a large glyph view of the character involved.

This is very cool !!
Does it rely on font support, or is it bitmap or SVG based?
Maybe it could dynamically download SVG data as needed from somewhere
like here...
http://www.fileformat.info/info/unicode/char/00f6/index.htm


> We also added an extension to our environment’s search system to allow you to look up characters by name:

Is there a #unicode filter for Spotter?


check repeated phrase...
> and replace it with and replace it with


I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
support in other programming languages (though circa 2011)
https://www.azabani.com/pages/gbu/
and this sparked a thought to wonder if the consortium would consider
it worthwhile paying someone external (like that author) with broad
cross platform Unicode experience to consult on getting Pharo to a
pragmatic point where we look attractive in such comparisons, which
may also provide a side-channel advertisement for Pharo when such
comparisons are presented at conferences (??)

cheers -ben


Re: [ANN] The Pharo Unicode Project

Tudor Girba-2
In reply to this post by Sven Van Caekenberghe-2
Thank you very much!

This is an important project. It would be great if others would join you.

Cheers,
Doru


--
www.tudorgirba.com
www.feenk.com

"Beauty is where we see it."






Re: [ANN] The Pharo Unicode Project

Tudor Girba-2
In reply to this post by Ben Coman
Hi,

> On Feb 18, 2016, at 2:05 AM, Ben Coman <[hidden email]> wrote:
>
> On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>> ...
>
>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>
> Is there a #unicode filter for Spotter?

Any category can be used as a filter. So, if the category is called #'Unicode Character', you can use #unicode as a filter.

Cheers,
Doru

--
www.tudorgirba.com
www.feenk.com

"We cannot reach the flow of things unless we let go."






Re: [ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2

> On 18 Feb 2016, at 06:30, Tudor Girba <[hidden email]> wrote:
>
> Hi,
>
>> On Feb 18, 2016, at 2:05 AM, Ben Coman <[hidden email]> wrote:
>>
>> On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>> ...
>>
>>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>>
>> Is there a #unicode filter for Spotter?
>
> Any category can be used as a filter. So, if the category is called #'Unicode Character’, you can use #unicode as a filter.

Ah, I love it when you get things for free !


Re: [ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2
In reply to this post by Ben Coman
Ben,

> On 18 Feb 2016, at 02:05, Ben Coman <[hidden email]> wrote:
>
> On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>> ...
>
> This is really great. Thanks Sven & Henrik.

You're welcome.

>> The right pane shows a large glyph view of the character involved.
>
> This is very cool !!
> Does it rely on font support, or is it bitmap or SVG based?

Currently, we just use what Pharo can do (it thus depends on your font and even then the layout is often wrong in special cases, something we need to fix as well).

> Maybe it could dynamically download SVG data as needed from somewhere
> like here...
> http://www.fileformat.info/info/unicode/char/00f6/index.htm

Wow, I was looking for something like that, but I didn't know it existed.

Can we somehow directly use SVG in Pharo ?

On this page [http://www.fileformat.info/info/unicode/index.htm] they say that they got all their info from unicode.org but I can't seem to find the SVG data there. You seem to be good at finding stuff ;-)

In any case, it would be very cool to have something like that as a fall back.

>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>
> Is there a #unicode filter for Spotter?

That comes for free.

> check repeated phrase...
>> and replace it with and replace it with

Fixed, thanks.

> I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
> support in other programming languages (though circa 2011)
> https://www.azabani.com/pages/gbu/

Reading it.

> and this sparked a thought to wonder if the consortium would consider
> it worthwhile paying someone external (like that author) with broad
> cross platform Unicode experience to consult on getting Pharo to a
> pragmatic point where we look attractive in such comparisons, which
> may also provide a side-channel advertisement for Pharo when such
> comparisons are presented at conferences (??)

Maybe, I don't know if we should spend (marketing) money like that, but who knows.

> cheers -ben

Sven



Re: [ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2

> On 18 Feb 2016, at 15:52, Sven Van Caekenberghe <[hidden email]> wrote:
>
>> I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
>> support in other programming languages (though circa 2011)
>> https://www.azabani.com/pages/gbu/
>
> Reading it.

I read it. So, yes, we still have a long way to go (not that I didn't know, but still).

>> and this sparked a thought to wonder if the consortium would consider
>> it worthwhile paying someone external (like that author) with broad
>> cross platform Unicode experience to consult on getting Pharo to a
>> pragmatic point where we look attractive in such comparisons, which
>> may also provide a side-channel advertisement for Pharo when such
>> comparisons are presented at conferences (??)
>
> Maybe, I don't know if we should spent (marketing) money like that, but who knows.

We are too far off; it will take a while to get to a decent level.

Re: [ANN] The Pharo Unicode Project

Peter Uhnak
> Can we somehow directly use SVG in Pharo ?

Through Cairo/Athens (which is a vector graphics engine).
Look at the Athens-SVG package. It is part of the Pharo/Athens repository (which is in the image), but the package itself is not loaded by default.
Athens-SVG should be able to take an SVG file and produce Athens drawing instructions, although I haven't used it in a while, so I don't remember the details.

Or you can look at Roassal2, which supports drawing shapes described with an SVG path (RTSVGPath).

Peter


On Thu, Feb 18, 2016 at 5:05 PM, Sven Van Caekenberghe <[hidden email]> wrote:

> ...

Re: [ANN] The Pharo Unicode Project

Ben Coman
In reply to this post by Sven Van Caekenberghe-2
On Thu, Feb 18, 2016 at 10:52 PM, Sven Van Caekenberghe <[hidden email]> wrote:

> ...
>
> Wow, I was looking for something like that, but I didn't know it existed.
>
> Can we somehow directly use SVG in Pharo ?
>
> On this page [http://www.fileformat.info/info/unicode/index.htm] they say that they got all their info from unicode.org but I can't seem to find the SVG data there. You seem to be good in finding stuff ;-)

aH-Har! A challenge!  but I did not succeed.   The Unicode Code Charts
[1] seem available only in PDF form (e.g. [2]) . Indeed it says "The
fonts and font data used in production of the Unicode Standard may not
be extracted, or used in any way in any product or publication,
without permission or license granted by the typeface owner(s)."

[1] http://unicode.org/charts/About.html
[2] http://unicode.org/charts/PDF/U0100.pdf

Though I discovered a few other interesting resources...

Fallback font
https://en.wikipedia.org/wiki/Fallback_font

Last Resort Font
http://www.unicode.org/policies/lastresortfont_eula.html

Scriptsource - Unicode Character Browsing
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=tubkvb6y8f

Javascript Unibook 4.0 Character Browser
http://www.soldin.de/about/2003-js_unibook/

decodeunicode
http://www.decodeunicode.org/

HTH, cheers -ben

Re: [ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2

> On 18 Feb 2016, at 18:13, Ben Coman <[hidden email]> wrote:
>
> ...
>>
>> On this page [http://www.fileformat.info/info/unicode/index.htm] they say that they got all their info from unicode.org but I can't seem to find the SVG data there. You seem to be good in finding stuff ;-)
>
> aH-Har! A challenge!  but I did not succeed.  

I thought you would write a complete solution in Pharo, oh well ;-)

> The Unicode Code Charts
> [1] seem available only in PDF form (e.g. [2]) . Indeed it says "The
> fonts and font data used in production of the Unicode Standard may not
> be extracted, or used in any way in any product or publication,
> without permission or license granted by the typeface owner(s)."
>
> [1] http://unicode.org/charts/About.html
> [2] http://unicode.org/charts/PDF/U0100.pdf
>
> Though I discovered a few other interesting resources...
>
> Fallback font
> https://en.wikipedia.org/wiki/Fallback_font
>
> Last Resort Font
> http://www.unicode.org/policies/lastresortfont_eula.html
>
> Scriptsource - Unicode Character Browsing
> http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=tubkvb6y8f
>
> Javascript Unibook 4.0 Character Browser
> http://www.soldin.de/about/2003-js_unibook/
>
> decodeunicode
> http://www.decodeunicode.org/

Yeah, good links.

This site seems to be able to generate .png's for each glyph

http://r12a.github.io/uniview/

Though the URLs are not straightforward and are probably not meant to be used as a web service.

Maybe this is usable:

https://en.wikipedia.org/wiki/GNU_Unifont

Re: [ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2
Here is a quick & dirty version without error checking:

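| codePoint ucd name hex url |
"Two sample code points; the second assignment overrides the first."
codePoint := 180. "U+00B4 ACUTE ACCENT"
codePoint := 8491. "U+212B ANGSTROM SIGN"
"Look up the character's entry in the Unicode Character Database."
ucd := codePoint unicodeCharacterData.
"The image file is named after the lowercased character name, with underscores instead of spaces."
name := $_ join: (Character space split: ucd name asLowercase).
"The code point as four hex digits, zero padded."
hex := String streamContents: [ :out | codePoint printOn: out base: 16 nDigits: 4 ].
url := 'http://www.fileformat.info/info/unicode/char/{1}/{2}.png' format: { hex. name }.
"Fetch and decode the PNG."
ZnEasy getPng: url.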

They seem to be adding a watermark if you do it like this, but not all the time.
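
A slightly more defensive variant of the snippet above could look like this. It is only a sketch: glyphFor is a hypothetical helper block, and it assumes that a missing database entry or a failed download is signalled as an Error, which is then turned into nil.

```smalltalk
| glyphFor |
"Hypothetical helper: answers the downloaded image for a code point, or nil on any failure."
glyphFor := [ :codePoint |
	[ | ucd name hex url |
		ucd := codePoint unicodeCharacterData.
		name := $_ join: (Character space split: ucd name asLowercase).
		hex := String streamContents: [ :out |
			codePoint printOn: out base: 16 nDigits: 4 ].
		url := 'http://www.fileformat.info/info/unicode/char/{1}/{2}.png'
			format: { hex. name }.
		ZnEasy getPng: url ]
		on: Error
		do: [ :error | nil ] ].
"Assumption: any lookup or network problem surfaces as an Error and ends up as nil here."
glyphFor value: 8491.
```

A caller could then fall back to the font-rendered glyph whenever the block answers nil.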

> On 18 Feb 2016, at 20:24, Sven Van Caekenberghe <[hidden email]> wrote:
>
>>
>> On 18 Feb 2016, at 18:13, Ben Coman <[hidden email]> wrote:
>>
>> On Thu, Feb 18, 2016 at 10:52 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>> Ben,
>>>
>>>> On 18 Feb 2016, at 02:05, Ben Coman <[hidden email]> wrote:
>>>>
>>>> On Wed, Feb 17, 2016 at 5:16 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>>>>> Hi,
>>>>>
>>>>> In Pharo we can deal with and represent any Unicode character and string, but there are still some important pieces of functionality missing.
>>>>>
>>>>> The goal of the Pharo Unicode project is to gradually improve and expand Unicode support in Pharo. We started in December last year to lay the foundation and are now ready to go public with what we built.
>>>>>
>>>>> The first delivery is an implementation of Unicode Normalization, together with an implementation of the Unicode Character Database.
>>>>>
>>>>> Please read the following article for more information (the appendix explains how to get the code).
>>>>>
>>>>> An Implementation of Unicode Normalisation
>>>>>
>>>>> Streaming NFC, NFD, NFKC & NFKD, normalization QC and normalization preserving concatenation.
>>>>>
>>>>> https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43#.qmy18gky0
>>>>>
>>>>> The development branch also contains a work in progress implementation of Unicode Collation.
>>>>>
>>>>> Much work remains to be done and contributions are more than welcome.
>>>>>
>>>>> Sven & Henrik
>>>>
>>>> This is really great. Thanks Sven & Henrik.
>>>
>>> You're welcome.
>>>
>>>>> The right pane shows a large glyph view of the character involved.
>>>>
>>>> This is very cool !!
>>>> Does it rely on font support, or is it bitmap or SVG based?
>>>
>>> Currently, we just use what Pharo can do (it thus depends on your font and even then the layout is often wrong in special cases, something we need to fix as well).
>>>
>>>> Maybe it could dynamically download SVG data as needed from somewhere
>>>> like here...
>>>> http://www.fileformat.info/info/unicode/char/00f6/index.htm
>>>
>>> Wow, I was looking for something like that, but I didn't know it existed.
>>>
>>> Can we somehow directly use SVG in Pharo ?
>>>
>>> On this page [http://www.fileformat.info/info/unicode/index.htm] they say that they got all their info from unicode.org, but I can't seem to find the SVG data there. You seem to be good at finding stuff ;-)
>>
>> Ah-ha! A challenge! But I did not succeed.
>
> I thought you would write a complete solution in Pharo, oh well ;-)
>
>> The Unicode Code Charts
>> [1] seem available only in PDF form (e.g. [2]). Indeed it says "The
>> fonts and font data used in production of the Unicode Standard may not
>> be extracted, or used in any way in any product or publication,
>> without permission or license granted by the typeface owner(s)."
>>
>> [1] http://unicode.org/charts/About.html
>> [2] http://unicode.org/charts/PDF/U0100.pdf
>>
>> Though I discovered a few other interesting resources...
>>
>> Fallback font
>> https://en.wikipedia.org/wiki/Fallback_font
>>
>> Last Resort Font
>> http://www.unicode.org/policies/lastresortfont_eula.html
>>
>> Scriptsource - Unicode Character Browsing
>> http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=tubkvb6y8f
>>
>> Javascript Unibook 4.0 Character Browser
>> http://www.soldin.de/about/2003-js_unibook/
>>
>> decodeunicode
>> http://www.decodeunicode.org/
>
> Yeah, good links.
>
> This site seems to be able to generate .png files for each glyph:
>
> http://r12a.github.io/uniview/
>
> Though the URLs are not straightforward and are probably not meant to be used as a web service.
>
> Maybe this is usable:
>
> https://en.wikipedia.org/wiki/GNU_Unifont
>
>> HTH, cheers -ben
>>
>>>
>>> In any case, it would be very cool to have something like that as a fallback.
>>>
>>>>> We also added an extension to our environment’s search system to allow you to look up characters by name:
>>>>
>>>> Is there a #unicode filter for Spotter?
>>>
>>> That comes for free.
>>>
>>>> check repeated phrase...
>>>>> and replace it with and replace it with
>>>
>>> Fixed, thanks.
>>>
>>>> I bumped into an interesting "Good, Bad, Ugly" comparison of Unicode
>>>> support in other programming languages (though circa 2011)
>>>> https://www.azabani.com/pages/gbu/
>>>
>>> Reading it.
>>>
>>>> and this sparked a thought to wonder if the consortium would consider
>>>> it worthwhile paying someone external (like that author) with broad
>>>> cross platform Unicode experience to consult on getting Pharo to a
>>>> pragmatic point where we look attractive in such comparisons, which
>>>> may also provide a side-channel advertisement for Pharo when such
>>>> comparisons are presented at conferences (??)
>>>
>>> Maybe. I don't know if we should spend (marketing) money like that, but who knows.
>>>
>>>> cheers -ben
>>>
>>> Sven
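
An aside on the normalization forms and the look-up-by-name feature discussed in this thread: the sketch below uses Python's standard `unicodedata` module, not the Pharo Unicode project's own API (for that, see the linked article), to show what NFD, NFC and NFKC do to a small example and how a character can be found by name. The sample character is U+00F6, the one in the fileformat.info link quoted above. The helper name `fileformat_info_url` is hypothetical, and it simply assumes that other code points follow the same URL pattern as that one link.

```python
import unicodedata


def describe(text):
    """Return a list of 'U+XXXX NAME' strings, one per code point in text."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in text]


def fileformat_info_url(ch):
    """Build a URL shaped like the single link quoted in the thread.

    Assumption: every code point uses the same path layout as the
    U+00F6 example; this is not confirmed anywhere in the thread.
    """
    return "http://www.fileformat.info/info/unicode/char/%04x/index.htm" % ord(ch)


# Look a character up by its Unicode name.
o_umlaut = unicodedata.lookup("LATIN SMALL LETTER O WITH DIAERESIS")

# NFD splits the precomposed letter into base letter + combining mark;
# NFC composes the pair back into a single code point.
decomposed = unicodedata.normalize("NFD", o_umlaut)
recomposed = unicodedata.normalize("NFC", decomposed)

# NFKC additionally replaces compatibility characters, e.g. the "fi" ligature.
ligature = "\ufb01"
folded = unicodedata.normalize("NFKC", ligature)

if __name__ == "__main__":
    print(describe(o_umlaut))
    print(describe(decomposed))
    print(describe(folded))
    print(fileformat_info_url(o_umlaut))
```

The decomposed and precomposed strings look identical on screen but compare unequal until both are normalized to the same form, which is why normalization matters for comparison and searching.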



Re: [ANN] The Pharo Unicode Project

stepharo
In reply to this post by Sven Van Caekenberghe-2
Super cool!

Stef




Re: [ANN] The Pharo Unicode Project

stepharo
In reply to this post by Sven Van Caekenberghe-2
I ***LOVE*** the documentation :)





Re: [ANN] The Pharo Unicode Project

stepharo
In reply to this post by Sven Van Caekenberghe-2
I deeply appreciate your effort!

On 17/2/16 23:27, Sven Van Caekenberghe wrote:

>> On 17 Feb 2016, at 19:56, Esteban A. Maringolo <[hidden email]> wrote:
>>
>> I read the whole article; it seems like a tricky, not to say hard, subject.
> Yes. The Unicode specs are big and complex. Step one is to read & understand them, at least well enough to find your way. Doing some implementation is not too hard, but getting 100% scores on the very extensive test suites was/is quite hard.
>
>> The article is very detailed and well written though.
> Thx.
>
>> What is the rationale behind embracing such a challenging feature like supporting Unicode?
> Unicode is the de facto standard for internationalisation of computer software. Any serious platform has to tackle (big parts of) Unicode.
>
>> Regards!
>>
>>
>> Esteban A. Maringolo
>>
>>
>> 2016-02-17 15:24 GMT-03:00 Max Leske <[hidden email]>:
>>> Good stuff guys!
>>>
>>>> On 17 Feb 2016, at 10:16, Sven Van Caekenberghe <[hidden email]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> In Pharo we can deal with and represent any Unicode character and string, but there are still some important pieces of functionality missing.
>>>>
>>>> The goal of the Pharo Unicode project is to gradually improve and expand Unicode support in Pharo. We started in December last year to lay the foundation and are now ready to go public with what we built.
>>>>
>>>> The first delivery is an implementation of Unicode Normalization, together with an implementation of the Unicode Character Database.
>>>>
>>>> Please read the following article for more information (the appendix explains how to get the code).
>>>>
>>>> An Implementation of Unicode Normalisation
>>>>
>>>> Streaming NFC, NFD, NFKC & NFKD, normalization QC and normalization preserving concatenation.
>>>>
>>>> https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43#.qmy18gky0
>>>>
>>>> The development branch also contains a work in progress implementation of Unicode Collation.
>>>>
>>>> Much work remains to be done and contributions are more than welcome.
>>>>
>>>> Sven & Henrik
>>>
>
>



Re: [ANN] The Pharo Unicode Project

Mariano Martinez Peck
In reply to this post by stepharo
Wow, guys. Unbelievable. Amazing. Coming from you and Henrik, I really don't expect anything other than this :)
Sven, I also love your Medium posts.

Thanks!!!


Re: [ANN] The Pharo Unicode Project

Sven Van Caekenberghe-2
Thank you, Stef & Mariano.

I must say that this is a case of positive feedback: the effort of many people (you both, but many, many others as well) is infectious, and it stimulates me to do my best too.

So let me just say the same: thank you all for making Pharo what it is today.

Sven

> On 19 Feb 2016, at 02:02, Mariano Martinez Peck <[hidden email]> wrote:
>
> Wow, guys. Unbelievable. Amazing. Coming from you and Henrik, I really don't expect anything other than this :)
> Sven, I also love your Medium posts.
>
> Thanks!!!
>
> On Thu, Feb 18, 2016 at 6:22 PM, stepharo <[hidden email]> wrote:
> I ***LOVE*** the documentation :)
>
>
> On 17/2/16 10:16, Sven Van Caekenberghe wrote:
>
> Hi,
>
> In Pharo we can deal with and represent any Unicode character and string, but there are still some important pieces of functionality missing.
>
> The goal of the Pharo Unicode project is to gradually improve and expand Unicode support in Pharo. We started in December last year to lay the foundation and are now ready to go public with what we built.
>
> The first delivery is an implementation of Unicode Normalization, together with an implementation of the Unicode Character Database.
>
> Please read the following article for more information (the appendix explains how to get the code).
>
>    An Implementation of Unicode Normalisation
>
>    Streaming NFC, NFD, NFKC & NFKD, normalization QC and normalization preserving concatenation.
>
>    https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43#.qmy18gky0
>
> The development branch also contains a work in progress implementation of Unicode Collation.
>
> Much work remains to be done and contributions are more than welcome.
>
> Sven & Henrik
>
>
>
>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com