Artefact and WideString

HilaireFernandes
Hello,

Is it me, or is there a problem with WideString and Artefact?

The following produces an empty page:

page add: ((PDFFormattedTextElement from: 10mm @ 20mm to: 277mm @ 30mm)
      alignment: PDFAlignment left;
      text:  'Argent de poche : 1€').

The text is a WideString.
With the € removed, it works fine.

Hilaire


--
Dr. Geo
http://drgeo.eu
http://google.com/+DrgeoEu



Re: Artefact and WideString

Olivier Auverlot
Hi Hilaire,

Take a look at the Artefact demos. I think there is a PDF document with a currency symbol.


2015-10-21 22:02 GMT+02:00 Hilaire <[hidden email]>:
> Is it me, or is there a problem with WideString and Artefact?

Re: Artefact and WideString

HilaireFernandes
On 21/10/2015 22:26, olivier auverlot wrote:
> Hi Hilaire,
>
> Take a look at the Artefact demos. I think there is a PDF document
> with a currency symbol.
>

Indeed: ((128 asCharacter) asString). But this character does not render
on the web with Unicode encoding.
Unified support would be great: an object containing the euro symbol could
then be printed as is in both Seaside and Artefact.
So I am not sure what's wrong.


Thanks

Hilaire

Re: Artefact and WideString

Sven Van Caekenberghe-2

> On 22 Oct 2015, at 11:14, Hilaire <[hidden email]> wrote:
>
> Indeed ((128 asCharacter) asString).

I am pretty sure this is wrong. The Unicode code point for the Euro symbol is decimal 8364 and not 128.

https://en.wikipedia.org/wiki/Euro_sign
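
The difference between the two numbers can be checked in a Pharo playground. This is only a minimal sketch using basic `Character` and `String` messages; it says nothing about which glyph a given PDF font draws for the value 128.

```smalltalk
"The euro sign as a Unicode character: its code point is 8364 (16r20AC)."
$€ codePoint.

"A string containing it cannot be stored with one byte per character,
 so inspecting its class shows which representation Pharo picked."
'Argent de poche : 1€' class.

"Value 128 is a different character. In Unicode it is a control code,
 not the euro sign; whether € is drawn for it depends on the encoding
 associated with the font."
128 asCharacter codePoint.
```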

Re: Artefact and WideString

HilaireFernandes
On 22/10/2015 12:01, Sven Van Caekenberghe wrote:
> I am pretty sure this is wrong. The Unicode code point for the Euro symbol is decimal 8364 and not 128.
>
> https://en.wikipedia.org/wiki/Euro_sign
Indeed. I guess the value 128 comes from an 8-bit character encoding, and
that is the one Artefact requires.

Hilaire

Re: Artefact and WideString

Sabine Manaa
I also use two different implementations for Artefact/PDF and HTML:

Artefact:
128 asCharacter asString

HTML:
'€'

Having the same for both would be great.




Re: Artefact and WideString

Sven Van Caekenberghe-2

> On 22 Oct 2015, at 15:31, Sabine Manaa <[hidden email]> wrote:
>
> I do also use two different implementations for artefact/pdf and html:
>
> artefact:
> 128 asCharacter asString

I am still very curious to know in which character encoding that is the case.

https://en.wikipedia.org/wiki/Currency_sign_(typography)

Re: Artefact and WideString

Stephan Eggermont-3
In reply to this post by Sven Van Caekenberghe-2
On 22/10/15 12:01, Sven Van Caekenberghe wrote:

>
>> On 22 Oct 2015, at 11:14, Hilaire <[hidden email]> wrote:
>>
>> Indeed ((128 asCharacter) asString).
>
> I am pretty sure this is wrong. The Unicode code point for the Euro symbol is decimal 8364 and not 128.

Yes, it might be ISO-Latin-1. Several code pages have € at 128.
PDF supports several encodings; I don't know which one Artefact uses by
default.

Stephan



Re: Artefact and WideString

Henrik Sperre Johansen
In reply to this post by Sabine Manaa

On 22 Oct 2015, at 3:31, Sabine Manaa <[hidden email]> wrote:

> I do also use two different implementations for artefact/pdf and html:
>
> artefact:
> 128 asCharacter asString
>
> html:
> '€'
>
> same would be great

4.9 character
"character numeric code representing an abstract symbol according to some defined character encoding rule 
NOTE 1 There are three manifestations of characters in PDF, depending on context: 
• A PDF file is represented as a sequence of 8-bit bytes, some of which are interpreted as character codes in the ASCII character set and some of which are treated as arbitrary binary data depending upon the context. 
• The contents (data) of a string or stream object in some contexts are interpreted as character codes in the PDFDocEncoding or UTF-16 character set. 
• The contents of a string within a PDF content stream in some situations are interpreted as character codes that select glyphs to be drawn on the page according to a character encoding that is associated with the text font. "

What those contexts are, I don't know, but they all need to be handled differently:
- For bullet 1, there's nothing to do.
- For bullet 2, there needs to be an encoding layer which converts the strings to the proper format when writing the PDF, see section 7.9.2.
It seems to me that writing the file would be simpler if one ignored PDFDocEncoding altogether and either wrote ASCII or converted to BOM-marked UTF-16 (in the same way we write ASCII or BOM-marked UTF-8 for chunk files).
- For bullet 3, one would need to convert to the font's character set.
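
For bullet 2, the BOM-marked UTF-16 conversion can be sketched by hand. This is only an illustration under stated assumptions: it is not Artefact code, the variable names are made up, and it only handles characters in the Basic Multilingual Plane (code points below 65536), which is enough for the euro sign.

```smalltalk
"Sketch: encode a string as BOM-marked, big-endian UTF-16 by hand.
 Characters outside the Basic Multilingual Plane would need surrogate
 pairs, which this sketch does not handle."
| text bytes |
text := 'Argent de poche : 1€'.
bytes := ByteArray streamContents: [ :out |
	"The byte order mark FE FF announces big-endian UTF-16."
	out nextPut: 16rFE; nextPut: 16rFF.
	text do: [ :each |
		"High byte first, then low byte, for every character."
		out
			nextPut: each codePoint // 256;
			nextPut: each codePoint \\ 256 ] ].
bytes.
```

With this scheme the euro sign (code point 8364, 16r20AC) ends up as the two bytes 32 and 172, while every ASCII character is preceded by a zero byte.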

Cheers
Henry

Re: Artefact and WideString

Sven Van Caekenberghe-2
In reply to this post by Stephan Eggermont-3

> On 22 Oct 2015, at 16:00, Stephan Eggermont <[hidden email]> wrote:
>
> Yes, it might be ISO-Latin-1. There are several codepages having € at 128. PDF support several encodings, I don't know what Artefact uses by default.
>
> Stephan

Indeed, I spoke too quickly: several do (44 out of the 69 defined):

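"Identifiers of all byte encodings whose character domain includes the euro sign."
ZnByteEncoder knownEncodingIdentifiers select: [ :each |
  (ZnByteEncoder newForEncoding: each) characterDomain includes: $€ ].

"For each of those encoders, pair its identifier with the bytes it produces for the euro sign."
(ZnByteEncoder knownEncodingIdentifiers
   collect: [ :each | ZnByteEncoder newForEncoding: each ])
     select: [ :each | each characterDomain includes: $€ ]
     thenCollect: [ :each | each identifier -> (each encodeString: $€ asString) ].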

Most but not all use 128 as encoding. But Latin1 is not one of them (at least not in the strict interpretation).
       
ZnCharacterEncoder latin1 encodeString: $€ asString.

Sven



Re: Artefact and WideString

Stephan Eggermont-3
On 22-10-15 16:16, Sven Van Caekenberghe wrote:
> Most but not all use 128 as encoding. But Latin1 is not one of them (at least not in the strict interpretation).

Hmm, you can't trust anything you read on the internet anymore :)
CP1252, legacy Windows it is.
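
If CP1252 is indeed the encoding in play, one possible workaround is to keep the text as a normal Unicode string and convert it only at the PDF boundary. The following is a sketch, not a tested Artefact recipe: it assumes `'cp1252'` is among `ZnByteEncoder knownEncodingIdentifiers`, and the variable names are only illustrative.

```smalltalk
"Sketch: turn a WideString into a byte string in which € is character 128,
 assuming CP1252 is the encoding expected on the PDF side."
| encoder bytes narrow |
encoder := ZnByteEncoder newForEncoding: 'cp1252'.
bytes := encoder encodeString: 'Argent de poche : 1€'.

"Each byte becomes one character of a byte string."
narrow := bytes asString.

"CP1252 places the euro sign at 128, so that is the value of the last character."
narrow last asInteger.
```

The same Unicode string could then be used as is for HTML output, with the conversion applied only to the text handed to the PDF element.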

Stephan