Artefact and WideString

HilaireFernandes

Artefact and WideString

Hello,

Is it me, or is there a problem with WideString and Artefact?

The following produces an empty page:

page add: ((PDFFormattedTextElement from: 10mm @ 20mm to: 277mm @ 30mm)
    alignment: PDFAlignment left;
    text: 'Argent de poche : 1€').

The text is a WideString.
With the € removed, it works fine.

Hilaire

--
Dr. Geo
http://drgeo.eu
http://google.com/+DrgeoEu

Olivier Auverlot

Re: Artefact and WideString

Hi Hilaire,

Take a look at the Artefact demos. I think there is a PDF document that uses a currency character.

HilaireFernandes

Re: Artefact and WideString

On 21/10/2015 22:26, olivier auverlot wrote:
> Hi Hilaire,
>
> Take a look in the Artefact demos. I think there are a PDF document
> with a monetary character.
>

Indeed: ((128 asCharacter) asString). But this character does not display
on the web, where the encoding is Unicode.
Unified support would be great: an object containing the euro symbol could
then be printed as is in both Seaside and Artefact.
So I am not sure what's wrong.

Thanks

Hilaire

Sven Van Caekenberghe-2

Re: Artefact and WideString

> On 22 Oct 2015, at 11:14, Hilaire <[hidden email]> wrote:
>
> Indeed ((128 asCharacter) asString).

I am pretty sure this is wrong. The Unicode code point for the Euro symbol is decimal 8364 and not 128.

https://en.wikipedia.org/wiki/Euro_sign
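
To make the distinction concrete, here is a minimal sketch for a Pharo playground, using only core Character and String messages. It shows why a literal containing € ends up as a WideString, and that 128 is a different Unicode code point from the euro sign.

```smalltalk
| euro |
"U+20AC EURO SIGN, decimal 8364"
euro := Character value: 8364.
euro codePoint. "8364"

"A code point above 255 does not fit in a ByteString, so the string is wide"
euro asString isWideString. "true"

"Interpreted as Unicode, 128 is a control character, not the euro sign"
128 asCharacter = euro. "false"
```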

HilaireFernandes

Re: Artefact and WideString

On 22/10/2015 12:01, Sven Van Caekenberghe wrote:
> I am pretty sure this is wrong. The Unicode code point for the Euro symbol is decimal 8364 and not 128.
>
> https://en.wikipedia.org/wiki/Euro_sign

Indeed. I guess the value 128 comes from an 8-bit character encoding, and
it is the one Artefact requires.

Hilaire

Sabine Manaa

Re: Artefact and WideString

I also use two different implementations, one for Artefact/PDF and one for HTML:

Artefact:

128 asCharacter asString

HTML:

'€'

Having the same one for both would be great.

Sven Van Caekenberghe-2

Re: Artefact and WideString

> On 22 Oct 2015, at 15:31, Sabine Manaa <[hidden email]> wrote:
>
> I do also use two different implementations for artefact/pdf and html:
>
> artefact:
> 128 asCharacter asString

I am still very curious to know in which character encoding that is the case.

https://en.wikipedia.org/wiki/Currency_sign_(typography)

Stephan Eggermont-3

Re: Artefact and WideString

In reply to this post by Sven Van Caekenberghe-2

On 22/10/15 12:01, Sven Van Caekenberghe wrote:

> I am pretty sure this is wrong. The Unicode code point for the Euro symbol is decimal 8364 and not 128.

Yes, it might be ISO-Latin-1. Several code pages have € at 128. PDF
supports several encodings; I don't know which one Artefact uses by
default.

Stephan

Henrik Sperre Johansen

Re: Artefact and WideString

In reply to this post by Sabine Manaa

On 22 Oct 2015, at 3:31, Sabine Manaa <[hidden email]> wrote:

> I do also use two different implementations for artefact/pdf and html:
>
> artefact:
> 128 asCharacter asString
>
> html:
> '€'
>
> same would be great

https://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

4.9 character

"numeric code representing an abstract symbol according to some defined character encoding rule

NOTE 1 There are three manifestations of characters in PDF, depending on context:

• A PDF file is represented as a sequence of 8-bit bytes, some of which are interpreted as character codes in the ASCII character set and some of which are treated as arbitrary binary data depending upon the context.

• The contents (data) of a string or stream object in some contexts are interpreted as character codes in the PDFDocEncoding or UTF-16 character set.

• The contents of a string within a PDF content stream in some situations are interpreted as character codes that select glyphs to be drawn on the page according to a character encoding that is associated with the text font. "

What those contexts are, I don't know, but they all need to be handled differently:

- For bullet 1, there's nothing to do.

- For bullet 2, there needs to be an encoding layer which converts the strings to the proper format when writing the PDF (see section 7.9.2). It seems to me that writing the file would be simpler if one ignored PDFDocEncoding altogether and either wrote ASCII or converted to BOM-marked UTF-16 (in the same way we write ASCII or BOM-marked UTF-8 for chunk files).

- For bullet 3, one would need to convert to the font's character set.
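
For bullet 2, the "ASCII or BOM-marked UTF-16" idea could look roughly like this in Pharo. This is a sketch and not Artefact code: the block name encodeTextString is hypothetical, and it assumes that ZnUTF16Encoder writes big-endian bytes without adding a byte order mark itself, which is worth checking in your Zinc version.

```smalltalk
| encodeTextString |
"Hypothetical helper, not part of Artefact: answer the bytes of a PDF text string"
encodeTextString := [ :aString |
	(aString allSatisfy: [ :each | each codePoint < 128 ])
		ifTrue: [
			"Pure ASCII: UTF-8 yields exactly one byte per character here"
			ZnUTF8Encoder new encodeString: aString ]
		ifFalse: [
			"Otherwise prefix the UTF-16 bytes (assumed big-endian) with the byte order mark FE FF"
			#[254 255] , (ZnUTF16Encoder new encodeString: aString) ] ].

encodeTextString value: 'Argent de poche'.
encodeTextString value: 'Argent de poche : 1€'.
```

The resulting bytes would still have to be written as a PDF string object, with parentheses and backslashes escaped in a literal string, or as a hex string.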

Cheers

Henry

Sven Van Caekenberghe-2

Re: Artefact and WideString

In reply to this post by Stephan Eggermont-3

> On 22 Oct 2015, at 16:00, Stephan Eggermont <[hidden email]> wrote:
>
> Yes, it might be ISO-Latin-1. There are several codepages having € at 128. PDF support several encodings, I don't know what Artefact uses by default.
>
> Stephan

Indeed, I spoke too quickly; several do (44 out of the 69 defined):

"Identifiers of the byte encodings whose character domain includes the euro sign"
ZnByteEncoder knownEncodingIdentifiers select: [ :each |
    (ZnByteEncoder newForEncoding: each) characterDomain includes: $€ ].

"For each of those encodings, the bytes it produces for the euro sign"
(ZnByteEncoder knownEncodingIdentifiers
    collect: [ :each | ZnByteEncoder newForEncoding: each ])
    select: [ :each | each characterDomain includes: $€ ]
    thenCollect: [ :each | each identifier -> (each encodeString: $€ asString) ].

Most, but not all, encode it as 128. But Latin1 is not one of them (at least not in the strict interpretation).

ZnCharacterEncoder latin1 encodeString: $€ asString.

Sven

Stephan Eggermont-3

Re: Artefact and WideString

On 22-10-15 16:16, Sven Van Caekenberghe wrote:
> Most but not all use 128 as encoding. But Latin1 is not one of them (at least not in the strict interpretation).

Hmm, you can't trust anything you read on the internet anymore :)
CP1252, the legacy Windows code page, it is.

Stephan
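
To tie the thread together, here is a short sketch of where the 128 comes from. It assumes the 'cp1252' identifier is known to your Zinc version (it should appear in ZnByteEncoder knownEncodingIdentifiers). Whether Artefact then renders byte 128 as € depends on the font encoding it writes into the PDF, which this thread does not settle.

```smalltalk
| cp1252 bytes byteString |
cp1252 := ZnCharacterEncoder newForEncoding: 'cp1252'.

"Encode the WideString from the first post; every character becomes one byte"
bytes := cp1252 encodeString: 'Argent de poche : 1€'.
bytes last. "128, the CP1252 byte for the euro sign"

"One character per byte, so the last character equals 128 asCharacter,
 the same value as in the workaround quoted above"
byteString := bytes asString.

"Decoding with the same encoder restores the original text"
(cp1252 decodeBytes: bytes) = 'Argent de poche : 1€'. "true"
```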