ZnUrls with Non-ASCII characters


Sean P. DeNigris
Administrator
'https://en.wiktionary.org/wiki/prêt#French' asUrl ==>
ZnCharacterEncodingError: ASCII character expected. Ideas?



-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html


Re: ZnUrls with Non-ASCII characters

Peter Kenny
Sean

The trick is to URL-encode only the part that contains the accented characters, keeping the fragment separator outside so that it is not escaped as well. In your case, try:
('https://en.wiktionary.org/wiki/', 'prêt' urlEncoded, '#French') asUrl

If you use urlEncoded on the whole string, the colon and slashes get percent-encoded too, which breaks the URL structure.
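
To see the difference, the two cases can be compared in a playground. This is only a minimal sketch, assuming a Pharo image with Zinc loaded, where String>>urlEncoded percent-encodes every character outside the unreserved set:

```smalltalk
"Encoding just the segment escapes the accented character and nothing else."
'prêt' urlEncoded.
"-> 'pr%C3%AAt'"

"Encoding the whole string also escapes ':' and '/',
 so the scheme and path structure are lost."
'https://en.wiktionary.org/wiki/prêt' urlEncoded.
```

Inspecting the second result shows why it cannot simply be handed to asUrl afterwards.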

HTH

Peter Kenny



Re: ZnUrls with Non-ASCII characters

Sven Van Caekenberghe-2
In reply to this post by Sean P. DeNigris


> On 7 Dec 2017, at 15:49, Sean P. DeNigris <[hidden email]> wrote:
>
> 'https://en.wiktionary.org/wiki/prêt#French' asUrl ==>
> ZnCharacterEncodingError: ASCII character expected. Ideas?

Non-ASCII characters are not allowed in a URL (that is, in its external string representation, which is the input of the parser); they must be percent-encoded.

When you construct a URL from parts, the encoding will be done for you, as you specify unencoded elements.

'https://en.wiktionary.org/wiki' asUrl addPathSegment: 'prêt'; fragment: #French; yourself.

  => https://en.wiktionary.org/wiki/pr%C3%AAt#French

The already encoded form is what the parser accepts:

'https://en.wiktionary.org/wiki/pr%C3%AAt#French' asUrl.

And you would be correct to remark that web browsers do allow the unencoded form, but that is more of a UI thing.
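
The same applies to query fields. As a hedged sketch (the index.php path and the search field are only illustrative assumptions, not something taken from this thread), a query value can be set unencoded and left to ZnUrl to encode when the URL is printed:

```smalltalk
| url |
"Start from a plain ASCII base URL, then add the unencoded query value."
url := 'https://en.wiktionary.org/w/index.php' asUrl.
url queryAt: 'search' put: 'prêt'.

"The internal value stays unencoded..."
url queryAt: 'search'.

"...while the external representation should show it percent-encoded."
url printString.
```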


Re: ZnUrls with Non-ASCII characters

Sean P. DeNigris
Administrator

Re: ZnUrls with Non-ASCII characters

Sven Van Caekenberghe-2


> On 7 Dec 2017, at 18:00, Sean P. DeNigris <[hidden email]> wrote:
>
> Why not:
> 'https://en.wiktionary.org/wiki/prêt#French' asUrl  =>
> https://en.wiktionary.org/wiki/pr%C3%AAt#French
> ?

#asUrl invokes the URL parser, which takes the EXTERNAL string representation of a URL as input. It stays strict to the specification and as such does not allow unencoded non-ASCII characters.

If you construct a URL from INTERNAL parts, that's different.

It would probably be possible to write a more lenient parser as opposed to a strict one. I have not yet given that idea much thought.


Re: ZnUrls with Non-ASCII characters

Sean P. DeNigris
Administrator
Sven Van Caekenberghe-2 wrote
> It would probably be possible to write a more lenient parser as opposed to
> a strict one. I have not yet given that idea much thought.

K, thanks for the explanation and assistance.




Re: ZnUrls with Non-ASCII characters

Sean P. DeNigris
Administrator
In reply to this post by Sven Van Caekenberghe-2
Sven Van Caekenberghe-2 wrote
>> 'https://en.wiktionary.org/wiki/prêt#French' asUrl  =>
>> https://en.wiktionary.org/wiki/pr%C3%AAt#French
>> ?
>
> It would probably be possible to write a more lenient parser as opposed to
> a strict one. I have not yet given that idea much thought.

I ran into this issue again. The problem is that I got the URL from an
outside source, so I don't have the luxury of constructing it bit by bit.
The non-ASCII stuff this time is in the query, but I can't turn it into a
URL to parse it to safely separate out the query :/
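
One possible workaround, sketched here under assumptions and not part of Zinc itself, is to pre-process the external string so that only its non-ASCII characters are percent-encoded as UTF-8 bytes, and then hand the result to the strict parser. The input URL below is hypothetical, and the sketch does not repair other illegal characters such as spaces:

```smalltalk
| raw encoded url |
"Hypothetical input received from an outside source."
raw := 'https://en.wiktionary.org/w/index.php?search=prêt'.

"Percent-encode only the non-ASCII characters; leave delimiters
 and any existing %XX escapes untouched."
encoded := String streamContents: [ :out |
	raw do: [ :each |
		each codePoint < 128
			ifTrue: [ out nextPut: each ]
			ifFalse: [
				each asString utf8Encoded do: [ :byte |
					"UTF-8 bytes of non-ASCII characters are >= 16r80,
					 so they always print as two hex digits."
					out nextPut: $%; nextPutAll: (byte printStringBase: 16) ] ] ] ].

"The strict parser now accepts the string, and the query can be read back decoded."
url := encoded asUrl.
url queryAt: 'search'.
```

Because the URL delimiters are plain ASCII, they pass through unchanged, so the query can be separated out by the parser as usual.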




Re: ZnUrls with Non-ASCII characters

Sean P. DeNigris
Administrator
Sean P. DeNigris wrote
> I ran into this issue again.

I found a few other threads over the years where this came up, but they
seemed unresolved. Just after I posted, I found one from 2014 [1] where you
shared a trick that worked!

Namely, 'http://myhost/path/with/umlaut/äöü.txt' asFileReference asUrl.

1. http://forum.world.st/Umlauts-in-ZnUrl-tp4793736.html


