Umlauts in ZnUrl

Udo Schneider
All,

What's the expected behavior with non-ASCII characters in URLs? Let's
say I want to access a file named "äöü.txt" - My assumption was that
Zinc takes care of the UTF-8 -> 7bit (ASCII) -> Escape encoding. But
there is either something I don't understand or some manual steps I'm
missing.

The "straightforward" way doesn't work:
'http://myhost/path/with/umlaut/äöü.txt' asUrl.
"ZnCharacterEncodingError: ASCII character expected"

Although the actual encoding seems to be able to handle it (ignoring the
escaped slashes for the moment):
'http://myhost/path/with/umlaut/äöü.txt' urlEncoded.
"'http%3A%2F%2Fmyhost%2Fpath%2Fwith%2Fumlaut%2F%C3%A4%C3%B6%C3%BC.txt'"

Creating a URL from already escaped characters works as well:
'http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt' asUrl.
"http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt"

As does the decoding of such a URL:
'http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt' urlDecoded.
"'http://myhost/path/with/umlaut/äöü.txt'"

At the moment I'm manually encoding UTF-8 characters in path segments
before trying to build the URL. But is this the correct way?
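
A minimal sketch of that manual workaround, assuming only the #urlEncoded and #asUrl behaviour shown earlier in this message; the host and path are the same placeholders as above:

```smalltalk
| segment url |
"Percent-encode only the file name, so the slashes of the URL stay untouched"
segment := 'äöü.txt' urlEncoded.
"The resulting string is plain ASCII, so parsing it with #asUrl is accepted"
url := ('http://myhost/path/with/umlaut/' , segment) asUrl.
url printString.
```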

Best Regards,

Udo



Re: Umlauts in ZnUrl

Sven Van Caekenberghe-2
Hi Udo,

With a URL/URI there are two representations: the external one (the way they are written) and the internal one (what is really meant). ZnUrl follows this distinction.

When you say #asUrl (or #asZnUrl) you are actually parsing an external string representation. When doing so, percent decoding is done by ZnPercentEncoder. This class is strict, in that it does not allow unsafe, non-ASCII characters in its input. AFAIK this is correct, but I can imagine a less strict interpretation (like the URL input box of a browser would allow). If you have a reading of the specs that says otherwise I would be very interested.

To avoid doing the encoding yourself, construct the URL from its parts explicitly, like this:

ZnUrl new
  scheme: #http;
  host: 'myhost';
  addPathSegments: #('path' 'with' 'umlaut' 'äöü.txt');
  yourself.

 => http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt
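
As a rough illustration of the two representations, here is a sketch that assumes ZnUrl>>#pathSegments answers the segments in their internal, decoded form (that accessor is not shown in this thread):

```smalltalk
| url |
url := ZnUrl new
  scheme: #http;
  host: 'myhost';
  addPathSegments: #('path' 'with' 'umlaut' 'äöü.txt');
  yourself.
"Internal representation: the last segment as it was given"
url pathSegments last.
"External representation: the percent-encoded string form"
url printString.
```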

Class comments and unit tests should help.

There is also this draft:

  http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/

HTH,

Sven

PS: Incidentally, this does work

  'http://myhost/path/with/umlaut/äöü.txt' asFileReference asUrl.

because #asFileReference works differently.

> On 02 Dec 2014, at 23:32, Udo Schneider <[hidden email]> wrote:
>
> All,
>
> What's the expected behavior with non-ASCII characters in URLs? Let's say I want to access a file named "äöü.txt" - My assumption was that Zinc takes care of the UTF-8 -> 7bit (ASCII) -> Escape encoding. But there is either something I don't understand or some manual steps I'm missing.
>
> The "straightforward" way doesn't work:
> 'http://myhost/path/with/umlaut/äöü.txt' asUrl. "ZnCharacterEncodingError: ASCII character expected"
>
> Although the actual encoding seems to be able to handle it (ignoring the escaped slashes for the moment):
> 'http://myhost/path/with/umlaut/äöü.txt' urlEncoded.
> "'http%3A%2F%2Fmyhost%2Fpath%2Fwith%2Fumlaut%2F%C3%A4%C3%B6%C3%BC.txt'"
>
> Creating a URL from already escaped characters works as well:
> 'http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt' asUrl.
> "http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt"
>
> As does the decoding of such a URL:
> 'http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt' urlDecoded.
> "'http://myhost/path/with/umlaut/äöü.txt'"
>
> At the moment I'm manually encoding UTF-8 characters in path segments before trying to build the URL. But is this the correct way?
>
> Best Regards,
>
> Udo
>
>
>


Re: Umlauts in ZnUrl

Udo Schneider
Hi Sven,

the distinction between external and internal representations was the
part I didn't get - but it really makes sense.

Your approach with adding pathSegments works perfectly! I'm working with
ZnClient though and reusing it for different requests. So I added the
following two (three) methods to make my life easier:

ZnClient>>#addPathSegments: pathSegments
        "Modify the receiver's path by adding the elements of pathSegments at the end"

        pathSegments do: [ :each | self addPathSegment: each ]

ZnClient>>#resetPath
        self path: ''

I didn't want to overwrite #path: because I only need this for some
special edge case.

ZnClient>>#webdavPath: path
        self
                resetPath;
                addPathSegments: ($/ split: path)

BTW: The whole Zinc framework is a real pleasure to work with. Once I
got used to thinking in terms of objects and not only strings I didn't
want to look back :-)

CU,

Udo



On 03/12/14 00:02, Sven Van Caekenberghe wrote:

> Hi Udo,
>
> With a URL/URI there are two representations: the external one (the way they are written) and the internal one (what is really meant). ZnUrl follows this distinction.
>
> When you say #asUrl (or #asZnUrl) you are actually parsing an external string representation. When doing so, percent decoding is done by ZnPercentEncoder. This class is strict, in that it does not allow non-safe, non-ascii characters in its input. AFAIK this is correct, but I can imagine a less strict interpretation (like the URL input box of a browser would allow). If you have a reading of the specs that says otherwise I would be very interested.
>
> To save you from doing the encoding yourself, you have to construct the URL from its parts explicitly, like this:
>
> ZnUrl new
>    scheme: #http;
>    host: 'myhost';
>    addPathSegments: #('path' 'with' 'umlaut' 'äöü.txt');
>    yourself.
>
>   => http://myhost/path/with/umlaut/%C3%A4%C3%B6%C3%BC.txt
>
> Class comments and unit tests should help.
>
> There is also this draft:
>
>    http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/
>
> HTH,
>
> Sven
>
> PS: Incidentally, this does work
>
>    'http://myhost/path/with/umlaut/äöü.txt' asFileReference asUrl.
>
> because #asFileReference works differently.



Re: Umlauts in ZnUrl

Sven Van Caekenberghe-2

> On 03 Dec 2014, at 08:49, Udo Schneider <[hidden email]> wrote:
>
> Hi Sven,
>
> the distinction between external and internal representations was the part I didn't get - but it really makes sense.

OK.

> Your approach with adding pathSegments works perfectly! I'm working with ZnClient though and reusing it for different requests. So I added the following two (three) methods to make my life easier:
>
> ZnClient>>#addPathSegments: pathSegments
> "Modify the receiver's path by adding the elements of pathSegments at the end"
>
> pathSegments do: [ :each | self addPathSegment: each ]

Hmm, have you seen that ZnClient>>#addPath: actually does this already for non-string arguments (assumed to be collections)?
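
A minimal sketch of that usage, assuming ZnClient>>#url: to set the base URL and #request to look at the resulting URL (neither is shown in this thread):

```smalltalk
| client |
client := ZnClient new.
client url: 'http://myhost'.
"A non-string argument is treated as a collection of path segments"
client addPath: #('path' 'with' 'umlaut' 'äöü.txt').
"Look at the URL the client would request"
client request url printString.
```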

> ZnClient>>#resetPath
> self path: ''

Yeah, that is maybe useful (there is already #resetEntity).

I am also reusing ZnClient instances and I never needed to reset the path like that. I always use #path: or #url:, I guess, mostly the latter.

> I didn't want to overwrite #path: because I only need this for some special edge case.
>
> ZnClient>>#webdavPath: path
> self
> resetPath;
> addPathSegments: ($/ split: path)

That is too specific to add for everyone, I think. Now, the #split: behaviour is already in there.

But I also forgot to mention earlier that this works too:

'http://myhost.com' asUrl / 'foo' / 'élève-ümlaut.txt'.

It is about as short as it gets, right?

Here is how to get the #split: behaviour:

'http://myhost.com' asUrl / 'foo/élève-ümlaut.txt'

When adding a string, the argument is split on / automatically.
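
A small sketch comparing the two forms, assuming ZnUrl>>#pathSegments is available to look at the result (it is not shown in this thread):

```smalltalk
| a b |
a := 'http://myhost.com' asUrl / 'foo' / 'élève-ümlaut.txt'.
b := 'http://myhost.com' asUrl / 'foo/élève-ümlaut.txt'.
"With the automatic split, both should hold the same two path segments"
a pathSegments.
b pathSegments.
```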

> BTW: The whole Zinc framework is a real pleasure to work with. Once I got used to thinking in terms of objects and not only strings I didn't want to look back :-)

Thanks.

> CU,
>
> Udo


Re: Umlauts in ZnUrl

Udo Schneider
On 03.12.2014 10:19, Sven Van Caekenberghe wrote:
> Hmm, have you seen that ZnClient>>#addPath: actually does this already for non-string arguments (assumed to be collections)?
No, I didn't. That's even better for my purposes!

> That is too specific to add for everyone, I think. Now, the #split: behaviour is already in there.
I just deleted this method. I can work with instances of Path - so no
need to #split: Strings anymore.

> 'http://myhost.com' asUrl / 'foo' / 'élève-ümlaut.txt'.
>
> It is about as short as it gets, right?
>
> Here is how to get the #split: behaviour:
>
> 'http://myhost.com' asUrl / 'foo/élève-ümlaut.txt'
>
> When adding a string, the argument is split on / automatically.
As always, I am amazed by Smalltalk in general and the design of Zinc
specifically ...

CU,

Udo



Re: Umlauts in ZnUrl

Sven Van Caekenberghe-2

> On 03 Dec 2014, at 13:54, Udo Schneider <[hidden email]> wrote:
>
> On 03.12.2014 10:19, Sven Van Caekenberghe wrote:
>> Hmm, have you seen that ZnClient>>#addPath: actually does this already for non-string arguments (assumed to be collections) ?
> No I didn't. That's even better for my purposes!
>
>> That is too specific, to add for everyone I think. Now, the #split: behaviour is already in there.
> I just deleted this method. I can work with instances of Path - so no need to #split: Strings anymore.
>
>> 'http://myhost.com' asUrl / 'foo' / 'élève-ümlaut.txt'.
>>
>> It is about as short as it gets, right ?
>>
>> Here is how to get the #split: behaviour:
>>
>> 'http://myhost.com' asUrl / 'foo/élève-ümlaut.txt'
>>
>> When adding a string, the argument is split on / automatically.
> As always, I am amazed by Smalltalk in general and the design of Zinc specifically ...

Good!

Don't forget: if you get something working that hasn't been done before (it seems you are working on WebDAV), write something short about it to share with the rest of the community.

Sven


Re: Umlauts in ZnUrl

Udo Schneider
On 03.12.2014 13:57, Sven Van Caekenberghe wrote:
> Don't forget: if you get something working that hasn't been done before (it seems you are working on WebDAV), write something short about it to share with the rest of the community.
For sure! I'm still shuffling around a lot of code. And there is /no/
documentation yet. I'll make an official announcement once I have the
code sorted and documented.

But you should be able to get an impression. Try this:

"Install packages - all tests should be green given a WebDAV Server"
Gofer new
     url: 'http://smalltalkhub.com/mc/UdoSchneider/FileSystemNetwork/main';
     package: 'ConfigurationOfFileSystemNetwork';
     load.
((Smalltalk at: #ConfigurationOfFileSystemNetwork) project version:
#develop) load.

fs := FileSystem webdav: '[URL of a WebDAV server]'.
wd := fs workingDirectory.

"All the FileReference methods should work"
wd children.
wd children first basename.
wd children first isFile.
"wd children first delete."

"You can even open a FileList on the working dir"
FileList openOn: wd.


CU,

Udo



Re: Umlauts in ZnUrl

Udo Schneider
On 03/12/14 14:41, Udo Schneider wrote:
> ((Smalltalk at: #ConfigurationOfFileSystemNetwork) project version:
> #develop) load.
This should have been

((Smalltalk at: #ConfigurationOfFileSystemNetwork) project version:
#development) load.

of course.

CU,

Udo



Re: Umlauts in ZnUrl

Sven Van Caekenberghe-2
In reply to this post by Udo Schneider
Cool, playing with file systems ;-)

> On 03 Dec 2014, at 14:41, Udo Schneider <[hidden email]> wrote:
>
> On 03.12.2014 13:57, Sven Van Caekenberghe wrote:
>> Don't forget: if you get something working that hasn't been done before (it seems you are working on WebDAV), write something short about it to share with the rest of the community.
> For sure! I'm still shuffling around a lot of code. And there is /no/ documentation yet. I'll make an official announcement once I have the code sorted and documented.
>
> But you should be able to get an impression. Try this:
>
> "Install packages - all tests should be green given a WebDAV Server"
> Gofer new
>    url: 'http://smalltalkhub.com/mc/UdoSchneider/FileSystemNetwork/main';
>    package: 'ConfigurationOfFileSystemNetwork';
>    load.
> ((Smalltalk at: #ConfigurationOfFileSystemNetwork) project version: #develop) load.
>
> fs := FileSystem webdav: '[URL of a WebDAV server]'.
> wd := fs workingDirectory.
>
> "All the FileReference methods should work"
> wd children.
> wd children first basename.
> wd children first isFile.
> "wd children first delete."
>
> "You can even open a FileList on the working dir"
> FileList openOn: wd.
>
>
> CU,
>
> Udo
>
>
>


Re: Umlauts in ZnUrl

Sven Van Caekenberghe-2
In reply to this post by Udo Schneider
Actually, the ugly (Smalltalk at: #XYZ) can be replaced by the shorter and more elegant #XYZ asClass.

But loading a stable or development configuration can be condensed into a single expression:

Gofer it
  smalltalkhubUser: 'UdoSchneider' project: 'FileSystemNetwork';
  configurationOf;
  loadDevelopment.

Cool, right?

> On 03 Dec 2014, at 15:57, Udo Schneider <[hidden email]> wrote:
>
> On 03/12/14 14:41, Udo Schneider wrote:
>> ((Smalltalk at: #ConfigurationOfFileSystemNetwork) project version:
>> #develop) load.
> This should have been
>
> ((Smalltalk at: #ConfigurationOfFileSystemNetwork) project version: #development) load.
>
> of course.
>
> CU,
>
> Udo
>
>
>


Re: Umlauts in ZnUrl

Udo Schneider
Cool indeed!

Didn't I mention that I'm always (again) amazed by Smalltalk/Pharo?

CU,

Udo


On 03/12/14 21:08, Sven Van Caekenberghe wrote:

> Actually, the ugly (Smalltalk at: #XYZ) can be replaced by the shorter and more elegant #XYZ asClass.
>
> But loading a stable or development configuration can be condensed into a single expression:
>
> Gofer it
>    smalltalkhubUser: 'UdoSchneider' project: 'FileSystemNetwork';
>    configurationOf;
>    loadDevelopment.
>
> Cool, right?



Re: Umlauts in ZnUrl

Udo Schneider
In reply to this post by Sven Van Caekenberghe-2
It started as a simple extension of your WebDAV stuff. The filesystem
integration started soon afterwards with a simple "wouldn't it be cool"
thought ... :-)

CU,

Udo


On 03/12/14 20:59, Sven Van Caekenberghe wrote:

> Cool, playing with file systems ;-)
>