STON encoding of slashes

demarey
Hi,

I just noticed that STON encoding of forward slashes changed.

STON toString: '[hidden email]:foo/bar.git' => ''[hidden email]:foo\/bar.git''

It used to be ''[hidden email]:foo/bar.git''.

Is it on purpose?

Thanks,
Christophe

Re: STON encoding of slashes

Sven Van Caekenberghe-2
Hi Christophe,

> On 18 Jan 2017, at 09:30, Christophe Demarey <[hidden email]> wrote:
>
> Is it on purpose?

Yes, it was 'changed' in

Name: STON-Core-SvenVanCaekenberghe.78
Author: SvenVanCaekenberghe
Time: 20 November 2016, 7:48:23.799323 pm
UUID: ee198da8-80e8-4944-ba9b-dffae331c57c
Ancestors: STON-Core-SvenVanCaekenberghe.77

with the comment

Fix the encoding of forward slash ($/). This was already in the (external) documentation.

In other words, it was an implementation error (omission). Note that JSON also has this escape.

Now, in retrospect, the question is 'should all possible escapes be done when writing?'. For $/ especially this is maybe a bit too much, and the same goes for tab. For the other named escapes the case is clearer: they are either necessary ($" and $\) or they make things clearer and less error-prone (\b, \n, \f and \r).

There is already the #asciiOnly option that decides whether non-ASCII Unicode characters are encoded with \uHHHH or not.
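
To make the \uHHHH idea concrete, here is a minimal sketch in JavaScript (not Pharo, and not STON's implementation). The function name `escapeNonAscii` and its boolean `asciiOnly` parameter are hypothetical; the sketch works on UTF-16 code units, as JSON does, and says nothing about how STON treats characters outside the Basic Multilingual Plane.

```javascript
// Hypothetical helper, not STON's implementation: mimic an "ASCII only"
// writer option by replacing every UTF-16 code unit above 127 with \uHHHH.
function escapeNonAscii(text, asciiOnly) {
  if (!asciiOnly) return text;
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out += unit > 127
      ? '\\u' + unit.toString(16).toUpperCase().padStart(4, '0')
      : text[i];
  }
  return out;
}

console.log(escapeNonAscii('caf\u00e9', true));  // caf\u00E9
console.log(escapeNonAscii('caf\u00e9', false)); // café
```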

I guess you are hand-editing STON config files, right?

Sven




Re: STON encoding of slashes

demarey

> On 18 Jan 2017, at 09:51, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Hi Christophe,
>
> Yes, it was 'changed' in
>
> Name: STON-Core-SvenVanCaekenberghe.78
> Author: SvenVanCaekenberghe
> Time: 20 November 2016, 7:48:23.799323 pm
> UUID: ee198da8-80e8-4944-ba9b-dffae331c57c
> Ancestors: STON-Core-SvenVanCaekenberghe.77
>
> with the comment
>
> Fix the encoding of forward slash ($/). This was already in the (external) documentation.
>
> In other words, it was an implementation error (omission). Note that JSON also has this escape.

ok.

> Now, in retrospect, the question is 'should all possible escapes be done when writing?'. For $/ especially this is maybe a bit too much, and the same goes for tab. For the other named escapes the case is clearer: they are either necessary ($" and $\) or they make things clearer and less error-prone (\b, \n, \f and \r).
>
> There is already the #asciiOnly option that decides whether non-ASCII Unicode characters are encoded with \uHHHH or not.
>
> I guess you are hand-editing STON config files, right?

I have what are effectively STON config files (STON files holding metadata), and I try to keep them readable and editable by users.
For example, I do not serialize a URL directly with STON, because that exposes ZnUrl implementation details; I serialize the URL string instead. That said, I would indeed like to avoid escape characters when they are not needed. Maybe there could be an option for that?
Also, maybe I am overusing STON, which should just be a raw serializer, and I should not try to "tune" the output to get a user-friendly serialization.

Thanks for the answer.
Christophe

Re: STON encoding of slashes

Peter Uhnak
On Wed, Jan 18, 2017 at 11:11:06AM +0100, Christophe Demarey wrote:
>
> > On 18 Jan 2017, at 09:51, Sven Van Caekenberghe <[hidden email]> wrote:
> >
> > Hi Christophe,
> >
> >> STON toString: '[hidden email]:foo/bar.git' => ''[hidden email]:foo\/bar.git''
> >> It used to be ''[hidden email]:foo/bar.git''.

> > In other words, it was an implementation error (omission). Note that JSON also has this escape.

Yes and no for JSON.

Only ", \ and the control characters have to be escaped. Escaping anything else gives it a special meaning, with the exception of /, which just produces the same character, because it is a special snowflake. :)

Quoted from https://tools.ietf.org/html/rfc7159#section-7:

- Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
- Alternatively, there are two-character sequence escape representations of some popular characters.

"/" (U+002F) doesn't fall into the control character range, but the alternative section does permit escaping it.

In other words, the JSON strings "\/", "/" and "\u002f" are equivalent.

But JSON itself doesn't require you to escape "/" (just like you are not required to escape "hi" into "\u0068\u0069", although you can).
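
The equivalence is easy to check with any conforming parser. A small sketch using JavaScript's built-in `JSON.parse` and `JSON.stringify` (the variable names are only for illustration):

```javascript
// All three JSON texts decode to the same one-character string "/".
const forms = ['"/"', '"\\/"', '"\\u002f"'];
const decoded = forms.map((text) => JSON.parse(text));
console.log(decoded); // [ '/', '/', '/' ]

// The same holds for ordinary letters written as \uHHHH escapes.
const hi = JSON.parse('"\\u0068\\u0069"');
console.log(hi === 'hi'); // true

// A generator is free to pick the shortest form and leave / alone.
console.log(JSON.stringify('foo/bar.git')); // "foo/bar.git"
```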

Note that other systems do not escape / by default:

(Pharo)
NeoJSONWriter toString: '[hidden email]:foo/bar.git' => "[hidden email]:foo/bar.git"

(JavaScript)
JSON.stringify('[hidden email]:foo/bar.git') => "[hidden email]:foo/bar.git"

(Ruby)
require 'json'
puts '[hidden email]:foo/bar.git'.to_json => "[hidden email]:foo/bar.git"

Peter


Re: STON encoding of slashes

Sven Van Caekenberghe-2
So talking only about the encoding/writing phase, the conclusion would be

 - not to escape $/
 - escape everything with code points [0,31], using named escapes if they exist, else \uHHHH
 - escape $\ itself

That leaves the question about $' and $".

$' is used in STON as string delimiter, so it has to be escaped.

 - escape $'

Right now, $" is also escaped. Should that remain the case, or only in JSON compatibility mode (where $" is used as string delimiter) ?

 - do not escape $"

In JSON mode, escape $" and not $' then ?

When parsing, all named and other escapes are always accepted, as they are now.
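
The writing rules listed above can be sketched as follows. This is JavaScript rather than Pharo and only an illustration: `escapeForWrite`, its `jsonMode` flag and the `NAMED` table are hypothetical names, not STON's API, and the set of five named escapes is taken from JSON.

```javascript
// Hypothetical sketch of the proposed writing rules; not STON's actual code.
const NAMED = { 8: '\\b', 9: '\\t', 10: '\\n', 12: '\\f', 13: '\\r' };

function escapeForWrite(text, jsonMode = false) {
  // STON mode delimits strings with ', JSON mode with "
  const delimiter = jsonMode ? '"' : "'";
  let out = delimiter;
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (ch === '\\' || ch === delimiter) {
      // the backslash itself and the active delimiter are always escaped
      out += '\\' + ch;
    } else if (code < 32) {
      // control characters: named escape if one exists, else \uHHHH
      const named = NAMED[code];
      out += named !== undefined
        ? named
        : '\\u' + code.toString(16).toUpperCase().padStart(4, '0');
    } else {
      // everything else, including / and the other quote, is written as-is
      out += ch;
    }
  }
  return out + delimiter;
}

console.log(escapeForWrite('foo/bar.git'));    // 'foo/bar.git'
console.log(escapeForWrite(`it's "x"`));       // 'it\'s "x"'
console.log(escapeForWrite(`it's "x"`, true)); // "it's \"x\""
```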


Re: STON encoding of slashes

Peter Uhnak
On Wed, Jan 18, 2017 at 03:38:17PM +0100, Sven Van Caekenberghe wrote:
> So talking only about the encoding/writing phase, the conclusion would be

10.  Generators

   A JSON generator produces JSON text.  The resulting text MUST
   strictly conform to the JSON grammar.

I guess the ABNF grammar is the best reference, but we want to generate the most compact form available (i.e. one that still conforms to the syntax).

>  - not to escape $/

For convenience's sake, compact is better.
        * I guess the special status of \/ is somehow related to JavaScript strings interpreting both '/' and '\/' as '/'?

>  - escape everything with code points [0,31], using named escapes if they exist, else \uHHHH

Yes, the named ones have Pharo equivalents too:

s := STON fromString: '"\b\f\n\r\t"'.
s asArray "{Character backspace. Character newPage. Character lf. Character cr. Character tab}"
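
For comparison, the same five named escapes decode to the same control characters in JSON. A small check using JavaScript's built-in `JSON.parse` (the variable names are only for illustration):

```javascript
// JSON's named escapes decode to the corresponding control characters.
const s = JSON.parse('"\\b\\f\\n\\r\\t"');
const codes = Array.from(s, (ch) => ch.charCodeAt(0));
console.log(codes); // [ 8, 12, 10, 13, 9 ]
```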

>  - escape $\ itself

yes

>
> That leaves the question about $' and $".
>
> $' is used in STON as string delimiter, so it has to be escaped.
>
>  - escape $'

We already had a discussion about $' and JSON: http://forum.world.st/STON-doesn-t-produce-valid-JSON-it-shouldn-t-escape-quation-mark-td4923777.html

It must not be escaped in JSON, but that raises the question of STON being a superset of JSON: is such a thing achievable given this disparity?

>
> Right now, $" is also escaped. Should that remain the case, or only in JSON compatibility mode (where $" is used as string delimiter) ?
>
>  - do not escape $"
>
> In JSON mode, escape $" and not $' then ?

That's how it should be in JSON.

> When parsing, all named and other escapes are always accepted, as they are now.

9.  Parsers

   A JSON parser MUST accept all texts that conform to the JSON grammar.
   A JSON parser MAY accept non-JSON forms or extensions.
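
A lenient reader of this kind can be sketched as below. This is a JavaScript illustration, not STON's parser: `unescapeBody` and the `UNESCAPE` table are hypothetical names, and the function expects the text between the string delimiters. It accepts every named escape (including \/, \' and \") plus \uHHHH, regardless of which ones a writer chooses to emit.

```javascript
// Hypothetical lenient reader: accepts all named escapes and \uHHHH.
const UNESCAPE = new Map([
  ['b', '\b'], ['f', '\f'], ['n', '\n'], ['r', '\r'], ['t', '\t'],
  ['/', '/'], ['\\', '\\'], ['"', '"'], ["'", "'"],
]);

function unescapeBody(body) {
  // Match a backslash followed by either uHHHH or any single character.
  return body.replace(/\\(u[0-9a-fA-F]{4}|.)/g, (match, esc) => {
    if (esc.length === 5) {
      // \uHHHH: decode the four hex digits into one UTF-16 code unit
      return String.fromCharCode(parseInt(esc.slice(1), 16));
    }
    if (UNESCAPE.has(esc)) return UNESCAPE.get(esc);
    throw new SyntaxError('Unknown escape: ' + match);
  });
}

console.log(unescapeBody('foo\\/bar.git')); // foo/bar.git
console.log(unescapeBody("it\\'s"));        // it's
console.log(unescapeBody('\\u002f'));       // /
```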

My question about STON and JSON: what is the benefit of STON being a superset of JSON? To me it feels like an arbitrary restriction on STON. Would STON benefit from dropping this requirement and instead only worrying about good Smalltalk object representation (and leaving JSON to NeoJSON or something)?

Peter


Re: STON encoding of slashes

Sven Van Caekenberghe-2

> On 18 Jan 2017, at 16:20, Peter Uhnak <[hidden email]> wrote:
>
> We already had a discussion about $' and JSON: http://forum.world.st/STON-doesn-t-produce-valid-JSON-it-shouldn-t-escape-quation-mark-td4923777.html

Yes, I know, that change was OK.

> My question about STON and JSON: what is the benefit of STON being a superset of JSON? To me it feels like an arbitrary restriction on STON. Would STON benefit from dropping this requirement and instead only worrying about good Smalltalk object representation (and leaving JSON to NeoJSON or something)?

Being a superset means that you get a simple JSON parser (and even a limited writer) for free once you install STON (or once it is part of the Pharo image, as it is now). It also means that we can fall back on the JSON spec as a guide in discussions like this one. It also helps people understand what STON is, through both its analogy with and its differences from JSON.

So my conclusion would be, while writing: always escape $\ and never $/; in pure STON mode (the default), escape $' and not $"; in JSON mode, escape $" and not $'. Those would be the changes. Agreed?


Re: STON encoding of slashes

Peter Uhnak
On Wed, Jan 18, 2017 at 04:38:15PM +0100, Sven Van Caekenberghe wrote:
> Being a superset means that you get a simple JSON parser (and even a limited writer) for free once you install STON (or once it is part of the Pharo image, as it is now). It also means that we can fall back on the JSON spec as a guide in discussions like this one. It also helps people understand what STON is, through both its analogy with and its differences from JSON.
>
> So my conclusion would be, while writing: always escape $\ and never $/; in pure STON mode (the default), escape $' and not $"; in JSON mode, escape $" and not $'. Those would be the changes. Agreed?
>

Yes for JSON, and I guess yes for STON.
(For me STON is effectively an opaque representation (because of its DFS strategy), so I don't care too much about how it is stored, as long as it is not binary.)

Thanks!

Peter



Re: STON encoding of slashes

Sven Van Caekenberghe-2
In reply to this post by Sven Van Caekenberghe-2

> On 18 Jan 2017, at 16:38, Sven Van Caekenberghe <[hidden email]> wrote:
>
> So my conclusion would be, while writing: always escape $\ and never $/; in pure STON mode (the default), escape $' and not $"; in JSON mode, escape $" and not $'.

I implemented these changes in writing behaviour:

===
Name: STON-Core-SvenVanCaekenberghe.81
Author: SvenVanCaekenberghe
Time: 31 January 2017, 11:43:15.520367 pm
UUID: d83172d8-f01e-4e63-9382-515399ffa7bc
Ancestors: STON-Core-SvenVanCaekenberghe.80

Change the encoding of characters while writing so that in default STON mode only the following named character escapes are used: \b \t \n \f \' and \\ while in JSON mode \' is replaced by \" - this means that / is normally not escaped.

Add STONWriter>>#escape:with: as API

Adjust 2 unit tests to reflect this change

Update time tag of STONWriter class>>#initialize
===
Name: STON-Tests-SvenVanCaekenberghe.71
Author: SvenVanCaekenberghe
Time: 31 January 2017, 11:43:43.88292 pm
UUID: 65045513-6f48-43b7-a112-89dabf34a8f8
Ancestors: STON-Tests-SvenVanCaekenberghe.70

Change the encoding of characters while writing so that in default STON mode only the following named character escapes are used: \b \t \n \f \' and \\ while in JSON mode \' is replaced by \" - this means that / is normally not escaped.

Add STONWriter>>#escape:with: as API

Adjust 2 unit tests to reflect this change

Update time tag of STONWriter class>>#initialize
===


Sven