Hello Esteban,
On 11 Mar 2013, at 03:17, Esteban A. Maringolo <[hidden email]> wrote:
> Hi all, Sven,
>
> I would like to know what is the proper way (steps) to parse a UTF-8 encoded
> CSV file, which will store most of the strings into domain objects' instVars,
> which will get mapped back to JSON and sent through the wire by means of a
> Seaside RESTful Filter.
>
> I haven't specified any encoding information during the input or the output,
> and then I'm not seeing the right characters in the inspectors (I expected
> that), nor in the JSON output or the Seaside HTML output.
>
> The Zinc server adaptor has its default codec, that is, UTF-8.
Both NeoCSV and NeoJSON were written to be encoding agnostic, i.e. they work on character streams that you provide. The encoding/decoding is up to you, or up to whatever you use to instantiate the character streams.
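If you do want to control the encoding explicitly, you can wrap a binary stream in one of Zinc's character streams yourself and hand that to NeoCSV. A sketch (the exact selectors, such as ZnCharacterReadStream class>>#on:encoding: and FileReference>>#binaryReadStreamDo:, may differ between Pharo/Zinc versions):

'data.csv' asFileReference binaryReadStreamDo: [ :binary |
	"decode the raw bytes as UTF-8 before NeoCSV ever sees them"
	(NeoCSVReader on: (ZnCharacterReadStream on: binary encoding: 'utf8'))
		upToEnd ].

The same idea works on the writing side with ZnCharacterWriteStream.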
Here is a quick example (Pharo VM on Mac OS X, #20587, standard NeoCSV release).
'foo.csv' asFileReference writeStreamDo: [ :out |
	(NeoCSVWriter on: out)
		nextPut: #( 1 'élève en Français' ) ].

'foo.csv' asFileReference readStreamDo: [ :in |
	(NeoCSVReader on: in) next ].

 "=> #('1' 'élève en Français')"
$ cat foo.csv
"1","élève en Français"
$ file foo.csv
foo.csv: UTF-8 Unicode text, with CRLF line terminators
The above code uses whatever FileReference offers, namely UTF-8 encoded character streams.
I would suggest that you inspect the contents of the character streams before feeding them to NeoCSV or NeoJSON; a wrong encoding will probably be visible there.
'foo.csv' asFileReference readStreamDo: [ :in | in upToEnd ].
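If you suspect the bytes themselves, you can also read them raw and decode by hand; a sketch (assuming ByteArray>>#utf8Decoded is available in your image):

'foo.csv' asFileReference binaryReadStreamDo: [ :in |
	"signals an error if the bytes are not valid UTF-8"
	in upToEnd utf8Decoded ].

If that signals a decoding error, or shows mangled characters, the file is not valid UTF-8 to begin with.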
Zinc, both the client and the server, should normally do the right thing (™): based on the Content-Type header, bytes will be converted using the proper encoding.
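For example, when serving JSON from a handler, setting the proper Content-Type is enough for Zinc to encode the characters for you; a sketch (assuming the ZnResponse/ZnEntity/ZnMimeType API, whose details may vary slightly per Zinc release, and a hypothetical aDomainObject):

ZnResponse ok: (ZnEntity
	with: (NeoJSONWriter toString: aDomainObject)
	type: ZnMimeType applicationJson).

The application/json MIME type defaults to UTF-8, so the response bytes will match what a conforming client expects.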
Regards,
Sven
--
Sven Van Caekenberghe
http://stfx.eu

Smalltalk is the Red Pill