Hello Esteban,
On 11 Mar 2013, at 03:17, Esteban A. Maringolo <[hidden email]> wrote:
> Hi all, Sven,
>
> I would like to know what is the proper way (steps) to parse a UTF-8 encoded
> CSV file, which will store most of the strings into domain objects' instVars,
> which will get mapped back to JSON and sent through the wire by means of a
> Seaside RESTful Filter.
>
> I haven't specified any encoding information during the input or the output,
> and then I'm not seeing the right characters in the inspectors (I expected
> that), nor in the JSON output or the Seaside HTML output.
>
> The Zinc server adaptor has its default codec, that is, UTF-8.
Both NeoCSV and NeoJSON were written to be encoding agnostic, i.e. they work on character streams that you provide. The encoding/decoding is up to you, or up to whatever you use to instantiate the character streams.
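If you do want to control the encoding explicitly, you can wrap a binary stream in one of Zinc's character streams yourself and hand that to NeoCSV. A sketch (the exact selectors, such as ZnCharacterReadStream class>>#on:encoding: and FileReference>>#binaryReadStreamDo:, may differ between Pharo/Zinc versions):

'data.csv' asFileReference binaryReadStreamDo: [ :binary |
	"decode the raw bytes as UTF-8 before NeoCSV ever sees them"
	(NeoCSVReader on: (ZnCharacterReadStream on: binary encoding: 'utf8'))
		upToEnd ].

The same idea works on the writing side with ZnCharacterWriteStream.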
Here is a quick example (Pharo VM on Mac OS X, #20587, standard NeoCSV release).
'foo.csv' asFileReference writeStreamDo: [ :out |
	(NeoCSVWriter on: out)
		nextPut: #( 1 'élève en Français' ) ].

'foo.csv' asFileReference readStreamDo: [ :in |
	(NeoCSVReader on: in) next ].

 "=> #('1' 'élève en Français')"
$ cat foo.csv
"1","élève en Français"
$ file foo.csv
foo.csv: UTF-8 Unicode text, with CRLF line terminators
The above code uses whatever FileReference offers, namely UTF-8 encoded character streams.
I would suggest that you inspect the contents of the character streams before feeding them to NeoCSV or NeoJSON; a wrong encoding will probably be visible there.
'foo.csv' asFileReference readStreamDo: [ :in | in upToEnd ].
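If you suspect the bytes themselves, you can also read them raw and decode by hand; a sketch (assuming ByteArray>>#utf8Decoded is available in your image):

'foo.csv' asFileReference binaryReadStreamDo: [ :in |
	"signals an error if the bytes are not valid UTF-8"
	in upToEnd utf8Decoded ].

If that signals a decoding error, or shows mangled characters, the file is not valid UTF-8 to begin with.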
Zinc, both the client and the server, should normally do the right thing (™): based on the Content-Type header, bytes will be converted using the proper encoding.
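For example, when serving JSON from a handler, setting the proper Content-Type is enough for Zinc to encode the characters for you; a sketch (assuming the ZnResponse/ZnEntity/ZnMimeType API, whose details may vary slightly per Zinc release, and a hypothetical aDomainObject):

ZnResponse ok: (ZnEntity
	with: (NeoJSONWriter toString: aDomainObject)
	type: ZnMimeType applicationJson).

The application/json MIME type defaults to UTF-8, so the response bytes will match what a conforming client expects.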
Regards,
Sven
--
Sven Van Caekenberghe
http://stfx.eu

Smalltalk is the Red Pill