NeoCSVWriter automatic quotes

NeoCSVWriter automatic quotes

Peter Uhnak
Hi,

is it/would it be possible to modify NeoCSVWriter to add quotes only where necessary? So only if the written value contains the separator (is there any other case that would require the quotes?).

Thanks,
Peter


Re: NeoCSVWriter automatic quotes

Peter Uhnak
So basically something like this:

```
NeoCSVWriter>>writeMaybeQuotedField: object
        | string |
        string := object asString.
        string := string copyReplaceAll: '"' with: '""'.
        ((string includes: $") or: [ string includes: separator ])
                ifTrue: [ writeStream
                        nextPut: $";
                        nextPutAll: string;
                        nextPut: $" ]
                ifFalse: [ writeStream nextPutAll: string ]
```

Peter



Re: NeoCSVWriter automatic quotes

Sven Van Caekenberghe-2
In reply to this post by Peter Uhnak
Peter,

> On 25 Nov 2016, at 19:35, Peter Uhnak <[hidden email]> wrote:
>
> Hi,
>
> is it/would it be possible to modify NeoCSVWriter to add quotes only where necessary? So only if the written value contains the separator (is there any other case that would require the quotes?).
>
> Thanks,
> Peter

Why exactly do you want this?

CSV is a pretty simple format; having a field quoted in one line and not in another already sounds a bit too clever, though it does seem to be allowed.

The idea of having different field writers was to choose the most efficient one for your data (types). You can configure a writer per field (column). If you have numbers, raw gives you the fastest performance.
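As a rough illustration of per-column configuration, the sketch below quotes a text column and writes a numeric column raw. The sample rows are made up for the example, and the selectors (`addQuotedField:`, `addRawField:`, `nextPutAll:`) are used as I understand the NeoCSVWriter protocol, so check them against your installed version.

```
| rows |
"Illustrative data, not from this thread: a text column and a numeric column."
rows := #( #('apple' 3) #('pear, ripe' 12) ).
"Column 1 is always quoted; column 2 is written as-is, without quoting."
String streamContents: [ :stream |
	(NeoCSVWriter on: stream)
		addQuotedField: [ :row | row first ];
		addRawField: [ :row | row second ];
		nextPutAll: rows ]
```

The raw writer skips quoting and escaping entirely, which is why it is only safe for values such as numbers that cannot contain a separator, quote, or line break.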

Doing optional quoting would always require more than one pass over the string, if not many more, and would possibly generate more garbage (as in your example implementation).

There are 3 reasons to do quoting: embedded separator, embedded newline, embedded quote (you forgot the newline case).

All that being said, maybe an #optionalQuoted field writer could be a reasonable configurable option, but I would not make it the default.
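A minimal sketch of what such a field writer might look like, covering all three quoting conditions. The selector name `writeOptionalQuotedField:` is hypothetical, and the method assumes the `writeStream` and `separator` instance variables used in the example earlier in this thread. It scans the string once to decide, then writes it once, without building an intermediate copy.

```
NeoCSVWriter>>writeOptionalQuotedField: object
	"Hypothetical sketch: quote only when the value needs it.
	Assumes the writeStream and separator instance variables."
	| string needsQuotes |
	string := object asString.
	"First pass: look for a separator, a quote or a line break."
	needsQuotes := string anySatisfy: [ :each |
		each = separator
			or: [ each = $"
			or: [ each = Character cr
			or: [ each = Character lf ] ] ] ].
	needsQuotes ifFalse: [ ^ writeStream nextPutAll: string ].
	"Second pass: write the value quoted, doubling embedded quotes."
	writeStream nextPut: $".
	string do: [ :each |
		each = $" ifTrue: [ writeStream nextPut: $" ].
		writeStream nextPut: each ].
	writeStream nextPut: $"
```

This still touches each character twice in the worst case, so it remains slower than always quoting or writing raw; the gain is purely in readability of the output.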

Again, why do you want this?

Sven



Re: NeoCSVWriter automatic quotes

Peter Uhnak
On Fri, Nov 25, 2016 at 09:38:42PM +0100, Sven Van Caekenberghe wrote:

> Peter,
>
> > On 25 Nov 2016, at 19:35, Peter Uhnak <[hidden email]> wrote:
> >
> > Hi,
> >
> > is it/would it be possible to modify NeoCSVWriter to add quotes only where necessary? So only if the written value contains the separator (is there any other case that would require the quotes?).
> >
> > Thanks,
> > Peter
>
> Why exactly do you want this ?

My use case is that I sometimes need or want to read the file with my own eyes, and when 95% of the file's content is a single word (or a couple of words), the quotes add a lot of clutter. So if 5% (or 1%) of the fields are quoted and the rest aren't, I would be happy. :)

But it's not such a big problem; I've added it for myself, at least until we have a usable way to edit CSV files in Pharo... do we really not have any table editor? :/

Peter
