NeoNumberParser and localization

Peter Uhnak

NeoNumberParser and localization

Hi,

is there any plan for NeoNumberParser to add localization support?

e.g.

NeoNumberParser new
    thousandsSeparator: $,; "common in US data"
    parse: '12,230'

=>

12230

NeoNumberParser new
    decimalSeparator: $,; "common in EU data"
    parse: '12,230'

=>

12.230

Thanks,

Peter

Sven Van Caekenberghe-2

Re: NeoNumberParser and localization

Peter,

NeoNumberParser is a simple number (integer/float) parser that is part of NeoCSV (it was based on the JSON number parsing code). It was added because I wanted a number parser that makes few demands on the stream it parses from (just 1 character peek ahead, no arbitrary backtracking, limited API). It was not meant to be very powerful.

If you check the references, you see that where it is used in NeoCSVReader, you could easily substitute another parser.

Now, I can understand where/how your suggestions would make sense. Maybe you can try subclassing and make your own variant (first)? BTW, you not only need to set the thousands separator, but the decimal separator too, I guess.

Sven

> On 05 Jul 2016, at 14:17, Peter Uhnák <[hidden email]> wrote:
>
> Hi,
>
> is there any plan for NeoNumberParser to add localization support?
>
> e.g.
>
> NeoNumberParser new
> thousandsSeparator: $,; "common in us data"
> parse: '12,230'
>
> =>
>
> 12230
>
> NeoNumberParser new
> decimalSeparator: $,; "common in eu data"
> parse: '12,230'
>
> =>
>
> 12.230
>
> Thanks,
> Peter

Peter Uhnak

Re: NeoNumberParser and localization

I know that only NeoCSV uses it; that's how I ran into this problem. I was processing some Czech CSV files that use the decimal comma, and the numbers were silently truncated, which wasn't nice, to say the least. I really don't understand why the default behavior is to silently change the value rather than produce an error. This also applies to Pharo's number parser.

> BTW, you not only need to set the thousands separator, but the decimal separator too, I guess.

That depends on the default values, but that's really not the main point.

> Now, I can understand where/how your suggestions would make sense. Maybe you can try subclassing and make your own variant (first)?

Well, I would need a way to configure the CSV parser, because I am certainly not interested in manually transforming every float field. I just want to configure it in one place and use the regular addFloatField; after all, the file is going to be consistent in its format.

Btw, there are other options for improvement, like configuring the default date format and then having addDateField, etc. But maybe that's just overloading the NeoCSV parser. In any case, it's food for thought.

Peter

On Tue, Jul 5, 2016 at 2:34 PM, Sven Van Caekenberghe <[hidden email]> wrote:

> Peter,
>
> NeoNumberParser is a simple number (integer/float) parser that is part of NeoCSV (it was based on the JSON number parsing code). It was added because I wanted a number parser that makes little demands on the stream it parses from (just 1 character peek ahead, no arbitrary backtracking, limited API). It was not meant to be very powerful.
>
> If you check the references, you see that where it is used in NeoCSVReader, you could easily substitute another parser.
>
> Now, I can understand where/how your suggestions would make sense. Maybe you can try subclassing and make your own variant (first) ? BTW, you not only need to set the thousands separator, but the decimal separator too, I guess.
>
> Sven


Sven Van Caekenberghe-2

Re: NeoNumberParser and localization

> On 05 Jul 2016, at 16:40, Peter Uhnák <[hidden email]> wrote:
>
> I know that only NeoCSV uses it; that's how I ran into this problem. I was processing some Czech CSV files that use the decimal comma, and the numbers were silently truncated, which wasn't nice, to say the least. I really don't understand why the default behavior is to silently change the value rather than produce an error. This also applies to Pharo's number parser.
>
> > BTW, you not only need to set the thousands separator, but the decimal separator too, I guess.
>
> That depends on the default values, but that's really not the main point.
>
> > Now, I can understand where/how your suggestions would make sense. Maybe you can try subclassing and make your own variant (first)?
>
> Well, I would need a way to configure the CSV parser, because I am certainly not interested in manually transforming every float field. I just want to configure it in one place and use the regular addFloatField; after all, the file is going to be consistent in its format.
>
> Btw, there are other options for improvement, like configuring the default date format and then having addDateField, etc. But maybe that's just overloading the NeoCSV parser. In any case, it's food for thought.

Indeed, I do not want to overload the CSV parser; it is pretty simple right now.

The conversions are all in the convenience protocol for a reason: they just save you some typing. You really ought to do your own conversions, when you need to.

parser addFieldConverter: [ :string | MyNumberParser parse: string ]

There are too many formats out there (especially for dates/times).

You are right about truncation and error handling. But parsing and enforcing a syntax are two different things. That is why I think the thousands separator option is not that simple, consider

1,000.00
10,00.00
1,0,0,0.00
1,000.00E1000,000

You see? One quick and dirty solution would be to just remove $, or replace one character with another.
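
A minimal sketch of that quick and dirty approach, using the field-converter hook shown above. It assumes every field holds exactly one number and that the file is consistent in its format. The variable names and the sample input are made up for illustration, and neither block checks where the commas occur, so input like 1,0,0,0.00 would still be accepted.

```smalltalk
| decimalComma thousandsComma reader |

"EU-style data: treat $, as the decimal separator by swapping it for $.
 Only safe when the data contains no thousands separators."
decimalComma := [ :string |
	NeoNumberParser parse: (string copyReplaceAll: ',' with: '.') ].

"US-style data: drop every $, before parsing.
 This does not verify that the commas sit at thousands positions."
thousandsComma := [ :string |
	NeoNumberParser parse: (string copyWithout: $,) ].

"Hypothetical semicolon-separated input with two decimal-comma fields."
reader := NeoCSVReader on: '12,230;3,5' readStream.
reader separator: $;.
reader
	addFieldConverter: decimalComma;
	addFieldConverter: decimalComma.
reader next
```

For US-style files, thousandsComma would be passed to addFieldConverter: instead. Either way the conversion is configured once per field on the reader rather than applied by hand to each parsed record.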
