Data scrapping in pharo: converting text with dates in Spanish


Offray
Hi all,

I'm making a small data scraper [1] in Pharo to feed some
visualizations. The data I'm scraping contains strings with dates in
Spanish like '16:21 - 15 de jun. de 2011' and I would like to convert
them to proper dates in Smalltalk. So I started prototyping a small
script at [2], but the problem is that monthIndex at: month doesn't work
because 'jun' is not found in the dictionary (please refer to [2] for
details). The cause seems to be that month = 'jun' gives false, yet when I
inspect month its content is 'jun', so I think that I'm missing
something important.

[1] http://smalltalkhub.com/#!/~Offray/Dataviz
[2] http://ws.stfx.eu/IOMTYZ0N9W29

So here are my questions:

a) How do I get "monthIndex at: month" working properly so I can get '06'
as the month index for the month 'jun' (junio)?
b) Why does month = 'jun' give false?
c) Is there any way to convert strings that have months in different
languages (for example Spanish) more directly?

Thanks,

Offray

Re: Data scrapping in pharo: converting text with dates in Spanish

Sven Van Caekenberghe-2
You could have a look at the ZTimestamp package (you can load it using the Configuration Browser; the repository is http://www.smalltalkhub.com/#!/~SvenVanCaekenberghe/Neo). It has a class, ZTimestampFormat, which currently formats and parses dates, times and timestamps in 4 languages "by example". You could add Spanish, I guess. Contributions are welcome.

For example,

(ZTimestampFormat fromString: '_3 February 2001')
  french;
  parse: ' 7 Août 1967'.

> On 05 Apr 2015, at 18:18, Offray Vladimir Luna Cárdenas <[hidden email]> wrote:


Re: Data scrapping in pharo: converting text with dates in Spanish

Ben Coman
In reply to this post by Offray


I don't know the answer, but here is how you might work it out yourself (it's what I just did).
1. World > Tools > Find > Source.  
2. Search for text: june       
3. Excluding XxxTest test classes, class comments, and a dropList, only a single method "initialize" looks relevant. That should make it easier.
4. In that method, right-click on 'MonthNames' > Extended search... > References.
5. In each of those references put a "self haltOnce"
6. World > System > Enable halt/inspect once.
7. Evaluate your expression and see if you hit the halt.

So "maybe" you just edit that initialize method, then evaluate the comment at the top of the method.

cheers -ben


Re: Data scrapping in pharo: converting text with dates in Spanish

Offray
Thanks Ben and Sven,

I will take a look at both of your suggestions.

Cheers,

Offray

On 05/04/15 at 12:28, Ben Coman wrote:



Re: Data scrapping in pharo: converting text with dates in Spanish

Paul DeBruicker
In reply to this post by Offray
copyFrom: 14 to: 17 copies 4 characters. You're testing a 4-character string against a 3-character string. That's why the test fails. Either change it to copyFrom: 15 to: 17 or add a trimBoth send to the month temp var.
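
To make the off-by-one concrete, here is a minimal Playground sketch using the sample string from the original post. The variable names raw and month are only illustrative; the actual script linked in the original post may be organised differently.

```smalltalk
| raw month |
raw := '16:21 - 15 de jun. de 2011'.

"Position 14 is the space before 'jun', so this slice has 4 characters."
month := raw copyFrom: 14 to: 17.
month size.                          "4"
month = 'jun'.                       "false: the slice is ' jun', with a leading space"

"Either start one position later, or trim the surrounding whitespace."
(raw copyFrom: 15 to: 17) = 'jun'.   "true"
month trimBoth = 'jun'.              "true"
```

A leading space is easy to overlook in an inspector, which is presumably why the content looked like 'jun'. Note also that fixed positions only hold while the day has two digits; trimming (or splitting on spaces) is likely the more forgiving option.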




You may have an easier time not using temp variables in the workspace, as you can then explore/inspect them more easily to see what they contain after the calculation has or has not done what you expected.
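
As a rough sketch of that advice, the snippet below leaves out the | ... | declaration, so the variables stay available as Playground bindings after evaluation. The monthIndex table here is hypothetical: the list of Spanish abbreviations is an assumption and should be checked against what the scraped data actually contains.

```smalltalk
"No temp declaration on purpose: in a Playground, undeclared variables
 are kept as bindings and can be inspected after the code has run."
raw := '16:21 - 15 de jun. de 2011'.
month := (raw copyFrom: 14 to: 17) trimBoth.

"Hypothetical lookup table; the abbreviations are assumed, not taken from the data."
monthIndex := Dictionary new.
#('ene' 'feb' 'mar' 'abr' 'may' 'jun' 'jul' 'ago' 'sep' 'oct' 'nov' 'dic')
	withIndexDo: [ :abbreviation :index | monthIndex at: abbreviation put: index ].

"Look up the month and pad it to two digits."
(monthIndex at: month) printPaddedWith: $0 to: 2.     "'06'"
```

After running it, selecting month or monthIndex and inspecting it shows exactly what was sliced and stored, which makes a stray space much easier to spot.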






Re: Data scrapping in pharo: converting text with dates in Spanish

Offray
Thanks Paul,

My bad! Rookie mistake :-). It is now working, as implemented in [1] and
updated in the Dataviz-Twitter package.

[1] http://ws.stfx.eu/L4PZMLV88I8U

Thanks,

Offray

On 05/04/15 at 22:22, Paul DeBruicker wrote:



Re: Data scrapping in pharo: converting text with dates in Spanish

Offray
Well... there is still a minor issue with the hour: it is not part of the
date, for some reason.

Cheers,

Offray

On 06/04/15 at 14:15, Offray Vladimir Luna Cárdenas wrote:



Re: Data scrapping in pharo: converting text with dates in Spanish

Paul DeBruicker
If you look at the Date class comment, you'll see that Dates are timespans (a start plus a duration) and not moments in time, like the time on a clock on a certain day. What you want is probably an instance of DateAndTime.
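
The difference can be sketched with the values from the sample string. This is only an illustration; the numbers are typed in by hand here rather than parsed from the scraped text.

```smalltalk
| day moment |
"A Date covers a whole day; it has no hour or minute of its own."
day := Date newDay: 15 month: 6 year: 2011.
day dayOfMonth.     "15"
day monthIndex.     "6"
day start.          "the DateAndTime at which the span begins"
day duration.       "the length of the span"

"A DateAndTime is a single point in time, so the 16:21 part is kept."
moment := DateAndTime year: 2011 month: 6 day: 15 hour: 16 minute: 21 second: 0.
moment hour.        "16"
moment minute.      "21"
```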





Re: Data scrapping in pharo: converting text with dates in Spanish

Offray
Yep, that was it. asDateAndTime solves the issue.
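
For readers following along, one way this can look is sketched below. It assumes the scraped pieces are first rearranged into an ISO 8601 style string; that intermediate string is an assumption for illustration, not necessarily what the Dataviz-Twitter code does.

```smalltalk
| moment |
"Assumed intermediate form: an ISO 8601 string built from the scraped pieces."
moment := '2011-06-15T16:21:00' asDateAndTime.
moment hour.       "16"
moment minute.     "21"
moment asDate.     "only the Date part, without the time of day"
```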

Thanks,

Offray

On 06/04/15 at 14:37, Paul DeBruicker wrote:
