SortedCollection>>median

Rob Rothwell
I notice that the current median is:

median
    "Return the middle element, or as close as we can get."

    ^ self at: self size + 1 // 2

Any reason not to make that accurate and return the average of the middle two values for collections containing an even number of items?
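Below is a minimal sketch of the proposed even-size behavior. It is written as a replacement `median` for SortedCollection, in the same browser style as the method quoted above. It assumes the elements are Numbers sorted in ascending order (the default sort block). With non-numeric elements, the averaging branch would fail. `emptyCheck` is the standard Collection guard against empty receivers.

```smalltalk
median
	"Sketch: answer the middle element when the size is odd, or the mean of
	the two middle elements when it is even. Assumes Number elements in
	ascending order."
	| mid |
	self emptyCheck.
	mid := self size + 1 // 2.
	^ self size odd
		ifTrue: [self at: mid]
		ifFalse: [((self at: mid) + (self at: mid + 1)) / 2]
```

With this version, `#(4 1 3 2) asSortedCollection median` would answer the Fraction 5/2 instead of 2.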

Or...better yet...should statistics functions be removed and placed into another package with other useful items like mode, range, standardDeviation, etc...?

Just wondering!

Rob

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: SortedCollection>>median

Michael van der Gulik-2


On Mon, Mar 2, 2009 at 1:34 PM, Rob Rothwell wrote:
> I notice that the current median is:
>
> median
>     "Return the middle element, or as close as we can get."
>
>     ^ self at: self size + 1 // 2
>
> Any reason not to make that accurate and return the average of the middle two values for collections containing an even number of items?
>
> Or...better yet...should statistics functions be removed and placed into another package with other useful items like mode, range, standardDeviation, etc...?


Hmm...

| s μ σ range |
s := StatisticalAnalysis on: myCollection.
μ := s mean.
σ := s standardDeviation.
range := μ ± σ.

This might be interesting.
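Here is one hypothetical reading of this proposal. `StatisticalAnalysis`, `on:` and `setData:` are illustrative names that do not exist in Pharo, and the `±` selector is left out.

- **Standard deviation:** this is the population form (divide by n). A sample estimate would divide by n - 1 instead.
- **Class definition:** newer Pharo versions accept `package:`. Older Squeak-derived images use `category:`.

```smalltalk
"Hypothetical class, not part of Pharo. Older images use category: instead of package:."
Object subclass: #StatisticalAnalysis
	instanceVariableNames: 'data'
	classVariableNames: ''
	package: 'Statistics-Sketch'

"Class side: instance creation."
on: aCollection
	"Answer an analysis over aCollection, assumed to hold Numbers."
	^ self new setData: aCollection

"Instance side."
setData: aCollection
	"Keep the data as an Array; the method answers self implicitly."
	data := aCollection asArray

mean
	"Arithmetic mean: the sum divided by the count."
	data isEmpty ifTrue: [^ self error: 'no data'].
	^ (data inject: 0 into: [:sum :each | sum + each]) / data size

standardDeviation
	"Population standard deviation (divide by n); a sample estimate would divide by n - 1."
	| m |
	m := self mean.
	^ ((data inject: 0 into: [:sum :each | sum + (each - m) squared]) / data size) sqrt
```

For the data `#(2 4 4 4 5 5 7 9)`, this sketch gives a mean of 5 and a population standard deviation of 2.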

Gulik

--
http://gulik.pbwiki.com/

Re: SortedCollection>>median

Rob Rothwell
Well, I work in a Six Sigma department so...!

Anyway, I agree that your proposal could be interesting...what do you think of the current implementations in Pharo?

#median, for example, doesn't seem to be used anywhere except as an "approved" method name in MethodFinder>>initialize...

Take care,

Rob

2009/3/1 Michael van der Gulik wrote:
> Hmm...
> [...]
> This might be interesting.

Re: SortedCollection>>median

Stéphane Ducasse
In reply to this post by Rob Rothwell

On Mar 2, 2009, at 1:34 AM, Rob Rothwell wrote:

> I notice that the current median is:
>
> median
>     "Return the middle element, or as close as we can get."
>
>     ^ self at: self size + 1 // 2
>
> Any reason not to make that accurate and return the average of the  
> middle two values for collections containing an even number of items?
>
> Or...better yet...should statistics functions be removed and placed  
> into another package with other useful items like mode, range,  
> standardDeviation, etc...?

I think that would be good:
a nice package with nice documentation and associated tests.
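Below is a sketch of the kind of associated tests such a package could ship, using SUnit. `MedianTest` and the package name are hypothetical. `testEvenSize` encodes the proposed averaging behavior. It would therefore fail against the current definition, which answers 2 for that collection.

```smalltalk
TestCase subclass: #MedianTest
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'Statistics-Tests'

testOddSize
	"The middle element of 1 3 5 is 3 under either definition."
	self assert: #(5 1 3) asSortedCollection median equals: 3

testEvenSize
	"Proposed behavior: the mean of the two middle elements, 2 and 3."
	self assert: #(4 1 3 2) asSortedCollection median equals: 5 / 2
```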

Stef

Re: SortedCollection>>median

Stéphane Ducasse
In reply to this post by Rob Rothwell
This reminds me of some thoughts about Time and Date.

Originally, Date and Time were OK but not great.
Then one guy (Brent) fixed them, people were happy, and his version was
introduced into the base system.
When it was introduced (in 3.7, I guess), it was an obvious choice.
Then Chronos and Aconcagua came out, showing some other properties.

Now, in retrospect, I think it would be better to have a minimal
version in the system and a loadable external package. There are some
exceptions to this strategy, such as Announcements and Regex
(where it is more of a language decision), but in general this is a
good strategy.

So we could rename median to naiveMedian and have a package that does
all the rest, which people can load.
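Here is a sketch of the rename. `naiveMedian` keeps the current body unchanged, so for an even size it answers the lower of the two middle elements. Because it does no arithmetic, it works for any elements the sort block can compare, such as Strings.

```smalltalk
naiveMedian
	"Answer the middle element, or the lower of the two middle elements
	when the size is even. Needs no arithmetic on the elements."
	^ self at: self size + 1 // 2
```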

Stef

Re: SortedCollection>>median

Adrian Lienhard

On Mar 2, 2009, at 09:00 , Stéphane Ducasse wrote:

> This reminds me some thoughts about time and date.
>
> Originally date and time were ok but not so great.
> Then one guy (brent) fixed it and people were happy and it was
> introduced in the base system.
> At that time when time was introduced (in 3.7 I guess) it was a
> obvious choice.
> Then chronos and aconcagua went out showing some other properties.
>
> Now retrospectively I think that it would be better to have a minimal
> version in the system and
> loadable external package. There are some exceptions to this strategy:
> announcements and regexp
> (where this is more a language decision), but in general this is a
> good strategy

I think a good rule of thumb is whether the code is used by the
kernel. Having packages like Regex and Announcements in the core only
makes sense if they are also used by the core. Otherwise they would be
better as external packages.

Not sure about median, though, as for just one method you wouldn't  
want to create and load an extra package...

Adrian

>
>
> So we could rename median naiveMedian and have a package that does all
> the rest and
> that people can load it.
>
> Stef
>

Re: SortedCollection>>median

Stéphane Ducasse
>
>
> Not sure about median, though, as for just one method you wouldn't
> want to create and load an extra package...

But this would not be just one method; I imagine a small statistics
package.


Re: SortedCollection>>median

Miguel Enrique Cobá Martínez
In reply to this post by Adrian Lienhard
Adrian Lienhard wrote:

> On Mar 2, 2009, at 09:00 , Stéphane Ducasse wrote:
>
>> This reminds me some thoughts about time and date.
>>
>> Originally date and time were ok but not so great.
>> Then one guy (brent) fixed it and people were happy and it was
>> introduced in the base system.
>> At that time when time was introduced (in 3.7 I guess) it was a
>> obvious choice.
>> Then chronos and aconcagua went out showing some other properties.
>>
>> Now retrospectively I think that it would be better to have a minimal
>> version in the system and
>> loadable external package. There are some exceptions to this strategy:
>> announcements and regexp
>> (where this is more a language decision), but in general this is a
>> good strategy
>
> I think a good rule of thumb is whether the code is used by the  
> kernel. Having packages like Regex and Announcements in the core makes  
> only sense if they are also used in the core. Else they would better  
> be external packages.


Then why not structure the packages in three parts:

core: everything needed to bootstrap a minimal image (kernel, packages to
install more packages, logging, scripting, etc.)

base: packages that exist in several versions, where the included one
is the preferred, suggested, standardized one (Announcements, SUnit, etc.)

universes/distros/Monticello/SqueakSource: all the existing packages,
*easily* loadable by users

Then, with a core image and a script, you can create a base image.
With a base image and a script or a GUI, you can create your very own
image (e.g. Damien Cassou's images).

Regards,
Miguel Cobá

>
> Not sure about median, though, as for just one method you wouldn't  
> want to create and load an extra package...
>
> Adrian
>
>>
>> So we could rename median naiveMedian and have a package that does all
>> the rest and
>> that people can load it.
>>
>> Stef
>>

Re: SortedCollection>>median

cbc
In reply to this post by Rob Rothwell
On Sun, Mar 1, 2009 at 4:34 PM, Rob Rothwell wrote:
> I notice that the current median is:
>
> median
>     "Return the middle element, or as close as we can get."
>
>     ^ self at: self size + 1 // 2
>
> Any reason not to make that accurate and return the average of the middle two values for collections containing an even number of items?

One nice thing about the existing definition is that you can use #median on collections that are not numeric. So the median of:
#( 'Ford' 'Audi' 'Nissan' 'Hyundai' ) asSortedCollection median
answers 'Ford' under the current definition, but would fail under a 'statistical' definition, since two strings cannot be averaged. Similarly, #max and #min and #size would work for this, although #sum wouldn't.
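One possible compromise between the two definitions is sketched below. It is hypothetical and was not proposed in the thread.

- **Both middle elements are Numbers:** average them.
- **Otherwise:** fall back to the lower middle element, as the current method does.

`isNumber` is defined on Object, so the check works for any element.

```smalltalk
median
	"Sketch: average the two middle elements only when both are Numbers;
	otherwise answer the lower middle element, like the current method."
	| mid lower upper |
	self emptyCheck.
	mid := self size + 1 // 2.
	lower := self at: mid.
	self size odd ifTrue: [^ lower].
	upper := self at: mid + 1.
	^ (lower isNumber and: [upper isNumber])
		ifTrue: [(lower + upper) / 2]
		ifFalse: [lower]
```

With this version the car-name example still answers 'Ford', while `#(4 1 3 2) asSortedCollection median` answers 5/2.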
 
Chris
