Mahalanobis distance

Didier Besset

Mahalanobis distance

Mahalanobis distance is already implemented (or at least the tools for this) if Steph did port everything. Please refer to the chapter on Data Mining of my book. It explains everything. The main tool is the correlation matrix.

I have used it personally: it is very efficient for filtering data against a reference sample, optionally with a set of excluding samples.

Here is how to proceed:

Collect the sampling data for the reference sample.
(Optional) Collect samples for the excluding samples.
Compute the correlation matrix for each sample.
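The sampling step above can be sketched in a few lines. This is only an illustration in plain Python, not the SciSmalltalk API; the function names `mean_vector` and `covariance_matrix` are hypothetical. The post speaks of a correlation matrix; the sketch assumes what is needed is the sample mean and the sample covariance matrix, which is what the textbook Mahalanobis distance is built from.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equally sized measurement vectors."""
    count = len(samples)
    dim = len(samples[0])
    return [sum(row[j] for row in samples) / count for j in range(dim)]


def covariance_matrix(samples):
    """Unbiased sample covariance matrix (divides by count - 1)."""
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two samples")
    mean = mean_vector(samples)
    dim = len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in samples:
        # Deviation of this measurement from the sample mean.
        dev = [row[j] - mean[j] for j in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += dev[i] * dev[j]
    return [[value / (count - 1) for value in line] for line in cov]


# Made-up reference sample: four measurements of dimension 2.
reference = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
reference_mean = mean_vector(reference)
reference_cov = covariance_matrix(reference)
```

The same two functions would be applied to the excluding sample, if there is one.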

Now, for each measured data vector v, you compute the Mahalanobis distance as d = vCv, where C is the correlation matrix. This number is distributed according to a chi-square distribution with n-1 degrees of freedom (n is the dimension of the vector v). So you can adjust the threshold by specifying the probability of a false positive or false negative accordingly.
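As a rough sketch of the distance computation, here is a plain-Python version (again not the SciSmalltalk API; `solve` and `mahalanobis_squared` are hypothetical names). It assumes the textbook form of the squared Mahalanobis distance, d = (v - m)^T S^-1 (v - m), with m the mean and S the covariance matrix of the sample; whether the "vCv" notation in the post means exactly this (with C playing the role of S^-1) is an assumption. The linear system is solved directly instead of inverting S.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_squared(v, mean, cov):
    """Squared Mahalanobis distance of v from a sample with given mean and covariance."""
    dev = [vi - mi for vi, mi in zip(v, mean)]
    weighted = solve(cov, dev)  # equals cov^-1 * dev
    return sum(d * w for d, w in zip(dev, weighted))
```

With an identity covariance matrix and a zero mean, this reduces to the squared Euclidean length of v.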

To check whether your measurement belongs to the reference sample, cut for d < d_max; to exclude, use d > d_min.
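The cuts can be sketched as follows. Note that this swaps in a different way of choosing the threshold: instead of a chi-square quantile, d_max is taken as an empirical quantile of the distances measured on the reference sample itself. The names `empirical_threshold` and `accept` are hypothetical, and reading "d > d_min" as a cut on the distance to the excluding sample is an assumption.

```python
import math


def empirical_threshold(reference_distances, miss_rate):
    """Pick d_max so that roughly `miss_rate` of the reference distances lie above it."""
    if not reference_distances:
        raise ValueError("need at least one reference distance")
    if not 0.0 <= miss_rate < 1.0:
        raise ValueError("miss_rate must be in [0, 1)")
    ordered = sorted(reference_distances)
    index = math.ceil((1.0 - miss_rate) * len(ordered)) - 1
    index = min(max(index, 0), len(ordered) - 1)
    return ordered[index]


def accept(d_reference, d_max, d_excluding=None, d_min=None):
    """Apply the cuts: d < d_max for the reference, d > d_min for the excluding sample."""
    if not d_reference < d_max:
        return False
    if d_excluding is not None and d_min is not None:
        # Assumed reading: the measurement must be far from the excluding sample.
        return d_excluding > d_min
    return True
```

A chi-square quantile could replace `empirical_threshold` when the measurements are close to Gaussian; the empirical version only needs the reference distances.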

I used this for a coin detector (roughly 20 measurements, which is the dimension of v) sampled with 1000 data points (1000 coins put through the detector for sampling) and 1000 data points from fake coins (the excluding set). We obtained close to 100% efficiency.

Cheers,

Didier

--
You received this message because you are subscribed to the Google Groups "SciSmalltalk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

SergeStinckwich

Re: Mahalanobis distance

On Sat, Mar 5, 2016 at 10:03 AM, Didier Besset <[hidden email]> wrote:
> Mahalanobis distance is already implemented (or at least the tools for this)
> if Steph did port everything. Please refer to the chapter on Data Mining of
> my book. It explains everything. The main tool is the correlation matrix.
>
> I have used it personally: it is very efficient for filtering data against
> a reference sample, optionally with a set of excluding samples.

Thank you, Didier, for joining the mailing list!

Maybe this could be a nice example to add to the lib.
We definitely need more examples.

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/

stepharo

Re: Mahalanobis distance

Hi Didier

You see, this is another example of the same symptoms :)
We are not good at math, so we need how-tos or classes to encapsulate the knowledge.

Stef

Tudor Girba-2

Re: Mahalanobis distance

+100 :)

It’s great to see this energy around here!

Doru

> On Mar 5, 2016, at 2:15 PM, [hidden email] wrote:
>
> Hi Didier
>
> You see, this is another example of the same symptoms :)
> We are not good at math, so we need how-tos or classes to encapsulate the knowledge.
>
> Stef

--
www.tudorgirba.com
www.feenk.com

"When people care, great things can happen."