# Mahalanobis distance

## Mahalanobis distance

Mahalanobis distance is already implemented (or at least the tools for it are) if Steph ported everything. Please refer to the chapter on Data Mining of my book, which explains everything. The main tool is the covariance matrix.

I have used it personally: it is very efficient for filtering data against a reference sample, optionally with a set of excluding samples.

Here is how to proceed:

1. Collect the sampling data for the reference sample.
2. (Optional) Collect samples for the excluding samples.
3. Compute the mean vector and the covariance matrix for each sample.

Now, for each measured data vector v, you compute the (squared) Mahalanobis distance as $d = (v - m)^T C^{-1} (v - m)$, where m is the mean vector and C is the covariance matrix of the sample. For normally distributed data, this number approximately follows a chi-square distribution with n degrees of freedom (n is the dimension of the vector v), so you can adjust the threshold by specifying the probability of a false positive or false negative accordingly.

To check whether your measurement belongs to the reference sample, cut for d < d_max; to exclude, use d > d_min.

I used this for a coin detector (roughly 20 measurements, which is the dimension of v) sampled with 1000 data points (1000 coins put through the detector for sampling) and 1000 data points from fake coins (the excluding set). We obtained close to 100% efficiency.

Cheers,
Didier

--
You received this message because you are subscribed to the Google Groups "SciSmalltalk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. For more options, visit https://groups.google.com/d/optout.
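The procedure described in the message above can be sketched in a few lines. This is only an illustration in plain Python, not the SciSmalltalk implementation and not code from the book: the function names (`mean_vector`, `covariance_matrix`, `mahalanobis_sq`, `chi2_quantile_2dof`) and the tiny two-dimensional reference sample are made up for the example. The threshold uses the closed-form chi-square quantile that holds only for 2 degrees of freedom; a real application with about 20 dimensions would need a general chi-square quantile from a statistics library.

```python
import math

def mean_vector(samples):
    """Component-wise mean of a list of equally sized vectors."""
    n = len(samples)
    return [sum(row[j] for row in samples) / n for j in range(len(samples[0]))]

def covariance_matrix(samples, mean):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, dim = len(samples), len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in samples:
        diff = [row[j] - mean[j] for j in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in r] for r in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis_sq(v, mean, cov):
    """Squared Mahalanobis distance (v - m)^T C^-1 (v - m)."""
    diff = [vi - mi for vi, mi in zip(v, mean)]
    y = solve(cov, diff)  # y = C^-1 (v - m), without forming the inverse
    return sum(d * yi for d, yi in zip(diff, y))

def chi2_quantile_2dof(p):
    """Chi-square quantile, valid ONLY for 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)

# Hypothetical reference sample: two measurements per item.
reference = [(1, 10), (1, 30), (3, 10), (3, 30), (2, 20)]
mean = mean_vector(reference)
cov = covariance_matrix(reference, mean)

# Accept a measurement if its distance is below the 99% quantile.
d_max = chi2_quantile_2dof(0.99)
for v in [(2, 40), (5, 20), (6, 20)]:
    d = mahalanobis_sq(v, mean, cov)
    print(v, round(d, 3), "accepted" if d < d_max else "rejected")
```

Solving the linear system instead of inverting the covariance matrix is a design choice of this sketch; with an excluding sample, the same functions would be applied to that sample and the cut reversed (d > d_min).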

## Re: Mahalanobis distance

On Sat, Mar 5, 2016 at 10:03 AM, Didier Besset <[hidden email]> wrote:

> Mahalanobis distance is already implemented (or at least the tools for it are) if Steph ported everything. Please refer to the chapter on Data Mining of my book, which explains everything. The main tool is the covariance matrix.
>
> I have used it personally: it is very efficient for filtering data against a reference sample, optionally with a set of excluding samples.

Thank you, Didier, for joining the mailing list!

> Here is how to proceed:
>
> 1. Collect the sampling data for the reference sample.
> 2. (Optional) Collect samples for the excluding samples.
> 3. Compute the mean vector and the covariance matrix for each sample.
>
> Now, for each measured data vector v, you compute the (squared) Mahalanobis distance as $d = (v - m)^T C^{-1} (v - m)$, where m is the mean vector and C is the covariance matrix of the sample. For normally distributed data, this number approximately follows a chi-square distribution with n degrees of freedom (n is the dimension of the vector v), so you can adjust the threshold by specifying the probability of a false positive or false negative accordingly.
>
> To check whether your measurement belongs to the reference sample, cut for d < d_max; to exclude, use d > d_min.
>
> I used this for a coin detector (roughly 20 measurements, which is the dimension of v) sampled with 1000 data points (1000 coins put through the detector for sampling) and 1000 data points from fake coins (the excluding set). We obtained close to 100% efficiency.

Maybe this could be a nice example to add to the lib. We definitely need more examples.

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/