Mahalanobis distance is already implemented (or at least the tools
for it are), if Steph ported everything. Please refer to the chapter
on Data Mining in my book; it explains everything. The main tool is the correlation matrix.
I have used it personally: it is very efficient for filtering data against a reference sample, optionally with a set of excluding samples. Here is how to proceed:

• Collect the sampling data for the reference sample.
• (Optional) Collect samples for the excluding samples.
• Compute the correlation matrix for each sample.

Now, for each measured data vector v, you compute the Mahalanobis distance as d = v' C^-1 v, where C is the correlation matrix and v is taken relative to the sample mean. This number is distributed according to a chi-square distribution with n-1 degrees of freedom (n is the dimension of the vector v), so you can adjust the threshold by specifying the probability of false positives or false negatives accordingly. To check whether your measurement belongs to the reference sample, cut on d < d_max; to exclude, use d > d_min. I used this for a coin detector (roughly 20 measurements, which is the dimension of v), sampled with 1000 data points (1000 coins put through the detector for sampling) and 1000 data points from the fake coins (the excluding set). We obtained close to 100% efficiency.
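To make the recipe concrete, here is a minimal sketch in plain Python (this is not the SciSmalltalk code Didier refers to; the function names and the small 2-D sample data are purely illustrative, and the inverse of the sample covariance matrix plays the role of C):

```python
def mean_vector(data):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(data)
    return [sum(row[i] for row in data) / n for i in range(len(data[0]))]

def covariance_matrix(data, mu):
    """Unbiased sample covariance matrix of the reference sample."""
    n, dim = len(data), len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in data:
        d = [row[k] - mu[k] for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def invert(m):
    """Matrix inverse via Gauss-Jordan elimination with partial pivoting."""
    dim = len(m)
    aug = [row[:] + [float(i == j) for j in range(dim)]
           for i, row in enumerate(m)]
    for col in range(dim):
        pivot = max(range(col, dim), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(dim):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[dim:] for row in aug]

def mahalanobis_squared(v, mu, cov_inv):
    """d = (v - mu)' C^-1 (v - mu); under the reference distribution this
    follows a chi-square law, so d_max comes from a chi-square quantile."""
    d = [v[k] - mu[k] for k in range(len(mu))]
    return sum(d[i] * cov_inv[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))

# Illustrative 2-D reference sample (a stand-in for the 1000 sampled coins).
reference = [[1.0, 2.0], [2.0, 3.1], [3.0, 2.9], [4.0, 5.2],
             [2.5, 3.6], [3.5, 4.1], [1.5, 2.4], [2.8, 3.3]]
mu = mean_vector(reference)
cov_inv = invert(covariance_matrix(reference, mu))

d_genuine = mahalanobis_squared([2.6, 3.4], mu, cov_inv)  # near the cloud
d_fake = mahalanobis_squared([10.0, -5.0], mu, cov_inv)   # far outlier
```

Accepting a measurement then means checking d_genuine < d_max, where d_max is the chi-square quantile for the desired false-positive rate (in practice taken from a statistics library rather than computed by hand); a fake coin such as the outlier above yields a much larger d and is rejected.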
Cheers,

Didier

--
You received this message because you are subscribed to the Google Groups "SciSmalltalk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. For more options, visit https://groups.google.com/d/optout.
On Sat, Mar 5, 2016 at 10:03 AM, Didier Besset <[hidden email]> wrote:
> Mahalanobis distance is already implemented (or at least the tools for it are),
> if Steph ported everything. Please refer to the chapter on Data Mining in
> my book; it explains everything. The main tool is the correlation matrix.
>
> I have used it personally: it is very efficient for filtering data against
> a reference sample, optionally with a set of excluding samples.

Thank you, Didier, for joining the mailing-list!

> I used this for a coin detector (roughly 20 measurements, which is the dimension
> of v), sampled with 1000 data points (1000 coins put through the detector for
> sampling) and 1000 data points from the fake coins (the excluding set). We
> obtained close to 100% efficiency.

Maybe this could be a nice example to add to the lib. We definitely need more examples.

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
In reply to this post by Didier Besset
Hi Didier,

You see, this is another example of the same symptoms :) We are not good at math, so we need how-tos or classes to encapsulate the knowledge.

Stef
+100 :)
It’s great to see this energy around here!

Doru

--
www.tudorgirba.com
www.feenk.com

"When people care, great things can happen."