[hapax] tdm weighting


Alberto Bacchelli
Hi all,

 I am reproducing some experiments I did some time ago,
and I am again using Hapax (latest version for VW).

FYI, I just noticed that there may be an issue in how the
weighting in the TermDocumentMatrix is computed.
I include the method here for convenience:

TermDocumentMatrix>>weight

        | newMatrix |
        newMatrix := SparseRowMatrix new: matrix dimension.
        matrix rows with: newMatrix rows do: [ :row :newRow |
                | globalWeight  |
                globalWeight := globalWeighting forTerm: row.
                row doSparseWithIndex: [ :each :index |
                        newRow at: index put:
                                (localWeighting forValue: each) * globalWeight ]].
        matrix := newMatrix.

This method should apply the tf-idf weighting [1] to the matrix.
This weighting is composed of two parts:
a global weighting (i.e., idf: the more common a term is across documents, the lower its weight), and
a local weighting (i.e., tf: each term count is normalized by the number of
terms appearing in the document).
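
To make the two parts concrete, here is a minimal sketch of one common tf-idf variant, written in Python rather than Smalltalk and not taken from Hapax. The function name tf_idf and the rows-are-terms layout are assumptions for illustration; Hapax's own weighting classes may define tf and idf differently.

```python
import math

def tf_idf(counts):
    """counts[t][d] = raw occurrences of term t in document d (rows are terms)."""
    n_docs = len(counts[0])
    # Local part needs the document length: total occurrences per column.
    doc_len = [sum(row[d] for row in counts) for d in range(n_docs)]
    weighted = []
    for row in counts:
        # Global part: the more documents contain the term, the smaller idf.
        df = sum(1 for c in row if c > 0)
        idf = math.log(n_docs / df) if df else 0.0
        weighted.append([
            (c / doc_len[d]) * idf if doc_len[d] else 0.0
            for d, c in enumerate(row)
        ])
    return weighted

counts = [
    [2, 1, 1],  # term that appears in every document
    [2, 0, 0],  # term that appears in one document only
]
weights = tf_idf(counts)
```

With this variant, the term that occurs in every document gets weight 0 everywhere, while the rare term keeps a positive weight in the one document that contains it.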

The global weighting is correctly done in the line:
globalWeight := globalWeighting forTerm: row.

while the local weighting is NOT correctly done in the line:
newRow at: index put: (localWeighting forValue: each) * globalWeight

In fact, "localWeighting forValue: each" always returns "each" unchanged.
This means that no local weighting is applied at all.
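
The effect can be illustrated with a small Python sketch (again not Hapax code; weight_row, identity and normalized are hypothetical names). It assumes the intended local weighting divides each count by the document length, which a function that only receives the value cannot do.

```python
def weight_row(row, doc_lengths, global_weight, local):
    """Weight one term row: local(count, doc_length) * global_weight per document."""
    return [local(c, doc_lengths[d]) * global_weight for d, c in enumerate(row)]

def identity(count, doc_length):
    # Mirrors a local weighting that returns its argument unchanged.
    return count

def normalized(count, doc_length):
    # Assumed intent: term count divided by the number of terms in the document.
    return count / doc_length if doc_length else 0.0

row = [3, 3]             # same raw count in two documents
doc_lengths = [10, 100]  # but very different document lengths
g = 2.0                  # some global (idf-like) weight for this term

with_identity = weight_row(row, doc_lengths, g, identity)
with_normalized = weight_row(row, doc_lengths, g, normalized)
```

With the identity function both documents get the same weight for the term, although it makes up 30% of the first document and only 3% of the second. With normalization the weights differ by the expected factor of ten.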

Right now I am working on fixing this issue.
Please let me know if I am wrong, or if you have an elegant solution :)

Cheers,
 Alberto



[1] http://en.wikipedia.org/wiki/Tf%E2%80%93idf
_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev