Hi,
I'd like to know whether anyone has done work on information theory algorithms. I saw a package in Moose about information theory, but it is just a kind of document indexing.
Is there something more complete (quantities of information)?
Thanks,
--------------
Brice Govin
PhD student in RMoD research team at INRIA Lille
Software Engineer at THALES AIR SYSTEMS Rungis
ENSTA-Bretagne ENSI2014
22 Avenue du General Leclerc
92340 BOURG-LA-REINE
On 15/6/16 at 19:03, Brice GOVIN wrote:
Hi, tell us more. What is it?
Yes, tell us more.
This is an interesting topic.
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Information theory is about quantifying and qualifying the information content of a data set.
Basically, it means that for a specific data set I could say which data is interesting or not (according to the algorithm I use).
It is used in information retrieval (IR).
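To make the "quantities of information" idea concrete, the classic measure is Shannon entropy. A minimal sketch in Python (the function name and toy data are mine, just for illustration):

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy of a token sequence, in bits."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four equally likely symbols carry 2 bits of information each:
print(shannon_entropy(["a", "b", "c", "d"]))       # 2.0
# A constant sequence carries no information:
print(abs(shannon_entropy(["a", "a", "a", "a"])))  # 0.0
```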
Maybe I should have started with that…
Actually, I made a mistake talking about information theory; it is really about information retrieval (my bad…). However, with Moose-Algo-Information-Retrieval, I only get the set of words used in the documents, but I would like to know whether there has been an effort on any algorithm to qualify these words.
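One standard way to "qualify" the words of a document collection is tf-idf weighting, which scores a term higher the more it occurs in a document and the fewer documents it appears in. A rough sketch (the function and the toy documents are my own, not the Moose package):

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: (c / len(doc)) * math.log(n / df[t])
                         for t, c in tf.items()})
    return weighted

docs = [["moose", "model", "analysis"],
        ["moose", "query", "engine"],
        ["information", "retrieval", "model"]]
weights = tf_idf(docs)
# "moose" appears in 2 of the 3 documents, so it is weighted lower than
# "analysis", which is specific to the first document.
```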
There are different kinds of models:
- set-theoretic model
  - documents are represented as sets of words or phrases, and similarity derives from set-theoretic operations on those sets (I don't understand this one very well for now…)
  - common techniques are the Boolean Model (several kinds) and Fuzzy Retrieval
- algebraic model
  - documents and queries are represented as vectors (or matrices, or tuples), and similarity between a query and a document is computed from this representation
  - common techniques are Latent Semantic Indexing (LSI) and the Vector Space Model (several kinds)
- probabilistic model
  - there is no particular representation for documents here; similarity is computed using the probability that the document is relevant to the query
  - common techniques are Latent Dirichlet Allocation, among others
I'm more inclined to use an algebraic model; maybe there is something on Latent Semantic Indexing?
I'm not sure I explained my thinking well…
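The algebraic model described above can be sketched very compactly: represent documents and queries as term-count vectors and compare them with cosine similarity. A minimal illustration (my own toy data, not any existing Moose code):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term: count} vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = Counter("the latent semantic indexing model".split())
query = Counter("semantic model".split())
print(cosine(query, doc))  # ≈ 0.632: both query terms occur in the document
```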
Regards,
--------------
Brice Govin
PhD student in RMoD research team at INRIA Lille
Software Engineer at THALES AIR SYSTEMS Rungis
ENSTA-Bretagne ENSI2014
22 Avenue du General Leclerc
92340 BOURG-LA-REINE
In reply to this post by BriceG
I am not an expert in Latent Semantic Indexing, but it is quite simple to implement. Give it a try; we will help.
Alexandre
In reply to this post by BriceG
Adrian Kuhn developed LSI for Moose in VW: Hapax. It should be around.
Stef
On 16/6/16 at 09:05, Brice GOVIN wrote:
> Information theory is about quantifying and qualifying the information content of a data set.
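For anyone wanting to try, the core of LSI really is short: build a term-document matrix, truncate its SVD, and compare documents in the reduced latent space. A hedged sketch in Python using numpy (my own toy documents; this is the general LSI idea, not Hapax itself):

```python
import numpy as np

docs = ["moose model analysis",
        "moose query engine",
        "information retrieval model",
        "latent semantic indexing retrieval"]
vocab = sorted({w for d in docs for w in d.split()})
# term-document count matrix: one row per term, one column per document
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

# Truncated SVD: keep only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per document

def latent_sim(i, j):
    """Cosine similarity of documents i and j in the latent space."""
    a, b = doc_vecs[i], doc_vecs[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(doc_vecs.shape)  # (4, 2): each document reduced to 2 latent dimensions
print(latent_sim(0, 1))
```

Similarity is then computed between these reduced vectors rather than on raw term overlap, which is what lets LSI relate documents that share latent topics rather than exact words.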