Hi,
I'd like to know whether anyone has done work on information theory algorithms. I saw a package in Moose about information theory, but it is just a kind of document indexing.
Is there something more complete (quantities of information)?
Thanks,
--------------
Brice Govin
PhD student in RMoD research team at INRIA Lille
Software Engineer at THALES AIR SYSTEMS Rungis
ENSTA-Bretagne ENSI2014
22 Avenue du General Leclerc
92340 BOURG-LA-REINE
On 15/6/16 at 19:03, Brice GOVIN wrote:
Hi, tell us more. What is it?
Yes, tell us more.
This is an interesting topic.
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Information theory is about quantifying and qualifying the information content of a data set.
Basically, it means that for a specific data set I could say which data is interesting or not (according to the algorithm I use).
It is used in information retrieval (IR).
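To make the "quantities of information" idea concrete, the classic measure is Shannon entropy. A minimal sketch in Python (the function name and toy data are mine, just for illustration):

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy of a token sequence, in bits."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four equally likely symbols carry 2 bits of information each:
print(shannon_entropy(["a", "b", "c", "d"]))       # 2.0
# A constant sequence carries no information:
print(abs(shannon_entropy(["a", "a", "a", "a"])))  # 0.0
```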
Maybe I should have started with that…
Actually, I made a mistake talking about information theory; it is really about information retrieval (my bad…). However, with Moose-Algo-Information-Retrieval, I only get the set of words used in the documents, but I would like to know whether there has been an effort on any algorithm to qualify these words.
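One standard way to "qualify" the words of a document collection is tf-idf weighting, which scores a term higher the more it occurs in a document and the fewer documents it appears in. A rough sketch (the function and the toy documents are my own, not the Moose package):

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: (c / len(doc)) * math.log(n / df[t])
                         for t, c in tf.items()})
    return weighted

docs = [["moose", "model", "analysis"],
        ["moose", "query", "engine"],
        ["information", "retrieval", "model"]]
weights = tf_idf(docs)
# "moose" appears in 2 of the 3 documents, so it is weighted lower than
# "analysis", which is specific to the first document.
```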
There are different kinds of models:
- set-theoretic model
  - documents are represented as sets of words or phrases, and similarity derives from set-theoretic operations on those sets (I don't understand this one very well for now…)
  - common techniques are the Boolean Model (several kinds) and Fuzzy Retrieval
- algebraic model
  - documents and queries are represented as vectors (or matrices, or tuples), and similarity between a query and a document is computed from this representation
  - common techniques are Latent Semantic Indexing (LSI) and the Vector Space Model (several kinds)
- probabilistic model
  - there is no particular representation for documents here; similarity is computed using the probability that the document is relevant to the query
  - common techniques are Latent Dirichlet Allocation, among others
I'm more inclined to use an algebraic model; maybe there is something on Latent Semantic Indexing?
I'm not sure I explained my thinking well…
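The algebraic model described above can be sketched very compactly: represent documents and queries as term-count vectors and compare them with cosine similarity. A minimal illustration (my own toy data, not any existing Moose code):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term: count} vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = Counter("the latent semantic indexing model".split())
query = Counter("semantic model".split())
print(cosine(query, doc))  # ≈ 0.632: both query terms occur in the document
```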
Regards,
--------------
Brice Govin
PhD student in RMoD research team at INRIA Lille
Software Engineer at THALES AIR SYSTEMS Rungis
ENSTA-Bretagne ENSI2014
22 Avenue du General Leclerc
92340 BOURG-LA-REINE
In reply to this post by BriceG
I am not an expert in Latent Semantic Indexing, but it is quite simple to implement. Give it a try; we will help.
Alexandre
In reply to this post by BriceG
Adrian Kuhn developed LSI for Moose in VW: Hapax. It should be around.
Stef
On 16/6/16 at 09:05, Brice GOVIN wrote:
> Information theory is about quantifying and qualifying the information content of a data set.
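For anyone wanting to try, the core of LSI really is short: build a term-document matrix, truncate its SVD, and compare documents in the reduced latent space. A hedged sketch in Python using numpy (my own toy documents; this is the general LSI idea, not Hapax itself):

```python
import numpy as np

docs = ["moose model analysis",
        "moose query engine",
        "information retrieval model",
        "latent semantic indexing retrieval"]
vocab = sorted({w for d in docs for w in d.split()})
# term-document count matrix: one row per term, one column per document
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

# Truncated SVD: keep only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per document

def latent_sim(i, j):
    """Cosine similarity of documents i and j in the latent space."""
    a, b = doc_vecs[i], doc_vecs[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(doc_vecs.shape)  # (4, 2): each document reduced to 2 latent dimensions
print(latent_sim(0, 1))
```

Similarity is then computed between these reduced vectors rather than on raw term overlap, which is what lets LSI relate documents that share latent topics rather than exact words.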