Dolphin Smalltalk (Usenet)

Learning algorithms

Günther Schmidt

Learning algorithms

Hi,

I need to create a parser that can learn, through training, to identify keywords.

I have book records from a catalog system, and their keyword fields
contain multi-word keywords (one to four words each), like:

'4th century history Europe revolutionary history of Germany feminist movement'

The keywords in the record above should then be:

'4th century history'
'Europe'
'revolutionary history of Germany'
'feminist movement'

The keywords are separated only by whitespace, not even by commas. So I'd
have to train the parser at first, until it can identify more and more
keywords by itself without having to ask me.

Any ideas, folks?

Günther

Chris Uppal-3

Re: Learning algorithms

Günther Schmidt wrote:

> I have book records from a catalog system, and their keyword fields
> contain multi-word keywords (one to four words each), like:
>
> '4th century history Europe revolutionary history of Germany feminist movement'
>
> The keywords in the record above should then be:
>
> '4th century history'
> 'Europe'
> 'revolutionary history of Germany'
> 'feminist movement'

One idea would be to measure how strongly adjacent words are associated. For
example, if 'feminist' is usually followed by 'history', then the two words are
part of a multi-word keyword, whereas if 'feminist' is rarely followed by
'giraffe', those two words are not part of the same keyword.

Then the parsing would be:

    get next word
    does it "usually" follow the previous word (above some threshold)?
        ifTrue: still in the same keyword sequence
        ifFalse: finish the previous sequence and start a new one

The hardest part would be finding a suitable threshold. I suspect that if you
want to be formal about it, you'll have to get into (Bayesian?) statistics,
but I'd hope[*] that's not necessary.

([*] because I /loathe/ stats, and am very bad at it ;-)
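One way to make "usually follows" precise (an illustrative choice, not something proposed in the thread) is to score each adjacent pair by the fraction of occurrences of the first word that are followed by the second, and to continue the current keyword only when that score exceeds a threshold θ. Here n(a b) is the number of times b directly follows a, and n(a) is the total number of occurrences of a:

$$
\mathrm{score}(a, b) = \frac{n(a\,b)}{n(a)}, \qquad b \text{ continues the current keyword} \iff \mathrm{score}(a, b) > \theta
$$

With hypothetical counts where 'feminist' occurs 40 times and is followed by 'movement' 30 times, the score is 30/40 = 0.75, which a threshold of θ = 0.5 would accept. Dividing by n(a) keeps every score between 0 and 1, so one threshold can be compared across common and rare words, which a raw pair count cannot offer.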

-- chris

Simple implementation of the counting step -- untested:

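"Count how often each word occurs (all) and how often each adjacent
 word pair occurs (follows). records is a collection of keyword strings."
| follows all |
follows := LookupTable new.
all := Bag new.
records do:
    [:record | | words |
    words := record subStrings.
    all addAll: words.
    1 to: words size - 1 do:
        [:i | | pair count |
        pair := Array with: (words at: i) with: (words at: i + 1).
        count := follows at: pair ifAbsentPut: [0].
        follows at: pair put: count + 1]].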
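The counting code only builds the tables; a minimal sketch of the parse loop on top of the same kind of counts might look like the Dolphin Smalltalk workspace code below. The training records, the score count(a b) / count(a) and the 0.5 threshold are assumptions for illustration, not taken from the thread, and pairs are keyed by a 'word next' string rather than an Array. Because the training records are unsegmented, pairs spanning two keywords are counted too; the approach relies on those being rarer than pairs inside a keyword.

```smalltalk
"Sketch of the association-based parse loop (Dolphin Smalltalk workspace).
 Assumed for illustration: the training records, the score
 count(a b) / count(a), and the threshold 0.5."
| training all follows threshold segment keywords |
training := #('revolutionary history of Germany feminist movement'
    'Europe revolutionary history of Germany'
    '4th century history Europe'
    'feminist movement history of Germany'
    'feminist movement Europe').
all := Bag new.              "word -> total number of occurrences"
follows := LookupTable new.  "'word next' -> times seen adjacent"
training do:
    [:record | | words |
    words := record subStrings.
    all addAll: words.
    1 to: words size - 1 do:
        [:i | | pair |
        pair := (words at: i) , ' ' , (words at: i + 1).
        follows at: pair put: (follows at: pair ifAbsent: [0]) + 1]].
threshold := 0.5.
segment :=
    [:record | | result current |
    result := OrderedCollection new.
    current := WriteStream on: String new.
    record subStrings inject: nil into:
        [:previous :word | | count |
        count := previous isNil
            ifTrue: [0]
            ifFalse: [follows at: previous , ' ' , word ifAbsent: [0]].
        (count > 0 and: [count / (all occurrencesOf: previous) > threshold])
            ifTrue: ["word usually follows previous: same keyword"
                current space]
            ifFalse: ["finish the previous keyword and start a new one"
                previous isNil ifFalse: [result add: current contents].
                current := WriteStream on: String new].
        current nextPutAll: word.
        word].
    current contents isEmpty ifFalse: [result add: current contents].
    result].
keywords := segment value: '4th century history Europe revolutionary history of Germany feminist movement'.
```

With this small training set, keywords comes out as '4th century history', 'Europe', 'revolutionary history of Germany' and 'feminist movement'. A word seen only once in training always scores 0 or 1, so on real catalog data the threshold would need tuning against records segmented by hand.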

Janos Kazsoki

Re: Learning algorithms

In reply to this post by Günther Schmidt

Günther,

Yes, one approach would be similar to the "animal game" in the
Tutorials, where you build a "knowledge base" (basically a decision
tree) and insert the new keywords on each turn.
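A minimal sketch of the knowledge-base idea, as Dolphin Smalltalk workspace code. It is a simplification, not Janos's design: instead of a decision tree it keeps a Set of keywords the user has already confirmed (the seed keywords below are hypothetical) and splits a new record by greedy longest match, trying phrases of up to maxWords = 4 words (an assumed limit). A word that starts no known phrase becomes a one-word keyword; an interactive version would ask the user at that point and add the answer to the set, which corresponds to inserting the new keywords on each turn.

```smalltalk
"Sketch: a knowledge base of confirmed keywords plus greedy longest match.
 Assumptions: the seed keywords and the 4-word limit are illustrative."
| known maxWords join split keywords |
known := Set new.
"Keywords the user has confirmed so far (hypothetical seed data)."
#('4th century history' 'revolutionary history of Germany' 'feminist movement')
    do: [:each | known add: each].
maxWords := 4.
"Answer words from..to joined with single spaces."
join :=
    [:words :from :to | | stream |
    stream := WriteStream on: String new.
    from to: to do:
        [:k |
        k > from ifTrue: [stream space].
        stream nextPutAll: (words at: k)].
    stream contents].
split :=
    [:record | | words result i |
    words := record subStrings.
    result := OrderedCollection new.
    i := 1.
    [i <= words size] whileTrue:
        [| n |
        "Try the longest phrase starting at word i first, shrinking to two words."
        n := maxWords min: words size - i + 1.
        [n > 1 and: [(known includes: (join value: words value: i value: i + n - 1)) not]]
            whileTrue: [n := n - 1].
        "n = 1: no known phrase starts here, so keep the single word
         (an interactive version would ask the user at this point)."
        result add: (join value: words value: i value: i + n - 1).
        i := i + n].
    result].
keywords := split value: '4th century history Europe revolutionary history of Germany feminist movement'.
```

Greedy longest match is the simplest way to use such a knowledge base, but it can choose a wrong split when a shorter known phrase is the correct one; a tree of words (a trie) would be the decision-tree-shaped version of the same lookup.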

The other approach is much more complicated: it involves nearly every area
of AI, such as machine learning, data mining, Bayes nets, semantic nets,
natural language processing, perhaps fuzzy logic, rule-based expert
systems, and so on.

I hope this helps,
Janos