Learning algorithms

Günther Schmidt
Hi,

I need to create a parser that can identify keywords by training it.

I have records of books (from a catalog system), and they've got keyword
fields that contain 1-, 2- and 3-word keywords like:

'4th century history Europe revolutionary history of Germany feminist
movement'

The keywords in the record above should then be:

        '4th century history'
        'Europe'
        'revolutionary history of Germany'
        'feminist movement'

The keywords are separated only by whitespace, not even by commas. So I'd
have to train the parser at first, until it is capable of identifying
more and more keywords by itself without having to ask me.

Any ideas, folks?

Günther


Re: Learning algorithms

Chris Uppal-3
Günther Schmidt wrote:

> I have records of books (from a catalog system), and they've got keyword
> fields that contain 1-, 2- and 3-word keywords like:
>
> '4th century history Europe revolutionary history of Germany feminist
> movement'
>
> The keywords in the record above should then be:
>
> '4th century history'
> 'Europe'
> 'revolutionary history of Germany'
> 'feminist movement'

One idea would be to measure how strongly associated word pairs are, e.g. if
'feminist' is usually followed by 'history' then the two words are part of the
same multi-word keyword, whereas 'feminist' is rarely followed by 'giraffe', so
those are not part of the same keyword.

Then the parsing would be:
    get next word
    does it "usually" follow the previous word (> some threshold) ?
        ifTrue: still in same keyword sequence
        ifFalse: finish previous sequence and start new one

The hardest part would be to find a suitable threshold.  I suspect that if you
want to be formal about it then you'll have to get into (Bayesian?) statistics,
but I'd hope[*] that's not necessary.

([*] because I /loathe/ stats, and am very bad at them ;-)

    -- chris

Simple implementation -- untested:

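    "follows maps each adjacent word pair to the number of times it occurs."
    follows := LookupTable new.
    "Assumption: all is a Bag collecting every word seen, for overall word counts."
    all := Bag new.
    records do:
        [:record || words |
        words := record subStrings.
        all addAll: words.
        1 to: words size-1 do:
            [:i || pair count |
            pair := Array with: (words at: i) with: (words at: i+1).
            count := follows at: pair ifAbsentPut: [0].
            follows at: pair put: count+1]].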
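
The parsing step described above could then be sketched roughly as follows. This is equally untested, and several things in it are assumptions rather than part of the original suggestion: the sample records are made up, the names `seen`, `threshold` and `split` are hypothetical, "usually follows" is read as "the pair count divided by the number of times the first word had any successor", and the 0.5 cut-off is an arbitrary starting point that would need tuning on real data.

```smalltalk
| records follows seen threshold split |

"Hypothetical sample records; real ones would come from the catalog."
records := #(
    'feminist movement Europe'
    'history of Germany feminist movement'
    '4th century history Europe'
    'Europe history of Germany').

"Training: count each adjacent pair, and how often each word has a successor."
follows := LookupTable new.
seen := LookupTable new.
records do: [:record | | words |
    words := record subStrings.
    1 to: words size - 1 do: [:i | | first pair |
        first := words at: i.
        pair := Array with: first with: (words at: i + 1).
        follows at: pair put: (follows at: pair ifAbsent: [0]) + 1.
        seen at: first put: (seen at: first ifAbsent: [0]) + 1]].

"Assumed cut-off: a word stays in the current keyword only if it follows
 the previous word in more than half of that word's recorded successors."
threshold := 0.5.

"Splitting: walk the words, starting a new keyword whenever the link is weak."
split := [:record | | keywords current |
    keywords := OrderedCollection new.
    current := OrderedCollection new.
    record subStrings do: [:word |
        current isEmpty ifFalse: [ | prev pairCount prevCount |
            prev := current last.
            pairCount := follows at: (Array with: prev with: word) ifAbsent: [0].
            prevCount := seen at: prev ifAbsent: [0].
            (prevCount > 0 and: [pairCount / prevCount > threshold])
                ifFalse: [
                    keywords add: current.
                    current := OrderedCollection new]].
        current add: word].
    current isEmpty ifFalse: [keywords add: current].
    keywords].

"Answers a collection of keywords, each itself a collection of words."
split value: 'feminist movement history of Germany'
```

A word pair that was never seen in training counts as a break, so unknown words end up as one-word keywords. That is probably where the parser would have to ask for help while it is still being trained.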


Re: Learning algorithms

Janos Kazsoki
In reply to this post by Günther Schmidt
Günther,

yes, one approach could be similar to the "animal game" in the
Tutorials, where you build a "knowledge base", basically a decision
tree, and insert the new keywords on each turn.

The other one is much more complicated: it involves nearly all areas of
AI, like machine learning, data mining, Bayes nets, semantic nets,
natural language processing, perhaps fuzzy logic, rule-based expert
systems, and so on.

I hope this helps,
Janos