[vwnc] a simple TextHyphenator, IndexedBinarySearchTree

Thomas Schrader
Hi all,

Best wishes to everybody for the new year, which is still young.

I expected to find it somewhere in the Collection hierarchy, but I cannot find anything for indexed binary searching.
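The kind of binary search asked about here can be sketched with a plain sorted list. This is Python rather than VisualWorks Smalltalk, and the function names (`has_prefix`, `has_key`, `matching_substrings`) are hypothetical. It assumes the patterns have already had their hyphens removed and are stored as a sorted list of letter strings. For each start position in the word, a binary search tells whether any pattern still begins with the letters seen so far, so the scan can stop as soon as none does.

```python
from bisect import bisect_left


def has_prefix(sorted_keys, prefix):
    """True if some key in sorted_keys starts with prefix (binary search)."""
    i = bisect_left(sorted_keys, prefix)
    return i < len(sorted_keys) and sorted_keys[i].startswith(prefix)


def has_key(sorted_keys, key):
    """True if key itself is in sorted_keys (binary search)."""
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


def matching_substrings(sorted_keys, word):
    """Return (start, key) for every key that occurs somewhere in word."""
    found = []
    for start in range(len(word)):
        end = start + 1
        # Extend the candidate only while some key still begins with it.
        while end <= len(word) and has_prefix(sorted_keys, word[start:end]):
            if has_key(sorted_keys, word[start:end]):
                found.append((start, word[start:end]))
            end += 1
    return found


keys = sorted(["puter", "com", "er"])  # hypothetical patterns, hyphens removed
print(matching_substrings(keys, "computer"))
# [(0, 'com'), (3, 'puter'), (6, 'er')]
```

The prefix check works because, in a sorted list, the first key that is not smaller than the prefix is the only candidate that needs testing: if any key starts with the prefix, that first key does too.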

The problem: I have around 10,000 hard-coded character patterns that I want to match against natural-language words to find the potential hyphenation points. In simplified terms, a pattern like 'put-er' inserts the $- into the word 'computer', giving 'comput-er'.

Currently I naively match every single pattern against each word by string searching, which costs me perhaps 20 ms per word. This *needs* to be at least 10 times faster. Caching words helps a lot, but IMHO it is not a solution.
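A minimal sketch of the naive approach described above, assuming the simplified pattern format from this post, where each $- marks one allowed break (real pattern sets such as TeX's use numeric priorities instead). It is written in Python for illustration, and the helper names `parse_pattern`, `insert_hyphens` and `hyphenate_naive` are hypothetical. Every pattern is tried against the word with a plain substring search, so the work grows with the number of patterns.

```python
def parse_pattern(pattern):
    """Split a simplified pattern like 'put-er' into ('puter', {3})."""
    letters, breaks = [], set()
    for ch in pattern:
        if ch == "-":
            breaks.add(len(letters))  # break before the next letter
        else:
            letters.append(ch)
    return "".join(letters), breaks


def insert_hyphens(word, points):
    """Insert '-' at each offset in the sorted list points."""
    pieces, last = [], 0
    for p in points:
        pieces.append(word[last:p])
        last = p
    pieces.append(word[last:])
    return "-".join(pieces)


def hyphenate_naive(word, patterns):
    """Try every pattern against the word with plain substring search."""
    points = set()
    for pattern in patterns:
        letters, breaks = parse_pattern(pattern)
        start = word.find(letters)
        while start != -1:  # a pattern may occur more than once
            points.update(start + b for b in breaks)
            start = word.find(letters, start + 1)
    # Only break points strictly inside the word make sense.
    return insert_hyphens(word, sorted(p for p in points if 0 < p < len(word)))


print(hyphenate_naive("computer", ["put-er"]))           # comput-er
print(hyphenate_naive("computer", ["put-er", "com-p"]))  # com-put-er
```

With around 10,000 patterns, every word costs at least 10,000 substring searches, however short the word is.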

How would you implement a faster solution? How can I use existing collections for that purpose?
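One way to use an existing collection is to index the patterns in a hash table (a Dictionary in Smalltalk) keyed by their letters, and then look up every substring of the word instead of searching for every pattern. The sketch below is Python with hypothetical names (`build_index`, `hyphenate_indexed`) and assumes the same simplified 'put-er' format as in the post.

```python
def parse_pattern(pattern):
    """Split a simplified pattern like 'put-er' into ('puter', {3})."""
    letters, breaks = [], set()
    for ch in pattern:
        if ch == "-":
            breaks.add(len(letters))
        else:
            letters.append(ch)
    return "".join(letters), breaks


def build_index(patterns):
    """Dictionary from a pattern's letters to its break offsets."""
    index = {}
    for pattern in patterns:
        letters, breaks = parse_pattern(pattern)
        index.setdefault(letters, set()).update(breaks)
    return index


def hyphenate_indexed(word, index):
    """Look up every substring of word up to the longest pattern length."""
    max_len = max((len(key) for key in index), default=0)
    points = set()
    for start in range(len(word)):
        for end in range(start + 1, min(len(word), start + max_len) + 1):
            breaks = index.get(word[start:end])
            if breaks:
                points.update(start + b for b in breaks)
    cuts = sorted(p for p in points if 0 < p < len(word))
    pieces, last = [], 0
    for p in cuts:
        pieces.append(word[last:p])
        last = p
    pieces.append(word[last:])
    return "-".join(pieces)


index = build_index(["put-er", "com-p"])
print(hyphenate_indexed("computer", index))  # com-put-er
```

The number of lookups depends on the word length and the longest pattern, not on how many patterns there are: a 10-letter word with patterns of at most 8 letters needs at most 10 × 8 = 80 dictionary lookups.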

Does anybody disagree with the hyphenation-pattern method at all?

Thank you for helping

Thomas J. Schrader

--

mailto thomas j schrader at web de

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] a simple TextHyphenator, IndexedBinarySearchTree

Georg Heeg
Thomas,

What would I try? I would use a variant of the trie algorithm. This
algorithm was used by the award-winning application of the 2008 Dynamic
Languages Shootout. You can find information in the paragraph "Seaside
mit Smalltalk gewinnt Dynamic Languages Shootout des Java-Spektrums:"
("Seaside with Smalltalk wins the Java-Spektrum Dynamic Languages Shootout")
at the bottom of the start page of www.heeg.de. As far as I remember, the
complete implementation including the trie is available in the Public Store
repository.

Georg

Georg Heeg eK, Dortmund and Köthen, HR Dortmund A 12812
Tel. +49-3496-214328, Fax +49-3496-214712

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Thomas Schrader
> Sent: Wednesday, 27 January 2010 10:25
> To: [hidden email]
> Subject: [vwnc] a simple TextHyphenator, IndexedBinarySearchTree

Re: [vwnc] a simple TextHyphenator, IndexedBinarySearchTree

Thomas Schrader
In reply to this post by Thomas Schrader
> I would use a variant of the trie algorithm.

Cool! Fast!! Thanks a lot.

Cheers

Thomas J. Schrader

Re: [vwnc] a simple TextHyphenator, IndexedBinarySearchTree

Mark Pirogovsky-3
In reply to this post by Thomas Schrader
Thomas,

Have you looked at the VW package "UIBasics-Internationalization",
which supports indexed searches in message catalogs?
Take a look at the classes UserMessage and
IndexedFileMessageCatalog; maybe you will find what you need there.
