[squeak-dev] [slightly OT] Searching List of items to find abbreviations

David Zmick
I am doing this in Python, but I would like your advice on it.

I am given a list like:
1: JCT BOX JS27 T 88-F-1
2: JCT BOX JS-2713 EE-15.2
3: JCT BOX JS32 H 116 A C 14.5
4: JCT BOX JS28 T 120-N-11
5: JCT BOX JS28 T-120-N-11
6: JCT BOX JS32 H 116 A C-14.5
7: JUNCTION BOX JS32 H 116 A C-14.5
and I need to find "similar" items. I have already written the part of my script that finds similar items; for example, lines 4 and 5 are similar. I used a simple regular expression generated from each line.

So, say your line was:
ASD 123
The regex would be:
A[!-/\s]?S[!-/\s]?D[!-/\s]?1[!-/\s]?2[!-/\s]?3[!-/\s]?
This matches anything that differs only by optional punctuation or whitespace between characters.
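A minimal Python sketch of that per-line pattern, for illustration only. It assumes separator characters in the source line are dropped before the pattern is built (the post shows only an alphanumeric example), and `similarity_pattern` / `similar_pairs` are hypothetical names. Every pair of lines is compared, so this is quadratic in the number of lines.

```python
import re

# Separator class from the post: ASCII '!' through '/' (covers - . / , etc.)
# plus whitespace.
SEP = r"[!-/\s]"


def similarity_pattern(line):
    """Build a pattern like A<sep>?S<sep>?D<sep>?... from a line.

    Assumption: separator characters in the source line are dropped, so
    'T 120-N-11' and 'T-120-N-11' produce the same pattern.
    """
    chars = [c for c in line if not re.fullmatch(SEP, c)]
    return re.compile("".join(re.escape(c) + SEP + "?" for c in chars))


def similar_pairs(lines):
    """Return 0-based index pairs (i, j), i < j, of lines that differ only by separators."""
    patterns = [similarity_pattern(line) for line in lines]
    pairs = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            # Check both directions: each pattern allows at most one
            # separator per gap, so the test is not perfectly symmetric.
            if patterns[i].fullmatch(lines[j]) or patterns[j].fullmatch(lines[i]):
                pairs.append((i, j))
    return pairs


items = [
    "JCT BOX JS27 T 88-F-1",
    "JCT BOX JS-2713 EE-15.2",
    "JCT BOX JS32 H 116 A C 14.5",
    "JCT BOX JS28 T 120-N-11",
    "JCT BOX JS28 T-120-N-11",
    "JCT BOX JS32 H 116 A C-14.5",
    "JUNCTION BOX JS32 H 116 A C-14.5",
]
print(similar_pairs(items))  # [(2, 5), (3, 4)]
```

In the post's 1-based numbering that is lines 3 and 6, and lines 4 and 5; line 7 is not paired because JUNCTION and JCT differ in letters, not punctuation.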

The next step is to find lines that are similar based on abbreviations, so that I can match lines containing JUNCTION with lines containing JCT, then check those results against the results of the first match to find the most likely candidates. I have tried this:

Use a regular expression built from the letters of an abbreviation; e.g., JCT would become J.*[C]?.*[T]?, so that the expression would find anything containing those letters in that order, with anything in between. But that does not work. Any ideas?
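One likely reason the attempt fails: [C]? and [T]? are optional and .* can match anything, so J.*[C]?.*[T]? effectively reduces to J.*, i.e. "J followed by anything" (it accepts the part number JS27, for instance). The sketch below is one possible fix, not a definitive method: every letter is required and the gaps stay inside a word (J\w*C\w*T\w*), and lines are then compared token by token. The assumptions are mine: tokens are split on the same separator class as the punctuation step, both lines must have the same number of tokens, only purely alphabetic tokens may act as abbreviations, and all function names are hypothetical.

```python
import re

# Same separator class as the punctuation step, used here to split tokens.
SEP_RUN = re.compile(r"[!-/\s]+")


def abbreviation_pattern(abbr):
    """JCT -> J, word chars, C, word chars, T, word chars (every letter required)."""
    return re.compile(r"\w*".join(re.escape(c) for c in abbr) + r"\w*")


def tokens_match(a, b):
    """True if the tokens are equal or the shorter one abbreviates the longer."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    # Assumption: only purely alphabetic tokens are treated as abbreviations,
    # so part numbers such as JS27 and JS28 never match each other.
    return short.isalpha() and abbreviation_pattern(short).fullmatch(long_) is not None


def tokenize(line):
    """Split on runs of punctuation/whitespace, dropping empty pieces."""
    return [t for t in SEP_RUN.split(line) if t]


def abbreviation_similar(line1, line2):
    """Same token count, and every token pair is equal or abbreviation-related."""
    t1, t2 = tokenize(line1), tokenize(line2)
    return len(t1) == len(t2) and all(tokens_match(x, y) for x, y in zip(t1, t2))


print(bool(re.fullmatch(r"J.*[C]?.*[T]?", "JS27")))  # True: the original pattern is too loose
print(abbreviation_similar("JCT BOX JS32 H 116 A C-14.5",
                           "JUNCTION BOX JS32 H 116 A C-14.5"))  # True
print(abbreviation_similar("JCT BOX JS27 T 88-F-1",
                           "JUNCTION BOX JS32 H 116 A C-14.5"))  # False
```

Because tokenizing splits on the punctuation class, "C 14.5" and "C-14.5" produce the same tokens, so this check also pairs line 3 with line 7 and partly overlaps the first step. Single-letter tokens such as A will match any word starting with A; whether that is acceptable depends on the data.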

--
David Zmick
/dz0004455\
http://david-zmick.co.cc

Re: [squeak-dev] [slightly OT] Searching List of items to find abbreviations

K. K. Subramaniam
On Saturday 11 Jul 2009 9:10:37 am David Zmick wrote:

> I am given a list like:
> 1: JCT BOX JS27 T 88-F-1
> 2: JCT BOX JS-2713 EE-15.2
> 3: JCT BOX JS32 H 116 A C 14.5
> 4: JCT BOX JS28 T 120-N-11
> 5: JCT BOX JS28 T-120-N-11
> 6: JCT BOX JS32 H 116 A C-14.5
> 7: JUNCTION BOX JS32 H 116 A C-14.5
> and I need to find "similar" items. I have already written the part of my
> script that finds similar items; for example, lines 4 and 5 are similar
You can pose such questions to the Method Finder (World Menu->open->Method
Finder) to discover options. I got a whole bunch of candidate methods for this
example (receiver, argument and expected answer, separated by periods):
  'JUNCTION' . 'JCT' . true

See http://wiki.squeak.org/squeak/1916

HTH .. Subbu