I am doing this in Python, but I would like your advice on it.
I am given a list like:
1: JCT BOX JS27 T 88-F-1
2: JCT BOX JS-2713 EE-15.2
3: JCT BOX JS32 H 116 A C 14.5
4: JCT BOX JS28 T 120-N-11
5: JCT BOX JS28 T-120-N-11
6: JCT BOX JS32 H 116 A C-14.5
7: JUNCTION BOX JS32 H 116 A C-14.5
and I need to find "similar" items. I have already written the part of my script that finds similar items; for example, lines 4 and 5 are similar. I used a simple regular expression that is generated from each line.
So say your line was:
ASD 123
the regex would be
A[!-/\s]?S[!-/\s]?D[!-/\s]?1[!-/\s]?2[!-/\s]?3[!-/\s]?
This finds any line that differs from the original only in punctuation or spacing.
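A minimal sketch of how such a pattern could be generated, to make the first step concrete. The names punctuation_regex and similar_pairs are made up for illustration, and the use of fullmatch (the whole candidate line must match) is an assumption, not necessarily what the actual script does:

```python
import re

# Optional separator: one ASCII character from '!' to '/', or whitespace
SEP = r"[!-/\s]?"

def punctuation_regex(line):
    """Build the punctuation-tolerant pattern for one line (hypothetical helper)."""
    # Drop characters that are separators themselves, keep everything else
    core = [c for c in line if not re.fullmatch(r"[!-/\s]", c)]
    return re.compile("".join(re.escape(c) + SEP for c in core))

def similar_pairs(lines):
    """Return 1-based (i, j) pairs where line j matches the pattern of line i."""
    pairs = []
    for i, first in enumerate(lines):
        pattern = punctuation_regex(first)
        for j in range(i + 1, len(lines)):
            # Assumption: the whole line has to match, not just a prefix
            if pattern.fullmatch(lines[j]):
                pairs.append((i + 1, j + 1))
    return pairs

lines = [
    "JCT BOX JS27 T 88-F-1",
    "JCT BOX JS-2713 EE-15.2",
    "JCT BOX JS32 H 116 A C 14.5",
    "JCT BOX JS28 T 120-N-11",
    "JCT BOX JS28 T-120-N-11",
    "JCT BOX JS32 H 116 A C-14.5",
    "JUNCTION BOX JS32 H 116 A C-14.5",
]
print(similar_pairs(lines))
```

Note that the pattern allows at most one separator character between two symbols, so lines that differ by two spaces in a row would not match.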
The next step is to find lines that are similar based on abbreviations, so I would be able to match lines with JUNCTION and JCT, then check the results from that match against the results from the first match and find the most likely candidates for similarities. I have tried this:
I build a regular expression from the letters in an abbreviation, e.g. JCT would look like J.*[C]?.*[T]? so that the expression would find anything that has those letters in that order, with anything in between, but that does not work. Any ideas?
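One likely reason the pattern misbehaves: in J.*[C]?.*[T]? everything after the J is optional, so it matches any text containing a J, and .* can run across word boundaries. A rough sketch of a stricter variant, assuming an abbreviation is an in-order subsequence of a single word that starts with the same letter. The names abbreviation_regex and expands_to are hypothetical:

```python
import re

def abbreviation_regex(abbrev):
    """Pattern for one word that contains all letters of abbrev, in order."""
    # Every letter is required; \w* allows extra word characters in between,
    # but never spaces or punctuation, so the match stays inside one word
    body = r"\w*".join(re.escape(c) for c in abbrev)
    return re.compile(r"\b" + body + r"\w*\b", re.IGNORECASE)

def expands_to(abbrev, word):
    """True if word could be a longer spelling of abbrev (or abbrev itself)."""
    return abbreviation_regex(abbrev).fullmatch(word) is not None

print(expands_to("JCT", "JUNCTION"))  # JUNCTION contains J, C, T in order
print(expands_to("JCT", "BOX"))       # BOX does not start with J
match = abbreviation_regex("JCT").search("JUNCTION BOX JS32 H 116 A C-14.5")
print(match.group(0) if match else None)
```

This is deliberately loose: JCT would also match a word like JACKET, so the hits would still need to be cross-checked against the results of the punctuation match to pick the most likely candidates.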
--
David Zmick
/dz0004455\
http://david-zmick.co.cc