I want to find a string in a very large number of strings (3 million and increasing).

Should one use a simple Set?
-> That means a lot of memory is used, and perhaps a lot of RAM is needed in the gem (at least 40 MB of data in total). Swapping?

Should I use an UnorderedCollection of Strings with an (equality) index?
-> How do I set an index on a set of Strings?
-> Perhaps like: aSetOfStrings createEqualityIndexOn: '' withLastElementClass: String ?

Another problem is that this set of strings changes twice a day, mostly by adding strings.

Any hints?

Marten

--
Marten Feldtmann
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
On 03/18/2016 02:34 PM, [hidden email] via Glass wrote:
> I want to find a string in a very large number of strings (3 millions and increasing).
> [...]
> Any hint ?

What is the key by which you look up the string?

If it is the entire string (the string is 'foobar' and I know I want 'foobar'), use a Set; this will be very efficient. (But if you know the entire string, why do you need to look it up at all?)

If it is a prefix of the string (the string is 'foobar' but I only know that I want the string that starts with 'foo'), use an index.

If you want to look up by some substring (the string is 'foobar' and I know only that I want a string that contains 'oba'), you might want to build a more complex structure.

Regards,
-Martin
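The three cases above can be illustrated with a small sketch. This is plain Python rather than GemStone Smalltalk, and it only mirrors the idea: a hash-based set for exact matches, a sorted sequence standing in for an ordered index for prefix matches, and a linear scan for substring matches (the case that would need a more elaborate structure at scale). The sample data and the helper names `with_prefix` and `containing` are invented for illustration.

```python
import bisect

# Hypothetical sample data; the real collection holds millions of strings.
strings = ["foobar", "foobaz", "quux", "barfoo"]

# Case 1, entire string known: a hash-based set answers "is it present?".
exact = set(strings)

# Case 2, prefix known: a sorted sequence stands in for an ordered index.
ordered = sorted(strings)


def with_prefix(prefix):
    """Return all strings in `ordered` that start with `prefix`."""
    # Binary search finds the first candidate; matches are contiguous.
    start = bisect.bisect_left(ordered, prefix)
    result = []
    for s in ordered[start:]:
        if not s.startswith(prefix):
            break
        result.append(s)
    return result


def containing(fragment):
    """Case 3, substring known: linear scan, since neither structure helps."""
    return [s for s in strings if fragment in s]


print("foobar" in exact)    # exact membership test
print(with_prefix("foo"))   # prefix lookup via the sorted sequence
print(containing("oba"))    # substring lookup by scanning everything
```

The scan in the third case touches every string, which is why a dedicated structure becomes worthwhile for substring queries on large collections.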
On 18.03.2016 at 23:31, Martin McClure wrote:
> If it is the entire string, (the string is 'foobar' and I know I want 'foobar') use a Set, this will be very efficient (but if you know the entire string, why do you need to look it up at all?)

The information I need is only whether the string is present in that set; that's all. I think I will start with a Set, though I thought it would be a waste of memory to have the whole thing loaded into gem memory, since set operations are memory-based ...

--
Marten Feldtmann
On 03/19/2016 12:11 AM, [hidden email] wrote:
> The information is: is the string present within that set ... thats all. I think I will start with a set - though I thought its a waste of memory to have the whole stuff loaded into the gem memory and set operations are memory based ...

It sounds like a Set is ideal, then. Memory should not be a problem when doing lookups, even with very large sets.

When doing a lookup, the hash of the string to be looked up is calculated first. This indicates where in the set the string will be if it is present. Then only a small portion of the set, including that position, is faulted into memory, and the lookup is completed. The entire set never needs to be in memory at once.

Regards,
-Martin
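The lookup described above can be modelled with a toy example. This is a conceptual Python sketch, not GemStone's actual implementation: the class `PagedStringSet`, its bucket count, and the `pages_read` counter are all assumptions made for illustration. Each bucket stands in for a separately stored portion of the set, and a lookup touches exactly one of them.

```python
import zlib


class PagedStringSet:
    """Toy set whose buckets stand in for separately stored pages."""

    def __init__(self, bucket_count=1024):
        self.buckets = [[] for _ in range(bucket_count)]
        self.pages_read = 0  # counts simulated page faults during lookups

    def _index(self, s):
        # A deterministic hash of the string selects exactly one bucket.
        return zlib.crc32(s.encode("utf-8")) % len(self.buckets)

    def add(self, s):
        bucket = self.buckets[self._index(s)]
        if s not in bucket:
            bucket.append(s)

    def includes(self, s):
        # Only the single bucket named by the hash is "faulted in".
        self.pages_read += 1
        return s in self.buckets[self._index(s)]


strings = PagedStringSet()
for i in range(100_000):
    strings.add(f"string-{i}")

print(strings.includes("string-4711"))   # present
print(strings.includes("not-in-there"))  # absent
print(strings.pages_read, "of", len(strings.buckets), "buckets touched")
```

The point of the model is that the cost of a lookup depends on the size of one bucket, not on the size of the whole collection, which matches the explanation that the entire set never has to be in memory at once.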