while working with a large Dictionary with URI keys, I stumbled on URI's implementation of #hash and #=

    = aURI
        ^self class = aURI class
            and: [self asString = aURI asString]

    URI>>hash
        ^ self asString hash

needless to say that both are horrible with regards to performance, especially #hash. I've started to implement = and hash as extensions in some subclasses (FileURL, HttpURL), but IMHO a more sane implementation should be in the base image.

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Hi,
Witch version of VW are you running? A lot of work on hashing was done in VW 7.6, so it may be better to wait for the 7.6 NC release, which should be published shortly. They have probably solved your problem.

ciao

Giorgio

On Wed, Apr 23, 2008 at 1:12 PM, Holger Kleinsorgen <[hidden email]> wrote:
> while working with a large Dictionary with URI keys, I stumbled on URI's
oops, obviously was which... not a witch..
On Wed, Apr 23, 2008 at 3:10 PM, giorgio ferraris <[hidden email]> wrote:
> Hi,
In reply to this post by giorgiof
No, that didn't change in 7.6.
At 09:10 AM 4/23/2008, giorgio ferraris wrote: Hi,
--
Alan Knight, Engineering Manager, Cincom Smalltalk
In reply to this post by Holger Kleinsorgen-3
Holger,
Would you mind sending me a dataset of reasonable size so I can look at this?

Thanks,
Andres.
Holger,
I took a quick look and I have a feeling that you are running into asString being very expensive, particularly because the resulting string will be hashed and then thrown away.

I did a small experiment with FileURL, and it's not hard to get a 3x performance boost by not creating the string at all and just hashing the stuff the string is manufactured from.

Other URLs see more or less the same speedup. An improved PartialURL hash method runs almost 3x faster. And, after fixing a number of bugs in URLwithAuthority, the speedup factor there was about 4.7x.

Is this what you had in mind?

Andres.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Andres Valloud
Sent: Wednesday, April 23, 2008 9:29 AM
To: [hidden email]
Subject: Re: [vwnc] URI hash and =

Holger,

Would you mind sending me a dataset of reasonable size so I can look at this?

Thanks,
Andres.
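The idea of "hashing the stuff the string is manufactured from" can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchURL and its instance variables (scheme, host, path, fragment) are hypothetical names and are not taken from the VisualWorks URI hierarchy, and the code is not the change Andres made.

```smalltalk
"Hypothetical class SketchURL with instance variables scheme, host, path
 and fragment. The names are illustrative only and are not taken from the
 VisualWorks URI hierarchy."

SketchURL>>hash
    "Combine the hashes of the parts instead of hashing self asString,
     so no temporary String is built and discarded on every lookup."
    ^((scheme hash bitXor: host hash)
        bitXor: path hash)
        bitXor: fragment hash
```

Plain bitXor: is used here only to keep the sketch short; it ignores the order of the parts, so a real implementation would probably mix the component hashes more thoroughly. Whatever combination is chosen, two URLs that are = must still answer the same hash.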
Valloud, Andres wrote:
> Holger,
>
> I took a quick look and I have a feeling that you are running into
> asString being very expensive, particularly because the resulting string
> will be hashed and then thrown away.

given the new string hash in 7.6, results should be less devastating (I encountered the problem in 7.4.1).

the problem was also boosted by the fact that many of the URIs only differed in the fragment part, e.g.

http://www.my-hostname-is-longer-than-yours.com/insane-ontology#topic
http://www.my-hostname-is-longer-than-yours.com/insane-ontology#another_topic

and so on.

the string conversion overhead is also visible. I have to access the dictionary quite frequently, so implementing hash / = without asString should still benefit from the performance improvements you've noticed.

> I did a small experiment with FileURL, and it's not hard to get a 3x
> performance boost by not creating the string at all and just hashing the
> stuff the string is manufactured from.
>
> Other URLs see more or less the same speedup boost. An improved
> PartialURL hash method runs almost 3x faster. And, after fixing a
> number of bugs in URLwithAuthority, the speedup factor there was about
> 4.7x.

coolness
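For keys like the ones above, which share a long prefix and differ only in the fragment, an = that compares the parts directly could look roughly like this. It is a sketch under assumptions, not the VisualWorks code: SketchURL, its instance variables (scheme, host, path, fragment) and the accessors of the same names are hypothetical.

```smalltalk
"Hypothetical class SketchURL with instance variables scheme, host, path
 and fragment, plus plain accessors of the same names. None of this is
 taken from the VisualWorks URI hierarchy."

SketchURL>>= anObject
    "Compare part by part without building strings. The fragment is
     checked first because, in the data set described above, it is the
     part most likely to differ."
    ^self class == anObject class
        and: [fragment = anObject fragment
        and: [path = anObject path
        and: [host = anObject host
        and: [scheme = anObject scheme]]]]
```

Whether checking the fragment first pays off depends on the data and on how the string comparison itself is implemented, so it would need measuring. Any such = has to be paired with a hash computed from the same parts, so that equal URLs keep answering equal hashes.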
Ok, I will look into the = methods too then.
Andres.