Greetings fellow ghosts of GemStone past. It's been nice seeing some of your names in my inbox again after all these years.

SmallDateAndTime sounds like a great idea. Back in 2004 I posted on the predecessor to this list about canonicalizing dates and curve descriptors. I don't think there's an archive available anywhere for those older posts, so I'm including it again below in case anyone's interested.

BTW, the work described was on JPMorgan's Kapital system. See slide 8 of the ESUG talk given a few months after the post below: http://www.esug.org/data/ESUG2004/ValueOfSmalltalk.pdf

-Keith

----- Forwarded Message -----
From: Keith Piraino <[hidden email]>
To: "[hidden email]" <[hidden email]>
Sent: Monday, January 12, 2004, 05:52:22 PM EST
Subject: Canonicalization + 2 Spaces

I’ve been working on canonicalizing some objects recently in a GemStone/VisualWorks system. I haven’t seen discussion in the past about some of the 2 space issues that come up in this context when you’re dealing with both GemStone and a client image. I’ll describe the work we’ve done, and I’d be interested in comments from anyone about how they’ve tackled similar issues…

The first phase of this work involved dates. We ran a scan and found that we had 15 million date instances in one of our databases, but they really only represented 17,000 different days. The few hundred MB wasted in our (much larger) databases isn’t great, but our bigger concern was the tens of MB of memory these duplicate instances took up in images when we faulted them in.

The canonicalization on each side was simple enough. The range of dates we’re interested in is 200 years, amounting to about 70K instances. In GS we pre-build an array of the canonical instances, and the number of days since January 1, 1901 is the index into the array. In VW we have a similar array that is lazily populated as needed.

The tricky part is the mapping between the two and supporting “independent creation” in VW. We don’t want to have to fault in all 70K dates up front or, worse, have our VW date creation code forwarding into the gem at arbitrary points to find the right instance. Tests faulting all the dates added 30 seconds to our login time, which is definitely not desirable.

Instead we override the faulting and flushing behavior on dates. We override #newFromGSObjectReport: and parse the report to get the offset into the canonical array. If a corresponding VW date has already been created we map to that instead of the instance in the report. If it’s a new instance to that image we just ensure it ends up in the local canonical array.
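As an illustration only, the client-side half of the scheme above (a lazily populated array indexed by days since January 1, 1901, plus the "map to the existing instance on fault" rule) might look like the following. This is a Python sketch, not the original Smalltalk; the class and method names and the exact end of the 200-year range are assumptions made for the example, and `resolve_faulted` merely stands in for the overridden faulting hook.

```python
from datetime import date, timedelta

EPOCH = date(1901, 1, 1)                      # index 0 of the canonical array
SPAN_DAYS = (date(2101, 1, 1) - EPOCH).days   # assumed 200-year range, about 73K slots


class CanonicalDates:
    """Hypothetical client-side table of canonical dates, filled lazily."""

    def __init__(self):
        self._slots = [None] * SPAN_DAYS

    def index_of(self, d):
        """Days since the epoch, used as the index into the canonical array."""
        offset = (d - EPOCH).days
        if not 0 <= offset < SPAN_DAYS:
            raise ValueError(f"{d} is outside the canonical range")
        return offset

    def canonical(self, d):
        """Return the single shared instance for d, creating it on first use."""
        offset = self.index_of(d)
        if self._slots[offset] is None:
            self._slots[offset] = EPOCH + timedelta(days=offset)
        return self._slots[offset]

    def resolve_faulted(self, offset, faulted):
        """Stand-in for the faulting hook: prefer an instance already known locally."""
        if not 0 <= offset < SPAN_DAYS:
            raise ValueError(f"offset {offset} is outside the canonical range")
        existing = self._slots[offset]
        if existing is not None:
            return existing            # map to the local canonical instance
        self._slots[offset] = faulted  # first sighting: the faulted copy becomes canonical
        return faulted
```

With a table like this, two independent requests for the same day return the identical object, and a date arriving from the server resolves to whatever instance the image already holds for that offset.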
We hook into flushing by using #asGSObjectInSession:. During the first flush we use #privatePerform: to retrieve the encoded oops of all 70K canonical dates. As individual dates are flushed we can then create the appropriate GbsObjects and map them. This way, even if the date instance is created locally in VW, it will always end up resolving to the single corresponding canonical instance in GS. Faulting 70K encoded oops only takes about a second since they’re SmallIntegers. We process the report ourselves to avoid intermediate GbsObjects, which speeds things up a little more.

The next phase of this work involved objects that function as multi-part keys in our application. They hold more complex data but are always uniquely identified by a name (Symbol). Years of application code have relied on the fact that these objects are canonical, and you’ll never have more than one with the same name. Comparisons use ==, not =. Until recently this canonicalization was maintained by storing the instances in multi-level dictionaries that were faulted into the image. This approach became problematic as the number of instances increased and a new requirement came along to allow new keys to be generated at any time, not just at defined points.

Some of the basics of our new solution are similar to the date approach. There’s a canonical structure on each side (a dictionary) that is not replicated. When we fault an object we check for an existing VW instance and if necessary map to that instead.

One new wrinkle on the faulting side is stubs. Since the application relies so heavily on identity comparison, we have to handle the case where the object was created locally and registered in the image-side dictionary, and later we attempt to create and map a stub for the real persistent object. If we allow the stub to be created we effectively have two of our keys with the same name that are no longer identical.
To prevent this we’ve hacked even more deeply into core replication methods like #clientObject:namedBuffer:indexableBuffer:slot:lookupOop:forwarder:secondPassLog:cached:keeper:. If we’re about to create a stub for an instance of one of our key classes, we first use fetch operations to retrieve the name, which is one of the inst var values. (Note that #privateExecute: at this point can cause moreTraversal errors.) We then check the local dictionary, and if an instance with that name has already been created we resolve the replication to that instance instead of creating a stub. Otherwise we allow the stub to be created, but then add the stub to the local dictionary.

On the flushing side we use #privateExecute: to see if the object exists already in the persistent dictionary. If it does, we return the encoded oop without actually reading the object’s data page, using #_instVarAsEncodedOop: (thanks Norm). From there we can just create a GbsObject and map it, just like for dates.

If the object doesn’t exist, things get trickier. We have to ensure that it gets added to the persistent GS dictionary. To get this right in the face of concurrency conflicts and various failure scenarios, knowledge of these lazily flushed instances had to be embedded into our transaction framework.

The end result is that these objects can just be created on the fly in any image (or gem for that matter) and we always guarantee canonicalization in both spaces. We’re happy with the result but curious if anyone has addressed this in a way that involved diving less deeply into GBS…

Thanks - Keith

_______________________________________________
GemStone-Smalltalk mailing list
[hidden email]
https://lists.gemtalksystems.com/mailman/listinfo/gemstone-smalltalk
Hi Keith,

Nice to hear from you. I had also implemented Date instance canonicalization much as you described. It was a big improvement for the project at the time because our application model had previously replicated many dates into GBS caches. Didn't GemTalk make Date instances immediate objects since maybe around GS 3.0?

For DateTime replication tuning, storing integers was better than canonical instances but required consistent use of conversion accessors. Application code in GS could potentially consume many oops for the transient DateTime instances that might get created, but in practice that was not a problem where code was tuned this way. Having DateTime instances as immediates would reduce the chance that the accessor trick wouldn't be most efficient for some application usage scenarios in the gem.

Oh, speaking of replication tuning, one of the big improvements I achieved was through self-generating custom replication specs. In tuning mode the replication was one level deep (except for some collections), and stubs that got faulted for a declared context/replication would get recorded into contextual replication specs. It was an iterative tuning process, but the result was that replication would include no more than was needed and had only deliberate stub faulting later. Avoiding growth of GBS caches had become mission critical, and this achieved it until application code could be reimplemented to run in gems alone. The loss of efficient copy replication had made this necessary, but it was also used to tune VW+GBS applications to consistently offer near-instant response times.

Looking back, one of the biggest problems with Smalltalk in general was that people could too easily write inefficient code. For me it became a full-time job cleaning and tuning code that got produced at a fast rate for releases. At least the rigors of C coding brought attention to efficiency from the start. I'm happy to be retired now.
Paul Baumann

On Sun, Jun 28, 2020, 10:35 AM Keith Piraino via GemStone-Smalltalk <[hidden email]> wrote: