Hi All,
having just stumbled across the fact that only characters with codes from 0 to 255 are unique, I wondered whether anyone has considered doing the following:

    Character addClassVarNamed: 'LargeCodeCharacters'.

    Character class methods for class initialization
    initialize
        LargeCodeCharacters := WeakSet new

    Character class methods for instance creation
    value: anInteger
        "Answer the Character whose value is anInteger."

        | theCharacter existingInstanceOrNil |
        anInteger <= 255 ifTrue:
            [^CharacterTable at: anInteger + 1].
        theCharacter := self basicNew setValue: anInteger.
        ^(existingInstanceOrNil := LargeCodeCharacters like: theCharacter)
            ifNil: [LargeCodeCharacters add: theCharacter]
            ifNotNil: [existingInstanceOrNil]

Yes, this has the potential to create a lot of space overhead, but only for artificial code that enumerates over all characters. I suspect that in most cases the actual set of active characters would be quite small.

(Alternatives that are indexed by integers might also work well, e.g. a flat WeakValueDictionary that used a WeakArray for its values.)

Just a thought...
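To make the premise concrete, here is a minimal workspace sketch of the identity behaviour the post describes. It assumes an image of the kind under discussion, where only codes 0 to 255 come from a shared table; the temporaries `a` and `b` are illustrative, and the result of the last expression depends on the image and VM, so no result is claimed for it.

```smalltalk
"Evaluate each expression separately with 'print it'."
| a b |
a := Character value: 65.
b := Character value: 65.
a == b.   "identity test: both come from the shared table for codes 0..255"

a := Character value: 955.
b := Character value: 955.
a = b.    "equality test: compares the code values"
a == b.   "identity test: per the post above, not guaranteed for codes above 255"
```

The proposed `value:` method would make the last expression answer the same as the first identity test, at the cost of a lookup in the weak set.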
2008/1/28, Eliot Miranda <[hidden email]>:
> Hi All,
>
> having just stumbled across the fact that only characters with codes
> from 0 to 255 are unique I wondered whether anyone has considered doing the
> following:
>
> Character addClassVarNamed: 'LargeCodeCharacters'.
>
> Character class methods for class initialization
> initialize
>     LargeCodeCharacters := WeakSet new
>
> Character class methods for instance creation
> value: anInteger
>     "Answer the Character whose value is anInteger."
>
>     | theCharacter existingInstanceOrNil |
>     anInteger <= 255 ifTrue:
>         [^CharacterTable at: anInteger + 1].
>     theCharacter := self basicNew setValue: anInteger.
>     ^(existingInstanceOrNil := LargeCodeCharacters like: theCharacter)
>         ifNil: [LargeCodeCharacters add: theCharacter]
>         ifNotNil: [existingInstanceOrNil]
>
> Yes this has the potential to create a lot of space overhead, but only for
> artificial codes that enumerate over all characters. I suspect that for
> most cases the actual set of active characters would be quite small.
>
> (Alternatives that are indexed by integers might also work well, e.g. a flat
> WeakValueDictionary that used a WeakArray for its values).

Lord no, please no more Weak* collections. Kicking out Weak* collections was one of the major performance fixes we did in Seaside. They don't scale; they kill you in production.

Cheers
Philippe
On 28-Jan-08, at 10:23 AM, Eliot Miranda wrote:

> having just stumbled across the fact that only characters with
> codes from 0 to 255 are unique I wondered whether anyone has
> considered doing the following:

What would be the motivation?

Colin
Philippe Marschall wrote:
>> (Alternatives that are indexed by integers might also work well, e.g. a flat
>> WeakValueDictionary that used a WeakArray for its values).
>
> Lord no, please no more Weak* collections. That was one of the major
> performance fixes we did in Seaside, kicking Weak* collections. They
> don't scale, they kill you in production.

It's finalization that kills you, not weak collections per se. If it were the weak collections themselves, symbol management would cause the same issues. For the case in question you could shrink the character table on system startup / shutdown, which would avoid the finalization issues.

Cheers,
- Andreas
On Jan 28, 2008, at 21:51 , Andreas Raab wrote:
> Philippe Marschall wrote:
>>> (Alternatives that are indexed by integers might also work well,
>>> e.g. a flat
>>> WeakValueDictionary that used a WeakArray for its values).
>> Lord no, please no more Weak* collections. That was one of the major
>> performance fixes we did in Seaside, kicking Weak* collections. They
>> don't scale, they kill you in production.
>
> It's finalization that kills you, not weak collections per se. If
> it were then symbol management should cause the same issues. For
> the case in question you could shrink the character table on system
> startup / shutdown which would avoid the finalization issues.

Well, instances of characters are mostly temporary. Strings actually store binary numbers, not Character instances; they create Characters on the fly.

One would have to measure the space and performance trade-offs of looking up unique Character instances vs. simply creating characters when needed. My hunch is that it doesn't matter, so it is not worth the added complexity.

- Bert -
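A small workspace sketch of the point about strings, assuming a Squeak-style byte string whose raw indexed slots hold integer codes. The variable `s` is illustrative; `basicAt:` is used here only to peek at the stored element.

```smalltalk
"Evaluate each expression separately with 'print it'."
| s |
s := 'abc'.
s basicAt: 1.   "the raw stored element: an integer code, not a Character"
s at: 1.        "the Character answered for that code on access"
```

Because the string keeps only the codes, a Character object for a large code exists only for as long as something is holding the result of `at:`.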
Making Character immediate, like SmallInteger, instead of an OOP would probably make a difference: both unique and fast, at the price of added complexity in the VM. But Eliot must know this for sure.

Nicolas

Bert Freudenberg wrote:
> On Jan 28, 2008, at 21:51 , Andreas Raab wrote:
>
>> Philippe Marschall wrote:
>>>> (Alternatives that are indexed by integers might also work well,
>>>> e.g. a flat
>>>> WeakValueDictionary that used a WeakArray for its values).
>>> Lord no, please no more Weak* collections. That was one of the major
>>> performance fixes we did in Seaside, kicking Weak* collections. They
>>> don't scale, they kill you in production.
>>
>> It's finalization that kills you, not weak collections per se. If it
>> were then symbol management should cause the same issues. For the case
>> in question you could shrink the character table on system startup /
>> shutdown which would avoid the finalization issues.
>
> Well, instances of characters are mostly temporary - Strings actually
> store binary numbers, not Character instances, they create Characters on
> the fly.
>
> One would have to measure the space and performance trade-offs of
> looking up unique Character instances vs. simply creating characters
> when needed. My hunch is it doesn't matter so is not worth the added
> complexity.
>
> - Bert -
Eliot,
> (Alternatives that are indexed by integers might also work well, e.g. a flat WeakValueDictionary that used a WeakArray
> for its values).

Yes, there would be no reason to do basicNew.

But I need to ask the same question Colin asked: what would we gain? Most of the time, characters are in strings, so there are not many real instances around. For writing a parser (hmm) it might make things a bit faster, but not much.

Tagged immediate character objects would have been OK (as in VisualWorks), but that puts some pressure on integers...

-- Yoshiki
Yoshiki Ohshima wrote:
> Eliot,
>
>> (Alternatives that are indexed by integers might also work well, e.g. a flat WeakValueDictionary that used a WeakArray
>> for its values).
>
> Yes, there would be no reason to do basicNew.
>
> But I need to ask the same question Colin asked; what would we gain?
> For most of the time, characters are in strings so there are not many
> real instances around. For writing a parser (hmm) it might make
> things a bit faster, but not much.
>
> Tagged immediate character objects would have been ok (like
> VisualWorks), but it puts some pressure on integers...
>

Sure, 1 bit less.
This will have to be reconsidered once 64-bit Squeak becomes widespread.

Nicolas

> -- Yoshiki
nicolas cellier wrote:
> Yoshiki Ohshima wrote:
>> Eliot,
>>
>>> (Alternatives that are indexed by integers might also work well, e.g.
>>> a flat WeakValueDictionary that used a WeakArray
>>> for its values).
>>
>> Yes, there would be no reason to do basicNew.
>>
>> But I need to ask the same question Colin asked; what would we gain?
>> For most of the time, characters are in strings so there are not many
>> real instances around. For writing a parser (hmm) it might make
>> things a bit faster, but not much.
>>
>> Tagged immediate character objects would have been ok (like
>> VisualWorks), but it puts some pressure on integers...
>>
>
> Sure, 1 bit less.
> This has to be thought again when 64 bits Squeak will spread.
>
> Nicolas
>

But we do not need to reserve two tag bits for every case:

    xxx..xxx1 is a SmallInteger on 31 bits
    xxx..xx10 is a Character
    xxx..xx00 is an OOP

This does not put pressure on integers.

Nicolas
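The three bit patterns above can be sketched as a decoding rule. This is only an illustration in workspace Smalltalk, not VM code: the block name `classify` and the sample words are hypothetical, and a 32-bit word is assumed.

```smalltalk
"Classify a word by its low tag bits, following the scheme above."
| classify |
classify := [:word |
    (word bitAnd: 1) = 1
        ifTrue: [#SmallInteger]                  "xxx..xxx1"
        ifFalse: [(word bitAnd: 3) = 2
            ifTrue: [#Character]                 "xxx..xx10"
            ifFalse: [#OOP]]].                   "xxx..xx00"

classify value: (65 bitShift: 1) + 1.   "payload 65 with the SmallInteger tag"
classify value: (65 bitShift: 2) + 2.   "payload 65 with the Character tag"
classify value: 16r1000.                "low bits 00: an object pointer"
```

Testing the lowest bit first is what leaves SmallInteger its full 31 bits; only the even words are split further between Character and OOP, so a Character payload would get 30 bits.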
nicolas cellier wrote:
> But we do not need to reserve two tag bits for every case:
>
> xxx..xxx1 is a SmallInteger on 31 bits
> xxx..xx10 is a Character
> xxx..xx00 is an OOP
>
> This does not put pressure on integer

You may want to read this post:

http://lists.squeakfoundation.org/pipermail/vm-dev/2006-January/000429.html

It outlines a similar approach, except that it adds 64 immediate classes instead of one and in return reduces the number of available bits to 24 (which makes for nice 1x24, 2x12, 3x8, 4x6 usage patterns in characters, immediate points, short colors, etc.).

Cheers,
- Andreas
Andreas Raab wrote:
> nicolas cellier wrote:
>> But we do not need to reserve two tag bits for every case:
>>
>> xxx..xxx1 is a SmallInteger on 31 bits
>> xxx..xx10 is a Character
>> xxx..xx00 is an OOP
>>
>> This does not put pressure on integer
>
> You may want to read this post:
>
> http://lists.squeakfoundation.org/pipermail/vm-dev/2006-January/000429.html

Yes, thanks.

> It outlines a similar approach except that it adds 64 immediate classes
> instead of one and in return reduces the number of available bits to 24
> (which makes for nice 1x24, 2x12, 3x8, 4x6 usage patterns in characters,
> immediate points, short colors etc)
>
> Cheers,
> - Andreas