Hi All,
SparseLargeTable appears not to be sparse at all. If you look at SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated instances and then make them sparse by sending zapDefaultOnlyEntries.
It seems to me that SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the existing approach won't work.
I wondered whether anyone else had already fixed this or whether there's a good reason not to lazily initialize. Best Eliot
|
At Sat, 1 Nov 2008 12:12:21 -0700,
Eliot Miranda wrote: > > [1 <multipart/alternative (7bit)>] > [1.1 <text/plain; ISO-8859-1 (7bit)>] > > [1.2 <text/html; ISO-8859-1 (7bit)>] > Hi All, > > SparseLargeTable appears not to be sparse at all. If you look at SparseLargeTable>> > initChunkSize:size:arrayClass:base:defaultValue: you'll see tat each bin is initialized with an instance of the base > class rather than being filled lazily. Further, instead of pvtNoCheckAt:put: lazily creating an entry in each bin as > needed it simply throws away the write if the bin is empty. Instead the usage pattern is to create fully-populated > instances and then make them sparse by sending zapDefaultOnlyEntries. Correct. As you saw, it is designed for the usage pattern of creating a read-only table once. > It seems to me that SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: should leave bins empty > until pvtNoCheckAt:put: puts other than the default value. Since I want to create a sparse table with 2^32 entries the > existing approach won't work. Ah ha. > I wondered whether anyone else had already fixed this or whether there's a good reason not to lazily initialize. I'm not aware of anybody who fixed. Well, if you need to hold 2^32 entries, you should do whatever it takes and doing the lazy initialization seems to be good... -- Yoshiki |
On Sat, Nov 1, 2008 at 2:49 PM, Yoshiki Ohshima <[hidden email]> wrote: At Sat, 1 Nov 2008 12:12:21 -0700, Here's what I came up with. Note the use of \\ in noCheckAt:[put:] which avoids an extra arithmetic op (a multiplication). You might consider replacing the relevant parts of SparseLargeTable with this as I don't see the utility of filling the bins eagerly.
SparseLargeArray.st (3K) Download Attachment |
Yes, thank you, this is a good idea to restore lazy initialization.
Current strategy is strange. Reading current code, I wonder if the intention was to make an access (self noCheckAt: size + 1) fail when (size \\ chunkSize ~= 0) ? but this should already be handled by pvtCheckIndex: ... Note that using \\ is preferable for code readability, but not for performance if you use LargeInteger: | int32 q r | int32 := 1 << 31. { [q := int32 - 1 // 1024 + 1. r := int32 - 1 \\ 1024 + 1] bench. [q := int32 - 1 // 1024 + 1. r := int32 - 1 - (q-1*1024) + 1] bench. }. #('30482.50349930014 per second.' '42663.0673865227 per second.') Yes because, primitive 31 fails and super revert to the second form which was inlined in SparseLargeTable... Sure this does not really matter, I doubt this access be the bottleneck of any code, and maybe you already hacked primitive 31... Nicolas Eliot Miranda a écrit : > > > On Sat, Nov 1, 2008 at 2:49 PM, Yoshiki Ohshima <[hidden email] > <mailto:[hidden email]>> wrote: > > At Sat, 1 Nov 2008 12:12:21 -0700, > Eliot Miranda wrote: > > > > [1 <multipart/alternative (7bit)>] > > [1.1 <text/plain; ISO-8859-1 (7bit)>] > > > > [1.2 <text/html; ISO-8859-1 (7bit)>] > > Hi All, > > > > SparseLargeTable appears not to be sparse at all. If you > look at SparseLargeTable>> > > initChunkSize:size:arrayClass:base:defaultValue: you'll see tat > each bin is initialized with an instance of the base > > class rather than being filled lazily. Further, instead of > pvtNoCheckAt:put: lazily creating an entry in each bin as > > needed it simply throws away the write if the bin is empty. > Instead the usage pattern is to create fully-populated > > instances and then make them sparse by sending zapDefaultOnlyEntries. > > Correct. As you saw, it is designed for the usage pattern of > creating a read-only table once. > > > It seems to me that > SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: > should leave bins empty > > until pvtNoCheckAt:put: puts other than the default value. Since > I want to create a sparse table with 2^32 entries the > > existing approach won't work. > > Ah ha. > > > I wondered whether anyone else had already fixed this or whether > there's a good reason not to lazily initialize. > > I'm not aware of anybody who fixed. Well, if you need to hold 2^32 > entries, you should do whatever it takes and doing the lazy > initialization seems to be good... > > > Here's what I came up with. Note the use of \\ in noCheckAt:[put:] > which avoids an extra arithmetic op (a multiplication). You might > consider replacing the relevant parts of SparseLargeTable with this as I > don't see the utility of filling the bins eagerly. > > > > > > -- Yoshiki > > > > ------------------------------------------------------------------------ > > |
There was a bug in the code I posted. I forgot to initialize new chunks with the default value. Also, atAllPut: has an obvious and convenient implementation. So the attached...
On Mon, Nov 3, 2008 at 3:12 PM, nicolas cellier <[hidden email]> wrote: Yes, thank you, this is a good idea to restore lazy initialization. SparseLargeArray.st (3K) Download Attachment |
In reply to this post by Nicolas Cellier-3
|
Eliot Miranda a écrit :
> > Sure this does not really matter, I doubt this access be the > bottleneck of any code, and maybe you already hacked primitive 31... > > > No, but if you have, I'm all ears :) > > > ------------------------------------------------------------------------ > > Hehe, I don't even know the man-hour cost of such a hack... The day when identified as a real bottleneck i will have a look. Concerning SparseLargeTable, I can understand the utility of base as: any element of index < base is known to be = defaultValue. This should economize some space by reducing basicSize requirement... But it does not ! Browse #new:chunkSize:arrayClass:base:defaultValue: Also, with base ~= 1, things like #printString become strange: (SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1) printString. -> 'a SparseLargeTable(-1 -1)' Though its size is 3, and: | tmp | tmp := SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 defaultValue: -1. (1 to: 3) collect: [:i | tmp at: i] -> #(-1 -1 -1) Also, in #analyzeSpaceSaving total should be either total := size - base + 1. or: total := size. but not: total := size - base. as you proposed. Also, lastChunkSize is not set to a correct size. It should be: (lastChunkSize := size - base + 1 \\ chunkSize) = 0 ifTrue: [lastChunkSize := chunkSize]. which can be more efficiently written: lastChunkSize := size - base \\ chunkSize + 1. Well, I guess every one use with base = 1... Nicolas |
Hmm, you must also define
arrayClass ^arrayClass otherwise, printString might fail. nicolas cellier a écrit : > Eliot Miranda a écrit : >> >> Sure this does not really matter, I doubt this access be the >> bottleneck of any code, and maybe you already hacked primitive 31... >> >> >> No, but if you have, I'm all ears :) >> >> >> ------------------------------------------------------------------------ >> >> > > Hehe, I don't even know the man-hour cost of such a hack... > The day when identified as a real bottleneck i will have a look. > > Concerning SparseLargeTable, I can understand the utility of base as: > any element of index < base is known to be = defaultValue. > This should economize some space by reducing basicSize requirement... > But it does not ! > Browse #new:chunkSize:arrayClass:base:defaultValue: > > Also, with base ~= 1, things like #printString become strange: > > (SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 > defaultValue: -1) printString. > -> 'a SparseLargeTable(-1 -1)' > > Though its size is 3, and: > | tmp | > tmp := SparseLargeTable new: 3 chunkSize: 3 arrayClass: Array base: 2 > defaultValue: -1. > (1 to: 3) collect: [:i | tmp at: i] > -> #(-1 -1 -1) > > Also, in #analyzeSpaceSaving total should be either > total := size - base + 1. > or: > total := size. > but not: > total := size - base. > as you proposed. > > Also, lastChunkSize is not set to a correct size. > It should be: > (lastChunkSize := size - base + 1 \\ chunkSize) = 0 ifTrue: > [lastChunkSize := chunkSize]. > which can be more efficiently written: > lastChunkSize := size - base \\ chunkSize + 1. > > Well, I guess every one use with base = 1... > > Nicolas > > > |
In reply to this post by Nicolas Cellier-3
Nicholas, thanks for this checking! See below... On Mon, Nov 3, 2008 at 4:44 PM, nicolas cellier <[hidden email]> wrote:
Eliot Miranda a écrit : The old printOn: was wrong. It should read printElementsOn: aStream
aStream nextPut: $(. self isEmpty ifFalse: [base to: size + base - 2 do:
[:index | aStream print: (self at: index); space]. aStream print: (self at: size + base - 1)].
aStream nextPut: $)
Yes. Also, lastChunkSize is not set to a correct size. No that's not right. base determines the first index, so that e.g. SparseLargeArray new: (2 raisedToInteger: 32)
chunkSize: 32 * 1024 arrayClass: ByteArray base: 0
defaultValue: 0provides a bye-addressible array of 2^32 elements with indices 0 through (2 raisedTo: 32) - 1.
I think my code is correct. e.g. { ((SparseLargeTable new: 15 chunkSize: 8 arrayClass: Array base: 2 defaultValue: -1) basicAt: 2) size. ((SparseLargeTable new: 15 chunkSize: 8 arrayClass: Array base: 3 defaultValue: -1) basicAt: 2) size.
((SparseLargeTable new: 16 chunkSize: 8 arrayClass: Array base: 2 defaultValue: -1) basicAt: 2) size. ((SparseLargeTable new: 16 chunkSize: 8 arrayClass: Array base: 3 defaultValue: -1) basicAt: 2) size
} #(7 7 8 8) I've attached the fixed code. Multilingual-Collection.st (14K) Download Attachment |
Eliot Miranda a écrit :
> > Also, lastChunkSize is not set to a correct size. > It should be: > (lastChunkSize := size - base + 1 \\ chunkSize) = 0 ifTrue: > [lastChunkSize := chunkSize]. > which can be more efficiently written: > lastChunkSize := size - base \\ chunkSize + 1. > > Well, I guess every one use with base = 1... > > > No that's not right. base determines the first index, so that e.g. > SparseLargeArray > new: (2 raisedToInteger: 32) > chunkSize: 32 * 1024 > arrayClass: ByteArray > base: 0 > defaultValue: 0 > > provides a bye-addressible array of 2^32 elements with indices 0 through > (2 raisedTo: 32) - 1. > That's interesting because that's exactly how I understood 'base' first. But that's not how it did work before your patches. SparseLargeTable inherits from ArrayedCollection for which the whole protocol require indices between 1 and self size... And that is effectively how SparseLargeTable at: and at:put: did handle the indices... (mySparseTable at: 0) did not work. A logical explanation was that base was here for saving bytes, but that was only a deduction, i did not discuss that with Yoshiki. That's also why the (self basicAt: anIndex) were protected in #noChekAt: the first elements (before base) and also the last ones (see how #zapDefaultOnlyEntries use #privateSize: ) were potentially out of self "basicBounds". Now, the modifications you made are appealing, but beware not using one of the 1-based inherited message on self. That's rather not bug-proof. Nicolas |
On Wed, Nov 5, 2008 at 11:41 AM, nicolas cellier <[hidden email]> wrote: Eliot Miranda a écrit : Can you say which methods require this and why? And that is effectively how SparseLargeTable at: and at:put: did handle the indices... (mySparseTable at: 0) did not work. I'd like to plug those holes. I absolutely need 0-based sparse arrays :)
|
In reply to this post by Eliot Miranda-2
Eliot
if /when you have something, I would be more than happy to harvest the fixes. stef On Nov 1, 2008, at 8:12 PM, Eliot Miranda wrote: > Hi All, > > SparseLargeTable appears not to be sparse at all. If you look > at > SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: > you'll see tat each bin is initialized with an instance of the base > class rather than being filled lazily. Further, instead of > pvtNoCheckAt:put: lazily creating an entry in each bin as needed it > simply throws away the write if the bin is empty. Instead the usage > pattern is to create fully-populated instances and then make them > sparse by sending zapDefaultOnlyEntries. > > It seems to me that > SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: > should leave bins empty until pvtNoCheckAt:put: puts other than the > default value. Since I want to create a sparse table with 2^32 > entries the existing approach won't work. > > I wondered whether anyone else had already fixed this or whether > there's a good reason not to lazily initialize. > > Best > Eliot > |
Hi Stef,
I already experimented Eliot's SparseTable changes in Pharo, but I'm waiting for stabilization before publishing. stephane ducasse a écrit : > Eliot > > if /when you have something, I would be more than happy to harvest the > fixes. > > stef > > On Nov 1, 2008, at 8:12 PM, Eliot Miranda wrote: > >> Hi All, >> >> SparseLargeTable appears not to be sparse at all. If you look at >> SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: >> you'll see tat each bin is initialized with an instance of the base >> class rather than being filled lazily. Further, instead of >> pvtNoCheckAt:put: lazily creating an entry in each bin as needed it >> simply throws away the write if the bin is empty. Instead the usage >> pattern is to create fully-populated instances and then make them >> sparse by sending zapDefaultOnlyEntries. >> >> It seems to me that >> SparseLargeTable>>initChunkSize:size:arrayClass:base:defaultValue: >> should leave bins empty until pvtNoCheckAt:put: puts other than the >> default value. Since I want to create a sparse table with 2^32 >> entries the existing approach won't work. >> >> I wondered whether anyone else had already fixed this or whether >> there's a good reason not to lazily initialize. >> >> Best >> Eliot >> > > > |
In reply to this post by Eliot Miranda-2
Eliot Miranda a écrit :
> > Can you say which methods require this and why? Well there are just too many forms of 1 to: self size do: [:i | (self at:i ) doSomething] to fit in this list. Of course, I doubt you want to iterate over 2**32 elements, so maybe this is a false probem. Though you can legitimately attempt a: ^aSparseTable findFirst: [:element | element > 0]. > > I'd like to plug those holes. I absolutely need 0-based sparse arrays :) > Maybe you won't be trapped by any hole for your private usage... That's different if the class is supposed to be generic. But since an anythingBut-1-based sequenceable collection would not fit in existing code base without pain, that's probably a false problem too. I try to imagine how to: It would be safer to hook the class in a new hierarchy... Beware, even SequenceableCollection is crippled with 1 to: self size do: Only Collection implements all loops with do: Personnally, I would like this do: self error: 'you are trying to iterate over a huge collection'. and implement some nonDefaultDo: aBlock | chunk index | 1 to: self basicSize do: [:i | (chunk := self basicAt: i) isNil ifFalse: [ chunk do: [:element | element = defaultValue ifFalse: [ aBlock value: element]]]] Then I will trap myself with 0-based indices... For example, i would write inner loop of nonDefaultWithIndexDo: 1 to: chunk size do: [:j | (chunk at: j) = defaultValue... or chunk withIndexDo: [:element :j | element = defaultValue ifFalse: [ aBlock value: i - 1 * chunkSize + j + base - 1 value: element But wait, who told me chunk was 1-based? I could for example populate a SparseLargeTable with 0-based SparseLargeTable as arrayClass just for making my own 2**64 collection (i can't help challenging your 2**32)... So I would need a (chunk withRankDo: ) or just (chunk atRank: ), and to cover my sparse case, probably generalize a nonDefaultWithRankDo:... ... or #nonDefaultWithOffsetDo: just to remove the - 1. Yes, 0-based pain startingAt: second message (messageList atOffset: 1). Nicolas |
In reply to this post by Eliot Miranda-2
Eliot Miranda a écrit :
> > > > I'd like to plug those holes. I absolutely need 0-based sparse arrays :) > > pvtCheckIndex: index ... index - base > size ifTrue: [self errorSubscriptBounds: index]. If I take index = size + 1 and base = 1, I won't get an error. Should be >= |
In reply to this post by Nicolas Cellier-3
At Tue, 04 Nov 2008 01:44:39 +0100,
nicolas cellier wrote: > > Well, I guess every one use with base = 1... The "base" was of course set differently if you browse the users. And if you browse the users, you can simply tell that it is exclusively used for the multilingualization part of the system, and I didn't really thought about printing a big table... -- Yoshiki |
Yoshiki Ohshima a écrit :
> At Tue, 04 Nov 2008 01:44:39 +0100, > nicolas cellier wrote: >> Well, I guess every one use with base = 1... > > The "base" was of course set differently if you browse the users. > And if you browse the users, you can simply tell that it is > exclusively used for the multilingualization part of the system, and I > didn't really thought about printing a big table... > > -- Yoshiki > > Ah, the two instances in my 3.10.2 occidental image are 1-based... SparseLargeTable allInstances collect: [:e | e base]. -> #(1 1) The problem is more with accessing than printing. I'm curious, how do you use base ~= 1 ? Anyway, maybe it won't do what you think... because I have other griefs with creation messages ignoring their arguments :) new: size chunkSize: chunkSize arrayClass: aClass base: b ^self new: size chunkSize: chunkSize arrayClass: Array base: 1 defaultValue: nil. Nicolas |
At Fri, 07 Nov 2008 21:08:22 +0100,
nicolas cellier wrote: > > Yoshiki Ohshima a écrit : > > At Tue, 04 Nov 2008 01:44:39 +0100, > > nicolas cellier wrote: > >> Well, I guess every one use with base = 1... > > > > The "base" was of course set differently if you browse the users. > > And if you browse the users, you can simply tell that it is > > exclusively used for the multilingualization part of the system, and I > > didn't really thought about printing a big table... > > > > -- Yoshiki > > > > > > Ah, the two instances in my 3.10.2 occidental image are 1-based... > SparseLargeTable allInstances collect: [:e | e base]. > -> #(1 1) Your image probably doesn't contain any of the CJK-ish langauges, I would guess. > The problem is more with accessing than printing. > I'm curious, how do you use base ~= 1 ? Take a look for the references in the OLPC Etoys image. > Anyway, maybe it won't do what you think... > because I have other griefs with creation messages ignoring their > arguments :) > > new: size chunkSize: chunkSize arrayClass: aClass base: b > > ^self new: size chunkSize: chunkSize arrayClass: Array base: 1 > defaultValue: nil. That is funny. The use I know was always with the "fully qualified" version in the image. (Thank you for spotting!) -- Yoshiki |
Free forum by Nabble | Edit this page |