SparseLargeTable instances and Unicode

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

SparseLargeTable instances and Unicode

Mariano Martinez Peck
Hi folks. I wasn't aware of the class SparseLargeTable but it seems it has two important instances "SparseLargeTable allInstances size -> 2"

They are the class side variables #DecimalProperty and #GeneralCategory in Unicode.

Actually,

(Unicode classPool at: 'GeneralCategory') size  -> 917632

(Unicode classPool at: 'DecimalProperty') size -> 917632

So...collections of almost 1 million elements.

Now a couple of questions:

1) Is this normal/expected?   In my calculus both arrays are like 3mb of memory

2) If I don't use UTF and I send an encoding as parameter in the VM (different from UTF), do I need this class side variables?


Thanks and forgive my ignorance in this field...I just get the alarm while doing some memory analysis for a paper.

Mariano

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Lukas Renggli
> 1) Is this normal/expected?   In my calculus both arrays are like 3mb of
> memory

Yes, these are lookup tables for character classification like
#isLetter and #isLowercase.

Now what bothers me more is that the character classification in Pharo
is dead slow. Other Smalltalk implementations have much more space and
time efficient implementations.

> 2) If I don't use UTF and I send an encoding as parameter in the VM
> (different from UTF), do I need this class side variables?

AFAIK the encoding just defines how characters are passed into the VM.
This does not affect how characters are represented inside.

Lukas

--
Lukas Renggli
www.lukas-renggli.ch

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Henrik Sperre Johansen
In reply to this post by Mariano Martinez Peck
  On 21.09.2010 19:10, Mariano Martinez Peck wrote:

> Hi folks. I wasn't aware of the class SparseLargeTable but it seems it
> has two important instances "SparseLargeTable allInstances size -> 2"
>
> They are the class side variables #DecimalProperty and
> #GeneralCategory in Unicode.
>
> Actually,
>
> (Unicode classPool at: 'GeneralCategory') size  -> 917632
>
> (Unicode classPool at: 'DecimalProperty') size -> 917632
>
> So...collections of almost 1 million elements.
>
> Now a couple of questions:
>
> 1) Is this normal/expected?   In my calculus both arrays are like 3mb
> of memory
Look at the implementation, there's a reason they're called sparse.
If my math is correct (crosses fingers), DecimalProperty is ~15KB,
GeneralCategory ~70KB

>
> 2) If I don't use UTF and I send an encoding as parameter in the VM
> (different from UTF), do I need this class side variables?
What Lukas said.

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Mariano Martinez Peck


On Tue, Sep 21, 2010 at 7:55 PM, Henrik Sperre Johansen <[hidden email]> wrote:
 On 21.09.2010 19:10, Mariano Martinez Peck wrote:
Hi folks. I wasn't aware of the class SparseLargeTable but it seems it has two important instances "SparseLargeTable allInstances size -> 2"

They are the class side variables #DecimalProperty and #GeneralCategory in Unicode.

Actually,

(Unicode classPool at: 'GeneralCategory') size  -> 917632

(Unicode classPool at: 'DecimalProperty') size -> 917632

So...collections of almost 1 million elements.

Now a couple of questions:

1) Is this normal/expected?   In my calculus both arrays are like 3mb of memory
Look at the implementation, there's a reason they're called sparse.
If my math is correct (crosses fingers), DecimalProperty is ~15KB, GeneralCategory ~70KB

So...there must be a problem with my code, I was doing something like SpaceTally does:


spaceForUsedInstancesOf: aClass withInstanceCount: instCount
    "Answer the number of bytes consumed by all instances of the given class, including their object headers."

    | isCompact instVarBytes bytesPerElement headerBytes total |
    instCount = 0 ifTrue: [^ 0].
    isCompact := aClass indexIfCompact > 0.
    instVarBytes := aClass instSize * 4.
    aClass isVariable
        ifTrue: [
            bytesPerElement := aClass isBytes ifTrue: [1] ifFalse: [4].
            total := 0.
            aClass allInstancesDo: [:inst | | contentBytes |
                contentBytes := instVarBytes + (inst size * bytesPerElement).
                headerBytes :=
                    contentBytes > 255
                        ifTrue: [12]
                        ifFalse: [isCompact ifTrue: [4] ifFalse: [8]].
                    (UnusedObjectDiscoverer current primitiveGetUsedBit: inst) ifTrue:
                [total := total + headerBytes + contentBytes]].
            ^ total]
        ifFalse: [
            headerBytes :=
                instVarBytes > 255
                    ifTrue: [12]
                    ifFalse: [isCompact ifTrue: [4] ifFalse: [8]].
            ^ instCount * (headerBytes + instVarBytes)].



SparseLargeTable instanceCount ->>> 2

SpaceTally new spaceForUsedInstancesOf: SparseLargeTable withInstanceCount: 2   ->>>  7341112

thanks




2) If I don't use UTF and I send an encoding as parameter in the VM (different from UTF), do I need this class side variables?
What Lukas said.

Cheers,
Henry


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Henrik Sperre Johansen
On 21.09.2010 20:07, Mariano Martinez Peck wrote:


On Tue, Sep 21, 2010 at 7:55 PM, Henrik Sperre Johansen <[hidden email]> wrote:
 On 21.09.2010 19:10, Mariano Martinez Peck wrote:
Hi folks. I wasn't aware of the class SparseLargeTable but it seems it has two important instances "SparseLargeTable allInstances size -> 2"

They are the class side variables #DecimalProperty and #GeneralCategory in Unicode.

Actually,

(Unicode classPool at: 'GeneralCategory') size  -> 917632

(Unicode classPool at: 'DecimalProperty') size -> 917632

So...collections of almost 1 million elements.

Now a couple of questions:

1) Is this normal/expected?   In my calculus both arrays are like 3mb of memory
Look at the implementation, there's a reason they're called sparse.
If my math is correct (crosses fingers), DecimalProperty is ~15KB, GeneralCategory ~70KB

So...there must be a problem with my code, I was doing something like SpaceTally does:


spaceForUsedInstancesOf: aClass withInstanceCount: instCount
    "Answer the number of bytes consumed by all instances of the given class, including their object headers."

    | isCompact instVarBytes bytesPerElement headerBytes total |
    instCount = 0 ifTrue: [^ 0].
    isCompact := aClass indexIfCompact > 0.
    instVarBytes := aClass instSize * 4.
    aClass isVariable
        ifTrue: [
            bytesPerElement := aClass isBytes ifTrue: [1] ifFalse: [4].
            total := 0.
            aClass allInstancesDo: [:inst | | contentBytes |
                contentBytes := instVarBytes + (inst size * bytesPerElement).
There you go. Use basicSize, size is overridden by f.ex. HashedCollections and LargeSparseTables.

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Adrian Lienhard
Here's the code I used to confirm Henrik's math:

table := Unicode classPool at: 'GeneralCategory'.
count := table basicSize.
1 to: table basicSize do: [ :i |
        (table basicAt: i) isNil ifFalse: [
                count := count + (table basicAt: i) basicSize ] ].
(count / 1024) asFloat

-> 66.8759765625

+ the headers of each subtable.

Cheers,
Adrian

On Sep 21, 2010, at 20:16 , Henrik Sperre Johansen wrote:

> On 21.09.2010 20:07, Mariano Martinez Peck wrote:
>>
>>
>> On Tue, Sep 21, 2010 at 7:55 PM, Henrik Sperre Johansen <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>     On 21.09.2010 19:10, Mariano Martinez Peck wrote:
>>
>>        Hi folks. I wasn't aware of the class SparseLargeTable but it
>>        seems it has two important instances "SparseLargeTable
>>        allInstances size -> 2"
>>
>>        They are the class side variables #DecimalProperty and
>>        #GeneralCategory in Unicode.
>>
>>        Actually,
>>
>>        (Unicode classPool at: 'GeneralCategory') size  -> 917632
>>
>>        (Unicode classPool at: 'DecimalProperty') size -> 917632
>>
>>        So...collections of almost 1 million elements.
>>
>>        Now a couple of questions:
>>
>>        1) Is this normal/expected?   In my calculus both arrays are
>>        like 3mb of memory
>>
>>    Look at the implementation, there's a reason they're called sparse.
>>    If my math is correct (crosses fingers), DecimalProperty is ~15KB,
>>    GeneralCategory ~70KB
>>
>>
>> So...there must be a problem with my code, I was doing something like SpaceTally does:
>>
>>
>> spaceForUsedInstancesOf: aClass withInstanceCount: instCount
>>    "Answer the number of bytes consumed by all instances of the given class, including their object headers."
>>
>>    | isCompact instVarBytes bytesPerElement headerBytes total |
>>    instCount = 0 ifTrue: [^ 0].
>>    isCompact := aClass indexIfCompact > 0.
>>    instVarBytes := aClass instSize * 4.
>>    aClass isVariable
>>        ifTrue: [
>>            bytesPerElement := aClass isBytes ifTrue: [1] ifFalse: [4].
>>            total := 0.
>>            aClass allInstancesDo: [:inst | | contentBytes |
>>                contentBytes := instVarBytes + (inst *size* * bytesPerElement).
> There you go. Use basicSize, size is overridden by f.ex. HashedCollections and LargeSparseTables.
>
> Cheers,
> Henry
> _______________________________________________
> Pharo-project mailing list
> [hidden email]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Mariano Martinez Peck
In reply to this post by Henrik Sperre Johansen


2010/9/21 Henrik Sperre Johansen <[hidden email]>
On 21.09.2010 20:07, Mariano Martinez Peck wrote:


On Tue, Sep 21, 2010 at 7:55 PM, Henrik Sperre Johansen <[hidden email]> wrote:
 On 21.09.2010 19:10, Mariano Martinez Peck wrote:
Hi folks. I wasn't aware of the class SparseLargeTable but it seems it has two important instances "SparseLargeTable allInstances size -> 2"

They are the class side variables #DecimalProperty and #GeneralCategory in Unicode.

Actually,

(Unicode classPool at: 'GeneralCategory') size  -> 917632

(Unicode classPool at: 'DecimalProperty') size -> 917632

So...collections of almost 1 million elements.

Now a couple of questions:

1) Is this normal/expected?   In my calculus both arrays are like 3mb of memory
Look at the implementation, there's a reason they're called sparse.
If my math is correct (crosses fingers), DecimalProperty is ~15KB, GeneralCategory ~70KB

So...there must be a problem with my code, I was doing something like SpaceTally does:


spaceForUsedInstancesOf: aClass withInstanceCount: instCount
    "Answer the number of bytes consumed by all instances of the given class, including their object headers."

    | isCompact instVarBytes bytesPerElement headerBytes total |
    instCount = 0 ifTrue: [^ 0].
    isCompact := aClass indexIfCompact > 0.
    instVarBytes := aClass instSize * 4.
    aClass isVariable
        ifTrue: [
            bytesPerElement := aClass isBytes ifTrue: [1] ifFalse: [4].
            total := 0.
            aClass allInstancesDo: [:inst | | contentBytes |
                contentBytes := instVarBytes + (inst size * bytesPerElement).
There you go. Use basicSize, size is overridden by f.ex. HashedCollections and LargeSparseTables.


:)  Thanks Henry. Now  SpaceTally new spaceForInstancesOf: SparseLargeTable withInstanceCount: 2  ->> 18748

So....I guess I should commit the fix for SpaceTally >> spaceForInstancesOf: aClass withInstanceCount: instCount

shouldn't I ?


In addition, I modified SpaceTally >> printSpaceAnalysis: threshold on: aStream

from this:

printSpaceAnalysis: threshold on: aStream
    "SpaceTally new printSpaceAnalysis: 1 on:(FileStream forceNewFileNamed: 'STspace.text')"

    "sd-This method should be rewrote to be more coherent within the rest of the class
    ie using preAllocate and spaceForInstanceOf:"

    "If threshold > 0, then only those classes with more than that number
    of instances will be shown, and they will be sorted by total instance space.
    If threshold = 0, then all classes will appear, sorted by name."

    | totalCodeSpace totalInstCount totalInstSpace n totalPercent |
    Smalltalk garbageCollect.
    totalCodeSpace := totalInstCount := totalInstSpace := n := 0.
    results := OrderedCollection new: Smalltalk classNames size.
    'Taking statistics...'
        displayProgressAt: Sensor cursorPoint
        from: 0 to: Smalltalk classNames size
        during: [:bar |
        (Smalltalk globals allClasses) do:
            [:cl | | codeSpace instCount instSpace eltSize | codeSpace := cl spaceUsed.
            bar value: (n := n+1).
            Smalltalk garbageCollectMost.
            instCount := cl instanceCount.
            instSpace := (cl indexIfCompact > 0 ifTrue: [4] ifFalse: [8])*instCount. "Object headers"
            cl isVariable
                ifTrue: [eltSize := cl isBytes ifTrue: [1] ifFalse: [4].
                        cl allInstancesDo: [:x | instSpace := instSpace + (x basicSize*eltSize)]]
            ifFalse: [instSpace := instSpace + (cl instSize*instCount*4)].
            results add: (SpaceTallyItem analyzedClassName: cl name codeSize: codeSpace instanceCount:  instCount spaceForInstances: instSpace).
            totalCodeSpace := totalCodeSpace + codeSpace.
            totalInstCount := totalInstCount + instCount.
            totalInstSpace := totalInstSpace + instSpace]].
        totalPercent := 0.0.

    "aStream timeStamp."
    aStream
        nextPutAll: ('Class' padded: #right to: 30 with: $ );
        nextPutAll: ('code space' padded: #left to: 12 with: $ );
        nextPutAll: ('# instances' padded: #left to: 12 with: $ );
        nextPutAll: ('inst space' padded: #left to: 12 with: $ );
        nextPutAll: ('percent' padded: #left to: 8 with: $ ); cr.

    threshold > 0 ifTrue: [
        "If inst count threshold > 0, then sort by space"
        results := (results select: [:s | s instanceCount >= threshold or: [s spaceForInstances > (totalInstSpace // 500)]])
            asSortedCollection: [:s :s2 | s spaceForInstances > s2 spaceForInstances]].

    results do: [:s | | percent |
        aStream
            nextPutAll: (s analyzedClassName padded: #right to: 30 with: $ );
            nextPutAll: (s codeSize printString padded: #left to: 12 with: $ );
            nextPutAll: (s instanceCount printString padded: #left to: 12 with: $ );
            nextPutAll: (s spaceForInstances printString padded: #left to: 14 with: $ ).
        percent := s spaceForInstances*100.0/totalInstSpace roundTo: 0.1.
        totalPercent := totalPercent + percent.
        percent >= 0.1 ifTrue: [
            aStream nextPutAll: (percent printString padded: #left to: 8 with: $ )].
        aStream cr].

    aStream
        cr; nextPutAll: ('Total' padded: #right to: 30 with: $ );
        nextPutAll: (totalCodeSpace printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstCount printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstSpace printString padded: #left to: 14 with: $ );
        nextPutAll: ((totalPercent roundTo: 0.1) printString padded: #left to: 8 with: $ ).

 

To this:



printSpaceAnalysis: threshold on: aStream
    "SpaceTally new printSpaceAnalysis: 1 on:(FileStream forceNewFileNamed: 'STspace.text')"

    "sd-This method should be rewrote to be more coherent within the rest of the class
    ie using preAllocate and spaceForInstanceOf:"

    "If threshold > 0, then only those classes with more than that number
    of instances will be shown, and they will be sorted by total instance space.
    If threshold = 0, then all classes will appear, sorted by name."

    | totalCodeSpace totalInstCount totalInstSpace n totalPercent |
    Smalltalk garbageCollect.
    totalCodeSpace := totalInstCount := totalInstSpace := n := 0.
    results := OrderedCollection new: Smalltalk classNames size.
    'Taking statistics...'
        displayProgressAt: Sensor cursorPoint
        from: 0 to: Smalltalk classNames size
        during: [:bar |
        (Smalltalk globals allClasses) do:
            [:cl | | codeSpace instCount instSpace eltSize | codeSpace := cl spaceUsed.
            bar value: (n := n+1).
            Smalltalk garbageCollectMost.
            instCount := cl instanceCount.
            instSpace := self spaceForInstancesOf: cl withInstanceCount: instCount.
            results add: (SpaceTallyItem analyzedClassName: cl name codeSize: codeSpace instanceCount:  instCount spaceForInstances: instSpace).
            totalCodeSpace := totalCodeSpace + codeSpace.
            totalInstCount := totalInstCount + instCount.
            totalInstSpace := totalInstSpace + instSpace]].
        totalPercent := 0.0.

    "aStream timeStamp."
    aStream
        nextPutAll: ('Class' padded: #right to: 30 with: $ );
        nextPutAll: ('code space' padded: #left to: 12 with: $ );
        nextPutAll: ('# instances' padded: #left to: 12 with: $ );
        nextPutAll: ('inst space' padded: #left to: 12 with: $ );
        nextPutAll: ('percent' padded: #left to: 8 with: $ ); cr.

    threshold > 0 ifTrue: [
        "If inst count threshold > 0, then sort by space"
        results := (results select: [:s | s instanceCount >= threshold or: [s spaceForInstances > (totalInstSpace // 500)]])
            asSortedCollection: [:s :s2 | s spaceForInstances > s2 spaceForInstances]].

    results do: [:s | | percent |
        aStream
            nextPutAll: (s analyzedClassName padded: #right to: 30 with: $ );
            nextPutAll: (s codeSize printString padded: #left to: 12 with: $ );
            nextPutAll: (s instanceCount printString padded: #left to: 12 with: $ );
            nextPutAll: (s spaceForInstances printString padded: #left to: 14 with: $ ).
        percent := s spaceForInstances*100.0/totalInstSpace roundTo: 0.1.
        totalPercent := totalPercent + percent.
        percent >= 0.1 ifTrue: [
            aStream nextPutAll: (percent printString padded: #left to: 8 with: $ )].
        aStream cr].

    aStream
        cr; nextPutAll: ('Total' padded: #right to: 30 with: $ );
        nextPutAll: (totalCodeSpace printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstCount printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstSpace printString padded: #left to: 14 with: $ );
        nextPutAll: ((totalPercent roundTo: 0.1) printString padded: #left to: 8 with: $ ).



So....you agree with both changes?

Thanks!

Mariano



Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Henrik Sperre Johansen
In reply to this post by Adrian Lienhard
  On 21.09.2010 20:22, Adrian Lienhard wrote:

> Here's the code I used to confirm Henrik's math:
>
> table := Unicode classPool at: 'GeneralCategory'.
> count := table basicSize.
> 1 to: table basicSize do: [ :i |
> (table basicAt: i) isNil ifFalse: [
> count := count + (table basicAt: i) basicSize ] ].
> (count / 1024) asFloat
>
> ->  66.8759765625
>
> + the headers of each subtable.
>
> Cheers,
> Adrian
Yes, except initially doing
count := table basicSize*4 (4 bytes for a slot when class isBytes not)
that's basically what I did. ~ due to not counting headers :)

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Henrik Sperre Johansen
In reply to this post by Mariano Martinez Peck
On 21.09.2010 20:28, Mariano Martinez Peck wrote:


:)  Thanks Henry. Now  SpaceTally new spaceForInstancesOf: SparseLargeTable withInstanceCount: 2  ->> 18748

So....I guess I should commit the fix for SpaceTally >> spaceForInstancesOf: aClass withInstanceCount: instCount

shouldn't I ?


In addition, I modified SpaceTally >> printSpaceAnalysis: threshold on: aStream

from this:

printSpaceAnalysis: threshold on: aStream
    "SpaceTally new printSpaceAnalysis: 1 on:(FileStream forceNewFileNamed: 'STspace.text')"

    "sd-This method should be rewrote to be more coherent within the rest of the class
    ie using preAllocate and spaceForInstanceOf:"

    "If threshold > 0, then only those classes with more than that number
    of instances will be shown, and they will be sorted by total instance space.
    If threshold = 0, then all classes will appear, sorted by name."

    | totalCodeSpace totalInstCount totalInstSpace n totalPercent |
    Smalltalk garbageCollect.
    totalCodeSpace := totalInstCount := totalInstSpace := n := 0.
    results := OrderedCollection new: Smalltalk classNames size.
    'Taking statistics...'
        displayProgressAt: Sensor cursorPoint
        from: 0 to: Smalltalk classNames size
        during: [:bar |
        (Smalltalk globals allClasses) do:
            [:cl | | codeSpace instCount instSpace eltSize | codeSpace := cl spaceUsed.
            bar value: (n := n+1).
            Smalltalk garbageCollectMost.
            instCount := cl instanceCount.
            instSpace := (cl indexIfCompact > 0 ifTrue: [4] ifFalse: [8])*instCount. "Object headers"
            cl isVariable
                ifTrue: [eltSize := cl isBytes ifTrue: [1] ifFalse: [4].
                        cl allInstancesDo: [:x | instSpace := instSpace + (x basicSize*eltSize)]]
            ifFalse: [instSpace := instSpace + (cl instSize*instCount*4)].
            results add: (SpaceTallyItem analyzedClassName: cl name codeSize: codeSpace instanceCount:  instCount spaceForInstances: instSpace).
            totalCodeSpace := totalCodeSpace + codeSpace.
            totalInstCount := totalInstCount + instCount.
            totalInstSpace := totalInstSpace + instSpace]].
        totalPercent := 0.0.

    "aStream timeStamp."
    aStream
        nextPutAll: ('Class' padded: #right to: 30 with: $ );
        nextPutAll: ('code space' padded: #left to: 12 with: $ );
        nextPutAll: ('# instances' padded: #left to: 12 with: $ );
        nextPutAll: ('inst space' padded: #left to: 12 with: $ );
        nextPutAll: ('percent' padded: #left to: 8 with: $ ); cr.

    threshold > 0 ifTrue: [
        "If inst count threshold > 0, then sort by space"
        results := (results select: [:s | s instanceCount >= threshold or: [s spaceForInstances > (totalInstSpace // 500)]])
            asSortedCollection: [:s :s2 | s spaceForInstances > s2 spaceForInstances]].

    results do: [:s | | percent |
        aStream
            nextPutAll: (s analyzedClassName padded: #right to: 30 with: $ );
            nextPutAll: (s codeSize printString padded: #left to: 12 with: $ );
            nextPutAll: (s instanceCount printString padded: #left to: 12 with: $ );
            nextPutAll: (s spaceForInstances printString padded: #left to: 14 with: $ ).
        percent := s spaceForInstances*100.0/totalInstSpace roundTo: 0.1.
        totalPercent := totalPercent + percent.
        percent >= 0.1 ifTrue: [
            aStream nextPutAll: (percent printString padded: #left to: 8 with: $ )].
        aStream cr].

    aStream
        cr; nextPutAll: ('Total' padded: #right to: 30 with: $ );
        nextPutAll: (totalCodeSpace printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstCount printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstSpace printString padded: #left to: 14 with: $ );
        nextPutAll: ((totalPercent roundTo: 0.1) printString padded: #left to: 8 with: $ ).

 

To this:



printSpaceAnalysis: threshold on: aStream
    "SpaceTally new printSpaceAnalysis: 1 on:(FileStream forceNewFileNamed: 'STspace.text')"

    "sd-This method should be rewrote to be more coherent within the rest of the class
    ie using preAllocate and spaceForInstanceOf:"

    "If threshold > 0, then only those classes with more than that number
    of instances will be shown, and they will be sorted by total instance space.
    If threshold = 0, then all classes will appear, sorted by name."

    | totalCodeSpace totalInstCount totalInstSpace n totalPercent |
    Smalltalk garbageCollect.
    totalCodeSpace := totalInstCount := totalInstSpace := n := 0.
    results := OrderedCollection new: Smalltalk classNames size.
    'Taking statistics...'
        displayProgressAt: Sensor cursorPoint
        from: 0 to: Smalltalk classNames size
        during: [:bar |
        (Smalltalk globals allClasses) do:
            [:cl | | codeSpace instCount instSpace eltSize | codeSpace := cl spaceUsed.
            bar value: (n := n+1).
            Smalltalk garbageCollectMost.
            instCount := cl instanceCount.
            instSpace := self spaceForInstancesOf: cl withInstanceCount: instCount.
            results add: (SpaceTallyItem analyzedClassName: cl name codeSize: codeSpace instanceCount:  instCount spaceForInstances: instSpace).
            totalCodeSpace := totalCodeSpace + codeSpace.
            totalInstCount := totalInstCount + instCount.
            totalInstSpace := totalInstSpace + instSpace]].
        totalPercent := 0.0.

    "aStream timeStamp."
    aStream
        nextPutAll: ('Class' padded: #right to: 30 with: $ );
        nextPutAll: ('code space' padded: #left to: 12 with: $ );
        nextPutAll: ('# instances' padded: #left to: 12 with: $ );
        nextPutAll: ('inst space' padded: #left to: 12 with: $ );
        nextPutAll: ('percent' padded: #left to: 8 with: $ ); cr.

    threshold > 0 ifTrue: [
        "If inst count threshold > 0, then sort by space"
        results := (results select: [:s | s instanceCount >= threshold or: [s spaceForInstances > (totalInstSpace // 500)]])
            asSortedCollection: [:s :s2 | s spaceForInstances > s2 spaceForInstances]].

    results do: [:s | | percent |
        aStream
            nextPutAll: (s analyzedClassName padded: #right to: 30 with: $ );
            nextPutAll: (s codeSize printString padded: #left to: 12 with: $ );
            nextPutAll: (s instanceCount printString padded: #left to: 12 with: $ );
            nextPutAll: (s spaceForInstances printString padded: #left to: 14 with: $ ).
        percent := s spaceForInstances*100.0/totalInstSpace roundTo: 0.1.
        totalPercent := totalPercent + percent.
        percent >= 0.1 ifTrue: [
            aStream nextPutAll: (percent printString padded: #left to: 8 with: $ )].
        aStream cr].

    aStream
        cr; nextPutAll: ('Total' padded: #right to: 30 with: $ );
        nextPutAll: (totalCodeSpace printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstCount printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstSpace printString padded: #left to: 14 with: $ );
        nextPutAll: ((totalPercent roundTo: 0.1) printString padded: #left to: 8 with: $ ).



So....you agree with both changes?

Thanks!

Mariano



Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Sound good to me!
Personally I'd also like some more refactoring, like rewriting spaceForInstancesOf: withInstanceCount into
#spaceForInstancesOfClass: aClass,  delegating to:
#spaceFor: amount instancesOfClass: aFixedClass (private category?)
#spaceForInstancesOfVariableClass: aVariableClass (private category?)

That way you also move ugly instance counting out of the loop.

Also, while you're at it, what about factoring out
contentBytes > 255
                        ifTrue: [12]
                        ifFalse: [isCompact ifTrue: [4] ifFalse: [8].
into, say, #headerBytesOfClass: aClass ?

And really, wtf?
An instance of a compact, variable class with more than 255 bytes in it will have a 12-byte header, while those with less will have a 4-byte header?
Plus, setting a class with more than 63 instance variables will never have an effect, its instances will always have 12-byte headers?

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Mariano Martinez Peck
Sound good to me!
Personally I'd also like some more refactoring, like rewriting spaceForInstancesOf: withInstanceCount into
#spaceForInstancesOfClass: aClass,  delegating to:
#spaceFor: amount instancesOfClass: aFixedClass (private category?)
#spaceForInstancesOfVariableClass: aVariableClass (private category?)


I was thinking more or less the same. Even more, if we integrate Object >> sizeInMemory

why not adding also:

Class >> spaceForInstances
    | totalSize |
    totalSize := 0.
    self allInstancesDo: [ :inst |
        totalSize := totalSize + inst sizeInMemory.
    ].
    ^ totalSize
 
?

Then  we can also replace spaceForInstancesOf: withInstanceCount   with spaceForInstances
Example:


printSpaceAnalysis: threshold on: aStream
    "SpaceTally new printSpaceAnalysis: 1 on:(FileStream forceNewFileNamed: 'STspace.text')"

    "sd-This method should be rewrote to be more coherent within the rest of the class
    ie using preAllocate and spaceForInstanceOf:"

    "If threshold > 0, then only those classes with more than that number
    of instances will be shown, and they will be sorted by total instance space.
    If threshold = 0, then all classes will appear, sorted by name."

    | totalCodeSpace totalInstCount totalInstSpace n totalPercent |
    Smalltalk garbageCollect.
    totalCodeSpace := totalInstCount := totalInstSpace := n := 0.
    results := OrderedCollection new: Smalltalk classNames size.
    'Taking statistics...'
        displayProgressAt: Sensor cursorPoint
        from: 0 to: Smalltalk classNames size
        during: [:bar |
        (Smalltalk globals allClasses) do:
            [:cl | | codeSpace instCount instSpace eltSize | codeSpace := cl spaceUsed.
            bar value: (n := n+1).
            Smalltalk garbageCollectMost.
            instCount := cl instanceCount.
            instSpace := cl spaceForInstances.
            results add: (SpaceTallyItem analyzedClassName: cl name codeSize: codeSpace instanceCount:  instCount spaceForInstances: instSpace).
            totalCodeSpace := totalCodeSpace + codeSpace.
            totalInstCount := totalInstCount + instCount.
            totalInstSpace := totalInstSpace + instSpace]].
        totalPercent := 0.0.

    "aStream timeStamp."
    aStream
        nextPutAll: ('Class' padded: #right to: 30 with: $ );
        nextPutAll: ('code space' padded: #left to: 12 with: $ );
        nextPutAll: ('# instances' padded: #left to: 12 with: $ );
        nextPutAll: ('inst space' padded: #left to: 12 with: $ );
        nextPutAll: ('percent' padded: #left to: 8 with: $ ); cr.

    threshold > 0 ifTrue: [
        "If inst count threshold > 0, then sort by space"
        results := (results select: [:s | s instanceCount >= threshold or: [s spaceForInstances > (totalInstSpace // 500)]])
            asSortedCollection: [:s :s2 | s spaceForInstances > s2 spaceForInstances]].

    results do: [:s | | percent |
        aStream
            nextPutAll: (s analyzedClassName padded: #right to: 30 with: $ );
            nextPutAll: (s codeSize printString padded: #left to: 12 with: $ );
            nextPutAll: (s instanceCount printString padded: #left to: 12 with: $ );
            nextPutAll: (s spaceForInstances printString padded: #left to: 14 with: $ ).
        percent := s spaceForInstances*100.0/totalInstSpace roundTo: 0.1.
        totalPercent := totalPercent + percent.
        percent >= 0.1 ifTrue: [
            aStream nextPutAll: (percent printString padded: #left to: 8 with: $ )].
        aStream cr].

    aStream
        cr; nextPutAll: ('Total' padded: #right to: 30 with: $ );
        nextPutAll: (totalCodeSpace printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstCount printString padded: #left to: 12 with: $ );
        nextPutAll: (totalInstSpace printString padded: #left to: 14 with: $ );
        nextPutAll: ((totalPercent roundTo: 0.1) printString padded: #left to: 8 with: $ ).



agree??


thanks



That way you also move ugly instance counting out of the loop.

Also, while you're at it, what about factoring out
contentBytes > 255
                        ifTrue: [12]
                        ifFalse: [isCompact ifTrue: [4] ifFalse: [8].
into, say, #headerBytesOfClass: aClass ?

And really, wtf?
An instance of a compact, variable class with more than 255 bytes in it will have a 12-byte header, while those with less will have a 4-byte header?
Plus, setting a class with more than 63 instance variables will never have an effect, its instances will always have 12-byte headers?

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Reply | Threaded
Open this post in threaded view
|

Re: SparseLargeTable instances and Unicode

Eliot Miranda-2
In reply to this post by Mariano Martinez Peck


2010/9/21 Mariano Martinez Peck <[hidden email]>


On Tue, Sep 21, 2010 at 7:55 PM, Henrik Sperre Johansen <[hidden email]> wrote:
 On 21.09.2010 19:10, Mariano Martinez Peck wrote:
Hi folks. I wasn't aware of the class SparseLargeTable but it seems it has two important instances "SparseLargeTable allInstances size -> 2"

They are the class side variables #DecimalProperty and #GeneralCategory in Unicode.

Actually,

(Unicode classPool at: 'GeneralCategory') size  -> 917632

(Unicode classPool at: 'DecimalProperty') size -> 917632

So...collections of almost 1 million elements.

Now a couple of questions:

1) Is this normal/expected?   In my calculus both arrays are like 3mb of memory
Look at the implementation, there's a reason they're called sparse.
If my math is correct (crosses fingers), DecimalProperty is ~15KB, GeneralCategory ~70KB

So...there must be a problem with my code, I was doing something like SpaceTally does:


spaceForUsedInstancesOf: aClass withInstanceCount: instCount
    "Answer the number of bytes consumed by all instances of the given class, including their object headers."

    | isCompact instVarBytes bytesPerElement headerBytes total |
    instCount = 0 ifTrue: [^ 0].
    isCompact := aClass indexIfCompact > 0.
    instVarBytes := aClass instSize * 4.
    aClass isVariable
        ifTrue: [
            bytesPerElement := aClass isBytes ifTrue: [1] ifFalse: [4].
            total := 0.
            aClass allInstancesDo: [:inst | | contentBytes |
                contentBytes := instVarBytes + (inst size * bytesPerElement).
                headerBytes :=
                    contentBytes > 255
                        ifTrue: [12]
                        ifFalse: [isCompact ifTrue: [4] ifFalse: [8]].
                    (UnusedObjectDiscoverer current primitiveGetUsedBit: inst) ifTrue:
                [total := total + headerBytes + contentBytes]].
            ^ total]
        ifFalse: [
            headerBytes :=
                instVarBytes > 255
                    ifTrue: [12]
                    ifFalse: [isCompact ifTrue: [4] ifFalse: [8]].
            ^ instCount * (headerBytes + instVarBytes)].



SparseLargeTable instanceCount ->>> 2

SpaceTally new spaceForUsedInstancesOf: SparseLargeTable withInstanceCount: 2   ->>>  7341112

You must use basicSize not size.  This is a sparse table.  It has a virtual size.  size answers that virtual size, not the actual size.  If you use basicSize you'll also need to traverse su structure to correctly compute the size of the entire sparse array including its pages.

HTH
Eliot

thanks




2) If I don't use UTF and I send an encoding as parameter in the VM (different from UTF), do I need this class side variables?
What Lukas said.

Cheers,
Henry


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project