Sometimes it's too easy, part II

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Sometimes it's too easy, part II

Eliot Miranda-2
 
Hi All,

    I had stalled in recent days on the new compactor.  The compactor appears to be working perfectly for me running on Mac OS, but when my colleague Bob Westergaard built a Newspeak VM for linux it crashed immediately (our Newspeak images invoke GC as soon as the low space watcher is spawned, because we're based on an older Squeak release).

Linux crashes whereas for me Mac OS doesn't crash for the combination of two reasons.  Linux places the heap much higher in the address space, typically creating heaps that extend into the up[per half of the address space.  The default type for oops in the VM is saint, which is signed.

So unless unsigned comparisons are used, oops in the upper half of the address space, which consequently have their sign bit set, will confuse enumerations, and hence bring the compactor crashing down.

The problem is how to find these unsigned comparisons.  I had taken the approach of simply finding test cases and running them in the debugger, but alas the compactor works perfectly in the simulator, because there's not enough address space to create a simulated heap > 2Gb, so the simulator works and the generated C doesn't work on Linux.  Running the two side by side trying to look for where they diverge was not productive given the difficulty of looking at a 47Mb image containing 875k objects.

And then the subconscious kicked in and I realized I could use Slang to analyze the parse tree and look for signed comparisons.  The first cut looked like this.  Yes, the open coding of building the code enrapture and running the type inference is ugly, but it's not that complicated.  The important bit at the bottom is

node isSend
 and: [(#(< > <= >=) includes: node selector)
 and: [({node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])]]

SpurPlanningCompactor class>>#identifySignedComparisons
"self identifySignedComparisons"
| vmm cg |
vmm := (VMMaker forPlatform: 'Cross')
interpreterClass: StackInterpreter;
options: #(ObjectMemory Spur32BitMemoryManager).
cg := [vmm buildCodeGeneratorForInterpreter]
on: Notification
do: [:ex|
ex tag == #getVMMaker
ifTrue: [ex resume: vmm]
ifFalse: [ex pass]].
cg vmClass preGenerationHook: cg.
cg inferTypesForImplicitlyTypedVariablesAndMethods.
cg retainMethods: self selectors.
cg prepareMethods.
cg doInlining: true.
self selectors do:
[:sel|
(cg methodNamed: sel) ifNotNil:
[:m|
m parseTree nodesDo:
[:node|
(node isSend
and: [(#(< > <= >=) includes: node selector)
and: [{node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]]]) ifTrue:
[Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]

Using the above was a doit to collect the results in a set allowed me to filter out the noise, and the final method looks like

identifySignedComparisons
"self identifySignedComparisons"
| vmm cg noise |
noise := #('(manager bytesInObject: largestFreeChunk) >= spaceEstimate'
'(self classIndexOf: o*) > self isForwardedObjectClassIndexPun'
'GCModeFull > 0'
'ReceiverIndex + (objectMemory integerValueOf: sp*) < (objectMemory lengthOf: o*)'
'fmt* < manager firstCompiledMethodFormat'
'fmt* < self firstCompiledMethodFormat'
'fmt* <= 5'
'gcPhaseInProgress > 0'
'i <= finishIndex'
'i >= 0'
'numPointerSlots > 0'
'scavenger rememberedSetSize > 0').
vmm := (VMMaker forPlatform: 'Cross')
interpreterClass: StackInterpreter;
options: #(ObjectMemory Spur32BitMemoryManager).
cg := [vmm buildCodeGeneratorForInterpreter]
on: Notification
do: [:ex|
ex tag == #getVMMaker
ifTrue: [ex resume: vmm]
ifFalse: [ex pass]].
cg vmClass preGenerationHook: cg.
cg inferTypesForImplicitlyTypedVariablesAndMethods.
cg retainMethods: self selectors.
cg prepareMethods.
cg doInlining: true.
self selectors do:
[:sel|
(cg methodNamed: sel) ifNotNil:
[:m|
m parseTree nodesDo:
[:node|
(node isSend
and: [(#(< > <= >=) includes: node selector)
and: [({node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])
and: [noise noneSatisfy: [:n| n match: node printString]]]]) ifTrue:
[Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]

and my output is

savedFirstFieldsSpaceInFreeChunk savedFirstFieldsSpace start >= nilObj
unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj
unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: nextObj >= manager endOfMemory
findNextMarkedPinnedAfter: nextObj >= manager endOfMemory
planCompactSavingForwarders toFinger <= (manager startOfObject: o)
planCompactSavingForwarders toFinger <= (manager startOfObject: previousPin)
updatePointers o2 >= firstFreeObject
updatePointers toFinger <= (manager startOfObject: o3)
updatePointers toFinger <= (manager startOfObject: previousPin)
savedFirstFieldsSpaceWasAllocated savedFirstFieldsSpace start >= nilObj
freeFrom:upTo:previousPin: toFinger >= (segments at: i) segStart
freeFrom:upTo:previousPin: seg segSize + seg segStart < limit
freeFrom:upTo:previousPin: start := (self byteAt: pin + 7) = self numSlotsMask ifTrue: [pin - self baseHeaderSize] ifFalse: [pin] > toFinger
freeFrom:upTo:previousPin: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj1
freeFrom:upTo:previousPin: nextObj1 >= manager endOfMemory
freeFrom:upTo:previousPin: nextUnpinned >= limit
freeFrom:upTo:previousPin: nextObj >= manager endOfMemory
unmarkPinned: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < pinnedObj
compact savedFirstFieldsSpace start >= nilObj
compact savedFirstFieldsSpace start >= nilObj
updateSavedFirstFieldsSpaceIfNecessary savedFirstFieldsSpace start >= nilObj
endCompaction savedFirstFieldsSpace start >= nilObj
updatePointersInInitialImmobileObjects o >= firstFreeObject
copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: o)
copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: previousPin)
copyAndUnmarkMobileObjects bytes + 2 * 8 > availableSpace
copyAndUnmarkMobileObjects availableSpace > 0
releaseSavedFirstFieldsSpace savedFirstFieldsSpace start >= nilObj
updatePointersInMobileObjects toFinger <= (manager startOfObject: o)
updatePointersInMobileObjects toFinger <= (manager startOfObject: previousPin)


Less than half an hour's work.  Now I can squash those bugs :-)

_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Nicolas Cellier
 
Ah great!
I had the same idea of instrumenting slang rather than analyzing the compiler warnings for the same purpose of filtering the noise out, but never took the time to do it...


2017-01-25 18:22 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi All,

    I had stalled in recent days on the new compactor.  The compactor appears to be working perfectly for me running on Mac OS, but when my colleague Bob Westergaard built a Newspeak VM for linux it crashed immediately (our Newspeak images invoke GC as soon as the low space watcher is spawned, because we're based on an older Squeak release).

Linux crashes whereas for me Mac OS doesn't crash for the combination of two reasons.  Linux places the heap much higher in the address space, typically creating heaps that extend into the up[per half of the address space.  The default type for oops in the VM is saint, which is signed.

So unless unsigned comparisons are used, oops in the upper half of the address space, which consequently have their sign bit set, will confuse enumerations, and hence bring the compactor crashing down.

The problem is how to find these unsigned comparisons.  I had taken the approach of simply finding test cases and running them in the debugger, but alas the compactor works perfectly in the simulator, because there's not enough address space to create a simulated heap > 2Gb, so the simulator works and the generated C doesn't work on Linux.  Running the two side by side trying to look for where they diverge was not productive given the difficulty of looking at a 47Mb image containing 875k objects.

And then the subconscious kicked in and I realized I could use Slang to analyze the parse tree and look for signed comparisons.  The first cut looked like this.  Yes, the open coding of building the code enrapture and running the type inference is ugly, but it's not that complicated.  The important bit at the bottom is

node isSend
 and: [(#(< > <= >=) includes: node selector)
 and: [({node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])]]

SpurPlanningCompactor class>>#identifySignedComparisons
"self identifySignedComparisons"
| vmm cg |
vmm := (VMMaker forPlatform: 'Cross')
interpreterClass: StackInterpreter;
options: #(ObjectMemory Spur32BitMemoryManager).
cg := [vmm buildCodeGeneratorForInterpreter]
on: Notification
do: [:ex|
ex tag == #getVMMaker
ifTrue: [ex resume: vmm]
ifFalse: [ex pass]].
cg vmClass preGenerationHook: cg.
cg inferTypesForImplicitlyTypedVariablesAndMethods.
cg retainMethods: self selectors.
cg prepareMethods.
cg doInlining: true.
self selectors do:
[:sel|
(cg methodNamed: sel) ifNotNil:
[:m|
m parseTree nodesDo:
[:node|
(node isSend
and: [(#(< > <= >=) includes: node selector)
and: [{node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]]]) ifTrue:
[Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]

Using the above was a doit to collect the results in a set allowed me to filter out the noise, and the final method looks like

identifySignedComparisons
"self identifySignedComparisons"
| vmm cg noise |
noise := #('(manager bytesInObject: largestFreeChunk) >= spaceEstimate'
'(self classIndexOf: o*) > self isForwardedObjectClassIndexPun'
'GCModeFull > 0'
'ReceiverIndex + (objectMemory integerValueOf: sp*) < (objectMemory lengthOf: o*)'
'fmt* < manager firstCompiledMethodFormat'
'fmt* < self firstCompiledMethodFormat'
'fmt* <= 5'
'gcPhaseInProgress > 0'
'i <= finishIndex'
'i >= 0'
'numPointerSlots > 0'
'scavenger rememberedSetSize > 0').
vmm := (VMMaker forPlatform: 'Cross')
interpreterClass: StackInterpreter;
options: #(ObjectMemory Spur32BitMemoryManager).
cg := [vmm buildCodeGeneratorForInterpreter]
on: Notification
do: [:ex|
ex tag == #getVMMaker
ifTrue: [ex resume: vmm]
ifFalse: [ex pass]].
cg vmClass preGenerationHook: cg.
cg inferTypesForImplicitlyTypedVariablesAndMethods.
cg retainMethods: self selectors.
cg prepareMethods.
cg doInlining: true.
self selectors do:
[:sel|
(cg methodNamed: sel) ifNotNil:
[:m|
m parseTree nodesDo:
[:node|
(node isSend
and: [(#(< > <= >=) includes: node selector)
and: [({node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])
and: [noise noneSatisfy: [:n| n match: node printString]]]]) ifTrue:
[Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]

and my output is

savedFirstFieldsSpaceInFreeChunk savedFirstFieldsSpace start >= nilObj
unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj
unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: nextObj >= manager endOfMemory
findNextMarkedPinnedAfter: nextObj >= manager endOfMemory
planCompactSavingForwarders toFinger <= (manager startOfObject: o)
planCompactSavingForwarders toFinger <= (manager startOfObject: previousPin)
updatePointers o2 >= firstFreeObject
updatePointers toFinger <= (manager startOfObject: o3)
updatePointers toFinger <= (manager startOfObject: previousPin)
savedFirstFieldsSpaceWasAllocated savedFirstFieldsSpace start >= nilObj
freeFrom:upTo:previousPin: toFinger >= (segments at: i) segStart
freeFrom:upTo:previousPin: seg segSize + seg segStart < limit
freeFrom:upTo:previousPin: start := (self byteAt: pin + 7) = self numSlotsMask ifTrue: [pin - self baseHeaderSize] ifFalse: [pin] > toFinger
freeFrom:upTo:previousPin: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj1
freeFrom:upTo:previousPin: nextObj1 >= manager endOfMemory
freeFrom:upTo:previousPin: nextUnpinned >= limit
freeFrom:upTo:previousPin: nextObj >= manager endOfMemory
unmarkPinned: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < pinnedObj
compact savedFirstFieldsSpace start >= nilObj
compact savedFirstFieldsSpace start >= nilObj
updateSavedFirstFieldsSpaceIfNecessary savedFirstFieldsSpace start >= nilObj
endCompaction savedFirstFieldsSpace start >= nilObj
updatePointersInInitialImmobileObjects o >= firstFreeObject
copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: o)
copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: previousPin)
copyAndUnmarkMobileObjects bytes + 2 * 8 > availableSpace
copyAndUnmarkMobileObjects availableSpace > 0
releaseSavedFirstFieldsSpace savedFirstFieldsSpace start >= nilObj
updatePointersInMobileObjects toFinger <= (manager startOfObject: o)
updatePointersInMobileObjects toFinger <= (manager startOfObject: previousPin)


Less than half an hour's work.  Now I can squash those bugs :-)

_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

johnmci
 
I recall that the issue we had with making the 32bit version clean was a for/while loop using a signed integer as pointer compare. If the start/end of the object was either negative or positive it worked, but if the object spanned the 2GB boundary then boom. 

Using the start address in mmap was helpful to fiddle with where the image started in memory. 

On Wed, Jan 25, 2017 at 10:45 AM, Nicolas Cellier <[hidden email]> wrote:
 
Ah great!
I had the same idea of instrumenting slang rather than analyzing the compiler warnings for the same purpose of filtering the noise out, but never took the time to do it...


2017-01-25 18:22 GMT+01:00 Eliot Miranda <[hidden email]>:
 
Hi All,

    I had stalled in recent days on the new compactor.  The compactor appears to be working perfectly for me running on Mac OS, but when my colleague Bob Westergaard built a Newspeak VM for linux it crashed immediately (our Newspeak images invoke GC as soon as the low space watcher is spawned, because we're based on an older Squeak release).

Linux crashes whereas for me Mac OS doesn't crash for the combination of two reasons.  Linux places the heap much higher in the address space, typically creating heaps that extend into the up[per half of the address space.  The default type for oops in the VM is saint, which is signed.

So unless unsigned comparisons are used, oops in the upper half of the address space, which consequently have their sign bit set, will confuse enumerations, and hence bring the compactor crashing down.

The problem is how to find these unsigned comparisons.  I had taken the approach of simply finding test cases and running them in the debugger, but alas the compactor works perfectly in the simulator, because there's not enough address space to create a simulated heap > 2Gb, so the simulator works and the generated C doesn't work on Linux.  Running the two side by side trying to look for where they diverge was not productive given the difficulty of looking at a 47Mb image containing 875k objects.

And then the subconscious kicked in and I realized I could use Slang to analyze the parse tree and look for signed comparisons.  The first cut looked like this.  Yes, the open coding of building the code enrapture and running the type inference is ugly, but it's not that complicated.  The important bit at the bottom is

node isSend
 and: [(#(< > <= >=) includes: node selector)
 and: [({node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])]]

SpurPlanningCompactor class>>#identifySignedComparisons
"self identifySignedComparisons"
| vmm cg |
vmm := (VMMaker forPlatform: 'Cross')
interpreterClass: StackInterpreter;
options: #(ObjectMemory Spur32BitMemoryManager).
cg := [vmm buildCodeGeneratorForInterpreter]
on: Notification
do: [:ex|
ex tag == #getVMMaker
ifTrue: [ex resume: vmm]
ifFalse: [ex pass]].
cg vmClass preGenerationHook: cg.
cg inferTypesForImplicitlyTypedVariablesAndMethods.
cg retainMethods: self selectors.
cg prepareMethods.
cg doInlining: true.
self selectors do:
[:sel|
(cg methodNamed: sel) ifNotNil:
[:m|
m parseTree nodesDo:
[:node|
(node isSend
and: [(#(< > <= >=) includes: node selector)
and: [{node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]]]) ifTrue:
[Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]

Using the above was a doit to collect the results in a set allowed me to filter out the noise, and the final method looks like

identifySignedComparisons
"self identifySignedComparisons"
| vmm cg noise |
noise := #('(manager bytesInObject: largestFreeChunk) >= spaceEstimate'
'(self classIndexOf: o*) > self isForwardedObjectClassIndexPun'
'GCModeFull > 0'
'ReceiverIndex + (objectMemory integerValueOf: sp*) < (objectMemory lengthOf: o*)'
'fmt* < manager firstCompiledMethodFormat'
'fmt* < self firstCompiledMethodFormat'
'fmt* <= 5'
'gcPhaseInProgress > 0'
'i <= finishIndex'
'i >= 0'
'numPointerSlots > 0'
'scavenger rememberedSetSize > 0').
vmm := (VMMaker forPlatform: 'Cross')
interpreterClass: StackInterpreter;
options: #(ObjectMemory Spur32BitMemoryManager).
cg := [vmm buildCodeGeneratorForInterpreter]
on: Notification
do: [:ex|
ex tag == #getVMMaker
ifTrue: [ex resume: vmm]
ifFalse: [ex pass]].
cg vmClass preGenerationHook: cg.
cg inferTypesForImplicitlyTypedVariablesAndMethods.
cg retainMethods: self selectors.
cg prepareMethods.
cg doInlining: true.
self selectors do:
[:sel|
(cg methodNamed: sel) ifNotNil:
[:m|
m parseTree nodesDo:
[:node|
(node isSend
and: [(#(< > <= >=) includes: node selector)
and: [({node receiver. node args first } anySatisfy:
[:o| (cg typeFor: o in: m)
ifNil: [false]
ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])
and: [noise noneSatisfy: [:n| n match: node printString]]]]) ifTrue:
[Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]

and my output is

savedFirstFieldsSpaceInFreeChunk savedFirstFieldsSpace start >= nilObj
unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj
unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: nextObj >= manager endOfMemory
findNextMarkedPinnedAfter: nextObj >= manager endOfMemory
planCompactSavingForwarders toFinger <= (manager startOfObject: o)
planCompactSavingForwarders toFinger <= (manager startOfObject: previousPin)
updatePointers o2 >= firstFreeObject
updatePointers toFinger <= (manager startOfObject: o3)
updatePointers toFinger <= (manager startOfObject: previousPin)
savedFirstFieldsSpaceWasAllocated savedFirstFieldsSpace start >= nilObj
freeFrom:upTo:previousPin: toFinger >= (segments at: i) segStart
freeFrom:upTo:previousPin: seg segSize + seg segStart < limit
freeFrom:upTo:previousPin: start := (self byteAt: pin + 7) = self numSlotsMask ifTrue: [pin - self baseHeaderSize] ifFalse: [pin] > toFinger
freeFrom:upTo:previousPin: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj1
freeFrom:upTo:previousPin: nextObj1 >= manager endOfMemory
freeFrom:upTo:previousPin: nextUnpinned >= limit
freeFrom:upTo:previousPin: nextObj >= manager endOfMemory
unmarkPinned: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < pinnedObj
compact savedFirstFieldsSpace start >= nilObj
compact savedFirstFieldsSpace start >= nilObj
updateSavedFirstFieldsSpaceIfNecessary savedFirstFieldsSpace start >= nilObj
endCompaction savedFirstFieldsSpace start >= nilObj
updatePointersInInitialImmobileObjects o >= firstFreeObject
copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: o)
copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: previousPin)
copyAndUnmarkMobileObjects bytes + 2 * 8 > availableSpace
copyAndUnmarkMobileObjects availableSpace > 0
releaseSavedFirstFieldsSpace savedFirstFieldsSpace start >= nilObj
updatePointersInMobileObjects toFinger <= (manager startOfObject: o)
updatePointersInMobileObjects toFinger <= (manager startOfObject: previousPin)


Less than half an hour's work.  Now I can squash those bugs :-)

_,,,^..^,,,_
best, Eliot






--
===========================================================================
John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk
===========================================================================
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Bert Freudenberg
In reply to this post by Eliot Miranda-2
 
Awesome! However ...

On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote:
The default type for oops in the VM is sqInt (*), which is signed.

... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?

- Bert -
(*) fixed the typo

Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Tobias Pape
 

On 26.01.2017, at 11:42, Bert Freudenberg <[hidden email]> wrote:

> Awesome! However ...
>
> On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote:
> The default type for oops in the VM is sqInt (*), which is signed.
>
> ... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
>

*signed behavior?


> - Bert -
> (*) fixed the typo
>

Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Bert Freudenberg
 
On Thu, Jan 26, 2017 at 1:32 PM, Tobias Pape <[hidden email]> wrote:


On 26.01.2017, at 11:42, Bert Freudenberg <[hidden email]> wrote:

> Awesome! However ...
>
> On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote:
> The default type for oops in the VM is sqInt (*), which is signed.
>
> ... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
>

*signed behavior?

Err, yes, we never want *signed* behavior unless comparing / shifting small ints.

- Bert -
 

> - Bert -
> (*) fixed the typo
>


Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Eliot Miranda-2
In reply to this post by Bert Freudenberg
 
Hi Bert,

On Jan 26, 2017, at 2:42 AM, Bert Freudenberg <[hidden email]> wrote:

Awesome! However ...

On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote:
The default type for oops in the VM is sqInt (*), which is signed.

... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?

well, I've wanted to change this for some time but potentially it's a lot of work, as it affects the plugin and platform support code as much as the VM source.  But let's at least discuss it.

First, what exactly do we do?  Right now we have sqInt & usqInt as oop-sized signed and unsigned types.  My list of pros and cons below is off the top of my head, preliminary.  Most of the below share the issue that we have to identify places where signed SmallInteger arithmetic is done and re-type them appropriately.  Do we

- a) change the code generator to use usqInt as the default type in place of sqInt?
   pros presumably minimal changes except to code generator and removing pragmas that specify usqInt?
   cons rewriting platform support code to use usqInt for at least parameters

- b) add a new default type, e.g. oop_t, which is the same as usqInt?
  pros, see above
  cons figuring out which usqInt types apply to oops and which to unsigned values

- c) change sqInt to be unsigned and add e.g. ssqInt as a signed type
   pros don't have to change platform source
 
- d) add a new default type, e.g. oop_t, which is the same as char *?
   pros oops are pointers 
    cons C allows arithmetic on char *, so this won't fix inappropriate arithmetic on oops
         It's undefined in C whether pointers to different structures can be compared (but I do not know if an implement ration that refuses to compare pointers as simply addresses)

- e) use void * instead of char * (too horrible bow to contemplate cuz one can't compare void *)

- f..z) ?

Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored?

Anyway, looks to me like a) or c) are the main contenders.  What d'all think?


- Bert -
(*) fixed the typo

Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

timrowledge
 
Surely since we’re concerned about (ab)use of signed values for pointers to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as ‘int’ twenty years ago?

Yes, any change means a bunch of work to do but isn’t that always the price to improve things?

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: LB: connect Line-voltage to BITNET


Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

David T. Lewis
 
+1

Dave

>
> Surely since we’re concerned about (ab)use of signed values for pointers
> to words we ought to define a ‘proper’ sqOop thing and use that
> instead of a slightly cleaned up derivative of how the Slang default was
> left as ‘int’ twenty years ago?
>
> Yes, any change means a bunch of work to do but isn’t that always the
> price to improve things?
>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Strange OpCodes: LB: connect Line-voltage to BITNET
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

johnmci
 
+1

Sent from my iPhone

> On Jan 26, 2017, at 10:58, David T. Lewis <[hidden email]> wrote:
>
>
> +1
>
> Dave
>
>>
>> Surely since we’re concerned about (ab)use of signed values for pointers
>> to words we ought to define a ‘proper’ sqOop thing and use that
>> instead of a slightly cleaned up derivative of how the Slang default was
>> left as ‘int’ twenty years ago?
>>
>> Yes, any change means a bunch of work to do but isn’t that always the
>> price to improve things?
>>
>> tim
>> --
>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>> Strange OpCodes: LB: connect Line-voltage to BITNET
>>
>>
>>
>
>

smime.p7s (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Igor Stasenko
 
+1

with C i was always wondering, why C standard does not provides 
an unsigned integer type which size strictly corresponds to size of pointer type, e.g.
sizeof(some standard uint) == sizeof(void*)
that would solve so many problems and make thing so much easier..

instead they had this:
Data Type: ptrdiff_t

This is the signed integer type of the result of subtracting two pointers. For example, with the declaration char *p1, *p2;, the expression p2 - p1 is of type ptrdiff_t. This will probably be one of the standard signed integer types (short intint or long int), but might be a nonstandard type that exists only for this purpose.


it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers..
alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder.. 

One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t,
unfortunately, ISO gives no guarantees that its size corresponds to pointer size.


that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for  proposing to use non-standard C herecy :)


On 26 January 2017 at 21:37, John McIntosh <[hidden email]> wrote:
 
+1

Sent from my iPhone

> On Jan 26, 2017, at 10:58, David T. Lewis <[hidden email]> wrote:
>
>
> +1
>
> Dave
>
>>
>> Surely since we’re concerned about (ab)use of signed values for pointers
>> to words we ought to define a ‘proper’ sqOop thing and use that
>> instead of a slightly cleaned up derivative of how the Slang default was
>> left as ‘int’ twenty years ago?
>>
>> Yes, any change means a bunch of work to do but isn’t that always the
>> price to improve things?
>>
>> tim
>> --
>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>> Strange OpCodes: LB: connect Line-voltage to BITNET
>>
>>
>>
>
>




--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Nicolas Cellier
In reply to this post by johnmci
 
Some requirements:

1) Slang must be able to code boolean and arithmetic expressions without the need of type hint.

2) If we still want to support 32bits image on 64bits VM, then it would be good to avoid implicit conversions between machine pointer (char *) and oop. We should allways use char *pointerForOop(sqOop oop) and sqOop oopForPointer(char *).

3) we must avoid as much as possible to invoke UB, or the compiled code will start to crash or behave wrongly as soon as we use an optimization flag, and this will get worse at each C compiler version.

So far, sqInt is defined as default type and works well for 1).
We have to cast to unsigned here and there though to avoid UB (like left shifting a signed value, or testing for overflow in post condition). But also to avoid unwanted sign extension when we manipulate chnuks of bit (like high and low part) - we had several bugs with that.

I much prefer unsigned, because it's almost free from UB. With signed, we just don't have right to overflow, that's crazy...
But if we switch to unsigned, my fear is to have to add many type hints...
Because telling automatically which arithmetic expressions requires signed, and which would work with unsigned is not easy.
for example, for( i=size-1;i >= 0; i--)  isn't going to do what we want.
Nor c = a - b; if ( c < 0 ) ...

Declaring sqOop as a union could help 2) but would hardly fit 1).
Maybe type inference can solve it entirely, and it's not a problem?

Declaring sqOop as (char *) would not support 2)
And we could not compare oops any more because it's UB!
We could eventually use slang type inference and force (usqInt) oop1 < (usqInt) oop2...
I don't like this option and much prefer explicit char *pointerForOop(sqOop oop) and sqOop oopForPointer(char *).


2017-01-26 20:37 GMT+01:00 John McIntosh <[hidden email]>:
 
+1

Sent from my iPhone

> On Jan 26, 2017, at 10:58, David T. Lewis <[hidden email]> wrote:
>
>
> +1
>
> Dave
>
>>
>> Surely since we’re concerned about (ab)use of signed values for pointers
>> to words we ought to define a ‘proper’ sqOop thing and use that
>> instead of a slightly cleaned up derivative of how the Slang default was
>> left as ‘int’ twenty years ago?
>>
>> Yes, any change means a bunch of work to do but isn’t that always the
>> price to improve things?
>>
>> tim
>> --
>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>> Strange OpCodes: LB: connect Line-voltage to BITNET
>>
>>
>>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Nicolas Cellier
In reply to this post by Igor Stasenko
 


2017-01-26 21:20 GMT+01:00 Igor Stasenko <[hidden email]>:
 
+1

with C i was always wondering, why C standard does not provides 
an unsigned integer type which size strictly corresponds to size of pointer type, e.g.
sizeof(some standard uint) == sizeof(void*)
that would solve so many problems and make thing so much easier..


In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)

 
instead they had this:
Data Type: ptrdiff_t

This is the signed integer type of the result of subtracting two pointers. For example, with the declaration char *p1, *p2;, the expression p2 - p1 is of type ptrdiff_t. This will probably be one of the standard signed integer types (short intint or long int), but might be a nonstandard type that exists only for this purpose.


it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers..
alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder.. 

Technically you should only compare pointers that points to parts of a same structure or array (or byte just after the end of an array), otherwise it's UB.
I guess this should apply to ptrdiff_t, in which case ptrdiff_t should be as long as a size_t, except it is signed.
 

One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t,
unfortunately, ISO gives no guarantees that its size corresponds to pointer size.


size_t does not necessarily fit intptr_t indeed, use the right type.
 

that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for  proposing to use non-standard C herecy :)


On 26 January 2017 at 21:37, John McIntosh <[hidden email]> wrote:
 
+1

Sent from my iPhone

> On Jan 26, 2017, at 10:58, David T. Lewis <[hidden email]> wrote:
>
>
> +1
>
> Dave
>
>>
>> Surely since we’re concerned about (ab)use of signed values for pointers
>> to words we ought to define a ‘proper’ sqOop thing and use that
>> instead of a slightly cleaned up derivative of how the Slang default was
>> left as ‘int’ twenty years ago?
>>
>> Yes, any change means a bunch of work to do but isn’t that always the
>> price to improve things?
>>
>> tim
>> --
>> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>> Strange OpCodes: LB: connect Line-voltage to BITNET
>>
>>
>>
>
>




--
Best regards,
Igor Stasenko.


Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Igor Stasenko
 


On 26 January 2017 at 22:32, Nicolas Cellier <[hidden email]> wrote:
 


2017-01-26 21:20 GMT+01:00 Igor Stasenko <[hidden email]>:
 
+1

with C i was always wondering, why C standard does not provides 
an unsigned integer type which size strictly corresponds to size of pointer type, e.g.
sizeof(some standard uint) == sizeof(void*)
that would solve so many problems and make thing so much easier..


In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)

i guess, you meant uintptr_t then
but it was introduced only in C99.. and compilers started supporting C99 years later..
but the problem is that AFAIK, first lines of Squeak were written around '96 
and people had to deal with what they had..

And then there's a people, whose main argument for using C is 'its portable'.. Hell yes, except it wasn't when portable things, like
our VM were written first :)


--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Nicolas Cellier
 


2017-01-26 21:46 GMT+01:00 Igor Stasenko <[hidden email]>:
 


On 26 January 2017 at 22:32, Nicolas Cellier <[hidden email]> wrote:
 


2017-01-26 21:20 GMT+01:00 Igor Stasenko <[hidden email]>:
 
+1

with C i was always wondering, why C standard does not provides 
an unsigned integer type which size strictly corresponds to size of pointer type, e.g.
sizeof(some standard uint) == sizeof(void*)
that would solve so many problems and make thing so much easier..


In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)

i guess, you meant uintptr_t then
but it was introduced only in C99.. and compilers started supporting C99 years later..
but the problem is that AFAIK, first lines of Squeak were written around '96 
and people had to deal with what they had..

And then there's a people, whose main argument for using C is 'its portable'.. Hell yes, except it wasn't when portable things, like
our VM were written first :)


If not all compilers support C99, most support at least uintptr_t
But because we are really caring of obsolete configurations, we have introduced a usqIntptr_t...

However, an oop is not necessarily a usqIntptr_t, for example if we have 32bits image on 64bits VM, which might still be an interesting case.
 

--
Best regards,
Igor Stasenko.


Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Igor Stasenko
 


On 26 January 2017 at 22:58, Nicolas Cellier <[hidden email]> wrote:
 


2017-01-26 21:46 GMT+01:00 Igor Stasenko <[hidden email]>:
 


On 26 January 2017 at 22:32, Nicolas Cellier <[hidden email]> wrote:
 


2017-01-26 21:20 GMT+01:00 Igor Stasenko <[hidden email]>:
 
+1

with C i was always wondering, why C standard does not provides 
an unsigned integer type which size strictly corresponds to size of pointer type, e.g.
sizeof(some standard uint) == sizeof(void*)
that would solve so many problems and make thing so much easier..


In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)

i guess, you meant uintptr_t then
but it was introduced only in C99.. and compilers started supporting C99 years later..
but the problem is that AFAIK, first lines of Squeak were written around '96 
and people had to deal with what they had..

And then there's a people, whose main argument for using C is 'its portable'.. Hell yes, except it wasn't when portable things, like
our VM were written first :)


If not all compilers support C99, most support at least uintptr_t
But because we are really caring of obsolete configurations, we have introduced a usqIntptr_t...

However, an oop is not necessarily a usqIntptr_t, for example if we have 32bits image on 64bits VM, which might still be an interesting case.

you mean interesting in terms, like implementing a 70's era processor or DoS simulator(s) in javascript, so you can open it on web page and have fun? :)
 
 

--
Best regards,
Igor Stasenko.






--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Ben Coman
In reply to this post by Eliot Miranda-2
 
On Fri, Jan 27, 2017 at 12:52 AM, Eliot Miranda <[hidden email]> wrote:

>
> Hi Bert,
>
> On Jan 26, 2017, at 2:42 AM, Bert Freudenberg <[hidden email]> wrote:
>
> Awesome! However ...
>
> On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote:
>>
>> The default type for oops in the VM is sqInt (*), which is signed.
>
>
> ... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
>
>
> well, I've wanted to change this for some time but potentially it's a lot of work, as it affects the plugin and platform support code as much as the VM source.  But let's at least discuss it.
>
> First, what exactly do we do?  Right now we have sqInt & usqInt as oop-sized signed and unsigned types.  My list of pros and cons below is off the top of my head, preliminary.  Most of the below share the issue that we have to identify places where signed SmallInteger arithmetic is done and re-type them appropriately.  Do we
>
> - a) change the code generator to use usqInt as the default type in place of sqInt?
>    pros presumably minimal changes except to code generator and removing pragmas that specify usqInt?
>    cons rewriting platform support code to use usqInt for at least parameters
>
> - b) add a new default type, e.g. oop_t, which is the same as usqInt?
>   pros, see above
>   cons figuring out which usqInt types apply to oops and which to unsigned values
>
> - c) change sqInt to be unsigned and add e.g. ssqInt as a signed type
>    pros don't have to change platform source
>
> - d) add a new default type, e.g. oop_t, which is the same as char *?
>    pros oops are pointers
>     cons C allows arithmetic on char *, so this won't fix inappropriate arithmetic on oops
>          It's undefined in C whether pointers to different structures can be compared (but I do not know if an implement ration that refuses to compare pointers as simply addresses)
>
> - e) use void * instead of char * (too horrible bow to contemplate cuz one can't compare void *)
>
> - f..z) ?
>
> Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored?
>
> Anyway, looks to me like a) or c) are the main contenders.  What d'all think?

What is the overlap in the cons between a) and b) ?
Is one a super-set of the other or are the changes orthogonal?

For stability during the progressive work on a)
would it help to anyway replace sqInt by oop_t,
and have an experimental branch where "typdef usqInt oop_t"
while the main Cog branch remains "typedef sqInt oop_t" ?

cheers -ben
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Tobias Pape
In reply to this post by Igor Stasenko
 

On 26.01.2017, at 21:20, Igor Stasenko <[hidden email]> wrote:

> +1
>
> with C i was always wondering, why C standard does not provides
> an unsigned integer type which size strictly corresponds to size of pointer type, e.g.
> sizeof(some standard uint) == sizeof(void*)

You mean uintptr_t?

https://en.wikipedia.org/wiki/C_data_types#Fixed-width_integer_types

C11 says:

7.20.1.4 Integer types capable of holding object pointers

1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

intptr_t

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

uintptr_t

These types are optional.


> that would solve so many problems and make thing so much easier..
>
> instead they had this:
> http://www.gnu.org/software/libc/manual/html_node/Important-Data-Types.html
> Data Type: ptrdiff_t
> This is the signed integer type of the result of subtracting two pointers. For example, with the declaration char *p1, *p2;, the expression p2 - p1 is of type ptrdiff_t. This will probably be one of the standard signed integer types (short int, int or long int), but might be a nonstandard type that exists only for this purpose.
>
>
> it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers..
> alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder..
>
> One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t,
> unfortunately, ISO gives no guarantees that its size corresponds to pointer size.
>
>
> Btw,look what i found :)
> https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins
>
> that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for  proposing to use non-standard C herecy :)
>
>
> On 26 January 2017 at 21:37, John McIntosh <[hidden email]> wrote:
>  
> +1
>
> Sent from my iPhone
>
> > On Jan 26, 2017, at 10:58, David T. Lewis <[hidden email]> wrote:
> >
> >
> > +1
> >
> > Dave
> >
> >>
> >> Surely since we’re concerned about (ab)use of signed values for pointers
> >> to words we ought to define a ‘proper’ sqOop thing and use that
> >> instead of a slightly cleaned up derivative of how the Slang default was
> >> left as ‘int’ twenty years ago?
> >>
> >> Yes, any change means a bunch of work to do but isn’t that always the
> >> price to improve things?
> >>
> >> tim
> >> --
> >> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> >> Strange OpCodes: LB: connect Line-voltage to BITNET
> >>
> >>
> >>
> >
> >
>
>
>
>
> --
> Best regards,
> Igor Stasenko.

Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Bert Freudenberg
In reply to this post by Eliot Miranda-2
 
On Thu, Jan 26, 2017 at 5:52 PM, Eliot Miranda <[hidden email]> wrote:
 

- a) change the code generator to use usqInt as the default type in place of sqInt?
- b) add a new default type, e.g. oop_t, which is the same as usqInt?
- c) change sqInt to be unsigned and add e.g. ssqInt as a signed type
- d) add a new default type, e.g. oop_t, which is the same as char *?
- e) use void * instead of char * (too horrible bow to contemplate cuz one can't compare void *)

I'd think b should work fine. sqOop as suggested by Tim sounds good.
 
Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored?

Well, most of SqueakJS is written by hand and not generated. That is the interpreter, jit, most plugins. Non-immediate oops are direct object references, there is no way to access their numeric value.

Where it is relevant is in the Slang sources for a couple of plugins that I translated to JS. My VMMakerJS subclasses the regular code generator and I had to add a crude type inference mechanism and some special-case hacks to make sized pointer arithmetic and signed/unsigned stuff work.

I guess if I had to touch it again (still using the generated code from 2 years ago) I may be better off using the oscog version instead of baking my own type inferencer. In any case, a clear distinction between pointers and arithmetic types would be a Good Thing.

- Bert -
Reply | Threaded
Open this post in threaded view
|

Re: Sometimes it's too easy, part II

Andres Valloud-4
In reply to this post by timrowledge
 
And structure the changes to cause a net code loss, so the next set of
changes is easier to make.  Repeat.

On 1/26/17 10:54 , tim Rowledge wrote:

>
> Surely since we’re concerned about (ab)use of signed values for
> pointers to words we ought to define a ‘proper’ sqOop thing and use
> that instead of a slightly cleaned up derivative of how the Slang
> default was left as ‘int’ twenty years ago?
>
> Yes, any change means a bunch of work to do but isn’t that always the
> price to improve things?
>
> tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Strange OpCodes: LB: connect Line-voltage to BITNET
>
>
12