Hi All,
I had stalled in recent days on the new compactor. The compactor appears to be working perfectly for me running on Mac OS, but when my colleague Bob Westergaard built a Newspeak VM for linux it crashed immediately (our Newspeak images invoke GC as soon as the low space watcher is spawned, because we're based on an older Squeak release). Linux crashes whereas for me Mac OS doesn't crash for the combination of two reasons. Linux places the heap much higher in the address space, typically creating heaps that extend into the up[per half of the address space. The default type for oops in the VM is saint, which is signed. So unless unsigned comparisons are used, oops in the upper half of the address space, which consequently have their sign bit set, will confuse enumerations, and hence bring the compactor crashing down. The problem is how to find these unsigned comparisons. I had taken the approach of simply finding test cases and running them in the debugger, but alas the compactor works perfectly in the simulator, because there's not enough address space to create a simulated heap > 2Gb, so the simulator works and the generated C doesn't work on Linux. Running the two side by side trying to look for where they diverge was not productive given the difficulty of looking at a 47Mb image containing 875k objects. And then the subconscious kicked in and I realized I could use Slang to analyze the parse tree and look for signed comparisons. The first cut looked like this. Yes, the open coding of building the code enrapture and running the type inference is ugly, but it's not that complicated. The important bit at the bottom is node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])]] SpurPlanningCompactor class>>#identifySignedComparisons "self identifySignedComparisons" | vmm cg | vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [{node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]] Using the above was a doit to collect the results in a set allowed me to filter out the noise, and the final method looks like identifySignedComparisons "self identifySignedComparisons" | vmm cg noise | noise := #('(manager bytesInObject: largestFreeChunk) >= spaceEstimate' '(self classIndexOf: o*) > self isForwardedObjectClassIndexPun' 'GCModeFull > 0' 'ReceiverIndex + (objectMemory integerValueOf: sp*) < (objectMemory lengthOf: o*)' 'fmt* < manager firstCompiledMethodFormat' 'fmt* < self firstCompiledMethodFormat' 'fmt* <= 5' 'gcPhaseInProgress > 0' 'i <= finishIndex' 'i >= 0' 'numPointerSlots > 0' 'scavenger rememberedSetSize > 0'). vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]) and: [noise noneSatisfy: [:n| n match: node printString]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]] and my output is savedFirstFieldsSpaceInFreeChunk savedFirstFieldsSpace start >= nilObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: nextObj >= manager endOfMemory findNextMarkedPinnedAfter: nextObj >= manager endOfMemory planCompactSavingForwarders toFinger <= (manager startOfObject: o) planCompactSavingForwarders toFinger <= (manager startOfObject: previousPin) updatePointers o2 >= firstFreeObject updatePointers toFinger <= (manager startOfObject: o3) updatePointers toFinger <= (manager startOfObject: previousPin) savedFirstFieldsSpaceWasAllocated savedFirstFieldsSpace start >= nilObj freeFrom:upTo:previousPin: toFinger >= (segments at: i) segStart freeFrom:upTo:previousPin: seg segSize + seg segStart < limit freeFrom:upTo:previousPin: start := (self byteAt: pin + 7) = self numSlotsMask ifTrue: [pin - self baseHeaderSize] ifFalse: [pin] > toFinger freeFrom:upTo:previousPin: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj1 freeFrom:upTo:previousPin: nextObj1 >= manager endOfMemory freeFrom:upTo:previousPin: nextUnpinned >= limit freeFrom:upTo:previousPin: nextObj >= manager endOfMemory unmarkPinned: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < pinnedObj compact savedFirstFieldsSpace start >= nilObj compact savedFirstFieldsSpace start >= nilObj updateSavedFirstFieldsSpaceIfNecessary savedFirstFieldsSpace start >= nilObj endCompaction savedFirstFieldsSpace start >= nilObj updatePointersInInitialImmobileObjects o >= firstFreeObject copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: o) copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: previousPin) copyAndUnmarkMobileObjects bytes + 2 * 8 > availableSpace copyAndUnmarkMobileObjects availableSpace > 0 releaseSavedFirstFieldsSpace savedFirstFieldsSpace start >= nilObj updatePointersInMobileObjects toFinger <= (manager startOfObject: o) updatePointersInMobileObjects toFinger <= (manager startOfObject: previousPin) Less than half an hour's work. Now I can squash those bugs :-) _,,,^..^,,,_ best, Eliot |
Ah great! I had the same idea of instrumenting slang rather than analyzing the compiler warnings for the same purpose of filtering the noise out, but never took the time to do it...2017-01-25 18:22 GMT+01:00 Eliot Miranda <[hidden email]>:
|
I recall that the issue we had with making the 32bit version clean was a for/while loop using a signed integer as pointer compare. If the start/end of the object was either negative or positive it worked, but if the object spanned the 2GB boundary then boom. Using the start address in mmap was helpful to fiddle with where the image started in memory. On Wed, Jan 25, 2017 at 10:45 AM, Nicolas Cellier <[hidden email]> wrote:
=========================================================================== John M. McIntosh. Corporate Smalltalk Consulting Ltd https://www.linkedin.com/in/smalltalk =========================================================================== |
In reply to this post by Eliot Miranda-2
Awesome! However ...
On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote:
... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right? - Bert - (*) fixed the typo |
On 26.01.2017, at 11:42, Bert Freudenberg <[hidden email]> wrote: > Awesome! However ... > > On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote: > The default type for oops in the VM is sqInt (*), which is signed. > > ... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right? > *signed behavior? > - Bert - > (*) fixed the typo > |
On Thu, Jan 26, 2017 at 1:32 PM, Tobias Pape <[hidden email]> wrote:
Err, yes, we never want *signed* behavior unless comparing / shifting small ints. - Bert -
|
In reply to this post by Bert Freudenberg
Hi Bert,
First, what exactly do we do? Right now we have sqInt & usqInt as oop-sized signed and unsigned types. My list of pros and cons below is off the top of my head, preliminary. Most of the below share the issue that we have to identify places where signed SmallInteger arithmetic is done and re-type them appropriately. Do we - a) change the code generator to use usqInt as the default type in place of sqInt? pros presumably minimal changes except to code generator and removing pragmas that specify usqInt? cons rewriting platform support code to use usqInt for at least parameters - b) add a new default type, e.g. oop_t, which is the same as usqInt? pros, see above cons figuring out which usqInt types apply to oops and which to unsigned values - c) change sqInt to be unsigned and add e.g. ssqInt as a signed type pros don't have to change platform source - d) add a new default type, e.g. oop_t, which is the same as char *?
pros oops are pointers cons C allows arithmetic on char *, so this won't fix inappropriate arithmetic on oops It's undefined in C whether pointers to different structures can be compared (but I do not know if an implement ration that refuses to compare pointers as simply addresses) - e) use void * instead of char * (too horrible bow to contemplate cuz one can't compare void *) - f..z) ? Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored? Anyway, looks to me like a) or c) are the main contenders. What d'all think?
|
Surely since we’re concerned about (ab)use of signed values for pointers to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as ‘int’ twenty years ago? Yes, any change means a bunch of work to do but isn’t that always the price to improve things? tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET |
+1 Dave > > Surely since weâre concerned about (ab)use of signed values for pointers > to words we ought to define a âproperâ sqOop thing and use that > instead of a slightly cleaned up derivative of how the Slang default was > left as âintâ twenty years ago? > > Yes, any change means a bunch of work to do but isnât that always the > price to improve things? > > tim > -- > tim Rowledge; [hidden email]; http://www.rowledge.org/tim > Strange OpCodes: LB: connect Line-voltage to BITNET > > > |
+1 Sent from my iPhone > On Jan 26, 2017, at 10:58, David T. Lewis <[hidden email]> wrote: > > > +1 > > Dave > >> >> Surely since weâre concerned about (ab)use of signed values for pointers >> to words we ought to define a âproperâ sqOop thing and use that >> instead of a slightly cleaned up derivative of how the Slang default was >> left as âintâ twenty years ago? >> >> Yes, any change means a bunch of work to do but isnât that always the >> price to improve things? >> >> tim >> -- >> tim Rowledge; [hidden email]; http://www.rowledge.org/tim >> Strange OpCodes: LB: connect Line-voltage to BITNET >> >> >> > > smime.p7s (3K) Download Attachment |
+1 with C i was always wondering, why C standard does not provides an unsigned integer type which size strictly corresponds to size of pointer type, e.g. sizeof(some standard uint) == sizeof(void*) that would solve so many problems and make thing so much easier.. instead they had this:
it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers.. alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder.. One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t, unfortunately, ISO gives no guarantees that its size corresponds to pointer size. Btw,look what i found :) https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for proposing to use non-standard C herecy :) On 26 January 2017 at 21:37, John McIntosh <[hidden email]> wrote:
Best regards,
Igor Stasenko. |
In reply to this post by johnmci
Some requirements: 1) Slang must be able to code boolean and arithmetic expressions without the need of type hint. 3) we must avoid as much as possible to invoke UB, or the compiled code will start to crash or behave wrongly as soon as we use an optimization flag, and this will get worse at each C compiler version. So far, sqInt is defined as default type and works well for 1). We have to cast to unsigned here and there though to avoid UB (like left shifting a signed value, or testing for overflow in post condition). But also to avoid unwanted sign extension when we manipulate chnuks of bit (like high and low part) - we had several bugs with that. I much prefer unsigned, because it's almost free from UB. With signed, we just don't have right to overflow, that's crazy... But if we switch to unsigned, my fear is to have to add many type hints... Because telling automatically which arithmetic expressions requires signed, and which would work with unsigned is not easy. for example, for( i=size-1;i >= 0; i--) isn't going to do what we want. Nor c = a - b; if ( c < 0 ) ... Declaring sqOop as a union could help 2) but would hardly fit 1). Maybe type inference can solve it entirely, and it's not a problem? Declaring sqOop as (char *) would not support 2) And we could not compare oops any more because it's UB! We could eventually use slang type inference and force (usqInt) oop1 < (usqInt) oop2... I don't like this option and much prefer explicit char
*pointerForOop(sqOop oop) and sqOop oopForPointer(char *). 2017-01-26 20:37 GMT+01:00 John McIntosh <[hidden email]>:
|
In reply to this post by Igor Stasenko
2017-01-26 21:20 GMT+01:00 Igor Stasenko <[hidden email]>:
In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)
Technically you should only compare pointers that points to parts of a same structure or array (or byte just after the end of an array), otherwise it's UB. I guess this should apply to ptrdiff_t, in which case ptrdiff_t should be as long as a size_t, except it is signed.
size_t does not necessarily fit intptr_t indeed, use the right type.
|
On 26 January 2017 at 22:32, Nicolas Cellier <[hidden email]> wrote:
i guess, you meant uintptr_t then but it was introduced only in C99.. and compilers started supporting C99 years later.. but the problem is that AFAIK, first lines of Squeak were written around '96 and people had to deal with what they had.. And then there's a people, whose main argument for using C is 'its portable'.. Hell yes, except it wasn't when portable things, like our VM were written first :) Best regards,
Igor Stasenko. |
2017-01-26 21:46 GMT+01:00 Igor Stasenko <[hidden email]>:
If not all compilers support C99, most support at least uintptr_t But because we are really caring of obsolete configurations, we have introduced a usqIntptr_t... However, an oop is not necessarily a usqIntptr_t, for example if we have 32bits image on 64bits VM, which might still be an interesting case.
|
On 26 January 2017 at 22:58, Nicolas Cellier <[hidden email]> wrote:
you mean interesting in terms, like implementing a 70's era processor or DoS simulator(s) in javascript, so you can open it on web page and have fun? :)
Best regards,
Igor Stasenko. |
In reply to this post by Eliot Miranda-2
On Fri, Jan 27, 2017 at 12:52 AM, Eliot Miranda <[hidden email]> wrote: > > Hi Bert, > > On Jan 26, 2017, at 2:42 AM, Bert Freudenberg <[hidden email]> wrote: > > Awesome! However ... > > On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <[hidden email]> wrote: >> >> The default type for oops in the VM is sqInt (*), which is signed. > > > ... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right? > > > well, I've wanted to change this for some time but potentially it's a lot of work, as it affects the plugin and platform support code as much as the VM source. But let's at least discuss it. > > First, what exactly do we do? Right now we have sqInt & usqInt as oop-sized signed and unsigned types. My list of pros and cons below is off the top of my head, preliminary. Most of the below share the issue that we have to identify places where signed SmallInteger arithmetic is done and re-type them appropriately. Do we > > - a) change the code generator to use usqInt as the default type in place of sqInt? > pros presumably minimal changes except to code generator and removing pragmas that specify usqInt? > cons rewriting platform support code to use usqInt for at least parameters > > - b) add a new default type, e.g. oop_t, which is the same as usqInt? > pros, see above > cons figuring out which usqInt types apply to oops and which to unsigned values > > - c) change sqInt to be unsigned and add e.g. ssqInt as a signed type > pros don't have to change platform source > > - d) add a new default type, e.g. oop_t, which is the same as char *? > pros oops are pointers > cons C allows arithmetic on char *, so this won't fix inappropriate arithmetic on oops > It's undefined in C whether pointers to different structures can be compared (but I do not know if an implement ration that refuses to compare pointers as simply addresses) > > - e) use void * instead of char * (too horrible bow to contemplate cuz one can't compare void *) > > - f..z) ? > > Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored? > > Anyway, looks to me like a) or c) are the main contenders. What d'all think? What is the overlap in the cons between a) and b) ? Is one a super-set of the other or are the changes orthogonal? For stability during the progressive work on a) would it help to anyway replace sqInt by oop_t, and have an experimental branch where "typdef usqInt oop_t" while the main Cog branch remains "typedef sqInt oop_t" ? cheers -ben |
In reply to this post by Igor Stasenko
On 26.01.2017, at 21:20, Igor Stasenko <[hidden email]> wrote: > +1 > > with C i was always wondering, why C standard does not provides > an unsigned integer type which size strictly corresponds to size of pointer type, e.g. > sizeof(some standard uint) == sizeof(void*) You mean uintptr_t? https://en.wikipedia.org/wiki/C_data_types#Fixed-width_integer_types C11 says: 7.20.1.4 Integer types capable of holding object pointers 1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer: intptr_t The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer: uintptr_t These types are optional. > that would solve so many problems and make thing so much easier.. > > instead they had this: > http://www.gnu.org/software/libc/manual/html_node/Important-Data-Types.html > Data Type: ptrdiff_t > This is the signed integer type of the result of subtracting two pointers. For example, with the declaration char *p1, *p2;, the expression p2 - p1 is of type ptrdiff_t. This will probably be one of the standard signed integer types (short int, int or long int), but might be a nonstandard type that exists only for this purpose. > > > it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers.. > alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder.. > > One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t, > unfortunately, ISO gives no guarantees that its size corresponds to pointer size. > > > Btw,look what i found :) > https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins > > that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for proposing to use non-standard C herecy :) > > > On 26 January 2017 at 21:37, John McIntosh <[hidden email]> wrote: > > +1 > > Sent from my iPhone > > > On Jan 26, 2017, at 10:58, David T. Lewis <[hidden email]> wrote: > > > > > > +1 > > > > Dave > > > >> > >> Surely since we’re concerned about (ab)use of signed values for pointers > >> to words we ought to define a ‘proper’ sqOop thing and use that > >> instead of a slightly cleaned up derivative of how the Slang default was > >> left as ‘int’ twenty years ago? > >> > >> Yes, any change means a bunch of work to do but isn’t that always the > >> price to improve things? > >> > >> tim > >> -- > >> tim Rowledge; [hidden email]; http://www.rowledge.org/tim > >> Strange OpCodes: LB: connect Line-voltage to BITNET > >> > >> > >> > > > > > > > > > -- > Best regards, > Igor Stasenko. |
In reply to this post by Eliot Miranda-2
On Thu, Jan 26, 2017 at 5:52 PM, Eliot Miranda <[hidden email]> wrote:
I'd think b should work fine. sqOop as suggested by Tim sounds good.
Well, most of SqueakJS is written by hand and not generated. That is the interpreter, jit, most plugins. Non-immediate oops are direct object references, there is no way to access their numeric value. Where it is relevant is in the Slang sources for a couple of plugins that I translated to JS. My VMMakerJS subclasses the regular code generator and I had to add a crude type inference mechanism and some special-case hacks to make sized pointer arithmetic and signed/unsigned stuff work. I guess if I had to touch it again (still using the generated code from 2 years ago) I may be better off using the oscog version instead of baking my own type inferencer. In any case, a clear distinction between pointers and arithmetic types would be a Good Thing. - Bert - |
In reply to this post by timrowledge
And structure the changes to cause a net code loss, so the next set of changes is easier to make. Repeat. On 1/26/17 10:54 , tim Rowledge wrote: > > Surely since we’re concerned about (ab)use of signed values for > pointers to words we ought to define a ‘proper’ sqOop thing and use > that instead of a slightly cleaned up derivative of how the Slang > default was left as ‘int’ twenty years ago? > > Yes, any change means a bunch of work to do but isn’t that always the > price to improve things? > > tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim > Strange OpCodes: LB: connect Line-voltage to BITNET > > |
Free forum by Nabble | Edit this page |