New Cog VMs available

Eliot Miranda-2

CogVM binaries as per VMMaker.oscog-eem.1834/r3684

General:
Correct undue sign extension while promoting 32-bit to 64-bit ints in
fetchLong64:ofObject:.  This was causing trouble in the non-Spur object memory (V3).
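
To illustrate the kind of sign-extension slip described above, here is a minimal sketch in C. It is not the VM's code: the helper names are hypothetical, and it only assumes that a 64-bit value is assembled from two 32-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the VM sources. */

/* Buggy: converting the signed low half straight to 64 bits sign-extends
   it, so a low word with its top bit set overwrites the high half. */
static uint64_t join_halves_buggy(int32_t low, int32_t high)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}

/* Fixed: go through uint32_t first so the low half is zero-extended. */
static uint64_t join_halves_fixed(int32_t low, int32_t high)
{
    return ((uint64_t)high << 32) | (uint64_t)(uint32_t)low;
}
```

With low = INT32_MIN (bit pattern 0x80000000) and high = 1, the fixed version keeps the high half intact while the buggy one fills it with ones.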

Fix slowdown of update on 64-bit x64 by using int variables for the tides in
sqExternalSemaphores.c and inline assembler for the sqCompareAndSwap in
sqAtomicOps.h.
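
For readers unfamiliar with compare-and-swap, the sketch below shows the general retry-loop pattern. It uses C11 <stdatomic.h> rather than the inline assembler mentioned above, and the function name is hypothetical; it is not the contents of sqAtomicOps.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical example: atomically add one to an int counter using a
   compare-and-swap retry loop, and answer the new value. */
static int increment_with_cas(atomic_int *counter)
{
    assert(counter != NULL);
    int seen = atomic_load(counter);
    /* On failure the call stores the current value into `seen`,
       so the loop retries with fresh data. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + 1)) {
        /* retry */
    }
    return seen + 1;
}
```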

Fix slips in offset time primitives.  They need to pop argumentCount + 1, not
argumentCount.
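
The "+ 1" is the receiver, which sits on the stack below the arguments. A toy model in C (all names hypothetical, not the interpreter's code) makes the arithmetic concrete:

```c
#include <assert.h>

/* Toy operand stack; names are made up for this sketch. */
typedef struct {
    long slots[16];
    int  depth;              /* number of live slots */
} ToyStack;

static void toy_push(ToyStack *s, long v)
{
    assert(s->depth < 16);
    s->slots[s->depth++] = v;
}

static void toy_pop_n(ToyStack *s, int n)
{
    assert(n >= 0 && n <= s->depth);
    s->depth -= n;
}

/* A primitive returns by removing the receiver and its arguments
   (argumentCount + 1 slots) and pushing the single result. */
static void toy_primitive_return(ToyStack *s, int argumentCount, long result)
{
    toy_pop_n(s, argumentCount + 1);
    toy_push(s, result);
}
```

Popping only argumentCount slots would leave the receiver on the stack underneath the result.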

Define an optional primitive as the fast primitive fail code so that unselected
optional primitives are in the primitiveTable as fast primitive fails.


Spur:
Have WideString>>at: fail to answer an out-of-range character in the
interpreter primitive (as well as in the JIT).

Fix bug in following state on primitive failure. The old code would
always follow to depth 1, even if the accessor depth was 0.
Hard-code primitiveSize's depth to 0 (accessing length in at:[put:]
and size causes the stack depth computation to answer 1 instead of
0 for these primitives).

Fix assert and dequeueMourner for case where mournQueue is nil.

- fixed a bug in receiver accessing in immutability primitive for
  mirror primitive.

- change primitiveSetOrHasIdentityHash to patch the class table if
  the new hash is set to a behavior (the primitive knows it's a
  behavior if the second (optional) argument is true)
  For example:
FullBlockClosure tryPrimitive: 161 withArgs: {38.true}

No longer follow the method and context fields in a closure in
activateNewClosureMethod:numArgs:mayContextSwitch:; the caller will
have failed if these are forwarders, so no need to check again.


Cogit:
Add a primitive that answers pc map data for methods which can be
used to better decorate methods in the VM Profiler.  Refactor the pc
map enumeration facilities so that the Sista pic data primitive can
share the same enumerator.  Do this by collapsing the
isBackwardBranch and annotation parameters into a single parameter.

Follow selectors in the openPICList post Spur become.

Fix the ARM's caller-saved register mask now that we can name all the actual
registers.

Reworked machine code generation of immutability so that for common stores it
uses a single trampoline for both store checks and immutability checks.

- improved support for register allocation: branch merge
  successfully compiled with register moved instead of spilling.

Sista Cogit:
Don't bother to add counters to conditional jumps implementing and:
and or:.  Added the remote inst var access bytecode in sista V1
bytecode set without interfering with existing code.


Plugins:
Upgrade LargeIntegersPlugin to v2.0.
LargeInteger primitives now deal with 32-bit digits.  No change to image code.

Memory is 8-byte aligned on Spur. When storing 32/64-bit large
integer values, always fill all eight bytes whatever the
effectively used size, rather than bothering to distinguish sizes.
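
A minimal sketch of that idea in C, assuming a hypothetical helper and an 8-byte object body (this is not the plugin's code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a magnitude of up to 64 bits into an
   8-byte body.  Writing the zero-extended value always fills all eight
   bytes, so no branch on the size actually used is needed and no stale
   bytes are left behind. */
static void store_magnitude(unsigned char *body, uint64_t magnitude)
{
    assert(body != NULL);
    memcpy(body, &magnitude, sizeof magnitude);
}
```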

Generate integer type checking as C macros rather than direct/indirect
interpreterProxy function calls in plugins. This, and 32-bit accessing mean
significantly faster large integer arithmetic.
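
As a rough illustration of why macros help, the sketch below inlines a tag test instead of calling through a function pointer. The macro names are hypothetical and the tag scheme (low bit 1 marks a tagged integer) is an assumption made for the example; the generated plugin code may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scheme for this sketch only: an oop with its low bit set is a
   tagged integer whose value lives in the remaining bits.  The right
   shift relies on arithmetic shifting of negative values, which
   mainstream compilers provide. */
#define TOY_IS_INTEGER_OOP(oop)  (((oop) & 1) != 0)
#define TOY_INTEGER_VALUE(oop)   ((intptr_t)(oop) >> 1)
#define TOY_INTEGER_OOP(value)   ((intptr_t)(((uintptr_t)(value) << 1) | 1))

/* Add two tagged integers; answer 0 (failure) if either is not tagged.
   The checks compile to a mask and a branch, with no function call. */
static int toy_add(intptr_t a, intptr_t b, intptr_t *result)
{
    assert(result != NULL);
    if (!TOY_IS_INTEGER_OOP(a) || !TOY_IS_INTEGER_OOP(b))
        return 0;
    *result = TOY_INTEGER_OOP(TOY_INTEGER_VALUE(a) + TOY_INTEGER_VALUE(b));
    return 1;
}
```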

Fix primAlienReplace to use positiveMachineIntegerValueOf: instead of
positive32BitValueOf:.

BitBltPlugin operates on 32-bit word units, therefore it's better to declare its
operands as 'unsigned int' rather than sqInt.  On a 32-bit VM, this doesn't change
anything, but on 64-bit Spur, it makes this snippet work:

    | wideString source pos blt expectedWideString |
    source := #[1 64 255 14 1 64 48 251].
    expectedWideString := WideString fromByteArray: source.
    wideString := WideString new: source size // 4.
    pos := 0.
    blt := (BitBlt
        toForm: (Form new hackBits: wideString))
        sourceForm: (Form new hackBits: source).
    blt
        combinationRule: Form over;
        sourceX: 0;
        sourceY: pos // 4;
        height: wideString byteSize // 4;
        width: 4;
        destX: 0;
        destY: 0;
        copyBits.
    wideString restoreEndianness.
    self assert: wideString = expectedWideString

Hence it fixes loading/diffing MCZs that contain wide characters.
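
The underlying C issue can be sketched as follows. The typedef is a stand-in for sqInt on a 64-bit VM and the function names are hypothetical; the example also assumes 'unsigned int' is 32 bits wide, as it is on the usual platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for sqInt on a 64-bit VM (an assumption for this sketch). */
typedef int64_t toy_sqInt;

/* Shift a 32-bit pixel word left by 8 bits.  Held in a 64-bit variable,
   the bits shifted out of the low 32 survive in the high half, so the
   result is no longer a valid 32-bit word. */
static toy_sqInt shift_word_wide(toy_sqInt word)
{
    assert(word >= 0 && word <= 0xFFFFFFFFLL);
    return word << 8;
}

/* With 'unsigned int' the shifted-out bits are discarded, which is the
   behaviour 32-bit word arithmetic expects. */
static unsigned int shift_word_32(unsigned int word)
{
    return word << 8;
}
```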


Mac OS Builds:
Don't link with -fvisibility=hidden; it breaks external plugins. Use
-fvisibility=default instead.  This fixes e.g. UnixOSProcessPlugin on Mac OS X.


Windows Builds:
The Windows VMs are no longer "dpiAware".  If you want one-to-one pixel mapping, check the README for instructions.  It's a simple edit.
_,,,^..^,,,_
best, Eliot



Re: Crash Dump on OS X (was: New Cog VMs available)

bpi
Hi Eliot,

I just found a way to reproduce it: maximise the window by holding Alt and clicking the green window button.

Cheers,
Bernhard

> On 24.04.2016 at 10:38, Bernhard Pieber <[hidden email]> wrote:
>
> Hi Eliot,
>
> Thanks for your VMs.
>
> I just had a crash on OS X 10.11.4. Maybe it’s helpful:
> <crash.dmp.zip>
>
> Cheers,
> Bernhard
>
>> On 23.04.2016 at 21:48, Eliot Miranda <[hidden email]> wrote:
>>
>> ... at http://www.mirandabanda.org/files/Cog/VM/VM.r3684
>



Re: [Vm-dev] New Cog VMs available

Levente Uzonyi
In reply to this post by Eliot Miranda-2
Hi Holger,

I ran the Shootout benchmarks[1] which I'd recently updated on
cogspurlinuxht 3397 and 3648 using the latest Squeak Trunk image, and the
latter performed better in three benchmarks: reverseComplement (-20%),
pidigits (-30%) and fasta (-40%).
For the rest, the performance of the two VMs was the same.
So, I suspect there's an area not covered by these benchmarks where you
experience the slowdown. Or there's some other difference responsible for
the slowdown.
Can you tell us more about your benchmark?
Have you tried profiling it?

Levente

[1] http://leves.web.elte.hu/squeak/Shootout-ul.19.mcz