Reed Solomon plugins & performance slow down

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Reed Solomon plugins & performance slow down

Robert Withers-2
 

Hey all'y'all,

I got the plugins working correctly. Published to Cryptography. The plugins are attached. Unfortunately profiling indicates a slow down. The leaves did change from GaloisField methods to GaloisFieldPoly methods. I will work on pluginizing the GFPoly methods, such as #evaluateAt: & #multiplyByMonomialDegree:coefficient:...

Here are the results without plugins (135 seconds):

WITHOUT PLUGINS

 - 132032 tallies, 135390 msec.

**Tree**
--------------------------------
Process: (40) 57626: nil
--------------------------------

**Leaves**
16.9% {22843ms} RSFECGenericGF>>exp:
9.2% {12450ms} RSFECGenericGF>>maskValue:
8.6% {11659ms} RSFECGenericGF>>addOrSubtract:by:
7.2% {9806ms} RSFECGenericGF>>log:
6.6% {8958ms} RSFECGenericGF>>normalizeIndex:
4.9% {6578ms} RSFECGenericGF>>multiply:by:
2.3% {3130ms} RSErasureGalois>>maskValue:
2.3% {3118ms} RSFECGenericGFPoly>>evaluateAt:
1.6% {2175ms} RSErasureGalois>>normalizeIndex:
1.3% {1718ms} RSFECGenericGFPoly>>addOrSubtractPoly:
1.3% {1702ms} RSFECGenericGFPoly>>multiplyByMonomialDegree:coefficient:
1.2% {1606ms} RSErasureGalois>>log:
1.1% {1516ms} RSErasureGalois>>galoisMultiply:by:

And here are the results with plugins (152 seconds):

WITH PLUGINS

 - 149040 tallies, 152531 msec.

**Tree**
--------------------------------
Process: (40) 89223: nil
--------------------------------

**Leaves**
27.9% {42565ms} RSFECGenericGFWithPlugin>>addOrSubtract:by:
17.3% {26440ms} RSFECGenericGFPoly>>evaluateAt:
8.5% {12926ms} RSFECGenericGFWithPlugin>>maskValue:
7.3% {11062ms} RSFECGenericGFPoly>>multiplyByMonomialDegree:coefficient:
6.5% {9854ms} RSFECGenericGFPoly>>addOrSubtractPoly:
5.6% {8598ms} RSFECGenericGFWithPlugin>>multiply:by:
2.6% {4032ms} RSErasureGaloisWithPlugin>>galoisMultiply:by:
2.4% {3601ms} RSErasureGaloisWithPlugin>>addOrSubtract:by:
2.1% {3137ms} RSErasureGaloisWithPlugin>>maskValue:
1.6% {2436ms} RSFECGenericGFPoly>>multiplyPoly:

--
---
Kindly,
Robert



RSErasurePlugin.so (43K) Download Attachment
RSFECPlugin.so (36K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Levente Uzonyi
 
Hi Robert,

I had a quick look at the plugin code. The operations you implemented in
the plugin are way too simple. The VM's JIT makes the smalltalk code run
faster, because it has the same operations implemented, and it can avoid
calling the overhead of calling the primitive many times. Hence the
slowdown.

For example, RSFECPlugin >> #primitiveAddOrSubtractby seems to implement
bitwise xor on two 32-bit unsigned values, but the arguments are always
0-255. The VM has primitive 16, which does bitwise xor and covers that
input range, so the JIT, combined with other operations can generate
machine code directly with that without jumping back and forth between
native code and smalltalk.

If you want a plugin to provide any noticable speedup, you need to do
more computation in a single primtiive call.


Levente

P.S.: I think, in RSFECGenericGC >> #initializeExpTable, the lines

  x := x bitXor: primitive.
  x bitAnd: (size - 1)]].

should read

  x := (x bitXor: primitive) bitAnd: size - 1 ]].
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 
Hi Levente,

Thank you for having a look-see and letting me know your thoughts! It
makes sense what you are saying. so I think the only way to speed it up
is to embed the loops into primitives. I have 5 of 6 methods from Poly
that I am stuffing into the plugin. The #dividePoly: is a bit more
complicated, updating the remainder poly and so forth. I do not think a
Poly can be within a primitive. Hopefully, {product := Array new: size
withAll: 0} will work in translation. I fixed #initializeExpTable per
your suggestion.

Thanks!

---
Kindly,
Robert


On 5/30/21 8:18 PM, Levente Uzonyi wrote:

> Hi Robert,
>
> I had a quick look at the plugin code. The operations you implemented in
> the plugin are way too simple. The VM's JIT makes the smalltalk code run
> faster, because it has the same operations implemented, and it can avoid
> calling the overhead of calling the primitive many times. Hence the
> slowdown.
>
> For example, RSFECPlugin >> #primitiveAddOrSubtractby seems to implement
> bitwise xor on two 32-bit unsigned values, but the arguments are always
> 0-255. The VM has primitive 16, which does bitwise xor and covers that
> input range, so the JIT, combined with other operations can generate
> machine code directly with that without jumping back and forth between
> native code and smalltalk.
>
> If you want a plugin to provide any noticable speedup, you need to do
> more computation in a single primtiive call.
>
>
> Levente
>
> P.S.: I think, in RSFECGenericGC >> #initializeExpTable, the lines
>
>   x := x bitXor: primitive.
>   x bitAnd: (size - 1)]].
>
> should read
>
>   x := (x bitXor: primitive) bitAnd: size - 1 ]].

Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

My previous email is awaiting approval as it is too big. I attached a crash.dmp. Here is the email with a link to the crash dump I placed on dropbox.

I implemented a set of 5 primitives for Polynomial math in the RSFECPlugin class and added a RSFECGenericGFPolyWithPlugin to call them. In testing FEC I get a segFault, but looking at the Smalltalk stack dump it is happening in the RSErasurePlugin which is not called by the RSFEC code. It looks like a DoIt and I cannot find the offending code. Where can I look for a DoIt that is calling RSErasure code when running RSFEC code? Here is the smalltalk stack dump and I am attaching the crash.dmp file [1].

    0x7ffde5e6a1b0 I RSErasureGaloisWithPlugin>galoisMultiply:by: 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e6a220 I RSErasureGaloisWithPlugin>generateMultiplicationTable 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59df0 I RSErasureGaloisWithPlugin>initialize 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59e20 M RSErasureGaloisWithPlugin class(Behavior)>new 0x7fdeec8: a(n) RSErasureGaloisWithPlugin class
    0x7ffde5e59e60 I RSErasureGalois class>newGalois 0x6a89510: a(n) RSErasureGalois class
    0x7ffde5e59e90 M UndefinedObject>DoIt 0x2de78e0: a(n) UndefinedObject

[]1 crash.zip - https://www.dropbox.com/s/qyq735yksmc4dbr/crash.zip?dl=0

---
Kindly,
Robert
---
Kindly,
Robert


On 5/30/21 8:50 PM, Robert Withers wrote:
Hi Levente,

Thank you for having a look-see and letting me know your thoughts! It
makes sense what you are saying. so I think the only way to speed it up
is to embed the loops into primitives. I have 5 of 6 methods from Poly
that I am stuffing into the plugin. The #dividePoly: is a bit more
complicated, updating the remainder poly and so forth. I do not think a
Poly can be within a primitive. Hopefully, {product := Array new: size
withAll: 0} will work in translation. I fixed #initializeExpTable per
your suggestion.

Thanks!

---
Kindly,
Robert


On 5/30/21 8:18 PM, Levente Uzonyi wrote:
Hi Robert,

I had a quick look at the plugin code. The operations you implemented in
the plugin are way too simple. The VM's JIT makes the smalltalk code run
faster, because it has the same operations implemented, and it can avoid
calling the overhead of calling the primitive many times. Hence the
slowdown.

For example, RSFECPlugin >> #primitiveAddOrSubtractby seems to implement
bitwise xor on two 32-bit unsigned values, but the arguments are always
0-255. The VM has primitive 16, which does bitwise xor and covers that
input range, so the JIT, combined with other operations can generate
machine code directly with that without jumping back and forth between
native code and smalltalk.

If you want a plugin to provide any noticable speedup, you need to do
more computation in a single primtiive call.


Levente

P.S.: I think, in RSFECGenericGC >> #initializeExpTable, the lines

  					x := x bitXor: primitive.
  					x bitAnd: (size - 1)]].

should read

  					x := (x bitXor: primitive) bitAnd: size - 1 ]].
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

The segFault occurs with a call to a RSFECPlugin primitive>>#primMultiplyPolySelfCoefficients:otherCoefficients:product:fieldSize:. But the crash.dmp shows a RSErasureGaloisWithPlugin as the error site. I am very confused as the call I am making in the FEC test is to the RSFECPlugin, but the RSErasurePlugin is the one cited as causing the segFault. I even removed the RSErasurePlugin from the vm directory and this is still occurring.

RSFECGenericGFPolyWithPlugin>>#multiplyPoly: other

    | product |
    (field = other field)
        ifFalse: [RSErasureIllegalArgumentError signal: 'GenericGFPolys do not have same GenericGF field'].
    (self isZero)
        ifTrue: [^ field zero]
        ifFalse: [other isZero ifTrue: [^ field zero]].
    product := Array new: (self coefficients size + other coefficients size - 1) withAll: 0.

    product := self primMultiplyPolySelfCoefficients: self coefficients otherCoefficients: other coefficients product: product fieldSize: field size.

    ^ RSFECGenericGFPoly newField: field coefficients: product.

Any assistance most welcome. This has me stumped.
---
Kindly,
Robert


On 5/31/21 3:43 PM, Robert Withers wrote:

My previous email is awaiting approval as it is too big. I attached a crash.dmp. Here is the email with a link to the crash dump I placed on dropbox.

I implemented a set of 5 primitives for Polynomial math in the RSFECPlugin class and added a RSFECGenericGFPolyWithPlugin to call them. In testing FEC I get a segFault, but looking at the Smalltalk stack dump it is happening in the RSErasurePlugin which is not called by the RSFEC code. It looks like a DoIt and I cannot find the offending code. Where can I look for a DoIt that is calling RSErasure code when running RSFEC code? Here is the smalltalk stack dump and I am attaching the crash.dmp file [1].

    0x7ffde5e6a1b0 I RSErasureGaloisWithPlugin>galoisMultiply:by: 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e6a220 I RSErasureGaloisWithPlugin>generateMultiplicationTable 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59df0 I RSErasureGaloisWithPlugin>initialize 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59e20 M RSErasureGaloisWithPlugin class(Behavior)>new 0x7fdeec8: a(n) RSErasureGaloisWithPlugin class
    0x7ffde5e59e60 I RSErasureGalois class>newGalois 0x6a89510: a(n) RSErasureGalois class
    0x7ffde5e59e90 M UndefinedObject>DoIt 0x2de78e0: a(n) UndefinedObject

[]1 crash.zip - https://www.dropbox.com/s/qyq735yksmc4dbr/crash.zip?dl=0

---
Kindly,
Robert
---
Kindly,
Robert


On 5/30/21 8:50 PM, Robert Withers wrote:
Hi Levente,

Thank you for having a look-see and letting me know your thoughts! It
makes sense what you are saying. so I think the only way to speed it up
is to embed the loops into primitives. I have 5 of 6 methods from Poly
that I am stuffing into the plugin. The #dividePoly: is a bit more
complicated, updating the remainder poly and so forth. I do not think a
Poly can be within a primitive. Hopefully, {product := Array new: size
withAll: 0} will work in translation. I fixed #initializeExpTable per
your suggestion.

Thanks!

---
Kindly,
Robert


On 5/30/21 8:18 PM, Levente Uzonyi wrote:
Hi Robert,

I had a quick look at the plugin code. The operations you implemented in
the plugin are way too simple. The VM's JIT makes the smalltalk code run
faster, because it has the same operations implemented, and it can avoid
calling the overhead of calling the primitive many times. Hence the
slowdown.

For example, RSFECPlugin >> #primitiveAddOrSubtractby seems to implement
bitwise xor on two 32-bit unsigned values, but the arguments are always
0-255. The VM has primitive 16, which does bitwise xor and covers that
input range, so the JIT, combined with other operations can generate
machine code directly with that without jumping back and forth between
native code and smalltalk.

If you want a plugin to provide any noticable speedup, you need to do
more computation in a single primtiive call.


Levente

P.S.: I think, in RSFECGenericGC >> #initializeExpTable, the lines

  					x := x bitXor: primitive.
  					x bitAnd: (size - 1)]].

should read

  					x := (x bitXor: primitive) bitAnd: size - 1 ]].
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

Extremely strange. I unloaded the CryptographyRSErasure and CryptographyRSErasureTests packages. I still get a crash dump referencing RSErasureGaloisWithPlugin>galoisMultiply:by:. I am totally confused! This class isn't even loaded in my image! WTH is going on!?

Smalltalk stack dump:
    0x7ffde5e6a1b0 I RSErasureGaloisWithPlugin>galoisMultiply:by: 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e6a220 I RSErasureGaloisWithPlugin>generateMultiplicationTable 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59df0 I RSErasureGaloisWithPlugin>initialize 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59e20 M RSErasureGaloisWithPlugin class(Behavior)>new 0x7fdeec8: a(n) RSErasureGaloisWithPlugin class
    0x7ffde5e59e60 I RSErasureGalois class>newGalois 0x6a89510: a(n) RSErasureGalois class
    0x7ffde5e59e90 M UndefinedObject>DoIt 0x2de78e0: a(n) UndefinedObject

---
Kindly,
Robert


On 5/31/21 4:00 PM, Robert Withers wrote:

The segFault occurs with a call to a RSFECPlugin primitive>>#primMultiplyPolySelfCoefficients:otherCoefficients:product:fieldSize:. But the crash.dmp shows a RSErasureGaloisWithPlugin as the error site. I am very confused as the call I am making in the FEC test is to the RSFECPlugin, but the RSErasurePlugin is the one cited as causing the segFault. I even removed the RSErasurePlugin from the vm directory and this is still occurring.

RSFECGenericGFPolyWithPlugin>>#multiplyPoly: other

    | product |
    (field = other field)
        ifFalse: [RSErasureIllegalArgumentError signal: 'GenericGFPolys do not have same GenericGF field'].
    (self isZero)
        ifTrue: [^ field zero]
        ifFalse: [other isZero ifTrue: [^ field zero]].
    product := Array new: (self coefficients size + other coefficients size - 1) withAll: 0.

    product := self primMultiplyPolySelfCoefficients: self coefficients otherCoefficients: other coefficients product: product fieldSize: field size.

    ^ RSFECGenericGFPoly newField: field coefficients: product.

Any assistance most welcome. This has me stumped.
---
Kindly,
Robert


On 5/31/21 3:43 PM, Robert Withers wrote:

My previous email is awaiting approval as it is too big. I attached a crash.dmp. Here is the email with a link to the crash dump I placed on dropbox.

I implemented a set of 5 primitives for Polynomial math in the RSFECPlugin class and added a RSFECGenericGFPolyWithPlugin to call them. In testing FEC I get a segFault, but looking at the Smalltalk stack dump it is happening in the RSErasurePlugin which is not called by the RSFEC code. It looks like a DoIt and I cannot find the offending code. Where can I look for a DoIt that is calling RSErasure code when running RSFEC code? Here is the smalltalk stack dump and I am attaching the crash.dmp file [1].

    0x7ffde5e6a1b0 I RSErasureGaloisWithPlugin>galoisMultiply:by: 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e6a220 I RSErasureGaloisWithPlugin>generateMultiplicationTable 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59df0 I RSErasureGaloisWithPlugin>initialize 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin
    0x7ffde5e59e20 M RSErasureGaloisWithPlugin class(Behavior)>new 0x7fdeec8: a(n) RSErasureGaloisWithPlugin class
    0x7ffde5e59e60 I RSErasureGalois class>newGalois 0x6a89510: a(n) RSErasureGalois class
    0x7ffde5e59e90 M UndefinedObject>DoIt 0x2de78e0: a(n) UndefinedObject

[]1 crash.zip - https://www.dropbox.com/s/qyq735yksmc4dbr/crash.zip?dl=0

---
Kindly,
Robert
---
Kindly,
Robert


On 5/30/21 8:50 PM, Robert Withers wrote:
Hi Levente,

Thank you for having a look-see and letting me know your thoughts! It
makes sense what you are saying. so I think the only way to speed it up
is to embed the loops into primitives. I have 5 of 6 methods from Poly
that I am stuffing into the plugin. The #dividePoly: is a bit more
complicated, updating the remainder poly and so forth. I do not think a
Poly can be within a primitive. Hopefully, {product := Array new: size
withAll: 0} will work in translation. I fixed #initializeExpTable per
your suggestion.

Thanks!

---
Kindly,
Robert


On 5/30/21 8:18 PM, Levente Uzonyi wrote:
Hi Robert,

I had a quick look at the plugin code. The operations you implemented in
the plugin are way too simple. The VM's JIT makes the smalltalk code run
faster, because it has the same operations implemented, and it can avoid
calling the overhead of calling the primitive many times. Hence the
slowdown.

For example, RSFECPlugin >> #primitiveAddOrSubtractby seems to implement
bitwise xor on two 32-bit unsigned values, but the arguments are always
0-255. The VM has primitive 16, which does bitwise xor and covers that
input range, so the JIT, combined with other operations can generate
machine code directly with that without jumping back and forth between
native code and smalltalk.

If you want a plugin to provide any noticable speedup, you need to do
more computation in a single primtiive call.


Levente

P.S.: I think, in RSFECGenericGC >> #initializeExpTable, the lines

  					x := x bitXor: primitive.
  					x bitAnd: (size - 1)]].

should read

  					x := (x bitXor: primitive) bitAnd: size - 1 ]].
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

All right some more info. I still don't see what this has to do with RSErasureGaloisWithPlugin. This class in not loaded and I found out some info on the call into primitiveMultiplyByPolySelfCoefficientsOtherCoefficientsProductFieldSize. I used kdbg and set a breakpoint at the beginning of the function. Listed below with the BOOM site bold/italic. The last line in the listing is where it sigterms. The issue is that the 3 Oops, accessed just previously as stack values 3. 2 & 1 are all 0, when stepped over in the kdbg. I tried to recompile the build.debug and installed it. Same exact problem. I recompiled the #primMultiply calling site to this function, still no change. I hope an angel will fix it!

EXPORT(sqInt)

primitiveMultiplyPolySelfCoefficientsOtherCoefficientsProductFieldSize(void)

{

unsigned int aCoeff;

unsigned int *aCoefficients;

sqInt aIndex;

sqInt aLength;

unsigned int *bCoefficients;

sqInt bIndex;

sqInt bLength;

unsigned int fieldSize;

unsigned int *otherCoefficients;

sqInt otherCoefficientsOop;

unsigned int otherCount;

unsigned int *product;

sqInt productOop;

unsigned int *result;

unsigned int *selfCoefficients;

sqInt selfCoefficientsOop;

unsigned int selfCount;

if (!((methodArgumentCount()) == 4)) {

return primitiveFailFor(PrimErrBadNumArgs);

}

selfCoefficientsOop = stackIntegerValue(3);

otherCoefficientsOop = stackIntegerValue(2);

productOop = stackIntegerValue(1);

fieldSize = stackIntegerValue(0);

selfCount = stSizeOf(selfCoefficientsOop);

---
Kindly,
Robert


On 5/31/21 4:22 PM, Robert Withers wrote:

Extremely strange. I unloaded the CryptographyRSErasure and CryptographyRSErasureTests packages. I still get a crash dump referencing RSErasureGaloisWithPlugin>galoisMultiply:by:. I am totally confused! This class isn't even loaded in my image! WTH is going on!?

Smalltalk stack dump:     0x7ffde5e6a1b0 I RSErasureGaloisWithPlugin>galoisMultiply:by: 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin     0x7ffde5e6a220 I RSErasureGaloisWithPlugin>generateMultiplicationTable 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin     0x7ffde5e59df0 I RSErasureGaloisWithPlugin>initialize 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin     0x7ffde5e59e20 M RSErasureGaloisWithPlugin class(Behavior)>new 0x7fdeec8: a(n) RSErasureGaloisWithPlugin class     0x7ffde5e59e60 I RSErasureGalois class>newGalois 0x6a89510: a(n) RSErasureGalois class     0x7ffde5e59e90 M UndefinedObject>DoIt 0x2de78e0: a(n) UndefinedObject

---
Kindly,
Robert


On 5/31/21 4:00 PM, Robert Withers wrote:

The segFault occurs with a call to a RSFECPlugin primitive>>#primMultiplyPolySelfCoefficients:otherCoefficients:product:fieldSize:. But the crash.dmp shows a RSErasureGaloisWithPlugin as the error site. I am very confused as the call I am making in the FEC test is to the RSFECPlugin, but the RSErasurePlugin is the one cited as causing the segFault. I even removed the RSErasurePlugin from the vm directory and this is still occurring.

RSFECGenericGFPolyWithPlugin>>#multiplyPoly: other     | product |     (field = other field)         ifFalse: [RSErasureIllegalArgumentError signal: 'GenericGFPolys do not have same GenericGF field'].     (self isZero)         ifTrue: [^ field zero]         ifFalse: [other isZero ifTrue: [^ field zero]].     product := Array new: (self coefficients size + other coefficients size - 1) withAll: 0.     product := self primMultiplyPolySelfCoefficients: self coefficients otherCoefficients: other coefficients product: product fieldSize: field size.     ^ RSFECGenericGFPoly newField: field coefficients: product.

Any assistance most welcome. This has me stumped.
---
Kindly,
Robert


On 5/31/21 3:43 PM, Robert Withers wrote:

My previous email is awaiting approval as it is too big. I attached a crash.dmp. Here is the email with a link to the crash dump I placed on dropbox.

I implemented a set of 5 primitives for Polynomial math in the RSFECPlugin class and added a RSFECGenericGFPolyWithPlugin to call them. In testing FEC I get a segFault, but looking at the Smalltalk stack dump it is happening in the RSErasurePlugin which is not called by the RSFEC code. It looks like a DoIt and I cannot find the offending code. Where can I look for a DoIt that is calling RSErasure code when running RSFEC code? Here is the smalltalk stack dump and I am attaching the crash.dmp file [1].

    0x7ffde5e6a1b0 I RSErasureGaloisWithPlugin>galoisMultiply:by: 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin     0x7ffde5e6a220 I RSErasureGaloisWithPlugin>generateMultiplicationTable 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin     0x7ffde5e59df0 I RSErasureGaloisWithPlugin>initialize 0xc7ed9d8: a(n) RSErasureGaloisWithPlugin     0x7ffde5e59e20 M RSErasureGaloisWithPlugin class(Behavior)>new 0x7fdeec8: a(n) RSErasureGaloisWithPlugin class     0x7ffde5e59e60 I RSErasureGalois class>newGalois 0x6a89510: a(n) RSErasureGalois class     0x7ffde5e59e90 M UndefinedObject>DoIt 0x2de78e0: a(n) UndefinedObject

[]1 crash.zip - https://www.dropbox.com/s/qyq735yksmc4dbr/crash.zip?dl=0

---
Kindly,
Robert
---
Kindly,
Robert


On 5/30/21 8:50 PM, Robert Withers wrote:
Hi Levente,

Thank you for having a look-see and letting me know your thoughts! It
makes sense what you are saying. so I think the only way to speed it up
is to embed the loops into primitives. I have 5 of 6 methods from Poly
that I am stuffing into the plugin. The #dividePoly: is a bit more
complicated, updating the remainder poly and so forth. I do not think a
Poly can be within a primitive. Hopefully, {product := Array new: size
withAll: 0} will work in translation. I fixed #initializeExpTable per
your suggestion.

Thanks!

---
Kindly,
Robert


On 5/30/21 8:18 PM, Levente Uzonyi wrote:
Hi Robert,

I had a quick look at the plugin code. The operations you implemented in
the plugin are way too simple. The VM's JIT makes the smalltalk code run
faster, because it has the same operations implemented, and it can avoid
calling the overhead of calling the primitive many times. Hence the
slowdown.

For example, RSFECPlugin >> #primitiveAddOrSubtractby seems to implement
bitwise xor on two 32-bit unsigned values, but the arguments are always
0-255. The VM has primitive 16, which does bitwise xor and covers that
input range, so the JIT, combined with other operations can generate
machine code directly with that without jumping back and forth between
native code and smalltalk.

If you want a plugin to provide any noticable speedup, you need to do
more computation in a single primtiive call.


Levente

P.S.: I think, in RSFECGenericGC >> #initializeExpTable, the lines

  					x := x bitXor: primitive.
  					x bitAnd: (size - 1)]].

should read

  					x := (x bitXor: primitive) bitAnd: size - 1 ]].
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Levente Uzonyi
 
Hi Robert,

After a quick look, the problem seems to be that you're reading the first
parameter, selfCoefficientsOop with #stackIntegerValue:, but the parameter
is an array, not an integer, so it needs to be read with #stackObjectValue:.
I don't think #stSizeOf: does any good when its argument is an integer
oop, hence the crash.

For simplicity and efficiency, I suggest you should pass a WordArray to
the primtive (and use WordArrays in the image as well). It maps to the
unsigned ints the primitive uses, and you can use #firstIndexableField:
safely.
Passing an Array would complicate things, as an Array can hold any kind of
objects (which the primitive would have to check before using them), and
its field size doesn't fit into an unsigned int on 64-bits.

Also, validation of the arguments should be strict in the primitive. Have
a look at e.g. DESPlugin >> #primitiveDESTransform to see how it validates
its two arguments, a ByteArray and a WordArray.


Levente

P.S.: If the coefficients fit into smaller fields, it's worth to use a
ByteArray or a DoubleByteArray instead of a WordArray for better
performance.
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 
Awesome, Levente! My bad. Thanks so much! So glad you were able to clear
it up. I will make the appropriate changes. As the field size is max
256: { 0-255 }, perhaps using ByteArrays everywhere would be better. I
should make sure ParrotTalk uses them as well. In 64-bit VMs are
"unsigned ints" 64 bits? Thus the DoubleWordArray?

---
Kindly,
Robert


On 5/31/21 8:05 PM, Levente Uzonyi wrote:

> Hi Robert,
>
> After a quick look, the problem seems to be that you're reading the first
> parameter, selfCoefficientsOop with #stackIntegerValue:, but the parameter
> is an array, not an integer, so it needs to be read with #stackObjectValue:.
> I don't think #stSizeOf: does any good when its argument is an integer
> oop, hence the crash.
>
> For simplicity and efficiency, I suggest you should pass a WordArray to
> the primtive (and use WordArrays in the image as well). It maps to the
> unsigned ints the primitive uses, and you can use #firstIndexableField:
> safely.
> Passing an Array would complicate things, as an Array can hold any kind of
> objects (which the primitive would have to check before using them), and
> its field size doesn't fit into an unsigned int on 64-bits.
>
> Also, validation of the arguments should be strict in the primitive. Have
> a look at e.g. DESPlugin >> #primitiveDESTransform to see how it validates
> its two arguments, a ByteArray and a WordArray.
>
>
> Levente
>
> P.S.: If the coefficients fit into smaller fields, it's worth to use a
> ByteArray or a DoubleByteArray instead of a WordArray for better
> performance.

Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

Hey ya, Levente,

Woot!!! I was able to get all the primitives working with ByteArrays. Everything is passing green! (except for several GF fields: Aztec and Maxicode and QRCode) I think I will primitize one of the codingLoops on the RSErasure side. Here are the numbers.  Take note of the tallies and time: 50 seconds versus 114 seconds.

I updated the squeak wiki Cryptography page (http://wiki.squeak.org/squeak/5776) with d/l links to the plugins. Here are those links:

[1] RSErasurePlugin - https://www.dropbox.com/s/kog96d3pqentket/RSErasurePlugin.so?dl=1
[2] RSFECPlugin - https://www.dropbox.com/s/a1vvihi7o2fqtkj/RSFECPlugin.so?dl=1

Thanks so much for your help! You are an angel!
Robert

WITH PLUGINS

 - 49238 tallies, 50468 msec.

**Leaves**
13.0% {6580ms} RSFECDecoder>>decode:twoS:
7.8% {3931ms} RSErasureGaloisWithPlugin>>galoisMultiply:by:
6.8% {3450ms} RSErasureGaloisWithPlugin>>addOrSubtract:by:
6.1% {3063ms} RSErasureGaloisWithPlugin>>maskValue:
6.1% {3053ms} RSFECGenericGFPoly class>>newField:coefficients:
4.0% {2032ms} RSFECDecoder>>findErrorLocations:
2.4% {1197ms} RSErasureGaloisWithPlugin>>tableMultiply:by:
1.8% {917ms} RSFECGenericGFWithPlugin>>multiply:by:
1.7% {877ms} [] RSErasureOutputByteInputExpCodingLoop>>codeSomeShardsGalois:matrixRow...:outputCount:offset:byteCount:
1.6% {823ms} RSErasureGaloisWithPlugin>>normalizeInImageIndex:
1.2% {609ms} RSFECDecoder>>findErrorMagnitudes:errorLlocations:

And:

WITHOUT PLUGINS

 - 112019 tallies, 114572 msec.

**Leaves**
20.2% {23095ms} RSFECGenericGF>>exp:
12.0% {13745ms} RSFECGenericGF>>addOrSubtract:by:
10.8% {12381ms} RSFECGenericGF>>maskValue:
7.8% {8910ms} RSFECGenericGF>>log:
7.6% {8655ms} RSFECGenericGF>>normalizeIndex:
6.0% {6931ms} RSFECGenericGF>>multiply:by:
3.1% {3508ms} RSFECGenericGFPoly>>evaluateAt:
2.8% {3190ms} RSErasureGalois>>maskValue:
1.9% {2182ms} RSErasureGalois>>normalizeIndex:
1.8% {2009ms} RSFECGenericGFPoly>>addOrSubtractPoly:
1.7% {2003ms} RSFECGenericGFPoly>>multiplyByMonomialDegree:coefficient:
1.4% {1564ms} RSErasureGalois>>log:
1.3% {1522ms} RSErasureGalois>>galoisMultiply:by:
1.2% {1350ms} RSErasureGalois>>exp:
1.1% {1308ms} RSErasureGaloisTest(TestCase)>>assert:description:

---
Kindly,
Robert


On 5/31/21 8:27 PM, Robert Withers wrote:
Awesome, Levente! My bad. Thanks so much! So glad you were able to clear
it up. I will make the appropriate changes. As the field size is max
256: { 0-255 }, perhaps using ByteArrays everywhere would be better. I
should make sure ParrotTalk uses them as well. In 64-bit VMs are
"unsigned ints" 64 bits? Thus the DoubleWordArray?

---
Kindly,
Robert


On 5/31/21 8:05 PM, Levente Uzonyi wrote:
Hi Robert,

After a quick look, the problem seems to be that you're reading the first
parameter, selfCoefficientsOop with #stackIntegerValue:, but the parameter
is an array, not an integer, so it needs to be read with #stackObjectValue:.
I don't think #stSizeOf: does any good when its argument is an integer
oop, hence the crash.

For simplicity and efficiency, I suggest you should pass a WordArray to
the primtive (and use WordArrays in the image as well). It maps to the
unsigned ints the primitive uses, and you can use #firstIndexableField:
safely.
Passing an Array would complicate things, as an Array can hold any kind of
objects (which the primitive would have to check before using them), and
its field size doesn't fit into an unsigned int on 64-bits.

Also, validation of the arguments should be strict in the primitive. Have
a look at e.g. DESPlugin >> #primitiveDESTransform to see how it validates
its two arguments, a ByteArray and a WordArray.


Levente

P.S.: If the coefficients fit into smaller fields, it's worth to use a
ByteArray or a DoubleByteArray instead of a WordArray for better
performance.
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Levente Uzonyi
 
Hi Robert,

On Tue, 1 Jun 2021, Robert Withers wrote:

>
> Hey ya, Levente,
>
> Woot!!! I was able to get all the primitives working with ByteArrays. Everything is passing green! (except for several GF fields: Aztec and Maxicode and QRCode) I think I will primitize one of the codingLoops on the RSErasure
> side. Here are the numbers.  Take note of the tallies and time: 50 seconds versus 114 seconds. I updated the squeak wiki Cryptography page (http://wiki.squeak.org/squeak/5776) with d/l links to the plugins. Here are those

Brilliant. ByteArray is indeed the optimal choice there.

Let me answer your question in your previous email here:
AFAIK, on all the currently supported platforms and C compilers, unsigned
int is 32-bits, so it matches WordArray in Squeak.
DoubleWordArray is 64-bits, and in C that matches the type unsigned long long (on the currently supported platforms and compilers).

I think you can shave off a few more seconds there by adding those
<inline: #always> pragmas to the methods the primitive methods send
either directly or indirectly.

And, I think the logTable and expTable variables can be declared as
constant as well as unsigned char arrays instead of unsigned int arrays.
That might give more opportunities to the C compiler to optimize the code.

I prefer to use #var:type:array: instead of #var:declareC: when possible.
E.g.

  cg
  var: #logTable declareC: 'unsigned int logTable[256] = {
0, 0, 1, 240, 2, 225, 241, 53, 3, 38, 226, 133, 242, 43, 54, 210, 4, 195,
39, 114, 227, 106, 134, 28, 243, 140, 44, 23, 55, 118, 211, 234, 5, 219,
196, 96, 40, 222, 115, 103, 228, 78, 107, 125, 135, 8, 29, 162, 244, 186,
141, 180, 45, 99, 24, 49, 56, 13, 119, 153, 212, 199, 235, 91, 6, 76, 220,
217, 197, 11, 97, 184, 41, 36, 223, 253, 116, 138, 104, 193, 229, 86, 79,
171, 108, 165, 126, 145, 136, 34, 9, 74, 30, 32, 163, 84, 245, 173, 187,
204, 142, 81, 181, 190, 46, 88, 100, 159, 25, 231, 50, 207, 57, 147, 14,
67, 120, 128, 154, 248, 213, 167, 200, 63, 236, 110, 92, 176, 7, 161, 77,
124, 221, 102, 218, 95, 198, 90, 12, 152, 98, 48, 185, 179, 42, 209, 37,
132, 224, 52, 254, 239, 117, 233, 139, 22, 105, 27, 194, 113, 230, 206,
87, 158, 80, 189, 172, 203, 109, 175, 166, 62, 127, 247, 146, 66, 137,
192, 35, 252, 10, 183, 75, 216, 31, 83, 33, 73, 164, 144, 85, 170, 246,
65, 174, 61, 188, 202, 205, 157, 143, 169, 82, 72, 182, 215, 191, 251, 47,
178, 89, 151, 101, 94, 160, 123, 26, 112, 232, 21, 51, 238, 208, 131, 58,
69, 148, 18, 15, 16, 68, 17, 121, 149, 129, 19, 155, 59, 249, 70, 214,
250, 168, 71, 201, 156, 64, 60, 237, 130, 111, 20, 93, 122, 177, 150 }';

could be written as

  cg
  var: #logTable
  type: #'const unsigned char'
  array:  #[0 0 1 240 2 225 241 53 3 38 226 133 242 43 54
210 4 195 39 114 227 106 134 28 243 140 44 23 55 118 211 234 5 219 196 96
40 222 115 103 228 78 107 125 135 8 29 162 244 186 141 180 45 99 24 49 56
13 119 153 212 199 235 91 6 76 220 217 197 11 97 184 41 36 223 253 116 138
104 193 229 86 79 171 108 165 126 145 136 34 9 74 30 32 163 84 245 173 187
204 142 81 181 190 46 88 100 159 25 231 50 207 57 147 14 67 120 128 154
248 213 167 200 63 236 110 92 176 7 161 77 124 221 102 218 95 198 90 12
152 98 48 185 179 42 209 37 132 224 52 254 239 117 233 139 22 105 27 194
113 230 206 87 158 80 189 172 203 109 175 166 62 127 247 146 66 137 192 35
252 10 183 75 216 31 83 33 73 164 144 85 170 246 65 174 61 188 202 205 157
143 169 82 72 182 215 191 251 47 178 89 151 101 94 160 123 26 112 232 21
51 238 208 131 58 69 148 18 15 16 68 17 121 149 129 19 155 59 249 70 214
250 168 71 201 156 64 60 237 130 111 20 93 122 177 150]


Levente
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

Yes! Thanks so much! I made your suggested changes, every type to "char", including the #declareCVarsIn: changes to #var:type:array: and here are the results, with a 407% speedup, with and without RSFECPlugin. I dropped the RSErasure tests from the profiling.

If only I could pluginize #findErrorLocations: and #findErrorMagnitudes:errorLocations:. Perhaps #newField:coefficients:, as well. I do not think #decode:twoS: is pluginizable due to complex object instantiation. What are your thoughts? Can we squeeze it down further?

WITH PLUGIN

 - 22437 tallies, 22876 msec.

**Leaves**
28.8% {6577ms} RSFECDecoder>>decode:twoS:
12.7% {2904ms} RSFECGenericGFPoly class>>newField:coefficients:
9.2% {2095ms} RSFECDecoder>>findErrorLocations:
3.3% {766ms} RSFECGenericGFWithPlugin>>multiply:by:
1.8% {411ms} RSFECDecoder>>findErrorMagnitudes:errorLlocations:
1.6% {363ms} RSFECGenericGFWithPlugin>>inverse:
1.2% {265ms} RSFECDecoder>>runEuclideanAlgorithmPoly:poly:rDegrees:

and...

WITHOUT PLUGIN

 - 91190 tallies, 93245 msec.

**Leaves**
26.0% {24256ms} RSFECGenericGF>>exp:
13.7% {12778ms} RSFECGenericGF>>maskValue:
12.8% {11945ms} RSFECGenericGF>>addOrSubtract:by:
10.8% {10094ms} RSFECGenericGF>>log:
9.2% {8615ms} RSFECGenericGF>>normalizeIndex:
7.4% {6873ms} RSFECGenericGF>>multiply:by:
3.8% {3513ms} RSFECGenericGFPoly>>evaluateAt:
2.2% {2079ms} RSFECGenericGFPoly>>addOrSubtractPoly:
2.2% {2040ms} RSFECGenericGFPoly>>multiplyByMonomialDegree:coefficient:

---
Kindly,
Robert


On 6/2/21 10:40 AM, Levente Uzonyi wrote:
Hi Robert,

On Tue, 1 Jun 2021, Robert Withers wrote:

Hey ya, Levente,

Woot!!! I was able to get all the primitives working with ByteArrays. Everything is passing green! (except for several GF fields: Aztec and Maxicode and QRCode) I think I will primitize one of the codingLoops on the RSErasure
side. Here are the numbers.  Take note of the tallies and time: 50 seconds versus 114 seconds. I updated the squeak wiki Cryptography page (http://wiki.squeak.org/squeak/5776) with d/l links to the plugins. Here are those
Brilliant. ByteArray is indeed the optimal choice there.

Let me answer your question in your previous email here:
AFAIK, on all the currently supported platforms and C compilers, unsigned
int is 32-bits, so it matches WordArray in Squeak.
DoubleWordArray is 64-bits, and in C that matches the type unsigned long long (on the currently supported platforms and compilers).

I think you can shave off a few more seconds there by adding those
<inline: #always> pragmas to the methods the primitive methods send
either directly or indirectly.

And, I think the logTable and expTable variables can be declared as
constant as well as unsigned char arrays instead of unsigned int arrays.
That might give more opportunities to the C compiler to optimize the code.

I prefer to use #var:type:array: instead of #var:declareC: when possible.
E.g.

 	cg
 		var: #logTable declareC: 'unsigned int logTable[256] = {
0, 0, 1, 240, 2, 225, 241, 53, 3, 38, 226, 133, 242, 43, 54, 210, 4, 195,
39, 114, 227, 106, 134, 28, 243, 140, 44, 23, 55, 118, 211, 234, 5, 219,
196, 96, 40, 222, 115, 103, 228, 78, 107, 125, 135, 8, 29, 162, 244, 186,
141, 180, 45, 99, 24, 49, 56, 13, 119, 153, 212, 199, 235, 91, 6, 76, 220,
217, 197, 11, 97, 184, 41, 36, 223, 253, 116, 138, 104, 193, 229, 86, 79,
171, 108, 165, 126, 145, 136, 34, 9, 74, 30, 32, 163, 84, 245, 173, 187,
204, 142, 81, 181, 190, 46, 88, 100, 159, 25, 231, 50, 207, 57, 147, 14,
67, 120, 128, 154, 248, 213, 167, 200, 63, 236, 110, 92, 176, 7, 161, 77,
124, 221, 102, 218, 95, 198, 90, 12, 152, 98, 48, 185, 179, 42, 209, 37,
132, 224, 52, 254, 239, 117, 233, 139, 22, 105, 27, 194, 113, 230, 206,
87, 158, 80, 189, 172, 203, 109, 175, 166, 62, 127, 247, 146, 66, 137,
192, 35, 252, 10, 183, 75, 216, 31, 83, 33, 73, 164, 144, 85, 170, 246,
65, 174, 61, 188, 202, 205, 157, 143, 169, 82, 72, 182, 215, 191, 251, 47,
178, 89, 151, 101, 94, 160, 123, 26, 112, 232, 21, 51, 238, 208, 131, 58,
69, 148, 18, 15, 16, 68, 17, 121, 149, 129, 19, 155, 59, 249, 70, 214,
250, 168, 71, 201, 156, 64, 60, 237, 130, 111, 20, 93, 122, 177, 150 }';

could be written as

 	cg
 		var: #logTable
 		type: #'const unsigned char'
 		array:  #[0 0 1 240 2 225 241 53 3 38 226 133 242 43 54
210 4 195 39 114 227 106 134 28 243 140 44 23 55 118 211 234 5 219 196 96
40 222 115 103 228 78 107 125 135 8 29 162 244 186 141 180 45 99 24 49 56
13 119 153 212 199 235 91 6 76 220 217 197 11 97 184 41 36 223 253 116 138
104 193 229 86 79 171 108 165 126 145 136 34 9 74 30 32 163 84 245 173 187
204 142 81 181 190 46 88 100 159 25 231 50 207 57 147 14 67 120 128 154
248 213 167 200 63 236 110 92 176 7 161 77 124 221 102 218 95 198 90 12
152 98 48 185 179 42 209 37 132 224 52 254 239 117 233 139 22 105 27 194
113 230 206 87 158 80 189 172 203 109 175 166 62 127 247 146 66 137 192 35
252 10 183 75 216 31 83 33 73 164 144 85 170 246 65 174 61 188 202 205 157
143 169 82 72 182 215 191 251 47 178 89 151 101 94 160 123 26 112 232 21
51 238 208 131 58 69 148 18 15 16 68 17 121 149 129 19 155 59 249 70 214
250 168 71 201 156 64 60 237 130 111 20 93 122 177 150]


Levente
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
In reply to this post by Levente Uzonyi
 

Hey Levente,

I am trying to primitize the #findErrorLocations: & #findErrorMagnitudes:errorLocations: methods. I thought I had it done, but I am getting a segfault when running the RSFECReedSolomonTest>>#testDataMatrix test. I tried to dig into it and find out the issue. I think one of the params to the primitive is nil. The strange thing I observed is a debug stack in Squeak, where the argument to the prior stack frame is an instance, but the argument in the top frame is nil. I do not understand how it can go from type -> nil.

If you wish to help me figure this out, the current RSFEC code is loaded with:

Installer ss
    project: 'Cryptography';
    install: 'ProCrypto-1-1-1';
    install: 'ProCryptoTests-1-1-1'.

Then the Decoder primitives to RSFECPlugin is loaded from:

CryptographyRSFECPluginsDecoder-segfault-rww.2

---
Kindly,
Robert


On 6/2/21 10:40 AM, Levente Uzonyi wrote:
Hi Robert,

On Tue, 1 Jun 2021, Robert Withers wrote:

Hey ya, Levente,

Woot!!! I was able to get all the primitives working with ByteArrays. Everything is passing green! (except for several GF fields: Aztec and Maxicode and QRCode) I think I will primitize one of the codingLoops on the RSErasure
side. Here are the numbers.  Take note of the tallies and time: 50 seconds versus 114 seconds. I updated the squeak wiki Cryptography page (http://wiki.squeak.org/squeak/5776) with d/l links to the plugins. Here are those
Brilliant. ByteArray is indeed the optimal choice there.

Let me answer your question in your previous email here:
AFAIK, on all the currently supported platforms and C compilers, unsigned
int is 32-bits, so it matches WordArray in Squeak.
DoubleWordArray is 64-bits, and in C that matches the type unsigned long long (on the currently supported platforms and compilers).

I think you can shave off a few more seconds there by adding those
<inline: #always> pragmas to the methods the primitive methods send
either directly or indirectly.

And, I think the logTable and expTable variables can be declared as
constant as well as unsigned char arrays instead of unsigned int arrays.
That might give more opportunities to the C compiler to optimize the code.

I prefer to use #var:type:array: instead of #var:declareC: when possible.
E.g.

 	cg
 		var: #logTable declareC: 'unsigned int logTable[256] = {
0, 0, 1, 240, 2, 225, 241, 53, 3, 38, 226, 133, 242, 43, 54, 210, 4, 195,
39, 114, 227, 106, 134, 28, 243, 140, 44, 23, 55, 118, 211, 234, 5, 219,
196, 96, 40, 222, 115, 103, 228, 78, 107, 125, 135, 8, 29, 162, 244, 186,
141, 180, 45, 99, 24, 49, 56, 13, 119, 153, 212, 199, 235, 91, 6, 76, 220,
217, 197, 11, 97, 184, 41, 36, 223, 253, 116, 138, 104, 193, 229, 86, 79,
171, 108, 165, 126, 145, 136, 34, 9, 74, 30, 32, 163, 84, 245, 173, 187,
204, 142, 81, 181, 190, 46, 88, 100, 159, 25, 231, 50, 207, 57, 147, 14,
67, 120, 128, 154, 248, 213, 167, 200, 63, 236, 110, 92, 176, 7, 161, 77,
124, 221, 102, 218, 95, 198, 90, 12, 152, 98, 48, 185, 179, 42, 209, 37,
132, 224, 52, 254, 239, 117, 233, 139, 22, 105, 27, 194, 113, 230, 206,
87, 158, 80, 189, 172, 203, 109, 175, 166, 62, 127, 247, 146, 66, 137,
192, 35, 252, 10, 183, 75, 216, 31, 83, 33, 73, 164, 144, 85, 170, 246,
65, 174, 61, 188, 202, 205, 157, 143, 169, 82, 72, 182, 215, 191, 251, 47,
178, 89, 151, 101, 94, 160, 123, 26, 112, 232, 21, 51, 238, 208, 131, 58,
69, 148, 18, 15, 16, 68, 17, 121, 149, 129, 19, 155, 59, 249, 70, 214,
250, 168, 71, 201, 156, 64, 60, 237, 130, 111, 20, 93, 122, 177, 150 }';

could be written as

 	cg
 		var: #logTable
 		type: #'const unsigned char'
 		array:  #[0 0 1 240 2 225 241 53 3 38 226 133 242 43 54
210 4 195 39 114 227 106 134 28 243 140 44 23 55 118 211 234 5 219 196 96
40 222 115 103 228 78 107 125 135 8 29 162 244 186 141 180 45 99 24 49 56
13 119 153 212 199 235 91 6 76 220 217 197 11 97 184 41 36 223 253 116 138
104 193 229 86 79 171 108 165 126 145 136 34 9 74 30 32 163 84 245 173 187
204 142 81 181 190 46 88 100 159 25 231 50 207 57 147 14 67 120 128 154
248 213 167 200 63 236 110 92 176 7 161 77 124 221 102 218 95 198 90 12
152 98 48 185 179 42 209 37 132 224 52 254 239 117 233 139 22 105 27 194
113 230 206 87 158 80 189 172 203 109 175 166 62 127 247 146 66 137 192 35
252 10 183 75 216 31 83 33 73 164 144 85 170 246 65 174 61 188 202 205 157
143 169 82 72 182 215 191 251 47 178 89 151 101 94 160 123 26 112 232 21
51 238 208 131 58 69 148 18 15 16 68 17 121 149 129 19 155 59 249 70 214
250 168 71 201 156 64 60 237 130 111 20 93 122 177 150]


Levente
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

Oops, email size too big. I zipped everything up. The SqueakDebug.log has some good info in it. Here is the evidence:

https://www.dropbox.com/s/fcinfptyvv4xb42/Decoder-primitiveFailed.zip?dl=0

Levente,

Some more info. I saved another version of the segfault package. I was able to run the test resulting in a Squeak debugger. I am listing two snapshots, one is on the #decode:twoS: stack frame and the other is the next frame #findErrorLocations: As you can see the sigma method variable is a GFPolynomial, while the errorLocator in the subsequent frame is nil, even though the sigma is passed as that argument. How did *that* happen? Here is the DoIt to load everything needed:


Installer ss
    project: 'Cryptography';
    install: 'ProCrypto-1-1-1';
    install: 'ProCryptoTests-1-1-1';
    install: 'CryptographyRSFECPlugins';
    install: 'TraceMonitor';
    install: 'CryptographyRSFECPluginsDecoder-segfault'.

I am also attaching the etrace.log with some debug info I logged. I am confused. When I have gotten a crash.dmp, one of the lines in the c stack is a call to isBytes.

/usr/local/bin/../lib/squeak/5.0-202010010729/squeak(isBytes+0x9)[0x44d289]

Grazie!
Robert
---
Kindly,
Robert


On 6/2/21 2:49 PM, Robert Withers wrote:

Hey Levente,

I am trying to primitize the #findErrorLocations: & #findErrorMagnitudes:errorLocations: methods. I thought I had it done, but I am getting a segfault when running the RSFECReedSolomonTest>>#testDataMatrix test. I tried to dig into it and find out the issue. I think one of the params to the primitive is nil. The strange thing I observed is a debug stack in Squeak, where the argument to the prior stack frame is an instance, but the argument in the top frame is nil. I do not understand how it can go from type -> nil.

If you wish to help me figure this out, the current RSFEC code is loaded with:

Installer ss
    project: 'Cryptography';
    install: 'ProCrypto-1-1-1';
    install: 'ProCryptoTests-1-1-1'.

Then the Decoder primitives to RSFECPlugin is loaded from:

CryptographyRSFECPluginsDecoder-segfault-rww.2

---
Kindly,
Robert


On 6/2/21 10:40 AM, Levente Uzonyi wrote:
Hi Robert,

On Tue, 1 Jun 2021, Robert Withers wrote:

Hey ya, Levente,

Woot!!! I was able to get all the primitives working with ByteArrays. Everything is passing green! (except for several GF fields: Aztec and Maxicode and QRCode) I think I will primitize one of the codingLoops on the RSErasure
side. Here are the numbers.  Take note of the tallies and time: 50 seconds versus 114 seconds. I updated the squeak wiki Cryptography page (http://wiki.squeak.org/squeak/5776) with d/l links to the plugins. Here are those
Brilliant. ByteArray is indeed the optimal choice there.

Let me answer your question in your previous email here:
AFAIK, on all the currently supported platforms and C compilers, unsigned
int is 32-bits, so it matches WordArray in Squeak.
DoubleWordArray is 64-bits, and in C that matches the type unsigned long long (on the currently supported platforms and compilers).

I think you can shave off a few more seconds there by adding those
<inline: #always> pragmas to the methods the primitive methods send
either directly or indirectly.

And, I think the logTable and expTable variables can be declared as
constant as well as unsigned char arrays instead of unsigned int arrays.
That might give more opportunities to the C compiler to optimize the code.

I prefer to use #var:type:array: instead of #var:declareC: when possible.
E.g.

 	cg
 		var: #logTable declareC: 'unsigned int logTable[256] = {
0, 0, 1, 240, 2, 225, 241, 53, 3, 38, 226, 133, 242, 43, 54, 210, 4, 195,
39, 114, 227, 106, 134, 28, 243, 140, 44, 23, 55, 118, 211, 234, 5, 219,
196, 96, 40, 222, 115, 103, 228, 78, 107, 125, 135, 8, 29, 162, 244, 186,
141, 180, 45, 99, 24, 49, 56, 13, 119, 153, 212, 199, 235, 91, 6, 76, 220,
217, 197, 11, 97, 184, 41, 36, 223, 253, 116, 138, 104, 193, 229, 86, 79,
171, 108, 165, 126, 145, 136, 34, 9, 74, 30, 32, 163, 84, 245, 173, 187,
204, 142, 81, 181, 190, 46, 88, 100, 159, 25, 231, 50, 207, 57, 147, 14,
67, 120, 128, 154, 248, 213, 167, 200, 63, 236, 110, 92, 176, 7, 161, 77,
124, 221, 102, 218, 95, 198, 90, 12, 152, 98, 48, 185, 179, 42, 209, 37,
132, 224, 52, 254, 239, 117, 233, 139, 22, 105, 27, 194, 113, 230, 206,
87, 158, 80, 189, 172, 203, 109, 175, 166, 62, 127, 247, 146, 66, 137,
192, 35, 252, 10, 183, 75, 216, 31, 83, 33, 73, 164, 144, 85, 170, 246,
65, 174, 61, 188, 202, 205, 157, 143, 169, 82, 72, 182, 215, 191, 251, 47,
178, 89, 151, 101, 94, 160, 123, 26, 112, 232, 21, 51, 238, 208, 131, 58,
69, 148, 18, 15, 16, 68, 17, 121, 149, 129, 19, 155, 59, 249, 70, 214,
250, 168, 71, 201, 156, 64, 60, 237, 130, 111, 20, 93, 122, 177, 150 }';

could be written as

 	cg
 		var: #logTable
 		type: #'const unsigned char'
 		array:  #[0 0 1 240 2 225 241 53 3 38 226 133 242 43 54
210 4 195 39 114 227 106 134 28 243 140 44 23 55 118 211 234 5 219 196 96
40 222 115 103 228 78 107 125 135 8 29 162 244 186 141 180 45 99 24 49 56
13 119 153 212 199 235 91 6 76 220 217 197 11 97 184 41 36 223 253 116 138
104 193 229 86 79 171 108 165 126 145 136 34 9 74 30 32 163 84 245 173 187
204 142 81 181 190 46 88 100 159 25 231 50 207 57 147 14 67 120 128 154
248 213 167 200 63 236 110 92 176 7 161 77 124 221 102 218 95 198 90 12
152 98 48 185 179 42 209 37 132 224 52 254 239 117 233 139 22 105 27 194
113 230 206 87 158 80 189 172 203 109 175 166 62 127 247 146 66 137 192 35
252 10 183 75 216 31 83 33 73 164 144 85 170 246 65 174 61 188 202 205 157
143 169 82 72 182 215 191 251 47 178 89 151 101 94 160 123 26 112 232 21
51 238 208 131 58 69 148 18 15 16 68 17 121 149 129 19 155 59 249 70 214
250 168 71 201 156 64 60 237 130 111 20 93 122 177 150]


Levente
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Levente Uzonyi
In reply to this post by Robert Withers-2
 
Hi Robert,

I see the following potential causes:

In #findErrorLocationsDegree:coefficients:count:fieldSize:result: the
return value in ambiguous. You make the primitive fail if not enough
errors are found. But after the send, you don't check for failure
but simply send #methodReturnReceiver.
Instead, you should either check for primitive failure after the send
or make #findErrorLocationsDegree:coefficients:count:fieldSize:result:
return boolean value: the method was successful or not.
E.g. when numErrors = 1, return true, in the other branch, return e =
numErrors. (If slang doesn't support boolean types, just use 0 and 1).
In the actual primitive method, check for the value returned by that
method and fail the primitive when needed. Once the primitive has failed,
you ought to return from it.
I see the above pattern in other methods too (marking the primitive as
failed in a not-top-level method without checking for failure outside).
Those need to be fixed as well.

Another potential issue I see is that the validation of the arguments is
not complete.
E.g.: #stackIntegerValue: may have already set the primitive to failed if
the argument was not an integer. So, before continuing, that has to be
checked.

Another possible but unlikely problem is that you assume that both result
and coefficients have at least one field. But that may not be true which
can result in segfaults.


Levente
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 
Thanks very much, Levente. I'll dig into your suggestions tomorrow. I
made my first contact over ham radio tonight after another generous soul
totally hooked me up with cables and power supplies and programmed my
radio. I am so excited! Now on the air, KO4PYS.

---
Kindly,
Robert


On 6/3/21 8:43 PM, Levente Uzonyi wrote:

> Hi Robert,
>
> I see the following potential causes:
>
> In #findErrorLocationsDegree:coefficients:count:fieldSize:result: the
> return value in ambiguous. You make the primitive fail if not enough
> errors are found. But after the send, you don't check for failure
> but simply send #methodReturnReceiver.
> Instead, you should either check for primitive failure after the send
> or make #findErrorLocationsDegree:coefficients:count:fieldSize:result:
> return boolean value: the method was successful or not.
> E.g. when numErrors = 1, return true, in the other branch, return e =
> numErrors. (If slang doesn't support boolean types, just use 0 and 1).
> In the actual primitive method, check for the value returned by that
> method and fail the primitive when needed. Once the primitive has failed,
> you ought to return from it.
> I see the above pattern in other methods too (marking the primitive as
> failed in a not-top-level method without checking for failure outside).
> Those need to be fixed as well.
>
> Another potential issue I see is that the validation of the arguments is
> not complete.
> E.g.: #stackIntegerValue: may have already set the primitive to failed if
> the argument was not an integer. So, before continuing, that has to be
> checked.
>
> Another possible but unlikely problem is that you assume that both result
> and coefficients have at least one field. But that may not be true which
> can result in segfaults.
>
>
> Levente

Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
In reply to this post by Levente Uzonyi
 

Alright, Levente! Thank you so very much! You are a generous soul & an angel! Your suggestions (the first two) worked like a charm and I have a running plugin to GF arithmetic, GFPoly arithmetic and RSDecoder arithmetic {#findErrorLocation: & #findErrorMagnitudes:errorLocations:} both plugganized. The cumulative speedup is 568% and all the hotspots changed again.

I tried to implement GFPoly>>#initializeField:coefficients: in a plugin function, but my image is crashing on startup, when standard GFs are created in GF class>>#startUp:, when each GF initializes it attempts to create a one and a zero polynomial. This would call the primitive for #initializePoly. This code can be seen in the package: CryptographyRSFECPluginsInitializeGFPoly. I am attempting to run:

            (firstNonZero > coefficientsLength)
                ifTrue: [
                        mutableCoefficients := interpreterProxy
                            instantiateClass: interpreterProxy classByteArray
                            indexableSize: 1]
                ifFalse: [
                        mutableCoefficients := interpreterProxy
                            instantiateClass: interpreterProxy classByteArray
                            indexableSize: (coefficientsLength - firstNonZero + 1).

The cumulative speedup is 568% and all the hotspots changed again. Here are the profiling results from just the RSFEC code alone, no RSErasure code is in the profiles. The new hotspots are, with RSFECPlugin, then without.

((128683 / 22648) asFloat * 100) asInteger = 568%

The remaining potential plugganization relates to complex object instantiation inside the plugin functions. {#decode:twoS:, #runEuclideanAlgorithm: and #initializeField:coefficients:}. The best value for work return is to plugganize #initializeField:coefficients: at 3.3 seconds (14%)

WITH GF & GFPOLY PRIMITIVES AND DECODER PRIMITIVES
(3 asterix for in-progress plugganization)

 - 22194 tallies, 22648 msec.

**Leaves**
29.1% {6586ms} RSFECDecoderWithPlugin>>decode:twoS:
14.7% {3329ms} RSFECGenericGFPoly class>>newField:coefficients:
7.3% {1646ms} RSFECDecoderWithPlugin>>primFindErrorLocationsDegree:coefficients:result:fieldSize:
2.9% {654ms} RSFECDecoderWithPlugin>>findErrorMagnitudes:errorLocations:
1.0% {237ms} RSFECDecoderWithPlugin>>runEuclideanAlgorithmPoly:poly:rDegrees:
Calls to plugganized GF/GFPoly methods:
1.4% {317ms} RSFECGenericGFWithPlugin>>log:

And:

WITHOUT GF & GFPOLY PRIMITIVES - NO RSFEC PRIMITIVES

- 98352 tallies, 128683 msec.

**Leaves**
GF arithmetic
23.4% {30126ms} RSFECGenericGF>>exp:
13.4% {17229ms} RSFECGenericGF>>addOrSubtract:by:
11.9% {15251ms} RSFECGenericGF>>maskValue:
10.0% {12916ms} RSFECGenericGF>>log:
8.6% {11059ms} RSFECGenericGF>>normalizeIndex:
6.8% {8792ms} RSFECGenericGF>>multiply:by:
GFPoly arithmetic
2.7% {3529ms} RSFECGenericGFPoly>>evaluateAt:
2.1% {2715ms} RSFECGenericGFPoly>>addOrSubtractPoly:
2.0% {2578ms} RSFECGenericGFPoly>>multiplyByMonomialDegree:coefficient:
---
Kindly,
Robert


On 6/3/21 8:43 PM, Levente Uzonyi wrote:
Hi Robert,

I see the following potential causes:

In #findErrorLocationsDegree:coefficients:count:fieldSize:result: the
return value in ambiguous. You make the primitive fail if not enough
errors are found. But after the send, you don't check for failure
but simply send #methodReturnReceiver.
Instead, you should either check for primitive failure after the send
or make #findErrorLocationsDegree:coefficients:count:fieldSize:result:
return boolean value: the method was successful or not.
E.g. when numErrors = 1, return true, in the other branch, return e =
numErrors. (If slang doesn't support boolean types, just use 0 and 1).
In the actual primitive method, check for the value returned by that
method and fail the primitive when needed. Once the primitive has failed,
you ought to return from it.
I see the above pattern in other methods too (marking the primitive as
failed in a not-top-level method without checking for failure outside).
Those need to be fixed as well.

Another potential issue I see is that the validation of the arguments is
not complete.
E.g.: #stackIntegerValue: may have already set the primitive to failed if
the argument was not an integer. So, before continuing, that has to be
checked.

Another possible but unlikely problem is that you assume that both result
and coefficients have at least one field. But that may not be true which
can result in segfaults.


Levente
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 
Oh! I forgot to relocate leaves that have already been plugganized. This leaves (heh) 3 possible plugganizations that all instantiate ByteArrays. Here, I fixed it.
WITH GF & GFPOLY PRIMITIVES AND DECODER PRIMITIVES
(3 asterix for in-progress plugganization)

 - 22194 tallies, 22648 msec.

**Leaves**
29.1% {6586ms} RSFECDecoderWithPlugin>>decode:twoS:
14.7% {3329ms} RSFECGenericGFPoly class>>newField:coefficients:
1.0% {237ms} RSFECDecoderWithPlugin>>runEuclideanAlgorithmPoly:poly:rDegrees:
Calls to plugganized GF/GFPoly methods, so I think these are as optimized as possible:
7.3% {1646ms} RSFECDecoderWithPlugin>>primFindErrorLocationsDegree:coefficients:result:fieldSize:
2.9% {654ms} RSFECDecoderWithPlugin>>findErrorMagnitudes:errorLocations:
1.4% {317ms} RSFECGenericGFWithPlugin>>log:
 
---
Kindly,
Robert


On 6/4/21 12:02 PM, Robert Withers wrote:
WITH GF & GFPOLY PRIMITIVES AND DECODER PRIMITIVES
(3 asterix for in-progress plugganization)

 - 22194 tallies, 22648 msec.

**Leaves**
29.1% {6586ms} RSFECDecoderWithPlugin>>decode:twoS:
14.7% {3329ms} RSFECGenericGFPoly class>>newField:coefficients:
7.3% {1646ms} RSFECDecoderWithPlugin>>primFindErrorLocationsDegree:coefficients:result:fieldSize:
2.9% {654ms} RSFECDecoderWithPlugin>>findErrorMagnitudes:errorLocations:
1.0% {237ms} RSFECDecoderWithPlugin>>runEuclideanAlgorithmPoly:poly:rDegrees:
Calls to plugganized GF/GFPoly methods:
1.4% {317ms} RSFECGenericGFWithPlugin>>log:
Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 
Oh! Heading for a coffee with my nurse. I realized I may be passing the field into the primitive instead of the field size. I’ll check it when I get back home!

Kindly,
Robert
. .. ... ‘...^,^


On Fri, Jun 4, 2021 at 12:12, Robert Withers <[hidden email]> wrote:
Oh! I forgot to relocate leaves that have already been plugganized. This leaves (heh) 3 possible plugganizations that all instantiate ByteArrays. Here, I fixed it.
WITH GF & GFPOLY PRIMITIVES AND DECODER PRIMITIVES
(3 asterix for in-progress plugganization)

 - 22194 tallies, 22648 msec.

**Leaves**
29.1% {6586ms} RSFECDecoderWithPlugin>>decode:twoS:
14.7% {3329ms} RSFECGenericGFPoly class>>newField:coefficients:
1.0% {237ms} RSFECDecoderWithPlugin>>runEuclideanAlgorithmPoly:poly:rDegrees:
Calls to plugganized GF/GFPoly methods, so I think these are as optimized as possible:
7.3% {1646ms} RSFECDecoderWithPlugin>>primFindErrorLocationsDegree:coefficients:result:fieldSize:
2.9% {654ms} RSFECDecoderWithPlugin>>findErrorMagnitudes:errorLocations:
1.4% {317ms} RSFECGenericGFWithPlugin>>log:
 
---
Kindly,
Robert


On 6/4/21 12:02 PM, Robert Withers wrote:
WITH GF & GFPOLY PRIMITIVES AND DECODER PRIMITIVES
(3 asterix for in-progress plugganization)

 - 22194 tallies, 22648 msec.

**Leaves**
29.1% {6586ms} RSFECDecoderWithPlugin>>decode:twoS:
14.7% {3329ms} RSFECGenericGFPoly class>>newField:coefficients:
7.3% {1646ms} RSFECDecoderWithPlugin>>primFindErrorLocationsDegree:coefficients:result:fieldSize:
2.9% {654ms} RSFECDecoderWithPlugin>>findErrorMagnitudes:errorLocations:
1.0% {237ms} RSFECDecoderWithPlugin>>runEuclideanAlgorithmPoly:poly:rDegrees:
Calls to plugganized GF/GFPoly methods:
1.4% {317ms} RSFECGenericGFWithPlugin>>log:


Reply | Threaded
Open this post in threaded view
|

Re: Reed Solomon plugins & performance slow down

Robert Withers-2
 

Nope, this wasn't it.

---
Kindly,
Robert


On 6/4/21 12:36 PM, Robert wrote:
Oh! Heading for a coffee with my nurse. I realized I may be passing the field into the primitive instead of the field size. I’ll check it when I get back home!

Kindly,
Robert
. .. ... ‘...^,^


On Fri, Jun 4, 2021 at 12:12, Robert Withers <[hidden email]> wrote:
Oh! I forgot to relocate leaves that have already been plugganized. This leaves (heh) 3 possible plugganizations that all instantiate ByteArrays. Here, I fixed it.
WITH GF & GFPOLY PRIMITIVES AND DECODER PRIMITIVES
(3 asterix for in-progress plugganization)

 - 22194 tallies, 22648 msec.

**Leaves**
29.1% {6586ms} RSFECDecoderWithPlugin>>decode:twoS:
14.7% {3329ms} RSFECGenericGFPoly class>>newField:coefficients:
1.0% {237ms} RSFECDecoderWithPlugin>>runEuclideanAlgorithmPoly:poly:rDegrees:
Calls to plugganized GF/GFPoly methods, so I think these are as optimized as possible:
7.3% {1646ms} RSFECDecoderWithPlugin>>primFindErrorLocationsDegree:coefficients:result:fieldSize:
2.9% {654ms} RSFECDecoderWithPlugin>>findErrorMagnitudes:errorLocations:
1.4% {317ms} RSFECGenericGFWithPlugin>>log:
 
---
Kindly,
Robert


On 6/4/21 12:02 PM, Robert Withers wrote:
WITH GF & GFPOLY PRIMITIVES AND DECODER PRIMITIVES
(3 asterix for in-progress plugganization)

 - 22194 tallies, 22648 msec.

**Leaves**
29.1% {6586ms} RSFECDecoderWithPlugin>>decode:twoS:
14.7% {3329ms} RSFECGenericGFPoly class>>newField:coefficients:
7.3% {1646ms} RSFECDecoderWithPlugin>>primFindErrorLocationsDegree:coefficients:result:fieldSize:
2.9% {654ms} RSFECDecoderWithPlugin>>findErrorMagnitudes:errorLocations:
1.0% {237ms} RSFECDecoderWithPlugin>>runEuclideanAlgorithmPoly:poly:rDegrees:
Calls to plugganized GF/GFPoly methods:
1.4% {317ms} RSFECGenericGFWithPlugin>>log:


12