FloatArray

Jimmie Houchin-5
I just recently discovered FloatArray by accident.

I was close to having to use Python/Numpy due to considerable
performance differences. Performance is very important.

In one of my tests, using a 212,000-item array of floats and summing
the array on each iteration, Pharo was taking 17 minutes versus 20
seconds for Python/Numpy. Using FloatArray closes the gap to 40
seconds. That is acceptable.

However, the FloatArray class comment says it uses 32-bit floats. I
need 64-bit. I don't even know where to look to create 64-bit
FloatArrays. I am surprised that they didn't get converted when moving
to 64-bit Pharo.

Would I need to change the VM source, compile a new VM, and create
image-side classes? I have not messed with the VM in years. I do not
know where the FloatArray plugin would be.

Any pointers would be a great help. If this needs to be on the vm-dev
list I can move it there.

Thanks.

Jimmie
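
A minimal sketch of the kind of comparison described above (a
reconstruction, not the actual test code), using only standard Pharo
messages:

| a f |
"ordinary Array of boxed 64-bit Floats"
a := (1 to: 212000) collect: [ :i | i / 212000.0 ].
"the same values packed into a 32-bit FloatArray"
f := FloatArray withAll: a.
"boxed arithmetic: each addition allocates intermediate Float objects"
[ a sum ] bench.
"primitive loop over the packed 32-bit floats"
[ f sum ] bench.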



Re: FloatArray

Nicolas Cellier
Hi,
I recommend looking into Smallapack, the Smalltalk interface to LAPACK, on squeaksource.com or GitHub. You'll get the speed of numpy. There is a Metacello configuration. I have not checked the port on current
Pharo, but I can reactivate it if there is some interest.
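
Loading via the configuration would look something like this (a sketch; the squeaksource.com repository URL is an assumption, not confirmed in this thread):

Metacello new
    configuration: 'Smallapack';
    repository: 'http://www.squeaksource.com/Smallapack';
    load.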




Re: FloatArray

SergeStinckwich




Yes, it would be nice to have a Pharo port.
I would really like to integrate this with PolyMath in one way or another.

Do you have a benchmark to test the performance?
We have already one implementation of SVD in PolyMath.
We can compare the results with LAPACK.

Not sure we can do sums of vectors or matrices with LAPACK?

Regards,
--
Serge Stinckwich
Int. Research Unit on Modelling/Simulation of Complex Systems (UMMISCO)
Sorbonne University (SU)
French National Research Institute for Sustainable Development (IRD)
University of Yaoundé I, Cameroon
"Programs must be written for people to read, and only incidentally for machines to execute."
https://twitter.com/SergeStinckwich

Re: FloatArray

ducasse
In reply to this post by Nicolas Cellier



Yes, there is. I know that Oleksandr wants to crunch a lot of numbers.






Re: FloatArray

Jimmie Houchin-5
In reply to this post by Nicolas Cellier

That would be awesome. There is most definitely interest.

Thanks.





Re: FloatArray

Jimmie Houchin-5
In reply to this post by Nicolas Cellier

When I attempt to install via Metacello in Pharo 7, I get the following warning.

This package depends on the following classes:
  Parser
  MessageAsTempNode
  Compiler
  MessageNode


I do not know how to proceed from there. Any help greatly appreciated.

Thanks.

Jimmie





Re: FloatArray

Nicolas Cellier
Hi Jimmie,
The Metacello configuration is not ready for Pharo 7.
I have succeeded in loading Smallapack yesterday without Metacello, and could run some of the tests.
I had to modify a few methods because of Pharo refactorings, and there is still some work to make it fully operational.
When ready, I'll update the Metacello configuration and keep you informed.




Re: FloatArray

Jimmie Houchin-5

Thanks.

I appreciate your working to make this happen. No pressure from me on time. I am blessed that you are contributing this to the community and that I can benefit from your labors. At your convenience.

I look forward to using Smallapack when available. I am thrilled to be able to stay with Pharo and not have to use Python/Numpy. :)

Thanks.

Jimmie





Re: FloatArray

Nicolas Cellier
I've just published a new ConfigurationOfSmallapack

Currently, there is one test failing because I did not finish the support of LegacyFFI for invoking methods with more than 15 arguments (see below).
The other tests are passing, so you may try it now.

Gory details:
The failing test requires the Opal compiler to generate fallback code for invoking primitive 117 (ExternalFunction invokeWithArguments:) under the hood...
The old compiler's workaround was hackish, and it is extremely complex to implement in the cleaner Opal architecture
(the old compiler allowed mixing the AST-rewrite and code-generation phases without a problem).
The plan was either to enhance primitive 120 to handle the case of many arguments passed through a single array...
Or to switch to UnifiedFFI (but even there, I don't remember if invoking primitive 117 was that straightforward...)




Re: FloatArray

ducasse
Thanks, Nicolas.
We will discuss this during the sprint tomorrow. 





Re: FloatArray

Nicolas Cellier
Any feedback on this?
Did anyone try to use Smallapack in Pharo?
Jimmie?




Re: FloatArray

ducasse
Thanks for reminding us; it slipped our minds. I will raise the point tomorrow and discuss it with Marcus.
Stef





Re: FloatArray

Jimmie Houchin-5
In reply to this post by Nicolas Cellier
On 5/16/19 1:26 PM, Nicolas Cellier wrote:
> Any feedback on this?
> Did anyone try to use Smallapack in Pharo?
> Jimmie?

I am going to guess that you are not on pharo-users. My bad.
I posted this on pharo-users since it wasn't a Pharo development question.

I probably should have posted here or emailed you directly.

All I really need is good performance with a simple array of floats. No
matrix math. Nothing complicated. Moving averages over a slice of the
array; a variety of different averages, weighted, etc.; max/min of the
array. Just a single simple array.

Any help greatly appreciated.

Thanks.


On 4/28/19 8:32 PM, Jimmie Houchin wrote:
Hello,

I have installed Smallapack into Pharo 7.0.3. Thanks, Nicolas.

I am very unsure of my use of Smallapack. I am not a mathematician or a
scientist. The only part of Smallapack I am trying to use at the moment
is something that is 64-bit and comparable to FloatArray, so that I can
do some simple accessing, slicing, sums, and averages on the array.

Here is some sample code I wrote just to play in a playground.

I have ExternalDoubleArray, LapackDGEMatrix, and FloatArray samples
below. The ones not in use are commented out for any given run.

fp is a download from
http://ratedata.gaincapital.com/2018/12%20December/EUR_USD_Week1.zip
and unzipped to a directory.

fp := '/home/jimmie/data/EUR_USD_Week1.csv'.
index := 0.
pricesSum := 0.
asum := 0.
ttr := [
    lines := fp asFileReference contents lines allButFirst.
    a := ExternalDoubleArray new: lines size.
    "la := LapackDGEMatrix allocateNrow: lines size ncol: 1.
    a := la columnAt: 1."
    "a := FloatArray new: lines size."
    lines do: [ :line | | parts price |
        parts := ',' split: line.
        index := index + 1.
        price := Float readFrom: parts last.
        a at: index put: price.
        pricesSum := pricesSum + price.
        "re-sum the whole array every 100 rows"
        (index rem: 100) = 0 ifTrue: [
            asum := a sum ] ] ] timeToRun.
{ index. pricesSum. asum. ttr }.
"ExternalDoubleArray: an Array(337588 383662.5627699992 383562.2956199993 0:00:01:59.885)"
"FloatArray: an Array(337588 383662.5627699992 383562.2954441309 0:00:00:06.555)"

FloatArray is not the precision I need. But it is over 18x faster.

I am afraid I must be doing something badly wrong. Python/Numpy is over
4x faster than FloatArray for the above.

If I am using Smallapack incorrectly please help.

Any help greatly appreciated.

Thanks.
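
A plain-Pharo sketch of the kind of moving average described above,
built only on FloatArray slicing (a reconstruction, not code from this
thread; the series here is synthetic, and precision is still 32-bit):

| prices window sma |
"synthetic price series packed into a FloatArray"
prices := FloatArray withAll: ((1 to: 1000) collect: [ :i | i sin + 2.0 ]).
window := 20.
"simple moving average: mean of each trailing window-sized slice"
sma := (window to: prices size) collect: [ :i |
    (prices copyFrom: i - window + 1 to: i) sum / window ].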



Re: FloatArray

Nicolas Cellier
Hi Jimmie,
indeed, I did not subscribe...
Having efficient methods for a sliding-window average is possible; here is how I would do it:

"Create a vector with 100,000 rows filles with random values (uniform distrubution in [0,1]"
v := LapackDGEMatrix randUniform: #(100000 1).

"extract values from rank 10001 to 20000"
w1 := v atIntervalFrom: 10001 to: 20000 by: 1.

"create a left multiplier matrix for performing average of w1"
a := LapackDGEMatrix nrow: 1 ncol: w1 nrow withAll: 1.0 / w1 size.

"get the average (this is a 1x1 matrix from which we take first element)"
avg1 := (a * w1) at: 1.

[ "select another slice of same size"
w2 := v atIntervalFrom: 15001 to: 25000 by: 1.

"get the average (we can recycle a)"
avg2 := (a * w2) at: 1 ] bench.

This gives:
 '16,500 per second. 60.7 microseconds per run.'
versus:
[w2 sum / w2 size] bench.
 '1,100 per second. 908 microseconds per run.'

For max and min, it's harder. Lapack/BLAS only provides the max of absolute values as a primitive:
[w2 absMax] bench.
 '19,400 per second. 51.5 microseconds per run.'

Everything else will be slower, unless we write new primitives in C and connect them...
[w2 maxOf: [:each | each]] bench.
 '984 per second. 1.02 milliseconds per run.'
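
The same left-multiplier trick extends to the weighted averages
mentioned earlier; a sketch, assuming LapackDGEMatrix supports plain
at:put: element access (only the API shown above is otherwise used):

| n a wavg |
n := w1 nrow.
"row of linear weights 2i / (n(n+1)), which sum to 1 over i = 1..n"
a := LapackDGEMatrix nrow: 1 ncol: n withAll: 0.0.
1 to: n do: [ :i | a at: i put: 2.0 * i / (n * (n + 1)) ].
"weighted average as a 1x1 matrix product, as for avg1 above"
wavg := (a * w1) at: 1.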




Re: FloatArray

Jimmie Houchin-5

I wasn't worried about how to do sliding windows. My problem is that using LapackDGEMatrix in my example was 18x slower than FloatArray, which is itself slower than Numpy. That isn't what I was expecting.

What I didn't know is whether I was doing something wrong to cause such a tremendous slowdown.

Python and Numpy are not my favorite, but they aren't uncomfortable.

So I gave up and went back to Numpy.

Thanks.






Re: FloatArray

SergeStinckwich
There is another solution with my TensorFlow Pharo binding:

You can do a matrix multiplication like this:

| graph t1 t2 c1 c2 mult session result |
graph := TF_Graph create.
t1 := TF_Tensor fromFloats: (1 to: 1000000) asArray shape: #(1000 1000).
t2 := TF_Tensor fromFloats: (1 to: 1000000) asArray shape: #(1000 1000).
c1 := graph const: 'c1' value: t1.
c2 := graph const: 'c2' value: t2.
mult := c1 * c2.
session := TF_Session on: graph.
result := session runOutput: (mult output: 0).
result asNumbers

Here I'm doing a multiplication between two 1000x1000 matrices in 537 ms on my computer.

All operations can be composed into a graph of operations that runs outside Pharo, so it can be very fast.
Operations can run on CPU or GPU, and 32-bit or 64-bit float operations are possible.

This is a work in progress but can already be used.
Regards,




--
Serge Stinckwich
Int. Research Unit on Modelling/Simulation of Complex Systems (UMMISCO)
Sorbonne University (SU)
French National Research Institute for Sustainable Development (IRD)
University of Yaoundé I, Cameroon
"Programs must be written for people to read, and only incidentally for machines to execute."
https://twitter.com/SergeStinckwich

Re: FloatArray

Nicolas Cellier
Hi Jimmie,
I didn't take time yesterday to analyze your specific example because it was quite late, but here are some remarks:
1) First, I recommend using 64-bit Pharo, because number crunching and Float operations will be faster (not FloatArray though).
2) It would be nice to use a profiler to analyze where the time is spent;
  I would not be amazed if (Float readFrom: ...) took a non-negligible percentage of it (see the sketch below).
3) ExternalDoubleArray only adds overhead if no bulk operation is performed
  (like reading raw binary data or serving as a storage area passed to Lapack/BLAS primitives);
  it does not provide accelerated features by itself.
  I think it is too low-level to serve as a primary interface.
4) LapackXXXMatrix sum has indeed not been optimized to use BLAS; this can easily be corrected, thanks for giving this example.
With some cooperation we could easily make some progress; there is low-hanging fruit.
But I understand if you prefer to stick with the more mature numpy solution.
Thanks for trying. At least you were able to load and use Smallapack in Pharo, and that is already good feedback.
If you have time, I'll publish a small enhancement accelerating sum, and ask you to retry.
Thanks again
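
A minimal sketch of the profiling suggested in point 2, assuming the
TimeProfiler class available in Pharo images (the literal string is a
stand-in for a CSV field):

"isolate the cost of parsing floats from strings"
TimeProfiler spyOn: [
    200000 timesRepeat: [ Float readFrom: '1.14683' ] ].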



Re: FloatArray

Nicolas Cellier
In reply to this post by SergeStinckwich
Hi Serge,
this is good news; having TensorFlow bindings is also a must!
I have this in Smallapack with pure-CPU, unaccelerated BLAS (no MKL, no ATLAS, just plain and dumb netlib code):

| a b |
a := LapackDGEMatrix randNormal: #(1000 1000).
b := LapackDGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun.
 "783 (double precision)"

| a b |
a := LapackSGEMatrix randNormal: #(1000 1000).
b := LapackSGEMatrix randNormal: #(1000 1000).
[a * b] timeToRun.
 "448 (single precision)"

That is on an Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz, so I think we can get much better with an accelerated library!

Reply | Threaded
Open this post in threaded view
|

Re: FloatArray

Jimmie Houchin-5
In reply to this post by Nicolas Cellier

Not a problem. I greatly respect other people's time and priorities, and their personal lives.

Just for the record, I am using 64bit Pharo on a fast i7 laptop with 16GB RAM, running Xubuntu 18.04 64bit.

I do not remember any problems loading. And within the small amount of experimenting that I did, it seemed to operate fine.

Again, thanks for your contribution. I know it is a lot of work and a pretty large area to cover. Python/Numpy has armies of people working on this.

Jimmie


On 5/21/19 2:54 AM, Nicolas Cellier wrote:
Hi Jimmie,
I didn't take time yesterday to analyze your specific example because it was quite late, but here are some remarks:
1) First, I recommend using 64bit Pharo, because number crunching and Float operations will be faster (not FloatArray though).
2) It would be nice to use a profiler to analyze where time is spent (a minimal sketch follows this list).
  I would not be surprised if (Float readFrom:...) took a non-negligible percentage of it.
3) ExternalDoubleArray only adds overhead if no bulk operation is performed
  (like reading raw binary data or serving as a storage area passed to Lapack/Blas primitives);
  it does not provide accelerated features by itself.
  I think it is too low-level to serve as a primary interface.
4) LapackXXXMatrix sum has indeed not been optimized to use BLAS, and this can easily be corrected; thanks for giving this example.
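
A minimal sketch of point 2, assuming Pharo's TimeProfiler (AndreasSystemProfiler plays the same role in Squeak); the literal string below is just a stand-in for one parsed CSV field:

TimeProfiler spyOn: [
    "roughly one Float parse per row of the week's CSV file"
    1 to: 337588 do: [ :i | Float readFrom: '1.14567' ] ]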
With some cooperation we could easily make progress; there is low-hanging fruit.
But I understand if you prefer to stick with the more mature numpy solution.
Thanks for trying. At least you were able to load and use Smallapack in Pharo, and that is already good feedback.
If you have time, I'll publish a small enhancement accelerating sum and ask you to retry.
Thanks again


On Tue, May 21, 2019 at 5:13 AM, Serge Stinckwich <[hidden email]> wrote:
There is another solution with my TensorFlow Pharo binding:

You can do a matrix multiplication like this:

| graph t1 t2 c1 c2 mult session result |
graph := TF_Graph create.
t1 := TF_Tensor fromFloats: (1 to: 1000000) asArray shape: #(1000 1000).
t2 := TF_Tensor fromFloats: (1 to: 1000000) asArray shape: #(1000 1000).
c1 := graph const: 'c1' value: t1.
c2 := graph const: 'c2' value: t2.
mult := c1 * c2.
session := TF_Session on: graph.
result := session runOutput: (mult output: 0).
result asNumbers

Here I'm multiplying two 1000x1000 matrices in 537 ms on my computer.

All operations can be composed into a graph of operations that is run outside Pharo, so they can be very fast.
Operations can be done on CPU or GPU, and 32-bit or 64-bit float operations are possible.

This is a work in progress but can already be used.
Regards,



On Tue, May 21, 2019 at 6:54 AM Jimmie Houchin <[hidden email]> wrote:

I wasn't worried about how to do sliding windows. My problem is that LapackDGEMatrix in my example was 18x slower than FloatArray, which is itself slower than Numpy. That isn't what I was expecting.

What I didn't know is whether I was doing something wrong to cause such a tremendous slowdown.

Python and Numpy is not my favorite. But it isn't uncomfortable.

So I gave up and went back to Numpy.

Thanks.



On 5/20/19 5:17 PM, Nicolas Cellier wrote:
Hi Jimmie,
indeed, I did not subscribe...
Having efficient methods for sliding window average is possible, here is how I would do it:

"Create a vector with 100,000 rows filles with random values (uniform distrubution in [0,1]"
v := LapackDGEMatrix randUniform: #(100000 1).

"extract values from rank 10001 to 20000"
w1 := v atIntervalFrom: 10001 to: 20000 by: 1.

"create a left multiplier matrix for performing average of w1"
a := LapackDGEMatrix nrow: 1 ncol: w1 nrow withAll: 1.0 / w1 size.

"get the average (this is a 1x1 matrix from which we take first element)"
avg1 := (a * w1) at: 1.

[ "select another slice of same size"
w2 := v atIntervalFrom: 15001 to: 25000 by: 1.

"get the average (we can recycle a)"
avg2 := (a * w2) at: 1 ] bench.

This gives:
 '16,500 per second. 60.7 microseconds per run.'
versus:
[w2 sum / w2 size] bench.
 '1,100 per second. 908 microseconds per run.'

For max and min, it's harder. Lapack/Blas only provides max of absolute value as a primitive:
[w2 absMax] bench.
 '19,400 per second. 51.5 microseconds per run.'

Everything else will be slower, unless we write new primitives in C and connect them...
[w2 maxOf: [:each | each]] bench.
 '984 per second. 1.02 milliseconds per run.'
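
The same left-multiplier trick should extend to the weighted averages mentioned earlier: put the weights into the 1 x n row instead of 1/n everywhere. A sketch, assuming at:put: with linear indexing works on LapackDGEMatrix the way at: does above (not verified here):

"linearly-weighted average of the 10000-element slice w2;
the weights 2 * i / (n * (n + 1)) sum to 1"
n := 10000.
wts := LapackDGEMatrix nrow: 1 ncol: n withAll: 0.0.
1 to: n do: [ :i | wts at: i put: 2.0 * i / (n * (n + 1)) ].
wavg := (wts * w2) at: 1.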

On Sun, May 19, 2019 at 2:58 PM, Jimmie <[hidden email]> wrote:
On 5/16/19 1:26 PM, Nicolas Cellier wrote:
 > Any feedback on this?
 > Did someone try to use Smallapack in Pharo?
 > Jimmie?
 >

I am going to guess that you are not on pharo-users. My bad.
I posted this in pharo-users as it wasn't a Pharo development question.

I probably should have posted here or emailed you directly.

All I really need is good performance with a simple array of floats. No
matrix math. Nothing complicated. Moving Averages over a slice of the
array. A variety of different averages, weighted, etc. Max/min of the
array. But just a single simple array.

Any help greatly appreciated.

Thanks.


On 4/28/19 8:32 PM, Jimmie Houchin wrote:
Hello,

I have installed Smallapack into Pharo 7.0.3. Thanks, Nicolas.

I am very unsure of my use of Smallapack. I am not a mathematician or
scientist. The only part of Smallapack I am trying to use at the moment
is something 64bit comparable to FloatArray, so that I can do some
simple accessing, slicing, sum, and average operations on the array.

Here is some sample code I wrote just to play in a playground.

I have ExternalDoubleArray, LapackDGEMatrix, and FloatArray samples.
The ones not in use are commented out for any run.

fp is a download from
http://ratedata.gaincapital.com/2018/12%20December/EUR_USD_Week1.zip
and unzipped to a directory.

fp := '/home/jimmie/data/EUR_USD_Week1.csv'.
index := 0.
pricesSum := 0.
asum := 0.
ttr := [
     lines := fp asFileReference contents lines allButFirst. "skip the header row"
     a := ExternalDoubleArray new: lines size.
     "la := LapackDGEMatrix allocateNrow: lines size ncol: 1.
     a := la columnAt: 1."
     "a := FloatArray new: lines size."
     lines do: [ :line || parts price |
         parts := ',' split: line.
         index := index + 1.
         price := Float readFrom: (parts last).
         a at: index put: price.
         pricesSum := pricesSum + price.
         "re-sum the whole array every 100 rows"
         (index rem: 100) = 0 ifTrue: [
             asum := a sum ]]] timeToRun.
{ index. pricesSum. asum. ttr }.
  "ExternalDoubleArray an Array(337588 383662.5627699992
383562.2956199993 0:00:01:59.885)"
  "FloatArray  an Array(337588 383662.5627699992 383562.2954441309
0:00:00:06.555)"

FloatArray is not the precision I need. But it is over 18x faster.
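
The rounding is easy to see directly: FloatArray stores IEEE single-precision values, so every element is rounded to the nearest 32-bit float on the way in. For example:

| fa |
fa := FloatArray new: 1.
fa at: 1 put: 0.1.
fa at: 1 "=> 0.10000000149011612, i.e. 0.1 rounded to 32-bit precision"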

I am afraid I must be doing something badly wrong. Python/Numpy is over
4x faster than FloatArray for the above.

If I am using Smallapack incorrectly please help.

Any help greatly appreciated.

Thanks.




--
Serge Stinckwich

Int. Research Unit on Modelling/Simulation of Complex Systems (UMMISCO)
Sorbonne University (SU)
French National Research Institute for Sustainable Development (IRD)
University of Yaoundé I, Cameroon
"Programs must be written for people to read, and only incidentally for machines to execute."
https://twitter.com/SergeStinckwich
Reply | Threaded
Open this post in threaded view
|

Re: FloatArray

Nicolas Cellier
I have updated Smallapack to version 1.6.1 so as to accelerate sum.

| a b c |
a := LapackSGEMatrix randNormal: #(10000 1).
b := a as: FloatArray.
c := a asAbstractMatrix.
{a sum. b sum. c sum.}.
{[a sum] bench. [b sum] bench. [c sum] bench.}.
  '27,500 per second. 36.3 microseconds per run.'  "LapackMatrix"
  '117,000 per second. 8.54 microseconds per run.' "FloatArray"
  '1,180 per second. 845 microseconds per run.'  "Un-accelerated AbstractMatrix"

I measured with single precision for everything so as to have fair comparisons.
As you can see, LapackMatrix sum is still slower than FloatArray sum.
This is because we have to create a matrix of ones first (xLASET) before calling the BLAS dot-product primitive (xDOTU).

In Squeak, profiling can be obtained through (I guess it is not much different in Pharo):
    AndreasSystemProfiler spyOn: [[a sum] bench].

which gives:
**Leaves**
37.19 (1,865)  LapackSLibrary xlasetWithuplo:m:n:alpha:beta:a:lda:length:
27.12 (1,360)  BlasSLibrary dotF2CWithn:x:incx:y:incy:
9.31 (467)  Behavior basicNew:
2.42 (121)  ByteString class compare:with:collated:
1.43 (72)  ByteString hashWithInitialHash:

Note that this is un-accelerated BLAS (no Intel Math Kernel Library or other accelerated version), but that does not make much difference for these BLAS level-1 functions (those with cost O(N)).

But we can avoid that cost and compute all the cumulative sums, or some of them, at once with a single matrix operation (xTRMM / xGEMM):

| a b c |
a := LapackSGEMatrix randUniform: #(10000 1).
b := LapackSGEMatrix nrow: 10000 ncol: 10000 withAll: 1. "huge matrix, that's not cheap, don't even bench it!"
"cumsum"
c := b lowerTriangle * a.

| a b c |
a := LapackSGEMatrix randUniform: #(10000 1).
b := LapackSGEMatrix nrow: 10000 ncol: 10000 withAll: 1.  "huge matrix, that's not cheap, don't even bench it!"
"cumsum 1 out of 100"
c := (b lowerTriangle atRows: (100 to: 10000 by: 100)) * a.

| a b c |
a := LapackSGEMatrix randUniform: #(10000 1).
"cheaper construction of partial cum sum"
[b := LapackSGEMatrix rows: ((100 to: a nrow by: 100) collect: [:n |
    (LapackSGEMatrix nrow: 1 ncol: n withAll: 1.0) , (LapackSGEMatrix nrow: 1 ncol: a nrow - n)]).
c := b * a] bench.

 '3.5 per second. 286 milliseconds per run.'

So as you can see, with some moderate effort we have computed 100 partial cumulative sums in about 286 ms; that's 2.86 ms per cumsum on average, not too bad.
Maybe not as straightforward as numpy, and maybe still not as fast, but not hopelessly behind.
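
And since the original need was moving averages: once cumulative sums are cheap, a k-period moving average is just the difference of two shifted cumsum slices divided by k. A sketch reusing a and b from above, assuming LapackMatrix supports element-wise - and scalar / like ordinary Smalltalk arithmetic (not verified here):

k := 100.
cs := b lowerTriangle * a. "full cumulative sums, as above"
ma := ((cs atIntervalFrom: k + 1 to: cs nrow by: 1)
    - (cs atIntervalFrom: 1 to: cs nrow - k by: 1)) / k.
"(ma at: i) is then the average of elements i+1 .. i+k of a"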


