(newbie) String / Array / ArrayedCollection

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

(newbie) String / Array / ArrayedCollection

Nicolas Cellier-3
Hello all,
I bounce on your example because i feel more concerned about arithmetic.

Primary interest of implementing a #sum #product #min #max #cumulativeSum
etc.. method in collection of numbers is the possibility to optimize some of
these messages at the VM level. That's getting important when you have
millions of numbers to crunch...

Suppose now you want the sum of squares:

 (aCollection collect: [:element | element squared]) sum.

Of course better code it in a single loop:

 aCollection inject: 0 into: [:sum :element | sum + element squared].

It does also work with fold: i presume.
But i personnally prefer the more legible:

 aCollection sum: [:element | element squared]

or maybe:

 aCollection sumOf: [:each | each squared]

You can also have the productOf: minOf: maxOf: cumulativeSumOf:
cumulativeProductOf: and get something powerfull...

If and only if you have optimized primitives, you'd better operate on
collections and have no loop at all in Smallatlk (at the expense of a memory
allocation and two loops in C code however), this is exactly how matlab
works:

 (aCollection*aCollection) sum

In my opinion, messages without a block argument are necessary for
optimization purposes in a matlab like environment.
Messages with block arguments, though not strictly necessary knowing fold: and
inject:into:, would ease implementation and code understanding of matrices
and polynomials for example. To me they seem obvious extensions.

I already proposed all that to VisualWorks NumericalCollections package.
(Cincom Public Store).

I also proposed the #count: extension.
 aCollection count: [:each | each > 1]
does answer the number of elements satisfying a condition (I feel
countSatisfy: is a little heavy). That is usefull more than rarely i think.

What is your opinion ?


Reply | Threaded
Open this post in threaded view
|

Re: (newbie) String / Array / ArrayedCollection

David T. Lewis
On Sun, Feb 05, 2006 at 04:57:04AM +0100, nicolas cellier wrote:

>
> If and only if you have optimized primitives, you'd better operate on
> collections and have no loop at all in Smallatlk (at the expense of a memory
> allocation and two loops in C code however), this is exactly how matlab
> works:
>
>  (aCollection*aCollection) sum
>
> In my opinion, messages without a block argument are necessary for
> optimization purposes in a matlab like environment.
> Messages with block arguments, though not strictly necessary knowing fold: and
> inject:into:, would ease implementation and code understanding of matrices
> and polynomials for example. To me they seem obvious extensions.
>
> I already proposed all that to VisualWorks NumericalCollections package.
> (Cincom Public Store).
>
> I also proposed the #count: extension.
>  aCollection count: [:each | each > 1]
> does answer the number of elements satisfying a condition (I feel
> countSatisfy: is a little heavy). That is usefull more than rarely i think.
>
> What is your opinion ?

It goes without saying that you should measure before you go too
far with this. In Squeak, you can use:
  "TimeProfileBrowser onBlock: [somethingToProfile]"

If you decide that primitives would help, you can implement them
as pluggable primitives in Squeak. Load VMMaker from SqueakMap to
get started.

Give it a try. If your primitives turn out to be useful, you can
post your plugin on SqueakMap so others can make use of them. If
they are useful to lots of folks, the various VM builders may
want to start distributing them.

If you do write primitives for numerical methods, you need to give
some thought to making them work on both 64 bit and 32 bit Squeak
platforms.  Almost all Squeakers are using 32 bit Squeak right
now, but a numerical plugin that did not handle 64 bits gracefully
would not be appreciated by the VM builders. The page is a bit
out of date, but you can see what's involved by reading this:
 <http://squeakvm.org/squeak64/>

Dave
 

Reply | Threaded
Open this post in threaded view
|

Re: (newbie) String / Array / ArrayedCollection

Nicolas Cellier-3

> It goes without saying that you should measure before you go too
> far with this. In Squeak, you can use:
>   "TimeProfileBrowser onBlock: [somethingToProfile]"
>
> If you decide that primitives would help, you can implement them
> as pluggable primitives in Squeak. Load VMMaker from SqueakMap to
> get started.
>
> Give it a try. If your primitives turn out to be useful, you can
> post your plugin on SqueakMap so others can make use of them. If
> they are useful to lots of folks, the various VM builders may
> want to start distributing them.
>
> If you do write primitives for numerical methods, you need to give
> some thought to making them work on both 64 bit and 32 bit Squeak
> platforms.  Almost all Squeakers are using 32 bit Squeak right
> now, but a numerical plugin that did not handle 64 bits gracefully
> would not be appreciated by the VM builders. The page is a bit
> out of date, but you can see what's involved by reading this:
>  <http://squeakvm.org/squeak64/>
>
> Dave

Thanks for advice. I do not want to write this primitive by now.
I will use/implement the messages #sum #sumOf: etc... because they are usefull
to me, even not optimized.
I am just asking if they are good candidates to be in the base image or not.
The fact that some could be optimized is just an argument, not an intention.


Reply | Threaded
Open this post in threaded view
|

Re: (newbie) String / Array / ArrayedCollection

stéphane ducasse-2
I have the impression that they fit naturally in a mathematical  
package/libraries.

Stef

>
> Thanks for advice. I do not want to write this primitive by now.
> I will use/implement the messages #sum #sumOf: etc... because they  
> are usefull
> to me, even not optimized.
> I am just asking if they are good candidates to be in the base  
> image or not.
> The fact that some could be optimized is just an argument, not an  
> intention.
>
>


Reply | Threaded
Open this post in threaded view
|

Re: (newbie) String / Array / ArrayedCollection

Nicolas Cellier-3
In reply to this post by Nicolas Cellier-3

> I have the impression that they fit naturally in a mathematical
> package/libraries.
>
> Stef

You are right, in VW this is an extension that i proposed to integrate in the
NumericCollections package.

But the core of VW-NumericCollections is in the Squeak base image (you can do
arithmetics on collections like adding a number to it...). So this is why I
was proposing you integrate it.

cumulativeProduct and cumulativeSum really sound mathematical package
specific.

But in my opinion, the others (sumOf: maxOf: count: ...) are some very
 general purpose messages.
You can find such in matlab, F95, ...
They might be used anywhere you have math in squeak (squeak has a lot of math
especially but not only a lot of geometry).

I browsed the senders of #inject:into: just to check how it was used:
- it is used for concatenating strings or other collections with variants #,
#nextPutAll: #addAll: (the original posting on this thread)
- it is used in place of anySatisy: (inject:false into: ...need refactoring
 ?) - it is used to sum things (my proposition sumOf:)
- it is used to get max or min (my proposition maxOf:)
- it is used to count things (my proposition count:)
- in very rare case it does some other thing (packing string...)

Example:
DependentsArray>>size
 ^self inject: 0 into: [ :count :dep | dep ifNotNil: [ count _ count + 1 ]]

would be
 ^self count: [:each | each notNil]

I take a more academic example, because i do not really understand this one
above (overwriting a block temporary ? and what ifNil ? either it is broken
or i will learn something...)

addOtherMorphsTo: aMorph
 ...
 variableCount _ heights
      inject: 0
      into: [:count :ea | ea = 0 ifTrue: [count + 1] ifFalse: [count]].

would be:
 variableCount _ heights count: [:each | each = 0].

Which code is clearer, which one do you prefer ?
Today, inject:into: is obvious to me, but in my beginning Smalltalk days i
 can remember it was not. And i still prefer the second form for clarity. It
 is much more expressive.

Give a try to these messages and see how they spread into code, and how well
you would refactor those inject:into: and surely a lot of open-coded do
loops...

If you think they are useless simply because inject:into: exist,
then Collections are plenty of messages that are useless:
 reject: is select: [:e | (...) not]
 anySatisy: is contains:
 allSatisfy: is (contains: [:e | (..) not]) not
(that is really how i would have programmed things before implementing my
first proto #all:).

However, if you prefer a separate package, maybe you are right, you are
responsible for managing the growth of the base image.
But in this case, i suggest you might better remove Complex or other things
that does not seem to be used at all (see senders of #i and class refs of
Complex).

Sorry for being so long, but some decisions must be argued

-------------------------------------------------------


Reply | Threaded
Open this post in threaded view
|

Re: (newbie) String / Array / ArrayedCollection

Nicolas Cellier-3
Still about the sumOf: maxOf: etc... extensions,

Travis Griggs suggested on vwnc list that these where not absolutely necessary
if you have a class capable of lazy evaluation on collections:

This message is expensive on large collections, because two loops:
 (aCollection collect: [:each | each squared]) sum

This message is less expensive (one loop):
 (aCollection sumOf: [:each | each squared])

This one is not expensive and more general, because it can handle sumOf:
maxOf: etc...
 (aCollection compute: [:each | each squared]) sum

compute: does evaluate nothing, it just create a proxy with aCollection and
block as inst var.

Evaluation is deferred untill the proxy is accessed with:
at: index
 ^function value: (input at: anIndex)
superclass max will call at: index, that's all that simple.

I think i saw such a class in squeak once, so no use to do it again.
Anybody knows where i can find it ?