Polycephaly review

Holger Kleinsorgen-4
Hello,

I've been trying to use Polycephaly for a specific task, and encountered
some technical and conceptual problems I'd like to share. The task was
to balance/distribute a fulltext search function, which uses Lucene + JNI.

Technical problems, reproducible in a 7.8 image:

- Polycephaly uses Standard I/O streams. All of our headless images
write to stdout/stderr, too, which caused BOSS marshaling errors. So I
created subclasses of VirtualMachine/Drone/Pipe which use a TCP
connection (published in the public repository as
Polycephaly-NetworkVirtualMachine).

- Evaluating any Polycephaly code in the Workspace that involves blocks
doesn't work (fails to properly write/read the namespace
WorkspaceVariablePool).
   Example:
     Polycephaly.VirtualMachine new do: [ 1 + 2 ]

- Evaluating any Polycephaly code and then saving the image results in
an image that cannot be opened anymore (error message on Windows 7 64
bit: vwnt.exe has stopped working).

Conceptual problems:

- The image that is running is launched when starting a VirtualMachine.
However, this is not desirable in a more complex application: The image
will have some startup code, which has to be bypassed. Secondly, the
image might be quite large. This wastes a lot of RAM when running a lot
of instances. On the other hand, creating a dedicated drone image is not
trivial either. And if there is a need to deploy a 64-bit image, four
images would be needed: master/drone 32/64 bit.

- Having to marshal everything limits the type of tasks that can be
solved with Polycephaly.

- The class VirtualMachines might be useful in some cases, but currently
we always want to balance load, not to replicate load.

And finally a totally different problem: The name is too difficult to
spell, pronounce and remember ;)
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: Polycephaly review

Runar Jordahl
> - The image that is running is launched when starting a VirtualMachine.
> However, this is not desirable in a more complex application: The image
> will have some startup code, which has to be bypassed.

In fact, the image saved on disk is used when you start a new image.

If your application has a UI, you can simply use HeadlessImage
isHeadless ifFalse: ["startup code"] to skip executing this code.


> (...) the image might be quite large. This wastes a lot of RAM when running a lot
> of instances. On the other side, creating a dedicated drone image is not
> trivial, too. And if there is the need to deploy a 64 bit image, four
> images would be needed: master/drone 32/64 bit.

You can of course create special images for Polycephaly, but my guess
is that in most situations there is no need. Let's take an example
where your (runtime) image is 50 MB and you run on typical hardware
having 4 cores. Maybe you want 8 drone images working (to saturate the
CPU cores). In this example you end up with 8 x 50 MB = 400 MB of "wasted"
RAM. Given that such a PC will typically be set up with 4000 MB or 8000 MB
of RAM, I think the use of 5% to 10% of RAM by Polycephaly is tolerable. Of
course your situation might be different... I am just discussing the typical
setup. Remember that as the number of cores grows, so will RAM.


> - Having to marshal everything limits the type of tasks that can be
> solved with Polycephaly.

Cincom chose to use "message passing", not "shared memory", as the
concurrency model. I think this is a wise choice.


> - the class VirtualMachines might be useful in some cases, but currently
> we always want to balance load, not to replicate load.

I do not follow... At least you can use VirtualMachines to spread load
evenly across several images. Look at method #doActions: in this post:
http://blog.epigent.com/2011/03/use-of-visualworks-polycephaly-at.html


Runar

Re: Polycephaly review

Holger Kleinsorgen-4
On 30.05.2011 14:09, Runar Jordahl wrote:
>  > - The image that is running is launched when starting a VirtualMachine.
>  > However, this is not desirable in a more complex application: The image
>  > will have some startup code, which has to be bypassed.
>
> In fact, the image saved on disk is used when you start a new image.
>
> If your application has UI, you can simply use HeadlessImage
> isHeadless ifFalse: ["startup code"] to skip executing this code .

The images are already headless, but obviously have some startup code to
run the application. And some parts of the startup code must be run in
the drone image too (e.g. connecting to the database). Yes, this can be
restructured, and I wouldn't regard this as the most urgent problem to
solve. But it affects the benefit-to-effort ratio.

>  > - the class VirtualMachines might be useful in some cases, but currently
>  > we always want to balance load, not to replicate load.
>
> I do not follow... At least you can use VirtualMachines to spread load
> evenly across several images. Look at method #doActions: in this post:
> http://blog.epigent.com/2011/03/use-of-visualworks-polycephaly-at.html

VirtualMachines performs each action on each drone machine. I don't know
of any use case for this, at least in our shop.

Your #doActions: balances load by distributing the actions. This is the
stuff we're looking for.
Java and Python provide frameworks for this
(http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/package-summary.html).


It would be convenient if the class Promise were integrated with
Polycephaly, e.g.
   virtualMachines promise: [ 1 + 2]
which would select the next free VM and return a Promise whose value is
computed by the VM.
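
To make the requested semantics concrete, here is a minimal sketch of "hand the task to the next free worker and get back a placeholder for the result". It is not Polycephaly code: it is written in C++, plain threads stand in for drone images, and the names WorkerPool and promise are hypothetical. Unlike VirtualMachines, each task runs on exactly one worker, so load is balanced rather than replicated.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a set of drones: each worker thread plays the
// role of one drone image and runs whichever task it dequeues next.
class WorkerPool {
public:
    explicit WorkerPool(unsigned workerCount) {
        for (unsigned i = 0; i < workerCount; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Lets the workers finish all queued tasks, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            stopping_ = true;
        }
        wakeUp_.notify_all();
        for (std::thread& worker : workers_) {
            worker.join();
        }
    }

    WorkerPool(const WorkerPool&) = delete;
    WorkerPool& operator=(const WorkerPool&) = delete;

    // Queues one task for the next free worker and returns a future that
    // will hold the task's result (or the exception it raised).
    std::future<int> promise(std::function<int()> task) {
        std::packaged_task<int()> packaged(std::move(task));
        std::future<int> result = packaged.get_future();
        {
            std::lock_guard<std::mutex> guard(lock_);
            pending_.push(std::move(packaged));
        }
        wakeUp_.notify_one();
        return result;
    }

private:
    // Worker loop: wait for a task, run it outside the lock, repeat.
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> guard(lock_);
                wakeUp_.wait(guard, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) {
                    return;  // stopping and nothing left to do
                }
                task = std::move(pending_.front());
                pending_.pop();
            }
            task();  // the result or exception is stored in the future
        }
    }

    std::mutex lock_;
    std::condition_variable wakeUp_;
    std::queue<std::packaged_task<int()>> pending_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

With such a pool, `pool.promise([] { return 1 + 2; })` returns immediately, and calling `get()` on the returned future blocks until some worker has computed the value. A real drone-based version would additionally have to marshal the task and its result between images.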

Re: Polycephaly review

Holger Kleinsorgen-4
In reply to this post by Holger Kleinsorgen-4
> Technical problems, reproducible in a 7.8 image:
>

Another bug (or odd feature):

   Polycephaly.VirtualMachine>>timeout:do:

evaluates the block in the master image, not in the drone image. I don't
think that this is intended ;)

Re: Polycephaly review

Michael Lucas-Smith-2
In reply to this post by Holger Kleinsorgen-4
Thanks for the feedback.

Some of this is already on my todo list for Poly. Some was not, so it has been added :)

Cheers,
Michael


Re: Polycephaly review

Holger Kleinsorgen-4
In reply to this post by Holger Kleinsorgen-4
On 30.05.2011 17:26, Holger Kleinsorgen wrote:
>  > Technical problems, reproducible in a 7.8 image:
>  >

A problem related to raising exceptions:

Marshaling of exceptions raised in the drone image can fail, because the
error signal instance contains references to the context (originator
etc.), which might not be marshalable.

e.g.

- define class Foobar with instVar 'stream'
- add

   Foobar class>>new
      ^super new initialize

   Foobar>>initialize
      stream := 'temp.txt' asFilename writeStream.

   Foobar>>compute
      self error: 'Cannot compute'.


- evaluate

   Polycephaly.VirtualMachine new do: [ Foobar new compute]

Re: Polycephaly review

andre
In reply to this post by Holger Kleinsorgen-4

On 30.05.2011, at 13:39, Holger Kleinsorgen wrote:

> Conceptual problems: [...]

For me the conceptual problem lies in the drone image solution itself,
because it involves duplicating the entire process. In a way, this
philosophy reminds me of the ancient days when interprocess
communication was implemented via files. Considering the 8- and 12-core
desktops available nowadays, it seems totally backwards.

Polycephaly might be suitable for server-type applications that
involve a small number of CPU-expensive "worker" tasks that run for at
least a couple of seconds. However, it is completely inappropriate for
parallel computing where many subjects (threads) work on the same
objects.

For example, I have a requirement where multiple threads are supposed
to process small chunks of data (in the 10-20 millisecond range) and
feed them into a single output. Seeing only 1 core at 100% all the
time while the other 7 were sitting idle was frustrating, not to mention
that I was unable to find any excuse to explain this to my customers.

The outcome was that I had to port really large portions of my product to
C++. Over time, more and more parts moved down to a large cross-platform
library that is still growing. I would rather use VisualWorks for the
entire product, because C++ can be a pain.

A concurrent virtual machine is not trivial, that's for sure, but it  
can be done.

Andre


Re: Polycephaly review

Eliot Miranda-2


On Tue, May 31, 2011 at 5:53 AM, andre <[hidden email]> wrote:

> [...]
> A concurrent virtual machine is not trivial, that's for sure, but it
> can be done.

That's the least of it.  Implementing a thread-safe class library is a larger and as yet less defined topic.

Eliot
 


Re: Polycephaly review

andre
On 31.05.2011, at 19:08, Eliot Miranda wrote:

>> A concurrent virtual machine is not trivial, that's for sure, but it
>> can be done.
>
> That's the least of it. Implementing a thread-safe class library is a larger and as yet less defined topic.


Yes, without multiple inheritance, that's not as easy as with C++. I found that for the most part thread safety just means clever mixins of lock/sync behaviors. In C++ with its strict constructor/destructor nesting, it is often simply a matter of adding an extra parent class.

Also there is the cool concept of scoped objects:

void myFunction ()
{
   const ScopedLock l (myLock);   // lock is acquired here
   ...
   // do whatever, including early exits
}

The declaration alone manages the locking: the lock is obtained when the temporary is created and is released automatically when it is destructed, i.e. when the lexical scope is left, no matter how. Very robust and maintenance-friendly.
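
For readers who want to try the idiom, here is a self-contained sketch. The ScopedLock class is a minimal stand-in written for this example (the class used in the snippet above is not shown in this thread, so its real interface is an assumption); the standard library's std::lock_guard does the same job. The names addToTotal and lockIsFree are hypothetical helpers.

```cpp
#include <mutex>
#include <stdexcept>

// Minimal stand-in for the ScopedLock used above (an assumption; the
// original class is not shown). std::lock_guard behaves the same way.
class ScopedLock {
public:
    explicit ScopedLock(std::mutex& m) : mutex_(m) { mutex_.lock(); }
    ~ScopedLock() { mutex_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    std::mutex& mutex_;
};

std::mutex myLock;
int total = 0;

// Adds delta to total under the lock and returns the new total.
int addToTotal(int delta) {
    const ScopedLock l(myLock);   // lock acquired here
    if (delta < 0) {
        // exit by exception
        throw std::invalid_argument("delta must be non-negative");
    }
    if (delta == 0) {
        return total;             // early return
    }
    total += delta;
    return total;                 // normal return
}                                 // on every exit path ~ScopedLock unlocks myLock

// Reports whether myLock is currently free.
// Must not be called while the calling thread itself holds myLock.
bool lockIsFree() {
    if (myLock.try_lock()) {
        myLock.unlock();
        return true;
    }
    return false;
}
```

Whichever way addToTotal is left (normal return, early return, or exception), lockIsFree() reports true afterwards, which is exactly the guarantee described above.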

I would love to have this in Smalltalk too: temporaries with a lexical scope that receive a special destruction message when the scope is left:

doWhateverWith: anArgument
   | other temps |
   @scopedTemp := ScopedLock newOn: self semaphore.
   ...
   ^result

The compiler would insert a #destruct message send to the @temp everywhere the method is exited. ScopedLock class>>newOn: acquires the lock. ScopedLock>>destruct releases the lock. Developing thread-safe code with this is a lot easier. There are many other applications for scope-aware temporaries that require proper cleanup.

Andre





