Trying to understand forwarding proxies

NorbertHartl
After 4 years of abstinence from Magma I'm trying to get back in touch to evaluate some new ideas. I was thinking about how to use Magma in a cloud environment like Amazon's, so I'm interested in possible scaling scenarios.

If I understand it correctly then

- magma uses a directory to write its files. That could be called the repository
- one repository is served by one server at any time
- a special mode is possible where the client and the server reside in the same image (thus having only the need for a single image)
- HA splits one node over certain locations. A node is an arbitrary amount of servers serving a single shared repository
- forwarding proxies can be made from server to server. So these are cross-domain/cross-repository

Talking to magma is done using a session. So I can do the following.

- talk to a HA node that will read from an arbitrary server but will commit to a single one
- mimicking a domain model by using forwarding proxies. So objects partitioned over multiple repositories appear to be in the same domain
- using multiple sessions to read from multiple servers. That would be the case of domain model partitioning

Up to here, I would like to know what the tradeoff is in using forwarding proxies. Is the whole communication done via the proxy on the first machine, or is another session created to which the client has direct access? Or, to be more precise: if I had a forwarding proxy to a collection in another repository that holds objects from a third repository, and I did a detect: on that collection, which server am I talking to when invoking a method on the detected object?

As far as I remember, sessions are not the cheapest thing to establish. Do I remember that correctly, and has this changed? The same question goes for starting a repository: is this a quick operation, or are there a lot of preparation steps that make startup rather slow?

I hope these are not too many dumb questions. I'm just thinking about what would be worth trying out. With Amazon you can have multiple machines attached to shared block storage, which can share all the repositories across an arbitrary number of machines. But at the moment I can't see how to get a lot of benefit from that particular feature with Magma. If forwarding proxies are not too expensive, that would still enable some things.

thanks in advance,

Norbert
_______________________________________________
Magma mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/magma

Re: Trying to understand forwarding proxies

Chris Muller-3
Hi Norbert, thanks for the _great_ note and questions.  I'm really
glad to see someone finally looking into ForwardingProxies.

On Wed, Feb 2, 2011 at 1:34 PM, Norbert Hartl <[hidden email]> wrote:
> After 4 years of abstinence from Magma I try to get back in touch to evaluate some new ideas. I was thinking how to utilize magma in a cloud environment like the one from amazon. So I'm interested in possible scaling scenarios.

Cool!  I've really wanted to look at Amazon EC2 to see if Magma could
run there, just haven't had time to do it.

Please use the latest 1.2alpha.  I am on the cusp of publishing Magma
1.2 right after Squeak 4.2 is all done.

> If I understand it correctly then
>
> - magma uses a directory to write its files. That could be called the repository

Yes.

> - one repository is served by one server at any time

One repository is served by one or more servers at a time.

> - a special mode is possible where the client and the server reside in the same image (thus having only the need for a single image)

Yes, this is called "local" mode and it saves the need to serialize
requests and materialize responses, so this configuration offers the
best single-session performance.

The server's duty is relatively light compared to the client's duty, so
running in local mode is not necessarily the best for a
web-application because it can't scale.  Multiple web-sessions
contending for one MagmaSession could easily be worse than multiple
MagmaSessions contending for one server.

> - HA splits one node over certain locations. A node is an arbitrary amount of servers serving a single shared repository

Yes.  With HA, multiple copies of the one single repository are each
hosted by independently running server images.  In this mode, clients
make two connections, one to the "primary" for commits, and one to one
of the secondaries, for reads.

When the primary receives a commit, it is immediately broadcast to all
secondaries, so the persistent model is redundantly safe.
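
To make the read/commit split concrete, here is a minimal in-memory sketch in Python (not Smalltalk, and not Magma's API). The names `ReplicaServer` and `HaNode` and all of their methods are hypothetical. The only behaviour taken from the description above is that commits go to the primary, are broadcast at once to every secondary, and reads are served by a secondary.

```python
class ReplicaServer:
    """One server image hosting one physical copy of the repository (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}       # oid -> value, stands in for the repository files
        self.commit_number = 0

    def apply(self, changes):
        """Apply one commit to this physical copy."""
        self.objects.update(changes)
        self.commit_number += 1


class HaNode:
    """One logical repository served by a primary and several secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)

    def commit(self, changes):
        # Commits are accepted by the primary only ...
        self.primary.apply(changes)
        # ... and are broadcast immediately to every secondary.
        for secondary in self.secondaries:
            secondary.apply(changes)
        return self.primary.commit_number

    def read(self, oid, reader_index=0):
        # Reads are served by one of the secondaries.
        return self.secondaries[reader_index].objects[oid]


node = HaNode(ReplicaServer("primary"),
              [ReplicaServer("secondary-1"), ReplicaServer("secondary-2")])
node.commit({1: "portfolio"})
```

After the commit, every copy is at the same commit number, so a read from either secondary sees the committed value.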

> - forwarding proxies can be made from server to server. So these are cross-domain/cross-repository

Yes, but I would prefer to say, "from repository to repository".  A
MagmaForwardingProxy is just a "bookmark", or a "soft-link" to another
object in another repository.  It only persists the 'location' and
'oid' of the remote object.  It implements #doesNotUnderstand: to look
it up and cache it the first time so that subsequent access is fast.
However, unlike MagmaMutatingProxies, the ForwardingProxy will never
become: the remote object.  It will always forward through
#doesNotUnderstand:, so if you are sending to a FP in an inner loop,
send #realObject to the FP to get the cached object for a fast send.
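
As an illustration only, the forwarding behaviour can be modelled in Python, where `__getattr__` plays roughly the role of #doesNotUnderstand:. This is a sketch under assumptions, not Magma's implementation: the class `ForwardingProxy`, the `resolve` callback, the `Customer` class and the location string are all hypothetical stand-ins.

```python
class ForwardingProxy:
    """Soft link that stores only a location and an oid (hypothetical model)."""

    def __init__(self, location, oid, resolve):
        self._location = location
        self._oid = oid
        self._resolve = resolve   # assumed callback: (location, oid) -> real object
        self._real = None         # cache, filled on first use

    def real_object(self):
        """Look the remote object up once, then return the cached object."""
        if self._real is None:
            self._real = self._resolve(self._location, self._oid)
        return self._real

    def __getattr__(self, name):
        # Only called for names the proxy itself does not define, which is
        # the closest Python analogue of #doesNotUnderstand:.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.real_object(), name)


class Customer:
    def __init__(self, name):
        self.name = name

    def label(self):
        return "customer " + self.name


lookups = []
remote_objects = {("other-repository", 42): Customer("Ann")}

def resolve(location, oid):
    lookups.append((location, oid))      # record each remote lookup
    return remote_objects[(location, oid)]

proxy = ForwardingProxy("other-repository", 42, resolve)
first = proxy.label()        # forwarded, triggers the one remote lookup
second = proxy.label()       # forwarded again, served from the cache
fast = proxy.real_object()   # direct reference for use in an inner loop
```

Every send to `proxy` still goes through the forwarding hook, and the proxy never turns into a `Customer`. Holding on to `fast` skips that indirection.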

MagmaForwardingProxies are not intended to be transparent to the Magma
developer, they must be used deliberately.  MagmaMutatingProxies are
supposed to be transparent, but there are cases where a #yourself is
needed (e.g., to avoid the proxy being sent as an argument to a
primitive).

The thing to be very aware of about using FP's is that it does tie the
two repositories together.  The app needs them both running to work.
But the separation has performance and organizational advantages.

> Talking to magma is done using a session. So I can do the following.
>
> - talk to a HA node that will read from an arbitrary server but will commit to a single one

Yes.

> - mimicking a domain model by using forwarding proxies. So objects partitioned over multiple repositories appear to be in the same domain

Just to be clear, ForwardingProxies have no relation to HA.  HA is for
replicating one logical repository.  FP's are for linking one logical
repository (which could be hosted HA) to another logical repository
(which could also be hosted HA).  This is the configuration I run for
my own internal app; two repositories, each one HA, so 4 servers total
(but just two physical machines).

> - using multiple sessions to read from multiple servers. That would be the case of domain model partitioning
>
> Up to here I would like to know what is the tradeoff in using forwarding proxies. Is the whole communication done via the proxy on the first machine or is another session created to which the client has direct access? Or to be more precise: If I would have a forwarding proxy to a collection in another repository that would hold objects from a third repository and I would detect: an object from that collection to which server am I talking when invoking a method on that detected object?

A FP only points to one object in one repository.  The FP, itself, is
a persistent object residing in one repository.  In general, the only
requirement for the client app to do this is, wherever it SETS the
object that you want remotely-linked, just send
#asMagmaForwardingProxy to it.

When that is committed, the object it refers to MUST be already
persistent in its own repository so an appropriate location of that
remote object can be determined and persisted with the FP.

Later, another session comes along and, pretending that the FP _is_
the remote object, sends it a message the proxy itself does not
understand.  Trace the code starting at
MagmaForwardingProxy>>#realObject to see that it checks whether a
session to that location is already present and, if so, uses that one.
Otherwise, a new session is established.
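
The "reuse a session if one exists, otherwise connect" step can be sketched as a small registry keyed by location. This is Python for illustration, with hypothetical names (`Session`, `SessionRegistry`, `session_for`). The real lookup lives in MagmaForwardingProxy>>#realObject and may well differ in detail.

```python
class Session:
    """Stand-in for a connected session to one repository location."""

    def __init__(self, location):
        self.location = location


class SessionRegistry:
    """Hands out at most one session per location (hypothetical helper)."""

    def __init__(self, connect=Session):
        self._connect = connect     # assumed factory: location -> session
        self._sessions = {}
        self.connections_made = 0

    def session_for(self, location):
        session = self._sessions.get(location)
        if session is None:
            # No session to that location yet, so establish a new one.
            session = self._connect(location)
            self._sessions[location] = session
            self.connections_made += 1
        return session


registry = SessionRegistry()
a = registry.session_for("host-a:51001")
b = registry.session_for("host-a:51001")   # reused, no new connection
c = registry.session_for("host-b:51001")   # different location, new session
```

The cost of establishing a session is then paid once per remote location rather than once per proxy.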

> As far as I remember sessions are not the most performant thing to establish. Do I remember that correct and has this changed? Same question goes for the start of a repository. Is this a quick operation or are there a lot of preparation steps that make startup rather slow?

One nice thing about MagmaSessions is that they're persistent with the
image.  Save the image with GUI screens showing persistent objects.
Sessions are connected - even with open transactions - and, when the
image is later restarted, the sessions are reconnected transparently,
the persistent view updated, and that transaction can even then be
committed.

Note that the repositories may continue to have been heavily updated
by other sessions while these sessions were offline.  When the image
restarts, any or all of the thousands of persistent objects in the
image, some being shown on the GUI's, COULD have been updated while
the image was hibernating.

Magma tries to be smart about handling this situation.  First, it
checks which commitNumber the client-session is at vs. where the
server is at.  If it is not a great difference, then the client simply
downloads those few commit-log records from the server and applies
only those updates if they're present in the image.

However, if there were more commits by other sessions since the image
save than there are cached objects in the session, then it would be
faster to refresh all of those objects (even if some of them didn't
change) instead of replaying all of those commit-logs.
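
The choice between the two catch-up paths can be written down as a small decision function. This is a Python sketch with hypothetical names. The only rule taken from the explanation above is the comparison between the number of missed commits and the number of cached objects. Whatever Magma does beyond that is not modelled here.

```python
def choose_catch_up(client_commit_number, server_commit_number, cached_object_count):
    """Return how a reconnecting session might bring its objects up to date."""
    missed = server_commit_number - client_commit_number
    if missed < 0:
        raise ValueError("client cannot be ahead of the server")
    if missed == 0:
        return "up-to-date"
    if missed > cached_object_count:
        # More commits were missed than there are cached objects:
        # refreshing every cached object is the cheaper path.
        return "refresh-all"
    # Otherwise download the few missed commit-log records and apply them.
    return "replay-log"
```

For example, a session that missed 5 commits while caching 10,000 objects would replay the log, while one that missed 50,000 commits with the same cache would refresh everything.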

So the reality is that, for the convenience of resuming the image
state exactly where it left off, there can be a brief pause while it
brings the objects up to their current state.  If sessions do not have
a lot of cached objects, or if the app can open a new session, then
that can be considerably faster.



> I hope these are not too much dumb questions. I'm just thinking about possibilities to what would be worth to try out. With amazon you can have multiple machines attached to a shared block storage. That can share all the repositories over an arbitrary amount of machines. But at the moment I can see how to get a lot of benefit from that particular feature regarding magma. If forwarding proxies are not too expensive that would be still enable some things.
>
> thanks in advance,
>
> Norbert

Re: Trying to understand forwarding proxies

NorbertHartl
Chris,

On 02.02.2011, at 23:58, Chris Muller wrote:


> Cool!  I've really wanted to look at Amazon EC2 to see if Magma could
> run there, just haven't had time to do it.

I'm just starting myself looking into it and I'm reading a lot. I was amazed by names like SimpleDB at first. But then I recognized that it is really simple and that using it with more complex models can be even more cumbersome than using an RDBMS. So I'm looking for alternatives.

> Please use the latest 1.2alpha.  I am on the cusp of publishing Magma
> 1.2 right after Squeak 4.2 is all done.
>
>> If I understand it correctly then
>>
>> - magma uses a directory to write its files. That could be called the repository
>
> Yes.
>
>> - one repository is served by one server at any time
>
> One repository is served by one or more servers at a time.

Does this mean that Magma handles file locking and transaction/timestamp data inside the files, so that it is safe to have any number of images accessing them? That would be great. If so, could you explain how the locking is achieved and when commits are rejected? Or better, how fine- or coarse-grained the locking mechanism is.


>> - forwarding proxies can be made from server to server. So these are cross-domain/cross-repository
>
> Yes, but I would prefer to say, "from repository to repository".  A
> MagmaForwardingProxy is just a "bookmark", or a "soft-link" to another
> object in another repository.  It only persists the 'location' and
> 'oid' of the remote object.  It implements #doesNotUnderstand: to look
> it up and cache it the first time so that subsequent access is fast.
> However, unlike MagmaMutatingProxy's, the ForwardingProxy will never
> become: the remote object.  It will always forward through
> #doesNotUnderstand:, so if you are sending to a FP in a inner-loop,
> send #realObject to the FP to get the cached object for a fast send.
>
> MagmaForwardingProxies are not intended to be transparent to the Magma
> developer, they must be used deliberately.  MagmaMutatingProxies are
> supposed to be transparent, but there are cases where a #yourself is
> needed (e.g., to avoid the proxy being sent as an argument to a
> primitive).
>
> The thing to be very aware of about using FP's is that it does tie the
> two repositories together.  The app needs them both running to work.
> But the separation has performance and organizational advantages.

Ok. Seems like a nice feature that opens a whole new bunch of possible use cases. I can imagine that can be very useful if there is a balanced domain partitioning in place. I assume that copying a tree from one repository to another is just a detach/attach operation, right? Forwarding proxies can close some gaps I think.

>> - mimicking a domain model by using forwarding proxies. So objects partitioned over multiple repositories appear to be in the same domain
>
> Just to be clear, ForwardingProxies have no relation to HA.  HA is for
> replicating one logical repository.  FP's are for linking one logical
> repository (which could be hosted HA) to another logical repository
> (which could also be hosted HA).  This is the configuration I run for
> my own internal app; two repositories, each one HA, so 4 servers total
> (but just two physical machines).

I agree. I didn't mix HA with scaling. Btw. is the master server automatically taken over by another one if the master goes down?

>> - using multiple sessions to read from multiple servers. That would be the case of domain model partitioning
>>
>> Up to here I would like to know what is the tradeoff in using forwarding proxies. Is the whole communication done via the proxy on the first machine or is another session created to which the client has direct access? Or to be more precise: If I would have a forwarding proxy to a collection in another repository that would hold objects from a third repository and I would detect: an object from that collection to which server am I talking when invoking a method on that detected object?
>
> A FP only points to one object in one repository.  The FP, itself, is
> a persistent object residing in one repository.  In general, the only
> requirement for the client app to do this is, whereever it SETS the
> object that you want remotely-linked, just send
> #asMagmaForwardingProxy to it.
>
> When that is committed, the object it refers to MUST be already
> persistent in its own repository so an appropriate location of that
> remote object can be determined and persisted with the FP.

I assume you mean the referred object needs to have been persisted before to create an oid that is used in the proxy?

> Later, when another session comes along and, pretending that FP _is_
> the remote object, sends it a message the proxy itself does not
> understand.  Trace the code starting at
> MagmaForwardingProxy>>#realObject to see that it goes and looks to see
> if a session to that is already present and, it so, uses that one.
> Otherwise, a new session is established.

Is that a normal MagmaSession like I create in my app? If so, can I access automatically created sessions?

Let me bother you with some scenarios, just to be sure I understood you correctly.

Let's take 10 machines (EC2) that all have a shared storage attached. All of them could access the same repository in a safe way, right? If this is possible, then locking will prevent scaling once a certain number of accessing processes is reached. Could I then partition the user data into multiple repositories and still be able to access them from multiple machines, or switch responsibility for a repository from one host to another? Furthermore, would it be possible to have a central server that keeps information (that is not easy to partition) and that references/is referenced via forwarding proxies from other servers? And mutating proxies might help to lower forwarding requests by materializing the object in another repository. And if needed, I could even store images that recover automatically when restarted?

Well, that sounds too good to be true.

thanks,

Norbert




Re: Trying to understand forwarding proxies

Chris Muller-3
Hi,

>> One repository is served by one or more servers at a time.
>
> Does this mean that magma handles file locking of files and
> transaction/timestamping data inside the files to be safe to have any number
> of images accesing it?

No, only one Squeak process at a time can access one physical copy of
a repository at a time.  In fact, a temporary file is created as a
crude-but-effective check just to prevent accidentally opening a
second process on the same files.
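
For readers unfamiliar with this kind of guard, here is a generic lock-file sketch in Python. It is not Magma's code: the file name `repository.lock` and both function names are made up for illustration, and Magma's actual temporary-file check may work differently. The idea shown is only that exclusive creation of a marker file fails when another process has already created it.

```python
import os
import tempfile

LOCK_NAME = "repository.lock"   # hypothetical file name


def open_repository(directory):
    """Create the marker file, refusing if it already exists."""
    lock_path = os.path.join(directory, LOCK_NAME)
    try:
        # O_EXCL makes the creation fail if the file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError("repository already open: " + directory)
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))   # note who holds the repository
    return lock_path


def close_repository(lock_path):
    """Remove the marker file so the repository can be opened again."""
    os.remove(lock_path)


with tempfile.TemporaryDirectory() as directory:
    lock = open_repository(directory)
    try:
        open_repository(directory)          # second open on the same files
        second_open_refused = False
    except RuntimeError:
        second_open_refused = True
    close_repository(lock)
    reopened = open_repository(directory)   # works again after closing
    close_repository(reopened)
```

A check like this is crude in the sense that a crashed process leaves the marker behind, which then has to be cleaned up by hand.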

When I said "one or more" I was referring to an HA configuration.  One
_logical_ repository, of which there are multiple physical copies,
each "served" by one squeakvm process concurrently.

HA configurations can help scale an application's read performance,
but not its write performance.  For that, FP's can help.

> Ok. Seems like a nice feature that opens a whole new bunch of possible use
> cases. I can imagine that can be very useful if there is a balanced domain
> partitioning in place.

As an example, I have a Magma app for personal finances.  The user's
financial information is kept in one repository.  But since the domain
includes references to various FinancialSecurity objects (as in stocks
or mutual-funds) and their price history for calculating portfolio
values, it really makes sense to partition that part of the domain out
to its own repository (called "FinancialWorld" for discussion).  That
way, there is only one copy of the repository which can be referenced
from multiple external repositories / applications.

> I assume that copying a tree from one repository to
> another is just a detach/attach operation, right? Forwarding proxies can
> close some gaps I think.

Maybe not _copying_ a tree, but _referencing_ a tree in another
repository is a commit of just one object-pointer to a FP instance.

> I agree. I didn't mix HA with scaling. Btw. is the master server
> automatically taken over by another one if the master goes down?

It is automatic, but a secondary will not assume the primary role
until it is sent a request that only the primary can carry out, like
#commit.

So, if the primary goes down only briefly, and only reads came in
during that time, not commits, then when the primary comes back up it
will still be the primary.

If a commit does come in, the secondary must assume primary role.  It
does so automatically.  Then, if the original primary is restarted, it
will join the node as a secondary.  The node is running fine, but the
admin can force a primary-secondary swap without disrupting service if
the original configuration is desired.
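
The takeover rules described above can be summarised in a toy state model. This is Python with hypothetical names (`HaRoles` and its methods), and it tracks only who holds which role. It does not reflect how Magma implements the switch internally.

```python
class HaRoles:
    """Tracks which server is primary in one HA node (illustrative only)."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self.down = set()

    def stop(self, name):
        self.down.add(name)

    def restart(self, name):
        self.down.discard(name)
        # A former primary that was replaced rejoins as a secondary.
        if name != self.primary and name not in self.secondaries:
            self.secondaries.append(name)

    def read(self):
        # Reads never change roles; any running server can answer.
        for name in self.secondaries + [self.primary]:
            if name not in self.down:
                return name
        raise RuntimeError("no server is running")

    def commit(self):
        # Only a primary-only request forces a secondary to take over.
        if self.primary in self.down:
            running = [n for n in self.secondaries if n not in self.down]
            if not running:
                raise RuntimeError("no server can take the primary role")
            self.primary = running[0]
            self.secondaries.remove(self.primary)
        return self.primary

    def swap(self, name):
        # Admin-forced primary/secondary swap.
        self.secondaries.remove(name)
        self.secondaries.append(self.primary)
        self.primary = name


# Brief outage with reads only: the primary keeps its role.
brief = HaRoles("A", ["B"])
brief.stop("A")
served_by = brief.read()
brief.restart("A")

# Outage with a commit: B takes over, A rejoins as a secondary.
outage = HaRoles("A", ["B"])
outage.stop("A")
committed_on = outage.commit()
outage.restart("A")
roles_after_restart = (outage.primary, list(outage.secondaries))
outage.swap("A")
roles_after_swap = (outage.primary, list(outage.secondaries))
```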

>> A FP only points to one object in one repository.  The FP, itself, is
>> a persistent object residing in one repository.  In general, the only
>> requirement for the client app to do this is, whereever it SETS the
>> object that you want remotely-linked, just send
>> #asMagmaForwardingProxy to it.
>>
>> When that is committed, the object it refers to MUST be already
>> persistent in its own repository so an appropriate location of that
>> remote object can be determined and persisted with the FP.
>
> I assume you mean the referred object needs to have been persisted before to
> create an oid that is used in the proxy?

"referred object" = "object it refers to"

So, with the financial program, the FinancialSecurity instance must
exist in the FinancialWorld repository before the ForwardingProxy
instance, which refers to it, is persisted in the user's
personal-finance repository.
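
The ordering constraint can be shown with a toy model in Python. Everything here is hypothetical: `Repository`, `oid_of`, `forwarding_link` and the error raised are illustrative stand-ins. How Magma itself reacts when the target is not yet persistent is not stated in this thread.

```python
class Repository:
    """Toy repository that assigns an oid when an object is committed."""

    def __init__(self, location):
        self.location = location
        self._objects = {}      # oid -> object
        self._next_oid = 1

    def commit(self, obj):
        oid = self._next_oid
        self._next_oid += 1
        self._objects[oid] = obj
        return oid

    def oid_of(self, obj):
        for oid, stored in self._objects.items():
            if stored is obj:
                return oid
        return None             # not persistent in this repository


def forwarding_link(target, target_repository):
    """Build the (location, oid) pair a forwarding proxy would persist."""
    oid = target_repository.oid_of(target)
    if oid is None:
        # Without an oid there is nothing to store in the link.
        raise ValueError("target must be persistent in its own repository first")
    return (target_repository.location, oid)


class FinancialSecurity:
    def __init__(self, symbol):
        self.symbol = symbol


financial_world = Repository("FinancialWorld")
security = FinancialSecurity("XYZ")

try:
    forwarding_link(security, financial_world)   # too early: no oid yet
    early_link_rejected = False
except ValueError:
    early_link_rejected = True

financial_world.commit(security)                 # persist the target first
link = forwarding_link(security, financial_world)
```

Only after the commit in FinancialWorld does the link have a location and an oid to record in the personal-finance repository.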

> Is that a normal MagmaSession like I create in my app?

Yes.

> If so, can I access
> automatically created sessions?

You can ask any object for its #magmaSession.

I like to open a session to the shared repository (FinancialWorld)
when I bootstrap my app for the first time.

> Let me bother you with some scenarios. Just to be sure I understood you
> correct.
> Let's take 10 machines (EC2) that have all attached a shared storage all of
> them could access the same repository in a safe way, right?

They could access the same _logical_ repository, of which there would
need to be 10 physical copies.

> If this is
> possible then locking will prevent scaling when a specific number of
> accessing processes is reached. I could then partition the user data into
> multiple repositories and still have the ability to access them from
> multiple machines or to switch responsibility of a repository from one host
> to another? Furthermore it would be possible to have a central server that
> keeps information (that is not easy to partition) and that references/is
> referenced via forwarding proxies from other servers? And mutating proxies
> might help to lower forwarding requests by materializing the object in
> another repository. And if it is needed I could even store images that
> recover automatically when restarted again?
> Well, that sounds to good to be true;

(These questions are based on that wrong assumption about the file
locking.  Magma doesn't do that.)

FP's are simple.  They're just a soft-link from one logical repository
to another.

 - Chris



> thanks,
> Norbert