After 4 years away from Magma, I'm trying to get back in touch to evaluate some new ideas. I was thinking about how to use Magma in a cloud environment like Amazon's, so I'm interested in possible scaling scenarios.
If I understand it correctly, then:

- Magma uses a directory to write its files. That could be called the repository.
- One repository is served by one server at any time.
- A special mode is possible where the client and the server reside in the same image (so only a single image is needed).
- HA splits one node over certain locations. A node is an arbitrary number of servers serving a single shared repository.
- Forwarding proxies can be made from server to server, so these are cross-domain/cross-repository.

Talking to Magma is done using a session. So I can do the following:

- talk to an HA node that will read from an arbitrary server but commit to a single one
- mimic a domain model by using forwarding proxies, so objects partitioned over multiple repositories appear to be in the same domain
- use multiple sessions to read from multiple servers. That would be the case of domain model partitioning.

Up to here, I would like to know the tradeoff in using forwarding proxies. Is the whole communication done via the proxy on the first machine, or is another session created to which the client has direct access? To be more precise: if I had a forwarding proxy to a collection in another repository, that collection held objects from a third repository, and I used #detect: to get an object from that collection, which server am I talking to when invoking a method on that detected object?

As far as I remember, sessions are not the cheapest thing to establish. Do I remember that correctly, and has this changed? The same question goes for starting a repository. Is this a quick operation, or are there a lot of preparation steps that make startup rather slow?

I hope these are not too many dumb questions. I'm just thinking about which possibilities would be worth trying out. With Amazon you can have multiple machines attached to shared block storage, which can share all the repositories over an arbitrary number of machines. But at the moment I can't see how to get much benefit from that particular feature with Magma. If forwarding proxies are not too expensive, they would still enable some things.

thanks in advance,
Norbert
_______________________________________________
Magma mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/magma
Hi Norbert, thanks for the _great_ note and questions. I'm really glad to see someone finally looking into ForwardingProxies.

On Wed, Feb 2, 2011 at 1:34 PM, Norbert Hartl <[hidden email]> wrote:
> After 4 years of abstinence from Magma I try to get back in touch to evaluate some new ideas. I was thinking how to utilize magma in a cloud environment like the one from amazon. So I'm interested in possible scaling scenarios.

Cool! I've really wanted to look at Amazon EC2 to see if Magma could run there; I just haven't had time to do it. Please use the latest 1.2alpha. I am on the cusp of publishing Magma 1.2, right after Squeak 4.2 is all done.

> If I understand it correctly then
>
> - magma uses a directory to write its files. That could be called the repository

Yes.

> - one repository is served by one server at any time

One repository is served by one or more servers at a time.

> - a special mode is possible where the client and the server reside in the same image (thus having only the need for a single image)

Yes, this is called "local" mode. It saves the need to serialize requests and materialize responses, so this configuration offers the best single-session performance. The server's duty is relatively light compared to the client's duty, so running in local mode is not necessarily the best choice for a web application, because it can't scale. Multiple web sessions contending for one MagmaSession could easily be worse than multiple MagmaSessions contending for one server.

> - HA splits one node over certain locations. A node is an arbitrary amount of servers serving a single shared repository

Yes. With HA, multiple copies of the one single repository are each hosted by independently running server images. In this mode, clients make two connections: one to the "primary" for commits, and one to one of the secondaries for reads. When the primary receives a commit, it is immediately broadcast to all secondaries, so the persistent model is redundantly safe.

> - forwarding proxies can be made from server to server.
> So these are cross-domain/cross-repository

Yes, but I would prefer to say "from repository to repository". A MagmaForwardingProxy is just a "bookmark", or a "soft link", to another object in another repository. It only persists the 'location' and 'oid' of the remote object. It implements #doesNotUnderstand: to look up the remote object and cache it the first time, so that subsequent access is fast. However, unlike MagmaMutatingProxies, a ForwardingProxy will never #become: the remote object. It will always forward through #doesNotUnderstand:, so if you are sending to an FP in an inner loop, send #realObject to the FP to get the cached object for a fast send.

MagmaForwardingProxies are not intended to be transparent to the Magma developer; they must be used deliberately. MagmaMutatingProxies are supposed to be transparent, but there are cases where a #yourself is needed (e.g., to avoid the proxy being sent as an argument to a primitive).

The thing to be very aware of when using FPs is that they tie the two repositories together: the app needs them both running to work. But the separation has performance and organizational advantages.

> Talking to magma is done using a session. So I can do the following.
>
> - talk to a HA node that will read from an arbitrary server but will commit to a single one

Yes.

> - mimicking a domain model by using forwarding proxies. So objects partitioned over multiple repositories appear to be in the same domain

Just to be clear, ForwardingProxies have no relation to HA. HA is for replicating one logical repository. FPs are for linking one logical repository (which could be hosted HA) to another logical repository (which could also be hosted HA). This is the configuration I run for my own internal app: two repositories, each one HA, so 4 servers total (but just two physical machines).

> - using multiple sessions to read from multiple servers.
> That would be the case of domain model partitioning
>
> Up to here I would like to know what is the tradeoff in using forwarding proxies. Is the whole communication done via the proxy on the first machine or is another session created to which the client has direct access? Or to be more precise: If I would have a forwarding proxy to a collection in another repository that would hold objects from a third repository and I would detect: an object from that collection to which server am I talking when invoking a method on that detected object?

An FP only points to one object in one repository. The FP itself is a persistent object residing in one repository. In general, the only thing the client app has to do is, wherever it SETS the object that you want remotely linked, send #asMagmaForwardingProxy to it.

When that is committed, the object it refers to MUST already be persistent in its own repository, so that an appropriate location of that remote object can be determined and persisted with the FP.

Later, another session comes along and, pretending that the FP _is_ the remote object, sends it a message the proxy itself does not understand. Trace the code starting at MagmaForwardingProxy>>#realObject to see that it checks whether a session to that repository is already present and, if so, uses that one. Otherwise, a new session is established.

> As far as I remember sessions are not the most performant thing to establish. Do I remember that correct and has this changed? Same question goes for the start of a repository. Is this a quick operation or are there a lot of preparation steps that make startup rather slow?

One nice thing about MagmaSessions is that they're persistent with the image. Save the image with GUI screens showing persistent objects.
Sessions are connected, even with open transactions, and when the image is later restarted, the sessions are reconnected transparently, the persistent view is updated, and that transaction can even then be committed.

Note that the repositories may have continued to be heavily updated by other sessions while these sessions were offline. When the image restarts, any or all of the thousands of persistent objects in the image, some being shown in the GUIs, COULD have been updated while the image was hibernating. Magma tries to be smart about handling this situation. First, it checks which commitNumber the client session is at vs. where the server is at. If the difference is not great, the client simply downloads those few commit-log records from the server and applies only the updates to objects that are present in the image. However, if other sessions made more commits since the image was saved than there are cached objects in the session, it is faster to refresh all of those objects (even if some of them didn't change) instead of replaying all of those commit logs.

So the reality is that, in exchange for the convenience of resuming the image state exactly where it left off, there can be a brief pause while it brings the objects up to current state. If sessions do not have a lot of cached objects, or if the app can open a new session, then that can be considerably faster.
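The catch-up rule described above boils down to one comparison: replay the missed commit-log records when there are no more of them than cached objects, otherwise refresh every cached object. The Python sketch below is a hypothetical model of that rule, not Magma's code; `CommitRecord`, `catch_up`, the dict-based cache, and the exact threshold are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CommitRecord:
    """Hypothetical commit-log entry: commit number plus new state per oid."""
    number: int
    changes: dict = field(default_factory=dict)  # oid -> new state


def catch_up(cache, client_commit, commit_log, fetch_current):
    """Bring a resumed session's cached objects up to date (illustrative only).

    cache:         dict oid -> state cached by the client session
    client_commit: commit number the session had reached when the image was saved
    commit_log:    CommitRecords kept by the server, in commit order
    fetch_current: callable oid -> current state on the server (full refresh)
    Returns the name of the strategy that was chosen.
    """
    missed = [r for r in commit_log if r.number > client_commit]
    if len(missed) <= len(cache):
        # Few commits missed: replay them, touching only objects in the cache.
        for record in missed:
            for oid, state in record.changes.items():
                if oid in cache:
                    cache[oid] = state
        return "replay"
    # More missed commits than cached objects: refreshing every cached object
    # is cheaper, even though some of them did not change.
    for oid in cache:
        cache[oid] = fetch_current(oid)
    return "refresh"
```

Either path leaves the cache current; the comparison only decides which one does less work.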
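To make the forwarding-proxy mechanics from this reply concrete, here is a Python analogy, not Magma's Smalltalk code. Python's `__getattr__`, which runs only for attributes an object lacks, plays the role of #doesNotUnderstand:. The classes `Session`, `SessionRegistry`, and `ForwardingProxy` and the method `real_object` are hypothetical names. The sketch shows only the behaviors described above: the proxy stores just a location and an oid, resolves and caches its target on first use through an already-open session for that location (or a newly opened one), and never turns into the target.

```python
_UNRESOLVED = object()  # sentinel meaning "target not fetched yet"


class Session:
    """Hypothetical stand-in for a client session connected to one repository."""

    def __init__(self, location, objects):
        self.location = location
        self._objects = objects  # oid -> object; plays the remote repository

    def fetch(self, oid):
        return self._objects[oid]


class SessionRegistry:
    """Keeps at most one open session per location and reuses it."""

    def __init__(self, connect):
        self._connect = connect  # callable: location -> Session (an assumption)
        self._sessions = {}
        self.opened = 0  # how many sessions had to be established

    def session_for(self, location):
        if location not in self._sessions:
            self._sessions[location] = self._connect(location)
            self.opened += 1
        return self._sessions[location]


class ForwardingProxy:
    """A 'soft link' that stores only a location and an oid."""

    def __init__(self, location, oid, registry):
        self.location = location
        self.oid = oid
        self._registry = registry
        self._target = _UNRESOLVED

    def real_object(self):
        # First use: fetch through a reused or newly opened session, then cache.
        if self._target is _UNRESOLVED:
            session = self._registry.session_for(self.location)
            self._target = session.fetch(self.oid)
        return self._target

    def __getattr__(self, name):
        # Runs only for names the proxy itself lacks, like doesNotUnderstand:.
        # The proxy never replaces itself with the target, so every forwarded
        # access pays this extra hop.
        return getattr(self.real_object(), name)
```

As with #realObject, calling `real_object()` once before a tight loop and sending to the result skips the `__getattr__` hop. Note that Python looks up special methods such as `__len__` on the type, so `len(proxy)` is not forwarded by this sketch.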
Chris,
On 02.02.2011, at 23:58, Chris Muller wrote:

I'm just starting to look into it myself, and I'm reading a lot. I was amazed by names like SimpleDB at first. But then I realized that it is really simple, and that using it with more complex models can be even more cumbersome than using an RDBMS. So I'm looking for alternatives.

Does this mean that Magma handles file locking and transaction/timestamping data inside the files, so that any number of images can safely access it? That would be great. If so, could you explain how the locking is achieved and when commits are rejected? Or, better, how fine- or coarse-grained the locking mechanism is.

Ok. Seems like a nice feature that opens a whole new bunch of possible use cases. I can imagine that it can be very useful if there is a balanced domain partitioning in place. I assume that copying a tree from one repository to another is just a detach/attach operation, right? Forwarding proxies can close some gaps, I think.

I agree. I didn't mix HA with scaling. Btw., is the master server automatically taken over by another one if the master goes down?

I assume you mean the referred object needs to have been persisted beforehand, to create an oid that is used in the proxy?

Is that a normal MagmaSession like I create in my app? If so, can I access automatically created sessions?

Let me bother you with some scenarios, just to be sure I understood you correctly. Let's take 10 machines (EC2) that all have a shared storage attached. All of them could access the same repository in a safe way, right? If this is possible, then locking will prevent scaling once a specific number of accessing processes is reached. I could then partition the user data into multiple repositories and still have the ability to access them from multiple machines, or to switch responsibility of a repository from one host to another?
Furthermore, it would be possible to have a central server that keeps information (that is not easy to partition) and that references, or is referenced by, forwarding proxies from other servers? And mutating proxies might help to lower forwarding requests by materializing the object in another repository. And if needed, I could even store images that recover automatically when restarted again?

Well, that sounds too good to be true.

thanks,
Norbert
Hi,
>> One repository is served by one or more servers at a time.
>
> Does this mean that magma handles file locking of files and
> transaction/timestamping data inside the files to be safe to have any number
> of images accesing it?

No, only one Squeak process at a time can access one physical copy of a repository. In fact, a temporary file is created as a crude-but-effective check, just to prevent accidentally opening a second process on the same files. When I said "one or more" I was referring to an HA configuration: one _logical_ repository, of which there are multiple physical copies, each "served" concurrently by its own squeakvm process. HA configurations can help scale an application's read performance, but not its write performance. For that, FPs can help.

> Ok. Seems like a nice feature that opens a whole new bunch of possible use
> cases. I can imagine that can be very useful if there is a balanced domain
> partitioning in place.

As an example, I have a Magma app for personal finances. The user's financial information is kept in one repository. But since the domain includes references to various FinancialSecurity objects (as in stocks or mutual funds) and their price history for calculating portfolio values, it really makes sense to partition that part of the domain out to its own repository (called "FinancialWorld" for discussion). That way, there is only one copy of that repository, which can be referenced from multiple external repositories / applications.

> I assume that copying a tree from one repository to
> another is just a detach/attach operation, right? Forwarding proxies can
> close some gaps I think.

Maybe not _copying_ a tree, but _referencing_ a tree in another repository is a commit of just one object pointer to an FP instance.

> I agree. I didn't mix HA with scaling. Btw. is the master server
> automatically taken over by another one if the master goes down?
It is automatic, but a secondary will not assume the primary role until it is sent a request that only the primary can carry out, like #commit. So, if the primary goes down only briefly, and only reads come in during that time, not commits, then when the primary comes back up it will still be the primary. If a commit does come in, the secondary must assume the primary role, and it does so automatically. Then, if the original primary is restarted, it joins the node as a secondary. The node runs fine that way, but the admin can force a primary-secondary swap without disrupting service if the original configuration is desired.

>> A FP only points to one object in one repository. The FP, itself, is
>> a persistent object residing in one repository. In general, the only
>> requirement for the client app to do this is, whereever it SETS the
>> object that you want remotely-linked, just send
>> #asMagmaForwardingProxy to it.
>>
>> When that is committed, the object it refers to MUST be already
>> persistent in its own repository so an appropriate location of that
>> remote object can be determined and persisted with the FP.
>
> I assume you mean the referred object needs to have been persisted before to
> create an oid that is used in the proxy?

"referred object" = "object it refers to". So, with the financial program, the FinancialSecurity instance must exist in the FinancialWorld repository before the ForwardingProxy instance that refers to it is persisted in the user's personal-finance repository.

> Is that a normal MagmaSession like I create in my app?

Yes.

> If so, can I access
> automatically created sessions?

You can ask any object for its #magmaSession. I like to open a session to the shared repository (FinancialWorld) when I bootstrap my app for the first time.

> Let me bother you with some scenarios. Just to be sure I understood you
> correct.
> Let's take 10 machines (EC2) that have all attached a shared storage all of
> them could access the same repository in a safe way, right?

They could access the same _logical_ repository, of which there would need to be 10 physical copies.

> If this is
> possible then locking will prevent scaling when a specific number of
> accessing processes is reached. I could then partition the user data into
> multiple repositories and still have the ability to access them from
> multiple machines or to switch responsibility of a repository from one host
> to another? Furthermore it would be possible to have a central server that
> keeps information (that is not easy to partition) and that references/is
> referenced via forwarding proxies from other servers? And mutating proxies
> might help to lower forwarding requests by materializing the object in
> another repository. And if it is needed I could even store images that
> recover automatically when restarted again?
> Well, that sounds to good to be true;

(These questions are based on that wrong assumption about file locking. Magma doesn't do that.)

FPs are simple. They're just a soft link from one logical repository to another.

- Chris
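The "crude-but-effective" temporary-file check mentioned in this reply, which keeps a second process from opening the same repository files, could look something like the Python sketch below. It illustrates the general technique under assumptions and is not Magma's implementation: the class `RepositoryLock`, the file name `repository.lock`, and writing the PID into the file are all hypothetical. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists.

```python
import os


class RepositoryLock:
    """Hypothetical guard file that refuses a second opener of one directory."""

    def __init__(self, directory, name="repository.lock"):
        self.path = os.path.join(directory, name)
        self._fd = None

    def acquire(self):
        try:
            # O_CREAT | O_EXCL: create the file, or fail if it already exists.
            self._fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise RuntimeError(f"repository already open: {self.path}") from None
        # Record the owner's PID to help an admin diagnose a leftover file.
        os.write(self._fd, str(os.getpid()).encode())

    def release(self):
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.path)
            self._fd = None
```

Like any check of this kind, a file left behind by a crashed process blocks the next open until someone removes it, which is part of what makes it "crude".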
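The HA behavior described in this thread can be modeled in a few lines of Python: reads go to a secondary, commits go to the primary and are broadcast to every secondary, and a secondary is promoted only when a commit arrives while the primary is down. This is an illustrative state model, not Magma's HA code. `Server`, `HANode`, promoting the first running secondary, and leaving out both resynchronization of a restarted server and the admin-forced swap are simplifying assumptions.

```python
class Server:
    """Hypothetical server image holding one physical copy of the repository."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.commits = []


class HANode:
    """Reads go to a secondary, commits to the primary; promotion is lazy."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)

    def read(self):
        # Any running secondary can serve reads; reads never trigger promotion.
        for server in self.secondaries:
            if server.up:
                return server.name
        raise RuntimeError("no secondary available")

    def commit(self, change):
        # Only a primary-only request such as a commit forces a takeover.
        if not self.primary.up:
            self._promote()
        self.primary.commits.append(change)
        # Broadcast to every running secondary so each copy stays redundant.
        for server in self.secondaries:
            if server.up:
                server.commits.append(change)

    def _promote(self):
        candidates = [s for s in self.secondaries if s.up]
        if not candidates:
            raise RuntimeError("no server can take over the primary role")
        old, new = self.primary, candidates[0]
        self.secondaries.remove(new)
        # The old primary rejoins as a secondary once restarted
        # (catching up on commits it missed is not modeled here).
        self.secondaries.append(old)
        self.primary = new
```

The lazy promotion is what lets a briefly unavailable primary keep its role when only reads arrive in the meantime.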