Hi All,
Typically, in a traditional architecture you have: aReverseProxy -- WebApplication (Java) -- Database Server.

High Availability (at the database level) means that there is an Active-Active (or Active-Passive) configuration: if the database server goes down, the other database responds to the WebApplication's requests, and the application keeps going (without human intervention).

Is it possible to do something like this with GLASS (free license)? That is, to automatically recover from a Stone that went down. I have read the Admin Manual at page 291, "Activate the hot standby in case of failure in the primary". Can we put this recovery process in a script that is executed after detecting that the primary Stone is down?

The original idea here was to develop a Java app (not by me), but I have my GS application already running OK. But "High Availability" without human intervention is a requirement.

Regards,
Bruno
Ciao, Dario
_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass |
Hi Bruno,
I’d be interested in how other databases handle this problem. I imagine that they do but it seems that the control would need to extend beyond the database machine(s). The challenge is that you want to avoid a single point of failure (so have redundancy), but then need to ensure that if the primary goes down it is really down.

In the 1980s I worked for a healthcare database company that supported automatic fail-over. We had a situation where the network failed in such a way that the secondary machine lost connectivity with the primary machine and thought that the primary machine was down so it “took over.” The network failure was such that half of the clients stayed with the primary (which was still visible to part of the network) and half of the clients went with the secondary (since they could no longer see the primary). When we discovered the problem we had to bring down both systems and merge the transaction log (examining each one to detect conflicts). At that point we decided that manual intervention to validate the primary failure was better than automatic failure detection. So far, it seems that GemStone has followed a similar path.

James

On Feb 20, 2014, at 5:23 AM, BrunoBB <[hidden email]> wrote:
> [...]
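James's story is the classic "split-brain" failure. One common mitigation is to let the standby take over only when a strict majority of a fixed set of observers agrees that the primary is unreachable. Those observers would be the standby plus "witness" processes on other hosts. The sketch below is a minimal Python illustration under that assumption. The observer names and the function are hypothetical, and none of this is a GemStone facility.

```python
from __future__ import annotations


def may_take_over(cluster: list[str], says_primary_down: set[str]) -> bool:
    """Quorum rule for promoting the standby (sketch, not a GemStone API).

    cluster: fixed list of voting observers (the standby plus witnesses on
    other hosts). says_primary_down: observers that positively reported the
    primary as unreachable. Silent observers, e.g. ones cut off by the same
    network fault, do not count as 'down' votes.
    """
    votes = len(says_primary_down & set(cluster))  # ignore unknown voters
    # Each observer sits on only one side of a network split, so at most
    # one side can hold a strict majority of this fixed cluster.
    return votes > len(cluster) // 2


cluster = ["standby", "witness-a", "witness-b"]
# James's scenario: only the standby lost sight of the primary.
print(may_take_over(cluster, {"standby"}))               # False
print(may_take_over(cluster, {"standby", "witness-a"}))  # True
```

This covers only half of the problem. The primary would need the mirror-image rule, which is to stop accepting commits when it cannot see a majority. Otherwise the clients still attached to it keep writing, exactly as in the story above.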
Hi James,
Thanks for the answer.

The "official" requirement is: "High Availability" without human intervention. But after finding out more details, some applications have automatic database takeover disabled. Why? I do not know.

The problem you describe is pretty interesting; we are discussing it now. I will collect more data about this and then come back to this issue.

Is daemontools meant to handle more than one GemStone installation? After reading this link it is not clear to me: https://github.com/glassdb/GemStone_daemontools_setup

Regards,
Bruno
Hi Bruno,
There are so many failure scenarios that it is difficult to automate the responses. The “hot standby” is a situation where transactions are sent to a backup system that tries to keep up. If the backup system has not received the most recent transaction (network delays, slow processing, etc), then bringing it on-line will involve lost transactions. The decision to fail over to the backup should consider the trade-off of transferring the latest transaction logs (which might be on a disk that is still good) vs. restarting quickly. Is it more important to have the system up quickly or to avoid lost transactions? How quickly? How many lost transactions?

If the primary system suffered a crash due to corruption in the Shared Page Cache but the host OS is fine, then it might be best to just restart the primary system and let it automatically replay the existing transactions. An automated system to do this could have the stone back in a minute or so. This is what we do for our internal bug tracking GemStone database.

With any fail-over the existing sessions will be lost. With a web application where every request is a new connection that can be handled by a newly-started gem, this can be transparent (the replacement system can even take over the IP address of the failed system). With a rich-client application that has a long-running connection (or with some interface-type background processes), the client will have to reconnect.

It might seem fairly easy to confirm that a system is dead in an automated way, but even then one has the challenge of distinguishing between slow responses and a hung system. Typically a stone will respond to a request from a gem in less than a millisecond, but a network “glitch” with retry might get unstuck in a few seconds. What if just after you decide that a remote host is not responding it comes back to life? What if some clients think it is gone but others never noticed the absence?
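The "slow versus hung" problem is usually softened, though not solved, by declaring a host dead only after several consecutive missed heartbeats. A single slow reply or a brief glitch is then tolerated. Below is a minimal Python sketch of that idea. The `probe` callable is hypothetical and stands in for whatever check you would actually run against the stone. `FailureDetector`, `watch` and `max_misses` are illustrative names, not part of GemStone.

```python
import time
from typing import Callable


class FailureDetector:
    """Declare the primary down only after max_misses consecutive failed
    probes, so one slow reply or a brief network glitch is tolerated."""

    def __init__(self, probe: Callable[[], bool], max_misses: int = 5) -> None:
        # probe is a hypothetical callable: True if the primary answered
        # within the probe's own timeout, False otherwise.
        self.probe = probe
        self.max_misses = max_misses
        self.misses = 0

    def check(self) -> bool:
        """Run one probe; return True once the primary is considered down."""
        if self.probe():
            self.misses = 0  # any successful reply resets the count
        else:
            self.misses += 1
        return self.misses >= self.max_misses


def watch(detector: FailureDetector, interval_s: float = 2.0) -> None:
    """Poll until the detector declares the primary down, then return."""
    while not detector.check():
        time.sleep(interval_s)


# Two misses, one success, then three misses in a row (max_misses=3).
replies = iter([False, False, True, False, False, False])
detector = FailureDetector(lambda: next(replies), max_misses=3)
print([detector.check() for _ in range(6)])
# [False, False, False, False, False, True]
```

Raising `max_misses` or the polling interval trades slower failover for fewer false alarms. On its own, a detector like this still cannot tell "the primary died" from "I was cut off from the primary".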
All this “smarts” will need to be external to GemStone, probably on a separate host. But will that monitoring system itself have redundancy or is it okay if it is a single point of failure? Will the backup monitor be on a separate host? What if it loses connection to the primary monitor, but the primary monitor has not failed? We are back to the split network problem.

Again, I’m sure someone has thought about this and is selling a solution (Oracle?). But it isn’t an easy problem to solve and I’d be reluctant to let some automated process decide to take over without being sure that there are no missing transactions. The most I’d consider as a general solution is a front-end that routes requests to a read-only system if the main system appears to be unavailable.

James

On Feb 20, 2014, at 9:59 AM, BrunoBB <[hidden email]> wrote:
> [...]
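The front-end James describes could be sketched as a small routing rule. Prefer the primary, fall back to a read-only copy, and refuse writes while degraded so that two systems never accept diverging updates. Everything below is assumed for illustration. It presumes the front-end knows whether each backend is healthy, and it treats HTTP GET/HEAD as "reads". The backend names and functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    backend: str     # hypothetical backend name
    read_only: bool  # True while running degraded on the read-only copy


def choose_route(primary_up: bool, replica_up: bool) -> Optional[Route]:
    """Prefer the primary; fall back to the read-only copy; else nothing."""
    if primary_up:
        return Route("primary", read_only=False)
    if replica_up:
        return Route("replica", read_only=True)
    return None


def status_for(method: str, primary_up: bool, replica_up: bool) -> int:
    """HTTP status the front-end would give a request (sketch)."""
    route = choose_route(primary_up, replica_up)
    if route is None:
        return 503  # nothing available
    if route.read_only and method not in ("GET", "HEAD"):
        return 503  # refuse writes rather than let two systems diverge
    return 200  # forward to route.backend


print(status_for("GET", primary_up=False, replica_up=True))   # 200
print(status_for("POST", primary_up=False, replica_up=True))  # 503
```

The appeal of this design is that a wrong "primary is down" verdict costs some rejected writes rather than a transaction-log merge.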
Hi James,
Thanks again. You make very good points about this problem; I'm discussing it with the Java people here.

It seems that there is NO 100% automatic recovery from a database crash. So, after all, I think the “hot standby” is OK for our requirements.

Regards,
Bruno
James,
Let's suppose that a company has a VMware virtual datacenter with lots of Linux machines running on it. Let's say Node-1 is the master and Node-2 the slave, and Node-1 is copied (at the VMware level, maybe at the memory level) to Node-2. If Node-1 crashes, then Node-2 takes the requests.

If this is possible at the VMware level, would it need some GemStone configuration or not? (I do not know if it is possible; I will find out from a VMware administrator.) If it is possible, then we have "translated the problem" to the datacenter administrator :)

Regards,
Bruno
Bruno,

We plan on writing some example scripts for handling the mechanics of switching over to a standby stone, but anything beyond that is really dependent upon your hardware configuration and your definition of failure ... we'll share the script with you when we get one put together.

I believe that daemontools was mentioned because it can be used to detect that a process is no longer running ... it is very useful for "instantly _restarting_" gems, but I'm not sure that it could be used in this case unless you wanted to simply restart the stone process ...
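The "just restart it" approach that Dale mentions, and that James describes for a stone whose host is still healthy, boils down to a supervisor loop. The supervisor runs the process in the foreground and starts it again whenever it exits. The Python sketch below is only an illustration of that loop and not a replacement for daemontools. It assumes `cmd` stays in the foreground, because a command that forks into the background would look like an immediate exit. The actual command that starts a stone is deliberately left out.

```python
from __future__ import annotations

import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int, delay_s: float = 1.0) -> list[int]:
    """Run cmd in the foreground and restart it every time it exits.

    A real supervisor such as daemontools' supervise keeps going forever;
    max_restarts only exists so this sketch terminates. Returns the exit
    code of each run.
    """
    exit_codes: list[int] = []
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)  # blocks until the process exits
        exit_codes.append(result.returncode)
        if attempt < max_restarts:
            time.sleep(delay_s)  # pause so a crash loop does not spin the CPU
    return exit_codes


# Stand-in for a server process that keeps dying with exit code 3.
crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(supervise(crashing, max_restarts=2, delay_s=0))  # [3, 3, 3]
```

This only helps when restarting on the same host is the right answer. It does nothing for a dead host, which is where the standby discussion comes in.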
Regarding the virtual node copying technology ... I don't think the technology has reached the point where such copying is feasible:) For a small enough installation it might work, but if you have a sizable amount of disk and memory that needs to be copied and synchronized atomically at each write??? I think that running processes can be migrated from one machine to another, but IIRC the processing is stopped for the duration of the copy ... so not practical as a hot standby option (yet).
Dale

On Thu, Feb 20, 2014 at 1:59 PM, BrunoBB <[hidden email]> wrote:
> James,
Bruno,
I’m not that familiar with VMware fail-over, but a quick Google search took me to http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.availability.doc_41/c_useha_works.html.

One way I’d approach this is to look at separating the redundant hosts (CPU/RAM) from the redundant disks. If you had a disk system (with RAID, etc.) that could be switched easily between hosts, then when a machine failed you could restart on another machine that would use the exact same extents and transaction logs as the original. This would avoid GemStone altogether when it comes to the “hot standby” issue. The only delay would be in replaying the transactions since the last checkpoint, and checkpoint frequency is configurable. See http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.storage.doc%2FGUID-52DC7277-5321-4BB5-86B4-D73D258F6529.html for a discussion of “Sharing a VMFS Datastore Across Hosts.”

Again, as you suggest, with this approach it is no longer a GemStone DBA problem but a datacenter administrator problem.

On Feb 20, 2014, at 1:59 PM, BrunoBB <[hidden email]> wrote:
> James,
Just a nit...
If you have to replay tranlogs to bring the stone up-to-date, that is more of a warm standby than hot standby ... the hot standby is designed to keep the standby stone within a few transactions of the master ...
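The hot/warm distinction, combined with James's earlier question ("How many lost transactions?"), can be written down as an explicit policy. The sketch below assumes you can obtain a commit counter from the primary's surviving transaction logs and from the standby. Both counters and all function names are hypothetical and are not GemStone APIs. The acceptable-loss threshold is a business decision, not something the code can choose.

```python
from __future__ import annotations


def standby_lag(primary_last_commit: int, standby_last_applied: int) -> int:
    """Commits the primary recorded that the standby has not applied yet."""
    return max(0, primary_last_commit - standby_last_applied)


def failover_decision(primary_last_commit: int | None,
                      standby_last_applied: int,
                      max_lost: int) -> str:
    """Pick a failover action from how far behind the standby is.

    primary_last_commit is None when the primary's transaction logs cannot
    be read, so the possible loss cannot be bounded.
    """
    if primary_last_commit is None:
        return "ask-a-human"
    if standby_lag(primary_last_commit, standby_last_applied) <= max_lost:
        return "promote"  # hot-standby case: loss is within policy
    return "replay-missing-logs-first"  # warm-standby style catch-up


print(failover_decision(100, 98, max_lost=5))   # promote
print(failover_decision(100, 90, max_lost=5))   # replay-missing-logs-first
print(failover_decision(None, 90, max_lost=5))  # ask-a-human
```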
Prior to the introduction of the hot standby functionality, warm standby was the only option and was/is used by a number of commercial customers.

Dale

On Thu, Feb 20, 2014 at 3:08 PM, James Foster <[hidden email]> wrote: