Recently, customers asked me about best practices for disaster recovery, failover and high-availability (HA) implementation in their CLM deployments.
This topic is particularly relevant now that the Rational Solution for Collaborative Lifecycle Management 2012 provides a new clustering feature. As you are certainly curious about it, I suggest you have a look at the references at the end of this post.
In enterprise organizations, a Business Impact Analysis is usually performed: critical functions/activities are identified, and requirements in terms of:
- Recovery Time Objective (RTO)
- Recovery Point Objective (RPO)
are set as well.
But how can you make sure your CLM deployment will comply with these “control points”?
Which deployment architectures could be used?
To help you along that path, I’ve gathered some considerations and technical aspects you should be aware of:
A backup strategy first!
Always keep in mind the need to put a backup strategy in place.
It is basically the starting point of any disaster/failover recovery strategy.
In practice, check the online backup capabilities of your enterprise database vendor and use them to produce frequent backups.
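You can check whether a backup schedule is frequent enough against the Recovery Point Objective (RPO), which is the maximum acceptable data loss. In the worst case, a failure loses everything written since the last completed backup. Below is a minimal Python sketch of that check. The function names are hypothetical. The sketch assumes a fixed daily schedule, and it ignores both backup duration and any transaction-log archiving your database may offer (which can shrink the loss window considerably).

```python
from __future__ import annotations

from datetime import time, timedelta

DAY = timedelta(hours=24)


def _as_offset(t: time) -> timedelta:
    # Convert a wall-clock backup time into an offset from midnight.
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)


def worst_case_data_loss(schedule: list[time]) -> timedelta:
    """Largest gap between consecutive backups in a daily schedule.

    A failure just before the next backup completes loses everything written
    since the previous one, so the largest gap bounds the data loss
    (assumption: backup duration and log archiving are ignored).
    """
    if not schedule:
        raise ValueError("schedule must contain at least one backup time")
    offsets = sorted(_as_offset(t) for t in schedule)
    # Gap from the last backup of the day to the first one of the next day.
    gaps = [offsets[0] + DAY - offsets[-1]]
    # Gaps between consecutive backups within the same day.
    gaps += [b - a for a, b in zip(offsets, offsets[1:])]
    return max(gaps)


def meets_rpo(schedule: list[time], rpo: timedelta) -> bool:
    """True if the worst-case data loss of the schedule fits within the RPO."""
    return worst_case_data_loss(schedule) <= rpo


if __name__ == "__main__":
    schedule = [time(2, 0), time(14, 0)]
    print(worst_case_data_loss(schedule))           # 12:00:00
    print(meets_rpo(schedule, timedelta(hours=4)))  # False
```

With backups at 02:00 and 14:00, up to 12 hours of data can be lost. That is far beyond a 4-hour RPO, and gaps like this are why frequent online backups matter.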
Drop the idea of using the repotool -import/-export commands for backup purposes:
- they’re not ideal, as the server must be taken offline (which is not very handy for regularly scheduled backups), and even for offline backups, the backup and restore facilities provided by an enterprise database are generally preferable;
- they were not designed for this purpose, but rather for database vendor migration scenarios (and, more rarely, for upgrade scenarios, e.g. when upgrading from CLM 2.x to CLM 2011).
The safe path here is to browse the CLM 2012 InfoCenter for backup techniques, starting with the topic Backing up and restoring other supported databases.
I also recommend this article from Ralph Schoon: Backup the Rational solution for CLM. It provides valuable insights on what exactly to back up and in which order.
You said high availability? Here’s a simple requirement:
WebSphere Application Server must be used for any HA configuration, whether failover is manual or automatic.
A CLM history: from manual failover to CLM 2012 clustering…
Act I: in the v2.x days…
A manual failover strategy was available for RTC only. The approach is described in Deploying RTC 2.0 on WAS for HA using idle standby:
- the backup server is running but isn’t connected to the database;
- a couple of other tweaks are required to ensure a safe failover.
Act II: in CLM 2011
The manual failover strategy is extended to all CLM applications. Scott Rich’s article High availability and disaster recovery for Rational’s CALM products details both the cold standby and the idle/hot standby solutions. Unless you have a really aggressive RTO, the cold standby:
- where the backup server is started but
- where the Jazz applications (JTS/CCM/RM/QM) are stopped
appears to be a simpler and safer solution than the idle/hot standby, because:
- the backup server needs fewer configuration tweaks and
- the chances of a request accidentally activating the backup server are reduced.
Note that the backup server installation is not intended to run for extended periods in place of the primary server, which should be brought back online as soon as possible.
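With a cold standby, switching over is an operator decision. The automation worth having is therefore a watchdog that tells the operator when the primary is really down, as opposed to a transient blip. Below is a minimal Python sketch of such a watchdog. It assumes a hypothetical `probe` callable (for instance, an HTTP request to the primary's public URL) and an arbitrary threshold of three consecutive failures. It deliberately does not start the standby or call any WebSphere or Jazz API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailoverMonitor:
    """Hypothetical watchdog for a cold standby setup.

    Failover stays manual: the monitor only reports when the primary has
    failed enough consecutive probes for an operator to act.
    """

    probe: Callable[[], bool]   # returns True if the primary answers
    failure_threshold: int = 3  # consecutive failures before alerting
    consecutive_failures: int = field(default=0, init=False)

    def check(self) -> bool:
        """Run one probe; return True when the operator should fail over."""
        if self.probe():
            # Any successful probe resets the counter (a transient blip).
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


if __name__ == "__main__":
    # Simulated probe results: a transient blip, then a real outage.
    results = iter([True, False, True, False, False, False])
    monitor = FailoverMonitor(probe=lambda: next(results))
    for _ in range(6):
        if monitor.check():
            print("Primary down: start the Jazz applications on the standby")
```

Requiring several consecutive failures avoids switching over on a network glitch. This matters because the standby is not meant to run for long, and an accidental activation is exactly what the cold standby tries to avoid.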
For InfoCenter references, check:
- Basic high-availability requirements
- Setting up a basic high-availability configuration
- Planning an idle standby deployment for crash recovery
Also check this guide for additional failure scenarios (and guidance) related to virtualization, disk storage and networking.
Act III: the clustering feature in CLM 2012
This new feature keeps the server operating, with no manual intervention, in case of a hardware/software failure on any of the server nodes. In other words, as long as one node is alive, the application keeps responding.
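You can quantify the “as long as one node is alive” property with a back-of-the-envelope formula. If each of n nodes is up with probability a, independently of the others, the cluster is up with probability 1 - (1 - a)^n. Below is a minimal Python sketch of that calculation. The independence assumption and the example figures are illustrative and do not come from the CLM documentation.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up.

    Assumes node failures are independent, which shared hardware,
    storage or network can easily break in practice.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    # The cluster is down only if every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{cluster_availability(0.99, n):.6f}")
```

With 99% availability per node, two nodes give about 99.99% and three about 99.9999%. This only covers the application nodes. The database and the load balancer in front of the cluster are shared, so they need their own HA setup, which is why the system requirements list supported HA databases and load balancers.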
High Availability with Rational solution for Collaborative Lifecycle Management 2012 Clustering gives an overall picture of the technical implications of adopting the clustering solution.
Note that this new feature:
- requires an additional key from Rational Support;
- provides horizontal scalability, but its primary focus is on high availability only. In other words, performance scalability is out of its scope: in practice, expect a 3-node configuration to perform similarly to a non-HA-enabled deployment.
Related information (CLM 2012 System Requirements for clustering):
- List of supported Application Servers
- List of supported High Availability Databases
- List of supported Load Balancers