Mission Critical: Cloud

Recovery In the Cloud, Part 2 – What CIOs Should Ask

Ram Shanmugan, our Senior Director of Product Management for Recovery Services, offered the important points below in a recent interview with Smart Business Philly.  –Carl M.

Evaluating cloud providers is time-consuming and can be a nerve-racking task.  You don’t do it often enough to become expert at it, and sometimes it’s hard to separate reality from hype.  Here are a few questions that go to the crux of most CIO concerns.

  1. Does the  provider offer meaningful service level guarantees for recovery of mission-critical applications?  Can it reliably recover mission-critical applications in the wake of failure?
  2. Does it support heterogeneous computing platforms (e.g., Windows, Linux) and hybrid architectures that meet the recovery needs of the entire IT portfolio?
  3. Does the staff have hands-on disaster recovery experience?  Has it recovered from a disaster?  Does it understand the entire disaster recovery lifecycle?  Can it provide audit-ready test reports?
  4. Does it provide options for high availability as well as for less crucial applications in a heterogeneous environment?  More specifically, can this partner support a broad portfolio of Recovery Point Objectives (i.e., for each application, the amount of data loss, measured in time, your business can sustain after a disaster) and Recovery Time Objectives (i.e., how quickly each application or process must be restored, in priority order, after a disaster)?
  5. What is the range of options supported for moving data to the cloud?  Does it use monitoring and automation tools to ensure rapid and effective response to failures?
  6. Can the cloud partner handle your current and future needs?  Can it expand and contract on demand, handle sudden growth or support large amounts of application data?
  7. Can clients pay as they go?
  8. Does the provider offer multiple levels of security and service options?  If some data is too sensitive for the cloud, will the provider use a private cloud for that data but use a shared cloud for everything else?

One size does not fit all, so cloud partners should offer a range of private, hybrid and physical environments to make sure your data is secure and can be recovered after a disaster.

What combination of shared, hybrid and private cloud makes the best economic sense for your company?

Visit our Cloud Solutions Center for videos, white papers and case studies about SunGard’s Enterprise Cloud Services.

Recovery in the Cloud – Part I, CEO Decision Drivers

Ram Shanmugan, our Senior Director of Product Management for Recovery Services, was recently interviewed by Smart Business Philly magazine.  Below are some of the important points he discussed.  We’ll have more next week.  – Carl M.

“Weathering a storm” is more than just an off-hand comment these days. The U.S. experienced eight disasters costing over $1B in the first six months of 2011.  Few areas of the U.S. were spared the business complications caused by tornadoes, blizzards, wildfires and floods.

Planning for erratic weather can be tricky.  Of course, you want secure data, redundant infrastructure and business continuity processes, but balancing those needs against the needs for revenue-generating IT projects is difficult.

Fortunately, “recovery in the  cloud” offers a cost-effective, reliable option.  It lets you formulate the right availability service for your applications, from mission-critical to important but infrequently used applications.

Four elements drive the decision to move to a cloud-based recovery service:

  1. Cost savings.  The ability to fulfill recovery needs and lower costs is the most significant driver.
  2. RPO/RTO.  The recovery point objectives (how much data, measured in time, you can afford to lose) and the recovery time objectives (how quickly the application must be back in service) determine the level of resources you need to avoid serious impact to your business.
  3. Reliability. The true value of a recovery environment comes during a time of disaster, and managed cloud-based solutions offer higher reliability in recovery of mission-critical applications than do in-house solutions.
  4. Skilled Resources.  In-house recovery solutions require an investment in specialized skills to support the recovery infrastructure.  Cloud-based recovery eliminates that need.

Can your IT department recover from an outage without incurring emergency resources and costs?

The Cloud and the Availability Continuum – PART 2

Like dedicated hosting, cloud computing has to address availability.  Continued cloud outages, and the corresponding publicity, remind us of the importance of resiliency and availability.  One of the major benefits of cloud computing is scalability and efficiency of multi-tenant infrastructure.  However, even cloud infrastructures have to run in a physical data center somewhere, bringing us back to the critical nature of infrastructure availability.

Fortunately, the same availability you are accustomed to as part of a dedicated environment can be found in cloud computing.  Availability can be viewed in a continuum that ranges from high availability to failover and recovery, with many nuances in-between.  This continuum of availability enables clouds to fulfill enterprise application and business needs at many different price points.

Platform Resiliency for Continuous Uptime

The first area to address availability is the resiliency of the platform itself.  Businesses requiring enterprise-class infrastructure need to look under the hood to determine how the infrastructure is architected and how resiliency is addressed.  A highly resilient environment should automatically detect the failure of a system component (whether it is a server, a network, a full blade or the VM) and quickly shift to a redundant component in order to keep the application running in the current site.
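
As a rough illustration of that detect-and-shift behavior, here is a minimal Python sketch. The Component class, the select_active function and the blade names are hypothetical, invented for this example, and the health flag stands in for whatever monitoring a real platform uses.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A hypothetical infrastructure component (server, blade, VM, network path)."""
    name: str
    healthy: bool = True


def select_active(components):
    """Return the first healthy component, in priority order.

    Raises RuntimeError when no redundant component is left at this site,
    which is the point where site-level failover or recovery would take over.
    """
    for component in components:
        if component.healthy:
            return component
    raise RuntimeError("no healthy component left at this site")


# Hypothetical pool: one primary blade and two redundant blades.
pool = [Component("blade-a"), Component("blade-b"), Component("blade-c")]

active = select_active(pool)                # blade-a serves the application
pool[0].healthy = False                     # monitoring marks blade-a as failed
active_after_failure = select_active(pool)  # the application shifts to blade-b
```

The application stays in the current site as long as at least one redundant component remains healthy; only when the pool is exhausted does the problem move further along the continuum.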

Failover

Failover is the capability to switch to a redundant or standby computer server, system, or network upon the failure or interruption of the primary environment.  Cloud computing has allowed failover practices to become less reliant on physical hardware and therefore more available and less costly.  Service providers vary in the type of failover they provide as well as the time to respond, depending on the customers’ RPO and RTO needs.

A failover, or warm failover, can be used for applications that can tolerate slightly less than real-time recovery (e.g., hours vs. seconds).  In warm failover, a second site stands ready to be activated and made current as quickly as required.  The time to fail over depends on the customer’s recovery time objective.  Sometimes the options include bringing the secondary site online using a previous copy of the primary site.  Usually the copy is from the previous day, but it can be older depending on the business need.

High Availability for Mission-critical Apps

High availability addresses mission-critical production systems that require immediate, continuous, 24/7 access to data.  More technically, it means data must be duplicated at another location, usually in a different geographic area.   Essentially you are renting resources at one location and identical resources at another location, so costs are higher.

The communication method used between the systems also affects availability and costs.  Synchronous, near-real-time communication sends data from the primary system to the secondary system immediately.  The secondary system mirrors the first and is ready to go into operation if the first system fails for any reason.

Asynchronous communication is where data waits in a queue until the second system is free to accept it, so by its nature it is less real-time.  Again, the business need determines which communication method is better.

Recovery for Availability

Recovery represents the other end of the availability continuum.  Cloud computing is changing the disaster recovery landscape.  The scalability and flexibility of cloud computing platforms enable higher application availability.  Recovery can be used as a back-up to a production system already in the cloud or as a recovery solution to another data center.  Further, the back-up can be on-line, ready to operate at the cloud site (like a warm failover) or off-line at a cloud site, as done in traditional recovery scenarios, since the cloud is a cost-effective recovery site for legacy systems.

As is obvious, different applications require different levels of availability, and applications should not be shoehorned into a “one size fits all” cloud environment.  The best cloud providers will work closely with you to understand the business requirements of your applications and devise the appropriate level of availability for each application you want to move to the cloud, along with any need for cloud resources to facilitate recovery of applications you do not move to the cloud.
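
To make the idea of matching each application to a point on the continuum concrete, here is a small Python sketch. The App class, the example applications and every threshold are illustrative assumptions, not provider commitments; in practice the cut-offs would come out of the business requirements discussion described above.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    rto_hours: float  # recovery time objective: maximum tolerable downtime
    rpo_hours: float  # recovery point objective: maximum tolerable data loss, as time


def availability_tier(app):
    """Map an application's objectives to a point on the availability continuum.

    All thresholds below are assumptions chosen for illustration only.
    """
    if app.rto_hours < 1 and app.rpo_hours < 1:
        return "high availability"
    if app.rto_hours <= 24 and app.rpo_hours < 24:
        return "warm failover"
    if app.rto_hours <= 24:
        # A copy from the previous day is acceptable for this application.
        return "failover from a previous copy"
    return "recovery"


# Hypothetical portfolio.
portfolio = [
    App("order-entry", rto_hours=0.1, rpo_hours=0.0),
    App("reporting", rto_hours=8, rpo_hours=4),
    App("intranet-wiki", rto_hours=24, rpo_hours=24),
    App("archive", rto_hours=72, rpo_hours=24),
]

tiers = {app.name: availability_tier(app) for app in portfolio}
```

Running the portfolio through a table like this early on is one way to see how many applications genuinely need the most expensive tier.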

Click here to view the SunGard Recover2Cloud Overview

Should you Negotiate your SLA?

Solutions Marketing Manager Janel Ryan discusses service level agreements today. –  Carl M

Much has been written in the past few months about negotiating a better Service Level Agreement (SLA) with your cloud vendor.  Before you follow that advice, you may want to consider a few key points.

Be Realistic

First, if you are going to negotiate with your cloud provider, you have to be realistic about the performance you need and you have to be prepared to pay for those services. No vendor is going to take on more responsibility without charging more, no matter how hard you press.

Review the Architecture

Second, you’ll need to determine whether the vendor is capable of providing the service or performance level you are requesting.  Recognize that the services offered by the provider are usually governed by the cloud’s architecture and how it is implemented.  A cloud architected for inexpensive IaaS and quick provisioning may not use the most agile, efficient and self-managing software for storage, network and hypervisor.

Ask questions like: What uptime is the service engineered for?  What exclusions would prevent you from obtaining SLA remedies?  Does the provider adhere to industry standards, like ITIL for service management, ISO 9001:2008 for business processes, and ISO 20000-1 for continuous improvement?  Do its internal procedures adhere to COBIT standards for governance?

Consider Walking Away

Finally and most importantly, if a cloud provider does not offer the SLA commitments you want and need, you are probably talking to the wrong provider.  Providers know what they do best and they know what is not in place.  If you need additional services, redundancy or a geographically distributed architecture and the vendor does not offer it, it is time to walk away.  Pushing a vendor out of its comfort zone adds more risk to an SLA, rather than adding more trust and confidence.

The clearer you are about your company’s requirements for latency, redundancy, recovery, security and compliance, customer support, and technical support, the easier it will be for you to select a cloud provider that can become a trusted partner.  Ask for a copy of the SLA early in your conversation with a vendor.  It could save you considerable time.

What improvements in service and support would benefit your company when it moves to a cloud?

The Cloud and its Continuum of Availability – PART 1

One of the major benefits of cloud computing is availability, and that availability comes in a continuum that ranges from high availability to high resiliency, warm failover, failover and recovery, with many nuances in-between.  This continuum of availability enables clouds to fulfill application and business needs at many different price points.

High Availability for Mission-critical Apps

High availability is used for mission-critical production systems that require immediate, continuous, 24/7 access to data.  More technically, it means data must be duplicated at another location, usually in a different geographic area.   Essentially you are renting resources at one location and identical resources at another location, so costs are higher.

The communication method used between the systems also affects availability and costs.  Synchronous communication replicates the data in near real-time.  That is, data from the first system immediately updates the second system.  The second system mirrors the first and is ready to go into operation if the first system fails for any reason.

Asynchronous communication sends data from the first system to the second, where it waits in a queue until the second system is free to accept it.  Again, the business need determines which communication method is better.
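
The difference between the two methods can be sketched with a toy Python model. The ReplicatedStore class and its method names are invented for this illustration; real replication products work at the storage or database layer and are far more involved.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/secondary pair illustrating the two communication methods."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.queue = deque()  # writes the secondary has not yet accepted

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # The secondary is updated as part of the same write.
            self.secondary[key] = value
        else:
            # The write waits in a queue until the secondary accepts it.
            self.queue.append((key, value))

    def drain(self):
        """Apply queued writes to the secondary (asynchronous mode only)."""
        while self.queue:
            key, value = self.queue.popleft()
            self.secondary[key] = value

    def unreplicated_writes(self):
        """Number of writes that would be lost if the primary failed right now."""
        return len(self.queue)


sync_store = ReplicatedStore(synchronous=True)
sync_store.write("order-1", "paid")
sync_lag = sync_store.unreplicated_writes()            # nothing is waiting

async_store = ReplicatedStore(synchronous=False)
async_store.write("order-1", "paid")
async_lag_before = async_store.unreplicated_writes()   # one write is waiting
async_store.drain()
async_lag_after = async_store.unreplicated_writes()    # the secondary has caught up
```

Whatever is still in the queue when the first system fails is the data at risk, which is why the choice between the two methods follows from how much data loss the business can tolerate.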

High Resiliency for Continuous Uptime

High resiliency is used for applications that do not require high availability.  In a highly resilient environment, automatic systems detect the failure of a system component—whether it is a server, a full blade or the VM software—to quickly shift to an alternate component to keep the application running in the current site.

Warm failover and failover are used for less critical applications.  In warm failover, a second site stands ready to be activated and made current as quickly as possible.  In failover, a second site is brought up using a previous copy of the primary site.  Usually the copy is from the previous day, but it can be older depending on the business need.

Recovery for Back-up

Recovery represents the other end of the continuum.  Recovery is used as a back-up to a production system already in the cloud or as a back-up to another data center.  Further, the back-up can be on-line, ready to operate at the cloud site (like a warm failover) or off-line at a cloud site, as done in traditional recovery scenarios, since the cloud is a cost-effective recovery site for legacy systems.

As is obvious, different applications require different levels of availability, and applications should not be shoehorned into a “one size fits all” cloud environment.  The best cloud providers will work closely with you to understand the importance of your applications to your business and devise the appropriate level of availability for each application you move to the cloud, along with any need for cloud resources to facilitate recovery of applications you do not move to the cloud.

How does the continuum of availability fit with your move to the cloud?

Multi-site Options Allay High Availability, Recovery and Interconnectivity Concerns

Organizations moving essential business applications to the cloud are often concerned that they will gain cost-efficiency and on-demand capacity but lose application availability.  Given the importance of production applications to the continuity of your business, those concerns are legitimate.

Fortunately, new capabilities being added to our Enterprise Cloud Services address those concerns.  Today, we are making high availability (at the 99.95 percent level) part of our Enterprise Cloud Services and including that commitment in our standard Service Level Agreement (SLA).  In doing so, we are going beyond the norms for the cloud computing industry.
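
For a sense of scale, the short Python calculation below converts an availability level into permitted downtime. It assumes the 99.95 figure is a percentage and that downtime is counted across a whole calendar period; the function name and the chosen periods are illustrative, and the SLA itself governs how downtime is actually measured and what is excluded.

```python
def allowed_downtime_minutes(availability_percent, period_hours):
    """Minutes of downtime permitted over a period at a given availability level."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_hours * 60


per_30_day_month = allowed_downtime_minutes(99.95, 30 * 24)   # about 21.6 minutes
per_365_day_year = allowed_downtime_minutes(99.95, 365 * 24)  # about 262.8 minutes, or 4.38 hours
```

Under those assumptions, 99.95 percent leaves room for a little over 20 minutes of downtime in a 30-day month.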

Our high availability commitment is possible because of enhancements to our fully redundant architecture.  It now utilizes two geographically diverse production sites integrated with recovery capabilities.  These enhancements afford seamless cloud services continuity and greater availability assurances for your applications.

In addition, we have added a new option for cloud applications that do not require high availability: Managed Multi-Site Recovery.  With this option, a secondary cloud site becomes available for recovery within four hours of an outage at your primary cloud site.  That four-hour recovery time objective is backed by your SLA, too.

Because more and more organizations operate in the hybrid world of cloud, co-location and managed services, we are now offering the ability to interconnect applications running on our Enterprise Cloud Services with other environments hosted in our data center(s).  This connectivity can be done within the same site or between multiple sites.  That means data from your legacy environments can be shared easily with your cloud-based applications to maximize business value.

Finally, we now provide active management for Microsoft Exchange Server, Microsoft Active Directory and Hosted BlackBerry Services to reduce your IT administration burdens and help ensure production workloads are available.

SunGard Meets Las Vegas: Put Your Photography Skills to Work!!

Now that we have names for our mascots, Olivia Octopus and Alex Alligator,  we need tour guides to show them around Las Vegas! Take photos with Olivia and/or Alex around Sin City and post them to our Facebook Wall for a chance to win a MacBook Air® or… iPad® 2!  See video for details!

Visit SunGard Availability Services Facebook page for official terms and conditions.

Gartner Cites High Cost of Disaster Recovery Testing as a Critical Obstacle

At June’s Infrastructure Summit, Gartner’s John Morency went on the record about the high cost of recovery testing.  He remarked that most organizations with whom he speaks report averaging $30-40K per test. Some even estimated spending as much as $100K on a single disaster recovery test exercise.

That’s an enormous expense under any circumstance. With that type of cost, it’s no wonder that more organizations report putting off recovery test walk-throughs and testing incompletely, at “best effort” levels.

Also, when recovery testing is perceived at best as “insurance” against a “smoking hole” style of disaster, increasing the effort and resources put against it seems hard to justify.

The key to unlocking better recovery test practices requires a combination of things. First, organizations need to redefine the purpose of recovery testing – to include finding technology gaps and opportunities for process improvement in their production environment. Second, organizations can also benefit from a closer working relationship with their cloud recovery service provider. Demanding that the provider step up to assume responsibility for recovery – backed by contractual guarantees – is one way to make sure that recovery testing is a manageable burden assumed by the provider, at a predictable and manageable expense.

In fact, in most cases, taking this approach can substantially reduce an organization’s recovery testing costs. Elimination of staff travel and expenses to the recovery site, and time out of office, is one obvious saving. Additionally, reducing the guesswork – and relying on recovery experts – can not only reduce costs but also speed tests and increase the value of testing.

How can recovery tests benefit organizations in their production environment? These are some of the benefits which SunGard Recovery Services customers report experiencing first hand:

  • Finding previously unknown patches to apply in production environments as well as at the recovery site – improves applications performance and operations
  • Improving change management processes at the production site – asset and license management, change control, and related process improvement can improve customer service and improve IT budget control
  • Using runbooks created for recovery for smaller-scale recovery to overcome local outages – improves application availability
  • Redesigning protection strategies and technology implementations to better support recovery objectives – can streamline and lower the complexity and cost of protection systems

For more on how SunGard can assist your disaster recovery test efforts – and make them more valuable to your business – read about our latest Managed Recovery Program service.

For Recovery, Cloud Platforms Lower Cost and Improve Scalability

Cloud has received much industry attention in the last year. Some believe that cloud is a marketing fad. But others recognize that virtualization technologies when implemented as a cloud make fundamental changes in how applications can be designed, managed, maintained – and, most interestingly in terms of the Recovery Services line of business at SunGard, cloud also changes how applications can be recovered.

It is a truism by now, that cloud “is just another platform.” However, the differences in the platform are fundamental – with huge implications for the applications which run on them. ITIL best practices are also at work, changing the process by which applications are developed, tested, and provisioned in most organizations. The result seems to be more modular applications designed with “share-everything” resource utilization, to lower costs and improve efficiencies.

Even for traditional applications, however, cloud offers some exciting new recovery options. New recovery options include:

  • Shared tenancy with other organizations, to spread the cost of resources across more budgets – and lower the recovery platform costs for all
  • Receive faster, automated response to real-time fluctuations in capacity demand – across networks, compute and storage resources
  • Recover applications faster due to automation capabilities built into the cloud platform – capabilities which avoid human error and other delays
  • Transform traditional Capex into Opex monthly fees which are structured to be “pay-as-you-go” – which further reduces the pressure on IT budgets

These exciting new opportunities to ease recovery costs are helping more organizations to put effective disaster recovery in place for their businesses, by lowering the costs and increasing the benefits. Additionally, organizations are able to place more of their applications under sufficient recovery protection, by lowering the costs of recovery across every type of application.

However, a cloud platform does not solve everything for recovery. Some of the key challenges which are unchanged by cloud include:

  • The need for organizations to adopt modern data movement, to improve recovery points and recovery times and eliminate costs and risks associated with tape-dependent disaster recovery
  • The need for organizations to analyze application value and the impact of downtime to their businesses – so they can appropriately prioritize recovery spending and resources
  • The need for applications expertise built into recovery plans and procedures
  • The need for organizations to maintain and test their recovery plans and procedures, on a regular basis

What’s in a name? Could be an iPod Touch!

If you have been following us on Facebook or Twitter, you may have noticed that we have been searching for names for our alligator and octopus mascots.

Although we are no longer accepting submissions,  we will still need your help. Beginning Monday, August 15 through Friday, August 19, we will allow our Facebook fans to vote on the four finalists for alligator names and the four finalists for octopus names.

The participants whose names are chosen will each win one of two Apple 32GB iPod Touch media players. Winners will be notified by email, and then announced on Facebook by August 22. Click here for Terms and Conditions.

So…what are you still doing here?! Visit us on Facebook!