Cloud environments grow and evolve. Their dynamism gives organizations unprecedented flexibility, but managing that dynamism can be a challenge.
Managing performance is necessary, however, to ensure that your cloud environment grows and evolves in lockstep with your organization’s needs. That’s where cloud performance management comes in.
A Holistic Approach to Managing an Evolving Digital Environment
Cloud performance management sounds like a loaded term, but it isn’t. It simply refers to the processes and metrics you use to see how well your cloud architecture is meeting your business needs.
The key is to approach cloud management holistically. It’s not simply a matter of gauging the performance of individual applications or processes (though this is important). Rather, the goal is to understand whether your cloud architecture has the right balance of scalability and elasticity to keep your operations nimble and responsive.
Within this framework, there is certainly room for an application-level view of performance. That’s what application performance management (APM) does, and it is a crucial part of holistic cloud performance management. APM metrics let organizations understand performance from the end user’s perspective, rather than relying on second- or third-party data from the host or network, as the team at Lightstep writes. This delivers significantly more value.
“You’re able to answer questions about specific page load times and database queries in a way that you simply can’t with traditional host-based monitoring,” they explain. “This information can be invaluable in trying to track down bugs in your software, or in understanding how your application performs under load.”
However, a holistic perspective also recognizes that the technology itself is a secondary concern, because the individual applications and processes are changeable, as Deloitte Chief Cloud Strategy Officer David Linthicum notes. You manage the system those technologies create and the environment in which they operate, then update or tweak the individual tools and processes as necessary.
What Are the Right Metrics to Track?
It is important to track specific metrics to gauge the total performance of your cloud systems. Below are five metrics that you can gather at the API gateway level to get visibility into how well your cloud environment is performing against your own needs and expectations.
Request Rate

The request rate is a measure of how much traffic your application receives. This can be very telling in and of itself:
- If an application gets a sustained spike of traffic, it may need to be scaled.
- If traffic on an application suddenly dries up, this could be a sign that something has gone wrong.
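As a rough illustration of how those two signals might be automated, here is a minimal sketch in Python. The window size and thresholds are arbitrary placeholders, not recommendations; any real alerting rule would be tuned to your own traffic patterns.

```python
def scaling_signal(request_counts, window=5, spike_factor=2.0, idle_floor=1):
    """Classify recent per-minute request counts as a scaling signal.

    request_counts: per-minute request totals, oldest first.
    spike_factor and idle_floor are illustrative thresholds only.
    """
    recent = request_counts[-window:]
    # Baseline is the average of everything before the recent window.
    baseline = sum(request_counts[:-window]) / max(len(request_counts) - window, 1)
    avg_recent = sum(recent) / len(recent)

    if avg_recent >= spike_factor * baseline:
        return "scale-up"       # sustained spike: consider adding capacity
    if avg_recent <= idle_floor:
        return "investigate"    # traffic dried up: something may be wrong
    return "steady"
```

For example, ten quiet minutes followed by five minutes at triple the traffic would flag a scale-up, while a sudden drop to zero would flag an investigation.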
Application Availability

Application availability measures how often an application is available to use. Ideally, that figure will be greater than 99 percent.
However, note that availability might not tell a complete story about an application’s performance. Joe Hertvik at BMC illustrates this with an example. “Let’s say you are a telecom provider with 99.9% weekly availability (.1% or 10 minutes of downtime a week),” he writes.
“But that .1% downtime occurs during high usage events, such as a record stock trading day, the live finale of a mega-popular TV show, or Amazon Prime day. You reached your availability targets, but your customers are unhappy.”
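Hertvik’s numbers can be sanity-checked with a one-line availability calculation. The weekly period here is an assumption matching his example:

```python
def availability_pct(downtime_minutes, period_minutes=7 * 24 * 60):
    """Availability percentage over a period, given total downtime in minutes."""
    return 100.0 * (1 - downtime_minutes / period_minutes)

# Roughly 10 minutes of weekly downtime yields about 99.9% availability.
```

The point of his example stands: the formula is indifferent to *when* those 10 minutes fall, which is exactly why availability alone can mislead.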
User Satisfaction

User satisfaction is typically measured using the application performance index, also known as the Apdex score. Specifically, the score measures the response time of a request or transaction against a threshold for how long that action is expected to take.
“Those transactions are then bucketed into satisfied (fast), tolerating (sluggish), too slow, and failed requests,” explains Stackify’s Matt Watson. A score from zero to one is calculated by a formula that adds the satisfied requests with half of the tolerated requests, then divides that figure by the total number of requests.
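That formula is compact enough to sketch directly. The 4× threshold used below for the “tolerating” bucket follows the standard Apdex definition; the sample response times in the usage note are illustrative.

```python
def apdex(response_times, threshold):
    """Apdex score: (satisfied + tolerating / 2) / total requests.

    satisfied:  response time <= threshold
    tolerating: threshold < response time <= 4 * threshold
    Anything slower (or failed) contributes zero weight.
    """
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)
```

With a 1-second threshold, two fast responses, one sluggish one, and one very slow one (0.5s, 0.5s, 1.5s, 5.0s) yield a score of 0.625.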
Error Rate

The error rate tracks how many application requests end in failure.
Failures can mean different things depending on the application, writes Splunk’s Chris Tozzi. “A 404 response in a Web app would be an error. An application instance or process that exits with an error code, or simply logs a failure to answer a given request, would also be an error. In some languages, like Java, you may want to consider exceptions a form of error too.”
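Tozzi’s point is that once you have decided what counts as a failure for your application, the metric itself is simple. Here is a minimal sketch using HTTP status classes as the (purely illustrative) definition of failure:

```python
def error_rate(status_codes):
    """Fraction of responses that indicate failure.

    What counts as an error is application-specific; treating
    4xx and 5xx HTTP status codes as failures is one common choice.
    """
    errors = sum(1 for code in status_codes if code >= 400)
    return errors / len(status_codes)
```

A batch of responses `[200, 200, 404, 500]` would give an error rate of 0.5; your own definition might also fold in logged failures or uncaught exceptions.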
CPU, Memory and Other Resource Metrics
It is important to measure CPU usage and other server-related metrics, even if you don’t host applications on your own servers. You can, and should, automate this kind of resource monitoring so that the system can provision infrastructure automatically whenever there is a need to scale up or down.
“Many people think of monitoring as being alerted about a problem and guided to the issue source to fix it,” says Todd Kindsfather, former product manager at IBM and current OEM product manager at Sirius XM Radio. “But another motivation for monitoring is to proactively avoid those problems in the first place.”
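The simplest form of that proactive automation is a threshold check on recent utilization. The 80 and 20 percent thresholds below are placeholders; real autoscalers layer on cooldown periods and multiple signals.

```python
def scale_decision(cpu_samples, high=0.80, low=0.20):
    """Proactive scaling check from recent CPU utilization samples (0..1).

    Thresholds are illustrative defaults, not recommendations.
    """
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > high:
        return "scale-up"      # act before saturation becomes an outage
    if avg < low:
        return "scale-down"    # reclaim idle capacity to control cost
    return "hold"
```

This is the sense in which monitoring avoids problems rather than merely reporting them: capacity is adjusted before users ever see degraded performance.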
Get the Most From Your Cloud
Let’s take this view a level up now.
While the metrics above will provide valuable insights, you also need processes in place that let you observe and monitor cloud traffic and all of your integrated systems. When you have this kind of dashboard view of the whole environment, you have visibility into things like data movement, real-time service availability and how each of your services works with the others.
All of this gives you the ability to adjust your cloud infrastructure at will so that you can constantly be striking a balance between technology costs and business needs. This is an ongoing, ever-changing process because your cloud environment is itself ever-changing.
As David Linthicum at Deloitte notes, cloud performance management is “about success in ongoing operations.”