Leveraging BigData Analytics to Predict and Isolate Application Performance Management Problems

Monitoring complex business applications has always been difficult. As more applications move to the Cloud and dynamically provision and de-provision resources as demand grows and shrinks, the task has become harder still. Ideally, companies would like to identify the key application performance characteristics that are precursors to an application outage or significant slowdown. With dozens or even hundreds of machines, network switches, storage devices, hardware appliances, and hypervisors supporting a single business application, this can be daunting. Even a small enterprise customer with 5000 servers or virtual machines typically generates more than a terabyte of relevant performance and log data per day. This post covers some of the tools and techniques available for handling these complex problems.

Fortunately, new tools and techniques are available for analyzing this massive quantity of data. Log analytics tools can analyze system logs, middleware logs, and custom application logs to help you isolate problems in your complex business applications. Log analysis is an interesting area that has only recently begun to provide significant value to the business. Traditionally, log analytics provided a wealth of data about the patterns emerging across the logs and applications in an environment. It is certainly useful to know that 30% of your problems are caused by your Windows 2003 R2 systems, but that doesn’t necessarily help you determine the root cause of a problem. Newer tools integrate search capabilities with log analytics so that customers can quickly isolate the cause of a problem and find its solution. By integrating run books, data from product manuals, knowledge bases, and more, these tools put the solution at the operator’s fingertips. The operator can see the error message that is causing the problem and, in context, see search results across problem determination guides, Redbooks, run books, and more.
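To make the two ideas above concrete, here is a minimal Python sketch. It is only an illustration and not how any particular product works: the log records, the run book entries, and the function names `error_share_by_os` and `search_run_book` are all made up for this example, and the "search" is a naive word-overlap ranking rather than a real search engine.

```python
from collections import Counter

# Hypothetical, already-parsed log records: (host OS, message). Illustrative data only.
LOG_RECORDS = [
    ("Windows 2003 R2", "ERROR disk queue length exceeded"),
    ("Linux", "ERROR connection pool exhausted"),
    ("Windows 2003 R2", "ERROR disk queue length exceeded"),
    ("Linux", "INFO request served"),
]

# Hypothetical run book entries: title -> searchable text.
RUN_BOOK = {
    "Tuning the connection pool": "connection pool exhausted increase max connections",
    "Disk I/O troubleshooting": "disk queue length exceeded check storage latency",
}


def error_share_by_os(records):
    """Traditional pattern view: fraction of ERROR records contributed by each OS."""
    errors = Counter(os_name for os_name, msg in records if msg.startswith("ERROR"))
    total = sum(errors.values())
    if total == 0:
        return {}
    return {os_name: count / total for os_name, count in errors.items()}


def search_run_book(message, run_book):
    """In-context search: rank run book entries by words shared with the message."""
    words = set(message.lower().split())
    scored = [
        (len(words & set(text.lower().split())), title)
        for title, text in run_book.items()
    ]
    # Highest overlap first; entries with no shared words are dropped.
    return [title for score, title in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    print(error_share_by_os(LOG_RECORDS))
    print(search_run_book("ERROR connection pool exhausted", RUN_BOOK))
```

The first function gives the "which systems cause most of my errors" view, while the second shows why pairing an error message with searchable documentation gets the operator closer to a fix.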

A second application of BigData Analytics is the use of streaming analytics to analyze performance metrics in real time. Current technologies allow organizations to stream and analyze massive amounts of data, looking for patterns and anomalies. After observing the business application for a period of time, the BigData analytics tools can establish baselines for normal behavior. At its simplest, streaming analytics can identify when a specific metric is behaving abnormally for that specific time of day or day of week. That’s great, but some of the current tools can do much more. They can automatically identify the relationships between metrics on different servers and establish a set of normal business application behaviors. For example, when transaction rates through an HTTP server increase, there may be a corresponding degradation in response time and a corresponding increase in KPIs on the messaging system, the J2EE server, and the database server. Those patterns are learned, and when one of the key application components starts to misbehave, customers can be alerted to the abnormal behavior. In addition, the products can detect the early characteristics of a problem that will ultimately cause an outage. In some cases, these patterns can be detected hours or even days before the problem occurs.
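The simplest case described above, a metric that is abnormal for its time of day and day of week, can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of any product: the function names `build_baseline` and `is_anomalous`, the hourly bucketing, and the three-standard-deviation threshold are all choices made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def build_baseline(samples):
    """Learn a per-(weekday, hour) baseline from (timestamp, value) samples.

    Returns {(weekday, hour): (mean, standard deviation)}. Buckets with
    fewer than two samples are skipped because no spread can be estimated.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {key: (mean(vals), stdev(vals)) for key, vals in buckets.items() if len(vals) >= 2}


def is_anomalous(baseline, ts, value, threshold=3.0):
    """Flag a value that is more than `threshold` deviations from its bucket's mean."""
    key = (ts.weekday(), ts.hour)
    if key not in baseline:
        return False  # no history for this time slot, so no judgment is made
    mu, sigma = baseline[key]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    # Five Mondays at 09:00 with made-up transaction rates.
    first_monday = datetime(2024, 1, 1, 9, 0)
    history = [
        (first_monday + timedelta(weeks=i), rate)
        for i, rate in enumerate([100, 102, 98, 101, 99])
    ]
    baseline = build_baseline(history)
    next_monday = first_monday + timedelta(weeks=5)
    print(is_anomalous(baseline, next_monday, 101))  # within the normal range
    print(is_anomalous(baseline, next_monday, 130))  # far outside the normal range
```

Keying the baseline on both weekday and hour is what lets the same value count as normal on a Monday morning and abnormal on a Sunday night.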

Using Predictive Analytics to Detect Anomalies

As seen in the picture above, the orange line, which represents response time, is normally low, even when other metrics such as network performance, CPU utilization, and page faults are increasing and decreasing in a fairly regular pattern. However, about two thirds of the way through the time series, response time spikes suddenly even though the other performance KPIs are closely following their normal behavior.
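One simple way to catch this kind of anomaly is to learn how response time normally relates to another metric and then flag points that break the relationship. The Python sketch below fits a straight line between CPU utilization and response time and checks the residual. It is a simplified stand-in for what commercial tools do across many metrics at once; the function names, the sample numbers, and the three-sigma threshold are assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def learn_relationship(cpu, response):
    """Learn the normal CPU-to-response-time relationship and its typical error."""
    slope, intercept = fit_line(cpu, response)
    residuals = [r - (slope * c + intercept) for c, r in zip(cpu, response)]
    return slope, intercept, pstdev(residuals)


def breaks_relationship(model, cpu_value, response_value, threshold=3.0):
    """True when response time is far from what the CPU level predicts."""
    slope, intercept, sigma = model
    expected = slope * cpu_value + intercept
    return abs(response_value - expected) > threshold * max(sigma, 1e-9)


if __name__ == "__main__":
    # Made-up training data from a period of normal behavior.
    cpu = [20, 30, 40, 50, 60, 50, 40, 30]                          # percent
    response = [0.21, 0.29, 0.41, 0.50, 0.61, 0.49, 0.40, 0.31]     # seconds
    model = learn_relationship(cpu, response)
    print(breaks_relationship(model, 40, 0.41))  # response time matches the CPU level
    print(breaks_relationship(model, 40, 1.50))  # spike while CPU looks normal
```

A fixed threshold on response time alone could miss this case or fire on every busy period; comparing against the value the other metrics predict is what isolates the spike.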

Early Detection of a Problem Using Streaming Analytics

In the picture above, you can see a gradual increase in utilization across multiple metrics. The analytics tools can detect the early stages of these increases before they affect the business application.
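A rough sketch of this kind of early detection in Python: fit a trend line to a recent window of each metric and raise a warning when several metrics are climbing at once. The metric names, the window contents, and both thresholds are hypothetical values picked for the example, and real tools use far more sophisticated models.

```python
def trend_slope(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    return sxy / sxx


def rising_metrics(window, min_relative_slope=0.01):
    """Names of metrics growing by more than the given fraction of their mean per sample.

    window: {metric name: list of recent values, oldest first}
    """
    rising = []
    for name, values in window.items():
        if len(values) < 3:
            continue  # too few points to call it a trend
        average = sum(values) / len(values)
        if average > 0 and trend_slope(values) / average > min_relative_slope:
            rising.append(name)
    return sorted(rising)


def early_warning(window, min_metrics=2):
    """Warn when at least `min_metrics` metrics are trending upward together."""
    return len(rising_metrics(window)) >= min_metrics


if __name__ == "__main__":
    # Made-up utilization samples for three metrics.
    recent = {
        "cpu": [40, 42, 44, 46, 48, 50],
        "memory": [60, 61, 63, 64, 66, 67],
        "disk_io": [30, 31, 29, 30, 31, 29],
    }
    print(rising_metrics(recent))
    print(early_warning(recent))
```

Requiring several metrics to rise together is a deliberate choice here: a single drifting metric is often noise, while a coordinated climb is more likely the start of a real problem.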

As you can see, analytics can be a powerful tool in your arsenal for keeping your business applications up and running smoothly. You may be interested in some of the recently announced IBM IT Operations Analytics products. You can find more information at this URL: http://www.ibm.com/software/products/en/subcategory/SWU70

