Thoughts on migrating Enterprise Management tools

I just came back from a visit with a customer.  They were looking to move from their existing monitoring and service desk solution to a new solution with more capabilities.  In addition, they wanted to introduce an event management layer because event volumes had grown too high as the company expanded.  I wanted to share some thoughts from the meeting.  The most interesting part of the discussion had to do with the actual migration process: the customer wanted to know where to start.

The customer wanted a complete solution with network monitoring, storage monitoring, application performance management, event management, and integration with a service desk.  You know what you want your final state to look like, but you have to figure out how to get from point A to point B.  So, where to start?  Normally, when I have these conversations, I start by asking the customer where their biggest pain points or gaps are.  Most management solutions have several different entry points: you can start at the top of the enterprise management solution and work your way down, or start with the monitoring layer and work your way up.  But ultimately, what usually works best is solving the customer's biggest challenge first.

In this case, the customer had two major challenges.  First, they had too many events because they were sending events directly from the monitoring tool to the service desk with no event filtering, correlation, or de-duplication.  Second, they were encountering response time problems in their remote locations with Citrix XenApp applications, and it was hard to tell whether a problem was introduced by the Citrix environment, the business applications hosted in it, or the network.  By introducing response time monitoring along with monitoring of the Citrix XenApp environment, the customer will be able to identify issues with both response time and the Citrix environment.  If specific applications turn out to be causing problems, monitoring can be put in place for those applications.  Response time monitoring is the real key: by detecting slow response times and identifying where they are occurring, it becomes easy to determine whether the issue is a geographic/networking problem or something within the Citrix environment.  Based on our conversations with the customer, identifying and isolating the response time problems was the bigger issue, so that is where we recommended starting.
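To make the event-reduction idea concrete, here is a minimal sketch in Python of filtering and de-duplicating events before they reach a service desk.  The field names and thresholds are hypothetical illustrations, not the behavior of any specific IBM product:

```python
from collections import defaultdict

def reduce_events(events, severity_floor=3, dedup_window=300):
    """Filter and de-duplicate raw monitoring events.

    events: list of dicts with 'host', 'check', 'severity', 'timestamp'
    (hypothetical field names).  Events below severity_floor are dropped;
    repeats of the same host/check within dedup_window seconds are
    collapsed into a single forwarded event carrying an occurrence count.
    """
    last_seen = {}               # (host, check) -> timestamp last forwarded
    counts = defaultdict(int)    # running occurrence count per (host, check)
    forwarded = []
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["severity"] < severity_floor:
            continue             # filter out low-severity noise
        key = (ev["host"], ev["check"])
        counts[key] += 1
        if key in last_seen and ev["timestamp"] - last_seen[key] < dedup_window:
            continue             # duplicate within the de-dup window
        last_seen[key] = ev["timestamp"]
        forwarded.append(dict(ev, count=counts[key]))
    return forwarded
```

Even this toy logic illustrates why an event management layer between the monitors and the service desk dramatically cuts ticket volume.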

Every customer is different.  In some cases, business service management is the key challenge because of the size of the environment.  In other cases, event correlation or event management is the problem.  Sometimes the problem is a specific application, or issues with the application performance management tools.  What's important is to start with the pain point that will provide the biggest impact to the customer, and build out a deployment plan from there.

Customer deployment of Citrix XenDesktop VDI solution

I just finished up a deployment of IBM's Citrix XenDesktop monitoring solution in a VMware/Citrix VDI environment.  The customer is monitoring their VMware and Citrix XenDesktop environment with an IBM Tivoli Monitoring based solution.  I wanted to share a few key thoughts from the customer and my observations.

First, it's key to have monitoring in place for the entire Citrix XenDesktop environment, including the license server, broker controllers, etc.  But it's equally important to monitor the VMware environment as well as key infrastructure components such as storage and networking.  There have been incidents where problems with the back-end storage affected the performance of the virtual desktops.  There have also been cases where an application accessed via XenDesktop, such as the e-mail system, was suffering performance problems; end users assumed the problem was the VDI solution even though it was the e-mail system.  Therefore, it's critical that the VDI team have monitoring visibility into all of the key applications the users are accessing.

From a VMware perspective, the customer had good monitoring and reporting in place to detect problems with the VMware environment, but they were just starting to look at capacity planning for the VMware environment.  We discussed the benefits of IBM's capacity planning and optimization tools.  Customers can run typical what-if capacity planning scenarios, but they can also optimize the environment to get more use out of the existing hardware and potentially eliminate servers, along with the associated power and cooling costs, administrative costs, VMware licensing costs, etc.

In terms of XenDesktop, a few elements were very important to the customer.  First, the monitoring tools couldn't have a negative impact on the XenDesktop environment.  With some architectural changes made to the monitoring agent, we saw very low impact: the Broker Controllers ran at around 5% CPU utilization even with monitoring in place.  The environment is expected to grow, but with the new architecture, adding more concurrent desktop sessions won't increase the load the monitoring tools place on the Broker Controllers.

Reporting was also key.  The product provided some useful out-of-the-box reports, and the customer was easily able to use the Cognos-based reporting tool to create a custom report for the VDI team showing the historical utilization of the VDI system by user over time.  End users are being asked to use VDI rather than traditional laptops, and the customer needs to determine who is actually using the VDI environment.

The out-of-the-box thresholds covered most of their needs, but the customer created a few custom ones.  One of the most important was the percentage of available desktops within a desktop group that are in use.  Another was identifying sessions that have been in an Unregistered state for a significant period of time.  After a number of hours, sessions associated with disconnected servers should become inactive, but that doesn't always happen, leaving the sessions in an undesired state.  Through the user interface, it's very easy to see sessions that have been unregistered for an extended period.
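The logic behind those two custom thresholds is simple enough to sketch.  The following Python is purely illustrative; the field names ('state', 'state_since') and the alerting cutoffs are hypothetical, not actual agent attributes:

```python
import time

def desktop_group_usage_alert(in_use, total, threshold_pct=90.0):
    """Alert when the percentage of desktops in a desktop group
    that are in use reaches or exceeds threshold_pct."""
    return total > 0 and 100.0 * in_use / total >= threshold_pct

def stale_unregistered_sessions(sessions, max_hours=4, now=None):
    """Return sessions stuck in the 'Unregistered' state longer than
    max_hours.  Each session is a dict with 'state' and 'state_since'
    (epoch seconds) -- hypothetical field names."""
    now = now if now is not None else time.time()
    limit = max_hours * 3600
    return [s for s in sessions
            if s["state"] == "Unregistered" and now - s["state_since"] > limit]
```

In a real deployment these checks would be expressed as situations/thresholds in the monitoring tool rather than scripted, but the conditions evaluated are essentially the same.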

I will continue working with the customer and their VDI team.   They are going to identify their top 10 most common problems and we’ll make sure that monitoring and reporting is in place to identify and resolve the problems.     It was a great and educational visit.

Excellent blog post from Matt Ellis and video on BigData Analytics

Below you'll find a link to an excellent blog post from Matt Ellis and a related video.  The video is very funny, but also very insightful.  It really helps you understand the complexity of our IT systems and business applications, and how BigData analytics can help predict problems before they happen as well as quickly isolate them.  Our business applications continue to grow in complexity, with many applications dynamically provisioning and deprovisioning servers as workloads change.  The amount of data and complexity can be overwhelming.  Traditional application performance management is still critical in ensuring that your environment is being monitored properly, but in many cases it is not enough.

Read Matt’s blog and view the video here:  http://www.servicemanagement360.com/2013/12/03/data-talks-back/

IBM SmartCloud Application Performance Management and SmartCloud Monitoring Application Insights BETA

IBM is delighted to announce the release of Milestone 13 of the IBM SC APM Standard Edition vNext open beta program.

The IBM Application Performance Management and Diagnostics portfolio is a comprehensive solution that intelligently manages performance, availability, and capacity for complex application infrastructures in cloud, on-premise, and hybrid environments.  These solutions provide you with visibility, control, and automation for your mission-critical applications, protecting revenue and ensuring customer satisfaction.  Modular design, flexible deployment options, and quick time to value help IT administrators and application teams get started quickly and add more capabilities as they are needed.

We invite you to see the latest innovations in this portfolio by visiting our hosted beta environment, featuring new capabilities being delivered in upcoming releases of IBM SmartCloud Application Performance Management v7.7.0.1, SmartCloud Monitoring – Application Insight v1.2, and IBM Application Performance Diagnostics v1.0, all in a managed “sandbox” that allows you to explore these new features without having to download or install any code in your own environment.

Discover new application monitoring agents like Ruby, MySQL, and MongoDB
Visualize performance data for mobile applications
See how easy it is to perform deep-dive diagnostics on misbehaving applications
Explore the intuitive new user interface, shared by all of the portfolio application performance management offerings

Contact Kim O’Connor (mailto:kimoconnor@ie.ibm.com) to arrange access to this hosted environment.  Once access has been granted, you need only point a web browser to the service to begin exploring.

By following an Open Development model, IBM is striving to make a broad set of information available to our customers and business partners earlier in the development cycle than ever before.  I have included several links below where you will be able to find valuable information.

Wiki Page
The IBM SmartCloud APM SE vNext Wiki contains information on how to download beta drivers, and offers information covering product documentation, demos and videos.

Technical Series Events
The SmartCloud APM SE team is holding a series of technical events over the coming weeks. The schedule is available online – if interested, please contact Kim O’Connor (mailto:kimoconnor@ie.ibm.com) or Cathal O’Donovan (mailto:cathalodonovan@ie.ibm.com).

Provide Feedback
You can ask questions, propose new ideas, make comments, and engage with IBM SC APM SE solution developers in the IBM SmartCloud APM SE vNext beta forum.

Please contact Kim O’Connor (mailto:kimoconnor@ie.ibm.com) or Cathal O’Donovan (mailto:cathalodonovan@ie.ibm.com) if you have questions about the IBM SmartCloud APM SE vNext Open Beta Program.

IBM SmartCloud Monitoring 7.2 Trial Virtual Machine available for Trials and POCs

I wanted to let everyone know that a Trial Virtual Machine is available for the SmartCloud Monitoring version 7.2 FP1 product. It provides a 90-day trial of the software to monitor your virtualized environment and includes the capacity planning tools for VMware and PowerVM. These tools can help you optimize your virtualized environment and save money.

Within a few hours you can have the Virtual Machine up and running and monitoring your Virtualized environment.

This is a great tool if you are working with a customer on a Proof of Concept. Or, if you are a customer, it is a really quick and easy way to evaluate the software.

The Trial includes the SmartCloud Monitoring product plus a little bit of extra content. It includes monitoring for:

VMware
PowerVM (including OS, VIOS, CEC, and HMC)
Hyper-V
Citrix XenApp, XenDesktop, XenServer
KVM
Cisco UCS
Log File Monitoring
DB2
Agent Based and Agent-less Operating System monitoring
Network Devices
NetApp Storage
Integration with Tivoli Storage Productivity Center
Integration with IBM Systems Director

The trial also includes Predictive Analytics, Capacity Planning, and Optimization for VMware and PowerVM.

Here is a portion of the VMware Expense Reduction report, which shows the potential savings gained by optimizing your VMware environment.  The data came from a customer and is based on the typical costs of a US data center.  The virtualization licensing costs are based on US list price.

This report shows the potential savings gained by optimizing the VMware environment

You can find the software at the following URL: https://www.ibm.com/services/forms/preLogin.do?source=swg-ibmscmpcvi2

If you have any questions or need assistance, you can send me an email at bstern@us.ibm.com

Ben Stern

Leveraging BigData Analytics to Predict and Isolate Application Performance Management problems

Monitoring complex business applications has always been difficult.  As more and more applications move to the cloud and include dynamic provisioning and de-provisioning of resources as application demand grows and shrinks, that task has become even more difficult.  In an ideal world, companies would like to identify key application performance characteristics that are precursors to an application outage or significant slowdown.  With dozens or even hundreds of machines, network switches, storage devices, hardware appliances, and hypervisors supporting a business application, this can be a daunting task.  Even a smaller enterprise customer with 5,000 servers/virtual machines typically generates more than a terabyte of relevant performance and log data per day.  This post covers some of the tools and techniques available for handling these complex problems.

Fortunately, there are new tools and techniques for analyzing this massive quantity of data.  Log analytics tools can analyze system logs, middleware logs, and custom application logs to help you isolate problems in your complex business applications.  Analyzing log data is a very interesting area that has only recently started providing significant value to the business.  Traditionally, log analytics provided a wealth of data regarding patterns that emerged across the logs and applications in an environment.  It is certainly useful to know that 30% of your problems are being caused by your Windows 2003 R2 systems, but this doesn't necessarily help you determine the root cause of a problem.  Newer tools integrate search capabilities with log analytics so customers can quickly isolate both the cause of a problem and the way to fix it.  By integrating run books, data from product manuals, knowledge bases, and more, the solution is at the operator's fingertips.  The operator can see the error message that is causing the problem and, in context, see search results across problem determination guides, Redbooks, run books, and more.
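The pattern-grouping step at the heart of log analytics can be sketched in a few lines.  This is a deliberately naive illustration (the masking rules are my own invention, not any product's algorithm): volatile fields such as timestamps and numbers are masked so that repeated occurrences of the same error collapse into one signature that can be counted and searched on:

```python
import re
from collections import Counter

def log_signature(line):
    """Reduce a log line to a coarse signature by masking volatile
    fields (timestamps, hex ids, numbers) so repeats group together."""
    sig = re.sub(r"\d{4}-\d{2}-\d{2}[T ][\d:.,]+", "<TS>", line)
    sig = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", sig)
    sig = re.sub(r"\d+", "<N>", sig)
    return sig.strip()

def top_patterns(lines, n=5):
    """Return the n most frequent log-line signatures with counts."""
    return Counter(log_signature(l) for l in lines).most_common(n)
```

Real log analytics products go much further (tokenization, learned templates, cross-source correlation), but the core idea of normalizing lines into patterns before counting is the same.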

A second application of BigData analytics is the use of streaming analytics to analyze performance metrics in real time.  Current technologies allow organizations to stream and analyze massive amounts of data, looking for patterns and anomalies.  After analyzing the business application for a period of time, the analytics tools can establish baselines for normal behavior.  At its simplest, streaming analytics can identify when a specific metric is behaving abnormally for that specific time of day or day of week.  That's useful, but some of the current tools can do much more.  They can automatically identify the relationships between metrics on different servers and establish a set of normal business application behaviors.  For example, when transaction rates through an HTTP server increase, there may be corresponding changes in response time and in KPIs on the messaging system, the J2EE server, and the database server.  Those patterns are learned, and when one of the key application components starts to misbehave, customers can be alerted to the abnormal behavior.  In addition, the products can detect early characteristics of a problem that will ultimately cause an outage; in some cases, these patterns can be detected hours or even days before the problem occurs.
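A toy version of the time-of-day baselining described above might look like the following.  This is a greatly simplified sketch of my own; real streaming analytics products also learn cross-metric relationships and operate on continuous data streams rather than stored lists:

```python
from collections import defaultdict
from statistics import mean, stdev

class HourlyBaseline:
    """Learn a per-hour-of-day mean/stdev for one metric, then flag
    samples deviating more than k standard deviations from that
    hour's historical behavior."""

    def __init__(self, k=3.0):
        self.k = k
        self.history = defaultdict(list)   # hour -> observed values

    def learn(self, hour, value):
        self.history[hour].append(value)

    def is_anomalous(self, hour, value):
        obs = self.history[hour]
        if len(obs) < 2:
            return False                   # not enough history yet
        mu, sigma = mean(obs), stdev(obs)
        if sigma == 0:
            return value != mu
        return abs(value - mu) > self.k * sigma
```

The interesting products layer on top of this: instead of one metric against its own history, they baseline the learned relationships between metrics, which is what allows them to flag a response-time spike even while every individual infrastructure KPI still looks normal.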

Using Predictive Analytics to Detect Anomalies

As seen in the picture above, the orange line, which represents response time, is normally low, even while other metrics such as network performance, CPU utilization, and page faults increase and decrease in a fairly regular pattern.  However, about two-thirds of the way through the time series, you can see that response time shows a sudden spike even though the other performance KPIs are closely following their normal behavior.

Early Detection of a Problem Using Streaming Analytics

In the picture above, you can see a gradual increase in utilization across multiple metrics.  The analytics tools can detect the early stages of these increases before they affect the business application.

As you can see, analytics can be a powerful tool in your arsenal to keep your business applications up and running smoothly.  You may be interested in some of the recently announced IBM IT Operations Analytics products.   You can find out more information at this URL:  http://www.ibm.com/software/products/en/subcategory/SWU70