I bet we would all agree that the ultimate goal of IT monitoring is to prevent IT service outages. But admit it: even with all the monitors set up in System Center Operations Manager (SCOM) and alerts configured to notify admins when vital thresholds are reached, we are still not efficient enough at resolving (let alone preventing) critical issues on time. Dealing with alerts generated in SCOM has multiple complications. Too many alerts and no clear priority system are just two examples, and together they lead administrators into ‘alert ignorance’. In this article we look at a different approach to identifying upcoming issues in your IT environment, one that brings clarity and guidance to the jungle of alerts and capacity issues.
With the massive amount of data that System Center Operations Manager collects from all servers and other monitored equipment, IT Operations departments are sitting on a gold mine of data just begging to be used. One area that can benefit from this internal data capital is forecasting. By implementing forecasting processes, you can predict the behavior of managed objects several months into the future. This knowledge enables you to act in advance to prevent service failures and service level breaches. Most business areas use some kind of forecasting method when planning new investments, calculating yearly budgets, and so on. We believe IT organizations should be no different: they should start using operational data to gain insights and learn from the past while planning their future.
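As a minimal illustration of what forecasting can mean here, the sketch below fits a straight-line trend (ordinary least squares) to daily samples of a performance counter, for example percent of disk space used, and estimates how many days remain until a threshold is crossed. The sample values, the 90% threshold, and the function names are hypothetical, and a simple linear trend is only a stand-in: real forecasting on SCOM data would usually need models that cope with seasonality and noise.

```python
from typing import Optional, Sequence


def fit_linear_trend(values: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values against day index 0..n-1.

    Returns (slope, intercept); slope is the change per day.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_threshold(values: Sequence[float], threshold: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches threshold.

    Returns None for a flat or decreasing trend (threshold never reached),
    and 0.0 if the trend line is already at or above the threshold.
    """
    slope, intercept = fit_linear_trend(values)
    if slope <= 0:
        return None
    last_x = len(values) - 1
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - last_x)


if __name__ == "__main__":
    # Hypothetical daily "% disk space used" samples for one logical disk.
    disk_used_pct = [60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0]
    print(days_until_threshold(disk_used_pct, threshold=90.0))  # 24.0
```

Growing by one percentage point per day from 66%, the disk is projected to reach 90% in 24 days, which is the kind of early warning that lets you act before an alert ever fires.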
Operations Manager is good at monitoring the performance of individual software components. It also provides an interface for bundling them into groups, so that you can understand the health state of the group as a whole. In SCOM this is called a Distributed Application. At Approved we treat it (with the addition of Live Maps Unity from Savision) as an interface for managing IT services. As usual, once data has made its way into SCOM and then into the SCOM Data Warehouse, sooner or later we want to extract and analyze it or, better yet, use it as a basis for predicting future outcomes and dealing with issues before users even notice them. And, as usual, extracting and querying Distributed Application data from the SCOM DW is not exactly straightforward. Let's start by finding the ‘services’ themselves.
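To make the idea of a group-level health state concrete, here is a minimal sketch of a "worst state" rollup, one of the rollup policies available to SCOM dependency monitors, in which a Distributed Application is only as healthy as its unhealthiest member. The `Health` enum, the component names, and the helper functions are hypothetical; the code models the concept only and does not query SCOM or its Data Warehouse.

```python
from enum import IntEnum
from typing import Mapping


class Health(IntEnum):
    # Ordered so that a larger value means a worse state.
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def worst_state_rollup(members: Mapping[str, Health]) -> Health:
    """Return the worst health state among the members of a group."""
    if not members:
        raise ValueError("a group needs at least one member")
    return max(members.values())


def unhealthy_members(members: Mapping[str, Health]) -> list[str]:
    """List (sorted by name) the members that pull the group state down."""
    return sorted(name for name, state in members.items() if state is not Health.HEALTHY)


if __name__ == "__main__":
    # Hypothetical components of a "Web Shop" Distributed Application.
    web_shop = {
        "IIS web site": Health.HEALTHY,
        "SQL database": Health.WARNING,
        "Payment service": Health.HEALTHY,
    }
    print(worst_state_rollup(web_shop).name)  # WARNING
    print(unhealthy_members(web_shop))  # ['SQL database']
```

Listing the unhealthy members alongside the rolled-up state is what turns a single red or yellow service icon into something an administrator can act on.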
In a previous blog post we looked at a custom SLO reporting solution for SCOM. Now we will take a closer look at what can be done with all the alert-related data stored in the SCOM Data Warehouse.
Understanding Alert Nature
At first it might seem that everything we want to know about alerts is just sitting and waiting for us in the alert view (Alert.vAlert). But by now we all know that nothing in the System Center data warehouses is as easy as it seems. Three main points complicate alert querying. First, alerts can be generated either by Rules or by Monitors, and both sets end up in one fact table, which makes it harder to figure out exactly which rule or monitor generated each row. Second, your query might return results generated by Managed Entities that no longer exist in SCOM (deleted old objects and so on), and we don't want those cluttering our reports either. Finally, figuring out the current (latest) resolution state is a bit of a painful task (both for the DB server and…
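The three complications can be illustrated outside the database with a small Python sketch over in-memory records: each alert carries a flag saying whether a monitor or a rule raised it, alerts whose managed entity no longer exists are dropped, and the latest resolution state is picked per alert from its state-change history. The record types, field names, and sample values are all hypothetical and do not mirror the actual SCOM Data Warehouse schema; in practice this logic would be expressed in SQL against the warehouse views. The state codes follow SCOM's default resolution states (0 = New, 255 = Closed).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlertRow:
    alert_id: str
    entity_id: str
    raised_by_monitor: bool  # False means the alert came from a rule
    name: str


@dataclass(frozen=True)
class ResolutionChange:
    alert_id: str
    state: int
    changed_at: datetime


def latest_states(history: list[ResolutionChange]) -> dict[str, int]:
    """Map each alert id to the state of its most recent change.

    If two changes share a timestamp, the one seen first is kept.
    """
    latest: dict[str, ResolutionChange] = {}
    for change in history:
        current = latest.get(change.alert_id)
        if current is None or change.changed_at > current.changed_at:
            latest[change.alert_id] = change
    return {alert_id: change.state for alert_id, change in latest.items()}


def report_rows(alerts: list[AlertRow], history: list[ResolutionChange],
                existing_entities: set[str]) -> list[tuple]:
    """Return (name, source, latest state) for alerts on existing entities."""
    states = latest_states(history)
    rows = []
    for alert in alerts:
        if alert.entity_id not in existing_entities:
            continue  # skip alerts raised by deleted managed entities
        source = "Monitor" if alert.raised_by_monitor else "Rule"
        # None means no resolution-state history was found for this alert.
        rows.append((alert.name, source, states.get(alert.alert_id)))
    return rows


if __name__ == "__main__":
    alerts = [
        AlertRow("a1", "srv01", True, "Logical disk free space is low"),
        AlertRow("a2", "srv02", False, "Event log error"),  # srv02 was deleted
    ]
    history = [
        ResolutionChange("a1", 0, datetime(2020, 1, 1, 8, 0)),     # New
        ResolutionChange("a1", 255, datetime(2020, 1, 1, 9, 30)),  # Closed
        ResolutionChange("a2", 0, datetime(2020, 1, 2, 10, 0)),
    ]
    for row in report_rows(alerts, history, existing_entities={"srv01"}):
        print(row)  # ('Logical disk free space is low', 'Monitor', 255)
```

Keeping only the newest change per alert mirrors the "latest resolution state" problem: the history table holds every transition, and the report needs just the last one.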