You have both hourly and daily data on service component availability, performance, events and alerts gathered in SCOM. Since it is just heaps of raw data, it can be difficult to tell whether any of the logged events follow some kind of time pattern and whether any of them relate to one another. Recognizing recurrences is very beneficial when dealing with sparse and, at first sight, unrelated events. After grouping and filtering all events by an identified time pattern, we are able to find correlations and (hopefully) causes of specific occurrences. In this blog post we will review the built-in SCOM reports for analyzing this kind of data and will also show you how adding some extra capabilities makes life much easier.
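To make the idea of grouping events by a time pattern more concrete, here is a minimal sketch in Python. It is not how SCOM or its reports work internally. The event names, the timestamps and the helper functions `count_by_hour` and `recurring` are all hypothetical, and the data is a hand-written sample rather than a real SCOM export. The sketch buckets events by hour of day and keeps only those that keep coming back in the same hour slot.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of (timestamp, event name) pairs; NOT real SCOM output.
events = [
    ("2024-03-04 02:05", "DiskLatencyHigh"),
    ("2024-03-05 02:10", "DiskLatencyHigh"),
    ("2024-03-06 02:02", "DiskLatencyHigh"),
    ("2024-03-04 02:07", "BackupJobStarted"),
    ("2024-03-05 02:01", "BackupJobStarted"),
    ("2024-03-06 02:00", "BackupJobStarted"),
    ("2024-03-05 14:30", "LogonFailure"),
]


def count_by_hour(rows):
    """Count how often each event name occurs in each hour of the day."""
    counts = Counter()
    for stamp, name in rows:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        counts[(hour, name)] += 1
    return counts


def recurring(rows, min_count=3):
    """Map hour of day to the sorted event names seen at least min_count times in that hour."""
    result = {}
    for (hour, name), n in count_by_hour(rows).items():
        if n >= min_count:
            result.setdefault(hour, []).append(name)
    return {hour: sorted(names) for hour, names in result.items()}


patterns = recurring(events)
print(patterns)
```

With this sample, both the disk latency event and the backup job show up in the 02:00 slot on three days, while the single logon failure is filtered out. Sharing a time slot is only a hint of correlation; whether one event actually causes the other still has to be investigated.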
I bet we would all agree that the ultimate goal of IT monitoring is to prevent IT service outages. But let's admit it: even with all the monitors set up in System Center Operations Manager and alerts configured to notify admins when vital thresholds have been reached, we are still not efficient enough at solving (not to mention preventing) critical issues on time. Dealing with alerts generated in SCOM comes with multiple complications. Too many alerts and no clear priority system are just two examples, and they lead administrators into ‘alert ignorance’. In this article we are going to look at a different approach to identifying upcoming issues in your IT environment, one that brings clarity and guidance to the tangled jungle of alerts and capacity issues.