
The 4 Cornerstones of an Unbreakable Intelligent Operations Foundation

As businesses expand into new markets, add new application features, and strive to enhance their customers’ digital experiences, a need arises to redefine the platforms that support these demands.

Today’s leading companies are innovating at the application layer. Applications are being modernized to meet business goals and deployed on-premises and in the public cloud, on virtual machines, on Kubernetes and other container platforms, and as serverless functions. These application modernization and innovation efforts (Innovate), coupled with new deployment technology (Build), mean we must change how we Run the environment. We must move beyond old-school monitoring toward an intelligent operations approach that addresses deeper, business-oriented information and actions.

The promise of intelligent operations is wide and deep. More engaged customers, optimized costs, a greater focus on innovation, a lower mean time to repair (MTTR), and greater operational efficiency are all achievable with a well-designed strategy.

By focusing on the four areas below, enterprises can look beyond standard metrics and gain insights that sit closer to their customers and their business objectives.

The 4 Key Intelligent Operations Cornerstones 

Monitoring

Monitoring means shifting what you see in the environment away from infrastructure performance and availability metrics and toward customer experience and business metrics, reflecting the adage “Slow is the new down.”

Legacy monitoring and alerting typically provide basic insights such as CPU saturation, IOPS performance, or whether something is up or down. With intelligent operations, we want insight into customer experience and business metrics (revenue, for example). Users of mobile applications are not patient: if an application is slow, they consider it down and move on. Monitoring business metrics lets you observe how your applications’ performance actually affects your business. In many industries, a dynamic dashboard showing revenue gained or lost matters more than one showing CPU or I/O performance. Monitoring in this way calls for new approaches to application performance monitoring (APM).
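
As a minimal sketch of “slow is the new down,” the Python below classifies a service from the user’s point of view rather than by whether its servers respond, and totals the revenue attached to failed or too-slow requests. It assumes per-request latency, success, and order value are already exported from an APM tool; the `Request` record, the 2-second `LATENCY_SLO_SECONDS` budget, and the nearest-rank 95th-percentile rule are illustrative assumptions, not part of any specific product.

```python
import math
from dataclasses import dataclass

# Hypothetical latency budget: responses slower than this count as "down" to the user.
LATENCY_SLO_SECONDS = 2.0


@dataclass
class Request:
    latency_s: float    # observed response time in seconds
    succeeded: bool     # whether the request returned a successful response
    order_value: float  # revenue attached to the request (0.0 if none); hypothetical field


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def user_facing_status(requests, slo=LATENCY_SLO_SECONDS):
    """Classify service health as the user experiences it."""
    if not any(r.succeeded for r in requests):
        return "down"
    if p95([r.latency_s for r in requests]) > slo:
        # Servers are answering, but users experience the service as down.
        return "down (slow)"
    return "up"


def revenue_at_risk(requests, slo=LATENCY_SLO_SECONDS):
    """Sum the order value attached to failed or too-slow requests."""
    return sum(r.order_value for r in requests if not r.succeeded or r.latency_s > slo)
```

In this sketch a service that answers every request can still be reported as down once its 95th-percentile latency exceeds the budget, and `revenue_at_risk` turns the same data into a business metric a dashboard could display.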

Correlation

Correlation is being able to connect all the metrics and events in the environment to something meaningful, separating the signal from the noise. 

Enterprises typically contend with multiple monitoring tools, event logs, and other metrics in hopes of gaining a deep understanding of their application performance. Often, companies can’t find the signal in the noise, or their staff experience alert fatigue from the sheer volume of information. With correlation, we want to tie events and metrics together in a way that winnows out the extraneous noise so that the problem is easy to identify. Some technologies let you keep your existing monitoring tools while correlating and compressing their output, so that you can find the performance problem quickly. For example, one of AHEAD’s customers was receiving 25,000 alerts per month, each mapped 1:1 to a ServiceNow incident, which flooded the support desk. After deploying a correlation and compression solution, the customer saw 97% alert compression and a 40% reduction in time to resolution for Tier 1 production applications.
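
The sketch below illustrates the basic idea behind correlation and compression: alerts from different tools that concern the same service and arrive close together in time are folded into a single incident. It is a simplified stand-in, not how BigPanda or any other product actually correlates alerts; the `Alert` fields, the per-service grouping key, and the 5-minute `window_s` are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: int  # seconds since epoch
    service: str    # affected service; used here as the (hypothetical) correlation key
    source: str     # monitoring tool that raised the alert
    message: str


@dataclass
class Incident:
    service: str
    first_seen: int
    last_seen: int
    alerts: list = field(default_factory=list)


def correlate(alerts, window_s=300):
    """Fold alerts for the same service into one incident while each new alert
    arrives within window_s seconds of the previous alert in that incident."""
    open_incidents = {}  # service -> most recent Incident for that service
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_incidents.get(alert.service)
        if current is not None and alert.timestamp - current.last_seen <= window_s:
            current.alerts.append(alert)
            current.last_seen = alert.timestamp
        else:
            current = Incident(alert.service, alert.timestamp, alert.timestamp, [alert])
            open_incidents[alert.service] = current
            incidents.append(current)
    return incidents


def compression_ratio(num_alerts, num_incidents):
    """Fraction of alerts that did not become an incident of their own."""
    return 1 - num_incidents / num_alerts
```

Under this definition, the 97% figure above corresponds to roughly 25,000 monthly alerts collapsing into about 750 incidents, since 25,000 × (1 − 0.97) = 750.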

Service

Service means managing change and provisioning in the environment so that they are tightly tied to monitoring, events, and metrics. Don’t let an undocumented change introduce an outage into your environment.

Many organizations have workflows for IT service management, change management, provisioning, and so on. Are these correlated with your monitoring and event management strategy? With intelligent operations, there is a tight connection between automated incident response workflows, knowledge of how changes affect performance, and the ability to integrate and automate provisioning. For example, integrating correlated events with ServiceNow incidents that are automatically routed to the appropriate responders is critical to intelligent operations. Similarly, correlating a change ticket with a pending event is key, so that remediation can begin before a brewing problem turns into an outage.
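
To make the Service idea concrete, the sketch below assembles an incident for a correlated event, routes it to the owning team, and attaches any change tickets implemented against the same service shortly beforehand. The field names `short_description`, `cmdb_ci`, and `assignment_group` mirror standard ServiceNow incident fields, but the `CMDB` dictionary, the team names, the `related_changes` key, and the one-hour lookback are hypothetical, and the actual call that creates the incident in ServiceNow is deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical CMDB extract: service -> owning assignment group.
CMDB = {
    "checkout": "ecommerce-app-support",
    "search": "search-platform-team",
}


@dataclass
class Change:
    ticket: str
    service: str
    implemented_at: int  # seconds since epoch


def recent_changes(service, incident_start, changes, lookback_s=3600):
    """Changes to the same service implemented within lookback_s before the incident."""
    return [
        c for c in changes
        if c.service == service and 0 <= incident_start - c.implemented_at <= lookback_s
    ]


def build_incident(service, incident_start, summary, changes):
    """Assemble an incident record routed to the owning team, with suspect changes attached."""
    suspects = recent_changes(service, incident_start, changes)
    return {
        "short_description": summary,
        "cmdb_ci": service,
        # Unknown services fall back to a general triage queue (hypothetical name).
        "assignment_group": CMDB.get(service, "service-desk"),
        "related_changes": [c.ticket for c in suspects],  # hypothetical field
    }
```

Attaching suspect changes when the incident is created lets responders start from “what changed?” instead of rediscovering it mid-incident.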

Execution

Execution means improving the ability to respond quickly, and often automatically, whether by identifying and remediating a problem or by provisioning new resources, ultimately lowering MTTR.

As a company matures on its journey to intelligent operations, it can respond to problems faster and with more automation. Runbooks reduce manual remediation, and many remediation activities become fully automated. For example, emerging problems with shopping carts, websites, or mobile applications that affect user experience or revenue can often be remediated automatically by provisioning additional web or application servers.
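
A runbook step like the scale-out example above can be sketched as a small decision function. It assumes latency grows roughly in proportion to load per server, which real systems only approximate, and uses a proportional rule similar in spirit to the one the Kubernetes Horizontal Pod Autoscaler applies (desired = ceil(current × observed / target)). The `Pool` limits, latency targets, and action names are hypothetical; the function only decides what to do and leaves the provisioning itself to whatever automation tooling is in place.

```python
import math
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    current: int   # servers currently in the pool
    minimum: int   # lower bound the pool may shrink to
    maximum: int   # upper bound automation is allowed to scale to


def desired_servers(pool, observed_latency_s, target_latency_s):
    """Proportional rule: grow the pool by the ratio of observed to target
    latency, clamped to the pool's configured limits."""
    desired = math.ceil(pool.current * observed_latency_s / target_latency_s)
    return max(pool.minimum, min(pool.maximum, desired))


def remediate(pool, observed_latency_s, target_latency_s):
    """Return the runbook action as (action, server_count)."""
    if observed_latency_s <= target_latency_s:
        return ("none", pool.current)
    desired = desired_servers(pool, observed_latency_s, target_latency_s)
    if desired > pool.current:
        # Hand off to provisioning automation to add servers.
        return ("scale_out", desired)
    # Already at the maximum: more servers are not an option, so page a human.
    return ("escalate", pool.current)
```

Escalating when the pool is already at its maximum keeps a person in the loop for problems that adding servers cannot fix.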

The Cornerstones in Action 

The following sample reference architecture illustrates the four solution areas and how they integrate to provide intelligent operations.  

The left side of the graphic shows the collection of monitoring tools, spanning applications and databases through infrastructure and networking, along with comprehensive logging. BigPanda can be used to analyze and correlate information from the monitoring and logging tools so that alert “noise” is eliminated and only the true signal of the problem is used to create the required incident automatically in ServiceNow. ServiceNow workflows coupled with the CMDB can ensure the correct people are paged and notified to begin root cause analysis. Similarly, remediation workflows in ServiceNow can automatically fix the problem by integrating with automation, CI/CD, and infrastructure-as-code technologies such as Ansible, Puppet, Chef, CloudBees, Terraform, or GitHub. And because applications can be deployed both on-premises and in the public cloud, this reference architecture is built to support enterprise cloud deployments.

The Journey to Intelligent Operations Success 

The ability to leverage the components of intelligent operations is often phased in over time as an organization matures its capabilities, tools, and processes. The maturity model below illustrates this progression from reactive (without a strategy), through optimized, and finally to intelligent, where the business can proactively mitigate issues before they have an impact and remediation is more automated.

AHEAD assists our customers in their journey to intelligent operations maturity by helping to define the strategy, assessing the current state, rationalizing tools and providing guidance, creating an application monitoring foundation built on user-experience and business metrics, and stitching together the required tools, processes, workflows, and automation capabilities to realize the benefits of intelligent operations.

To learn more about how AHEAD can facilitate your intelligent operations journey, reach out to our team at contact@thinkahead.com.
