Anatomy of an outage

Although some outages and service degradations are unavoidable, most are in fact preventable.

At Sumerian, we have identified three main contributors to poor service performance. We classify these as smoking guns, background hazards and systemic issues.

Our Service Delivery Analytics service focuses on providing insight across all three of these contributor categories.

Smoking guns

Single issues that lead directly to outages or degradations

This is the least common cause of application failure, accounting for fewer than 20% of outages. Ironically, it is also the only category of cause that most IT organisations look to address.

Background hazards

Application Environment

Moderate to severe deviations from expected application/infrastructure behaviour

Causing around 30% of outages, these are minor to medium variations in normal application behaviour. Examples include imbalanced web farms, higher-than-specified database thread counts and higher-than-specified CPU usage per user. Taken together, these create an environment prone to outages.
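As a rough illustration of what checking for background hazards can look like, the sketch below compares observed figures against specified limits. It is a minimal example under stated assumptions, not a description of how Sumerian's service works: the `Spec` fields, the `find_background_hazards` function, the metric names and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Spec:
    """Hypothetical specified limits for one application (assumed values)."""
    max_db_threads: int
    max_cpu_ms_per_user: float
    max_farm_imbalance: float  # allowed relative deviation from the mean load


def find_background_hazards(server_loads, db_threads, cpu_ms_per_user, spec):
    """Return a list of descriptions of deviations from the specification."""
    hazards = []

    # Imbalanced web farm: any server whose load strays too far from the mean.
    avg = mean(server_loads.values())
    for name, load in sorted(server_loads.items()):
        if avg > 0 and abs(load - avg) / avg > spec.max_farm_imbalance:
            hazards.append(
                f"imbalanced web farm: {name} load {load} vs mean {avg:.1f}"
            )

    # Database thread count above the specified maximum.
    if db_threads > spec.max_db_threads:
        hazards.append(
            f"database threads {db_threads} exceed spec {spec.max_db_threads}"
        )

    # CPU cost per user above the specified maximum.
    if cpu_ms_per_user > spec.max_cpu_ms_per_user:
        hazards.append(
            f"CPU per user {cpu_ms_per_user} ms exceeds spec "
            f"{spec.max_cpu_ms_per_user} ms"
        )

    return hazards


if __name__ == "__main__":
    spec = Spec(max_db_threads=200, max_cpu_ms_per_user=50.0,
                max_farm_imbalance=0.25)
    loads = {"web1": 100, "web2": 100, "web3": 190}
    for hazard in find_background_hazards(loads, 250, 40.0, spec):
        print(hazard)
```

No single finding here would necessarily cause an outage on its own. The point of listing them together is that several small deviations at once indicate an environment prone to failure.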

Systemic issues

Organisation Environment

Processes and behaviours that create conditions for outage

Accounting for over 50% of outages, these are typically caused by organisational barriers and communication issues between teams. Examples include over-use of emergency patch procedures, poorly specified test environments and failure to assess the effects of application or infrastructure changes on application performance.

In addition to locating the ‘smoking guns’ that lead directly to outages, Sumerian searches for the application and organisation environment issues that, if addressed early by your team, reduce the inherent risk within your IT service environment. The result is a proactive environment in which outages and performance degradations are less likely to occur.