There’s a great quote in my favourite film, The Empire Strikes Back, during Han’s attempt to escape from The Empire through an asteroid field.
C-3PO: Sir, the possibility of successfully navigating an asteroid field is approximately three thousand, seven hundred twenty to one.
Han Solo: Never tell me the odds!
While most of us aren’t attempting to escape from the Galactic Empire on a daily basis, it’s not unusual for us to run a little bit of risk around some pretty mundane stuff. It can add a bit of excitement to the normal working day. “The light is changing to amber – should I go for it?”, “I could probably get away with carrying three cups of tea”, “If I just stretch a little bit further, I’ll be able to reach the bulb and not need to get the ladders out”, “You know… that server always raises that event around this time and nothing has happened… I think I can just ignore it.”
And… most of the time we get away with it… and quite often after we get away with it, we feel pretty chuffed with ourselves. A small victory.
But every now and again things don’t go to plan. The light changes to red and you get a fine… or worse. You drop the hot tea down yourself… or a co-worker. You over-stretch, lose your balance, and fall to the floor. The server runs out of capacity, causing a critical business outage that needs to be reported for regulatory purposes, and the business faces a substantial fine. These things usually happen because you didn’t really appreciate all the factors that contribute to success or failure. You didn’t know the odds.
What if you could have your own C-3PO calculating the odds for you? What if, in the first example, factors like the previous history of traffic in that area, the number of ‘amber gamblers’ that get fined, the number of accidents, the weather conditions, the volume of traffic… huge numbers of metrics, could be crunched together quickly to give you a risk score for your action?
‘If you don’t brake now you run a 73% chance of a fine’. You decide to brake.
‘If you attempt to carry 3 cups of tea, you are running a 58% chance of burning a colleague’. You decide to make two trips to the kitchen.
‘If you don’t use ladders for this job, you run an 87% chance of breaking your leg’. You decide to go and get some ladders.
‘If you don’t add 3 TB of storage and another blade to this cluster, you run a 64% chance of a capacity issue with a business-critical service, which will cost your business £175,000 in fines.’
Makes you think a bit differently, doesn’t it? This is why Han didn’t want to know the odds.
In enterprise IT, knowing the quantified risks associated with an infrastructure component, an application tier or a business-critical service is absolutely vital to help you prioritise what needs to be done to ensure business continuity. The business stakes are too high to rely on guesswork or gut instinct when deciding whether or not a risk is worth taking. Just look at what happens when things go wrong: an entire airline gets grounded with thousands of passengers stranded around the globe (http://bgr.com/2016/08/14/delta-finally-explained-how-one-power-outage-grounded-an-entire-airline/), or a service attracts high-profile publicity for all the wrong reasons, with serious impact on product rollout and revenue streams (http://venturebeat.com/2016/07/21/pokemon-go-servers-down-for-many/).
Predictive analytics can be used to look across all the IT metrics that are important to your organisation, from CPU and memory utilisation through to application response times, and calculate a risk score. By analysing things like the variance in each individual metric and the demand profile, and combining this with other Business Service data from a CMDB, risk can be aggregated to a Business Service level.
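To make the idea a little more concrete, here is a minimal Python sketch of how a risk score might be derived from a metric's mean and variance and then rolled up to a service. It is an illustration under simplifying assumptions, not how any particular product works: it treats each metric as roughly normally distributed, treats components as independent, and the function names, sample values and capacity thresholds are all made up for the example.

```python
from math import erf, sqrt
from statistics import mean, stdev


def breach_probability(samples, capacity):
    """Estimate the chance a metric exceeds its capacity.

    Assumption: the metric is roughly normally distributed around
    the mean of the observed samples.
    """
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        # No variance at all: the metric either always breaches or never does.
        return 1.0 if mu > capacity else 0.0
    z = (capacity - mu) / sigma
    # P(X > capacity) = 1 - Phi(z), with Phi written in terms of erf.
    return 0.5 * (1 - erf(z / sqrt(2)))


def service_risk(component_risks):
    """Chance that at least one component breaches.

    Assumption: components fail independently of each other.
    """
    no_breach = 1.0
    for p in component_risks:
        no_breach *= 1 - p
    return 1 - no_breach


if __name__ == "__main__":
    # Hypothetical utilisation samples (percent) for one business service.
    cpu = [62, 70, 68, 75, 81, 77, 73]
    disk = [88, 90, 91, 93, 92, 94, 95]

    risks = [breach_probability(cpu, 90), breach_probability(disk, 95)]
    print(f"Service-level risk: {service_risk(risks):.0%}")
```

A real solution would also account for the demand profile (daily and seasonal peaks, growth trends) and for dependencies between components, which this sketch deliberately leaves out.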
A score isn’t enough though – you need to be able to do something about it and provide the evidence to support any change.
The latest predictive analytics can help here too – using advanced scenario modelling to let you play out different ways to mitigate the risk. Should you introduce a new piece of hardware? Could you move workloads to an overprovisioned part of the estate and make use of that excess capacity? Could you move those workloads into the cloud? How many new web servers need to be spun up to make sure response times stay below a threshold? With every scenario you try you can instantly understand the risk impact, allowing you to make the changes that will have the biggest impact for the optimal cost.
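As a rough illustration of scenario modelling, the Python sketch below compares a few mitigation options by how much risk each one removes per pound spent. Everything in it is hypothetical: the scenario names, capacities and costs are invented, demand is assumed to be roughly normally distributed, and ranking by risk reduction per unit cost is just one simple way to interpret "biggest impact for the optimal cost".

```python
from statistics import NormalDist, mean, stdev


def breach_probability(demand, capacity):
    """Chance that demand exceeds capacity (normal approximation)."""
    dist = NormalDist(mean(demand), stdev(demand))
    return 1 - dist.cdf(capacity)


def rank_scenarios(demand, capacity, scenarios):
    """Rank scenarios by risk reduction per unit of cost.

    Each scenario is (name, extra_capacity, cost), with cost > 0.
    Returns tuples of (name, new_risk, risk_reduction, cost).
    """
    baseline = breach_probability(demand, capacity)
    results = []
    for name, extra_capacity, cost in scenarios:
        risk = breach_probability(demand, capacity + extra_capacity)
        results.append((name, risk, baseline - risk, cost))
    # Largest risk reduction per pound first.
    return sorted(results, key=lambda r: r[2] / r[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical storage demand (TB) against a 20 TB cluster.
    demand_tb = [15.0, 16.5, 17.0, 18.2, 17.6, 19.1, 18.8]
    scenarios = [
        ("Add 3 TB of storage", 3, 12_000),
        ("Move workloads to spare capacity", 2, 4_000),
        ("Burst into the cloud", 5, 30_000),
    ]
    for name, risk, reduction, cost in rank_scenarios(demand_tb, 20, scenarios):
        print(f"{name}: risk {risk:.0%}, reduction {reduction:.0%}, cost £{cost:,}")
```

In practice you would also want a floor on acceptable risk, since the cheapest option per pound may still leave the service more exposed than the business can tolerate.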
So, while predictive analytics solutions like Sumerian CPaaS aren’t fluent in over 6 million forms of communication, they can tell you the odds.