At Sumerian, we’re seeing a clear and increasing interest in helping our customers to take their IT service management to the next level. This is happening even though we see ITIL exam candidates declining year on year. The reason is that the product suites supporting effective ITSM are becoming more powerful and better integrated with enterprise process, and so are automating more and reducing the need to train large numbers of people in ITIL.
Garbage in, garbage out
Most IT service management is ‘workflow’-based, and a well implemented and integrated service management platform can successfully automate and underpin many best practice processes. But any workflow, automated or manual, is only as good as the results it yields. Let’s take a look at a typical ITSM workflow…
It’s all in the workflow – isn’t it?
The business is running an advertising campaign and expecting a significant increase in web traffic to an online service over the next 3 months. Your first job as Service Manager is to take this business requirement and translate it into an application or service requirement. How do you know how much capacity will be required across physical and virtual servers, storage and networking? It’s not likely to be supported by detailed models, so this will almost certainly be a very rough estimate, most likely erring significantly on the side of caution and so over-provisioning. Using this estimate, you request a bunch of new instances across the application tiers through the service management portal, and the automated workflow kicks in and creates a change ticket. Job done.
The next stage in the workflow is normally manual and now requires the operations team to determine whether or not that capacity can be realised and where it will come from. This is often supported using nothing more than the latest data from a real-time monitoring tool or a spreadsheet exercise. Neither can provide sufficient accuracy or confidence, particularly if the workloads in question support critical business functions. At this point we have your hunches and assumptions built on top of the operations team’s hunches and assumptions. Not a recipe for success.
Once the hunches and assumptions have been used to reach a conclusion, a change ticket is raised, probably with an update including some recommendations, but possibly not. It’s a manual process after all. Once the change is approved, the next stage, the actual deployment of the capacity, is probably manual as well. Some environments will have automated orchestration around this, and leading service management frameworks do provide it.
And that’s that… Perfect execution of the process! The auditor will be pleased.
The advertising campaign is a big success. Fuelled by social media, the volumes to the online server are at the high end of the predictions. It turns out that the web tier can’t cope. Your estimate for the number of instances was wrong and more capacity is required. That’s ok, there’s space in the cluster so more web servers can be spun up. A little bit of panic, some more grey hair, but the performance problems weren’t prolonged enough to cause any reputational damage.
For the next few weeks, it’s running well. The business is pleased. Good job. However, the workloads were placed in a cluster that’s also hosting a month end service. At the time of placement, the cluster had plenty of capacity, but now it doesn’t. That end of month service kicks in and suddenly the cluster is out of capacity. Both the online service and month end service find themselves significantly constrained. Performance falls off a cliff. The portal stops responding. The advertising doesn’t stop. Customers are complaining online… there’s a hashtag trending… the month end service doesn’t complete in time which means a regulatory requirement isn’t met. The boss isn’t happy. He wants a word….
Waking up from the nightmare
Ok – so I’ve painted a bit of a nightmare worst case scenario, but at no point in that story was process avoided. All the ‘i’s were dotted and the ‘t’s crossed. There’s an audit trail leading all the way back to you (and now you wish there wasn’t).
The problem here wasn’t workflow… it was information flow. Or rather, lack of it.
Capacity Planning and Management is the key to information flow. It’s the process that embeds vital insight into your service management workflows and lets you make smart decisions. Done well, it removes the guesswork by arming you with accurate data and evidence. It provides a full understanding of what’s going on across the IT estate: the compute, storage and networking demand. It joins the dots across growth trends and the correlations between volumes and infrastructure utilization. It provides peace of mind and confidence, enabling Service Managers and operations teams to make better, well-informed decisions.
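To make "correlations between volumes and infrastructure utilization" a little more concrete, here is a minimal sketch of the idea: fit a straight line through historical business volumes and the utilization observed at those volumes, then use it to project utilization at a forecast volume. The figures, the function names and the assumption that the relationship is linear are all illustrative; a real capacity planning tool will use far richer models and real monitoring data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: page hits (thousands per hour) against
# average web tier CPU utilization (%) at those volumes.
hits_k = [10, 20, 30, 40, 50]
cpu_pct = [15.0, 25.0, 35.0, 45.0, 55.0]

slope, intercept = fit_line(hits_k, cpu_pct)


def predict_cpu(volume_k):
    """Projected CPU % at a given volume, assuming the linear trend holds."""
    return intercept + slope * volume_k


# Projected utilization if the campaign pushes traffic to 80k hits per hour.
projected = predict_cpu(80)
```

With this made-up history, the projection for 80k hits per hour is well above anything in the observed range, which is exactly the kind of evidence that should drive a capacity request rather than a hunch.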
Let’s run that again…
So – with automated capacity planning integrated into your service management platform workflow… let’s see how that scenario plays out now:
The business is running an advertising campaign and expecting a significant increase in web traffic to an online service over the next 3 months. Your first job as Service Manager is to take this business requirement and translate it into an application or service requirement. You speak to marketing to get their projections and determine how that translates into online page hits. You take these numbers and plug them into your capacity planning tool. You add some growth over time and create a number of growth scenarios and, as a result of understanding how those volumes correlate with IT resource utilization, you accurately determine the number of new VMs needed across the tiers to support those volumes. You also identify whether or not the cluster hosting the service has the capacity required. It turns out it doesn’t – as there’s a business-critical service that demands significant capacity once a month. You’re going to need to add some new blades into this cluster to allow both those services to run at the same time without competing for the same precious resource.
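The arithmetic behind that paragraph can be sketched roughly as follows. This is an illustration only, not how any particular capacity planning product works: the scenario volumes, per-VM throughput, target utilization, core counts and blade size are all invented numbers, and the function names are hypothetical.

```python
import math


def vms_needed(extra_hits_per_sec, hits_per_vm_per_sec, target_utilization=0.7):
    """New VMs required to absorb extra traffic while keeping headroom."""
    usable_per_vm = hits_per_vm_per_sec * target_utilization
    return math.ceil(extra_hits_per_sec / usable_per_vm)


def core_shortfall(cluster_cores, baseline_cores, month_end_cores,
                   new_vms, cores_per_vm):
    """Cores missing when the campaign and the month-end peak coincide."""
    demand = baseline_cores + month_end_cores + new_vms * cores_per_vm
    return max(0, demand - cluster_cores)


# Hypothetical growth scenarios: extra peak page hits per second.
scenarios = {"low": 400, "expected": 600, "high": 900}

# Assumed: one web VM comfortably serves 100 hits/sec at full load.
web_vms = {name: vms_needed(hits, hits_per_vm_per_sec=100)
           for name, hits in scenarios.items()}

# Plan for the high scenario and check it against the shared cluster.
shortfall = core_shortfall(cluster_cores=128, baseline_cores=60,
                           month_end_cores=40, new_vms=web_vms["high"],
                           cores_per_vm=4)

# Assumed blade size of 16 cores.
blades_to_add = math.ceil(shortfall / 16)
```

The point of including the month-end peak in the demand figure is that the check is made against the worst coincident load, not against what the cluster happens to look like on the day of placement.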
As your capacity planning tool is taking data directly from your CMDB and Catalog, you can tell which services are affected and you can also determine the cost of the new hardware Catalog items you’ll need. The business can make the decision whether or not that’s a cost they’re willing to pay. You suspect they are. The evidence is there and the budget is there to support it. You commit that change scenario which automatically raises a change ticket and the workflow kicks into action. You make a cup of tea.
The advertising campaign was a big success. Fuelled by social media, the volumes to the online server are at the high end of the predictions. The web tier is humming nicely. The weeks roll on and the month end process starts and completes in time. The volumes increase, the trending continues and the reputation of the business benefits. The boss is happy. He’d like a word….
If you’re interested in learning more about this topic, why not schedule a talk with one of our experts?