How better capacity planning can help in the cost-cutting battle
When I was a child, a teacher once told me that science fiction becomes science fact, eventually.
That night, as I watched the cartoon “He-Man and the Masters of the Universe”, I remember reflecting on her assertion and feeling somewhat sceptical. One of the main protagonists in this cartoon, who went by the curious title of “Man-at-Arms”, had just perfected another of his inventions and was demonstrating it to He-Man. Man-at-Arms invented totally new technologies on literally a weekly basis, and then never used them again once that week’s episode had ended.
On this particular evening, he was exhibiting his freshly invented freeze ray. The week before it had been a shrinking ray and, not long before that, an ageing ray. He was clearly fortunate enough to benefit from an unlimited research budget at his off-camera Novelty Rays Research Centre.
I doubted whether we’d ever see such devices appearing on Earth as science fact, and I wondered if, indeed, Man-at-Arms’ research budget was really being directed to where it was most needed. For example, he already had a powerful laser blaster by his side, as well as the capability to blow things up in a righteous variety of monochromatic hues. Nevertheless, he had evidently decided that it would act as a more effective deterrent if he could instead threaten his enemies with freezing, shrinking or ageing.
However, despite all this undoubtedly impressive boutique technology, there still wasn’t any glass in the castle windows, and if you wanted to get around planet Eternia in a hurry, the fastest mode of transport was still a large, roaring, green-and-yellow-striped angry tiger.
But what vexed me most of all was why, after having gone to all the trouble of inventing these impressive weapons of… inconvenience, Man-at-Arms would then spend further research dollars on the even more difficult task of developing built-in reverse switches, which in an instant could unfreeze, un-shrink or un-age, as appropriate, a target. Of course, unlike Man-at-Arms, I hadn’t come across the concept of handy plot devices at that age.
Meanwhile, here on Earth, reversing change is, of course, often impossible and shouldn’t be attempted. For example, we can’t un-bake a cake, un-smash a glass or un-break a heart (and they probably can’t in Eternia either). But there is much in life that ought to be reversible, and in theory is, yet in practice somehow confounds our best efforts to reverse it.
In IT Operations, this applies very much to the field of capacity planning. In the last five years in particular, capacity planning has been about only one thing: growth planning, that is to say, planning for the deployment of ever more computing infrastructure to meet ever more forecasted business demand.
Now, in these credit-crunched, recessionary times, in which growth has reversed in many sectors, organisations are under correspondingly strong pressure to cut costs in line with this falling business demand. But, when talking with IT operations managers and capacity planners, I’ve been struck by the fact that taking out computing capacity is rarely taken seriously as an available cost-cutting option. Is there no reverse switch to all that capacity growth of recent years?
Instead, IT cost-cutting this time round is again generally following the familiar path of reducing contract headcount, reducing permanent headcount and squeezing 10-20% price reductions from suppliers. At the beginning of the prototypical programme of cost-saving initiatives, capacity reduction is often on the table, but it rarely ends up as one of the options that are carried out. Why is this?
There are several reasons typically put forward: for example, that demand may pick up again soon, or that the computing assets have already been purchased and aren’t fully depreciated yet. But I don’t find these types of reasons to be particularly convincing. Significant savings in power, cooling and backup would be realised by consolidating and turning off servers, leaving them in situ until demand returned, or even by temporarily removing them from the data centre into storage, thereby saving precious data centre space. Remember that ‘temporarily’ could mean 2-3 years in this recession, and could even mean, in the aggregate, that a costly data centre migration could be postponed.
I believe that the overriding reason capacity reductions are so rarely executed in response to reduced business demand is that application capacity planning models across the industry are still, in general, woefully inadequate. They usually take little or no account of demand and consequently lack a detailed and explicit understanding of the relationship between it and the utilisation and performance of the supporting application infrastructure. The state of the art in capacity planning in most organisations, in this twenty-first century, is still to fit straight lines through infrastructure metrics charts in isolation, with no consideration given to the demand data at all.
We get away with this in times of growth because then it doesn’t really matter if our planning approach is highly inaccurate, so long as the error lies on the side of overprovision. After all, the only major downside to conservatively throwing too many servers at an IT infrastructure is cost, which isn’t an issue when IT budgets are swelling.
“We work out what capacity we need and then deploy twice as much server capacity, just to be on the safe side” is a familiar refrain of operations directors. One even told me: “this application is critical so, whenever I see the developers in the corridor, I stop them and ask them if they’d like more money for additional production servers, just in case”. Of course, he didn’t intend me to take this remark completely seriously (I think), but these remarks well illustrate our lack of faith in capacity planning accuracy and our tolerance of that inaccuracy.
But, when it comes to capacity reduction, there is simply no equivalent crude-yet-workable reverse approach. Where would you start, in attempting to reduce server capacity, without damaging the operational performance of an application? Which downward-sloping line would you draw through the infrastructure data, how long would that line be and where should it stop? Which servers would you switch off, and how many, to match the new business demand? Who would take that decision, and based on what information?
When it comes to taking out infrastructure capacity, it becomes essential to first quantify the correlation between business demand and the corresponding load on the supporting application infrastructure. You need to know how the CPU, memory, thread count and so on of each application server vary per unit of business load. If you don’t know that, you can’t predict how these parameters will alter when demand decreases, so you can’t determine by how much server capacity can be reduced whilst still preserving performance. Nor can you say for how long you can run at the reduced capacity in the face of recovering business volumes.
Until these business-to-infrastructure relationships become essential, standard parts of the capacity planning models of IT organisations, a reverse switch for capacity growth - and its associated cost savings - will remain, along with freeze, shrink and ageing rays, securely in the realms of science fiction.