Capacity planning Archives

March 6, 2009

How better capacity planning can help in the cost-cutting battle

When I was a child, a teacher once told me that science fiction becomes science fact, eventually.

That night, as I watched the cartoon "He-Man and the Masters of the Universe", I remember reflecting on her assertion and feeling somewhat sceptical. One of the main protagonists in this cartoon, who went by the curious title of “Man-at-Arms”, had just perfected another of his inventions and was demonstrating it to He-Man. Man-at-Arms invented totally new technologies on literally a weekly basis, and then never used them again once that week’s episode had ended.

On this particular evening, he was exhibiting his freshly-invented freeze ray. The week before it had been a shrinking ray and, not long before that, an ageing ray. He was clearly fortunate enough to benefit from an unlimited research budget at his off-camera Novelty Rays Research Centre.

I doubted whether we’d ever see such devices appearing on Earth as science fact, and I wondered if, indeed, Man-at-Arms’ research budget was really being directed to where it was most needed. For example, he already had a powerful laser blaster by his side, as well as the capability to blow things up in a righteous variety of monochromatic hues. But he had evidently decided, nevertheless, that it would act as a more effective deterrent if he could instead threaten his enemies with freezing, shrinking or ageing.

[Image: Altocumulus Lenticularis, taken by Hew Bruce-Gardyne, Sumerian Analyst, whilst on holiday in Perthshire last summer]

However, despite all this undoubtedly impressive boutique technology, there still wasn’t any glass in the castle windows, and if you wanted to get around planet Eternia in a hurry, the fastest mode of transport was still a large, roaring, green-and-yellow-striped angry tiger.

But what vexed me most of all was why, after having gone to all the trouble of inventing these impressive weapons of… inconvenience, Man-at-Arms would then spend further research dollars on the even more difficult task of developing built-in reverse switches, which in an instant could unfreeze, un-shrink or un-age, as appropriate, a target. Of course, unlike Man-at-Arms, I hadn’t come across the concept of handy plot devices at that age.

Meanwhile, here on Earth, reversing change is, of course, often impossible and shouldn’t be attempted. For example, we can’t un-bake a cake, un-smash a glass or un-break a heart (and they probably can’t in Eternia either). But there is much in life that ought to be reversible, that in theory is, but that in practice somehow confounds our best efforts to reverse it.

In IT Operations, this applies very much to the field of capacity planning. In the last five years in particular, capacity planning has been about only one thing: growth planning, that is to say, planning for the deployment of ever more computing infrastructure to meet ever more forecasted business demand.

Now, in these credit-crunched, recessionary times, where growth has reversed in many sectors, organisations are under correspondingly strong pressure to cut costs in line with this falling business demand. But, when talking with IT operations managers and capacity planners, I’ve been struck by the fact that taking out computing capacity is rarely taken seriously as an available cost-cutting option. Is there no reverse switch to all that capacity growth of recent years?

Instead, IT cost-cutting this time round is again generally following the familiar path of reducing contract headcount, reducing permanent headcount and squeezing 10-20% price reductions from suppliers. At the beginning of the prototypical programme of cost-saving initiatives, capacity reduction is often on the table, but it rarely ends up as one of the options that is actually carried out. Why is this?

There are several reasons typically put forward: for example, that demand may pick up again soon, or that the computing assets have already been purchased and aren’t fully depreciated yet. But I don’t find these types of reasons to be particularly convincing. Significant savings in power, cooling and backup would be realised by consolidating and turning off servers, leaving them in situ until demand returned, or even by temporarily removing them from the data centre into storage, thereby saving precious data centre space. Remember that ‘temporarily’ could mean 2-3 years in this recession, and could even mean, in the aggregate, that a costly data centre migration could be postponed.

I believe that the overriding reason capacity reductions are so rarely executed in response to reduced business demand is that application capacity planning models across the industry are still, in general, woefully inadequate. They usually take little or no account of demand and consequently lack a detailed and explicit understanding of the relationship between it and the utilisation and performance of the supporting application infrastructure. The state of the art in capacity planning in most organisations, in this twenty-first century, is still to fit straight lines through infrastructure metrics charts in isolation, with no consideration given to the demand data at all.

We get away with this in times of growth because then it doesn’t really matter if our planning approach is highly inaccurate, so long as the error lies on the side of overprovision. After all, the only major downside to conservatively throwing too many servers at an IT infrastructure is cost, which isn’t an issue when IT budgets are swelling.

“We work out what capacity we need and then deploy twice as much server capacity, just to be on the safe side” is a familiar refrain of operations directors. One even told me: “this application is critical so, whenever I see the developers in the corridor, I stop them and ask them if they’d like more money for additional production servers, just in case”. Of course, he didn’t intend me to take this remark completely seriously (I think), but these remarks well illustrate our lack of faith in capacity planning accuracy and our tolerance of that inaccuracy.

But, when it comes to capacity reduction, there is simply no equivalent crude-yet-workable reverse approach. Where would you start, in attempting to reduce server capacity, without damaging the operational performance of an application? Which downward-sloping line would you draw through the infrastructure data, how long would that line be, and where should it stop? Which servers would you switch off, and how many, to match the new business demand? Who would take that decision and based on what information?

When it comes to taking out infrastructure capacity, it becomes essential to first quantify the correlation between business demand and the corresponding load on the supporting application infrastructure. You need to know how the CPU, memory, thread count etc. of each application server vary per unit of business load. If you don’t know that, you can’t predict how these parameters will alter when demand decreases, so you can’t determine by how much server capacity can be reduced whilst still preserving performance. Nor can you say for how long you can run at the reduced capacity in the face of recovering business volumes.
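To make that concrete, here is a minimal sketch in Python of what quantifying "CPU per unit of business load" might look like. It is only an illustration under loud assumptions: the figures are made up, the relationship is assumed to be a straight line, load is assumed to be spread evenly over identical servers, and the function names `fit_line` and `servers_needed` are hypothetical. A real model would also have to cover memory, thread count and response times.

```python
import math


def fit_line(demand, utilisation):
    """Ordinary least-squares fit: utilisation ~ intercept + slope * demand."""
    n = len(demand)
    mean_x = sum(demand) / n
    mean_y = sum(utilisation) / n
    sxx = sum((x - mean_x) ** 2 for x in demand)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(demand, utilisation))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def servers_needed(total_demand, intercept, slope, max_utilisation):
    """Smallest server count that keeps predicted per-server CPU at or below
    the ceiling, assuming demand is shared evenly across identical servers."""
    demand_limit_per_server = (max_utilisation - intercept) / slope
    return math.ceil(total_demand / demand_limit_per_server)


# Hypothetical observations for ONE server: business orders per hour
# and the CPU utilisation (%) measured over the same hours.
orders_per_hour = [100, 200, 300, 400, 500]
cpu_percent = [15.0, 25.0, 35.0, 45.0, 55.0]

intercept, slope = fit_line(orders_per_hour, cpu_percent)

# Assumed planning ceiling of 60% CPU per server (an illustrative choice).
before = servers_needed(8000, intercept, slope, max_utilisation=60.0)
after = servers_needed(5000, intercept, slope, max_utilisation=60.0)

print(f"CPU per order/hour: {slope:.3f}%, idle overhead: {intercept:.1f}%")
print(f"Servers needed at 8000 orders/hour: {before}")
print(f"Servers needed at 5000 orders/hour: {after}")
```

With these invented numbers, the fitted line says each server can carry about 550 orders per hour before crossing the ceiling, so a fall in demand from 8,000 to 5,000 orders per hour would let the estate shrink from 15 servers to 10. The same line, run forwards against a recovery forecast, indicates when the switched-off servers would be needed again.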

Until these business-to-infrastructure relationships become essential, standard parts of the capacity planning models of IT organisations, a reverse switch for capacity growth - and its associated cost savings - will remain, along with freeze, shrink and ageing rays, securely in the realms of science fiction.

January 7, 2010

Capacity planning in the "big freeze"

[Image: Britain, taken by Nasa's Terra satellite on 7 Jan 2010]

With the recent low temperatures showing no signs of letting up in the next few weeks, and with further snow storms forecast, various seams in the UK infrastructure are beginning to show signs of stress.

It has been reported just today that gas supplies are running dangerously low, with power companies exercising the interruptible contracts that many manufacturing companies signed up to, in order to protect gas supplies to domestic users.

There has been a well-reported shortage of grit stocks at councils all over the UK, with the result that only A and B roads are being cleared. This has left many people stranded in their homes, not daring to venture out for anything other than necessities or emergencies (indeed, the police have advised many in the worst-hit areas to do just this). We have also witnessed the inevitable interruptions to public transport due to adverse weather conditions, making the journey into work nigh on impossible for some. Just last week, I was in this position myself after a lorry jackknifed on the M8 and slid down the railway embankment, leaving both the train line and motorway closed for 3 hours.

All in all, it's a difficult time for businesses and contingency plans are being put to the test. And none more so than IT services. With so many people stranded at home, there's increased demand on IT for remote access services, and web sites, particularly for public information such as rail and traffic reports, have been experiencing delays and outages due to high peaks in demand.

Faced with such difficulties, it's the organisations who can continue to operate in these testing conditions that will gain a competitive advantage.

Last year, the threat of swine flu threw business continuity plans a curve ball. In much the same way, 2010's snowy start is raising similar questions about how IT departments can best prepare for the worst.

If your IT is struggling to cope with the increased demand placed on it this week, take a look at our services and find out how IT Analytics capacity planning and scenario modelling can help you to be better prepared.

By Lynn Allan, Sumerian Analyst

May 26, 2010

Six reasons why IT capacity management fails

It’s more important than ever for organisations to have a strong handle on their IT capacity. Business volumes and technology are changing at an increasing pace – more and more IT Services rely on virtualised infrastructure with configurable and shared capacities, and IT shared services (or outsourced services) are costed and priced on the capacity provided.

Continually rightsizing the IT environment strikes that important balance between cost and service quality, but instead, many organisations continue to oversize their IT environments. A capacity management practice could address this, but as you will see, many teams struggle to get one off the ground – and not for lack of trying.

A capacity manager is appointed, a team of capacity analysts is formed, and they run workshops and write process documents. It’s easy at this stage: frameworks like ITIL and IT-CMF supply the basic principles that you need.

The capacity analysts then get to work to start the implementation. ITIL states that you need to “perform modelling for business, service and resource capacity activities”. The rationale is that you compare all aspects of the service (utilisation, performance, availability, resource usage) against business volumes to extract a predictable relationship. Once you have that relationship, you can have conversations with the business about current capacity in business terms, investment required to accommodate growth etc. Finding the predictable relationship is the “Holy Grail” for the capacity analyst.

An IT service is selected and the capacity analysts get to work to build the model and uncover the relationships. And this is where the problems begin:

1. The data sets are large. In order to find that predictable relationship, you need data at a fine granularity. This means data files with one record per transaction, or log files with activity every 5 or 15 minutes. Very quickly you reach the limits of spreadsheets and need databases. Timescales increase and costs rise.

2. You need to consider the full end-to-end service. There are many dependencies within any multi-tier service: network components, web server, app server, database, back end, software, JVMs etc. There is no point in increasing capacity in the database if the real bottleneck is in the network. You need to bring in more data.

3. Capacity is not just about the infrastructure. Capacity analysts tend to focus on CPU utilisation but what about the application layer? More input data required.

4. What about performance? In reality performance will degrade before infrastructure capacity runs out, so what is the real service capacity, expressed as a measure of performance? You need to correlate performance data too.

5. The business volumes are complex. Most systems process more than one type of transaction, with each transaction type having a different impact on the system. No matter how many charts you look at – you may never see a relationship. You need advanced statistics to uncover it.

6. CHANGE HAPPENS. The infrastructure may change, the application will change, upgrades will be scheduled, the business will launch new products. Each change has the potential to change the capacity relationship. So you need to understand when change happens and repeat the entire exercise. And when the format of the input data has changed – you might as well start again from scratch.
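As an illustration of point 5, the sketch below uses multiple linear regression, one of the simpler statistical techniques that can separate the cost of different transaction types when no single chart shows a relationship. It is a hedged example rather than a description of any particular product or method: the transaction types ("quotes" and "payments"), the figures and the helper names `solve` and `fit_multi` are all hypothetical, and the data was constructed to follow an exact linear rule so the result is easy to check.

```python
def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multi(volumes, cpu):
    """Least-squares fit of cpu ~ b0 + b1*v1 + b2*v2 + ... using the
    normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(v) for v in volumes]  # leading 1.0 = intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, cpu)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical 15-minute intervals: (quotes, payments) processed,
# and the average CPU % of the server in the same interval.
volumes = [(100, 20), (200, 10), (150, 40), (300, 30), (250, 60), (400, 50)]
cpu = [8.0, 9.0, 11.0, 13.0, 15.0, 17.0]

base, per_quote, per_payment = fit_multi(volumes, cpu)

# Predicted CPU for an assumed future mix of 200 quotes and 80 payments.
predicted = base + per_quote * 200 + per_payment * 80

print(f"Idle CPU: {base:.2f}%")
print(f"CPU per quote: {per_quote:.3f}%, per payment: {per_payment:.3f}%")
print(f"Predicted CPU for 200 quotes + 80 payments: {predicted:.1f}%")
```

In this invented data set a payment costs roughly five times as much CPU as a quote, which a chart of CPU against total transaction count would hide. Real data is noisy, so in practice you would also want to check the quality of the fit, and, per point 6, refit whenever the infrastructure or application changes.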

It is no surprise, then, that the capacity management programme is disbanded after 6 to 24 months. It is resource intensive and too expensive, and the results are delivered too late.

The result is that organisations fall back on the usual siloed approach to capacity planning: examining the utilisation on a platform-by-platform basis and reacting when capacity looks to be running short.

And when IT asks for more investment and the business replies by asking how many pensions, mortgages, trades, accounts or users the system can handle at the moment, you will just have to blush.

Sumerian runs capacity management services for our clients’ most critical IT applications.

We have a capacity management service that combines a cutting-edge analytics platform with human expertise. We have the statistical capability to uncover the predictable relationships between business and IT. All you need to do is provide the input data – and we can take lots of it!

Posted by Mike Allan, Sumerian Partner