
IT Analytics Archives

November 1, 2006

Welcome...

It’s generally accepted that IT is an enabler of business value. For organisations like Google, FedEx and Amazon that’s a no-brainer, but it’s probably true in most businesses, isn’t it? Of course, the opposite is sometimes true as well, either temporarily, because critical systems are under-performing, or structurally, because the business wants to change faster than its IT lets it.

The trouble is that, while we have a qualitative feel for these factors, in practice it has proven very difficult to quantify the impact, positive or negative, that IT has on the business. There are a lot of reasons for that. But let’s start where the action is – monitoring the deployed IT infrastructure. We need to understand what’s happening on our infrastructure in support of our business, and once we’ve done that, we should be well on the way to understanding the dynamics of the services that run on that infrastructure. Yes?

Well no, sadly. In practice, very little business value is ever returned to organisations from often heavy investments in IT infrastructure monitoring systems alone. I’ve frequently seen FTSE100 organisations spend millions of pounds on multiple brand-name IT management systems to gain visibility of their WAN, server and storage estates in the expectation that service and business insight will follow (after all, that’s what it said on the box). Remember, that’s just the capital spend – the cost of simply running these systems can amount to hundreds of thousands of pounds per year. And that’s before we count the professional services bandwagon that the vendors send along to “customize” the system to the organisation’s requirements.

But despite all that spend, the level of insight into how the infrastructure is supporting the business is minimal. Most of these management solutions are marketed as service-level management tools, but they point in the other direction – toward the infrastructure they monitor. They can tell you perhaps more than you want to know about disk and network card activity, CPU utilisation, processor queues, database response times, you name it. But if you ask the guy who sits in front of that management system whether this means that your business’s critical systems can sustain the demand created by your upcoming product launch, he’ll scratch his head.

There are three reasons for this. The first is that management systems have no conception of the services overlying the infrastructure, and so can’t model the impact of variations in infrastructure capacity and performance on those services. The second is that those management solutions that do claim to model services either do so in a way that is too inflexible to keep up with business change, or require organisations to abandon existing investments in management systems and go uni-vendor. Most people aren’t willing to do that.

The third reason is organisational and is what I call Zookeeper Syndrome. In a zoo (at least as far as I know), there is a specialist for the crocodiles, and others for the zebras, the cheetahs and so on. Only the crocodile guy knows how to look after the crocodiles. He doesn’t care about the polar bears, and he sure doesn’t care about how the zoo’s revenue and profit margins are doing. In a large IT organisation there’s a specialist who knows how to monitor the SAN management system, someone else who knows how to produce detailed WAN capacity reports, and so on. They don’t understand each other’s reports and they don’t know about the services that run across all those different infrastructure domains.

It’s not surprising really, as each management system is too complicated in itself for anyone to master all of them and understand the significance of the applications that run over the infrastructure. But, meanwhile, this is all a long way from being able to tell the business how the IT infrastructure supports it, impacts it and enables competitive advantage. Or from knowing when to invest and when not to, when to re-architect and when existing systems can sustain, for example, temporarily increased customer growth during a special product sale window.

Clearly, it’s important to enable basic visibility of infrastructure resource consumption as a first step, and that’s what these systems do. But shouldn’t they do more than that?

Well, I would argue not. We shouldn’t look to infrastructure management vendors to solve this problem, but should instead view these systems as a necessary but limited component in a much broader information management challenge. In future posts I’ll start to map out that challenge and how it can be solved in the real world. Who knows, maybe zoos will look different one day too?


November 15, 2007

Are you being fooled by randomness?

“We all so willingly record our gains
Until the hour that leads us into loss
Then every single thought is tears and sadness”

Dante, Inferno

A lot has been written about stock market trader blow-ups, where a previously high-performing star trader loses, in a couple of hours, five times what he earned over the previous three years, and is then escorted, broken, from the premises.

Until this point his strategy was lauded and imitated – clearly whatever trading strategy he was following must have been responsible for his consistent run of results over the years, we think. But it turns out that his previously successful spell was just one path through randomness and that his performance eventually, inevitably, returned to the mean. His “trading strategy” was simply an example of attribution bias, where human nature leads us to fit an attractive explanation to observed results, because we “want to believe.”

This is what Taleb asserts in his book Fooled by Randomness. We take comfort from a strategy, conjured to explain a positive run of luck, but then are badly wrong-footed when the luck unexpectedly runs out.

Taleb argues that stock market behaviour is dominated by randomness and that traders are random players, whether they wish to believe it or not.

To what extent is the same true of IT? How alike are the patterns of behaviour in our systems and application environments to the randomness of the stock market? Do we attribute good performance over a period to superior service improvement strategy, without any real justification for believing so? If that were true then we would be just as vulnerable to a swing in our luck as our sad trader above.

Let’s take an example: your IT department has a bad reputation for application outages. It’s undermining the team’s credibility with the business and prevents you from engaging with the business on a more proactive, higher-value level. You institute a set of new processes and activities designed to improve application performance and availability. For the next few months these figures improve and you attribute this improvement to the steps you took a few months ago.

Should you? How do you know that this improvement isn’t just random chance, unrelated to your activities? Then, two weeks later, several major mainframe outages occur within a single month, and the IT organisation is returned to its former reputational purgatory. I’ve seen exactly this happen more than once.

The flaw here was in assuming causality between the actions you took and the subsequent performance and availability figures. Of course, there may have been a relationship, at least in part. But how did you know that? What did you do to verify that? Or did you just want to believe?
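One way to put that question to the data is a permutation test: pool the measurements from before and after the change, shuffle them many times, and count how often a purely random split produces an improvement at least as large as the one observed. The sketch below is a minimal illustration, assuming one availability figure per period; the function name and the sample figures are hypothetical, and this is not a method described in the post. It also assumes the periods are interchangeable, an assumption that seasonality or trends in real IT data can violate.

```python
import random
from statistics import mean


def permutation_p_value(before, after, n_resamples=10_000, seed=0):
    """One-sided permutation test for 'after is better than before'.

    Returns the estimated probability that a random relabelling of the
    pooled samples gives a mean improvement at least as large as the one
    actually observed. Small values suggest chance alone is an unlikely
    explanation; large values mean luck explains the result just as well.
    """
    rng = random.Random(seed)  # fixed seed so results are repeatable
    observed = mean(after) - mean(before)
    pooled = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of before/after
        diff = mean(pooled[n_before:]) - mean(pooled[:n_before])
        if diff >= observed:
            hits += 1
    # Add-one correction: the observed split is itself one valid relabelling.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical monthly availability (%) before and after a new process.
    before = [99.1, 98.7, 99.0, 98.9, 99.2, 98.8]
    after = [99.3, 99.0, 99.4, 99.1, 99.2, 99.3]
    print(f"p = {permutation_p_value(before, after):.4f}")
```

A p-value of, say, 0.3 would mean that a random split matches or beats your result about three times in ten, so the improvement would be no evidence that the new processes worked.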

To really be sure, we need to measure the before and after of our IT environment in detail, and that is what IT Analytics is all about. To take one basic but extreme example of what IT Analytics prevents: one large organisation upgraded the memory in several of its critical mid-range servers because it suspected that memory capacity limitations were squeezing performance. Only several months later did it come to light, by accident, that a faulty upgrade process had halved the memory instead of doubling it. Simply measuring the before and after of that change would have saved hundreds of thousands of degraded business hours.

If we truly understand what works and what doesn’t, then we can make confident assertions about sustained service improvement; otherwise we risk a rather unpleasant “return to the mean”, when every single thought is tears and sadness.


January 29, 2008

The Evolution of IT

In nature, retroviruses are more sinister than normal viruses. They permanently write a copy of their genes into the DNA of the cells they invade and, if these are sperm or egg cells, that copy is inherited and becomes a permanent part of our species, passed endlessly from generation to generation (assuming the virus didn’t kill the host first, of course). About 8% of our DNA consists of retroviral fragments that infected our ancestors millions of years ago.

There’s a particularly high concentration of retroviral material embedded in the cellular structure of the human placenta, leading to the theory that the placenta has evolved as a defensive mechanism against viruses, shielding the unborn child from attack. The side effect of this development could have been to provide an environment extremely rich in foetal nutrients, which accelerated the development of the mammalian brain and, ultimately, even led to live birth amongst mammals instead of egg-laying.

This is an extraordinary example of the impact of the side effects of an evolutionary response to adversity.

In comparison to natural evolution, information technology evolves at a phenomenal rate; in the span of our careers we are privileged to witness the equivalent of millions of years of natural evolution. And, as in natural evolution, it’s usually difficult to recognise that a change has occurred whose side-effects are profound, and which we could opportunistically exploit.

Examples abound throughout IT. Take the Internet Protocol (IP). It is often said to have been conceived with the flexibility to keep communications flowing through a fractured network in the event of nuclear near-annihilation; whatever its origins, its pragmatic robustness made the Internet possible. The subsequent invention of secure protocols, firewalls and identity verification in response to casual hacking made services like online banking conceivable.

IT Analytics emerged initially as a response to the growing problem of critical application outages and degradations, which traditional approaches had failed to prevent within n-tier application environments. But its evolutionary side-effect is the potential to directly improve the performance of the business itself, not just its IT systems.

A massive amount of data is generated by applications and infrastructure as a side-effect of their central role in business operations. When we apply IT Analytics to this data, we can bring unprecedented insight into how the business itself operates, seen through the lens of its IT systems taken together. Properly contextualised, this serves as a means to model business change scenarios, to improve operating margins, to study productivity directly and even to improve employee satisfaction.

This has been the profound corollary of the evolution of IT Analytics: applying high-end analysis techniques to the reams of data that an organisation’s IT systems produce can provide direct insight into the business bottom line, not just its IT.

The organisations that recognise this opportunity will develop a significant and sustained operating advantage. However, whether this will put an end to egg-laying, it’s too early to say.


February 26, 2008

Cutting through the complexity

In Shakespeare’s Hamlet, the protagonist reflects: “What a piece of work is a man! How noble in reason! how infinite in faculty! in form, in moving, how express and admirable! in action how like an angel! in apprehension, how like a god!”

And yet, by the end of the play he has killed his uncle, driven his girlfriend to suicide, conducted a fist fight with her brother on her coffin in its open grave in front of the other mourners, stabbed a defenceless old man speculatively behind a curtain, terrified his mother into fearing that he will kill her and has otherwise generally made a nuisance of himself.

Many of the problems we humans create for ourselves stem from our tendency to overestimate our human capabilities, as Hamlet did before he went off the rails. So charmed are we by our undoubted ingenuity that we overlook other areas where our minds are poorly adapted. We assume that we are strong in all areas of reasoning because we are strong in some.

As recently as 500 years ago, almost everyone lived their entire life within a 2-mile radius, meeting very few people and dealing with very little large-scale complexity. That’s not to say life wasn’t hard – in many ways it was much harder than today. Survival, finding food and securing shelter were daily challenges.

As a result, over millions of years, we have evolved extraordinary tenacity and problem-solving skills, but remain under-developed in handling large scale complex scenarios simply because we were never exposed to them in the preceding ten million years. In understanding our world today, we still tend to localise the universe and to construct highly over-simplified models of our context. We don’t even realise that we’re doing it, because we don’t know what we don’t know.

I see both our human strengths and limitations exercised simultaneously in information technology. Our ingenuity leads to the creation of highly sophisticated applications running on miraculous machines. However, we don’t understand the workings of the monsters we have created, though dangerously we think we do.

In almost every instance where I meet development teams, I’m impressed by the sophistication and scalability of the n-tier applications they’ve constructed. And, in almost every case, there are one or two people who’ll tell you they know intimately how the application works. But in every case it turns out that they don’t. The complexity is just too high.

This often leads to bad decisions about application changes, capacity forecasting and the diagnosis of performance degradations. We’re making decisions using mental models of our applications that are too simple, while finding it hard to accept that we can’t deal with the complexity we created.

And that’s when we need to fall back again to our strengths, our ingenuity. That’s when we should apply IT Analytics to the visualisation of our applications. In every case where we’ve worked with clients to bring insight into their application environments, they’ve been amazed by what they didn’t know about their applications, how they really work, what really communicates with what, how they really react to load, how their capacity really grows over time…the list is long.

Armed with that insight, every decision about an application, from forecasting to diagnosis, is radically improved. The trend in application complexity is upward; the evolution of our brains isn’t keeping pace. Luckily, IT Analytics is there to hold, as 'twere, the mirror up to nature.

June 23, 2008

The best of all possible worlds?

When ants forage for food, most search within a safe distance from the nest, while a tapering minority venture further, at greater risk, into unexplored territory. The result is a good overall blend of caution and opportunity, in this case the opportunity to discover better food-sourcing conditions.

Ants are, in effect, practising superior risk management, with a population that exhibits an optimal distribution in its appetite for risk. Being a mobile, expendable (and unwitting) member of a large homogeneous swarm undoubtedly makes this approach easier for the ants to operate.

Perhaps, at some point in our history, people too exhibited this swarm behaviour, with a risk profile blended similarly to that of our ants. Thus equipped, we too could have remained alert to previously undiscovered ways in which to optimise our environment, to the betterment of the colony.

But at some point that tapered approach to explorative risk diminished in our population or, at least, became less balanced. Did our increasing self-awareness and desire for self-preservation lead ever more of us to favour risk taking by others rather than by ourselves? In time perhaps we even developed a resentment of those risk takers because they reminded us of that quality’s absence within us.

Whatever the reason, many societies eventually came to embrace caution too readily. It’s more comforting to believe that our current environment is as good as it could possibly be than to challenge ourselves with the task of changing it, or of finding another. This is especially true for any group that happens to disproportionately benefit from the status quo. Most great societal reformers didn’t find their ideas well received at first, for this reason.

This cautious, static mentality eventually came to dominate the philosophy of some western countries in the 1700s, where it was proposed that we must be living in “the best of all possible worlds” (because God had created it) and that everything that happened must therefore be “for the best”, even if any given individual happening was not necessarily a positive experience (like plague, famine, or disembowelment).

To disagree was to confront the Church and for some, like Voltaire, to invite practical exile. And until those states moved past such modes of thinking, human progress was arguably much reduced. But even nowadays, a related mentality is extant in many aspects of our society, including the area of interest to this blog: large IT organisations.

In most industry sectors, 20 or 30 applications define an organisation, that is to say, the organisation depends on them for its ongoing delivery of service. If we looked at this set of applications across multiple institutions in the same sector, we would find, not surprisingly, that the same types of applications existed in each company. In banking, for example, these would include workflow management, customer account management, document imaging & storage, and so on. And, of course, we would witness much time, effort and emotion being directed into maintaining these applications, their performance, availability and business effectiveness.

But, significantly, all of this intense activity is measured and considered within the confines, and from the perspective, of each individual institution. To continue the financial services example, there’s no way for an organisation to independently benchmark the effectiveness of its workflow management software or customer account solution.

A compounding force is the surprisingly limited movement of IT staff between companies in a given sector, and the consequently narrow perspective with which many applications are specified and operated – there are too few risk-taking foragers.

Most very large organisations incentivise staff to remain with them for years, even decades. But, as a result, there’s little knowledge or curiosity within any given institution as to how effective its applications are at supporting the business when compared to other similar companies. We don’t carry a notion of an “industry average” with us, because most of us haven’t experienced a sufficient variety of examples across organisations in our sector.

Without that outside knowledge and perspective, applications only evolve within the constraints of the institution’s direct knowledge and experience. In effect, the IT team behaves as if it lives in the “best of all possible worlds”, despite the day-to-day trials that may befall it, and its progress is constrained in the same way as we saw earlier.

I’d like to see us address the need for an “industry average” for application effectiveness by instituting an application audit across specific industry sectors. In this scenario IT Analytics could be used to assess an organisation’s key IT applications against an anonymised industry benchmark.

What if you could compare the architecture, resilience, performance, availability and usage of your key banking applications confidentially against an anonymised industry average for those applications? How would that change your IT planning and investment decisions?

Maybe then, in our little space, we’d eventually arrive at the best of all possible worlds.


July 31, 2008

Applying the "swing vote" principle to IT systems

How many people directly influence the result of a national election? In Britain, for example, the size of the electorate is roughly 30 million people. But the number of people who decide our political future at a general election is actually much less.

The two main parties in Britain each command a “natural” vote of about 30%. In other words, in almost any election, and regardless of any particular policies or the prevailing economic climate, 30% of the electorate will almost always vote for each of these parties. In a similar way, the natural vote of the remaining parties, taken together, is about 20%.

This leaves 20% of voters, only six million people, whose vote is volatile across elections and who therefore could conceivably influence the overall outcome. But this isn’t the answer to our question either.

Of that electorally volatile 20%, only one fifth live in marginal constituencies, where their variability can overcome the natural vote in those areas. Revising our figure to account for this leaves 4% of the overall electorate. And of that 4%, a quarter of them won’t actually bother to vote on Election Day, leaving us with the answer to our question: that only 3% of the electorate, or 900,000 people - from a total electorate of 30 million - directly influence who will govern the country for the next five years.
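The funnel above can be checked step by step. Here is a minimal sketch using exact fractions, so no floating-point rounding creeps in; the percentages are the post's own round figures, and the function name is just for illustration.

```python
from fractions import Fraction


def swing_vote_funnel(electorate):
    """Apply the post's round figures to an electorate of the given size."""
    natural = Fraction(30, 100) + Fraction(30, 100) + Fraction(20, 100)
    volatile = 1 - natural                # 20% have no fixed allegiance
    marginal = volatile * Fraction(1, 5)  # a fifth of those in marginal seats: 4%
    voting = marginal * Fraction(3, 4)    # a quarter of those abstain: 3%
    return {
        "volatile": electorate * volatile,
        "marginal": electorate * marginal,
        "decisive": electorate * voting,
        "share": voting,
    }


funnel = swing_vote_funnel(30_000_000)
for stage in ("volatile", "marginal", "decisive"):
    print(f"{stage}: {int(funnel[stage]):,}")
# volatile: 6,000,000
# marginal: 1,200,000
# decisive: 900,000
print(f"one voter in {float(1 / funnel['share']):.1f}")
# one voter in 33.3
```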

That’s less than one in thirty of the electorate. In a country as varied, in a society as diverse, in a culture as opinionated, this is a striking phenomenon, but there it is.

Now consider the activities of two political parties during a general election campaign, where one of the parties knows about this compression and the other doesn’t. It’s easy to see that the first party would very likely conduct a highly focussed, efficient campaign, targeting the issues and desires of swing voters in marginal constituencies. The other party, lost in the scale of the task, would spread its resources too widely, spending a lot of money and time on lobbying voters whose opinions it couldn’t change.

Of course, the political parties in real life know all this, and act accordingly. They drench marginal constituencies in senior cabinet members during the campaign, reserving cheaply printed, rainwater-soluble brochures for the rest of us.

Even more impressively, the pundits on election-night television specials can often magically predict the outcome of the entire election, with high accuracy, after the first two or three marginal seats have declared.

Electoral mathematics displays the fortunate property of being controlled by a small subset of the measured population. It’s also fortunate for our parties and pundits that they know which subset this is.

Can the same effect be brought to bear in IT? Could we identify this effect within our application infrastructure and derive analogous benefits of better-targeted investment and prediction of issues?

A typical large n-tier application can yield up to 1000 individual measurements. Across the many applications in a large organisation’s estate, that adds up to over half a million measurements to trend and analyse. Sumerian’s Service Delivery Analytics routinely monitors all of these for our customers, detecting emerging change.

But with one of our most innovative clients we’re now applying IT Analytics, and in particular clustering techniques, to identify the small percentage of measurements that most strongly indicate application health and emerging issues. We hope to build deep predictive models around these key attributes, calibrating them with knowledge of prior history.

By identifying these “swing voters”, and studying them deeply, we hope to advance the state of the art in proactive outage and degradation prevention further. We’ll keep you posted on how we get on.


December 9, 2008

Getting the right blend between automation and human judgement...

Think how much shorter cities would be if we didn’t have lifts. We wouldn’t be much inclined to venture above the 5th floor on foot, so cities would have to be far more spread out. Our economic output might be more restricted because there wouldn’t be enough space for commercial centres to expand as much as they’d need to, to fully compensate for the lost floors.

But lifts themselves consume a lot of floor space, not to mention money. So there are usually comparatively few lifts available to serve a high-rise building. How then do we best optimise the use and people-delivery performance of this precious and expensive resource?

Or let’s put it another way. Suppose two thousand people arrive for work in a 30-storey building each day. They typically arrive between 8:00am and 9:30am each morning and you have six lifts available to serve them. What’s the most practical method of allocating people to each of the lifts, so that they are delivered to their places of work in a reasonably efficient manner, avoiding overcrowding and short tempers in the entrance lobby?

When lifts were first incorporated into the newly appearing high-rise office blocks of the world’s major cities, the best answer to this question was a fully manual process. Each lift had its own operator who would decide the order in which to visit floors within a certain range. A central controller would remain on the ground, directing employees to specific lifts based on the floor to which they were travelling.

This system had some strong points (besides being the only available option at the time). A friendly human interface was one, as was a certain adaptability – for example, when the CEO arrived, the controller could immediately reserve a lift to whisk them straight to the top floor. But the efficiency of this system, relying as it did on human judgement, would deteriorate beyond a certain arrival load, and it was expensive too.

Eventually, this method was replaced by fully automated systems where each lift made its own decisions about the order to visit floors based on the buttons pressed by its passengers. A simple automated central control unit decided which lifts to bring back to ground level, and when.

But technology was limited back then, and certain information essential to optimising the delivery of people to floors was no longer available to the decision process. The system didn’t know which floors people were travelling to until they had already entered a particular lift, and lifts weren’t able to adapt their behaviour in the event of, for example, a lift-car being out of order, or the CEO arriving (though some would argue that this was an improvement).

These deficiencies became more visible as buildings grew taller and more populated, and queuing in lobbies became more frequent. This has led, in recent years, to the return of a visible central controller to direct traffic, though this time in the form of a machine.

Under this system people select on a console which floor they want to travel to, and the machine-controller directs them to the most appropriate lift.
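The logic such a controller applies can be illustrated with a simple greedy rule. The sketch below is hypothetical (real destination-dispatch systems use more sophisticated, vendor-specific optimisation): each passenger keys in a floor, and the controller prefers a lift that already stops there, then the lift with the fewest distinct stops, skipping any lift that is full. The function name, the capacity of 12 and the lift names are assumptions for illustration.

```python
def assign_lift(destination, lifts, capacity=12):
    """Choose a lift for a passenger who has keyed in `destination`.

    lifts: dict mapping lift name -> list of destination floors of the
    passengers already assigned to it (mutated in place).
    Returns the chosen lift's name, or None if every lift is full.
    """
    open_lifts = [name for name, stops in lifts.items() if len(stops) < capacity]
    if not open_lifts:
        return None  # passenger waits for the next cycle

    def cost(name):
        stops = lifts[name]
        extra_stop = 0 if destination in stops else 1  # avoid adding stops
        return (extra_stop, len(set(stops)), len(stops))

    best = min(open_lifts, key=cost)  # ties go to the first lift listed
    lifts[best].append(destination)
    return best


if __name__ == "__main__":
    lobby = {"A": [], "B": [], "C": []}
    for floor in [12, 25, 12, 7, 25]:
        print(floor, "->", assign_lift(floor, lobby))  # A, B, A, C, B in turn
    print(lobby)  # {'A': [12, 12], 'B': [25, 25], 'C': [7]}
```

The intuition behind grouping passengers by destination is that fewer stops per trip shorten each lift's round trip, which matters most at peak arrival load.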

This provides optimal lift utilisation and performance, even under peak arrival load (unlike our original human controllers). But people often find it all a bit confusing and intimidating, so the most recent change has been to station an attendant next to the console, advising people on how to work the thing. In one sense we have come full circle; however, the blend of automation and people in the service is much improved over the starting point, and it lets us build even higher skyscrapers…

What we’ve just explored here is how the techniques used to optimise the performance and utilisation of a scarce resource evolve over time. The variables blended during that evolution are available information, automation, human insight, and human and machine interfaces.

This is highly analogous to the evolution of systems optimisation and planning in IT. At first, systems could be optimised and planned almost entirely by human judgement and rules of thumb. As complexity and scale increased, this approach became infeasible, prompting attempts to heavily automate demand-supply management, forecasting, outage analysis and failure prevention, none of them particularly successful.

IT Analytics is a further evolution, recognising that only a blend of automated data analysis, human judgement, experience and presentation provides real insight into these IT challenges at the scale on which our systems’ landscapes are painted. And that, you could say, is my IT Analytics elevator pitch.


April 28, 2009

Turning data into insight

At Sumerian, we take pride in the Capture-Model-Analyse-Inform approach that guides us in delivering results to our clients: the method that we have developed is an important factor in why our IT Analytics services stand out from the crowd.

With this in mind, "Analysis, Plus Synthesis: Turning Data into Insights" in the UXmatters web magazine might provide some food for thought. The article discusses the world of usability research rather than the challenges of optimising enterprise IT environments, but some of the underlying methodological problems are similar. People often focus on collecting heaps of data but then don't pay enough attention to how that data should be handled, ultimately failing to turn the data into insights. For example, the article discusses how a card sort (a popular usability design exercise) may reveal patterns in information architecture, but also notes that a card sort can't tell you what those patterns mean.

Similarly, when it comes to optimising enterprise IT, throwing every piece of data you have into a standard tool and cranking the handle can get you started by identifying problem areas, but real value is added by human experts who can interpret the significance of patterns. For reliable results, you need a holistic method that doesn’t stop at data gathering and enables you to produce actionable insight.

Posted by Katja McLaughlin, Technical Author

June 10, 2009

Is it time to start preparing for the recession's end?

Many of us in the IT industry are experiencing the unsettling repercussions of the economic downturn. But the IT industry is no stranger to downturns.

Although it's almost 10 years since the start of the dot-com boom, we can all name companies that rose from its ashes to become some of today's strongest players in the market: Google and YouTube, Amazon, and eBay. And then there are all the others that drew on their success. Many companies radically modified their traditional business models to remain viable and stay competitive: online banking services grew rapidly, and the way we bought food, music, travel and retail goods was revolutionised. Today, we're seeing some of the ones that didn't change suffer the consequences.

If there's one lesson to take from this, it's that the companies that started early and prepared for the upturn were the ones that survived.

In our recent newsletter, we examined how business-aligned capacity planning is helping many of our clients work smarter with their existing IT assets. It's back-to-basics measures like these that are enabling companies to ride out budget cuts and stay competitive. But what happens when the recession ends? How will you know when to start preparing for growth again?

Two articles that reported on the current state of things grabbed my attention this week - both were from industry analyst Gartner. One was gloomy, one positive.

The gloomy one, reported in CIO.com, focussed on the fact that 4 out of 10 CIOs cut their budgets in Q1 of this year. Not good news.

The positive one, reported in Network World, was that Gartner are advising tech companies to start preparing for modest growth in 2010. Citing economists' predictions that the US could start to see economic recovery in late 2009/early 2010, it's well worth a read, if only to take hope from phrases like "that window (of preparation) is now and you will regret it if you miss it."

Let's hope that this is one prediction that comes true.

Posted by Fran Bolton, Web Channel Manager


August 10, 2009

Getting the message about IT Analytics

In this blog we've talked a lot about how visuals go a long way to making complex information easily understood. For us at Sumerian, it's vital to our business that we do that.

Take, for example, what we do as a company: IT Analytics. For a start, what we do is relatively new in the market. Even among people who work in IT, many have never come across IT Analytics before. Where a lot of traditional suppliers can compare and relate their solutions to pre-existing ones, we couldn't do that. No one else was doing what we were doing.

So, to cut a long story short, we had to think of a way to get across what we do that would catch attention and get people interested, from scratch. So how? Well, it's a technique that's been around since the turn of the century. Not the last one, though; we're talking about the turn of the 20th! We turned to animation...

On our "what we do" web page, we have a 1-minute movie that explains our service in quite a different, eye-catching way. The movie takes the form of an abstract look at enterprise IT. It depicts IT services as cogs generating data that can be mined to release insight that helps businesses transform. It's been really well received by our clients, and we hope you'll like it too.

Of course, none of it would have been possible without the brilliant imagination, skills and vision of the animator, Selina Wagner (also known as Blobina). The images are hand-drawn, then replicated in an animation application, giving a superb level of detail to the final look. You can download the storyboard here if you're interested in how it was built.

We'd like to say a huge thanks to Selina, and recommend that you view this other excellent piece of animation by her - a short animated film called ‘Crow Moon’ which premiered at the Edinburgh International Film Festival in 2006.

Story by Fran Bolton, Sumerian.