
PALO ALTO, Calif., May 31, 2005 — According to a poll taken at Gartner's Data Center Conference in December of 2004, "IT operations organizations are focusing on end-to-end response-time monitoring technologies as their first priority for availability- and performance-monitoring investments in 2005."
To understand how end-to-end monitoring impacts the challenge of managing Web application performance, Symphoniq convened a roundtable of seasoned system management experts to discuss the implications:
In the conversation that followed, Wayne, Russ, and Rich shared concrete examples from their own experiences and shed light on how the world's leading IT organizations are approaching the challenge of managing Web applications.
Symphoniq: Thank you again for taking time out of your busy schedule to talk with us. I'd like to begin by asking each of you how you're detecting problems and tackling the issue of managing end-to-end Web application performance.
Rich Burton: It's harder to work with Web applications than the old client-server environment. More and more people want Web applications, and the business side is using these applications outside the datacenter. As a result, IT learns about most customer complaints after the fact, via the phone.
What we try to do is identify the common denominator for Web-enabled applications. The URL is one. It doesn't matter what's happening behind the scenes: because every user comes in through a URL, the business can be represented by the set of URLs that users need to access.
Symphoniq: How about you, Russ? You're running one of the world's largest e-commerce sites. What kind of challenges do you face?
Russ Kieckhafer: Web application traffic and usage are very unpredictable; users can always find something that completely skews the typical behavior patterns on the site. And synthetic transaction monitoring is good for baselining, but it doesn't predict how people will use an application.
Symphoniq: So if synthetic monitoring isn't enough, what methods do you use?
Russ Kieckhafer: If the site is slow, we know it, because we trend and forecast sales. We know in real time how much we should be selling. If there's a drop-off in sales, I can detect it by 10 or 11 AM, based on where sales should be at that time of day.
We also rely on the helpdesk to detect issues. When the customer calls, the help desk can open a ticket right away. Our IT people check in with the call center every shift, and monitor the call queues as well. If the number of calls spikes, we call in to find out what's going on.
William Foley: We also find out about the majority of problems from the helpdesk. There's a lot of folklore involved. Someone will complain and say that the application is slow, and then the investigation begins.
Sometimes the EVP will call up and say, "Jeez, it's really slow today."
The biggest pain is when the user knows before you do. Then you're always in catch-up mode.
Symphoniq: What would help deal with these issues?
Russ Kieckhafer: The biggest thing by far that would most improve the lives of my people would be a real-time understanding of what the user is experiencing, not just synthetic monitoring. If the users are having a problem, most people don't see it until the system goes down.
Symphoniq: When these kinds of Web application performance problems occur, what are their costs? Wayne, you're processing millions of dollars in financial transactions per day. How do performance problems impact your business?
William Foley: The main costs are lost productivity and sales.
We're in a saturated banking market, and customers are sensitive to response time.
For example, employees in the branches have customers staring at them while they use our Web apps. They expect two-second response times, and if they don't get them, the customer may lose patience.
There is also the cost of resolving problems. Web performance problems always disrupt planned activities. I've had people disappear for weeks. In general, folks might spend five to seven hours per week in meetings. My gut feel is that there's about 10% overhead that we spend on fixing problems.
Russ Kieckhafer: For us, the biggest issue is lost revenue and customers. The operation is the business. When the customer can't get to the site, we lose sales.
There are also infrastructure costs. Our hardware and software are highly redundant. We're constantly redesigning the application to cover things we didn't expect, like handling multiple suppliers when we planned for only one.
Symphoniq: How about you, Rich? You're primarily dealing with internal applications. Are there significant costs there?
Rich Burton: First, there's the cost of lost business. Losing one day's worth of sales could cost millions.
Then, there's the cost of over-provisioning. IT needs to right-size its investment strategy. We should not charge the business with the cost of storing the same data in multiple locations for risk mitigation unless that's of value to the business. The trend is that people want to understand the business impact and cost of performance failures and invest accordingly.
User satisfaction is also an issue. If the users aren't satisfied with performance, they stop using the application, especially if they have other means like the fax or phone.
Symphoniq: So let's say that a performance problem occurs. Walk me through how you would diagnose and resolve these issues.
William Foley: The first thing is to determine which technician to contact. It sounds trivial, but it's not. It can be hard to know who has the ability to fix the problem, or to link a server to the person responsible.
If the problem is bigger, we convene a Technical Recovery Team (TRT) with about six senior techs on it to look deeper. We set up a conference call, and everyone sits at their desk with their tool of choice so that they can look at the various sides of the problems. Sometimes they email reports back and forth. That happens a few times per week.
Once a month or so, we might need a System Recovery Team (SRT). That's a 20-person team that's basically all hands on deck. There are only about 20 key applications that qualify for an SRT, out of the 600 or so that we run.
Symphoniq: Sounds like quite a challenge. Is that true for you as well, Rich?
Rich Burton: Resolving problems can be tough. Often the techs taking the call from users don't understand the app well enough to diagnose the issue while the user is on the line. This means they have to collect enough information to route the ticket properly. The problem is that if a ticket bounces between two or three people, you can't keep asking for the same data, or the user feels like you don't know what you're doing.
Usually the problems span multiple areas of responsibility. Getting people to agree on the source and to coordinate the fix across those areas is a challenge. It's also expensive: you have to pull high-level people off deployment or design work.
It's also hard to find the right data. I talked with a company the other day that had installed Quest for Siebel. The guy bought the product because Siebel told him that he needed it, but it was producing so many alerts per day that he might as well not have turned it on.
Symphoniq: How do you recommend solving this data overload?
Rich Burton: Measure what you need to manage: no more, no less. The value comes from linking all the data together, rather than just putting it into separate silos.
Symphoniq: How about you, Russ? What kind of resources do you devote to troubleshooting?
Russ Kieckhafer: We have a couple of groups that do nothing but watch the applications for problems, with two or three people per shift.
When we do have a problem, others get involved right away. Even if we only suspect there's a problem, we get the backup team involved. Our 24/7 operations team is trained not to wait, and they have permission to call senior people at any time. Critical issues go all the way up to the CEO, and alerts are sent to many different departments. For example, if there's a site outage, the PR people get the information they need to field press calls.
Symphoniq: It sounds like you've built quite a lot of problem-resolution infrastructure. How long did it take to perfect your system?
Russ Kieckhafer: It took us five years to build this system, growing from 12 people to 600.
Symphoniq: How do you communicate the value of IT activities to business management?
Russ Kieckhafer: We're joined at the hip. We talk every day. For example, our business people watch sales and transaction volumes in real time. If things aren't up 100% of the time, we haven't met the SLA.
Another good example is a new email marketing campaign. We have to be ready from a capacity perspective: they can't just send an extra million people to the site without our knowing it.
William Foley: We have some pretty formal processes in place. We survey the lines of business every week. For example, if the cash management line of business has six major products, we talk to its various subgroups weekly and ask whether they're satisfied with our services.
This organization is amazing in terms of using surveys and interviews to make sure that its efforts are on track.
The executives also get a copy of everything that goes wrong anywhere, about three to ten issues per day. On Friday afternoons, we hold a meeting to review the week: anything the customers thought was important, the nature of the problem, and what we're doing to fix it.
Symphoniq: How do the business executives value your efforts?
Rich Burton: Right now, the business asks questions like, "Why isn't this happening?" They focus on what IT isn't doing for the business, rather than all the things that IT does do.
It's IT's job to make sure that the information that the business guys see is useful. IT needs to map to the business function, and not vice versa. This means that the IT guy needs to think in business-level terms.
The business cares about the impact on the business. We need to raise our level of thinking to line up with the business. The higher-level message is how any given application supports your enterprise or business function, and the information you gather has to provide that understanding. If there's a dependency, you have to help the business understand how they rely on various IT systems. Only then are the business users going to get it and understand the value, and then it's easier to sell IT inside the company. This is the future of IT and business.
Symphoniq: Gentlemen, thanks again for taking the time to talk. I know that our customers and other Web site visitors will benefit from the experiences you've shared.
Symphoniq's TrueView product suite lets IT operations detect and diagnose performance problems more quickly and accurately, reduce operational costs, and improve customer satisfaction and communication with stakeholders such as business management. People who are interested in learning more can read about TrueView at http://www.symphoniq.com/products/, or download product information or contact our sales team at http://www.symphoniq.com/company/more-info.php.
Symphoniq Corporation (www.symphoniq.com) created the patent-pending TRUE™ (The Real User Experience) monitoring technology to track real response times from the browser to the back end. Symphoniq's TrueView product suite simplifies the complexity of managing Web applications and services. Based on the TRUE technology, TrueView analyzes end-to-end performance, making it easy to find and fix problems and bottlenecks. Since 1990, Symphoniq's executive team has delivered innovative solutions for managing enterprise environments, first by founding EcoSystems, then NetIQ (NASDAQ:NTIQ). Leading companies from Fortune 500 to Web 2.0 use TrueView to rapidly diagnose Web disruptions, plan Web infrastructure improvements, reduce operational costs and improve customer satisfaction.
Symphoniq®, TRUE and TrueView are trademarks of Symphoniq Corporation and may be registered in the US Patent and Trademark Office and in other countries. All other trademarks and registered trademarks are the property of their respective owners.
###
© 2006 Symphoniq Corporation. All rights reserved.

Kelly Indrieri
Kulesa Public Relations
Office: (650)-340-1983