Compuware's Management Hosts Application Performance Academy -- Part 1: Performance Concepts Conference (Transcripts)

Compuware Corporation (NASDAQ:CPWR)

November 28, 2012 12:00 pm ET


Nagraj Seshadri

Alois Reitbauer


Good day. My name is Andrea, and I will be your conference operator today. At this time, I would like to welcome everyone to the Application Performance Academy Webcast Part 1: Performance Concepts. [Operator Instructions] I would now like to turn the call over to our host, Mr. Nagraj Seshadri, Director of Product Marketing at Compuware APM. Please begin, sir.

Nagraj Seshadri

Hello, everyone. Welcome. First, housekeeping issues. [Operator Instructions] Today's session is being recorded, and all registrants will be provided links to the replay, as well as the deck within a few days. We will be doing a Q&A at the end of the webcast, so please submit your questions throughout the webcast using the Q&A function on your WebEx control panel.

I'm pleased to introduce our speaker today, Alois Reitbauer. Alois Reitbauer is the Technology Strategist at Compuware. As a major contributor to dynaTrace Labs, he influences the company's future technological direction. He also helps Fortune 500 companies implement performance management successfully. He has authored the Performance Series of the German Java magazine, written articles in other online and print publications and also contributed to several books. At, he regularly writes on performance and architectural topics to an audience reaching up to 100,000 visitors.

So now without further ado, Alois Reitbauer. Alois, take it away.

Alois Reitbauer

Hello. Thank you, Nagraj, and welcome, everybody, also from my side. So I'm very excited to today start our Application Performance Academy where we try to bring really basic concepts of application performance to a broad audience, and what we will start with today is performance concepts. So in this first webinar of the series, we will deal with like the starting points once you enter the APM field. But even if you're an experienced performance guy who works with APM or in the APM space every day, it's sometimes good to get kind of a recap what are we talking about and a refresh of some of the very basic and vital concepts that we use in our work every day.

To get started, we will first look into performance theory. So don't be scared, it won't be too much theory. But a bit of theory is good because it will help us throughout this webinar, and also in things to come, to better understand what we're actually talking about and just lay the groundwork so that we are all on the same page here.

The first very important concept is queuing theory. I think many of you have heard of queuing theory and are probably using it also. And I think it's a very vital concept to understand performance and how performance in kind of every software system actually works. So on the right-hand side, we have this picture of a very simple application, which consists of the server, the database and the network in between the 2. If we look into the server from a queuing theory standpoint, we see all the typical resources we have in this server modeled as so-called queues. So we have one each for server threads, CPU, the network and the database connection.

So as everybody, in fact, knows, resources are limited. For the server threads here, we have a couple of those. The CPU, in this case, is just one. The same is true for the network. And for the database connections, we also have a couple of those resources.

Whenever a request now hits the system, like in our case, it would hit the server, we first try to access the first resource in here, like our server threads. As long as the resource is available, we take this resource right away. If it's not available, we have to wait in a queue, and that's where the name queuing actually comes from. Once we have our server thread, we can go into some of the other queues to get access to the CPU for doing our computation, to the network, and also to get database connections and other resources.

So while the model is very simple, it helps us to understand kind of every performance-related computing problem we might experience in a system. Because what we typically do is consume resources, while other requests in the system try to consume these resources as well. And the more the resources are used, the longer their wait times typically are. So whenever we see an increase in our response times, it's typically because we have to wait on certain resources or because the resources in the system are used much more extensively. So whenever you want to find out why something is actually slowing down, queuing theory is a good model to see which resources we actually use. At the same time, it tells us that all resources that we use in our systems have to be monitored carefully.
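As a rough illustration of how wait times grow with resource usage, the sketch below uses the textbook single-queue (M/M/1) result, where the mean time in the system is the service time divided by one minus the utilization. The M/M/1 assumption and the 100-millisecond service time are illustrative choices, not figures from the talk.

```python
def mean_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean time in an M/M/1 queue (waiting plus service) at a given utilization.

    Assumes random (Poisson) arrivals and a single server; real systems
    will only approximate this.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical resource that needs 100 ms of service per request.
    for rho in (0.1, 0.5, 0.8, 0.9, 0.95):
        print(f"utilization {rho:.0%}: {mean_response_time(100, rho):.0f} ms")
```

The point of the sketch is only the shape of the curve: response time stays close to the service time at low utilization and rises sharply as the resource approaches saturation.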

Now we're talking about a couple of laws which actually already use queuing theory to derive some -- what we would call like the laws of physics of performance. We won't take too many, but just 2 of them that are really important to understand for everyday performance work.

The first one is Little's Law. What Little's Law says in its original text is that the long-term average number of customers in a stable system is equal to the long-term average effective arrival rate multiplied by the average time a customer spends in the system, which then also leads to a formula. So what this more or less means is that a system is stable when the number of new requests is not higher than the maximum load that the system can actually process.
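The statement above is usually written as follows. The symbol names are the conventional ones and do not come from the talk: L is the long-term average number of requests in the system, lambda is the long-term average arrival rate, and W is the average time a request spends in the system.

```latex
L = \lambda \cdot W
```

For instance, at an arrival rate of 5 requests per second and 0.2 seconds per request, on average L = 5 * 0.2 = 1 request is in the system at any time.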

Now that sounds very basic: we can't process more than we can actually possibly do. But how does this actually help us? Although it's basic, we can use it in, I'd say, 2 main situations. First, we can do some simple capacity planning using this information. Let's say, for example, we have 2 CPUs available, which gives us about 2,000 milliseconds of CPU time every second, and we consume 200 milliseconds of CPU per request. This means that our system can perform a maximum of 10 requests per second, given the resources that we have available.
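The capacity calculation from this example can be sketched in a few lines. The function name is made up for illustration, and the sketch assumes CPU is the only limiting resource.

```python
def max_requests_per_second(cpu_count: int, cpu_ms_per_request: float) -> float:
    """Upper bound on throughput when CPU time is the only scarce resource."""
    available_cpu_ms_per_second = cpu_count * 1000  # each CPU offers 1,000 ms per second
    return available_cpu_ms_per_second / cpu_ms_per_request


if __name__ == "__main__":
    # Numbers from the example: 2 CPUs, 200 ms of CPU consumed per request.
    print(max_requests_per_second(2, 200))
```

The same division works for any pooled resource, for example connection-milliseconds per second for a database connection pool.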

And just by using this simple capacity [indiscernible] on top of queuing theory, we can figure out what the maximum capacity is that our system can actually reach. If we know how many resources a certain request in a system requires, we know how much we need. It also helps us to answer questions like how big we need to size our database connection pools and the pools for other resources we need in the system. So it's a very basic means to figure out how we need to plan the capacity of our service.

On the other hand, we can use Little's Law just the other way around for a situation that occurs very often. Whenever you do a load test and you reach a limit, there is this basic question: Is it actually the application that is the limit, or is it the load test driver? And I have seen quite a few cases where people said the application can't take the load and later on figured out that the load test driver, so the load test environment, wasn't actually able to put enough load on the system.

Let's take this example. Say we have 5 requests per second. That's our maximum load, which we found out during our load test. And we know we have a 200-millisecond response time here. We also know that we have 10 threads [ph] available on the system. And if we now do the math, we figure out that this would actually give us 10,000 milliseconds every second and not just the 1,000 that we actually consumed.

So looking at this from a response time perspective, we see that we have way more resources available on the system than we were actually able to consume in the given load testing scenario. This is a very easy way to figure out whether our load test environment actually works. Just keep in mind that here we take the actual response time and not the CPU time. It might be that the reason is that we ran out of CPU time. But in this case, we look at the actual response time of the transaction, and the two numbers should actually be equal if we are using resources optimally in our system.
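The load-driver check can be sketched as below. The transcript is garbled at the point where the available capacity is stated, so the value of 10 concurrent execution slots is an assumption chosen to match the 10,000 milliseconds mentioned; the function name is also hypothetical.

```python
def load_test_utilization(requests_per_second: float,
                          response_time_ms: float,
                          concurrent_slots: int):
    """Compare response time actually consumed per second with what was available."""
    consumed_ms = requests_per_second * response_time_ms  # time spent in requests per second
    available_ms = concurrent_slots * 1000                 # each slot offers 1,000 ms per second
    return consumed_ms, available_ms, consumed_ms / available_ms


if __name__ == "__main__":
    # 5 requests/s at 200 ms each; 10 slots is an assumed figure.
    consumed, available, ratio = load_test_utilization(5, 200, 10)
    print(f"consumed {consumed} ms of {available} ms ({ratio:.0%})")
```

A ratio far below 100% at the measured "maximum" suggests the load driver, not the application, was the limit.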

The second law to look into is Amdahl's law. Amdahl's law deals with the speedup that we can achieve in a system, and what it basically says is that the speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program. This more or less tells us how much optimization we can achieve with parallel execution, so whether it really gets faster if we speed up a certain part of our program by parallelization and what the actual impact of this will be. This answers questions like when somebody says, "Okay, we take this part of the program and split it up and go into parallel execution."

Again, here is a little example of speedup. Say we have about 500 milliseconds of request time, so the request takes 0.5 second here. And we assume that 200 milliseconds can be parallelized, which leaves about 300 milliseconds that cannot be parallelized. And say we parallelize it to 4 threads, so these 200 milliseconds are executed by 4 threads simultaneously. What this means is that we actually get about a 150-millisecond improvement: we take the 300 milliseconds plus the 200 milliseconds divided by the 4 threads, which is 50 milliseconds, coming to 350 milliseconds.
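The arithmetic of this example can be written as a small sketch. It assumes perfect parallelization with no coordination overhead, which is the idealized case Amdahl's law describes; the function name is illustrative.

```python
def parallel_time_ms(total_ms: float, parallelizable_ms: float, threads: int) -> float:
    """Request time after splitting the parallelizable part across threads."""
    sequential_ms = total_ms - parallelizable_ms  # part that cannot be sped up
    return sequential_ms + parallelizable_ms / threads


if __name__ == "__main__":
    before = 500
    after = parallel_time_ms(before, 200, 4)
    print(f"after: {after} ms, saved: {before - after} ms, speedup: {before / after:.2f}x")
    # Even with a very large number of threads, the 300 ms sequential part remains.
    print(f"limit: {parallel_time_ms(before, 200, 1_000_000):.1f} ms")
```

The second print shows the ceiling: no matter how many threads are added, the request never drops below the 300 milliseconds of sequential work.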

So what this law actually tells us is whether we will actually get faster by doing such [indiscernible] computations in parallel. And this is something we see very often neglected in reality. People generally think that you get faster by doing things in parallel, which is not necessarily the case, or at least the performance gain might not actually be worth it. Because whenever you try to do something in parallel, it means additional implementation effort as well.

Next, we'll look into performance and scalability and what these terms actually mean to us. You see the terms very often being used as if they were synonyms for each other, so performance versus scalability. And people are saying, "Our response times are 2 seconds. We have a performance problem." So let's start by defining these terms first. Now take the example of a racecar. So on the left side, we have this Ferrari that can actually go very fast on the racetrack. It's just a one seater, but this one person can go very fast. So in other words, it has very good performance. And that's what we deal with when we talk about performance, how fast certain things actually go. The ideal performance is, if there is nothing else in the system, how fast can we actually perform a certain task.

With scalability, we look at, okay, what does it actually look like if we want to do more -- or if we want to transport more people with racecars? How fast will we get there once we have more people sitting in racecars? Will the performance still be the same? Or how can the system cope with the situation that there are more requests or higher load in the system? So while performance is a matter of how fast we are, scalability tells us how well we can cope with increased and additional load on the system. And everybody who has been in a traffic jam once in their life knows that performance and scalability are as different as driving a route in the middle of the night and driving it during rush hour. And that's the same difference we are talking about here.

And it's very important to be clear what our actual problem is. When we're talking about performance, it's not that performance is just one thing. It might mean different things to different people. The most obvious definition of performance is to tell it by the response time, which is more or less the time it takes for an application to respond, like we issue a web request to a server and we measure the time until this request comes back. That's our response time.

For other systems like trading systems, throughput is much more important. It is really about how many requests we can process in a given time period, so requests per second or per minute would be examples here. Depending on which kind of system you have, you're typically more interested in response time or throughput, whatever suits your needs better.

A completely different angle to performance is availability. This means, is our system reachable for the outside world at all? And specifically, when you look at performance from a production perspective, your biggest problem typically is that something is not available, and only then comes that something is actually slow. So performance is more than just response time, whether things are fast. It also includes whether things are working at all.

And another factor is what we put here as accuracy, which means whether what the application is doing is functionally correct. So we might have the perfect response time, but everything we return to the user is a blank page. So response time is fine, our throughput might be great, but what we are actually shipping back to the user is not what they expect. And when you talk about performance optimization, we typically have a priority laid out for exactly those characteristics, where availability is the highest one. First, the system has to be available, then you think about functional correctness, and then actually comes performance. In some cases, these might be interrelated. So if you have a performance problem which means that you cannot reply to a request within 1 or 2 minutes, which leads to a timeout, then you also have an availability problem. But typically, you have this priority in those requirements.

And how does our performance relate to scalability? Typically, these performance characteristics, like response time or throughput, are related to the scalability or the load on our systems. So here, we see an example where we have the load starting more or less from 1 and then going up, and the response times associated with it. What we will see is that for a certain time, the response time will stay pretty stable. And at a certain stage, suddenly the response times will go up as the load increases. This is where scalability comes into the game. Scalability tells us how well our system scales; it tells us where the point is at which performance really starts to become unacceptable because of the load on our system.

So taking this into account, we have to think, how do we actually describe performance when we talk to other people about the performance of an application? And what very often happens is that you get a reply like, "Our response time on something is 2 seconds." In reality, however, this is not really very helpful to people because it's not specific enough. It doesn't define the whole situation, especially if you take the scalability example from before into account. You also have to add some more information to give some more color to what the situation of the system was when we measured this time.

And it's a requirement for us as performance engineers to expect people to be more specific and to also be more specific ourselves when we talk to other people. Actually, a good answer would be like, "The system response time is 2 seconds at 500 concurrent requests," so this is where the scalability part comes in, "with a CPU load of 50%," this is where resource usage comes in, if you think back to the queuing model, where we can see how much of the resources we are actually using at that moment, "and a memory utilization of 92%." So we see, okay, at a certain load of concurrent requests, we are obviously still pretty good from a CPU perspective, but not that good from a memory perspective. So that might be the first point that we actually want to look into. And whenever you define performance characteristics of an application or a certain request, always keep in mind that the scalability aspect, like the current load on the system, as well as the current resource consumption, is part of that equation because otherwise, people cannot really make any sense out of it.

On scalability, there are typically 2 ways to scale a system or to solve a scalability problem. The first one is vertical scalability. So we just add more resources to one node in our system, more memory, more CPU, so that we can scale up here on a single node. So what we basically do is we add capacity. If you realize, like in the example before, that you are using 92% of your memory, and that at this sort of user load the memory requirements of the application are higher, the next step to improve the scalability of the application would be to add more memory to that machine. The alternative is to use additional nodes. This is what we refer to as horizontal scalability. So here, we start to add additional servers.

Very often, it's tempting to start with vertical scalability simply because it's simpler. It's simpler, up to a certain point, to add resources to a single node in our system. We don't have multiple nodes that we have to manage. We don't have any data replication and consistency problems across those nodes. But you will always reach a point where you simply can't add any more resources to a system, or where adding resources doesn't help you anymore. So it might help you to add some more memory or some more CPU processing time, but you won't be able to do this infinitely, simply because the hardware that you would need is not available.

A good example here is cloud environments, where you're limited to the deployment sizes that you get from your cloud provider. That's why horizontal scalability, or scaling out your system with additional nodes, is typically the way you scale your system. The point is that a system must always be designed for horizontal scalability. This is nothing that you get for free, and it very often also means that you actually have to change your application or the way it works.

A question -- or let's say a typical answer to a performance problem or a scalability problem is that we can simply scale. So like, "Okay, we can't keep up our response times with the increased user load. So let's buy a couple of additional servers." In some cases, this might help. In other cases, it might not. Again, as an example, take some synchronized access to some piece of data. We have to update some stock count. And for some reason, logically, we can only update one after the other because we have to be sure that we always have the latest and most accurate value here.

Let's say it's 200 milliseconds of access time for updating this data structure. With 1 server, we might be able to do 3 requests per second. So this is our scalability now. Now we decide, "Okay, let's add a second server." And we see, "Okay, now we get 5 requests." And then somebody has the great idea, "Let's scale our system out completely and use 100 servers." But then they realize that with 100 servers, we are still at 5 requests. So why did this happen? We have this synchronized access to one piece of data that we have to update, so we can't do more than one update at a time, and this is actually our scalability bottleneck.

And as this takes 200 milliseconds, we can never do more than 5 requests per second on this request, no matter how much hardware we throw at the problem. So figuring out whether scaling a system will actually help to solve a performance problem which we get at increased load requires us, and that's now going back to the queuing theory part at the very beginning, to understand where our bottlenecks are and what our requests actually have to wait for in our system, to see whether these parts of our system can be scaled. Because scaling any other resources that are not actually scarce, like in this case CPU time or memory, won't bring us any benefit in scaling up a system and making it faster.
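The stock-count example can be sketched as the minimum of two limits: what the servers could deliver together and what the serialized update allows. The figure of 3 requests per second per server is taken from the example; the function name and the simple linear scale-out model are assumptions.

```python
def achievable_requests_per_second(servers: int,
                                   per_server_rps: float,
                                   serialized_update_ms: float) -> float:
    """Throughput is capped by whichever limit is reached first."""
    scale_out_limit = servers * per_server_rps        # grows with every added server
    serialized_limit = 1000 / serialized_update_ms    # one update at a time, fixed
    return min(scale_out_limit, serialized_limit)


if __name__ == "__main__":
    for servers in (1, 2, 100):
        rps = achievable_requests_per_second(servers, 3, 200)
        print(f"{servers:>3} server(s): {rps:g} requests/s")
```

Beyond two servers, only shortening or removing the serialized update raises the result.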

So next, we'll go into how we actually calculate performance data. Typically, we get a lot of raw data, and we get a lot of measurements. And we can't look at each and every single measurement, so we have to aggregate them, which means we only look at some portion of the truth. So we have to use some statistical means to aggregate them.

We have one question. How do you think something like this can happen? The first one comes in and says, "Our response times are 1.6 seconds." The next one comes in and says, "No, no. Our response time is 2.8 seconds. I checked it. That's true." The third one says, "No, you guys, I mean, I don't know what you measured. But we were actually at 7 seconds response time. You are both wrong." In fact, it can be that all of these 3 people are actually right. So everybody said his own measurement was right, and how can this happen?

It simply depends on how we measure and how we aggregate data, and this is one of the biggest problems of working with performance data. If you do not agree beforehand on how you calculate the actual metric that you're working with, you will always have discrepancies between those metrics. And this becomes even more striking if you use a different set of tools and want to bring the metrics of different tools together. They typically never match perfectly, simply because of the way the tools have calculated those metrics.

So when we work with numbers, we have, in this case, raw data, so raw measurements that we received. Let it be response times or whatever for now. There's one that's 0.3 seconds, then 0.5, 1, 2.5, up to 3 seconds. We can't always look at this raw data, so we have to decide how we want to aggregate it. One approach is to look at the minimum, which, in this case, would be 0.3 seconds. Another would be to look at the maximum, which is the highest value, 3 seconds. Or we look at the average, which is 1.64 seconds. And as you can see, I didn't draw in the average here because the average is not a value that we actually measured. What we did is we took the sum of all those measurements and divided it by the count, and that's the value we took. The more evenly distributed these values are, the better the result is. In our case, it's kind of representative of the measurements, but that is not necessarily so.

Or we take what we call the median, which, in this case, would be 2 seconds. The median is also called the 50th percentile. It is the value for which we can say 50% of all other values are smaller than or equal to this specific value. So the median tells us 50% of all the values are smaller than or equal to 2 seconds. So the median is much more representative than the actual average.

There are other forms of percentiles, so things like the median, that we can use. A typical one would be the 95th percentile, which would mean 95% of all requests are faster than a certain threshold, and so on. Typically, when working with performance data, especially when it comes to response times, you actually work with percentiles rather than minimum, maximum or average numbers. Minimum, average or maximum numbers are sometimes useful when you look at resource usage for a given time period. You would want to know what's my average CPU consumption, what's my maximum CPU consumption and the like.
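To see how several people can report different "response times" from the same measurements, the sketch below aggregates one sample set in several ways. The sample values are made up for illustration and are not the data from the slide; the nearest-rank percentile is one common definition among several, which is itself a reason tools disagree.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in seconds, with a long tail of slow requests.
samples = [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.8, 1.0, 2.5, 9.0]

if __name__ == "__main__":
    print("min    :", min(samples))
    print("max    :", max(samples))
    print("average:", round(statistics.mean(samples), 2))
    print("median :", percentile(samples, 50))
    print("p95    :", percentile(samples, 95))
```

Here the average is pulled far above the median by two slow requests, so "our response time" depends entirely on which aggregate is quoted.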

What people tend to do when they use the average, since as you saw before the average is not a value that's in our data naturally, is use the standard deviation to describe the spread of the values. The underlying theory here is this bell curve, this distribution that is called the normal distribution. So the assumption is that our values spread in both directions from the average kind of equally. And the standard deviation just tells us that if you add or subtract a certain amount to this value, roughly 70% of your values will be in there. And once you extend it, more and more values will be in there, depending on how many multiples of the standard deviation you take.

So if you remember back from math class, one of the things about statistics is that if you choose your statistics model wrong, it won't actually work out with the data. So if our data isn't actually normally distributed, this kind of average and standard deviation thing won't actually work.

If we look at the data below, we see again our 1.6 seconds in the data. And I also put in the median here again to see where it actually is. So what we would assume now by adding the standard deviation, which in this case is 1, is that we would see values spread equally on both sides of the range. But what actually happens is that we see just one value in the lower part and 2 values in the upper part. So this doesn't really reflect this bell curve.

So what does this then mean for us? We expected it to be equally distributed to both sides in the real values that we actually measure, and we see something that's not actually working on our data. This means that our data does not match this assumption of average and standard deviation of normal distribution of our data. Probably, our data is distributed differently than what we assumed before.

And once you look at real-world data, and this is an example of real-world response time data for requests, you'll see that the actual distribution does not look at all like a normal distribution. This is more like a logarithmic distribution of response times. Why am I stressing this point so much? If you look at a lot of performance tools out there, they work a lot with averages, and they tell you that you just need the average and the standard deviation. But as you can see, the conclusions that you can draw from this are very much wrong. This becomes especially striking once you base your incident alerting on top of it, because it means that you either get false alerts because you assumed the wrong distribution, or you're not alerted because the system thinks this value is still okay when it's actually far out of the normal behavior.
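One quick way to check whether the average-plus-standard-deviation model fits is to count how many values fall within one standard deviation below and above the mean; for normally distributed data the two counts should be roughly equal. The sketch below runs this check on synthetic, skewed data generated from a log-normal distribution, which is an assumption standing in for real response times.

```python
import random
import statistics


def spread_around_mean(samples):
    """Count values within one standard deviation below and above the mean."""
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)
    below = sum(1 for s in samples if mean - stdev <= s < mean)
    above = sum(1 for s in samples if mean <= s <= mean + stdev)
    return mean, stdev, below, above


# Synthetic, skewed "response times" (not measured data); fixed seed for repeatability.
rng = random.Random(42)
samples = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

if __name__ == "__main__":
    mean, stdev, below, above = spread_around_mean(samples)
    print(f"mean {mean:.2f}, median {statistics.median(samples):.2f}, stdev {stdev:.2f}")
    print(f"within one stdev below the mean: {below}, above the mean: {above}")
```

A strong imbalance between the two counts, or a mean well above the median, is a hint that percentiles describe the data better than mean and standard deviation.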

So you might now wonder okay, why is it that a lot of tools out there use averages and standard deviations for the values? It’s simply because it is easier to calculate. I would recommend to everybody working with actual data, first of all, look at the actual distribution of your values, and then choose your model appropriately. I'd rather go with the median, like the 50th percentile or any other percentile values rather than the average.

Now let's look at how we can actually collect performance data. There is a lot of performance data that we can collect out there. So how do we actually get this data, and what are the means to access it? For now, I'll focus on Java-based environments, but it is pretty much the same for every runtime environment you can actually look into.

The easiest way in Java, and there are equivalents for other platforms [indiscernible], is what's called Java Management Extensions, or JMX. Java Management Extensions provide a means, and there are also standards for the metrics that are exported, to access some typical values we are interested in from a performance perspective, and to access them in a standardized way from a Java virtual machine.

As you can see on the left side, this is a JMX console that shows peak memory usage, the number of threads, the number of classes loaded and the CPU usage of the Java process. JMX provides a variety of data that we can get. We can also get metrics like the maximum pool usage, the maximum wait time, the number of active threads, the number of active sessions and so forth. So for understanding the basic health of our runtime environment, JMX is pretty good. It tells us whether our application container is healthy or whether we are lacking any resources that we would need. Again, back to the queuing theory: Is our database pool too small? Are we using too much CPU? Are we running short of other resources? And it even helps us see whether we have a memory leak, if we see an increase in memory consumption across garbage collections. So it already tells us a lot about our environment.

But it doesn't tell us how our actual user requests or actual transactions are doing. For that, the Java runtime provides what they call the Java Virtual Machine Profiling Interface or, in newer versions, the Java Virtual Machine Tool Interface. There are also equivalents for the Common Language Runtime in .NET. They provide us with bytecode instrumentation, which means we can actually change the code as it is loaded and add our own instrumentation, our own monitoring code, in there. They also support something like memory analysis, so we can create a heap dump in case there is a memory leak, and thread analysis to see which states our threads are in, whether they're waiting on something, what they're waiting on and which other threads are waiting on certain monitors and locks as well, in case we are confronted with deadlocks. And also the ability to use something like debuggers [ph], so debuggers [ph] also use the same interfaces that these performance tools use.

So bytecode instrumentation is the de facto standard today to collect any information from a runtime environment that uses interpreted code, whether it's Java, .NET or any other runtime environment that's actually out there. What happens is what we can see in the picture here. We have our business method that's typically something in a loop. So once this code gets loaded by a runtime, we take a timestamp before this code is actually executed and right after the code was executed, and then we report it somewhere.

So when this code is loaded by, in this case, a Java virtual machine, before it is actually compiled and interpreted, there is a step in between where a so-called agent is injected. It takes the code and modifies it with the statements marked in red here to collect additional performance data that we can use later on. And as we can see in the example below, we get exact execution time data by modifying the actual code of the application.
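The effect of the injected statements, a timestamp before the method, a timestamp after it and a report of the difference, can be mimicked in a few lines. The sketch below is only an analogy: it uses a Python decorator rather than JVM bytecode instrumentation, and the names `instrumented`, `measurements` and `business_method` are invented for illustration.

```python
import functools
import time

# Stand-in for wherever an agent would report its data.
measurements = []


def instrumented(func):
    """Wrap a function so that its execution time is recorded on every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()            # timestamp before execution
        try:
            return func(*args, **kwargs)       # original, unmodified logic
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            measurements.append((func.__name__, elapsed_ms))  # report the timing
    return wrapper


@instrumented
def business_method(n):
    """Hypothetical business logic: some work in a loop."""
    return sum(range(n))


if __name__ == "__main__":
    business_method(100_000)
    name, elapsed_ms = measurements[-1]
    print(f"{name} took {elapsed_ms:.3f} ms")
```

As with agent-based instrumentation, the business method itself contains no timing code; the measurement is added around it from the outside.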

This can now happen in 3 different ways. The first one is static instrumentation, which means you have a kind of post-compile step where you're adding this instrumentation to your code. This, however, means that you have to decide beforehand what you want and you have to add it to your classes. And you are kind of limited when it comes to certain system classes that you might also want to be able to look into.

The second approach is load-time instrumentation, which means instrumenting the code when it is actually loaded. So when the class loader loads the class you actually execute, this process is intercepted, and then this additional instrumentation code is added.

Or a third version is that it's even done at runtime. So if at runtime you decide you want to get more insight into the behavior of some specific portions of your code, you might then decide to instrument this while the application is running, without any need for a recompilation or a restart of your code.

What happens typically today is a combination of the latter 2. So the code is instrumented when the application starts up, so when it's used the first time. And if then, while the application is running, you need more information for some diagnostics purposes, you would use runtime instrumentation as a way to modify the bytecode of the running code.

So this is, more or less, what we measure in the data center, on our application servers. But performance management is more, so we also have to go beyond the data center and take our end users into account. What we get so far is the application response time, and the application response time might be perfectly fine. It might be 200 milliseconds or whatever. But for our users in the cloud, it might be 6 seconds, 7 seconds or longer. So we have to extend our measurements beyond the application to get the actual end user response time. These specifics are important when we look at the typical distribution of how time is spread out between the server side and the client side. And here we see that about 2/3 of the execution time actually happens after the response has left the application server.

The first approach that we can use here is synthetic monitoring. What synthetic monitoring, more or less, does is it records scripts. A script has certain steps that are executed. And then you have a synthetic, simulated user that is, in many cases, based on a browser. That user then actually executes the requests from different locations against your servers and times the responses, so that you see what the actual response times for end users are.

Depending on which provider you use here, you might get this from real end users, very fine-grained, down to, I want somebody in this part of Boston to access it, or I want somebody to access it from downtown Paris. Or it might just be requests that actually come from backbones. Depending on the granularity and quality of data you want, you decide which measurement you want. These measurements that go, more or less, to the place where the end user himself would be are typically called last mile measurements.

Another one is the network-centric approach, which means that we tap into the actual network traffic from the end users to the first server or data center using a so-called network probe, which sees all the traffic that goes by. It looks at the traffic and then, from this traffic, can derive the actual response times for the end user. So it will see how long it takes for each of the packets to travel to the end user. And last but not least, we have real user monitoring. What we do here is we actually inject a piece of JavaScript into our page that then executes, measures the response time in the user's browser and reports its information back.

All of these approaches have their upsides and downsides. So the downside of the synthetic approach is you're limited to the requests that you actually defined in your script. But you get them very reliably, so the data is typically reliable. And you're kind of also limited to the locations from which you execute your scripts. On the other hand, you know that you will always get these requests even if no users are using your application. If you have an application availability problem in the middle of the night when there are no users, synthetic would be able to tell you this.
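A small Python sketch of what such a scripted check might look like, under stated assumptions: the step names, the `shop.example` URLs and the `fake_fetch` stub are made up for illustration, and a real synthetic monitor would drive a browser from several locations rather than call a stub.

```python
import time


def run_script(steps, fetch, clock=time.perf_counter):
    """Run the scripted steps in order; return (step, ok, seconds) per step."""
    results = []
    for name, url in steps:
        start = clock()
        try:
            ok = fetch(url) == 200
        except OSError:  # connection problems count as "not available"
            ok = False
        results.append((name, ok, clock() - start))
        if not ok:
            break  # later steps depend on earlier ones, so stop here
    return results


# Hypothetical script: the only requests this monitor will ever see.
steps = [
    ("home", "https://shop.example/"),
    ("search", "https://shop.example/search?q=book"),
    ("checkout", "https://shop.example/checkout"),
]


def fake_fetch(url):
    """Stand-in for a real HTTP client; simulates a checkout outage."""
    if "checkout" in url:
        raise ConnectionError("server unavailable")
    return 200


results = run_script(steps, fake_fetch)
failed_steps = [name for name, ok, _ in results if not ok]
```

Because the script runs on a schedule rather than waiting for visitors, the failing `checkout` step shows up even when no real user is on the site.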

The network-centric approach is great because you see all the traffic that goes in there, and it can be deployed in your data center. The real user, JavaScript-based approach has the advantage that it gives you the actual response time from the users in the browser, also seeing things like third-party resources that are not loaded properly, that are actually outside of your data center and that you would not see with the network-centric approach. If you have something like a Facebook outage, you can't see this in your data center. So you have to actually measure this in the user's browser. On the other hand, with real user measurement, if you have unavailability in the server and there is no traffic coming in, you wouldn't actually see that.

So now we know which data we want, and we have to collect execution time data. We'll now focus on the execution time data piece here because for JMX and other metrics, I think it's pretty much clear how to get them. The first way we can get execution time data is to take snapshots of our runtime at given time intervals. So every 10 seconds, we look at the execution stack of all our threads and see what's currently executing. What this means is that the overhead in this case is very predictable because depending on our sampling interval, we have higher or lower overhead.

On the other hand, we might miss methods like the red ones we see here, which are executed in between these sampling intervals. If there is not that much load on the system, we might not see that these methods are actually executed. And in some cases, they might also be the performance problem. Typically we will see them at a certain point, when the load is high. It might just mean that we are missing out on certain pieces. This is also referred to as the sampling-based approach, where samples of the runtime are taken at given intervals.
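To make the blind spot concrete, here is a small deterministic Python sketch. The timeline, the method names and the 10-millisecond interval are invented for illustration; a real sampler would walk thread stacks instead of a prepared list.

```python
def sample(timeline, interval_ms, duration_ms):
    """Count how often each method is on top of the stack at a sample point."""
    seen = {}
    for t in range(0, duration_ms, interval_ms):
        for name, start, end in timeline:
            if start <= t < end:
                seen[name] = seen.get(name, 0) + 1
    return seen


# (method, start ms, end ms) for a single thread; hypothetical values.
timeline = [
    ("parseInput", 0, 12),
    ("checkCache", 12, 14),    # short call between two sample points
    ("queryDatabase", 14, 48),
    ("checkCache", 48, 49),    # short call between two sample points
    ("renderPage", 49, 60),
]

samples = sample(timeline, interval_ms=10, duration_ms=60)
missed = {name for name, _, _ in timeline} - set(samples)
```

With samples taken at 0, 10, 20, 30, 40 and 50 milliseconds, `checkCache` runs twice but never shows up, while the long `queryDatabase` call is seen three times.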

The other approach is event-based data collection. So thinking back to what we did before with the bytecode instrumentation, this is what we get. Here, this would be a classical example of event-based data. For every method entry and exit, we create an event. Like this method was entered, now it's finished executing. As you can see here, we now see everything that was executed, but we get way more events. So if we would do this for every single method in our application, this would lead to way higher overhead. But in return, we get higher and better precision.
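A minimal Python sketch of event-based collection, assuming a decorator stands in for the instrumentation code; `traced` and the three request-handling functions are hypothetical.

```python
import functools

events = []  # ordered stream of (event type, method name)


def traced(func):
    """Emit one event when the function is entered and one when it exits."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append(("enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            events.append(("exit", func.__name__))
    return wrapper


@traced
def check_cache(key):
    return None  # simulate a cache miss


@traced
def query_database(key):
    return {"key": key}


@traced
def handle_request(key):
    return check_cache(key) or query_database(key)


response = handle_request("user-42")
```

Three method calls already produce six events, and nothing is missed, however short the call. That is the precision-for-overhead trade described above.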

So there might be the question of what's actually better. Should we go for the sampling approach, which will probably miss something and has lower overhead, or should we see actually everything and have higher overhead? The answer is that today, you don't have to decide anymore because performance management tooling these days provides the ability to use a combination of both. So you have this sampling-based approach that is combined with an event-based approach. And they kind of merge these 2 things together, which leads to the most complete picture. And especially when you're evaluating a performance management solution, you should look very much into how they are actually collecting their data and what sense you can make out of it. So sampling-based might be great for a CPU hotspot, but if you have something like a functional problem in your application, or very short-running method executions, you want to be able to see those as well.

So now we have the data. Let's now visualize the data. So all the data that we collected so far now actually arrives in the performance tool, and the question is how we can look at it. And there are a number of ways how we can visualize this data. The first one is a call tree. What a call tree shows you is a method that was called and all methods that were called from this method on. This kind of visualization can be built using different measurement approaches.

So if you use this time-based sampling approach, you can build a call tree, and you can also build a call tree if you use an event-based approach. So just from looking at the visualization, you do not necessarily know how this information was collected. But the call tree is great. It tells you where most of your time is spent from a hierarchical execution level and which calls are responsible for this. If you have a CPU problem, a call tree is actually perfect because you can pick the top contributors, just always pick the biggest portions in there, and you will find it.

On the other hand, what you completely lose is the logical order in which things were executed because a call tree is not actually taking care of it. That's what a call trace does. So a call trace has the actual execution order, how things actually happened, and it tells you what was executed after which other step. That matters when it's not necessarily a pure performance problem and you need to know more, like why is there a database access when there should actually be an access to a cache, or when you have some kind of functional problem associated with it.

We want to keep the actual execution order and want to see which step happened after which other step, which is especially important once you give it back to development. And as we can see here with this example, we see that we have like a prepared statement, then it executes, and then we invoke something and obviously, this service was not found. So we have an exception. And then we have a number of other database statements going on. But the order is preserved, and we can really see what was executed from a logical perspective in the code, which is often very helpful.

The thing to remember is you can always create a call tree from a call trace, but not the other way around. One thing you might notice is that, at first sight, these 2 things look pretty much the same. But the value of the data, what you can do with it, is very much different. So whenever you look at this kind of data, really ask the question, how did we collect it? Is it actual trace data or is it some aggregated call tree data that I'm currently looking at?
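The one-way relationship can be sketched in a few lines of Python. The trace format used here, an ordered list of (depth, method, self time in milliseconds), and all values in it are assumptions made for illustration, loosely following the prepared statement example from the slide.

```python
from collections import defaultdict

# Ordered call trace: (call depth, method, self time in ms); values are made up.
trace = [
    (0, "handleRequest", 2),
    (1, "prepareStatement", 1),
    (1, "executeQuery", 20),
    (1, "invokeService", 3),   # in the trace we can see this ran between queries
    (1, "executeQuery", 25),
    (1, "executeQuery", 22),
]


def call_tree(trace):
    """Aggregate a trace into per-call-path totals; execution order is dropped."""
    totals = defaultdict(lambda: {"calls": 0, "self_ms": 0})
    path = []
    for depth, method, self_ms in trace:
        del path[depth:]          # unwind to the caller of this entry
        path.append(method)
        node = totals[tuple(path)]
        node["calls"] += 1
        node["self_ms"] += self_ms
    return dict(totals)


tree = call_tree(trace)
hotspot = max(tree, key=lambda p: tree[p]["self_ms"])
```

The tree points straight at `executeQuery` as the top contributor, but the three query executions are now a single node, so nothing in `tree` says that `invokeService` happened after the first query. Going from `tree` back to `trace` is not possible.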

And in the case of dynaTrace, this is what we refer to as PurePath data. These are exact call traces, even across system boundaries. These traces are all nice. But now imagine you measure a production system that has 200 or more nodes. Then you will definitely go for a different kind of visualization, which we refer to as a transaction flow. So you want to see kind of the bigger picture, how things are connected in your data center, which JVMs, which services are talking to which other services and how they are used with each other.

Especially when you start to look at a complex system, typically your starting point is something like this. You already have this flow layout. You can see which parts are used and which other services a service uses. Just like in this case, you see a number of red things, call them red back-end systems, which means back-end systems that actually have problems and affect the performance of other layers in front of them.

So to get a firm understanding of a complex system and how it works, you start with something like a transaction flow diagram here. This is something that is typically built from call trace data and is just a more high-level, aggregated view of the data, which helps you to cope with the complexity. Because looking at these detailed traces, especially if you have a lot of them, like 10,000 requests per second, won't be your starting point. That comes very much at the end of your analysis, or you would use it in the final step or in the development step.
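As a rough Python sketch of that aggregation step, assume each call trace has already been reduced to its cross-tier hops. The tier names and the three sample traces are hypothetical.

```python
from collections import Counter

# Each trace reduced to its (caller tier, callee tier) hops; made-up data.
traces = [
    [("browser", "web"), ("web", "app"), ("app", "db")],
    [("browser", "web"), ("web", "app"), ("app", "cache")],
    [("browser", "web"), ("web", "app"), ("app", "db"), ("app", "db")],
]


def transaction_flow(traces):
    """Count how often each tier calls each other tier across all traces."""
    edges = Counter()
    for hops in traces:
        edges.update(hops)
    return edges


flow = transaction_flow(traces)
busiest_edge, busiest_count = flow.most_common(1)[0]
```

Thousands of traces collapse into a handful of weighted edges, which is what a flow diagram draws. The individual traces stay available for the final, detailed step of the analysis.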

The last point I want to stress today is the whole topic of measurement overhead. Whenever it comes to adding some measurement to a system, you typically have to deal with the question of how much additional overhead it creates. And in this picture you see Werner Heisenberg, who is the father of the uncertainty principle, which, more or less, tells you that whenever you measure a system, the results are not fully accurate because just by measuring the system, you're actually changing the system.

Thinking now of performance, you don't want a performance tool that has so much impact on your system that it actually renders all metrics, more or less, useless. And also, when you think about production, many of the problems you have to resolve are in production. You cannot accept that a 2-second transaction in production suddenly takes 5, 6, 7 or 8 seconds. So overhead is an important topic because by measuring, you might modify the system. Worst case, we might even introduce something that's called a heisenbug. A heisenbug is a problem that is actually not there if you're not measuring.

Like you might see a performance hotspot in your application that is just there because you actually put your measurement code in there, which wasn't in there before. Or it might even be a bug that goes away. When you think of a synchronization problem, it simply doesn't happen anymore because you slowed down the code so much that this problem is no longer actually there.

When we then talk about overhead, we also have to be specific about what kind of overhead we mean. And we have to look at overhead from different perspectives. What most people mean when they talk about overhead is response time overhead. So how much slower does our transaction get when we measure something? This is what response time overhead is.

But an additional important factor is also CPU overhead. You collect a lot of data within the runtime environment that you're working with and you have to process it. The more processing you do, the more likely you are to add additional overhead. The more computation you do in your runtime, in your actual application, the more CPU time you take away from the application. That's why performance tools that specifically target low-overhead production use also offload a lot of these computations that we've seen before, like the data aggregation, building up those call traces and [indiscernible], to a dedicated server so as not to affect the application itself.

The next slide is memory overhead. So how much additional memory do you need? Somebody might tell you we have the greatest performance tool on the planet, and it doesn't require a lot of bandwidth because we send data only every 15 minutes and cache it in between. That means that all this information, and think of every method being instrumented or call traces or snapshots being kept for 15 minutes, is stored in the application's actual memory. It might just mean that the application has a completely different memory behavior than it used to have without the solution. So also looking at the memory consumption is important here. And if you think of a Java virtual machine, by the way you consume memory in the application that you actually measure, you might massively impact the garbage collection behavior of that application.

And last but not least, we have the network overhead. So how much overhead is incurred to transfer data that was collected in the application to some central storage or processing server that's actually working with that data? So how do we actually now define overhead? And I look at the response time overhead here specifically because this is something that people often get wrong, and a lot of overhead definitions are actually not really correct.

So we have a non-instrumented application to the left and an instrumented application to the right. What happens when using instrumentation is that you add these little portions of code to the application. And as you can already see, for the bigger block, the overhead we add by instrumentation is, relatively, much less than for the smaller block. So imagine that in most cases, this cost, the time it takes for the measurement, is constant.

The overhead, however, is relative to the actual execution time. So if you would instrument every method in an application, and a method takes, say, 0.1 millisecond, the overhead will be massive. If you instrument at the level where the execution is 10 or 15 milliseconds, the overhead might actually be minimal. So with relative overhead, while the absolute measurement time taken is the same, you can correctly report very different numbers for what the overhead is. In one case, you might say it is 1%. And in the other case, it might be 500%, 600%. Where you put your measurements in, how you put the measurements in and how intelligent your approach is actually define your overhead.

That's why I'm personally in favor of defining overhead as an absolute number on the transaction itself, say, instrumentation or performance data collection adds 0 to 3 milliseconds of overhead to each transaction's response time. This tells you much more than something like 3% overhead because you might have transactions that run for 200 milliseconds, and you might have one that runs for 500 milliseconds. The 3% is, depending on the transaction time, a different number. So the best way to actually get overhead numbers is to really take a measurement without your monitoring solution in there and one with your monitoring solution, and then look at the actual numbers in every before and after run of your application.
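The arithmetic behind this can be sketched in Python. The fixed measurement cost of 0.1 milliseconds per instrumented call is a made-up figure for illustration; the 3-millisecond, 200-millisecond and 500-millisecond values are the ones mentioned in the talk.

```python
def relative_overhead_percent(execution_ms, overhead_ms):
    """Overhead expressed as a percentage of the uninstrumented execution time."""
    return 100.0 * overhead_ms / execution_ms


PROBE_COST_MS = 0.1  # hypothetical fixed cost of one measurement

# Same absolute cost, very different relative numbers per instrumented method.
per_method = {
    method_ms: relative_overhead_percent(method_ms, PROBE_COST_MS)
    for method_ms in (0.1, 10.0)
}

# An absolute overhead of 3 ms reads differently for each transaction length.
per_transaction = {
    transaction_ms: relative_overhead_percent(transaction_ms, 3.0)
    for transaction_ms in (200.0, 500.0)
}

for transaction_ms, percent in per_transaction.items():
    print(f"{transaction_ms:.0f} ms transaction: {percent:.1f}% overhead")
```

Under these assumptions the very same probe costs about 100% on a 0.1-millisecond method and about 1% on a 10-millisecond one, and 3 milliseconds is 1.5% of a 200-millisecond transaction but 0.6% of a 500-millisecond one. That is why a single percentage says little without the transaction time behind it.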

So when somebody just gives you a relative number, you really have to question what it means for your application, or rather do the testing yourself. That's why it's actually hard to tell, without having seen an application and its specific characteristics, how much overhead you will see once you deploy a performance management solution. That's also why monitoring solutions, and [indiscernible] one of them, have an adaptive approach where they have kind of boundaries for overhead, and once the overhead gets too big, they cut down on the measurements that they're actually taking.
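The talk does not say how such an adaptive approach is implemented, so the following Python sketch is only one conceivable scheme: a capture rate that is halved whenever observed overhead exceeds a budget and is raised again when there is headroom. The function name, the 3-millisecond budget and the observed values are all assumptions.

```python
def adapt_capture_rate(rate, observed_overhead_ms, budget_ms,
                       step=0.5, floor=0.01):
    """Return the fraction of transactions to measure in the next interval."""
    if observed_overhead_ms > budget_ms:
        return max(floor, rate * step)   # over budget: cut down on measurements
    return min(1.0, rate / step)         # within budget: measure more again


BUDGET_MS = 3.0                          # hypothetical overhead boundary
observed = [1.0, 4.0, 5.0, 2.0]          # made-up overhead per interval, in ms

rate = 1.0
rates = []
for overhead_ms in observed:
    rate = adapt_capture_rate(rate, overhead_ms, BUDGET_MS)
    rates.append(rate)
```

With these inputs the rate stays at 1.0, drops to 0.5 and then 0.25 while the overhead is above the boundary, and recovers to 0.5 once it falls back below it.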

So this presentation was just about getting started with APM and some of the core concepts and the core questions that might come up once you start with performance measurement. If we have now kind of got your interest in learning more about this topic and going beyond the core concepts, I invite you to go to There you will find all of the content of today. And you will then see book-like descriptions of other topics, other specifics like memory management, how to do performance testing and a lot of other topics that we put on there. We will also have, as part of our Performance Academy, sessions specifically on those. But for those among you who cannot, I strongly advise you to go there. And that's how I want to conclude for today. And now I pass over to questions.

Nagraj Seshadri

Okay, Alois. That was a fantastic presentation, all the way from the physics principles to why my holiday shopping got slow. We are almost up to the hour, so we don't really have much time for questions. Of course, you can always contact Alois at

Question-and-Answer Session

Nagraj Seshadri

But I'm just going to address a couple of questions that came up. One is when and where can I get a copy of the slides? So as we have mentioned earlier, we will email all the registrants links to these resources. And of course, you can also look at

The second question is, are these screenshots which Alois showed the output of Compuware's diagnostic tools? That's the question. And the answer is yes, these are screenshots from Compuware dynaTrace APM solutions. So if you are interested, we'd be happy to demonstrate the product for you and see how that works for you in your environment.

And there are some additional technical questions, and we really don't have time, so we will address them separately to the participants. So with that, thank you all for listening. And thanks again, Alois, for an excellent presentation. So we hope to be talking to all of you soon.


Ladies and gentlemen, this concludes today's webcast. You may now disconnect.

Copyright policy: All transcripts on this site are the copyright of Seeking Alpha. However, we view them as an important resource for bloggers and journalists, and are excited to contribute to the democratization of financial information on the Internet. (Until now investors have had to pay thousands of dollars in subscription fees for transcripts.) So our reproduction policy is as follows: You may quote up to 400 words of any transcript on the condition that you attribute the transcript to Seeking Alpha and either link to the original transcript or to All other use is prohibited.