How To: Model the Workload for Web Applications

J. D. Meier, Prashant Bansode, Carlos Farre, Scott Barber

Applies To

  • Web Applications
  • Performance Testing

Summary

This How To explains how to create a workload model that represents how a web application is expected to be used in production. For performance testing to yield results that are directly applicable to understanding the performance characteristics of an application in production, the tested workloads must represent the real-world production scenario. To create a reasonably accurate representation of reality, you must understand the business context for the use of the application, expected transaction volumes in various situations, expected user path(s) by volume, and other usage factors.

Contents

  • Objectives
  • Overview
  • Summary of Steps
  • Step 1 – Identify the Objectives
  • Step 2 – Identify Key Scenarios
  • Step 3 – Determine Navigation Paths for Key Scenarios
  • Step 4 – Identify Unique Data for Navigation Paths and / or Simulated Users
  • Step 5 – Determine Relative Distribution of Scenarios
  • Step 6 – Identify Target Load Levels
  • Step 7 – Prepare to Implement the Model

Objectives

  • Learn how to construct realistic workload models for Web applications based on expectations, documentation, observation and other data available prior to releasing to production.

Overview

Workload modeling is the process of identifying one or more composite application usage profiles of interest for use in performance testing. A workload model contains data related to such items as:
  • Key user activities.
  • Navigation path(s) related to those activities.
  • Relative distribution of users and/or activities across the target load.
  • Unique data and application interaction patterns for each user to be simulated.

Simulating unrealistic workload models can still provide valuable information to a team during performance testing, but you can only make accurate predictions about performance in production, or accomplish performance optimizations, when you simulate realistic workload models.

Summary of Steps

  • Step 1 – Identify the Objectives
  • Step 2 – Identify Key Scenarios
  • Step 3 – Determine Navigation Paths for Key Scenarios
  • Step 4 – Identify Unique Data for Navigation Paths and / or Simulated Users
  • Step 5 – Determine Relative Distribution of Scenarios
  • Step 6 – Identify Target Load Levels
  • Step 7 – Prepare to Implement the Model

Step 1: Identify the Objectives

The objectives of creating a workload model typically center on ensuring the realism of a test scenario, or on designing a test to address a specific requirement, goal, or performance-testing objective. (For more information, see {HowTo:Quantify End User Requirements and HowTo:Determine Performance Testing Objectives}.) When identifying the objectives, you should work with targets that will satisfy the stated business requirements. Consider the following key questions when formulating your objectives:
  • What is the current or predicted business volume over time? For example, how many orders are typically placed in a given time period, and what other activities (number of searches, browsing, logging in, and so on) support order placement?
  • How is the business volume expected to grow over time? Your projection should take into account future needs such as business growth, possible mergers, introduction of new products, and so on.
  • What is the current or predicted peak load level? This projection should reflect activities that support sales and other critical business processes, such as marketing campaigns, newly shipped products, time-sensitive activities such as stock exchange activity dependent on external markets, and so on.
  • How quickly do you expect peak load levels to be reached? Your prediction should take into consideration unusual surges in business activity—how fast can the organization adjust to the increased demand when such an event happens?
  • How long do the peak load levels continue? That is, how long does the new demand need to be sustained before exhaustion of a resource compromises the service level agreements (SLAs)? For example, an economic announcement may cause the currency exchange market to experience prolonged activity for two or three days, as opposed to just a few hours.

This information can be gathered from Web server logs, from marketing documentation reflecting business requirements, or from stakeholders. The following are some of the objectives identified during this process:
  • Ensure that one or more models represent the peak expected load of X orders being processed per hour.
  • Ensure that one or more models represent the difference between “quarterly close-out” period usage patterns and “typical business day” usage patterns.
  • Ensure that one or more models represent business/marketing projections for up to one year into the future.

It is acceptable if these objectives only make sense in the context of the project at this point. The remaining steps will help you fill in the necessary details to achieve the objectives.

Considerations

Consider the following key points when identifying objectives:
  • Throughout the process of creating workload models, remember to share your assumptions and drafts with the team and solicit their feedback.
  • Don’t get overly caught up in striving for perfection, and don’t fall into the trap of oversimplification. In general, it is a good idea to start executing tests when you have a testable model and then enhance the model incrementally while collecting results.

Step 2: Identify Key Scenarios

It is typically either impractical or impossible to simulate every possible user task or activity in a performance test. As a result, whether you are identifying user behavior by analyzing server logs, observing usability studies, interpreting marketing materials, or simply starting with your best educated guess, you will probably want to apply some limiting heuristic to the number of activities, or key scenarios, that you identify for performance testing. You may find the following limiting heuristics useful:
  • Include the key scenarios implied or mandated by the objectives.
  • Include the most common activities.
  • Include high-visibility activities. For example, a user might only register on your Web site once, but if the user has a bad experience, he or she may never return.
  • Include business-critical activities. For example, if your business depends on customer orders for revenue, it will incur losses if users can’t place orders, making it important that this activity performs well.
  • Include performance-intensive activities. Even if these activities are extremely rare, they can have a significant system-wide impact on performance. For example, a monthly batch processing event in the background could have significant performance impact during business operations.
  • Include activities whose performance is mandated by contract, SLA, or an influential stakeholder.

The following are some key scenarios identified for an e-commerce application:
  • Browse
  • Create User Account
  • Search
  • Login
  • Place Order

Considerations

Consider the following key points when identifying key scenarios:
  • Think about nonhuman system users and batch processes as well as end users. For example, there might be a batch process that runs to update the status of orders while users are performing activities on the site. In this situation you would need to account for those processes because they might be consuming resources.
  • For the most part, Web servers are very effective at serving text and graphics. Static pages with average-size graphics are probably less critical than dynamic pages, forms, and multimedia pages.

Step 3: Determine Navigation Paths for Key Scenarios

Human beings are unpredictable and Web sites commonly offer multiple paths to accomplish the same task or activity. Even with a relatively small number of users, it is almost certain that real users will not only use every path you think they will to complete a task, but they also will inevitably invent some that you hadn’t thought of. Each path the user takes to complete an activity will put a different load on the system. That difference may be trivial, or it may be enormous—there is no way to be certain until you test it. There are many methods for determining navigation paths to complete a task or activity, including the following:
  • Identify user paths within your Web application that are expected to have a significant performance impact and that accomplish one or more of the identified key scenarios.
  • Read design and/or usage manuals.
  • Extract the data from log files.
  • Try to accomplish the activities yourself.
  • Observe others trying to accomplish the activity without instruction.
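
As an illustration of extracting navigation paths from log files, the following minimal sketch groups page requests by session and counts how often each ordered path occurs. The record layout (session ID, timestamp, page title) and the sample data are assumptions made for this example; real Web server logs need to be parsed into this shape first.

```python
from collections import Counter, defaultdict


def count_navigation_paths(records):
    """Group (session_id, timestamp, page) records by session and
    count how many sessions followed each ordered page path."""
    pages_by_session = defaultdict(list)
    # Sort by session, then by time, so each session's pages are in visit order.
    for session_id, _timestamp, page in sorted(records, key=lambda r: (r[0], r[1])):
        pages_by_session[session_id].append(page)
    return Counter(" > ".join(pages) for pages in pages_by_session.values())


# Hypothetical, already-parsed log records: (session id, seconds, page title).
sample_records = [
    ("s1", 0, "Home"), ("s1", 12, "Search"), ("s1", 40, "Product"),
    ("s2", 3, "Home"), ("s2", 20, "Browse"), ("s2", 65, "Product"),
    ("s3", 5, "Home"), ("s3", 15, "Search"), ("s3", 50, "Product"),
]

path_counts = count_navigation_paths(sample_records)
for path, count in path_counts.most_common():
    print(f"{count} session(s): {path}")
```

The most frequent paths are candidates for the model; rare paths can be filtered with the same limiting heuristics used for key scenarios.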

After the application is released for unscripted user acceptance testing, for beta testing, or to production, you will be able to determine how the majority of users accomplish activities on the system being tested. It is always a good idea to compare your models against reality and make an informed decision about whether to perform additional testing based on the similarities and differences you find.

Apply the same limiting heuristics to navigation paths as you did when determining activities to decide which paths you want to include in your performance simulation.

Considerations

Consider the following key points when determining navigation paths for key scenarios:
  • Some users will complete more than one activity during a visit to your site.
  • Some users will complete the same activity more than once per visit.
  • Some users may not actually complete any activities during a visit to your site.
  • Navigation paths are often easiest to capture by using page titles.
  • If page titles don’t work, or are not intuitive for your application, the navigation path may be more easily defined by steps the user takes to complete the activity.
  • First-time users frequently follow a different path to accomplish a task than users who have experience with the application. Consider this difference and what percentage of new versus return user navigation paths should be represented in your model.
  • Different users will spend different amounts of time on the site. Some will log out, some will close their browser, and others will leave their session to time out. Be sure to take these factors into account when determining or estimating session durations.

Step 4: Identify Unique Data for Navigation Paths and / or Simulated Users

Unfortunately, navigation paths alone do not provide all of the information required to implement a workload simulation. To fully implement the workload model, several more pieces of information are needed, including:
  • How long users may spend on a page
  • What data may need to be entered on each page
  • What conditions may cause a user to change navigation paths

The following table provides an example of unique data identified for an e-commerce application.

Implementation Data

| Scenario | Page/Step | Data Inputs | Data Outputs | Think Time |
|---|---|---|---|---|
| Login | Login page | Username (unique), Password (matched to username) | | 6 – 9 sec, random |
| Browse (experienced user) | Login page | Username (unique), Password (matched to username) | | 6 – 9 sec, random |
| | Browse | Catalog Tree/Structure (static), User Type (weighted) | Product Description, Title, Category | 4 – 60 sec, random |
| Browse (new user) | Login page | Username (unique), Password (matched to username) | | 6 – 21 sec, random |
| | Browse | Catalog Tree/Structure (static), User Type (weighted) | Product Description, Title, Category | 20 – 90 sec, random |
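
The think-time ranges in the table can be turned into simulated user delays. The sketch below is one possible way to do so; it assumes that "random" means a uniform draw between the minimum and maximum, which the table does not actually specify. The dictionary keys and the function name are hypothetical.

```python
import random

# Think-time ranges in seconds, taken from the Implementation Data table.
THINK_TIME_RANGES = {
    ("experienced", "Login page"): (6, 9),
    ("experienced", "Browse"): (4, 60),
    ("new", "Login page"): (6, 21),
    ("new", "Browse"): (20, 90),
}


def sample_think_time(user_type, page, rng=random):
    """Return a think time in seconds for the given user type and page.

    Assumption: delays are uniformly distributed within the range.
    """
    low, high = THINK_TIME_RANGES[(user_type, page)]
    return rng.uniform(low, high)


rng = random.Random(42)  # seeded so that a test run is repeatable
print(f"New user on Browse waits {sample_think_time('new', 'Browse', rng):.1f} s")
```

If production logs are available, replacing the uniform draw with a distribution fitted to observed delays would likely be more realistic.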

Considerations

Consider the following key points when identifying unique data for navigation paths and/or simulated users:
  • Performance tests frequently consume large amounts of test data. Ensure that you have enough.
  • Using the same data repeatedly will frequently lead to invalid performance results.
  • Especially when designing and debugging performance tests, test databases can become dramatically overloaded with data. Periodically check to see if the database is storing unrealistic volumes of data for the situation you are trying to simulate.
  • Consider including invalid data in your performance tests. For example, include some users who mistype their password on the first attempt but get it correct on a second try.
  • First-time users usually spend significantly longer on each page or activity than experienced users.
  • The best possible test data is test data collected from a production database or log file.
  • Consider client-side caching. First-time users will be downloading every object on the site, while frequent visitors are likely to have many static objects and/or cookies stored in their local cache. When capturing the uniqueness of the user’s behavior, consider whether that user represents a first-time user or a user with an established client-side cache.

Step 5: Determine Relative Distribution of Scenarios

Now that you have determined what scenarios you want to simulate and what the steps and associated data are for those scenarios, you need to determine how often each scenario needs to be simulated relative to the other scenarios in order to complete the workload model. Sometimes one workload model is not enough. Research and experience have shown that user activities often vary greatly over time. To ensure test validity, you must validate that activities are evaluated according to time of day, day of week, day of month, and time of year. As an example, consider an online bill-payment site. If all bills go out on the 20th of the month, the activity on the site immediately before the 20th will be focused on updating accounts, importing billing information, and so on by system administrators, while immediately after the 20th, customers will be viewing and paying their bills until the payment due date of the 5th of the next month.

The most common methods for determining the relative distribution of scenarios are:
  • Extract the actual usage, load values, common and uncommon usage scenarios (user paths), user delay time between clicks or pages, and input data variance (to name a few) directly from log files.
  • Interview the individuals responsible for selling/marketing new features to find out what features/functions are expected and therefore most likely to be used. By interviewing existing users, you may also determine which of the new features/functions they believe they are most likely to use.
  • Deploy a beta release to a group of representative users (roughly 10 to 20 percent of the size of the expected user base) and analyze the log files from their usage of the site.
  • Run simple in-house experiments using employees, customers, clients, friends, or family members to determine, for example, natural user paths and the page-viewing time differences between new and returning users.
  • As a last resort, you can use your intuition, or best guess, to make estimations based on your own familiarity with the site.

The following table provides an example of the distribution of scenarios for an e-commerce application.

Work Distribution

| User Scenarios | % of Work Distribution |
|---|---|
| Browse | 50 |
| Search | 30 |
| Place Order | 20 |
| Total | 100 |
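
A load script can apply this distribution by choosing a scenario for each simulated session with weighted random selection. The following is a minimal sketch using the percentages from the table; the function name and the fixed seed are assumptions made for illustration.

```python
import random
from collections import Counter

# Percentages from the Work Distribution table.
SCENARIO_WEIGHTS = {"Browse": 50, "Search": 30, "Place Order": 20}


def pick_scenarios(count, rng):
    """Choose a scenario for each of `count` simulated sessions,
    in proportion to the work distribution."""
    names = list(SCENARIO_WEIGHTS)
    weights = list(SCENARIO_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=count)


rng = random.Random(7)  # seeded so that a test run is repeatable
assigned = Counter(pick_scenarios(10_000, rng))
for name in SCENARIO_WEIGHTS:
    print(f"{name}: {assigned[name] / 100:.1f}% of simulated sessions")
```

Over many sessions the observed shares approach the target percentages, while individual sessions remain unpredictable, which resembles real traffic more closely than a fixed rotation.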

Considerations

Consider the following key points when determining relative distribution of scenarios:
  • Create visual models and circulate them among both users and stakeholders for review/comments.
  • Ensure that the model is intuitive to non-technical users, technical designers, and everyone in between.
  • Ensure that the model contains all of the supplementary data needed to create the actual test.
  • It is during this step that you would account for user abandonment if applicable to your application. {For more on user abandonment, see HowTo:AccountForUserAbandonment}

Step 6: Identify Target Load Levels

Although it is frequently the case that each workload model will be executed at a variety of load levels and that the load level is very easy to change at run time using most load-generation tools, it is still important to identify the expected and peak target load levels for each workload model for the purpose of predicting or comparing with production conditions. The following are the inputs and outputs used for determining target load levels:

Inputs

  • Business volume (both current and projected) mapping to objectives
  • Key scenarios
  • Distribution of work
  • Session characteristics (navigational path, duration, percentage of new users)

Volume

The information in the following table can be extracted from Web server logs, marketing documentation, or stakeholders.

| Time Period | Business volume (# sessions to the Web site) | Business volume (# sessions, peak values) | Peak load increase | Peak build-up time | Peak duration |
|---|---|---|---|---|---|
| Monthly | 460789 | 1360890 | 2.95 | 1 hour | 2 hours |
| Daily | 15359 | 45363 | 2.95 | 1 hour | 2 hours |
| Hourly (15-hour traffic) | 1023 | 3024 | 2.95 | 1 hour | 2 hours |
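
The daily and hourly figures can be reproduced from the monthly totals. The sketch below assumes a 30-day month and truncation to whole sessions; neither is stated in the source, but both match the numbers in the table. The 15 busy hours per day come from the "15-hour traffic" label.

```python
def derive_volumes(monthly_sessions, days_per_month=30, busy_hours_per_day=15):
    """Break a monthly session count down into daily and hourly volumes.

    Assumptions: a 30-day month, traffic spread over 15 busy hours per day,
    and fractions of a session dropped.
    """
    daily = monthly_sessions // days_per_month
    hourly = daily // busy_hours_per_day
    return daily, hourly


normal_daily, normal_hourly = derive_volumes(460789)
peak_daily, peak_hourly = derive_volumes(1360890)
peak_increase = round(1360890 / 460789, 2)

print(f"Normal: {normal_daily} sessions/day, {normal_hourly} sessions/hour")
print(f"Peak:   {peak_daily} sessions/day, {peak_hourly} sessions/hour")
print(f"Peak load increase: {peak_increase}x")
```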

Output

By combining the volume information with objectives, key scenarios, user delays, navigation paths, and scenario distributions from the previous steps, you can determine the remaining details necessary to implement the workload model at a particular target load.

Example:
Total hourly sessions: 1023
Total hourly sessions (peak): 3024
Place Order sessions (20%): 205
Place Order sessions (20%, peak): 605
Average session time = 18 minutes
Sessions per user per hour = 60 / 18 ≈ 3.33
Total users to produce 205 orders = 205 / (60 / 18) = 61.5, taken as 61
Total users to produce 605 orders = 605 / (60 / 18) = 181.5, taken as 181

The following table presents an example of target load levels for an e-commerce application.

| User Scenarios | % of Work Distribution | Total normal sessions | Total peak sessions | Session duration (minutes) | # concurrent users (normal) | # concurrent users (peak) | % new users |
|---|---|---|---|---|---|---|---|
| Browse | 50 | 512 | 1512 | 15 | 128 | 378 | 10% |
| Search | 30 | 307 | 907 | 7 | 36 | 106 | 10% |
| Place Order | 20 | 205 | 605 | 18 | 61 | 181 | 12% |
| Total | 100 | 1023 | 3024 | | | | 12% |
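
The concurrent-user columns follow from multiplying the hourly session count by the session duration as a fraction of an hour, a relationship commonly known as Little's Law. The sketch below recomputes them as a cross-check. The function name is hypothetical, and the results are left unrounded because the whole-number values in the table appear to mix rounding and truncation.

```python
def concurrent_users(sessions_per_hour, session_minutes):
    """Average number of sessions active at the same moment:
    arrival rate (per hour) times session duration (in hours)."""
    return sessions_per_hour * session_minutes / 60


# (scenario, normal sessions/hour, peak sessions/hour, session duration in minutes)
TARGET_LOADS = [
    ("Browse", 512, 1512, 15),
    ("Search", 307, 907, 7),
    ("Place Order", 205, 605, 18),
]

for name, normal, peak, minutes in TARGET_LOADS:
    print(f"{name}: normal {concurrent_users(normal, minutes):.1f}, "
          f"peak {concurrent_users(peak, minutes):.1f}")
```

This value is an average of overlapping active sessions; it does not by itself say how many of those users are issuing a request at the same instant.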

Considerations

Consider the following key points when identifying target load levels:
  • See {HowTo: Determine Load Levels (or whatever it gets called) for a discussion about concurrent users and overlapping active user sessions}
  • Changing load levels even slightly can sometimes change results dramatically.
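
The peak build-up time and peak duration from the volume table can be combined into a simple load profile for a test run. The following sketch assumes a linear ramp from the normal to the peak hourly rate and an immediate return to normal after the peak; the source gives only the build-up time (1 hour) and the duration (2 hours), so the shape of the curve is an assumption.

```python
def target_sessions_per_hour(minute, normal=1023, peak=3024,
                             build_up_minutes=60, peak_minutes=120):
    """Target session rate at `minute` minutes after the surge begins.

    Assumptions: linear build-up to the peak, a flat peak, and an
    immediate drop back to the normal rate afterwards.
    """
    if minute < 0 or minute >= build_up_minutes + peak_minutes:
        return normal
    if minute < build_up_minutes:
        return normal + (peak - normal) * minute / build_up_minutes
    return peak


for minute in (0, 30, 60, 120, 180):
    print(f"t = {minute:3d} min: {target_sessions_per_hour(minute):.1f} sessions/hour")
```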

Step 7: Prepare to Implement the Model

Preparing to implement the model is tightly tied to the method of implementation, typically a load-generation tool. For more information about implementing a workload model using Visual Studio 2005 Team Suite or Visual Studio 2005 Team Edition for Software Testers, see {HowTo:SomeName}.

Considerations

Consider the following key points when preparing to implement the model:
  • Do not change your model blindly just because it is difficult to implement in your tool.
  • If you cannot implement your model as designed, ensure that you record the details pertaining to the model you do implement.
  • Implementing the model frequently includes identifying metrics to be collected and determining how to collect those metrics. For more information, see {HowTo: DoMetricsStuff}.

Last edited Mar 16, 2007 at 8:52 PM by prashantbansode, version 4

Comments

chapalav Aug 12 at 2:27 PM 
Hello Guys, Thanks very much for this valuable information, It's very helpful. Appreciate , if you could explain me more about session average time as to how to calculate this value. Thanks again!
- Vijay.

Asuarez09 Aug 13, 2008 at 3:06 PM 
Hi guys,
For calculating concurrent users, I find the best way is to use statistical probability formulas, such as the "Poisson" model.
It generally consists of three simple variables, each of which can be weighted according to how much it varies.
Variables:
1. Potential Users: Is the population or total sample, that is if you have an application in which the number of potential users are employees of the company or only a portion of them.
2. Time of the session: It is the duration of the operation that each user performs in the application or application option.
3. Time Availability: Is the time in which the application is available for use by users.

Then the formula simply would look like this: C = (n * l) / T
C = Concurrency
n = Potential users
l = Session time
T = Time available for use

Example:

1. Potential Users = 1500 users with access to the possibility of application
2. Time Session = 5 minutes on average for users to consult the payment of wages and their respective deductions.
3. Available = The only time can be used in work schedules that would be of 8.5 hours per day of availability.

The result would be as follows:

C = (1500 * 5) / 510 (510 minutes = 8.5 hours.)
C = 14.7

Result, the probability is 14.7 concurrent users by 5 minutes of session.

Note: these values are likely, which may be key to percentage values should be established and interpreted correctly to make them correct values.

Sincerely,
Andres.

DouglasBrown Mar 6, 2007 at 1:56 PM 
Hi, I find that 'concurrent users' is, in load testing, used to mean the amount of users currently using the application, or possibly the number that are logged in. For some applications this number is not always proportional to the load. For example, a call center may have 200 customer service staff logged in, however only 150 are taking calls, and of those maybe 100 are using the application, the rest may be just talking to a customer without using the application (e.g. reading a sales script).

What I try to find out is how many users are actually doing something, i.e. creating load. If for example there are 100 users doing something and the rest are idle, it is the 100 that I will consider for the load.

pablop2006 Feb 19, 2007 at 7:55 PM 
Hey guys, thanks, this information is very helpful. Could you please explain a bit more about the values 'Session Average time', 'Sessions Per hour', and concurrent users? By 'concurrent', do you mean users hitting the site at the same time, or active users?
In my case I know that within an hour I can have 250 sessions, but it is still not clear to me how to know how many users will be active at the same time, considering an average of 8 minutes per session. Following your calculations, I get a #Conc of 33, but as I mentioned I do not know if that means 33 active users or 33 concurrent users which should have been multiplied by some figure.
thanks,
PP