
How To: Model the Workload for Web Applications

J. D. Meier, Prashant Bansode, Carlos Farre, Mark Tomlinson, Scott Barber

Applies To

  • Web Applications
  • Performance Testing

Summary

This How To shows you how to model the workload for Web applications. For performance testing to yield results that apply directly to the performance characteristics of an application in production, the tested workloads must represent reality. To create a reasonably accurate representation of reality, you must understand the business context in which the application is used, the expected transaction volumes in various situations, the expected user path(s) by volume, and other usage factors.

Contents

  • Objectives
  • Overview
  • Summary of Steps
  • Step 1 – Identify the Objectives
  • Step 2 – Identify Key Scenarios
  • Step 3 – Determine Navigation Paths for Key Scenarios
  • Step 4 – Identify Unique Data for Navigation Paths and / or Simulated Users
  • Step 5 – Determine Relative Distribution of Scenarios
  • Step 6 – Identify Target Load Levels
  • Step 7 – Prepare to Implement the Model

Objectives

  • Learn how to construct realistic workload models for web applications

Overview

Workload modeling is the process of identifying one or more composite application usage profiles of interest for use in performance testing. A workload model contains data related to such items as:
  • Key user activities.
  • Navigation path(s) related to those activities.
  • Relative distribution of users and/or activities across the target load.
  • Unique data and application interaction patterns for each user to be simulated.

Simulating unrealistic workload models can still provide valuable information to a team during performance testing, but only when realistic workload models are simulated can predictions about performance in production be made, or performance optimizations for production be accomplished.

Summary of Steps

  • Step 1 – Identify the Objectives
  • Step 2 – Identify Key Scenarios
  • Step 3 – Determine Navigation Paths for Key Scenarios
  • Step 4 – Identify Unique Data for Navigation Paths and / or Simulated Users
  • Step 5 – Determine Relative Distribution of Scenarios
  • Step 6 – Identify Target Load Levels
  • Step 7 – Prepare to Implement the Model

Step 1: Identify the Objectives

The objectives of creating a workload model typically center on ensuring the realism of a test, or on designing a test to address a specific requirement, goal or performance testing objective. (For more information, see {HowTo:Quantify End User Requirements and HowTo:Determine Performance Testing Objectives}.) When identifying the objectives, work with targets that will satisfy business requirements. The following questions are the key inputs for building the objectives:
  • What is the current or predicted business volume over time? For example, how many orders are placed over time, and how many supporting activities (searching, browsing, logging in, etc.) take place?
  • How is the volume expected to grow over time? The answer projects future needs arising from business growth, company mergers, newly incorporated products, etc.
  • What is the current or predicted peak load level? The answer reflects activities that support sales and other critical business processes, such as marketing campaigns, newly shipped products, and time-sensitive operations such as stock exchange activity that depends on external markets.
  • How quickly are the peak load levels expected to be reached? The answer reflects the burst of requests to the business: how fast does demand ramp up when the event happens?
  • How long do the peak load levels last? The answer tells you how long the new demand must be sustained before exhaustion of a resource compromises the service level agreements. For example, an economic announcement may cause the currency exchange market to experience heightened activity that lasts 2 or 3 days, as opposed to just a few hours.

This information can be gathered from Web server logs, from marketing (reflecting business requirements), or from stakeholders. Below are some examples of objectives identified during this process:
  • Ensure one or more models represent the peak expected load of X orders being processed per hour.
  • Ensure one or more models represent the difference between “quarterly close-out” period usage patterns and “typical business day” usage patterns.
  • Ensure that one or more models represent business/marketing projections for up to one year out.

It is acceptable if these objectives only make sense in the context of the project at this point. The remaining steps will help you fill in the necessary details to achieve the objectives.
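
The growth question above can be turned into a rough projection. The sketch below assumes compound monthly growth, which is only one possible growth model, and uses a hypothetical 3% monthly rate; the function name and the rate are illustrative, so substitute the figures your stakeholders provide.

```python
def project_volume(current_volume, monthly_growth_rate, months):
    """Project a business volume forward, assuming compound monthly growth."""
    if months < 0:
        raise ValueError("months must not be negative")
    return current_volume * (1 + monthly_growth_rate) ** months


# 460789 monthly sessions is the figure used later in Step 6; the 3% monthly
# growth rate is a hypothetical value chosen only for illustration.
for months in (3, 6, 12):
    projected = project_volume(460789, 0.03, months)
    print(f"In {months:2d} months: about {projected:,.0f} sessions per month")
```

A projection like this gives the "up to one year out" objective a concrete number that later steps can turn into a target load.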

Considerations

  • Throughout the process of creating workload models, remember to share your assumptions and drafts with the team and solicit their feedback.
  • Don’t get overly caught up in striving for perfection and don’t fall into the trap of oversimplification. In general, it’s a good idea to start executing tests when you have a testable model and enhance the model incrementally while collecting results.

Step 2: Identify Key Scenarios

It is typically somewhere between impractical and impossible to simulate every possible user task or activity in a performance test. As a result, whether identifying what users do by analyzing server logs, observing usability studies, interpreting marketing material or starting with your best educated guess, you will probably want to apply some limiting heuristic to the number of activities, or key scenarios you identify for performance testing. You may find the following limiting heuristics useful:
  • Include the key scenarios implied or mandated by the objectives.
  • Include the most common activities.
  • Include high visibility activities. For example, a user may only register on your web site once, but if that is a bad experience, they may never return.
  • Include business critical activities. Placing an order may not be common or highly visible on your website, but if users can’t place orders, you lose revenue.
  • Include performance intensive activities. Even if these activities are extremely rare, when they happen they can have a significant system wide impact on performance.
  • Include activities whose performance is mandated by contract, SLA or an influential stakeholder.

Below is an example of key scenarios identified for an eCommerce application:
  • Browse
  • Create User Account
  • Search
  • Login
  • Place Order

Considerations

  • Think about system users and batch processes as well as human end-users. For example, there might be a batch process that updates the status of orders while users are performing activities on the site. Account for those processes because they might be consuming resources.
  • For the most part, web servers are very good at serving text and graphics. Static pages with average sized graphics are probably less critical than dynamic pages, forms and multi-media pages.

Step 3: Determine Navigation Paths for Key Scenarios

Human beings are unpredictable, and web sites commonly offer multiple paths to accomplish the same task or activity. Even with a relatively small number of users, it is almost certain that real users will not only use every path you expect them to use to complete a task, but will also invent some that you hadn’t thought of. Each path they take to complete an activity puts a different load on the system. That difference may be trivial or it may be enormous; there is no way to be certain until you test it. There are many methods for determining the navigation paths used to complete a task or activity. Some include:
  • Identify the user paths within your web application that are expected to have significant performance impact and that accomplish one or more of the identified key scenarios.
  • Read design and/or usage manuals.
  • Extract the data from log files.
  • Try to accomplish the activities yourself.
  • Observe others trying to accomplish the activity without instruction.

Once the application is released for unscripted user acceptance testing, beta testing or to production, you will be able to determine how the majority of users accomplish activities on the system under test. It is always a good idea to compare your models against reality and make an informed decision about whether to do additional testing based on the similarities and differences found.

Apply the same limiting heuristics to navigation paths as you did when determining activities to decide which paths you want to include in your performance simulation.
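
Of the methods listed above, extracting the data from log files lends itself to automation. The following is a minimal sketch, assuming the log has already been parsed into (session id, timestamp, page title) records; the record layout, the function name and the sample data are illustrative and not tied to any particular Web server log format.

```python
from collections import Counter, defaultdict


def count_navigation_paths(records):
    """Group page views by session and count how often each path occurs.

    records: iterable of (session_id, timestamp, page_title) tuples.
    Returns a Counter mapping a tuple of page titles to its frequency.
    """
    views_by_session = defaultdict(list)
    for session_id, timestamp, page_title in records:
        views_by_session[session_id].append((timestamp, page_title))

    path_counts = Counter()
    for views in views_by_session.values():
        views.sort(key=lambda view: view[0])  # order each session's views by time
        path_counts[tuple(title for _, title in views)] += 1
    return path_counts


# Hypothetical sample data for illustration only.
sample = [
    ("s1", 1, "Home"), ("s1", 2, "Search"), ("s1", 3, "Place Order"),
    ("s2", 1, "Home"), ("s2", 2, "Browse"),
    ("s3", 5, "Search"), ("s3", 4, "Home"), ("s3", 6, "Place Order"),
]
for path, count in count_navigation_paths(sample).most_common():
    print(count, " > ".join(path))
```

Sorting the resulting paths by frequency gives a starting point for applying the limiting heuristics: the most common paths are candidates for the model, and rare but expensive paths can be reviewed by hand.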

Considerations

  • Some users will complete more than one activity during a visit to your site.
  • Some users will complete the same activity more than once per visit.
  • Some users may not actually complete any activities during a visit to your site.
  • Navigation paths are often easiest to capture using page titles.
  • If page titles don’t work, or aren’t intuitive for your application, the navigation path may be easier defined by steps the user takes to complete the activity.
  • First time users frequently follow a different path to accomplish a task than users experienced with the application. Consider this difference and what percentage of new vs. return user navigation paths should be represented in your model.
  • Different users will spend different amounts of time on the site. Some will log out, some will close their browser and others will leave their session to time out. Take these factors into account when determining or estimating session durations.

Step 4: Identify Unique Data for Navigation Paths and / or Simulated Users

Unfortunately, navigation paths alone don’t provide all of the information required to implement a workload simulation. To fully implement the workload model, several more pieces of information are needed. This information includes items such as:
  • How long users may spend on a page
  • What data may need to be entered on each page
  • What conditions may cause a user to change navigation paths

Below is an example of unique data identified for an eCommerce application:

Implementation Data

Scenario | Page/Step  | Data Inputs                                            | Data Outputs                         | Think Time
Login    | Login page | Username (unique), Password (matched to username)      |                                      | 6 – 9 Sec, Random
Browse   | Login Page | Username (unique), Password (matched to username)      |                                      | 6 – 9 Sec, Random
Browse   | Browse     | Catalog Tree/Structure (static), User Type (weighted)  | Product Description, Title, Category | 4 – 60 Sec, Random
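
The unique usernames and random think times in this example could be generated along the following lines. This is a sketch only: the username scheme, the password format and the helper names are hypothetical, and a uniform distribution is assumed for the "Random" think time because the table does not say which distribution to use.

```python
import random


def think_time(low_seconds, high_seconds, rng=random):
    """Return a random delay within the given range, in seconds."""
    return rng.uniform(low_seconds, high_seconds)


def unique_credentials(user_count):
    """Yield one (username, password) pair per simulated user."""
    for i in range(1, user_count + 1):
        username = f"testuser{i:05d}"      # hypothetical naming scheme
        yield username, f"pw-{username}"   # password matched to username


# Think-time ranges taken from the Implementation Data example.
THINK_TIME_RANGES = {"Login": (6, 9), "Browse": (4, 60)}

rng = random.Random(42)  # seeded so that a test run can be repeated
for username, password in unique_credentials(3):
    delay = think_time(*THINK_TIME_RANGES["Login"], rng=rng)
    print(f"{username} logs in, then pauses {delay:.1f} s")
```

Seeding the generator keeps the data varied across users while still letting you reproduce a specific run when debugging the test.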

Considerations

  • Performance tests frequently consume large amounts of test data. Ensure you have enough.
  • Using the same data repeatedly will frequently lead to invalid performance results.
  • Test databases can become dramatically overloaded with data, especially while you are designing and debugging performance tests. Periodically check whether the database is storing unrealistic volumes of data for the situation you are trying to simulate.
  • Consider including invalid data in your performance tests. For example, include some users who mistype their password on the first attempt but get it correct on a second try.
  • The best possible test data is test data collected from a production database or log file.
  • Consider client-side caching. First-time users will be downloading every object on the site. Frequently returning users are likely to have many static objects and/or cookies stored in a local cache. When capturing the uniqueness of the user, consider whether that user represents a first-time user or a user with an established client-side cache.
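
Two of these considerations, invalid data and client-side caching, can be expressed as per-user traits decided when each simulated user is created. In the sketch below the function name, the dictionary keys and the 10% and 5% shares are all assumptions made for illustration; use shares that reflect your own application.

```python
import random


def build_user_profile(rng, new_user_share, mistyped_password_share):
    """Decide per-user traits that change what the simulated user sends."""
    return {
        # First-time users start with an empty client-side cache.
        "empty_cache": rng.random() < new_user_share,
        # Some users fail the first login attempt, then succeed.
        "mistypes_password_once": rng.random() < mistyped_password_share,
    }


rng = random.Random(7)  # seeded for repeatable runs
profiles = [build_user_profile(rng, 0.10, 0.05) for _ in range(1000)]
new_users = sum(profile["empty_cache"] for profile in profiles)
print(new_users, "of 1000 simulated users start with an empty cache")
```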

Step 5: Determine Relative Distribution of Scenarios

Now that you’ve determined what scenarios you want to simulate and what the steps and associated data are for those scenarios, you need to determine how often each scenario needs to be simulated relative to the other scenarios to complete the workload model.

Sometimes, one workload model is not enough. Research and experience tell us that user activities often vary greatly over time. To ensure test validity, evaluate activities by time of day, day of week, day of month and time of year. As an example, consider an on-line bill payment site. If all bills go out on the 20th of the month, the activity on the site immediately before the 20th will be focused on system administrators updating accounts, importing billing information, etc., while immediately after the 20th, customers will be viewing and paying their bills until the payment due date of the 5th of the next month.

The most common methods to determine the relative distribution of scenarios are:
  • Extract the actual usage, load values, common and uncommon usage scenarios (user paths), user delay time between clicks or pages, and input data variance, to name a few, directly from log files.
  • Interview the individuals responsible for selling/marketing new features to find out what features/functions are expected, and therefore most likely to be used. By interviewing existing users, you may also determine which of the new features/functions they believe they are most likely to use.
  • Deploy a beta release to a group of representative users, roughly 10-20% of the size of the expected user base, and analyze the log files from their usage of the site.
  • Run simple in-house experiments using employees, customers, clients, friends, or family members to determine, for example, natural user paths and the page-viewing time differences between new and returning users.
  • As a last resort, you can use your intuition, or best guess, to estimate based on your own familiarity with the site.
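
Whichever method supplies the raw numbers, turning them into a relative distribution is simple arithmetic. The sketch below assumes you already have a count of executions per scenario; the function name and the sample counts are hypothetical.

```python
def relative_distribution(scenario_counts):
    """Convert raw scenario counts into percentages of the total."""
    total = sum(scenario_counts.values())
    if total == 0:
        raise ValueError("no scenario executions were counted")
    return {name: 100.0 * count / total
            for name, count in scenario_counts.items()}


# Hypothetical counts, as might be tallied from a day of log entries.
counts = {"Browse": 7650, "Search": 4590, "Place Order": 3060}
for name, share in relative_distribution(counts).items():
    print(f"{name}: {share:.0f}%")
```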

Below is an example of the distribution of scenarios for an eCommerce application:

Work Distribution

User Scenarios | % of Work Distribution
Browse         | 50
Search         | 30
Place Order    | 20
Total          | 100
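
A load generator applies a distribution like this by choosing a scenario for each new simulated session in proportion to its share. Most load generation tools do this for you; the following sketch only illustrates the idea with weighted random selection, and the function name and seed are arbitrary.

```python
import random
from collections import Counter

# Percentages from the Work Distribution example.
WORK_DISTRIBUTION = {"Browse": 50, "Search": 30, "Place Order": 20}


def pick_scenarios(distribution, count, rng):
    """Pick `count` scenarios at random, weighted by their share of work."""
    names = list(distribution)
    weights = [distribution[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


rng = random.Random(2007)  # seeded for repeatable runs
tally = Counter(pick_scenarios(WORK_DISTRIBUTION, 10_000, rng))
for name in WORK_DISTRIBUTION:
    print(f"{name}: {100 * tally[name] / 10_000:.1f}%")
```

Over a large number of sessions the observed shares settle close to the modeled percentages, while short runs can deviate noticeably.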

Considerations

  • Create visual models and circulate them to both users and stakeholders to review/comment.
  • Ensure the model is intuitive to non-technical users, technical designers and everyone in between.
  • Ensure the model contains all of the supplementary data needed to create the actual test.
  • It is during this step that you would account for user abandonment if applicable to your application. {for more on user abandonment see HowTo:AccountForUserAbandonment}

Step 6: Identify Target Load Levels

Each workload model is frequently executed at a variety of load levels, and most load generation tools make it very easy to change the load level at run time. It is still important to identify the expected and peak target load levels for each workload model so that you can predict, or compare against, production conditions. Below are the inputs and outputs for determining target load levels:

Inputs

  • Business volume, current and projected, mapped to the objectives
  • Key scenarios
  • Distribution of work
  • Session characteristics: navigational path, duration, percentage of new users.

Volume

The information below can be extracted from Web server logs, marketing or stakeholders:

Time Period               | Business volume (# sessions to the Web site) | Business volume (# sessions, peak values) | Peak load increase | Peak build-up time | Peak duration
Monthly                   | 460789                                       | 1360890                                   | 2.95               | 1 hour             | 2 hours
Daily                     | 15359                                        | 45363                                     | 2.95               | 1 hour             | 2 hours
Hourly (15 hour traffic)  | 1023                                         | 3024                                      | 2.95               | 1 hour             | 2 hours
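
The daily and hourly figures in this table are consistent with dividing the monthly figure by 30 days, and then by the 15 hours of traffic per day, discarding the remainder each time. The 30-day month and the truncation are inferred from the numbers rather than stated, so treat the sketch below as one plausible reading; the function name is illustrative.

```python
def volume_breakdown(monthly_sessions, days_per_month=30, traffic_hours_per_day=15):
    """Split a monthly session count into daily and hourly averages."""
    daily = monthly_sessions // days_per_month
    hourly = daily // traffic_hours_per_day
    return daily, hourly


normal_daily, normal_hourly = volume_breakdown(460789)
peak_daily, peak_hourly = volume_breakdown(1360890)
print(normal_daily, normal_hourly)   # 15359 1023
print(peak_daily, peak_hourly)       # 45363 3024
print(round(1360890 / 460789, 2))    # peak load increase: 2.95
```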


Output

Combining the volume information with the objectives, key scenarios, user delays, navigation paths and scenario distributions from the previous steps, you can determine the remaining details necessary to implement the workload model at a particular target load.

Example:
Total hourly sessions: 1023
Total hourly sessions (peak): 3024
Place Order, 20%: 205
Place Order, 20% (peak): 605
Average session time = 18 minutes
Sessions per user per hour = 60/18 = 3.33
Total users to produce 205 orders = about 61 (205/3.33)
Total users to produce 605 orders = about 181 (605/3.33)

User Scenarios | % of Work Distribution | Total Normal Sessions | Total Peak Sessions | Session Duration (minutes) | # Concurrent Users (normal) | # Concurrent Users (peak) | % New Users
Browse         | 50                     | 512                   | 1512                | 15                         | 128                         | 378                       | 10%
Search         | 30                     | 307                   | 907                 | 7                          | 36                          | 106                       | 10%
Place Order    | 20                     | 205                   | 605                 | 18                         | 61                          | 181                       | 12%
Total          | 100                    | 1023                  | 3024                |                            |                             |                           | 12%
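
The concurrent user columns follow from the hourly sessions and the session duration: a user whose session lasts a given number of minutes can complete 60 divided by that number of sessions per hour. The sketch below reproduces the calculation; the function name is illustrative, and the table's values are whole-number approximations of the results it prints.

```python
def concurrent_users(sessions_per_hour, session_minutes):
    """Average number of users needed to sustain the hourly session count.

    Each simulated user can complete 60 / session_minutes sessions per hour,
    so the users needed are sessions_per_hour divided by that rate.
    """
    sessions_per_user_per_hour = 60 / session_minutes
    return sessions_per_hour / sessions_per_user_per_hour


# (normal sessions/hour, peak sessions/hour, session minutes) from the table.
SCENARIOS = {
    "Browse": (512, 1512, 15),
    "Search": (307, 907, 7),
    "Place Order": (205, 605, 18),
}

for name, (normal, peak, minutes) in SCENARIOS.items():
    print(f"{name}: {concurrent_users(normal, minutes):.1f} normal, "
          f"{concurrent_users(peak, minutes):.1f} peak")
```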

Considerations

  • See {HowTo: Determine Load Levels} for a discussion about concurrent users and overlapping active user sessions.
  • Changing load levels even slightly can sometimes change results dramatically; don’t become a victim of inertia.
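
The peak build-up time and peak duration from the volume information can be combined with the concurrent user counts to describe how the target load changes during a test. The sketch below assumes a linear ramp from normal to peak, a flat peak, and an immediate return to normal; that shape and the function name are assumptions, and real demand curves may differ.

```python
def target_users(minute, normal_users, peak_users, build_up_minutes, peak_minutes):
    """Target concurrent users at a given minute of the test.

    Assumes a linear ramp from normal to peak, a flat peak, then an
    immediate return to normal.
    """
    if minute < 0:
        raise ValueError("minute must not be negative")
    if minute < build_up_minutes:
        fraction = minute / build_up_minutes
        return round(normal_users + fraction * (peak_users - normal_users))
    if minute < build_up_minutes + peak_minutes:
        return peak_users
    return normal_users


# Place Order figures from the example: 61 normal and 181 peak users,
# with a 1 hour build-up and a 2 hour peak.
for minute in (0, 30, 60, 120, 180):
    print(minute, target_users(minute, 61, 181, 60, 120))
```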

Step 7: Prepare to Implement the Model

Preparing to implement the model is tightly tied to the method of implementation, typically a load generation tool. For more information about implementing a workload model using VSTS, see {HowTo:SomeName}.

Considerations:

  • Don’t change your model blindly because the model is difficult to implement in your tool.
  • If you cannot implement your model as designed, ensure that you record the details about the model you do implement.
  • Implementing the model frequently includes identifying the metrics to be collected and determining how to collect them. See {HowTo: DoMetricsStuff} for more information.
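
One way to record the details of the model you actually implement is to keep them in a tool-independent form next to the test scripts. The sketch below is a hypothetical record layout, not a format used by VSTS or any other tool; the class name, the field names and the deviation note are illustrative, and the figures come from the Step 6 example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ScenarioModel:
    name: str
    percent_of_work: int
    session_minutes: int
    concurrent_users_normal: int
    concurrent_users_peak: int
    deviation_from_design: str = ""  # note anything the tool forced you to change


model = [
    ScenarioModel("Browse", 50, 15, 128, 378),
    ScenarioModel("Search", 30, 7, 36, 106),
    ScenarioModel("Place Order", 20, 18, 61, 181,
                  deviation_from_design="hypothetical note: fixed think time used"),
]

# The shares must cover all of the modeled work.
assert sum(scenario.percent_of_work for scenario in model) == 100
print(json.dumps([asdict(scenario) for scenario in model], indent=2))
```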

Last edited Feb 7, 2007 at 4:11 AM by mycodeplexuser, version 3
