
How To: Correlate Dynamic Data in a Load Test Transaction

Mark Tomlinson

Applies to

  • Performance Test Design
  • Performance Test Script Development
  • Visual Studio Team System Edition for Software Testers

Summary

This How-To introduces the concepts and processes used to correlate dynamic data that is sent to and received from the system under test in a performance testing project. You will learn why a test script that supports dynamic data values is essential to the accuracy of the load test replay. It covers the step-by-step process for enabling dynamic data handling in most test tools and custom load drivers, and defines the terms used for data handling in load test scripts.

Contents

  • Objectives
  • Overview
  • Parameterization and Correlation
  • Using dynamic data: why does it matter?
  • Mechanisms for saving data in the test script
  • Summary of Steps
  • Additional Resources

Objectives

  • Learn why it is important for the test design to support dynamic data during replay.
  • Learn basic definitions for parameterization and correlation as they are used in the context of load testing scripts.
  • Learn how to improve the accuracy of load testing execution and results.
  • Learn the process for capturing data and reusing that data in a load test script.

Overview

Exchanging data is one of the most basic things we need from any computer system. Right now I am writing this document by typing data into my computer, and the data will be saved to a file on a server. You are probably sitting in front of a computer screen reading this after downloading the data from the web, or perhaps after ordering the book online. As users interact with a system, and as other systems interact with it, they are all exchanging data in real time. It is safe to assume that the users of the world do not all exchange the same data; we are all unique, different and dynamic. Thus, for your load testing to accurately simulate the real world, the scripts must support dynamic and varied data values, not just static or hard-coded ones.

When planning a performance test, we must start with a base definition of the transactions that will be executed during the test, simulating the real-time interactions and exchange of data. When the performance test design calls for replaying hundreds or thousands of users, the data is very likely to be varied across the simulated transactions. Can you imagine an online shopping site where 10,000 users connect to the system and all search for the exact same item (e.g. an Extreme Tickle-me Elmo)? Probably not. No matter how popular the new Elmo doll is, it is highly improbable that all 10,000 users will be searching for the exact same product.

Parameterization and Correlation

  • Parameterization: this is the automatic replacement of hard-coded data values in a test script, so that different script instances use different data. For instance, the following table shows the search values used during a simulation for three users:

    ITERATION   USER    SEARCH DATA
    1           Bob     query.aspx?str=RABBITS
    1           Mary    query.aspx?str=ELMO
    1           Peter   query.aspx?str=GUITARS
    2           Bob     query.aspx?str=CARROTS
    2           Mary    query.aspx?str=TICKLE ME
    2           Peter   query.aspx?str=TUBE AMPS
    3           Bob     query.aspx?str=ANIMAL CONTROL
    3           Mary    query.aspx?str=STOP IT, ELMO
    3           Peter   query.aspx?str=JIMI HENDRIX


    You can see that on each iteration of the test script, each user's script sent a different value in the ‘str’ parameter of the query. Parameterization is simply varying the data values used in a test script. The data values may come from an internal variable, an external comma-delimited text file, or a database of values, or the parameter value may be captured from data returned by a query of the system.
  • Correlation: this describes the relationship between two data values used in a script. Most commonly it means the reuse of data values during script execution, where values captured early in the script are used again in later requests. The following table describes a simple flow of data in an online ordering transaction, showing the script steps and the correlation of the product ID:

    1 – Product Search
        User action: User opens the product search page and does a query for ‘RABBITS’.
        Script data: The value of RABBITS is assigned to the parameter searchStr, and posted: “query.aspx?str=searchStr”

        User action: Query returns 5 different rabbits that the user can choose.
        Script data: The results of the query are saved into a parameter table:

            Description        ProductID
            Small Brown        974443
            Large Brown        974444
            Brown and White    977321
            All White          974414
            All Black          974446

        User action: User chooses a rabbit, randomly from the list of items returned.
        Script data: The value ‘977321’ is selected from the product table and saved in the productID parameter.

    2 – Click on Buy Now to order
        User action: User clicks on the Buy Now button for that Brown and White rabbit.
        Script data: The value of productID is passed into the order page: “order.aspx?pid=productID”

    3 – Enter information and confirm
        User action: User enters shipping information.
        Script data: “order.aspx?addr=My House”

        User action: User enters billing information.
        Script data: “order.aspx?money=creditcard”

    4 – Verify Order Number and Confirmation
        User action: User confirms the order.
        Script data: Application redirects to “confirm.aspx” and the Order Number is saved into the orderID parameter.

    5 – Update Order
        User action: User opens the order again to update the shipping information.
        Script data: The value of the orderID parameter is passed into the application’s update page: “orderedit.aspx?iod=orderID&addr=My Work”
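The rabbit ordering flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of search terms, the HTML shape of the search response, and the helper names are all invented here, and no real HTTP traffic is sent. The search value is parameterized, and the product ID is captured from the response and reused in the next request.

```python
import random
import re
from urllib.parse import urlencode

# Hypothetical search terms; a real test might read these from a CSV file.
SEARCH_TERMS = ["RABBITS", "ELMO", "GUITARS", "CARROTS", "TUBE AMPS"]

# Stand-in for the server's reply to query.aspx. The markup is invented.
SAMPLE_RESPONSE = """
<a href="order.aspx?pid=974443">Small Brown</a>
<a href="order.aspx?pid=974444">Large Brown</a>
<a href="order.aspx?pid=977321">Brown and White</a>
"""

def build_search_request(rng):
    """Parameterization: choose the search value instead of hard-coding it."""
    search_str = rng.choice(SEARCH_TERMS)
    return "query.aspx?" + urlencode({"str": search_str})

def extract_product_ids(response_text):
    """Correlation, capture side: pull every product ID out of the response."""
    return re.findall(r"pid=(\d+)", response_text)

def build_order_request(product_id):
    """Correlation, reuse side: send the captured ID back to the system."""
    return "order.aspx?" + urlencode({"pid": product_id})

rng = random.Random(42)  # fixed seed so a debugging run is repeatable
search_request = build_search_request(rng)
product_ids = extract_product_ids(SAMPLE_RESPONSE)
product_id = rng.choice(product_ids)  # the user picks a rabbit at random
order_request = build_order_request(product_id)
print(search_request)
print(order_request)
```

In a real load tool the response would come from the system under test, and the extraction rule would be whatever the tool provides (a regular expression, a boundary match, or a form field lookup).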

Using dynamic data: why does it matter?

There is a fundamental technical reason that your load testing should include dynamic and varied data values: caching. When a system retrieves data from the database, copies of that data are saved in memory on different components throughout the system. This can happen at many layers of the system's architecture: individual hard drives, storage controllers, the operating system kernel, and the various buffer managers in the application software. If you do not use dynamic data to continuously query for new data, the load test keeps requesting values that are already cached instead of forcing the caches to turn over, and the system may seem to respond faster than it would in the real world.
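The effect of caching can be illustrated with a toy model. The sketch below assumes a single LRU cache of 100 entries in front of a pretend database read; a real system has many caches with different sizes and policies, so the numbers only show the direction of the effect. The function name run_load and the query values are made up for the illustration.

```python
from functools import lru_cache

def run_load(queries, cache_size=100):
    """Replay the queries through a fresh LRU cache and return the hit rate."""
    @lru_cache(maxsize=cache_size)
    def lookup(term):
        # Stand-in for an expensive database read.
        return term.lower()

    for term in queries:
        lookup(term)
    info = lookup.cache_info()
    return info.hits / (info.hits + info.misses)

# 1000 requests for the same item versus 1000 requests cycling over 500 items.
static_queries = ["ELMO"] * 1000
varied_queries = ["ITEM-%d" % (i % 500) for i in range(1000)]

static_hit_rate = run_load(static_queries)
varied_hit_rate = run_load(varied_queries)
print("static:", static_hit_rate)
print("varied:", varied_hit_rate)
```

With static data almost every request is served from the cache, while the varied workload in this model never finds its item cached. A load test built on the static script would be measuring the cache, not the database.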

Mechanisms for saving data in the test script

Every test tool that supports dynamic data correlation must be able to store data values as they are captured during replay of the test script. Each individual script should have a mechanism, such as a variable or structure, that allows values to be saved, changed and recalled during the execution of the test script.
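As a minimal sketch of such a mechanism, the class below keeps one set of named values per virtual user. The class and method names are hypothetical and do not correspond to any particular test tool; the "{name}" placeholder syntax is simply Python's str.format, so it would not suit request text that itself contains braces.

```python
class ScriptVariables:
    """Per-virtual-user store for values captured during replay."""

    def __init__(self):
        self._values = {}

    def save(self, name, value):
        """Save a value, or change it if the name already exists."""
        self._values[name] = value

    def recall(self, name):
        """Return a saved value; raises KeyError if it was never captured."""
        return self._values[name]

    def substitute(self, template):
        """Fill {name} placeholders in a request template with saved values."""
        return template.format(**self._values)

variables = ScriptVariables()
variables.save("productID", "977321")
print(variables.substitute("order.aspx?pid={productID}"))
```

Keeping one instance per virtual user means that two simulated users never overwrite each other's captured values.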

Summary of Steps

The following steps explain the process for correlating dynamic data used in the majority of load testing tools.

Step 1

Detail each step of the test script and the data exchange at each step. This could be achieved with a recording of the users’ interactions with the application, or of the network API calls made by the application while it is being used.

Step 2

Analyze the trace of data exchanges throughout the entire sequence of the script execution to find matching data values, where data received from the system is later re-submitted to the system. This is called a correlated pair: the data received and the data sent are related. Be careful to consider behaviors where the received data is manipulated or changed before it is re-submitted (e.g. an order number is incremented before it is saved, or the date and time values are updated).
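A first pass over a recorded trace can be automated. The sketch below assumes the trace is available as an ordered list of (request, response) strings, which is an invented representation, and it flags any token of five or more letters or digits that appears in a response and again in a later request. The order number 5550001 is made up. The output is a list of candidates for a person to review, and it will miss values that are changed before being re-submitted.

```python
import re

def find_correlated_pairs(trace, min_length=5):
    """Return (value, response_index, request_index) for each candidate pair.

    trace is an ordered list of (request, response) strings.
    """
    pattern = re.compile(r"[A-Za-z0-9]{%d,}" % min_length)
    pairs = []
    for i, (_, response) in enumerate(trace):
        candidates = sorted(set(pattern.findall(response)))
        for j in range(i + 1, len(trace)):
            later_request = trace[j][0]
            for value in candidates:
                if value in later_request:
                    pairs.append((value, i, j))
    return pairs

# Hypothetical trace based on the ordering example; response bodies invented.
TRACE = [
    ("query.aspx?str=RABBITS", "Brown and White, ProductID 977321"),
    ("order.aspx?pid=977321", "Order Number 5550001"),
    ("orderedit.aspx?iod=5550001&addr=My Work", "OK"),
]

for value, received_at, sent_at in find_correlated_pairs(TRACE):
    print(value, "received in step", received_at, "sent in step", sent_at)
```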

Step 3

Next, programmatically store the first occurrence of the data value at the time it is retrieved from the system under test, using a storage mechanism in the test tool such as a script variable, memory structure or external file.

Step 4

For each of the matching correlated pair(s) replace the later occurrences of the same data value in the test script with the dynamic data value that had been saved.
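Steps 3 and 4 amount to turning a recorded script into a template. The sketch below assumes the recorded script is a list of request strings and uses Python's string.Template for the placeholders; the helper names templatize and render are hypothetical. Note that Template.substitute raises an error for a missing variable, which is useful for catching a failed capture, but it would also reject request text containing a stray "$".

```python
from string import Template

# Requests as recorded, with the product ID hard-coded by the recorder.
RECORDED_SCRIPT = [
    "order.aspx?pid=977321",
    "order.aspx?addr=My House",
]

def templatize(recorded_requests, recorded_value, variable_name):
    """Replace each later occurrence of a recorded literal with a placeholder."""
    placeholder = "${" + variable_name + "}"
    return [r.replace(recorded_value, placeholder) for r in recorded_requests]

def render(templated_requests, variables):
    """At replay time, fill the placeholders with the values captured so far."""
    return [Template(r).substitute(variables) for r in templated_requests]

templated = templatize(RECORDED_SCRIPT, "977321", "productID")
# During replay a different rabbit was captured from the search results.
replayed = render(templated, {"productID": "974414"})
print(templated)
print(replayed)
```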

Step 5

Replay the script with debugging enabled so you can watch the data values as they are received and re-submitted to the system.

Considerations for multiple related data values

Some data values are related to each other and must be correlated in the script at the same points. One example is a shipping address, which has multiple parameters such as address, address 2, city, state and zip. In these cases, be sure to store the dynamic data values in a structure where they can be retrieved together with a common index, so that the values are part of one row of data.
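One simple way to keep related values together is to store each set as a row and fetch the whole row by index. The sketch below uses Python's csv module on an in-memory string; the sample addresses and the helper names are invented, and a real test would more likely read the rows from a file or database.

```python
import csv
import io

# Invented sample data; one row holds all the values for one address.
ADDRESS_DATA = """address,address2,city,state,zip
1 Main St,Apt 4,Springfield,IL,62701
22 Oak Ave,,Portland,OR,97201
"""

def load_rows(text):
    """Parse the delimited text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def address_for(rows, index):
    """Return one complete row, wrapping around when the data runs out."""
    return rows[index % len(rows)]

rows = load_rows(ADDRESS_DATA)
shipping = address_for(rows, 1)
print(shipping["address"], shipping["city"], shipping["state"], shipping["zip"])
```

Because the city, state and zip always come from the same row, the script never submits a mismatched combination such as a Portland street with a Springfield zip code.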

Considerations for data reuse between scripts

If you need to capture data from one script, and then re-send that data to the system later on in other scripts then you must use some external mechanism for sharing that data. It is common to use an external text file, a message queue or database to enable this.
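As one possible sketch of the database option, the code below shares order IDs through a small SQLite file using Python's sqlite3 module. The table layout, the function names and the order ID are assumptions made for the example. One script publishes IDs as it captures them, and another claims each ID exactly once; the claim runs inside a BEGIN IMMEDIATE transaction so that two consumers do not take the same row.

```python
import os
import sqlite3
import tempfile

def open_store(path):
    """Open the shared store, creating the table on first use."""
    # isolation_level=None leaves transaction control to explicit statements.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id TEXT PRIMARY KEY, claimed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def publish_order(conn, order_id):
    """Called by the script that captures the order ID."""
    conn.execute("INSERT INTO orders (order_id) VALUES (?)", (order_id,))

def claim_order(conn):
    """Called by a later script: take one unclaimed ID, or None if empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    row = conn.execute(
        "SELECT order_id FROM orders WHERE claimed = 0 LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE orders SET claimed = 1 WHERE order_id = ?", (row[0],)
        )
    conn.execute("COMMIT")
    return row[0] if row is not None else None

store_path = os.path.join(tempfile.mkdtemp(), "shared_orders.db")
ordering_script = open_store(store_path)   # first script's connection
updating_script = open_store(store_path)   # second script's connection
publish_order(ordering_script, "5550001")
print(claim_order(updating_script))
```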

Considerations for finite input data values

Certain situations may require unique or finite data values, such as updating a bill of materials or general ledger entries for different accounts. The approach in these situations is to split the test script into two sections: the first captures all the items (line items or GL entries), and the second processes each of the entries. It is common to use an external data source to manage these situations.
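The two-section approach can be sketched as a capture phase that builds a work list and a processing phase that consumes each entry exactly once. The ledger contents, the entry IDs and the function names below are invented, and the in-memory queue stands in for the external data source mentioned above.

```python
from collections import deque

# Hypothetical general ledger entries returned by the system under test.
LEDGER = {"GL-1001": 250.0, "GL-1002": 75.5, "GL-1003": 10.0}

def capture_entries(ledger):
    """Section 1: capture the full, finite list of entry IDs up front."""
    return deque(sorted(ledger))

def process_entries(pending, update):
    """Section 2: process every captured entry once, then stop."""
    processed = []
    while pending:
        entry_id = pending.popleft()  # removing it prevents reuse
        update(entry_id)
        processed.append(entry_id)
    return processed

pending = capture_entries(LEDGER)
updated = []
processed = process_entries(pending, updated.append)
print(processed)
```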

Additional Resources

<<TBD>>

Last edited Feb 7, 2007 at 3:07 AM by mycodeplexuser, version 3

Comments

rmahmood Oct 8, 2008 at 8:18 PM 
Interesting article. I have a few questions about this topic:
1) Vendors such as LoadRunner and SilkPerformer claim to provide an automatic correlation engine, but in fact no tool can find and resolve correlations fully automatically; human intervention is always needed. If such intervention is needed for each V-user replay, how can we emulate a real user with real-time requests and responses?
2) Almost all articles on the internet discuss web application scenarios, but what happens if I have, for example, a J2EE application client?
3) Suppose a real user creates a new account on a system that requires a unique username or user ID, and we have recorded this scenario in a script. How will we recognize such a situation while replaying, and generate unique names for perhaps hundreds of virtual users?