How to Identify a Disk Performance Bottleneck Using the Microsoft Server Performance Advisor (SPA) Tool

Clint Huffman

Applies To

  • Microsoft Server Performance Advisor (SPA)
  • Performance Testing
  • Performance Analysis
  • Microsoft Windows Server 2003

Summary

This How-To shows how to use the Microsoft Server Performance Advisor (SPA) tool to identify which processes and files may be causing a disk subsystem performance bottleneck on Windows Server 2003.

Contents

  • Objectives
  • Overview
  • Download
  • Summary of Steps
  • Step 1. Run and Configure the Microsoft Server Performance Advisor (SPA) Tool
  • Step 2. Collect Data
  • Step 3. Compile the Report
  • Step 4. Analyze the Report
  • Conclusion
  • Production Server Considerations

Objectives

In this module, you will learn to do the following:
  • Identify a disk subsystem bottleneck
  • Identify which processes are causing highest disk usage
  • Identify which files are causing the highest disk usage
  • Determine the data pattern (read/write bytes and I/Os) of the disk usage

Overview

Microsoft Performance Monitor (perfmon) can gather performance counter data and Event Tracing for Windows (ETW) data, but analyzing that data is a manual task. This is where the Microsoft Server Performance Advisor (SPA) tool picks up: it collects performance data in the same manner as Performance Monitor, then analyzes the data and generates a detailed report on its findings.

Here are the disk-related sections of the SPA report:
  • Hot Files: Files Causing Most Disk I/O: This section of the report identifies the specific files that are causing the most disk I/O, the processes involved, and the read/write bytes and I/Os per second.
  • Disk Breakdown: Disk Totals: This section of the report identifies the specific processes that are causing the most disk I/O on each physical disk.

In this How-To, we will use the SPA tool on Windows Server 2003 to identify a disk subsystem bottleneck, identify which processes are causing the highest disk usage, identify which files are causing the highest disk usage, and determine the data pattern (read/write bytes and I/Os) of the disk usage.

Download

You can download the Microsoft Server Performance Advisor (SPA) from the following location: http://www.microsoft.com/downloads/details.aspx?FamilyID=09115420-8c9d-46b9-a9a5-9bffcd237da2&DisplayLang=en

Summary of Steps

Here is a summary of steps:
  • Step 1. Run and configure the Microsoft Server Performance Advisor (SPA) Tool.
  • Step 2. Collect data.
  • Step 3. Compile the report.
  • Step 4. Analyze the report.

Step 1. Run and Configure the Microsoft Server Performance Advisor (SPA) Tool

In order for the SPA tool to properly diagnose a performance problem, it must collect performance data from the computer when the problem is occurring.
  • Run the SPA Tool - Run the SPA tool by clicking Start, All Programs, and then Server Performance Advisor (at the root of All Programs). The SPA tool start page appears, as shown in the following figure.
SPA1.GIF
If you are not familiar with SPA, then consider taking the quick tour of the SPA tool by clicking “Quick Tour”. Otherwise, continue to the next step.
  • Open the “System Overview” Data Collector Group
The data collected by Server Performance Advisor and the reports it can generate are specified by data collector groups. Data collector groups enable collection of data that is relevant to the server role of the computer, and when you install Server Performance Advisor, it automatically detects the server roles currently configured for the computer. When a role matches a data collector group included with Server Performance Advisor, that data collector group is installed automatically. You can also create your own data collector groups.

In this case, we are only interested in disk subsystem analysis which is provided in the “System Overview” data collector group.

Note: Not all data collector groups analyze data on the disk subsystem.
  • Click View, then Scope Tree. The scope tree will show.
  • Under Local Computer, Data Collectors and Reports, locate the System Overview role.
  • Configure the “System Overview” Data Collector Group
    • Set the report generation to Manual
    SPA’s data collection mechanisms have very low overhead and are designed to be run in production, but report generation (compilation) takes up a lot of resources and should be run on another, non-production server. By default, SPA automatically generates the report after the collection period. Therefore, we will set the report generation to manual so that we can compile the report on another server.
    • In the scope tree, right-click the “System Overview” role, and then click Properties. The System Overview Properties window appears.
    • Set Generate to Manual.
    SPA2.GIF
    • Set the Data Collection Duration - Click the Schedule tab and set the Duration to the desired collection period in seconds. Keep in mind that SPA gathers a large amount of data quickly, so keep the collection duration as short as possible.
    SPA3.GIF
    • Click OK on the System Overview Properties window.
    • Set the Disk Utilization Thresholds
    Prior to using SPA v2.0 for disk analysis, it is necessary to set the disk utilization thresholds according to the I/Os per second that your physical disks are expected to sustain. The following steps show how to set the disk utilization thresholds:
    • Click Edit, then select Rules.
    • Locate the Disk Utilization thresholds and set them to the performance specifications of your locally attached physical disks.
    • Scroll to the bottom and click Apply. This will persist the new threshold settings. Keep in mind, this change affects all of the data collectors in SPA.
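If the performance specification for a disk does not state an I/Os-per-second figure, a rough estimate for a single rotating disk can be derived from its average seek time and rotational speed. The following is a minimal sketch of that rule of thumb, assuming small random I/O with no caching and no RAID. The function name and the sample drive figures are illustrative only and are not part of SPA.

```python
def estimate_random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rule-of-thumb random I/Os per second for one rotating disk."""
    # On average the platter turns half a revolution before the data arrives.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    # Time to service one small random request, ignoring transfer time.
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Hypothetical drive figures, used only to show the calculation.
    for label, seek_ms, rpm in [("7,200 RPM", 8.5, 7200), ("15,000 RPM", 3.5, 15000)]:
        print(f"{label}: about {estimate_random_iops(seek_ms, rpm):.0f} I/Os per second")
```

An estimate like this is only a starting point. Measured figures for your own disks, where available, are a better basis for the thresholds.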

Step 2. Collect Data

    SPA’s analysis will focus only on the data collected during the collection period. Therefore, it is paramount to choose the appropriate collection duration and appropriate time. Ideally, you want to run SPA just prior to the performance problem and stop just after the performance problem is gone.

    Start collecting data just before or during the period of high disk activity:
    • Start the System Overview Data Collector Group
      • Select the System Overview data collector group, and then click the green play button. Alternatively, you can click Record.
    • Wait for SPA to automatically stop
      • The SPA tool will automatically stop collecting data when the elapsed time equals the Duration setting of the data collector.

    Normally, SPA would compile the report after the collection period, but we set it to manual compilation.

Step 3. Compile the Report

    After SPA has finished collecting data, the data must be manually compiled.

    The following steps show how to compile the report on another server:
    • Move or copy the data to another server with SPA installed.
      • Copy the contents of the SPA Data directory to the corresponding Data directory on another computer with SPA installed. For example, if both servers use default installations, copy the data from “C:\PerfLogs\Data” to “C:\PerfLogs\Data” on the server where you intend to compile the report.
    • At a command prompt on the server where you want to compile the report, change directory to the SPA installation directory, then type:

    spacmd compile "System Overview"

    Compilation of the report can take a long time depending on how long your data collection period was. In addition, compiling a report can take up a large amount of resources.
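    The copy step above can also be scripted if you move data between servers regularly. The following is a minimal sketch, assuming Python 3.8 or later is available on the machine doing the copy. The function name and the destination share in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def copy_spa_data(source_dir: str, target_dir: str) -> int:
    """Copy a collected data directory tree.

    Returns the number of files present under the target afterwards.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Data directory not found: {source}")
    # dirs_exist_ok merges into an existing Data directory (Python 3.8+).
    shutil.copytree(source, target, dirs_exist_ok=True)
    return sum(1 for path in target.rglob("*") if path.is_file())


# Example call (the destination share name is a placeholder):
# copy_spa_data(r"C:\PerfLogs\Data", r"\\analysis-server\PerfLogs\Data")
```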

Step 4. Analyze the Report

    Once the report is generated, we need to review the report to see what is causing our disk bottleneck.
    • Locating the report: To review the report, open the SPA application and click the icon with the red clock to see a list of the reports that the server has generated.
    SPA4.GIF
    SPA5.GIF
    The reports are listed by computer, year, month, day, and time corresponding to when the data was collected. Select the report that corresponds with when the performance problem occurred.

    Note: The symbols in the Status column are modeled on weather forecasts. For example, a cloudy symbol represents a server under distress, while a sunny symbol represents a relatively idle server.
    • Overview of the Report: After you select the report, it is displayed. In this example, the Summary section of the report shows that cidaemon.exe is taking up 11% CPU and that a file in the catalog.wci directory is using the most disk I/O.
    SPA6.GIF
    The SPA tool analyzes the performance of the system. If it has significant findings, it shows its recommendations in the Performance Advice section. That section appears only when there are significant findings.

    Next, the System Health section gives an overview of the health of the four subsystems of the computer.
    SPA7.GIF
    As you can see here in the System Health section, SPA has detected a disk performance bottleneck. Normally, 78 I/Os per second is not considered to be high usage for a fast, locally attached hard disk. In this case, we ran our tests on a slow, externally attached hard disk and adjusted SPA’s thresholds accordingly.
    • Analyzing the Disk SubSystem Performance: In this section we will look at more details of the disk response times and discover which processes and files are involved.
      • Analyzing Disk Response Times: To determine whether the disk subsystem is responding poorly, we need to look at the response times of the disks. These details are available in the System Monitor view of the report. Click the System Monitor icon at the top of the report.
    SPA8.GIF
    • Clear the existing counters by clicking the “New counter set” button in the upper left hand corner.
    • Next, click the “Add” (plus sign) button to add counters.
    • Add all of the instances for the “PhysicalDisk\Avg. Disk sec/Read” and “PhysicalDisk\Avg. Disk sec/Write” counters. These counters report the average time, in seconds, that the disk took to respond to read and write requests.
    • The System Monitor will show the counter values. We are looking for times when the response times were greater than 15 ms (0.015 seconds). 15 ms is certainly not a hard threshold for deciding whether a disk is slow, but it is a useful guideline. For example, some consider 10 ms or even 20 ms to be the deciding point.

    In the chart below, all values above the black line (15 ms) indicate a long response time and a likely bottleneck.
    SPA9.GIF
    • Based on this data, we can conclude that the C: drive (thin red and thin green lines) shows significant disk latency and is a performance bottleneck on the system.
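    The same check can be made numerically if you have the counter samples as plain numbers. The following is a minimal sketch, assuming the values are already available as a list of seconds; the function name and the sample values are made up for illustration. The 15 ms default mirrors the guideline above and can be changed to 10 ms or 20 ms.

```python
from typing import Sequence


def summarize_latency(samples_sec: Sequence[float], threshold_sec: float = 0.015) -> dict:
    """Summarize how often disk response times exceed a threshold."""
    if not samples_sec:
        raise ValueError("no samples provided")
    # Keep only the samples that are slower than the threshold.
    over = [s for s in samples_sec if s > threshold_sec]
    return {
        "samples": len(samples_sec),
        "over_threshold": len(over),
        "percent_over": 100.0 * len(over) / len(samples_sec),
        "worst_ms": max(samples_sec) * 1000.0,
    }


if __name__ == "__main__":
    # Made-up Avg. Disk sec/Read samples, in seconds.
    reads = [0.004, 0.012, 0.031, 0.048, 0.009, 0.022]
    print(summarize_latency(reads))
```

    A disk that exceeds the threshold only in a few isolated samples is less of a concern than one that stays above it for most of the collection period.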
    • Identify the files and processes consuming the most disk I/O: Now that we have identified a disk bottleneck, let’s see which processes and files are involved with the bottleneck.
      • Navigate to the Disk, Disk Breakdown, Disk Totals section.
      SPA10.GIF
      In this section, we see a breakdown of each of the physical disks on the system and the processes that are most active on each disk. In this case, we see cisvc.exe (Indexing Service) consuming the most I/O on physical disk 0 (C: drive).
      SPA11.GIF
      • Next, navigate to the Disk, Hot Files, “Files Causing Most Disk IOs” section.
      SPA12.GIF
      In this section, we see a breakdown of the files consuming the most disk I/O. Each breakdown shows the processes involved with that file and its data patterns (Read/sec, Kb/Read, Writes/sec, and Kb/Write). In this case, the Indexing Service’s catalog files are causing the most I/O on the disk.

      Note: The Summary Section at the beginning of the report shows the file taking up the most I/O.
      SPA13.GIF
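      The four data-pattern columns for a hot file can be combined into a simple profile that shows total I/Os per second, throughput, and the share of writes. The following is a minimal sketch; the function name and the sample figures are illustrative and are not taken from the report shown here.

```python
def io_profile(reads_per_sec: float, kb_per_read: float,
               writes_per_sec: float, kb_per_write: float) -> dict:
    """Combine per-file read/write rates and sizes into a simple profile."""
    total_ios = reads_per_sec + writes_per_sec
    return {
        "ios_per_sec": total_ios,
        # Throughput is the request rate multiplied by the average request size.
        "read_kb_per_sec": reads_per_sec * kb_per_read,
        "write_kb_per_sec": writes_per_sec * kb_per_write,
        # Share of requests that are writes; 0 if the file saw no I/O.
        "write_percent": 100.0 * writes_per_sec / total_ios if total_ios else 0.0,
    }


if __name__ == "__main__":
    # Made-up figures for one file.
    print(io_profile(reads_per_sec=20, kb_per_read=8, writes_per_sec=60, kb_per_write=4))
```

      A high write percentage with small request sizes suggests a write-heavy, random pattern, which matters when choosing a RAID type.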
      • Next Steps

The next steps are to first try to make the process that is taking up the most disk I/O more efficient. After the process is as efficient as possible, consider additional hardware to make the physical disk faster for this kind of disk I/O. For example, if high write I/O is the problem, consider RAID 0+1, because RAID 5 typically needs four physical disk operations for each small write. For more information on RAID type considerations, see “RAID Type Considerations” below.
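The 4-to-1 figure for RAID 5 refers to the physical disk operations needed for each small logical write (read data, read parity, write data, write parity), while mirrored layouts such as RAID 0+1 need two. The following is a minimal sketch of how that penalty changes the load on the physical disks. It assumes small random writes and ignores controller caching and full-stripe writes; the names and sample figures are illustrative.

```python
# Physical disk operations per logical write, for small random writes.
WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID0+1": 2, "RAID5": 4}


def backend_iops(reads_per_sec: float, writes_per_sec: float, raid_level: str) -> float:
    """Estimate physical I/Os per second generated by a logical workload."""
    # Reads cost one physical operation; writes cost the RAID write penalty.
    return reads_per_sec + writes_per_sec * WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    # Made-up write-heavy workload: 20 reads/sec and 60 writes/sec.
    for level in ("RAID0+1", "RAID5"):
        print(f"{level}: {backend_iops(20, 60, level):.0f} physical I/Os per second")
```

For this sample workload, RAID 5 produces 260 physical I/Os per second against 140 for RAID 0+1, which is why write-heavy patterns tend to favor mirrored layouts.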

Disk optimization is a large subject on its own and is beyond the scope of this document. In this case, Index Server was misconfigured to index its own catalogs, so changing its catalog settings would make it more efficient.

Conclusion

The Microsoft Server Performance Advisor (SPA) tool is very good at showing which files and processes are causing the most disk I/O.

Last edited Feb 10, 2007 at 3:15 AM by mycodeplexuser, version 6

Comments

ArntK Jan 20, 2011 at 2:27 PM 
Can I run SPA on a Windows 7 PC or at least view the reports by copying them to the Windows 7 PC?