Windows Download Testing (3/2003) - DRAFT

Document revision: 0.2 - 03/19/2003
http://yendi.cc.cmu.edu/work/win/dl-03-2003.html

1.0 Introduction

This document describes the testing of Windows cluster downloads. The purpose of this testing is to determine the best way to ensure that we can download an entire room within a 6-hour period.

We will defer testing AFS as a platform for delivering MSIs due to time constraints.

2.0 Components

These are the major subsystems in the download process:
  1. PXE Boot - This component listens for boot requests and sends an OS image down to the client machine.

  2. MSI server - This share provides access to the MSIs which are installed via GPO on the client machine.

3.0 Variables

These are the variables that we will change to determine which option performs best.
  1. Run the PXE boot server and MSI server on different machines.

  2. Use a faster computer to perform all tasks. The current system is a 933 MHz Dell PE 2450 with 4x18GB 10,000rpm disks and the PERC2 RAID subsystem. The faster system is a 2.8 GHz Xeon Dell PE 2650 with 4x73GB 10,000rpm disks and the PERC3/DI RAID subsystem.

  3. Use a single compressed file MSI.

  4. Change the version and configuration of Norton Anti-Virus (NAV) on the server.

4.0 Constants

We will keep the following items constant:
  1. Client machines - We will use the Cyert Cluster, which has a set of 25 identically configured client machines.

  2. Network - Both servers are uplinked via Gigabit Ethernet. The network configuration to the client machines will not be changed. The testing takes place on a 'live' network, which may introduce some variance. However, we will watch the network graphs to help ensure that other traffic is not skewing the results.

  3. Actual software - We will install the same software images for all the testing.

  4. Server OS - Servers will be running Windows 2000 Server Service Pack 3.

It would also be useful to add a section on the theoretical maximums based on maximum network and disk bandwidth.
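As a starting point for that section, here is a rough sketch of how a lower bound on download time could be computed. All input values are placeholders and not measurements: the per-client payload size and the sustained disk rate in particular still need to be determined. The calculation ignores protocol overhead and any server-side file caching, so it is only a floor, not a prediction.

```python
def min_download_hours(clients, gb_per_client, network_mbit_s, disk_mb_s):
    """Lower bound on wall-clock hours to serve every client from one server."""
    total_mb = clients * gb_per_client * 1024        # total payload in MB
    network_mb_s = network_mbit_s / 8                # megabits/s -> megabytes/s
    bottleneck_mb_s = min(network_mb_s, disk_mb_s)   # slowest resource sets the bound
    return total_mb / bottleneck_mb_s / 3600


# Hypothetical inputs: 25 clients (the Cyert Cluster), 4 GB per client,
# a 1000 Mbit/s server uplink, and 40 MB/s sustained from the RAID set.
# The 4 GB and 40 MB/s figures are assumptions for illustration only.
print(round(min_download_hours(25, 4, 1000, 40), 2), "hours")
```

If the measured times are far above this bound, the bottleneck is probably somewhere other than raw network or disk bandwidth.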

5.0 Experiments

The following experiments will be performed. The method of testing will be to boot all machines simultaneously and time how long it takes for the machines to complete the download.

During this time, network and machine performance statistics should be recorded.
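One possible way to reduce each run to comparable numbers is sketched below. It assumes we note the time the room was booted and the time each client finished. The client names and timestamps are made up for illustration, and the 6-hour goal is taken from the introduction.

```python
from datetime import datetime, timedelta

GOAL = timedelta(hours=6)  # target from section 1.0


def summarize(start, finished):
    """Return (slowest client, room elapsed time, goal met?) for one run.

    start    -- datetime when all machines were booted
    finished -- dict mapping client name to its completion datetime
    """
    elapsed = {name: done - start for name, done in finished.items()}
    slowest = max(elapsed, key=elapsed.get)
    room_time = elapsed[slowest]  # the room is done when the last client is done
    return slowest, room_time, room_time <= GOAL


# Hypothetical client names and timestamps, for illustration only.
start = datetime(2003, 3, 20, 9, 0)
finished = {
    "client01": datetime(2003, 3, 20, 13, 30),
    "client02": datetime(2003, 3, 20, 15, 45),
}
print(summarize(start, finished))
```

Recording the slowest client as well as the total makes it easier to spot a single misbehaving machine that would otherwise distort the comparison between experiments.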

5.1 Baseline

The first experiment to run is a baseline timing. We will boot all the machines and do the download from the current system. We will record the time it takes to perform this operation.

5.1.1 Hypothesis

We expect this to take a significant amount of time, likely greater than 12 hours.

5.1.2 Materials/Setup

None are needed. Everything should already be set up.

5.1.3 Results

tbd

5.1.4 Analysis

tbd

5.2 Faster CPU

5.2.1 Hypothesis

The current machine, DIST, is too slow to keep up with the clients. The current machine is a 933 MHz Dell PowerEdge 2450 with 4x18GB 10,000rpm drives.

We have a new server that is a 2.8 GHz Xeon Dell PowerEdge 2650 with 4x73GB 10,000rpm drives.

While the disks have the same rotation speed, the 73GB disks have a faster data transfer rate.

TODO: verify the RAID subsystems; verify that the transfer rate and seek time are faster.

We expect things to be faster, but probably no more than 2x faster. We expect to run into issues with network bandwidth or possibly the maximum speed of the RAID controller.

5.2.2 Materials/Setup

  1. Install Windows 2000 SP3 on DIST2.
  2. Copy the existing MSI images over from DIST.
  3. Set up another OU and have new GPOs that point to DIST2 and not DIST.
  4. Work with NG to get clients to use DIST2 instead of DIST.

5.2.3 Results

tbd

5.2.4 Analysis

tbd

5.3 Single MSI

5.3.1 Hypothesis

Currently, the MSIs on the MSI server are stored as multiple files rather than as a single compressed file. By having a single file, we expect significant performance improvements, as this reduces the amount of network traffic required, requires less disk seeking on the MSI server, and reduces the number of requests sent to the server.

We expect the resulting time to be faster than that achieved in experiment 5.2.
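To put a number on "multiple files", we could count the files and bytes under each MSI directory before and after repackaging. The sketch below is one way to do that. The file count is only a rough proxy for the number of requests a client makes, and the demo runs against a throwaway directory because the real share path is not given here.

```python
import os
import tempfile


def share_stats(root):
    """Return (file_count, total_bytes) for everything under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


# Demo on a temporary directory; on the server, pass the MSI share path instead.
with tempfile.TemporaryDirectory() as demo:
    for i in range(3):
        with open(os.path.join(demo, f"file{i}.dat"), "wb") as fh:
            fh.write(b"x" * 100)
    demo_stats = share_stats(demo)
print(demo_stats)
```

Comparing the byte totals of the two layouts would also show how much the compression itself saves on the wire.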

5.3.2 Materials/Setup

The server used in this test will be DIST2. However the following additional work will need to be done:
  1. Existing MSIs will need to be recompiled into a single file.
  2. A new OU and GPOs will need to be created to point to the new MSI.

5.3.3 Results

tbd

5.3.4 Analysis

This solution does impose a change in processes. One reason the MSIs are stored as multiple files is that it is easier to replace individual files as necessary with this setup.

more tbd

5.4 Separate PXE BOOT and MSI server

5.4.1 Hypothesis

Separating the two services will provide better performance by spreading the load between two machines.

However, we do not expect this to provide much benefit if all the machines are booted at once: all the machines will wait on the boot server at first, and then all the machines will wait on the MSI server.

Possible variant: boot half the machines, wait for them to finish the initial transfer and start loading MSIs, then boot the rest.
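The reasoning here can be illustrated with a very simple model. It assumes each server's throughput is shared evenly, so serving k clients takes k times as long as serving one, and that each server works on one batch at a time. The per-client times are invented for illustration, and an even client count is used to keep the halves equal.

```python
def all_at_once(n, boot_s, msi_s):
    """Every client boots together, then every client pulls MSIs together."""
    return n * boot_s + n * msi_s


def two_batches(n, boot_s, msi_s):
    """Boot half; boot the second half once the first half moves on to MSIs."""
    half = n / 2
    first_boot_done = half * boot_s
    second_boot_done = first_boot_done + half * boot_s
    first_msi_done = first_boot_done + half * msi_s
    # The second batch needs both its own boot finished and a free MSI server.
    second_msi_start = max(second_boot_done, first_msi_done)
    return second_msi_start + half * msi_s


# Hypothetical figures: 24 clients, 60 s of boot-image transfer and 600 s of
# MSI transfer per client when a server is serving that client alone.
print(all_at_once(24, 60, 600), two_batches(24, 60, 600))
```

Under these assumptions the staggered variant only saves about the boot time of one batch, so the gain is small whenever the MSI phase dominates. The actual test should tell us whether the model is too pessimistic.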

5.4.2 Materials/Setup

  1. Assuming that 5.2.2 has already been performed, most of the prep and setup has been done.

5.4.3 Results

tbd

5.4.4 Analysis

tbd

5.5 Build more complete RIS images

5.5.1 Hypothesis

Downloads will be much faster if most of the MSI images are already installed in the initial image that is downloaded to the client.

This is likely the case since there is less work the client needs to do. This will likely end up being bounded by the speed of the network or the speed of the server.

5.5.2 Materials/Setup

  1. Create the SYSPREP/RIPREP images of a cluster machine

5.5.3 Results

tbd

5.5.4 Analysis

The problem with this approach is that it is even harder than in 5.3 to change the images once they are built. It also takes advantage of the fact that the machines in the clusters have a fairly uniform software collection.

more tbd, esp. on processes

5.6 Anti-Virus

5.6.1 Hypothesis

The current version of Norton Anti-Virus (NAV) on DIST appears to cause a significant performance bottleneck. Will a faster machine and/or the newer version of NAV fix this problem?

5.6.2 Materials/Setup

While we could, for the sake of completeness, repeat all the previous experiments with and without both the current NAV and the newer NAV, we likely do not want to spend the time doing so.

The purpose of this test is to determine whether the new NAV and the faster hardware still pose a significant bottleneck to the best solution.

As such, I propose we only test the following:

We should also determine whether or not NAV can scan a compressed/single file MSI.

Part of the NAV testing should also determine whether we have the configuration right. I would presume that configuring NAV to scan only on WRITE should be sufficient.

5.6.3 Results

tbd

5.6.4 Analysis

tbd

ChangeLog

0.2  - 03/19/2003 - added a couple of missed things
0.1  - 03/18/2003 - Initial version