Windows Download Testing (3/2003) - DRAFT
Document revision: 0.2 - 03/19/2003
http://yendi.cc.cmu.edu/work/win/dl-03-2003.html
1.0 Introduction
This document describes the testing of Windows cluster downloads. The
purpose of this testing is to help us determine the best way to
ensure that we can download an entire room within a 6 hour period.
We will defer testing AFS as a platform for delivering MSIs due to
time constraints.
2.0 Components
These are the major subsystems in the download process:
- PXE Boot - This component listens to the boot requests and
sends down an OS image to the client machine
- MSI server - This share provides access to the MSIs which
are installed via GPO on the client machine.
3.0 Variables
These are the variables that we will change in order to help us
determine which is the best performing option.
- Run the PXE boot server and MSI server on different machines.
- Use a faster computer to perform all tasks. The current system
is a 933MHz Dell PE 2450 with 4x18GB 10,000rpm disks and the PERC2
RAID subsystem. The faster system is a 2.8GHz Xeon Dell PE 2650 with
4x73GB 10,000rpm disks and the PERC3/DI RAID subsystem.
- Use a single compressed file MSI.
- Anti-Virus
4.0 Constants
We will keep the following items constant:
- Client machines - We will use the Cyert Cluster which has a set
of 25 identically configured client machines.
- Network - Both servers are uplinked via Gigabit Ethernet. The
network configuration to the client machines will not be changed. The
testing takes place on a 'live' network, which may introduce some
variance. However, we will keep an eye on the network graphs to help
ensure that unrelated traffic is not skewing the results.
- Actual software - we will install the same software images for
all the testing.
- Server OS - Servers will be running Windows 2000 Server with Service
Pack 3.
It would also be useful to add an estimate of the theoretical
maximums based on maximum network and disk bandwidth.
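A rough version of that estimate can be sketched as below. This is a back-of-the-envelope model, not a measurement. Only the client count (25) and the Gigabit uplink come from this document; the per-client payload size, the sustained disk rate, and the function name are hypothetical placeholders that would need to be replaced with measured values.

```python
def min_download_hours(clients, payload_gb_per_client,
                       net_mbit_per_s, disk_mbyte_per_s):
    """Lower bound, in hours, on the time to serve every client.

    Assumes all clients pull their full payload from one server and that
    the slower of the server uplink and the server disks is the only limit.
    """
    total_mbytes = clients * payload_gb_per_client * 1000.0
    net_mbyte_per_s = net_mbit_per_s / 8.0  # megabits -> megabytes
    bottleneck = min(net_mbyte_per_s, disk_mbyte_per_s)
    return total_mbytes / bottleneck / 3600.0


# 25 clients and a Gigabit uplink are from this document.
# The 4 GB payload and 40 MB/s disk rate are made-up placeholders.
hours = min_download_hours(25, 4.0, 1000.0, 40.0)
print(f"theoretical minimum: {hours:.2f} hours")
```

Any measured run that comes close to this bound is limited by hardware rather than by configuration, which helps decide whether the remaining experiments are worth running.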
5.0 Experiments
The following experiments will be performed. The method of testing
will be to boot all machines simultaneously and time how long it
takes for the machines to complete the download.
During this time, network and machine performance statistics should
be recorded.
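One way to turn the recorded times into a single result per experiment is sketched below. It assumes the boot time and each machine's completion time are noted by hand or pulled from logs; the machine names and timestamps are invented sample data, and the helper name is hypothetical. The 6 hour target is the one stated in the introduction.

```python
from datetime import datetime

TARGET_HOURS = 6.0  # room-wide goal from section 1.0


def elapsed_hours(start, finish_times):
    """Hours from the simultaneous boot until the last machine finishes."""
    last = max(finish_times.values())
    return (last - start).total_seconds() / 3600.0


# Hypothetical sample data: names and timestamps are invented.
start = datetime(2003, 3, 20, 9, 0)
finished = {
    "client01": datetime(2003, 3, 20, 16, 30),
    "client02": datetime(2003, 3, 20, 17, 15),
}
hours = elapsed_hours(start, finished)
print(f"{hours:.2f} hours, target met: {hours <= TARGET_HOURS}")
```

Using the slowest machine as the result matches the goal of downloading an entire room, since the room is not done until the last client is.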
5.1 Baseline
The first experiment to run is a baseline timing. We will boot all the
machines and do the download from the current system. We will record
the time it takes to perform this operation.
5.1.1 Hypothesis
We expect this to take a significant amount of time, likely greater
than 12 hours.
5.1.2 Materials/Setup
None are needed. Everything should be already set up.
5.1.3 Results
tbd
5.1.4 Analysis
tbd
5.2 Faster CPU
5.2.1 Hypothesis
The current machine, DIST, is too slow to keep up with the
clients. The current machine is a 933MHz Dell PowerEdge 2450 with
4x18GB 10,000rpm drives.
We have a new server that is a 2.8GHz Xeon Dell PowerEdge 2650
with 4x73GB 10,000rpm drives.
While the disks have the same rotation speed, the 73GB disks have a
faster data transfer rate.
TODO: verify the RAID subsystems; verify that the transfer
rate and seek time are faster.
We expect things to be faster but probably no more than 2x faster. We
expect to run into issues with network bandwidth or possibly the
maximum speed of the RAID controller.
5.2.2 Materials/Setup
- Install DIST2 with win2k sp3.
- Copy existing MSI images over from DIST
- Set up another OU and have new GPOs that point to
DIST2 and not DIST.
- Work with NG to get clients to use DIST2 instead of DIST.
5.2.3 Results
tbd
5.2.4 Analysis
tbd
5.3 Single MSI
5.3.1 Hypothesis
Right now the MSIs on the MSI server are stored as multiple files
rather than a single compressed file. By having a single file, we
expect significant performance improvements as this reduces the amount
of network traffic required, requires less disk seeking on the MSI
server, and reduces the number of requests sent to the server.
We expect the resulting time to be faster than that achieved in
experiment 5.2.
5.3.2 Materials/Setup
The server used in this test will be DIST2. However the following
additional work will need to be done:
- Existing MSIs will need to be recompiled into a single file.
- A new OU and GPOs will need to be created to point to the new MSI.
5.3.3 Results
tbd
5.3.4 Analysis
This solution does impose a change in processes. A reason why the
MSIs are stored as multiple files is that it is easier to
replace files as necessary with this setup.
more tbd
5.4 Separate PXE BOOT and MSI server
5.4.1 Hypothesis
Separating the two services will provide better performance by
spreading the load between two machines.
We do not expect that this will provide much benefit if all the
machines are booted at once. What will occur is that all the machines
will be waiting for the boot server at first and then all the machines
will be waiting for the MSI server.
Possible variant: Boot half the machines. Wait for them to
finish the initial transfer and start loading MSIs. Boot the rest.
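The reasoning above can be illustrated with a toy model. It is only a sketch under strong assumptions: each server shares its throughput evenly among the clients currently using it, server throughput is the only limit, and the work amounts and rates are arbitrary made-up units rather than measurements. The function names are hypothetical.

```python
def simultaneous_time(n, boot, msi, boot_rate, msi_rate):
    """All n machines boot at once: the two phases run back to back."""
    return n * boot / boot_rate + n * msi / msi_rate


def staggered_time(n_first, n_second, boot, msi, boot_rate, msi_rate):
    """Boot the first group, then boot the second group while the first
    group is already loading MSIs from the separate MSI server."""
    t_boot1 = n_first * boot / boot_rate
    t_boot2 = n_second * boot / boot_rate
    # Case 1: the MSI server stays busy from the moment group one arrives.
    all_msi = (n_first + n_second) * msi / msi_rate
    # Case 2: group one finishes early and the MSI server idles until
    # group two has booted.
    second_alone = t_boot2 + n_second * msi / msi_rate
    return t_boot1 + max(all_msi, second_alone)


# 25 clients split 12/13; boot=1, msi=3 work units per client and both
# rates of 1 unit per time step are placeholders, not measured values.
together = simultaneous_time(25, 1.0, 3.0, 1.0, 1.0)
staggered = staggered_time(12, 13, 1.0, 3.0, 1.0, 1.0)
print(together, staggered)
```

In this model, splitting the services gives no gain when every machine boots at once, because the clients queue on one server and then on the other. Staggering lets the second group's boot transfer overlap with the first group's MSI installs, so the saving is at most the boot time of one group.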
5.4.2 Materials/Setup
- Assuming that 5.2.2 has already been performed, most of the prep
and setup has been done.
5.4.3 Results
tbd
5.4.4 Analysis
tbd
5.5 Build more complete RIS images
5.5.1 Hypothesis
Downloads will be much faster if most of the MSI images are already
installed in the initial image that is downloaded to the client.
This is likely the case since there is less work the client needs
to do. This will likely end up being bounded by the speed of the
network or the speed of the server.
5.5.2 Materials/Setup
- Create the SYSPREP/RIPREP images of a cluster machine
5.5.3 Results
tbd
5.5.4 Analysis
The problem with this is that it is even harder than 5.3 to change
the images once they get built. This also takes advantage of the fact
that the machines in the clusters have a fairly uniform software
collection.
more tbd, esp. on processes
5.6 Anti-Virus
5.6.1 Hypothesis
The current version of Norton Anti-Virus on DIST appears to result
in a significant performance bottleneck. Will a faster machine
and/or the newer version of NAV fix this problem?
5.6.2 Materials/Setup
While we could, for the sake of completeness, test all the previous
experiments with and without both the current NAV and the newer
NAV, it is unlikely to be worth the time.
The purpose of this test is to determine whether the new NAV on the
faster hardware still poses a significant bottleneck to the best
solution.
As such, I propose we only test the following:
We should also determine whether or not NAV can scan a
compressed/single file MSI.
Part of the NAV testing should also determine if we have the
configuration right. I would presume that if we configured NAV to only
scan on WRITE then that should be sufficient.
5.6.3 Results
tbd
5.6.4 Analysis
tbd
ChangeLog
0.2 - 03/19/2003 - added a couple of missed things
0.1 - 03/18/2003 - Initial version