Document revision: 0.5 - 02/24/2003
http://www.andrew.cmu.edu/~wcw/work/servers/2003-q1/
This document provides a summary of the current servers and services managed by Infrastructure and Middleware Services (ISAM) of Computing Services. We will discuss hardware, software, and any comments about the future of the services.
In addition, we provide two sections on the physical and network environments of the machine room from the ISAM perspective.
This document does not discuss development or other non-production hardware.
The Solaris operating system has many redeeming features; however, the low-end SPARC equipment lacks the CPU performance or the features (such as onboard gigabit Ethernet) that are commonly found on Intel systems.
Running Solaris on Intel is not an overly attractive option as we have concerns about hardware compatibility.
FreeBSD is another option that may be considered.
Consideration of support for Solaris/Intel and FreeBSD would likely be driven by whether or not SPARC remains a viable platform. We might also treat these new platforms as "server only" platforms.
For budgetary reasons, we have been limiting and delaying the purchase of new SPARCs in the hope that Sun will come out with something better.
We would like to limit the age of servers to 4 years. The failure rate increases significantly, especially with hard disk drives, beyond this time. Also, based on current trends, four year old hardware can be more than four times slower than new systems.
Unfortunately, we do not have the budget to do this. As such, when newer hardware is obtained and replaces hardware less than 5 years old, we reallocate the older hardware to replace even older hardware.
This "trickle down" strategy is the most cost-efficient in terms of raw dollars, since the older hardware usually does not have any significant resale value. The cost of this strategy is that it tends to consume more staff time.
Here are some general principles we believe in:
Also, many of the SAN characteristics -- being able to easily add space to a service, being able to move data from one server to another -- are implemented in the software that we use (AFS, Cyrus Murder).
However, iSCSI promises to at least eliminate the cost of the second network. If pricing does drop and the model enables a distributed storage model, should we start taking advantage of this technology?
The IMAP server software is Cyrus Murder. Details on the system can be found at http://asg.web.cmu.edu/cyrus/ag.html.
The MTA software is Sendmail.
Spam filtering is provided by SpamAssassin (http://www.spamassassin.org) and SIEVE.
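As an illustration of how the two interact, a user-level Sieve script can file messages that SpamAssassin has tagged. This is a minimal sketch; the header name and folder name are assumptions about a typical SpamAssassin configuration, not our actual setup:

```sieve
require ["fileinto"];

# File anything SpamAssassin has flagged into a "spam" folder.
# "X-Spam-Flag" and the folder name are illustrative assumptions.
if header :contains "X-Spam-Flag" "YES" {
    fileinto "spam";
}
```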
Webmail is provided via SquirrelMail (http://www.squirrelmail.org).
IMSP service is provided for IMSP-aware clients. IMSP allows users' address books and options to be available from any machine. The IMSP software is homegrown.
We are considering moving to the commercial IMSP server from Cyrusoft. No other changes in core software are expected.
Mailing list software via Majordomo.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Front-End | Solaris | Ultra80. 2GB memory | |||
Back-End | Solaris | E220R. 2GB memory. qty 2 - 4x36GB RAID5 | |||
IMSP | Solaris | Ultra2170. 1GB memory | |||
Webmail | Linux | Dell GX250 | |||
Mupdate | Linux | Dell PowerEdge 2450 server | |||
MX | Linux | Dell PowerEdge 2450 server | |||
SMTP Submission | Linux | Dell PowerEdge 2450 server | |||
mailing lists | Linux | Dell PowerEdge 2650; 2x73GB mirror; 1GB memory | |||
One back-end server provides only bboard and netnews.
Three back-ends currently provide 219GB of usable space for user mail storage.
SMTP and MX servers will likely need to be refreshed no later than FY2005/2006 time frame. The addition of virus scanning software may accelerate the refresh as it would put additional CPU demands on these machines.
We plan to increase the number of front-end servers. A cheaper workaround may be to simply increase the memory. This should be done in early FY2004.
Disk utilization is around 65%. We plan to increase quotas so an additional back-end server will likely need to be purchased in early FY2004.
Webmail utilization appears to be lower than expected. If there is a spike, additional hardware would be required. However, the hardware being used for webmail is relatively inexpensive and adding additional webmail capacity is fairly straightforward. There are also PHP accelerators that may be purchased that could improve performance.
Mailing list server was just upgraded.
The following items are ideas that still need to be explored:
Calendar service is provided by Oracle/Steltor Corptime. A web interface is also available.
There are some concerns with the Oracle acquisition of Steltor and whether or not the direction that Oracle wishes to take will fit with our direction. However, at this point, there do not appear to be any immediately viable alternatives.
Event calendaring is something we know we want to do and are currently working on requirements. Event calendaring is being bundled with the Portal project and so future discussions on this topic may be under that umbrella.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Calendar Server | Solaris | Ultra60 2360; 1GB memory; RAID |
The hardware will likely need to be refreshed no later than FY2005. Moving to Linux may be problematic as there is currently no Linux client.
Utilization has not resulted in performance problems.
We need better collection of performance data to track usage and properly estimate when additional upgrades are needed.
At this point, there are no plans to replace Oracle/Steltor with a different product.
Apache 1.3.x is the current core web server.
Web publishing on www.cmu.edu is done with custom software. Publishing to www.andrew.cmu.edu is done with different custom software. There has been some consideration of moving www.andrew publishing to the same software as www.cmu, but there are still details to work out.
Web authentication uses WebISO.
A web proxy service (to allow access to IP restricted web pages) was written in house.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
www.cmu.edu | Solaris | E220R | |||
www.andrew.cmu.edu | Solaris | E220R | |||
publishing servers | Solaris | E220R | |||
cgi.andrew.cmu.edu | Solaris | Ultra 1 170 | |||
webiso.andrew.cmu.edu | Linux | PowerEdge 2450 | |||
web proxy | Linux | GX260 |
Load on the web servers has generally not exceeded capacity. Occasional load spikes have been due to poorly written CGIs or people abusing CGI.
The primary web servers will likely need to be refreshed in the FY2005/FY2006 time frame.
The CGI server is in most need of immediate replacement.
The CGI service is also trickier, and additional hardware may be required, as CGIs may run as different entities or have different security characteristics. For example, a password change CGI (not yet deployed) should likely run on its own machine. CGIs that hold passwords to access back-end services (i.e. sieve) should be separated from other CGIs that do not need this authentication or that need some other authentication.
Note that the web publishing system uses AFS to store and manage the data. The data is also duplicated at least 2x due to hooks for revision control. As such, any significant increases in data will also require AFS capacity to be increased.
If we do not change the publishing system for www.andrew.cmu.edu we should move the data to a RAID unit. While the data is mirrored in AFS space, regenerating an exact copy after a disk failure may not be possible and "bad data" could get out. For example, a user may have updated their AFS web space but was not ready to actually publish. If we force publish the data to recover from a disk failure then we've changed their web state and possibly replaced "good" pages with "bad."
Switching to Linux should be a relatively straightforward option if desired.
The immediate future is tied in with the portal. Some of the hardware refresh may also not be necessary as the portal takes on the role of primary campus web server.
Supporting departmental or user CGIs, PHP, etc. may require additional hardware resources.
Because HTTP is stateless, there are opportunities for running redundant servers. This option should be considered if better availability is a goal.
Oracle 8 is used in a hot standby fashion. If one server fails then another can be brought up quickly.
We provide two Oracle instances. The first is for our own use. The other is dedicated to Blackboard.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Primary Server | Solaris | E220R. 1GB memory | |||
Backup Server | Solaris | E220R. 1GB memory | |||
Blackboard Servers (qty 2) | Solaris | E220R. 1GB memory |
The current system has 40GB of RAID5 table space and 10GB of log space using 4x18GB disk drives. Utilization is under 40%. A hardware refresh by FY2006 is likely to be needed.
The systems can be expanded by 1GB of physical memory. The RAID unit has 4 empty slots and so another 4 disks of up to 180GB per disk can be added.
No significant changes are currently planned.
There has been some thought to experiment with Oracle and Linux as well as other database systems such as Postgres.
For each supported system type, we provide a set of Unix servers for general Unix usage. Currently, it appears that the bulk of the use is for homework assignments requiring Unix and for email (Pine).
The default pool was recently switched to Linux.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
n/a | Linux | Dell GX260 small form factor. 2ghz+. 1GB memory | |||
n/a | Solaris | Ultra 80, Ultra 30 systems |
Hardware is generally on a 3-4 year upgrade cycle where new systems are added to the pool and the oldest systems are phased out.
Linux systems were recently purchased. Next purchase is likely FY2005 or later.
SPARC systems should likely be refreshed in FY2005 as well. However, given that the default pool has moved to Linux, usage of the SPARC systems has dropped and we may be able to defer purchase until a later date.
Increased utilization may occur if Clusters decides to remove Unix desktops from the clusters and require those who need Unix cycles to connect remotely to a Unix server. This is unlikely to occur at this time.
The current software in use is OpenLDAP. We have been using CVS nightly updates instead of specific releases.
The OpenLDAP software has not been updated for some time and should be synchronized with the current release this quarter.
Various CGIs are being run on the master server (metadir) to avoid having to authenticate. These CGIs should likely be moved off to better partition the services.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Master | Solaris | E220R w/RAID | |||
Replicas | Linux | Dell GX260 small form factor. 2ghz+. 1GB memory |
There are currently no performance issues with the existing hardware.
The directory replicas already use load-balanced DNS via the ldap.andrew.cmu.edu name. However, many clients cache the IP address of a specific machine, so if that machine fails, service becomes unavailable to them. To avoid this problem, one would need to load balance multiple machines behind a single IP address, not just a single name. To do so, we would need to purchase a hardware load balancer; the total cost is around $50K.
Increased load on the directory service is expected when we cut over /etc/passwd lookups to use the directory instead of the actual file. This may require additional servers.
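On the client side, such a cutover is typically a name service switch change. A sketch of the relevant /etc/nsswitch.conf lines, assuming an nss_ldap-style module (the exact module and ordering are assumptions, not our deployed configuration):

```
# /etc/nsswitch.conf (illustrative fragment)
# Consult local files first, then fall back to the LDAP directory.
passwd: files ldap
group:  files ldap
shadow: files ldap
```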
We have discussed the possibility of putting an LDAP interface on administrative queries. For example, an LDAP query could result in a quota query being issued to the Cyrus servers. If this is implemented, additional servers would be required.
There are a number of CGIs being run on the master directory server. It may be worth separating these to avoid running too many services on a single machine.
There are plans to master the directory data in Oracle instead of having the data represented only in the LDAP server's database. This will allow for better consistency and flexibility.
The software we are using for AFS service is OpenAFS 1.2.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Replication | Solaris | Ultra1 2x18GB disks. 320MB memory | |||
User | Linux | Dell PE 2650 4x73GB RAID5. 1GB memory |
User file servers provide the general file service. User home directories, project volumes and other "important" data is stored on this class of servers.
Replication file servers provide software binary images and AFS infrastructure items. Replication is used for availability (if a server goes down, another can be used) and for load balancing (clients can connect to any server, thereby sharing the workload). AFS replication is fairly static: one must issue a command for changes to appear. As such, it is most suitable for providing multiple copies of read-only data.
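The command in question is vos release. An illustrative transcript using the standard OpenAFS vos commands (the volume and server names here are hypothetical):

```sh
# Add a read-only replication site for a volume (names are hypothetical).
vos addsite fs1.andrew.cmu.edu /vicepa sw.common

# Push the current read-write contents to all read-only sites.
# Clients see updated data only after the release completes.
vos release sw.common
```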
The current hardware goal is to finish standardizing on Linux. There is currently an outstanding float request to do this.
Students do not have sufficient central working space for their projects and so must carry around Zip disks, floppy disks, or other media. It would seem a productivity boon to provide them with a significant amount of central file server space. At this point, it would be good if we could give students 1GB quotas. Assuming a student body of 5000, that would require 5TB of space if we do not overallocate (though we usually do overallocate). That is about 20 Linux servers (using 73GB disks), or $120K. Backup costs would be in the neighborhood of another $40K.
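As a back-of-envelope check on these numbers (assuming RAID5 back ends similar to the 4x73GB user file servers above, yielding roughly 3x73 = 219GB usable each):

```shell
#!/bin/sh
# Rough sizing for 1GB/student quotas with no overallocation.
students=5000
quota_gb=1
total_gb=$((students * quota_gb))   # 5000GB, i.e. about 5TB

# Assume a 4x73GB RAID5 server yields ~3x73 = 219GB usable.
per_server_gb=219
servers=$(( (total_gb + per_server_gb - 1) / per_server_gb ))

echo "total=${total_gb}GB servers=${servers}"
```

This comes out in the neighborhood of the 20-server figure cited above; the exact count depends on the number of disks per server and on how much we overallocate.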
Online backup is another possibility. See section 3.11 for details.
Windows file service is not provided as a general service. There are two specific instances of Windows file service: DIST for in-domain (predominantly clusters) and NTFS1 for DSP clients.
Software used is the native file sharing mechanism provided by Windows 2000.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
DIST | Win2k | PE2450 4x36GB RAID5; 1GB memory | |||
NTFS1 | Win2k | PE2650 4x73GB RAID5; 1GB memory
The expansion of Windows file service is likely to depend on the adoption of AFS clients on Windows. If AFS becomes popular, there may be only niche use of Windows file service.
Macintosh file service is similar to Windows File service: there is a download/boot server for clusters and NTFS1 provides AFP service to DSP clients.
An AFS client is available for MacOS X and is in active use in the Clusters.
Cluster download service is provided via the native filesharing from MacOS X.
DSP clients can get AFP service via Windows 2000 Services for Macintosh.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
netboot | MacOS X | Apple Xserve | |||
NTFS1 | Win2k | PE2450 4x73GB RAID5; 1GB memory
Similar to Windows?
Windows 2000 is providing the domain infrastructure.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
AD domain | Win2k | PE2450 2x36GB mirror; 1GB memory | |||
ANDREW AD | Win2k | PE2450 2x36GB mirror; 1GB memory |
The future of the windows infrastructure is still being determined.
This section does not cover the Windows/Macintosh backup service.
AFS backups are done by the Stage software: an internally written AFS backup system.
Amanda backs up the local disks of servers. It is a scheduling system that wraps the standard Unix dump utility. Amanda was developed by the University of Maryland.
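Amanda's per-run work is driven by a disklist file mapping hosts and filesystems to dump types. An illustrative fragment follows; the hostnames and dumptype names are assumptions for the sketch, not our actual configuration:

```
# disklist (illustrative): one line per host/filesystem to back up
#   hostname                filesystem   dumptype
server1.andrew.cmu.edu      /            comp-root
server1.andrew.cmu.edu      /var         comp-user
server2.andrew.cmu.edu      /            comp-root
```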
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
AFS | Linux | PE2650 8x180GB; 1GB memory | |||
Internal Amanda | Solaris | E220R 8x180GB; 1GB memory | |||
Cyrus Amanda | Solaris | E220R 2- 8x180GB; 1GB memory | |||
Cost Recovery Amanda | Solaris | Ultra60 8x180GB; 1GB memory |
Note that all backups are done to RAID.
Archival policies - more flexible policies on what data gets archived and what does not may be needed.
Online backups - Given that we are backing up to RAID, instead of having the backups accessible only by system administrators, one could make them directly accessible to the end user. Restores could then be done on demand by the user without any staff intervention.
The software being used is a modified version of mon with custom integration with the Network Group databases.
Graphing uses RRDTOOL as a base. There is some custom code for managing the graphs known as Hammer. The plan is to move towards Cricket.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
monitor | Linux | Dell GX260; 1GB memory | |||
netsage | Linux | PE1650 2x73GB mirror; 1GB memory | |||
graphs | Linux | PE2450 4x36GB mirror; 512MB memory
Not yet in scope is the whole area of distributed data collection and analysis.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
spoolers | Solaris | Ultra1; 320MB memory; 36GB spool disk |
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
zephyr servers | Solaris | Ultra30; 256MB |
The hardware was recently upgraded via trickle down. The hardware requirements for this system have not been high, so it is usually upgraded this way. It is possible that this will change in the future.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
KDC | Linux | PowerEdge 1650; 1GB; RAID |
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
EMT/ADM | Solaris | Ultra80; 1GB memory | |||
Linux download | Linux | PE2450 4x36GB mirror; 512MB memory | |||
Linux wash | Linux | PE2450 4x36GB mirror; 512MB memory | |||
Solaris download | Solaris | Ultra80; 1GB memory | |||
Solaris wash | Solaris | Ultra80; 1GB memory | |||
softdist | Linux | PE2550 4x36GB mirror; 512MB memory | |||
Remedy | Solaris | E220R; 1GB memory | |||
license servers | Solaris/Linux | older equipment |
There have been significant improvements to the environment of the A100 machine room. With the recent installation of a generator, we should be able to withstand the loss of power. Cooling systems should also have a backup system in place so that the loss of the central chilled water system will not result in overheating.
The challenge is how to best use this space. Having machines physically distant increases the staff time in having to deal with problems. Also, we want to ensure that the machines in the remote location are actually redundant and do not have dependencies on systems in Cyert.
We have started to investigate remote serial console access using Cyclades hardware. We are considering this for both Wean and A100. Intel systems present additional challenges: a KVM switch is often required, and remote KVM units have security issues that require a separate private network.
This section provides an overview of the networking infrastructure for ISAM servers and from the ISAM point of view. This focus is also only on the networking in Cyert A100.
Most of the equipment is plugged into a Cisco 6509 which is uplinked to both cores via fibre Gigabit ethernet providing 2Gbps full-duplex to the cores.
The 6509 has multiple 48 port 10/100 blades. The 6509 also has a single 16 port 1000BaseTX blade.
A Dell 5224 24-port 10/100/1000BaseTX switch is uplinked to the 6509 blade to provide additional copper gigabit Ethernet port capacity. It is uplinked via four copper gigabit Ethernet ports, providing 4Gbps full-duplex to the 6509.
Physically the 6509 and 5224 are located in the back of the machine room and all the networking cabling is run, under the floor, back to this rack.
There are three VLANs of note. The first is VLAN10; the bulk of the machines are on this VLAN. The second is VLAN14; machines running Windows are on this VLAN. The third is VLAN13, the "unsecure" VLAN where Unix servers and other general login machines are located.
There is also a switch that is connected to the same network as the cluster machines.
These are our expectations for the network's future, assuming that the Network Group does not significantly change the status quo. This assumption is likely not correct.
Additional gigabit ports - the most straightforward mechanism for providing additional gigabit ports is to put Cisco gigabit blades into the 6509 chassis. The problem with this approach is cost, which is why we have deployed a Dell switch. The immediate plan for additional gigabit ports is to take another four ports from the 6509 blade and attach another Dell switch.
10/100 blade phase out - With the reduction in number of machines and the move of more machines to gigabit, we expect to be able to start phasing out 10/100 blades if needed.
Cabling - Because of the port density and placement of the 6509 in the rack, cables cannot easily be removed or moved. A number of solutions have been proposed; however, they all involve non-trivial downtime and/or cost.
Network Debugging and IDS - We should start planning on how to best set things up so that we can sniff the network traffic on the main switch as a debugging tool. Also, we should look at having some form of IDS that watches over "critical" or "secure" machines.
Firewalls - We should re-evaluate and consider if there is any type of firewalling that makes sense.
A100 fault tolerance - We have a significant single point of failure: the A100 switch. If high availability is desired, we should evaluate options on how to add a redundant switch.
0.6 - 02/23/2003 - fixed typos thanks to jkern
0.5 - 02/23/2003 - incorporated some comments. ran html tidy
0.4 - 02/23/2003 - fixed some typos. added sections 2.3 and 2.4
0.3 - 02/23/2003 - next draft
0.2 - 02/22/2003 - next draft
0.1 - 01/11/2003 - initial incomplete draft.