Document revision: 0.5 - 02/24/2003
http://www.andrew.cmu.edu/~wcw/work/servers/2003-q1/
This document provides a summary of the current servers and services managed by Infrastructure and Middleware Services (ISAM) of Computing Services. We will discuss hardware, software, and any comments about the future of the services.
In addition, we provide two sections on the physical and network environments of the machine room from the ISAM perspective.
This document does not discuss development or other non-production hardware.
The Solaris operating system has many redeeming features; however, the low-end SPARC equipment lacks the CPU performance or the features (such as onboard gigabit Ethernet) that are commonly found on Intel systems.
Running Solaris on Intel is not an overly attractive option as we have concerns about hardware compatibility.
FreeBSD is another option that may be considered.
Consideration of support for Solaris/Intel and FreeBSD would likely be driven by whether or not SPARC remains a viable platform. We might also treat these new platforms as "server only" platforms.
For budgetary reasons, we have been limiting and delaying the purchase of new SPARCs in the hope that Sun will come out with something better.
We would like to limit the age of servers to 4 years. The failure rate increases significantly, especially with hard disk drives, beyond this time. Also, based on current trends, four year old hardware can be more than four times slower than new systems.
Unfortunately, we do not have the budget to do this. As such, when newer hardware is obtained and replaces hardware less than 5 years old, we reallocate the older hardware to replace even older hardware.
This "trickle down" strategy is the most cost-efficient in terms of raw dollars, since the older hardware usually does not have any significant resale value. The cost of this strategy is that it tends to consume more staff time.
Here are some general principles we believe in:
Also, many of the SAN characteristics -- being able to easily add space to a service, being able to move data from one server to another -- are implemented in the software that we use (AFS, Cyrus Murder).
However, iSCSI promises to at least eliminate the cost of the second network. If pricing does drop and the model enables a distributed storage model, should we start taking advantage of this technology?
The IMAP server software is Cyrus Murder. Details on the system can be found at http://asg.web.cmu.edu/cyrus/ag.html.
The MTA software is Sendmail.
Spam filtering is provided by SpamAssassin (http://www.spamassassin.org) and SIEVE.
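As an illustration of how the two interact, a user-level Sieve script can file messages that SpamAssassin has tagged. This is a minimal sketch; the header name and folder name are assumptions about a typical SpamAssassin configuration, not our actual setup:

```sieve
require ["fileinto"];

# File anything SpamAssassin has flagged into a "spam" folder.
# "X-Spam-Flag" and the folder name are illustrative assumptions.
if header :contains "X-Spam-Flag" "YES" {
    fileinto "spam";
}
```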
Webmail is provided via SquirrelMail (http://www.squirrelmail.org).
IMSP service is provided for IMSP-aware clients. IMSP allows users' address books and options to be available from any machine. The IMSP software is homegrown.
We are considering moving to the commercial IMSP server from Cyrusoft. No other changes in core software are expected.
Mailing list software via Majordomo.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Front-End | Solaris | Ultra80. 2GB memory | |||
Back-End | Solaris | E220R. 2GB memory. qty 2 - 4x36GB RAID5 | |||
IMSP | Solaris | Ultra2170. 1GB memory | |||
Webmail | Linux | Dell GX250 | |||
Mupdate | Linux | Dell PowerEdge 2450 server | |||
MX | Linux | Dell PowerEdge 2450 server | |||
SMTP Submission | Linux | Dell PowerEdge 2450 server | |||
mailing lists | Linux | Dell PowerEdge 2650; 2x73GB mirror; 1GB memory | |||
One back-end server provides only bboard and netnews.
Three back-ends currently provide 219GB of usable space for user mail storage.
SMTP and MX servers will likely need to be refreshed no later than FY2005/2006 time frame. The addition of virus scanning software may accelerate the refresh as it would put additional CPU demands on these machines.
We plan to increase the number of front-end servers. A cheaper workaround may be to simply increase the memory. This should be done in early FY2004.
Disk utilization is around 65%. We plan to increase quotas so an additional back-end server will likely need to be purchased in early FY2004.
Webmail utilization appears to be lower than expected. If there is a spike, additional hardware would be required. However, the hardware being used for webmail is relatively inexpensive and adding additional webmail capacity is fairly straightforward. There are also PHP accelerators that may be purchased that could improve performance.
Mailing list server was just upgraded.
The following items are ideas that still need to be explored:
Calendar service is provided by Oracle/Steltor Corptime. A web interface is also available.
There are some concerns with the Oracle acquisition of Steltor and whether or not the direction that Oracle wishes to take will fit with our direction. However, at this point, there do not appear to be any immediately viable alternatives.
Event calendaring is something we know we want to do and are currently working on requirements. Event calendaring is being bundled with the Portal project and so future discussions on this topic may be under that umbrella.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Calendar Server | Solaris | Ultra60 2360; 1GB memory; RAID |
The hardware will likely need to be refreshed no later than FY2005. Moving to Linux may be problematic as there is currently no Linux client.
Utilization has not resulted in performance problems.
We need better collection of performance data to track usage and properly estimate when additional upgrades are needed.
At this point, there are no plans to replace Oracle/Steltor with a different product.
Apache 1.3.x is the current core web server.
Web publishing on www.cmu.edu is done with custom software. Publishing to www.andrew.cmu.edu is done with different custom software. There has been some consideration of moving www.andrew publishing to the same software as www.cmu, but there are still details to work out.
Web authentication uses WebISO.
A web proxy service (to allow access to IP restricted web pages) was written in house.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
www.cmu.edu | Solaris | E220R | |||
www.andrew.cmu.edu | Solaris | E220R | |||
publishing servers | Solaris | E220R | |||
cgi.andrew.cmu.edu | Solaris | Ultra 1 170 | |||
webiso.andrew.cmu.edu | Linux | PowerEdge 2450 | |||
web proxy | Linux | GX260 |
Load on the web servers has generally not exceeded capacity. Occasional load spikes have been due to poorly written CGIs or people abusing CGI.
The primary web servers will likely need to be refreshed in the FY2005/FY2006 time frame.
The CGI server is in most need of immediate replacement.
The CGI service is also trickier, and additional hardware may be required, as CGIs may run as different entities or have different security characteristics. For example, a password change CGI (not yet deployed) should likely run on its own machine. CGIs that hold passwords to access back-end services (i.e. sieve) should be separated from other CGIs that do not need this authentication or that need some other authentication.
Note that the web publishing system uses AFS to store and manage the data. The data is also duplicated at least 2x due to hooks for revision control. As such, any significant increases in data will also require AFS capacity to be increased.
If we do not change the publishing system for www.andrew.cmu.edu we should move the data to a RAID unit. While the data is mirrored in AFS space, regenerating an exact copy after a disk failure may not be possible and "bad data" could get out. For example, a user may have updated their AFS web space but was not ready to actually publish. If we force publish the data to recover from a disk failure then we've changed their web state and possibly replaced "good" pages with "bad."
Switching to Linux should be a relatively straightforward option if desired.
The immediate future is tied in with the portal. Some of the hardware refresh may also not be necessary as the portal takes on the role of primary campus web server.
Supporting departmental or user CGIs, PHP, etc. may require additional hardware resources.
Because HTTP is stateless, there are opportunities for running redundant servers. This option should be considered if better availability is a goal.
Oracle 8 is used in a hot standby fashion. If one server fails then another can be brought up quickly.
We provide two Oracle instances. The first is for our own use. The other is dedicated to Blackboard.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Primary Server | Solaris | E220R. 1GB memory | |||
Backup Server | Solaris | E220R. 1GB memory | |||
Blackboard Servers (qty 2) | Solaris | E220R. 1GB memory |
The current system has 40GB of RAID5 table space and 10GB of log space using 4x18GB disk drives. Utilization is under 40%. A hardware refresh by FY2006 is likely to be needed.
The systems can be expanded by 1GB of physical memory. The RAID unit has 4 empty slots and so another 4 disks of up to 180GB per disk can be added.
No significant changes are currently planned.
There has been some thought to experiment with Oracle and Linux as well as other database systems such as Postgres.
For each supported system type, we provide a set of Unix servers for general Unix usage. Currently, it appears that the bulk of the use is for homework assignments requiring Unix and for email (Pine).
The default pool was recently switched to Linux.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
n/a | Linux | Dell GX260 small form factor. 2ghz+. 1GB memory | |||
n/a | Solaris | Ultra 80, Ultra 30 systems |
Hardware is generally on a 3-4 year upgrade cycle where new systems are added to the pool and the oldest systems are phased out.
Linux systems were recently purchased. Next purchase is likely FY2005 or later.
SPARC systems should likely be refreshed in FY2005 as well. However, given that the default pool has moved to Linux, usage of the SPARC systems has dropped and we may be able to defer purchase until a later date.
Increased utilization may occur if Clusters decides to remove Unix desktops from the clusters and require those who need Unix cycles to connect remotely to a Unix server. This is unlikely to occur at this time.
The current software in use is OpenLDAP. We have been using CVS nightly updates instead of specific releases.
The OpenLDAP software has not been updated for some time and should be synchronized with the current release this quarter.
Various CGIs are being run on the master server (metadir) to avoid having to authenticate. These CGIs should likely be moved off to better partition the services.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Master | Solaris | E220R w/RAID | |||
Replicas | Linux | Dell GX260 small form factor. 2ghz+. 1GB memory |
There are currently no performance issues with the existing hardware.
The directory replicas already use load-balanced DNS via the ldap.andrew.cmu.edu name. However, many clients cache the IP address of a specific machine, so if that machine fails, service becomes unavailable to them. To avoid this problem, one would need to load balance multiple machines behind a single IP address, not just a single name. To do so, we would need to purchase a hardware load balancer; the total cost is around $50K.
Increased load on the directory service is expected when we cut over /etc/passwd lookups to use the directory instead of the actual file. This may require additional servers.
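On the client side, such a cutover is typically a name service switch change. A sketch of the relevant /etc/nsswitch.conf lines, assuming an nss_ldap-style module (the exact module and ordering are assumptions, not our deployed configuration):

```
# /etc/nsswitch.conf (illustrative fragment)
# Consult local files first, then fall back to the LDAP directory.
passwd: files ldap
group:  files ldap
shadow: files ldap
```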
We have discussed the possibility of putting an LDAP interface on administrative queries. For example, an LDAP query could result in a quota query being issued to the Cyrus servers. If this is implemented, additional servers would be required.
There are a number of CGIs being run on the master directory server. It may be worth separating these to avoid running too many services on a single machine.
There are plans to master the directory data in Oracle instead of having the data represented only in the LDAP server's database. This will allow for better consistency and flexibility.
The software we are using for AFS service is OpenAFS 1.2.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
Replication | Solaris | Ultra1 2x18GB disks. 320MB memory | |||
User | Linux | Dell PE 2650 4x73GB RAID5. 1GB memory |
User file servers provide the general file service. User home directories, project volumes and other "important" data is stored on this class of servers.
Replication file servers provide software binary images and AFS infrastructure items. Replication is used for availability (if a server goes down, another can be used) and for load balancing (clients can connect to any server, thereby sharing the workload). AFS replication is fairly static: one must issue a command for changes to appear. As such, it is most suitable for providing multiple copies of read-only data.
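The command in question is vos release. An illustrative transcript using the standard OpenAFS vos commands (the volume and server names here are hypothetical):

```sh
# Add a read-only replication site for a volume (names are hypothetical).
vos addsite fs1.andrew.cmu.edu /vicepa sw.common

# Push the current read-write contents to all read-only sites.
# Clients see updated data only after the release completes.
vos release sw.common
```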
The current hardware goal is to finish standardizing on Linux. There is currently an outstanding float request to do this.
Students do not have sufficient central working space for their projects and so must carry around Zip disks, floppy disks, or other media. It would seem a productivity boon to provide them with a significant amount of central file server space. At this point, it would be good if we could give students 1GB quotas. Assuming a student body of 5000, that would require 5TB of space if we do not overallocate (though we usually do overallocate). That is about 20 Linux servers (using 73GB disks), or $120K. Backup costs would be in the neighborhood of another $40K.
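As a back-of-envelope check on these numbers (assuming RAID5 back ends similar to the 4x73GB user file servers above, yielding roughly 3x73 = 219GB usable each):

```shell
#!/bin/sh
# Rough sizing for 1GB/student quotas with no overallocation.
students=5000
quota_gb=1
total_gb=$((students * quota_gb))   # 5000GB, i.e. about 5TB

# Assume a 4x73GB RAID5 server yields ~3x73 = 219GB usable.
per_server_gb=219
servers=$(( (total_gb + per_server_gb - 1) / per_server_gb ))

echo "total=${total_gb}GB servers=${servers}"
```

This comes out in the neighborhood of the 20-server figure cited above; the exact count depends on the number of disks per server and on how much we overallocate.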
Online backup is another possibility. See section 3.11 for details.
Windows file service is not provided as a general service. There are two specific instances of Windows file service: DIST for in-domain (predominantly clusters) and NTFS1 for DSP clients.
Software used is the native file sharing mechanism provided by Windows 2000.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
DIST | Win2k | PE2450 4x36GB RAID5; 1GB memory | |||
NTFS1 | Win2k | PE2650 4x73GB RAID5; 1GB memory
The expansion of Windows file service is likely to depend on the adoption of AFS clients on Windows. If AFS becomes popular, there may be only niche use of Windows file service.
Macintosh file service is similar to Windows File service: there is a download/boot server for clusters and NTFS1 provides AFP service to DSP clients.
An AFS client is available for MacOS X and is in active use in the Clusters.
Cluster download service is provided via the native filesharing from MacOS X.
DSP clients can get AFP service via Windows 2000 Services for Macintosh.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
netboot | MacOS X | Apple Xserve | |||
NTFS1 | Win2k | PE2450 4x73GB RAID5; 1GB memory
Similar to Windows?
Windows 2000 is providing the domain infrastructure.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
AD domain | Win2k | PE2450 2x36GB mirror; 1GB memory | |||
ANDREW AD | Win2k | PE2450 2x36GB mirror; 1GB memory |
The future of the windows infrastructure is still being determined.
This section does not cover the Windows/Macintosh backup service.
AFS backups are done by the Stage software: an internally written AFS backup system.
Amanda backs up the local disks of servers. It is a scheduling system that wraps the standard Unix dump utility. Amanda was developed by the University of Maryland.
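Amanda's per-run work is driven by a disklist file mapping hosts and filesystems to dump types. An illustrative fragment follows; the hostnames and dumptype names are assumptions for the sketch, not our actual configuration:

```
# disklist (illustrative): one line per host/filesystem to back up
#   hostname                filesystem   dumptype
server1.andrew.cmu.edu      /            comp-root
server1.andrew.cmu.edu      /var         comp-user
server2.andrew.cmu.edu      /            comp-root
```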
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
AFS | Linux | PE2650 8x180GB; 1GB memory | |||
Internal Amanda | Solaris | E220R 8x180GB; 1GB memory | |||
Cyrus Amanda | Solaris | E220R 2- 8x180GB; 1GB memory | |||
Cost Recovery Amanda | Solaris | Ultra60 8x180GB; 1GB memory |
Note that all backups are done to RAID.
Archival policies - more flexible policies on what data gets archived and what does not may be needed.
Online backups - Given that we are backing up to RAID, instead of having the backups accessible only by system administrators, one could make them directly accessible to the end user. Restores could then be done on demand by the user without any staff intervention.
The software being used is a modified version of mon with custom integration with the Network Group databases.
Graphing uses RRDTOOL as a base. There is some custom code for managing the graphs known as Hammer. The plan is to move towards Cricket.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
monitor | Linux | Dell GX260; 1GB memory | |||
netsage | Linux | PE1650 2x73GB mirror; 1GB memory | |||
graphs | Linux | PE2450 4x36GB mirror; 512MB memory
Not yet in scope is the whole area of distributed data collection and analysis.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
spoolers | Solaris | Ultra1; 320MB memory; 36GB spool disk |
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
zephyr servers | Solaris | Ultra30; 256MB |
The hardware was recently upgraded via trickle down. The hardware requirements for this system have not been high, so it is usually upgraded this way. It is possible that this will change in the future.
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
KDC | Linux | PowerEdge 1650; 1GB; RAID |
Type | OS | Config | Purchase FY | Replacement FY | Replacement Cost |
---|---|---|---|---|---|
EMT/ADM | Solaris | Ultra80; 1GB memory | |||
Linux download | Linux | PE2450 4x36GB mirror; 512MB memory | |||
Linux wash | Linux | PE2450 4x36GB mirror; 512MB memory | |||
Solaris download | Solaris | Ultra80; 1GB memory | |||
Solaris wash | Solaris | Ultra80; 1GB memory | |||
softdist | Linux | PE2550 4x36GB mirror; 512MB memory | |||
Remedy | Solaris | E220R; 1GB memory | |||
license servers | Solaris/Linux | older equipment |
There have been significant improvements to the environment of the A100 machine room. With the recent installation of a generator, we should be able to withstand the loss of power. Cooling systems should also have a backup system in place so that the loss of the central chilled water system will not result in overheating.
The challenge is how to best use this space. Having machines physically distant increases the staff time in having to deal with problems. Also, we want to ensure that the machines in the remote location are actually redundant and do not have dependencies on systems in Cyert.
We have started to investigate remote serial console access using Cyclades hardware. We are considering this for both Wean and A100. Intel systems present additional challenges: a KVM switch is often required, and remote KVM units have security issues that require a separate private network.
This section provides an overview of the networking infrastructure for ISAM servers and from the ISAM point of view. This focus is also only on the networking in Cyert A100.
Most of the equipment is plugged into a Cisco 6509 which is uplinked to both cores via fibre Gigabit ethernet providing 2Gbps full-duplex to the cores.
The 6509 has multiple 48 port 10/100 blades. The 6509 also has a single 16 port 1000BaseTX blade.
A Dell 5224 24-port 10/100/1000BaseTX switch is uplinked to the 6509 blade to provide additional copper gigabit Ethernet port capacity. It is uplinked via four copper gigabit Ethernet ports, providing 4Gbps full-duplex to the 6509.
Physically the 6509 and 5224 are located in the back of the machine room and all the networking cabling is run, under the floor, back to this rack.
There are three VLANs of note. The first is VLAN10; the bulk of the machines are on this VLAN. The second is VLAN14; machines running Windows are on this VLAN. The third is VLAN13, the "unsecure" VLAN where Unix servers and other general login machines are located.
There is also a switch that is connected to the same network as the cluster machines.
These are our expectations for the network's future, assuming that the Network Group does not significantly change the status quo. This assumption is likely not correct.
Additional gigabit ports - the most straightforward mechanism for providing additional gigabit ports is to put Cisco gigabit blades into the 6509 chassis. The problem with this approach is cost, which is why we have deployed a Dell switch. The immediate plan for additional gigabit ports is to take another four ports from the 6509 blade and attach another Dell switch.
10/100 blade phase out - With the reduction in number of machines and the move of more machines to gigabit, we expect to be able to start phasing out 10/100 blades if needed.
Cabling - Because of the port density and placement of the 6509 in the rack, cables cannot easily be removed or moved. A number of solutions have been proposed; however, they all involve non-trivial downtime and/or cost.
Network Debugging and IDS - We should start planning on how to best set things up so that we can sniff the network traffic on the main switch as a debugging tool. Also, we should look at having some form of IDS that watches over "critical" or "secure" machines.
Firewalls - We should re-evaluate and consider if there is any type of firewalling that makes sense.
A100 fault tolerance - We have a significant single point of failure: the A100 switch. If high availability is desired, we should evaluate options on how to add a redundant switch.
0.6 - 02/23/2003 - fixed typos thanks to jkern
0.5 - 02/23/2003 - incorporated some comments. ran html tidy
0.4 - 02/23/2003 - fixed some typos. added sections 2.3 and 2.4
0.3 - 02/23/2003 - next draft
0.2 - 02/22/2003 - next draft
0.1 - 01/11/2003 - initial incomplete draft.