Andrew Web - Publishing Subsystem Requirements

by Dan McCarriar

1. Introduction

This document describes the requirements for the publishing subsystem of the Andrew Web System. The publishing subsystem is one part of an overall redevelopment of the Andrew Web System, a collection of web services made available to the Carnegie Mellon University community.

The current publishing mechanism varies between the two production web servers. Publishers using the main web site at http://www.cmu.edu/ must transfer their content to a specified AFS directory, and use a telnet-based interface to release their content to a "test" environment, which is accessible at http://www.cmu.edu:8001/. After the content is released to the test environment, the publisher must use the telnet interface again to move the content to the "production" environment. Releases to production are not immediate - the content is copied from AFS to the local disk of the web server via an automated process that runs at 2:12am daily.

On http://www.andrew.cmu.edu/, which hosts content for courses, organizations, and individual users, the publishing model is somewhat different. Publishers still transfer their content to an AFS directory, but then visit a web site to release their content. There is no "test" environment; pages are visible on the production server immediately after they're released. The release mechanism is a simple form, and there are both authenticated and unauthenticated options available.

Outside of the publishing mechanism, the current system does not provide any web-based tools for link validation or managing access to collections. The quotas available for individual collections are relatively small. There is no mechanism for content expiration - for example, an individual user's web pages could remain accessible on the production server long after that user has left the university.

2. Vision Statement

One fundamental need of all web publishers is a way to manage content on a web server. To do this effectively, publishers need not only a way to transfer content to a web server, but also a robust set of tools to support different models of publishing. Individuals and small workgroups may only need space in a file system that is automatically published by a web server. Larger workgroups often require tools to maintain different versions of content and support concurrent authoring, as well as a means to preview content on a web server before deploying it to the public. All publishers need to have an interface to publishing tools such as link validation, and a means to grant individual users access to manage content.

From a service provider's point of view, it is desirable to separate the "production" web servers where content is viewed by the public from the "development" web servers where content authors perform their development and testing. The web servers where content is published should be easily recoverable, and the content on those servers should be able to automatically "expire" on a specified date or when an individual publisher's account expires.

The publishing subsystem is designed to provide Andrew Web System customers a consistent way to manage content on the Andrew Web System's production web servers, www.cmu.edu and www.andrew.cmu.edu. For server administrators, the publishing subsystem will provide a mechanism for easy administration of web server content, configuration management, and a means for rapid recovery of each production web server in the event of a machine failure.

3. System Overview

The Publishing subsystem consists of the following components:

An important concept of the publishing subsystem is that of a web "collection". A collection is simply space in a file system to which one or more web publishers have access to place HTML pages, images, and other web-related content. Each collection maps to a particular URL on a production web server. For example, an individual user's collection might be accessible at:

http://www.andrew.cmu.edu/user/joe

A collection for the Basket Weaving department, which may have many people who participate in publishing the department's web site, could map to:

http://www.cmu.edu/basketweaving
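
The mapping between a collection and its URL can be illustrated with a minimal sketch. This is only one possible representation, not a design from this document: the class name, its field names, and the user IDs "alice" and "bob" are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical record for a web collection (all field names are assumptions)."""
    name: str        # short name of the collection, e.g. "basketweaving"
    server: str      # production host that serves the collection
    url_path: str    # path under the server root
    publishers: set[str] = field(default_factory=set)  # user IDs with publishing access

    def url(self) -> str:
        """Return the production URL that this collection maps to."""
        return f"http://{self.server}{self.url_path}"


# The two examples from the text; the publisher IDs are made up.
joe = Collection("joe", "www.andrew.cmu.edu", "/user/joe", {"joe"})
dept = Collection("basketweaving", "www.cmu.edu", "/basketweaving", {"alice", "bob"})
```

Keeping the list of publishers on the collection record reflects the idea that one or more publishers share access to the same file-system space.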

4. Use-cases and User-scenarios

The publishing subsystem is intended for the following classes of users:

  1. Site administrators use the publishing subsystem to manage all content on the production web servers. Common tasks include deleting and archiving content and managing user access to collections.
  2. Web publishers use the publishing subsystem to place their content on production web servers, move content between the staging server and the production server, and manage access to their collections.

Publishing Content (Simple)

Wally Webmaster has a very simple web site for his department. He creates and tests his pages on a local computer, and uploads them via FTP to a specified location on a file server. After he uploads his pages, they are immediately available at his department's URL: http://www.cmu.edu/wallysdepartment.

Publishing Content (With Staging Server)

Wanda Webmaster is part of a large workgroup that maintains the university's front page. The front page has to be perfect, as it gives visitors from outside the university their first impression of the university. Wanda uploads her files via kFTP to a specified location on a file server. Her pages are then immediately available on the staging server, where she can have her workgroup proofread the pages and test all of the links before the pages are released to the public. After her workgroup has looked at the new pages, Wanda goes to the publishing page on the staging server, where she authenticates by entering her user name and password. She is presented with a list of the web collections she is permitted to manage, and chooses the "homepage" collection. She notices from the release logs that a colleague has also uploaded some new pages to the homepage collection since she did her upload. Wanda checks with her colleague to ensure that the pages have been proofread, and then clicks the "publish" button to move the new pages in the "homepage" collection from the staging server to the production server, where they will be viewable by the public.
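
The release step in this scenario might look roughly like the following sketch. It assumes that the staging and production servers each serve a directory tree with one subdirectory per collection; the function name, its parameters, and the log format are hypothetical and are not specified by this document.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def publish(collection: str, user: str, maintainers: set[str],
            staging_root: Path, production_root: Path,
            release_log: list[str]) -> None:
    """Copy one collection from the staging tree to the production tree.

    Assumption: each server serves <root>/<collection>/ from its own tree.
    """
    # Only users on the collection's access list may release content.
    if user not in maintainers:
        raise PermissionError(f"{user} may not publish {collection}")
    src = staging_root / collection
    dst = production_root / collection
    # dirs_exist_ok=True lets a release overwrite an earlier release in place.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    # Record who released what, so that colleagues can review the release log.
    stamp = datetime.now(timezone.utc).isoformat()
    release_log.append(f"{stamp} {user} published {collection}")
```

Note that this sketch never removes files from production that were deleted on staging; a real implementation would need to decide how to handle deletions.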

Requesting a New Collection

Paul Professor is teaching a class during the spring semester. Before the semester begins, he goes to the web publishing home page to request that a new web collection be created for his course. At the site, he enters information about the course, including the name and course number. He also enters the user IDs of his teaching assistants to give them access to manage web pages in the new collection. Finally, he enters a date a few weeks after the end of the semester on which the collection will expire. Rather than have the collection deleted on that date, he chooses the "archive" option, which will cause his course content to be moved to a private URL where it will be accessible only to the maintainers of the collection.
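
The expiration choice in this scenario can be sketched as a small decision function. The names below are hypothetical, and the sketch assumes that a collection expires on its expiration date itself rather than on the day after.

```python
from datetime import date
from enum import Enum


class ExpiryAction(Enum):
    """What happens to a collection once it expires (names are assumptions)."""
    DELETE = "delete"    # remove the content
    ARCHIVE = "archive"  # move the content to a private, maintainers-only URL


def expiry_decision(expires_on: date, action: ExpiryAction, today: date) -> str:
    """Return "keep" before the expiration date, else the chosen action's name."""
    if today < expires_on:
        return "keep"
    return action.value
```

For example, a collection that expires on June 1 with the "archive" option would be kept on May 31 and archived from June 1 onward.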

Managing User Access to Collections

Michelle Manager has just hired a new webmaster for her department. On the webmaster's first day of work, Michelle visits the web publishing home page and enters her user name and password. She only has access to maintain one collection, so she is immediately directed to the management page for her department's collection. She clicks the "Add User" button and enters the user ID of her new webmaster. The new webmaster then has access to manage and release content in the department's collection.

Link Checking

Rhonda Researcher has just hired a new graduate student to maintain her web site. Before turning the site over to the student, she visits the web publishing home page to run the link validation tool on her collection. The tool provides her with a list of three broken links, which she fixes. She then configures the tool to run automatically once per week, and to email the student if it finds any broken links. To cover periods of time such as the summer break, when the student might be out of town, she also configures the link validation tool to notify her via email if any links remain broken for longer than two weeks.
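
The notification rule in this scenario (tell the maintainer about every broken link, and tell the owner only about links that have stayed broken for more than two weeks) could be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that the tool records the date on which each link was first seen broken and that the maintainer and owner are different people.

```python
from datetime import date, timedelta

# Escalation threshold taken from the scenario: two weeks.
ESCALATE_AFTER = timedelta(weeks=2)


def recipients(first_seen_broken: dict[str, date], today: date,
               maintainer: str, owner: str) -> dict[str, list[str]]:
    """Map each recipient to the broken links they should be emailed about.

    first_seen_broken maps a link to the date it was first found broken
    (an assumed bookkeeping structure, not one defined in this document).
    """
    notify: dict[str, list[str]] = {maintainer: [], owner: []}
    for link, first_seen in sorted(first_seen_broken.items()):
        # The maintainer hears about every broken link on each run.
        notify[maintainer].append(link)
        # The owner hears only about links broken for longer than two weeks.
        if today - first_seen > ESCALATE_AFTER:
            notify[owner].append(link)
    # Drop recipients who have nothing to be told.
    return {who: links for who, links in notify.items() if links}
```

Fetching pages and testing each link is deliberately left out; only the escalation logic is shown.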

Site-Wide Administration

Andy Administrator is responsible for maintaining content on the two production web servers. He regularly goes to an administrative publishing page to manage collections. Today when he logs in to the page, he is notified that the accounts of all of the users with access to publish the "ultimatesoccer" collection, which belongs to a student organization, have been disabled because the students have recently graduated. Since nobody can now access this collection, he researches the organization and finds out that there are no remaining members. Andy decides to remove the collection from the production web servers. Andy then adds a user to the access list for another collection because of a request he received via email.
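
The check behind the notification in this scenario might be sketched as below. It assumes that the system can produce an access list for each collection and a set of disabled accounts; both data structures and the function name are hypothetical.

```python
def orphaned_collections(access: dict[str, set[str]],
                         disabled: set[str]) -> list[str]:
    """Return the names of collections that no active account can manage.

    access maps a collection name to the user IDs on its access list;
    disabled is the set of user IDs whose accounts have been disabled.
    """
    # A collection is orphaned when removing the disabled accounts from
    # its access list leaves nobody (an empty list counts as orphaned too).
    return sorted(name for name, users in access.items()
                  if not (users - disabled))
```

A collection with at least one active maintainer is not reported, even if some of its other maintainers have left.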

5. Related Links

6.1 Requirements: General Publishing

6.2 Requirements: Simple Publishing Model

6.3 Requirements: Advanced Publishing Model

6.4 Requirements: Content Management

6.5 Requirements: Content Management Tool User Interface

6.6 Requirements: Server Configuration, Reliability, and Monitoring

7. Revision History

Document Revision # | Action Taken, Notes | When? | By Whom?
0.1 | Creation | 02/23/2001 | Dan McCarriar
0.2 | Added motivation paragraph, tweaked and rearranged requirements. | 02/28/2001 | Dan McCarriar
1.0 | Modified section numbering format; added, removed, and changed requirements based on 3/23/2001 meeting. | 04/04/2001 | Dan McCarriar
1.1 | Added requirement 100.15. | 04/10/2001 | Dan McCarriar


dlm@cmu.edu