Carnegie Mellon University
95-702 Organizational Communications and Distributed Object Technologies

Lab 2 is due at midnight on October 9, 2007 (new due date).

StAX: The Streaming API for XML processing
==========================================

Most of the projects in this course will involve clients interacting with remote objects in various ways. In this project, however, we will write a client that simply reads data marked up in XML from a remote source. We will use the StAX application programming interface to read the XML. StAX is a relatively new approach to processing XML. Older approaches include the Document Object Model (DOM) and the Simple API for XML (SAX). Later in this course, we will use the Axis2 web service framework from Apache. This modern framework uses StAX to read and write XML messages, which is another reason for our interest in StAX in this lab.

Software Prerequisites
======================

In this project you will need a recent version of Java (see http://java.sun.com/javase/). You will also need Eclipse as well as the two jar files implementing the streaming API for XML (StAX) (see https://sjsxp.dev.java.net/). These jars are named sjsxp.jar and jsr173_1.0_api.jar and should be included in your Java build path when using Eclipse.

Project Description
===================

There are four schedules found under the directory www.andrew.cmu.edu/~mm6/95-702/McCarthysSchedule. These schedules are named schedule1.xml, schedule2.xml and so on. There is also an XSDL document called schedule.xsd that contains the grammar for the schedule language. Use a browser to examine one of the four schedules and study the schedule grammar carefully.

Write a program in Java called Scheduler that attempts to find a meeting time when n > 1 people are free to meet. Scheduler examines a set of schedules and tries to find a meeting time for each day of the week. If it is able to find a common meeting time then it displays the day and time of the meeting.
If it is unable to find such a time, it announces that fact for that particular day. It does this for each of the seven days. That is, for each day, a common meeting time is either announced or declared as not possible.

The input to the scheduling process will include a minimum meeting time in seconds. If a meeting time of 60 * 60 seconds is required then the scheduler will not generate meeting times for anything less than one hour. The minimum meeting time will be passed to your program via a command line argument.

Scheduler will read a list of URLs from a local urlList.xml file. It will then fetch an XML document from each of these URLs. It will compute any common meeting times and display a report to the user.

Looking for 2 hour meeting times, when applied to all four schedules, my scheduler produces the following output:

    java Scheduler
    Loading 4 schedules.
    **This group can't meet on Monday**
    **This group can't meet on Tuesday**
    **This group can't meet on Wednesday**
    **This group can't meet on Thursday**
    **This group can't meet on Friday**
    **This group can't meet on Saturday**
    Meeting scheduled for a minimum of 7200seconds at 13:0:0:16:0:0 on Sunday.

Suggested Approach
==================

It is suggested that you write an object oriented solution to this problem. Clearly, one type of object that we want to represent is a schedule object. Define a Schedule class with a single constructor and a single accessor method. The constructor for a Schedule object will take a URL object as an input parameter. It will then read the entire document using StAX and extract and retain data. These data will be made available to the user via the second method of this class. This second method will be called getAvailable and will have a signature similar to the following:

    public LinkedList getAvailable(String day) throws Exception

There will be a second class called URLList that is used to represent objects holding URLList documents.
Its constructor takes a single URL as an input parameter and uses StAX to read the contents of a URLList document. It retains data and makes these data available via two accessor methods. These methods have the following signatures:

    public int getNumURLs() throws Exception
    public String getURL(int i) throws Exception

Another class that maintains a start and stop time and provides utility methods for time calculations is also called for. Think about what utility methods you will need.

The scheduling activity might best be written as a static method of the Scheduler class.

As a general rule, don't try to solve the entire problem at once. Solve smaller problems by hand first and focus on building classes and objects that will act as useful tools for the larger problem. Test the classes and objects on smaller instances of the problem.

In your solution you may want to use a simple backtracking search as exemplified in Michael Main's "Bear Game". In the "Bear Game" we start with an initial number of bears and wish to reach a goal number of bears within a certain number of steps. We are only allowed to perform one of two operations to reach our goal. We may increment the number of bears by a fixed constant (provided at run time) or we may divide the current number of bears in half (if the current number of bears is even).

For example, suppose we start with an initial value of 10 bears and want to reach 5 bears. Suppose too that our increment is 10 and we are allowed to execute 2 steps. Main's algorithm would proceed as follows:

    10 --> 20 --> 30   fails, need to backtrack
           20 --> 10   because 20 is even, but this fails too
    10 --> 5           we found a solution in one step

Here is the code that you might wish to modify to solve the scheduling problem.

    public static boolean bears(int initial, int goal, int incr, int n) {
        // goal reached
        if (initial == goal) return true;
        // no steps left
        else if (n == 0) return false;
        // first choice: add the increment and search from there
        else if (bears(initial + incr, goal, incr, n - 1)) return true;
        // that failed, so backtrack and try halving (only legal when even)
        else if (initial % 2 == 0) return bears(initial / 2, goal, incr, n - 1);
        else return false;
    }

Simpler Project
===============

The most difficult part of the assignment above is the part that requires you to search for a common meeting time using an algorithm like that shown in the "Bear Game". This alternative assignment allows you to skip that part of the problem. Everything else remains the same.

For less credit (maximum of 85) you may instead write a solution that simply displays all four schedules. The program would still read the urlList.xml file and the schedule files and would still need to perform StAX parsing. The output would look like the following:

    java Scheduler
    Loading 4 schedules.
    Schedule1.xml
    Monday
      9:00 - 10:00
      11:00 - 12:00
    Tuesday
      :
    Schedule2.xml
    Monday
      9:00 - 10:00
      11:00 - 12:00
    Tuesday
      :
      :
    Schedule4.xml
      :

Developing a simple grammar
===========================

It is required that you design an XSDL grammar for the urlList.xml file. See the grammar associated with schedule documents for a guide. Also, you are required to complete the tutorial on XML Schema located at W3Schools (see http://www.w3schools.com/schema/default.asp). Knowledge of XSDL will be of help later in the course when we study web services and WSDL.

Here is a copy of my urlList.xml file. It is currently configured to provide four schedules to the scheduler. Note the reference to the urlList.xsd document.

    http://www.andrew.cmu.edu/~mm6/95-702/McCarthysSchedule/schedule1.xml
    http://www.andrew.cmu.edu/~mm6/95-702/McCarthysSchedule/schedule2.xml
    http://www.andrew.cmu.edu/~mm6/95-702/McCarthysSchedule/schedule3.xml
    http://www.andrew.cmu.edu/~mm6/95-702/McCarthysSchedule/schedule4.xml

Use the following program to validate urlList.xml files against the grammar that you write.
The grammar must require n > 1 schedule URLs. The program below makes use of Xerces. Information on Xerces and the required jars can be found at the following URL: http://xerces.apache.org/xerces-j/. Currently, as far as I am aware, there is no XSDL validation available with StAX parsers.

Validate.java is a Java program that validates an XML instance against its schema. The schema document (.xsd) must be in the same directory as the document being validated. The document, however, may be pointed to by a URL.

    // Validate.java using Xerces
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class Validate extends DefaultHandler {

        // set to false as soon as any error is reported
        public static boolean valid = true;

        public void error(SAXParseException exception) {
            System.out.println("Received notification of a recoverable error." + exception);
            valid = false;
        }

        public void fatalError(SAXParseException exception) {
            System.out.println("Received notification of a non-recoverable error." + exception);
            valid = false;
        }

        public void warning(SAXParseException exception) {
            System.out.println("Received notification of a warning." + exception);
        }

        public static void main(String argv[]) {
            if (argv.length != 1) {
                System.err.println("Usage: java Validate [filename.xml | URLToFile]");
                System.exit(1);
            }
            try {
                // get a parser
                XMLReader reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
                // request validation
                reader.setFeature("http://xml.org/sax/features/validation", true);
                reader.setFeature("http://apache.org/xml/features/validation/schema", true);
                reader.setErrorHandler(new Validate());
                // associate an InputSource object with the file name or URL
                InputSource inputSource = new InputSource(argv[0]);
                // go ahead and parse
                reader.parse(inputSource);
            }
            catch (org.xml.sax.SAXException e) {
                System.out.println("Error in parsing " + e);
                valid = false;
            }
            catch (java.io.IOException e) {
                System.out.println("Error in I/O " + e);
                // the document could not be read, so exit with a failure status
                System.exit(1);
            }
            System.out.println("Valid Document is " + valid);
        }
    }

Project One Submission requirements
===================================

Your submissions will be by digital drop box. Be sure to include documentation in your Java source.

Submit one XSD file showing the grammar developed for urlList.xml documents.

Submit screenshots showing:

1. A search for one hour meeting times on the first two schedules (schedule1.xml and schedule2.xml).
2. A search for one hour meeting times on the third and fourth schedules (schedule3.xml and schedule4.xml).
3. A search for 30 minute meeting times on all four schedules.

Be able to demonstrate and defend your solution if required.
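
As a starting point for the StAX reading described under Suggested Approach, here is a minimal sketch of a cursor-style parsing loop. It is only an illustration under stated assumptions: the element name url is a hypothetical placeholder (your own urlList grammar decides the real names), the class and method names StaxSketch and readUrls are not part of the assignment, and the document is parsed from an in-memory string rather than fetched from a remote source.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSketch {

    // Collects the text content of every <url> element in the document.
    // ASSUMPTION: "url" is a made-up element name used only for illustration.
    public static List<String> readUrls(String xml) throws XMLStreamException {
        List<String> urls = new ArrayList<String>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            // Pull events one at a time until the end of the document.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "url".equals(reader.getLocalName())) {
                    // getElementText reads the text of a text-only element and
                    // leaves the cursor on the matching end tag.
                    urls.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return urls;
    }

    public static void main(String[] args) throws XMLStreamException {
        // A tiny hypothetical document; it does not follow any real grammar.
        String xml = "<urlList>"
                + "<url>http://example.org/a.xml</url>"
                + "<url>http://example.org/b.xml</url>"
                + "</urlList>";
        for (String url : readUrls(xml)) {
            System.out.println(url);
        }
    }
}
```

To read a remote document instead, the same factory can be given an InputStream, for example the one returned by openStream() on a java.net.URL object. The loop itself stays the same; only the element names and the data you retain will differ for schedule documents.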