95-733 Internet Technologies
Spring 2009
Homework 2
Due: Tuesday, February 10

Lab Topic: XML and the Extensible Style Sheet Language for Transformations (XSLT)

In this lab we will be programming in a transformation language called XSLT. XSLT is used to transform one XML document into another XML document (with a different structure).

In order to write programs in XSLT, we need an XML parser (XSLT programs are XML documents) and an XSLT interpreter. The parser is called "Xerces". The interpreter is called "Xalan" (Xalan uses Xerces).

The required jar files for XSLT processing using Xalan are: xalan.jar, xercesImpl.jar, xml-apis.jar and xsltc.jar. These may be downloaded from the Apache Foundation.

Part 1 Command Line XSLT
========================

For DOS-based machines, create a directory called "bats" and place a batch file called "xalan.bat" in that directory. Place the path to your bats directory in the system path variable. The file xalan.bat will hold the following:

    java org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3

You will need to have the jar files mentioned above on your classpath before running xalan.bat.

For Unix-based machines, you will use a script file called xalan with execute permissions. My xalan jar files are saved in /Users/mm6/Applications/xalan. My xalan script is shown below.

    #!/bin/sh
    export XALAN_HOME=/Users/mm6/Applications/xalan
    export CP=$XALAN_HOME/xalan.jar:$XALAN_HOME/xercesImpl.jar:$XALAN_HOME/xml-apis.jar:$XALAN_HOME/xsltc.jar
    java -classpath $CP org.apache.xalan.xslt.Process -IN $1 -XSL $2 -OUT $3

Testing.

The following is an xml file called books.xml that contains data on books. It's a copy of the file found on Page 70 of the XSLT Programmer's Reference by Michael Kay.

    Nigel Rees        Sayings of the Century   8.95
    Evelyn Waugh      Sword of Honour          12.99
    Herman Melville   Moby Dick                8.99
    J. R. R. Tolkien  The Lord of the Rings    22.99

We would like to transform this file into an HTML document as shown here (result.html):

A list of books

1 Nigel Rees Sayings of the Century 8.95
2 Evelyn Waugh Sword of Honour 12.99
3 Herman Melville Moby Dick 8.99
4 J. R. R. Tolkien The Lord of the Rings 22.99
In order to carry out this transformation, we will use the XSLT programming language. XSLT is Turing complete, so in principle it can express any computation, but it is especially good at performing XML transformations. Our first XSLT program looks like this (booklist.xsl):

A list of books
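
As an illustration of what a stylesheet of this kind can look like, here is a minimal sketch driven from Java so that it can be run without any files. The element names (booklist, book, author, title, price) and the table layout are assumptions made for this example; they are not taken from books.xml or booklist.xsl, and only two books are included to keep the listing short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BooklistSketch {

    // Hypothetical input document: the element names are assumptions.
    static final String BOOKS_XML =
        "<booklist>"
      + "<book><author>Nigel Rees</author><title>Sayings of the Century</title><price>8.95</price></book>"
      + "<book><author>Evelyn Waugh</author><title>Sword of Honour</title><price>12.99</price></book>"
      + "</booklist>";

    // Hypothetical stylesheet: one table row per book, numbered with position().
    static final String BOOKLIST_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<html><body><h1>A list of books</h1><table>"
      + "<xsl:apply-templates select='booklist/book'/>"
      + "</table></body></html>"
      + "</xsl:template>"
      + "<xsl:template match='book'>"
      + "<tr><td><xsl:value-of select='position()'/></td>"
      + "<td><xsl:value-of select='author'/></td>"
      + "<td><xsl:value-of select='title'/></td>"
      + "<td><xsl:value-of select='price'/></td></tr>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BOOKS_XML, BOOKLIST_XSL));
    }
}
```

The first template builds the page skeleton once; the second is applied to each book in document order, which is why position() can serve as the row number.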

Place the two files (books.xml and booklist.xsl) into a directory and make sure that xalan is working properly by running the following command. The output file should look like result.html.

    xalan books.xml booklist.xsl result.html

When debugging XSLT programs, it is often much more helpful to view your output in an editor like Notepad than in a browser like Netscape, IE or Safari. Look at the HTML document in a browser only after you are satisfied with the way it looks in Notepad. The browser view is often quite deceiving and makes a poor debugging tool.

Part 2 Handling Namespaces
==========================

Many documents make use of XML namespaces to remove ambiguity. The following is our books example with a namespace assigned to the namespace prefix p.

    Nigel Rees        Sayings of the Century   8.95
    Evelyn Waugh      Sword of Honour          12.99
    Herman Melville   Moby Dick                8.99
    J. R. R. Tolkien  The Lord of the Rings    22.99

The same XSLT program that we wrote above needs to be adapted to handle these namespace-qualified elements. Be sure to test this new program against the books file with namespaces.

A list of books
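
The adaptation mostly comes down to declaring the namespace in the stylesheet and prefixing every element name used in match patterns and select expressions. The following is a minimal sketch of that idea, assuming a made-up namespace URI (http://example.org/books) and made-up element names; neither comes from the books file. It runs the same kind of path twice, once with prefixes and once without, to show that unprefixed names do not match namespace-qualified elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NamespaceSketch {

    // Hypothetical namespace URI, chosen only for this example.
    static final String NS = "http://example.org/books";

    // Hypothetical namespaced document: every element carries the prefix p.
    static final String BOOKS_XML =
        "<p:booklist xmlns:p='" + NS + "'>"
      + "<p:book><p:title>Moby Dick</p:title></p:book>"
      + "<p:book><p:title>The Lord of the Rings</p:title></p:book>"
      + "</p:booklist>";

    // Builds a stylesheet that writes each node selected by path, followed by ';'.
    // The stylesheet declares the same namespace URI under the prefix p.
    static String stylesheetFor(String path) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
             + " xmlns:p='" + NS + "'>"
             + "<xsl:output method='text'/>"
             + "<xsl:template match='/'>"
             + "<xsl:for-each select='" + path + "'>"
             + "<xsl:value-of select='.'/><xsl:text>;</xsl:text>"
             + "</xsl:for-each>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Returns the titles selected by the given XPath expression as plain text.
    static String titles(String path) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheetFor(path))));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(BOOKS_XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prefixed path: selects both title elements.
        System.out.println("with prefix:    [" + titles("p:booklist/p:book/p:title") + "]");
        // Unprefixed path: selects nothing in the namespaced document.
        System.out.println("without prefix: [" + titles("booklist/book/title") + "]");
    }
}
```

What matters is the namespace URI, not the prefix: the stylesheet could bind the same URI to a different prefix and the prefixed path would still match.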

Part 3 Running Xalan from within Java
============================================

While command line xalan makes a very nice tool, it is often necessary to make calls for XSLT processing from within other programs. Here is a Java program that performs the same transformation as above. But this time the transformation is performed under application program control. This program would be executed with the command:

    java ProduceHTML books.xml booklist.xsl result.html

    // ProduceHTML.java is a simple program that demonstrates how XSLT programs
    // can be executed from within Java.
    import java.io.IOException;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.Result;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;

    public class ProduceHTML {

        public static void main(String[] args) {

            // args[0] = input XML, args[1] = stylesheet, args[2] = output file.
            // The streams are closed automatically when the try block exits.
            try (FileInputStream xml = new FileInputStream(args[0]);
                 FileInputStream xsl = new FileInputStream(args[1]);
                 FileOutputStream out = new FileOutputStream(args[2])) {

                Source xmlDoc = new StreamSource(xml);
                Source xslDoc = new StreamSource(xsl);
                Result result = new StreamResult(out);

                // Compile the stylesheet and apply it to the input document
                TransformerFactory factory = TransformerFactory.newInstance();
                Transformer trans = factory.newTransformer(xslDoc);
                trans.transform(xmlDoc, result);
            }
            catch (TransformerException e) {
                System.out.println("Transformer Problem " + e);
            }
            catch (IOException e) {
                System.out.println("An I/O problem " + e);
            }
        }
    }

Part 4. Running XSLT from within a Java servlet.
================================================

Suppose we want to use a local stylesheet called XSLTransformerCode.xsl to process a remote XML file at some URL.
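
What makes this case short is that a StreamSource can be built either from an open stream or from a system identifier, that is, a URL in string form. The following is a minimal sketch of both forms. It assumes a temporary local file stands in for the remote document, and it uses the identity transformer (no stylesheet) so that the output is simply a copy of the input; the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SourceKindsSketch {

    // Copies a source to a string using the identity transformer (no stylesheet).
    static String copy(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    // The document is named by a URL string; the transformer opens it itself.
    static String fromUrlString(String urlAsString) {
        return copy(new StreamSource(urlAsString));
    }

    // The caller has already opened the document and hands over the stream.
    static String fromStream(InputStream in) {
        return copy(new StreamSource(in));
    }

    // Writes the hypothetical stand-in document and returns its location.
    static Path writeSample() {
        try {
            Path file = Files.createTempFile("sample", ".xml");
            Files.write(file, "<note>hello</note>".getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new IllegalStateException("Could not write sample file", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample();
        // A file: URL is used here; an http: URL string is passed the same way.
        System.out.println(fromUrlString(file.toUri().toString()));
        try (InputStream in = Files.newInputStream(file)) {
            System.out.println(fromStream(in));
        }
    }
}
```

The URL form suits the remote XML document, while the stream form suits a stylesheet that is packaged with the application and read as a resource.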
Using Eclipse with the Tomcat plugin, add the xsl stylesheet file to the project so that the file is under the project but not under any subdirectory of the project. A doGet method might have the following code:

    PrintWriter out = response.getWriter();

    // get the xsl stored in this project
    ServletContext context = getServletContext();
    InputStream xsl = context.getResourceAsStream("/XSLTransformerCode.xsl");

    // We need two source objects and one result
    // get an external xml document using a url in a
    // string format
    Source xmlDoc = new StreamSource(urlAsString);
    Source xslDoc = new StreamSource(xsl);
    Result result = new StreamResult(out);

    // Prepare to transform
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer trans = factory.newTransformer(xslDoc);
    trans.transform(xmlDoc, result);

Part 5. An RDF document from the W3C
====================================

The following document was accessed from the W3C's main web page by clicking on the syndicate link. It is meant to be read by a news reader. We will use it as our input file for the homework problems below.

World Wide Web Consortium - Web Standards Leading the Web to Its Full Potential... http://www.w3.org/ 2009-01-20 Future of Social Networking Workshop Begins 2009-01-15: Today began a 2-day Workshop on the Future of Social Networking, organized by W3C to explore the landscape of social networking technologies. Participants submitted 72 position papers on a wide range of topics regarding the growth and future of social networking, including, but not limited to, the mobile context. The meeting is hosted in Barcelona, Spain by Universitat Politècnica de Catalunya and ReadyPeople. Many thanks to the hosts and to Silver Sponsors Ayuntamiento de Zaragoza, Flock, and Peperoni for their support.
(Permalink) http://www.w3.org/News/2009#item3 2009-01-15 Use Cases and Requirements for Ontology and API for Media Object 1.0 2009-01-20: The Media Annotations Working Group has published the First Public Working Draft of Use Cases and Requirements for Ontology and API for Media Object 1.0. This document specifies use cases and requirements as an input for the development of the "Ontology for Media Object 1.0" and the "API for Media Object 1.0". The ontology will be a simple ontology to support cross-community data integration of information related to media objects on the Web. The API will provide read access and potentially write access to media objects, relying on the definitions from the ontology. Learn more about the Video in the Web Activity. (Permalink) http://www.w3.org/News/2009#item5 2009-01-20 W3C Invites Implementations of CURIE Syntax 1.0 2009-01-16: The XHTML2 Working Group invites implementation of the Candidate Recommendation of CURIE Syntax 1.0. This document defines a generic, abbreviated syntax for expressing URIs. This syntax is intended to be used as a common element by language designers. The intended audience for this document is Language designers, not the users of those Languages. Track implementations in an ongoing implementation report and learn more about the HTML Activity. (Permalink) http://www.w3.org/News/2009#item4 2009-01-16 W3C Advisory Committee Elects TAG Participants 2009-01-13: The W3C Advisory Committee has elected John Kemp (Nokia), Larry Masinter (Adobe), and T.V. Raman (Google) to the W3C Technical Architecture Group (TAG). Continuing TAG participants are Ashok Malhotra (Oracle), Noah Mendelsohn (IBM, appointed), Jonathan Rees (Science Commons, appointed), and Henry Thompson (U. of Edinburgh). The Director is expected to appoint one individual as well. 
The mission of the TAG is to build consensus around principles of Web architecture and to interpret and clarify these principles when necessary, to resolve issues involving general Web architecture brought to the TAG, and to help coordinate cross-technology architecture developments inside and outside W3C. (Permalink) http://www.w3.org/News/2009#item2 2009-01-13 W3C Talks in January 2009-01-05: Browse W3C presentations and events also available as an RSS channel. (Permalink) http://www.w3.org/News/2009#item1 2009-01-05 Element Traversal Specification Is a W3C Recommendation 2008-12-22: The Web Applications Working Group has published the W3C Recommendation of Element Traversal Specification. This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides an attribute to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint. Learn more about the Rich Web Client Activity. (Permalink) http://www.w3.org/News/2008#item222 2008-12-22 Web IDL Draft Published 2008-12-22: The Web Applications Working Group has published the Working Draft of Web IDL. This document defines an interface definition language, Web IDL, that can be used to describe interfaces that are intended to be implemented in web browsers. Web IDL is an IDL variant with a number of features that allow the behavior of common script objects in the web platform to be specified more readily. How interfaces described with Web IDL correspond to constructs within ECMAScript and Java execution environments is also detailed. Learn more about the Rich Web Client Activity. 
(Permalink) http://www.w3.org/News/2008#item221 2008-12-22 SVG Tiny 1.2 Advances State of the Art for Web Graphics 2008-12-22: Creating beautiful and accessible interactive content was made easier today with the release of the Scalable Vector Graphics (SVG) Tiny 1.2 Recommendation. Already implemented and deployed in mobile phones, media centers, and browsers around the world, this open standard allows authors to build documents and interfaces for the Web, with open-source and commercial authoring tools that output open, reusable content. Searchable, internationalized text and user-created metadata bring the Semantic Web to graphics, and improve the experience of users everywhere, while easier programming interfaces put the power in the hands of developers. A test suite helps to ensure interoperable SVG content in modern Web browsers, making it easier than ever to develop and deploy the right look and feel. Read the testimonials and start creating content today. Learn more about the Graphics Activity. (Permalink) http://www.w3.org/News/2008#item223 2008-12-22

PART 6 Introductory XSLT Programming
====================================

(1) 10 Points. Using command line XSLT, write an XSLT program that displays the contents of the title, description and link fields that are direct children of the channel element. Your output will be marked up as HTML and will appear in a browser as follows:

    W3C RDF Document

    * World Wide Web Consortium
    * Leading the Web to Its Full Potential...
    * http://www.w3.org/

(2) 10 Points. Using command line XSLT, write an XSLT program that displays the number of RDF list items that appear in the document. You must use the XSLT count function in your solution. Your output will be marked up as HTML and will appear in a browser as follows:

    Counting RDF list items

    8

(3) 10 Points. Using command line XSLT, write an XSLT program that displays the content of each title element that is inside an item element.
Your output will be marked up as HTML and will appear in a browser as follows (unordered list elements are shown with an asterisk):

    Titles

    * Future of Social Networking Workshop Begins
    * Use Cases and Requirements for Ontology and API for Media Object 1.0
    * W3C Invites Implementations of CURIE Syntax 1.0
    * W3C Advisory Committee Elects TAG Participants
    * W3C Talks in January
    * Element Traversal Specification Is a W3C Recommendation
    * Web IDL Draft Published
    * SVG Tiny 1.2 Advances State of the Art for Web Graphics

(4) 10 Points. Using command line XSLT, write an XSLT program that displays the content of each title element that is inside an item element. Your output will be marked up as HTML and will appear in a browser with the titles underlined as hypertext links. If the user clicks on a link, the browser will fetch the associated document that is pointed to by the link element. The output on the browser will appear as follows (hypertext links are shown with an underline).

    Titles (with links)

    Future of Social Networking Workshop Begins
    -------------------------------------------
                        :
                        :
    SVG Tiny 1.2 Advances State of the Art for Web Graphics
    -------------------------------------------------------

(5) 50 Points. Write a JSP page that asks the user to select a topic from a drop down list. The three topics will be Business, Technology and World News. Once a selection is made, the browser will call a Java servlet, passing along the topic. The servlet will fetch the appropriate RSS 2.0 feed from the NY Times web site. It will apply a style sheet that generates HTML for the browser. The HTML display will show the news title of each item. Each news title will be displayed as a link. The user will be able to click links to visit the associated page. Note that there are no namespaces used in RSS 2.0. New York Times feeds may be found at

    http://www.nytimes.com/services/xml/rss/

(6) 5 Points.
Add a source of feeds drop down box to the application that you built in question 5. The user will be able to select a topic and a source. At a minimum, you will need to provide for three sources. In my solution, I used the BBC, the New York Times and the Sydney Morning Herald.

The BBC feeds are available from:

    http://news.bbc.co.uk/1/hi/help/3223484.stm

The Sydney Morning Herald feeds are available from:

    http://www.smh.com.au/rsschannels/

(7) Between 0 and 5 points. Add some cool feature to the application you built in question 6 and demonstrate it in class.