95-733 Internet Technologies Spring 2008
Homework 2 Due: Tuesday, February 12
Lab Topic: XML and the Extensible Style Sheet Language for Transformations XSLT

In this lab we will be programming in a transformation language called XSLT. XSLT is used to transform one XML document into another XML document (with a different structure). In order to write programs in XSLT, we need an XML parser (XSLT programs are XML documents) and an XSLT interpreter. The parser is called "Xerces". The interpreter is called "Xalan" (Xalan uses Xerces).

The required jar files for XSLT processing using Xalan are: xalan.jar, xercesImpl.jar, xml-apis.jar and xsltc.jar. These may be downloaded from the Apache Foundation.

Part 1 Command Line XSLT
========================

For DOS based machines, create a directory called "bats" and place a batch file called "xalan.bat" in that directory. Place the path to your bats directory in the system path variable. The file xalan.bat will hold the following:

java org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3

You will need to have the jar files mentioned above on your classpath before running xalan.bat.

For Unix based machines, you would use a script file called xalan with execute permissions. My xalan jar files are saved in /Users/mm6/Applications/xalan

#!/bin/sh
export XALAN_HOME=/Users/mm6/Applications/xalan
export CP=$XALAN_HOME/xalan.jar:$XALAN_HOME/xercesImpl.jar:$XALAN_HOME/xml-apis.jar:$XALAN_HOME/xsltc.jar
java -classpath $CP org.apache.xalan.xslt.Process -IN $1 -XSL $2 -OUT $3

Testing. The following is an xml file called books.xml that contains data on books. It's a copy of the file found on Page 70 of the XSLT Programmer's Reference by Michael Kay.

Nigel Rees Sayings of the Century 8.95
Evelyn Waugh Sword of Honour 12.99
Herman Melville Moby Dick 8.99
J. R. R. Tolkien The Lord of the Rings 22.99

We would like to transform this file into an HTML document as shown here (result.html):

A list of books

1 Nigel Rees Sayings of the Century 8.95
2 Evelyn Waugh Sword of Honour 12.99
3 Herman Melville Moby Dick 8.99
4 J. R. R. Tolkien The Lord of the Rings 22.99
In order to carry out this transformation, we will use the XSLT programming language. Although XSLT is Turing complete, so in principle any computation can be expressed in it, it is especially good at performing XML transformations. Our first XSLT program looks like this (booklist.xsl):

A list of books
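
As a rough illustration of the kind of stylesheet involved, here is a minimal sketch that runs a small stylesheet over a books document from Java, using the JDK's built-in javax.xml.transform API rather than the xalan script. The element names (books, book, author, title, price), the table layout and the class name BookListSketch are assumptions made for this example; they are not taken from books.xml or booklist.xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BookListSketch {
    // Hypothetical input document: the element names are assumed.
    public static final String BOOKS_XML =
        "<books>"
        + "<book><author>Nigel Rees</author><title>Sayings of the Century</title><price>8.95</price></book>"
        + "<book><author>Evelyn Waugh</author><title>Sword of Honour</title><price>12.99</price></book>"
        + "<book><author>Herman Melville</author><title>Moby Dick</title><price>8.99</price></book>"
        + "<book><author>J. R. R. Tolkien</author><title>The Lord of the Rings</title><price>22.99</price></book>"
        + "</books>";

    // Hypothetical stylesheet: one table row per book, numbered with position().
    public static final String BOOKLIST_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<html><head><title>A list of books</title></head><body>"
        + "<h1>A list of books</h1>"
        + "<table><xsl:apply-templates select='books/book'/></table>"
        + "</body></html>"
        + "</xsl:template>"
        + "<xsl:template match='book'>"
        + "<tr><td><xsl:value-of select='position()'/></td>"
        + "<td><xsl:value-of select='author'/></td>"
        + "<td><xsl:value-of select='title'/></td>"
        + "<td><xsl:value-of select='price'/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    public static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer trans = factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(BOOKS_XML, BOOKLIST_XSL));
    }
}
```

The first template matches the document root and writes the HTML skeleton; the second is applied once per book element and emits one table row.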

Place the two files (books.xml and booklist.xsl) into a directory and make sure that xalan is working properly by running the following command. The output file should look like result.html.

xalan books.xml booklist.xsl result.html

When debugging XSLT programs, it is often much more helpful to view your output in an editor like Notepad than in a browser like Netscape, IE or Safari. Look at the HTML document in a browser only after you are satisfied with the way it looks in Notepad. The browser view is often quite deceiving and makes a poor debugging tool.

Part 2 Handling Namespaces
==========================

Many documents make use of XML namespaces to remove ambiguity. The following is our books example with a namespace:

Nigel Rees Sayings of the Century 8.95
Evelyn Waugh Sword of Honour 12.99
Herman Melville Moby Dick 8.99
J. R. R. Tolkien The Lord of the Rings 22.99

The same XSLT program that we wrote above needs to be adapted to handle these namespace qualified elements.

A list of books
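
To make the namespace adaptation concrete, here is a small sketch in Java. In XSLT 1.0 an unprefixed name in an XPath expression refers to an element in no namespace, so the stylesheet has to bind a prefix to the document's namespace URI and use that prefix in its paths. The namespace URI http://example.org/books, the prefix b, the element names and the class name NamespaceSketch are all hypothetical, chosen only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespaceSketch {
    // Hypothetical namespace URI, not the one used in the handout's document.
    public static final String NS = "http://example.org/books";

    // Every element below is in the default namespace NS.
    public static final String BOOKS_XML =
        "<books xmlns='" + NS + "'>"
        + "<book><author>Herman Melville</author><title>Moby Dick</title><price>8.99</price></book>"
        + "<book><author>Evelyn Waugh</author><title>Sword of Honour</title><price>12.99</price></book>"
        + "</books>";

    // Builds a stylesheet that lists whatever the given XPath selects.
    // The prefix b is bound to NS so that paths can name qualified elements.
    public static String stylesheet(String titlePath) {
        return "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:b='" + NS + "' exclude-result-prefixes='b'>"
            + "<xsl:output method='html'/>"
            + "<xsl:template match='/'>"
            + "<ul><xsl:for-each select='" + titlePath + "'>"
            + "<li><xsl:value-of select='.'/></li>"
            + "</xsl:for-each></ul>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Applies the stylesheet text to the XML text and returns the result as a string.
    public static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer trans = factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Unqualified path: selects nothing, because the elements are in NS.
        System.out.println(transform(BOOKS_XML, stylesheet("books/book/title")));
        // Qualified path: selects both titles.
        System.out.println(transform(BOOKS_XML, stylesheet("b:books/b:book/b:title")));
    }
}
```

The prefix in the stylesheet does not have to match anything in the input document; only the namespace URI has to agree.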

Part 3 Running Xalan from within Java
============================================

While command line xalan makes a very nice tool, it is often necessary to make calls for XSLT processing from within other programs. Here is a Java program that performs the same transformation as above. But this time the transformation is performed under application program control. This program would be executed with the command:

java ProduceHTML books.xml booklist.xsl result.html

// ProduceHTML.java is a simple program that demonstrates how XSLT programs
// can be executed from within Java.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ProduceHTML {
    public static void main(String[] a) {
        // a[0] = input XML file, a[1] = XSLT stylesheet, a[2] = output file.
        // try-with-resources closes all three streams, even on failure.
        try (FileInputStream xml = new FileInputStream(a[0]);
             FileInputStream xsl = new FileInputStream(a[1]);
             FileOutputStream out = new FileOutputStream(a[2])) {
            Source xmlDoc = new StreamSource(xml);
            Source xslDoc = new StreamSource(xsl);
            Result result = new StreamResult(out);
            // Compile the stylesheet, then apply it to the input document.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer trans = factory.newTransformer(xslDoc);
            trans.transform(xmlDoc, result);
        } catch (TransformerException e) {
            System.out.println("Transformer Problem " + e);
        } catch (IOException e) {
            System.out.println("An I/O problem " + e);
        }
    }
}

Part 4. Running XSLT from within a Java servlet.
================================================

Suppose we want to use a local stylesheet called XSLTransformerCode.xsl to process a remote XML file at some URL.
Using Eclipse with the Tomcat plugin, add the xsl stylesheet file to the project so that the file is under the project but not under any subdirectory of the project. A doGet method might have the following code:

PrintWriter out = response.getWriter();
// get the xsl stored in this project
ServletContext context = getServletContext();
InputStream xsl = context.getResourceAsStream("/XSLTransformerCode.xsl");
// We need two source objects and one result
// get an external xml document using a url in a
// string format
Source xmlDoc = new StreamSource(urlAsString);
Source xslDoc = new StreamSource(xsl);
Result result = new StreamResult(out);
// Prepare to transform
TransformerFactory factory = TransformerFactory.newInstance();
Transformer trans = factory.newTransformer(xslDoc);
trans.transform(xmlDoc,result);

Part 5. An RDF document from the W3C
====================================

The following document was accessed from the W3C's main web page by clicking on the syndicate link. It is meant to be read by a news reader. We will use it as our input file for the homework problems below.

World Wide Web Consortium Leading the Web to Its Full Potential... http://www.w3.org/ 2008-01-22 W3C Publishes HTML 5 Draft, Future of Web Content 2008-01-22: W3C today published an early draft of HTML 5, a major revision of the markup language for the Web. The HTML Working Group is creating HTML 5 to be the open, royalty-free specification for rich Web content and Web applications. "HTML is of course a very important standard," said Tim Berners-Lee, author of the first version of HTML and W3C Director. "I am glad to see that the community of developers, including browser vendors, is working together to create the best possible path for the Web." New features include APIs for drawing two-dimensional graphics and ways to embed and control audio and video content.
HTML 5 helps to improve interoperability and reduce software costs by giving precise rules not only about how to handle all correct HTML documents but also how to recover from errors. Discover other new features, read the press release, and learn more about the future of HTML. (Permalink) http://www.w3.org/News/2008#item8 2008-01-22 Relationship Between Mobile Web and Web Content Accessibility (First Public Working Draft) 2008-01-22: The Mobile Web Best Practices Working Group and the WAI Education and Outreach Working Group have published the First Public Working Draft of Relationship Between Mobile Web Best Practices 1.0 and Web Content Accessibility Guidelines. See the announcement email. http://www.w3.org/News/2008#item11 2008-01-22 Document Object Model Activity Closed 2008-01-22: W3C's Document Object Model (DOM) Activity is now closed. The Document Object Model Working Group closed in the early 2004 after the completion of the DOM Level 3 Recommendations. Since then, several W3C Working Groups have taken the lead in maintaining and continuing to develop standard APIs for the Web; these include the HTML, SVG, CSS, and WebAPI Working Groups. W3C will continue to develop APIs in various Working Groups. Learn more about achievements of those participating as part of the DOM Activity on the DOM Activity Statement. (Permalink) http://www.w3.org/News/2008#item10 2008-01-22 W3C Advisory Committee Elects TAG Participants 2008-01-22: The W3C Advisory Committee has elected Ashok Malhotra (Oracle), T.V. Raman (Google), and Henry Thompson (University of Edinburgh) to the W3C Technical Architecture Group (TAG). Continuing TAG participants are Noah Mendelsohn (IBM), David Orchard (BEA), Jonathan Rees (Science Commons), Norm Walsh (Sun), and Stuart Williams (HP), who co-Chairs the TAG with Tim Berners-Lee. 
The mission of the TAG is to build consensus around principles of Web architecture and to interpret and clarify these principles when necessary, to resolve issues involving general Web architecture brought to the TAG, and to help coordinate cross-technology architecture developments inside and outside W3C. (Permalink) http://www.w3.org/News/2008#item9 2008-01-22 SPARQL Standard Opens Data on the Web 2008-01-15: Today, the World Wide Web Consortium made it easier to share and reuse data across application, enterprise, and community boundaries with the publication of three new Semantic Web standards for SPARQL (pronounced "sparkle"). SPARQL is the query language for the Semantic Web (see Semantic Web use cases). SPARQL queries hide the details of data management, which lowers costs and increases robustness of data integration on the Web. "Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL," explained Tim Berners-Lee, W3C Director. There are already 14 implementations of the standard, which is comprised of three W3C Recommendations: SPARQL Query Language for RDF, SPARQL Protocol for RDF, and SPARQL Query Results XML Format. Read the press release, testimonials and learn more about the Semantic Web Activity. (Permalink) http://www.w3.org/News/2008#item6 2008-01-15 W3C Invites Implementations of SMIL 3.0 (Candidate Recommendation) 2008-01-15: The SYMM Working Group has published the Candidate Recommendation of Synchronized Multimedia Integration Language (SMIL 3.0), an XML-based language that allows authors to create interactive multimedia presentations. Using SMIL 3.0, an author can describe the temporal behavior of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen. The Working Group is building a test suite help ensure interoperable implementation. 
Learn more about W3C work on Synchronized Multimedia (Permalink) http://www.w3.org/News/2008#item7 2008-01-15 Service Modeling Language 1.1 Drafts 2008-01-14: The Service Modeling Language (SML) Working Group has published the third Working Drafts of Service Modeling Language, Version 1.1 and Service Modeling Language Interchange Format Version 1.1. The former defines the SML 1.1, intended to model complex services and systems, including their structure, constraints, policies, and best practices. The latter defines the SML 1.1 interchange format, designed to ensure accurate and convenient interchange of the documents that make up an SML model. Learn more about the Extensible Markup Language (XML) Activity. (Permalink) http://www.w3.org/News/2008#item5 2008-01-14 Last Call: SMIL Timesheets 1.0 2008-01-10: The SYMM Working Group has published the Last Call Working Draft of SMIL Timesheets 1.0; this is also the First Public Working Draft. This document defines an XML timing language that makes SMIL 3.0 element and attribute timing control available to a wide range of other XML languages. This language allows SMIL timing to be integrated into a wide variety of a-temporal languages, even when several such languages are combined in a compound document. Because of its similarity with external style and positioning descriptions in the Cascading Style Sheet (CSS) language, this functionality has been termed SMIL Timesheets. Comments are welcome through 15 February. Learn more about W3C work on Synchronized Multimedia. (Permalink) http://www.w3.org/News/2008#item4 2008-01-10 W3C Welcomes Review of Three OWL 1.1 First Public Drafts 2008-01-08: The OWL Working Group has published the First Public Working Draft of three Web Ontology Language (OWL) 1.1 specifications: Structural Specification and Functional-Style Syntax, Model-Theoretic Semantics, and Mapping to RDF Graphs. OWL is used to define Semantic Web vocabularies. 
Together, these new specifications extend the W3C OWL Web Ontology Language 1.0 with a small but useful set of features that have been requested by users, for which effective reasoning algorithms are now available, and that OWL tool developers are willing to support. The three specifications cover, respectively, the syntax, semantics, and mapping to RDF of OWL 1.1 ontologies. Learn more about the W3C Semantic Web Activity. (Permalink) http://www.w3.org/News/2008#item3 2008-01-08 XHTML Access Module; Comments Welcome 2008-01-07: The XHTML2 Working Group has published the First Public Working Draft of XHTML Access Module. This document is intended to help make XHTML-family markup languages more effective at supporting the needs of the accessibility community. It does so by providing a generic mechanism for defining the relationship between document components and well-known accessibility taxonomies. Learn more about the HTML Activity. (Permalink) http://www.w3.org/News/2008#item2 2008-01-07 W3C Talks in January 2008-01-03: Browse W3C presentations and events also available as an RSS channel. (Permalink) http://www.w3.org/News/2008#item1 2008-01-03 Last Call: Selectors API; New Draft of DOM Level 3 Events 2007-12-21: The Web API Working Group has published the Last Call Working Draft of Selectors API. Selectors, which are widely used in CSS, are patterns that match against elements in a tree structure. The Selectors API specification defines methods for retrieving Element nodes from the Document Object Model (DOM) by matching against a group of selectors. Comments are welcome through 06 January 2008. The Working Group has also published a Working Draft of DOM Level 3 Events, a generic platform- and language-neutral event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. Learn more about the Rich Web Client Activity. 
(Permalink) http://www.w3.org/News/2007#item270 2007-12-21 W3C Invites Implementations of DCCI 1.0 (Candidate Recommendation); first draft of Delivery Context Ontology available 2007-12-21: The Ubiquitous Web Applications Working Group has published the Candidate Recommendation of Delivery Context: Client Interfaces (DCCI) 1.0. This document defines platform and language neutral programming interfaces that provide Web applications access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions. In addition, the Working Group has published the First Public Working Draft of Delivery Context Ontology, which provides a formal model for the delivery context which other specifications can reference normatively. Learn more about the Ubiquitous Web Applications Activity. (Permalink) http://www.w3.org/News/2007#item269 2007-12-21 Last Call: SVG Print 1.2 Language, Primer 2007-12-21: The SVG Working Group has published Last Call Working Drafts of SVG Print 1.2, Part 2: Language and SVG Print 1.2, Part 1: Primer. The former defines features of the Scalable Vector Graphics (SVG) Language that are specifically for printing environments; the latter provides guidelines on how to use the print specification with SVG 1.2 Tiny and SVG 1.2 Full modules. Comments on both specifications are welcome through 08 February. Learn more about the Graphics Activity. (Permalink) http://www.w3.org/News/2007#item268 2007-12-21 Device Description Repository Core Vocabulary 2007-12-21: The Mobile Web Initiative Device Description Working Group has published the First Public Working Draft of Device Description Repository Core Vocabulary. This document describes the Device Description Repository Core Vocabulary for Content Adaptation, that is, the properties that are considered essential for adaptation of content in the mobile Web. 
Its intended use is to define a baseline vocabulary for implementations of the Device Description Repository (DDR). Learn more about the Mobile Web Initiative Activity. (Permalink) http://www.w3.org/News/2007#item271 2007-12-21 Public Virtual Seminar on Web Issues to be Organized by W3C Spain Office 2007-12-20: On 23 January 2008, the W3C Spain Office will hold a virtual seminar where W3C staff will discuss the latest news in Web topics such as e-Government, Video on the Web, and Mobile Web in developing countries; see the program for the full list of topics and speakers. The public is invited to participate over the Internet in the seminar, which will take place in English from 15:00 to 18:00 (CET); see the participation instructions. The seminar, hosted by UPM, will also be broadcast online. Learn more about the W3C Spain Office. (Permalink) http://www.w3.org/News/2007#item267 2007-12-20

Part 6 Introductory XSLT Programming
====================================

(1) 10 Points. Using command line XSLT, write an XSLT program that displays the contents of the title, description and link fields that are direct children of the channel element. Your output will be marked up as HTML and will appear in a browser as follows:

W3C RDF Document
* World Wide Web Consortium
* Leading the Web to Its Full Potential...
* http://www.w3.org/

(2) 10 Points. Using command line XSLT, write an XSLT program that displays the number of RDF list items that appear in the document. You must use the XSLT count function in your solution. Your output will be marked up as HTML and will appear in a browser as follows:

Counting RDF list items
16

(3) 10 Points. Using command line XSLT, write an XSLT program that displays the content of each title element that is inside an item element. Your output will be marked up as HTML and will appear in a browser as follows (unordered list elements are shown with an asterisk):

Titles
* W3C Publishes HTML 5 Draft, Future of Web Content
* Relationship Between Mobile Web and Web Content Accessibility (First Public Working Draft)
:
:
* Public Virtual Seminar on Web Issues to be Organized by W3C Spain Office

(4) 10 Points. Using command line XSLT, write an XSLT program that displays the content of each title element that is inside an item element. Your output will be marked up as HTML and will appear in a browser with the titles underlined as hypertext links. If the user clicks on a link, the browser will fetch the document that the item's link element points to. The output in the browser will appear as follows (hypertext links are shown with an underline):

Titles
W3C Publishes HTML 5 Draft, Future of Web Content
-------------------------------------------------
:
:
Public Virtual Seminar on Web Issues to be Organized by W3C Spain Office
------------------------------------------------------------------------

(5) 50 Points. Write a JSP page that asks the user to select a topic from a drop down list. The topics will be Business, Technology and World News. Once a selection is made, the browser will call a Java servlet, passing along the topic. The servlet will fetch the appropriate RSS 2.0 feed from the NY Times web site. It will apply a style sheet that generates HTML for the browser. The HTML display will show the title of each news item. Each title will be displayed as a link, so that the user can click it to visit the associated page. Note that there are no namespaces used in RSS 2.0. New York Times feeds may be found at http://www.nytimes.com/services/xml/rss/

(6) 5 Points. Add a drop down box for the source of feeds to the application that you built in question 5. The user will be able to select a topic and a source. At a minimum, you will need to provide for two sources: the BBC feeds and the New York Times feeds. The BBC feeds are available from: http://news.bbc.co.uk/1/hi/help/3223484.stm

(7) Between 0 and 5 points. Add some cool feature to the application you built in question 6 and demonstrate it in class.
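
Question (2) requires the XSLT count function. As a warm-up that stays away from the RDF document itself, the following sketch counts the book elements in a small books document and writes the number as plain text. The element names, the text output method and the class name CountSketch are assumptions made for this example; the homework solution must work on the RDF document and produce HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CountSketch {
    // Hypothetical input document with four book elements.
    public static final String BOOKS_XML =
        "<books>"
        + "<book><title>Sayings of the Century</title></book>"
        + "<book><title>Sword of Honour</title></book>"
        + "<book><title>Moby Dick</title></book>"
        + "<book><title>The Lord of the Rings</title></book>"
        + "</books>";

    // count() takes a node-set and returns how many nodes it contains.
    public static final String COUNT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(books/book)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    public static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer trans = factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(BOOKS_XML, COUNT_XSL)); // prints 4
    }
}
```

Because the RDF document uses namespaces, a path passed to count in the homework will need prefixes bound in the stylesheet, as discussed in Part 2.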