95-733 Internet Technologies Spring 2008
Homework 2 Due: Tuesday, February 12
Lab Topic: XML and the Extensible Stylesheet Language for Transformations (XSLT)
In this lab we will be programming in a transformation language called
XSLT. XSLT is used to transform one XML document into another XML document
(with a different structure). In order to write programs in XSLT, we
need an XML parser (XSLT programs are XML documents) and an XSLT
interpreter. The parser is called "Xerces". The interpreter is called
"Xalan" (Xalan uses Xerces).
The required jar files for XSLT processing using Xalan are: xalan.jar,
xercesImpl.jar, xml-apis.jar and xsltc.jar. These may be downloaded
from the Apache Foundation.
Part 1 Command Line XSLT
========================
For DOS based machines, create a directory called "bats" and place
a batch file called "xalan.bat" in that directory. Place the path
to your bats directory in the system path variable.
The file xalan.bat will hold the following:
java org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3
You will need to have the jar files mentioned above on your classpath
before running xalan.bat.
For Unix-based machines, you would use a script file called xalan with
execute permissions. My Xalan jar files are saved in
/Users/mm6/Applications/xalan, so my script looks like this:
#!/bin/sh
export XALAN_HOME=/Users/mm6/Applications/xalan
export CP=$XALAN_HOME/xalan.jar:$XALAN_HOME/xercesImpl.jar:$XALAN_HOME/xml-apis.jar:$XALAN_HOME/xsltc.jar
java -classpath $CP org.apache.xalan.xslt.Process -IN $1 -XSL $2 -OUT $3
Testing. The following is an xml file called books.xml that contains data
on books. It's a copy of the file found on Page 70 of the XSLT Programmer's
Reference by Michael Kay.
Nigel Rees         Sayings of the Century   8.95
Evelyn Waugh       Sword of Honour          12.99
Herman Melville    Moby Dick                8.99
J. R. R. Tolkien   The Lord of the Rings    22.99
We would like to transform this file into an HTML document as shown here (result.html):
A list of books
1   Nigel Rees         Sayings of the Century   8.95
2   Evelyn Waugh       Sword of Honour          12.99
3   Herman Melville    Moby Dick                8.99
4   J. R. R. Tolkien   The Lord of the Rings    22.99
In order to carry out this transformation, we will use the XSLT
programming language. Although XSLT is Turing complete, meaning
that in principle it can express any computation, it is especially
good at performing XML transformations.
Our first XSLT program looks like this (booklist.xsl):
A list of books
Place the two files (books.xml and booklist.xsl) into a directory
and make sure that xalan is working properly by running the
following command. The output file should look like result.html.
xalan books.xml booklist.xsl result.html
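As a rough illustration of this kind of transformation, here is a small
self-contained Java sketch that holds a cut-down input document and a
stylesheet in strings and runs one against the other. It is not the
booklist.xsl from this handout. The element names (books, book, author,
title, price), the HTML layout and the class name BookListSketch are all
assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BookListSketch {
    // Assumed shape of the input; the element names are illustrative guesses.
    static final String BOOKS_XML =
        "<books>"
      + "<book><author>Nigel Rees</author><title>Sayings of the Century</title>"
      + "<price>8.95</price></book>"
      + "<book><author>Evelyn Waugh</author><title>Sword of Honour</title>"
      + "<price>12.99</price></book>"
      + "</books>";

    // Hypothetical stylesheet: one table row per book, numbered with position().
    static final String BOOKLIST_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<html><body><h1>A list of books</h1><table>"
      + "<xsl:for-each select='books/book'>"
      + "<tr><td><xsl:value-of select='position()'/></td>"
      + "<td><xsl:value-of select='author'/></td>"
      + "<td><xsl:value-of select='title'/></td>"
      + "<td><xsl:value-of select='price'/></td></tr>"
      + "</xsl:for-each>"
      + "</table></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the output.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BOOKS_XML, BOOKLIST_XSL));
    }
}
```

Compile with javac BookListSketch.java and run with java BookListSketch.
The exact whitespace of the HTML that is printed depends on the XSLT
processor in use.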
When debugging XSLT programs, it is often much more helpful to
view your output in an editor like Notepad rather than to view
your output in a browser like Netscape or IE or Safari. Look at
the HTML document in a browser only after you are satisfied
with the way it looks in Notepad. The browser view is often
quite deceiving and makes a poor debugging tool.
Part 2 Handling Namespaces
==========================
Many documents make use of XML namespaces to remove ambiguity.
The following is our books example with a namespace:
Nigel Rees         Sayings of the Century   8.95
Evelyn Waugh       Sword of Honour          12.99
Herman Melville    Moby Dick                8.99
J. R. R. Tolkien   The Lord of the Rings    22.99
The same XSLT program that we wrote above needs to be
adapted to handle these namespace qualified elements.
A list of books
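The adaptation usually comes down to this: the stylesheet declares a
prefix bound to the same namespace URI as the input, and every element
name in its XPath expressions carries that prefix. The Java sketch below
shows the effect on a tiny document. The namespace URI
http://example.org/books, the prefix b, the element names and the class
name NamespaceSketch are assumptions made for the example, not taken from
the files in this handout.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NamespaceSketch {
    // Hypothetical namespace URI, chosen only for this illustration.
    static final String NS = "http://example.org/books";

    // Every element inherits the default namespace declared on <books>.
    static final String BOOKS_XML =
        "<books xmlns='" + NS + "'>"
      + "<book><title>Moby Dick</title></book>"
      + "<book><title>Sword of Honour</title></book>"
      + "</books>";

    // Builds a stylesheet that prints each node selected by the given XPath,
    // followed by a semicolon. The prefix b is bound to the input's namespace.
    static String stylesheet(String select) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
             + " xmlns:b='" + NS + "'>"
             + "<xsl:output method='text'/>"
             + "<xsl:template match='/'>"
             + "<xsl:for-each select='" + select + "'>"
             + "<xsl:value-of select='.'/><xsl:text>;</xsl:text>"
             + "</xsl:for-each>"
             + "</xsl:template></xsl:stylesheet>";
    }

    // Runs the generated stylesheet against BOOKS_XML.
    static String titles(String select) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(stylesheet(select))));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(BOOKS_XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prefixed names match the namespace qualified elements.
        System.out.println("[" + titles("b:books/b:book/b:title") + "]");
        // Unprefixed names refer to elements in no namespace and match nothing.
        System.out.println("[" + titles("books/book/title") + "]");
    }
}
```

In XSLT 1.0 an unprefixed name in an XPath expression always means an
element in no namespace, even when the input document uses a default
namespace. That is why the second call selects nothing.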
Part 3 Running Xalan from within Java
=====================================
While command line xalan makes a very nice tool, it is
often necessary to make calls for XSLT processing from
within other programs. Here is a Java program that
performs the same transformation as above, but this
time under application program control.
This program would be executed with the command:
java ProduceHTML books.xml booklist.xsl result.html
// ProduceHTML.java is a simple program that demonstrates how XSLT programs
// can be executed from within Java.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ProduceHTML {
    public static void main(String[] a) {
        if (a.length != 3) {
            System.out.println("Usage: java ProduceHTML in.xml style.xsl out.html");
            return;
        }
        // try-with-resources closes all three streams, even on failure
        try (FileInputStream xml = new FileInputStream(a[0]);
             FileInputStream xsl = new FileInputStream(a[1]);
             FileOutputStream out = new FileOutputStream(a[2])) {
            // wrap the streams as two sources and one result
            Source xmlDoc = new StreamSource(xml);
            Source xslDoc = new StreamSource(xsl);
            Result result = new StreamResult(out);
            // compile the stylesheet, then apply it to the input document
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer trans = factory.newTransformer(xslDoc);
            trans.transform(xmlDoc, result);
        }
        catch (TransformerException e) {
            System.out.println("Transformer Problem " + e);
        }
        catch (IOException e) {
            System.out.println("An I/O problem " + e);
        }
    }
}
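TransformerFactory.newInstance() does not name Xalan anywhere. It looks up
an implementation at run time, so the processor that actually runs depends
on the classpath. The following sketch prints the class that was chosen.
The class name WhichTransformer is made up for the example. With the Xalan
jars on the classpath you would typically expect a class from an
org.apache.xalan package; without them, the JDK falls back to its bundled
processor.

```java
import javax.xml.transform.TransformerFactory;

public class WhichTransformer {
    // Returns the name of the factory class that JAXP selected at run time.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory in use: " + factoryClassName());
    }
}
```

Run it once with the same classpath as ProduceHTML to see which XSLT
processor your program is really using.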
Part 4. Running XSLT from within a Java servlet.
================================================
Suppose we want to use a local stylesheet called
XSLTransformerCode.xsl to process a remote XML file
at some URL.
Using Eclipse with the Tomcat plugin, add the xsl
stylesheet file to the project so that the file is
under the project but not under any subdirectory of
the project.
A doGet method might have the following code:
    // the stylesheet generates HTML, so tell the browser what to expect
    response.setContentType("text/html");
    PrintWriter out = response.getWriter();
    // get the xsl stored in this project
    ServletContext context = getServletContext();
    InputStream xsl = context.getResourceAsStream("/XSLTransformerCode.xsl");
    // We need two source objects and one result.
    // Get an external xml document using a url in a string format
    // (urlAsString holds the address of the remote XML file).
    Source xmlDoc = new StreamSource(urlAsString);
    Source xslDoc = new StreamSource(xsl);
    Result result = new StreamResult(out);
    // Prepare to transform
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer trans = factory.newTransformer(xslDoc);
    trans.transform(xmlDoc, result);
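The same pairing of sources can be tried outside a servlet container. In
the sketch below, a temporary file addressed by a file: URL stands in for
the remote XML document, and a ByteArrayInputStream stands in for the
stream returned by getResourceAsStream. The document content, the
stylesheet and the class name UrlSourceSketch are assumptions made for the
example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class UrlSourceSketch {
    // Hypothetical stylesheet that prints the text of a <greeting> element.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='greeting'/></xsl:template>"
      + "</xsl:stylesheet>";

    // XML is located by a URL string, the stylesheet is read from a stream.
    static String transform(String urlAsString, InputStream xsl) {
        try {
            Source xmlDoc = new StreamSource(urlAsString);
            Source xslDoc = new StreamSource(xsl);
            StringWriter out = new StringWriter();
            Result result = new StreamResult(out);
            Transformer trans =
                TransformerFactory.newInstance().newTransformer(xslDoc);
            trans.transform(xmlDoc, result);
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    // Writes a small document to a temporary file and transforms it by URL.
    static String demo() {
        try {
            Path doc = Files.createTempFile("greeting", ".xml");
            doc.toFile().deleteOnExit();
            Files.write(doc,
                "<greeting>Hello</greeting>".getBytes(StandardCharsets.UTF_8));
            InputStream xsl =
                new ByteArrayInputStream(XSL.getBytes(StandardCharsets.UTF_8));
            return transform(doc.toUri().toString(), xsl);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the servlet, the StringWriter would be replaced by the PrintWriter
obtained from the response, and the URL string would be an http address.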
Part 5. An RDF document from the W3C
====================================
The following document was accessed from the W3C's main
web page by clicking on the syndicate link. It is meant to
be read by a news reader. We will use it as our input file
for the homework problems below.
World Wide Web Consortium
Leading the Web to Its Full Potential...
http://www.w3.org/
2008-01-22

W3C Publishes HTML 5 Draft, Future of Web Content
2008-01-22: W3C today published an early draft of HTML 5, a major revision of the markup language for the Web. The HTML Working Group is creating HTML 5 to be the open, royalty-free specification for rich Web content and Web applications. "HTML is of course a very important standard," said Tim Berners-Lee, author of the first version of HTML and W3C Director. "I am glad to see that the community of developers, including browser vendors, is working together to create the best possible path for the Web." New features include APIs for drawing two-dimensional graphics and ways to embed and control audio and video content. HTML 5 helps to improve interoperability and reduce software costs by giving precise rules not only about how to handle all correct HTML documents but also how to recover from errors. Discover other new features, read the press release, and learn more about the future of HTML. (Permalink)
http://www.w3.org/News/2008#item8
2008-01-22

Relationship Between Mobile Web and Web Content Accessibility (First Public Working Draft)
2008-01-22: The Mobile Web Best Practices Working Group and the WAI Education and Outreach Working Group have published the First Public Working Draft of Relationship Between Mobile Web Best Practices 1.0 and Web Content Accessibility Guidelines. See the announcement email.
http://www.w3.org/News/2008#item11
2008-01-22

Document Object Model Activity Closed
2008-01-22: W3C's Document Object Model (DOM) Activity is now closed. The Document Object Model Working Group closed in the early 2004 after the completion of the DOM Level 3 Recommendations. Since then, several W3C Working Groups have taken the lead in maintaining and continuing to develop standard APIs for the Web; these include the HTML, SVG, CSS, and WebAPI Working Groups. W3C will continue to develop APIs in various Working Groups. Learn more about achievements of those participating as part of the DOM Activity on the DOM Activity Statement. (Permalink)
http://www.w3.org/News/2008#item10
2008-01-22

W3C Advisory Committee Elects TAG Participants
2008-01-22: The W3C Advisory Committee has elected Ashok Malhotra (Oracle), T.V. Raman (Google), and Henry Thompson (University of Edinburgh) to the W3C Technical Architecture Group (TAG). Continuing TAG participants are Noah Mendelsohn (IBM), David Orchard (BEA), Jonathan Rees (Science Commons), Norm Walsh (Sun), and Stuart Williams (HP), who co-Chairs the TAG with Tim Berners-Lee. The mission of the TAG is to build consensus around principles of Web architecture and to interpret and clarify these principles when necessary, to resolve issues involving general Web architecture brought to the TAG, and to help coordinate cross-technology architecture developments inside and outside W3C. (Permalink)
http://www.w3.org/News/2008#item9
2008-01-22

SPARQL Standard Opens Data on the Web
2008-01-15: Today, the World Wide Web Consortium made it easier to share and reuse data across application, enterprise, and community boundaries with the publication of three new Semantic Web standards for SPARQL (pronounced "sparkle"). SPARQL is the query language for the Semantic Web (see Semantic Web use cases). SPARQL queries hide the details of data management, which lowers costs and increases robustness of data integration on the Web. "Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL," explained Tim Berners-Lee, W3C Director. There are already 14 implementations of the standard, which is comprised of three W3C Recommendations: SPARQL Query Language for RDF, SPARQL Protocol for RDF, and SPARQL Query Results XML Format. Read the press release, testimonials and learn more about the Semantic Web Activity. (Permalink)
http://www.w3.org/News/2008#item6
2008-01-15

W3C Invites Implementations of SMIL 3.0 (Candidate Recommendation)
2008-01-15: The SYMM Working Group has published the Candidate Recommendation of Synchronized Multimedia Integration Language (SMIL 3.0), an XML-based language that allows authors to create interactive multimedia presentations. Using SMIL 3.0, an author can describe the temporal behavior of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen. The Working Group is building a test suite help ensure interoperable implementation. Learn more about W3C work on Synchronized Multimedia (Permalink)
http://www.w3.org/News/2008#item7
2008-01-15

Service Modeling Language 1.1 Drafts
2008-01-14: The Service Modeling Language (SML) Working Group has published the third Working Drafts of Service Modeling Language, Version 1.1 and Service Modeling Language Interchange Format Version 1.1. The former defines the SML 1.1, intended to model complex services and systems, including their structure, constraints, policies, and best practices. The latter defines the SML 1.1 interchange format, designed to ensure accurate and convenient interchange of the documents that make up an SML model. Learn more about the Extensible Markup Language (XML) Activity. (Permalink)
http://www.w3.org/News/2008#item5
2008-01-14

Last Call: SMIL Timesheets 1.0
2008-01-10: The SYMM Working Group has published the Last Call Working Draft of SMIL Timesheets 1.0; this is also the First Public Working Draft. This document defines an XML timing language that makes SMIL 3.0 element and attribute timing control available to a wide range of other XML languages. This language allows SMIL timing to be integrated into a wide variety of a-temporal languages, even when several such languages are combined in a compound document. Because of its similarity with external style and positioning descriptions in the Cascading Style Sheet (CSS) language, this functionality has been termed SMIL Timesheets. Comments are welcome through 15 February. Learn more about W3C work on Synchronized Multimedia. (Permalink)
http://www.w3.org/News/2008#item4
2008-01-10

W3C Welcomes Review of Three OWL 1.1 First Public Drafts
2008-01-08: The OWL Working Group has published the First Public Working Draft of three Web Ontology Language (OWL) 1.1 specifications: Structural Specification and Functional-Style Syntax, Model-Theoretic Semantics, and Mapping to RDF Graphs. OWL is used to define Semantic Web vocabularies. Together, these new specifications extend the W3C OWL Web Ontology Language 1.0 with a small but useful set of features that have been requested by users, for which effective reasoning algorithms are now available, and that OWL tool developers are willing to support. The three specifications cover, respectively, the syntax, semantics, and mapping to RDF of OWL 1.1 ontologies. Learn more about the W3C Semantic Web Activity. (Permalink)
http://www.w3.org/News/2008#item3
2008-01-08

XHTML Access Module; Comments Welcome
2008-01-07: The XHTML2 Working Group has published the First Public Working Draft of XHTML Access Module. This document is intended to help make XHTML-family markup languages more effective at supporting the needs of the accessibility community. It does so by providing a generic mechanism for defining the relationship between document components and well-known accessibility taxonomies. Learn more about the HTML Activity. (Permalink)
http://www.w3.org/News/2008#item2
2008-01-07

W3C Talks in January
2008-01-03: Browse W3C presentations and events also available as an RSS channel. (Permalink)
http://www.w3.org/News/2008#item1
2008-01-03

Last Call: Selectors API; New Draft of DOM Level 3 Events
2007-12-21: The Web API Working Group has published the Last Call Working Draft of Selectors API. Selectors, which are widely used in CSS, are patterns that match against elements in a tree structure. The Selectors API specification defines methods for retrieving Element nodes from the Document Object Model (DOM) by matching against a group of selectors. Comments are welcome through 06 January 2008. The Working Group has also published a Working Draft of DOM Level 3 Events, a generic platform- and language-neutral event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. Learn more about the Rich Web Client Activity. (Permalink)
http://www.w3.org/News/2007#item270
2007-12-21

W3C Invites Implementations of DCCI 1.0 (Candidate Recommendation); first draft of Delivery Context Ontology available
2007-12-21: The Ubiquitous Web Applications Working Group has published the Candidate Recommendation of Delivery Context: Client Interfaces (DCCI) 1.0. This document defines platform and language neutral programming interfaces that provide Web applications access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions. In addition, the Working Group has published the First Public Working Draft of Delivery Context Ontology, which provides a formal model for the delivery context which other specifications can reference normatively. Learn more about the Ubiquitous Web Applications Activity. (Permalink)
http://www.w3.org/News/2007#item269
2007-12-21

Last Call: SVG Print 1.2 Language, Primer
2007-12-21: The SVG Working Group has published Last Call Working Drafts of SVG Print 1.2, Part 2: Language and SVG Print 1.2, Part 1: Primer. The former defines features of the Scalable Vector Graphics (SVG) Language that are specifically for printing environments; the latter provides guidelines on how to use the print specification with SVG 1.2 Tiny and SVG 1.2 Full modules. Comments on both specifications are welcome through 08 February. Learn more about the Graphics Activity. (Permalink)
http://www.w3.org/News/2007#item268
2007-12-21

Device Description Repository Core Vocabulary
2007-12-21: The Mobile Web Initiative Device Description Working Group has published the First Public Working Draft of Device Description Repository Core Vocabulary. This document describes the Device Description Repository Core Vocabulary for Content Adaptation, that is, the properties that are considered essential for adaptation of content in the mobile Web. Its intended use is to define a baseline vocabulary for implementations of the Device Description Repository (DDR). Learn more about the Mobile Web Initiative Activity. (Permalink)
http://www.w3.org/News/2007#item271
2007-12-21

Public Virtual Seminar on Web Issues to be Organized by W3C Spain Office
2007-12-20: On 23 January 2008, the W3C Spain Office will hold a virtual seminar where W3C staff will discuss the latest news in Web topics such as e-Government, Video on the Web, and Mobile Web in developing countries; see the program for the full list of topics and speakers. The public is invited to participate over the Internet in the seminar, which will take place in English from 15:00 to 18:00 (CET); see the participation instructions. The seminar, hosted by UPM, will also be broadcast online. Learn more about the W3C Spain Office. (Permalink)
http://www.w3.org/News/2007#item267
2007-12-20
Part 6 Introductory XSLT Programming
====================================
(1) 10 Points. Using command line XSLT, write an XSLT program that displays
the contents of the title, description and link fields that
are direct children of the channel element. Your output will
be marked up as HTML and will appear in a browser as follows:
W3C RDF Document
* World Wide Web Consortium
* Leading the Web to Its Full Potential...
* http://www.w3.org/
(2) 10 Points. Using command line XSLT, write an XSLT program that displays
the number of RDF list items that appear in the document. You must use
the XSLT count function in your solution. Your output will
be marked up as HTML and will appear in a browser as follows:
Counting RDF list items
16
(3) 10 Points. Using command line XSLT, write an XSLT program that
displays the content of each title element that is inside an item
element. Your output will be marked up as HTML and will appear in a
browser as follows (unordered list elements are shown with an asterisk):
Titles
* W3C Publishes HTML 5 Draft, Future of Web Content
* Relationship Between Mobile Web and Web Content
Accessibility (First Public Working Draft)
:
:
* Public Virtual Seminar on Web Issues to be Organized
by W3C Spain Office
(4) 10 Points. Using command line XSLT, write an XSLT program that
displays the content of each title element that is inside an item
element. Your output will be marked up as HTML and will appear in a
browser with the titles underlined as hypertext links. If the user
clicks on a link the browser will fetch the associated document that
is pointed to by the link element. The output on the browser will
appear as follows (hypertext links are shown with an underline).
Titles
W3C Publishes HTML 5 Draft, Future of Web Content
-------------------------------------------------
:
:
Public Virtual Seminar on Web Issues to be Organized by W3C Spain Office
------------------------------------------------------------------------
(5) 50 Points. Write a JSP page that asks the user to select a topic from
a drop down list. The topics will be Business, Technology and World News.
Once a selection is made, the browser will call a Java servlet, passing
along the topic. The servlet will fetch the appropriate RSS 2.0 feed from
the NY Times web site and apply a style sheet that generates HTML for
the browser. The HTML display will show the title of each news item as a
link, so that the user can click a link to visit the associated page.
Note that there are no namespaces used in RSS 2.0.
New York Times feeds may be found at http://www.nytimes.com/services/xml/rss/
(6) 5 Points. Add a drop down box for the source of the feeds to the
application that you built in question 5. The user will be able to select
a topic and a source. At a minimum, you will need to provide two sources:
the BBC feeds and the New York Times feeds.
The BBC feeds are available from:
http://news.bbc.co.uk/1/hi/help/3223484.stm
(7) Between 0 and 5 points. Add some cool feature to the application you
built in question 6 and demonstrate it in class.