95-733 Internet Technologies Spring 2009
Homework 2 Due: Tuesday, February 10
Lab Topic: XML and the Extensible Stylesheet Language Transformations (XSLT)
In this lab we will be programming in a transformation language called
XSLT. XSLT is used to transform one XML document into another XML document
(with a different structure). In order to write programs in XSLT, we
need an XML parser (XSLT programs are XML documents) and an XSLT
interpreter. The parser is called "Xerces". The interpreter is called
"Xalan" (Xalan uses Xerces).
The required jar files for XSLT processing using Xalan are: xalan.jar,
xercesImpl.jar, xml-apis.jar and xsltc.jar. These may be downloaded
from the Apache Foundation.
Part 1 Command Line XSLT
========================
For DOS-based machines, create a directory called "bats" and place
a batch file called "xalan.bat" in that directory. Place the path
to your bats directory in the system path variable.
The file xalan.bat will hold the following:
java org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3
You will need to have the jar files mentioned above on your classpath
before running xalan.bat.
For Unix-based machines, you will use a script file called xalan with
execute permissions. My xalan jar files are saved in
/Users/mm6/Applications/xalan.
My xalan script is shown below.
#!/bin/sh
export XALAN_HOME=/Users/mm6/Applications/xalan
export CP=$XALAN_HOME/xalan.jar:$XALAN_HOME/xercesImpl.jar:$XALAN_HOME/xml-apis.jar:$XALAN_HOME/xsltc.jar
java -classpath $CP org.apache.xalan.xslt.Process -IN $1 -XSL $2 -OUT $3
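Before testing, it can help to confirm which XSLT processor Java will actually use. The following is a minimal sketch using the standard JAXP API; the class name WhichTransformer is made up for illustration. If the Xalan jars are on the classpath, the printed name would typically start with org.apache.xalan; otherwise the processor bundled with the JDK is reported.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which TransformerFactory implementation
// JAXP selects for the current classpath.
class WhichTransformer {

    static String factoryClassName() {
        // newInstance() looks up the configured or default implementation
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("XSLT processor: " + factoryClassName());
    }
}
```

Run it with the same classpath that the xalan script uses.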
Testing. The following is an XML file called books.xml that contains data
on books. It's a copy of the file found on page 70 of the XSLT Programmer's
Reference by Michael Kay.
Nigel Rees          Sayings of the Century    8.95
Evelyn Waugh        Sword of Honour           12.99
Herman Melville     Moby Dick                 8.99
J. R. R. Tolkien    The Lord of the Rings     22.99
We would like to transform this file into an HTML document as shown here (result.html):
A list of books

1    Nigel Rees          Sayings of the Century    8.95
2    Evelyn Waugh        Sword of Honour           12.99
3    Herman Melville     Moby Dick                 8.99
4    J. R. R. Tolkien    The Lord of the Rings     22.99
In order to carry out this transformation, we will use the XSLT
programming language. XSLT is Turing complete, so in principle it
can express any computation, but it is especially well suited to
performing XML transformations.
Our first XSLT program looks like this
(booklist.xsl):
A list of books
Place the two files (books.xml and booklist.xsl) into a directory
and make sure that xalan is working properly by running the
following command. The output file should look like result.html.
xalan books.xml booklist.xsl result.html
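To make the shape of such a transformation concrete, here is a small self-contained sketch that runs a stylesheet over an in-memory document through JAXP. It is not the booklist.xsl from this lab. The element names (books, book, author, title, price), the stylesheet text and the class name BookListSketch are all assumptions made for illustration, and only two of the four books are included.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BookListSketch {

    // Assumed document structure: books/book/(author, title, price)
    static final String BOOKS_XML =
        "<books>"
      + "<book><author>Nigel Rees</author><title>Sayings of the Century</title>"
      + "<price>8.95</price></book>"
      + "<book><author>Herman Melville</author><title>Moby Dick</title>"
      + "<price>8.99</price></book>"
      + "</books>";

    // Assumed stylesheet: one table row per book, numbered by position
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/books'>"
      + "<html><body><h1>A list of books</h1>"
      + "<table><xsl:apply-templates select='book'/></table>"
      + "</body></html>"
      + "</xsl:template>"
      + "<xsl:template match='book'>"
      + "<tr>"
      + "<td><xsl:value-of select='position()'/></td>"
      + "<td><xsl:value-of select='author'/></td>"
      + "<td><xsl:value-of select='title'/></td>"
      + "<td><xsl:value-of select='price'/></td>"
      + "</tr>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result
    static String transform(String xml, String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(BOOKS_XML, STYLESHEET));
    }
}
```

The first template builds the page skeleton once for the root element. The second template is applied once per book, which is the usual division of labor in an XSLT program.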
When debugging XSLT programs, it is often much more helpful to
view your output in an editor like Notepad than to view it in
a browser like Netscape, IE or Safari. Look at the HTML document
in a browser only after you are satisfied with the way it looks
in Notepad. The browser view is often quite deceiving and makes
a poor debugging tool.
Part 2 Handling Namespaces
==========================
Many documents make use of XML namespaces to remove ambiguity.
The following is our books example with a namespace bound to
the prefix p.
Nigel Rees          Sayings of the Century    8.95
Evelyn Waugh        Sword of Honour           12.99
Herman Melville     Moby Dick                 8.99
J. R. R. Tolkien    The Lord of the Rings     22.99
The same XSLT program that we wrote above needs to be adapted
to handle these namespace qualified elements. Be sure to test
this new program against the books file with namespaces.
A list of books
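The key change for namespace-qualified input is that the stylesheet must declare the same namespace URI and use a prefix in its match patterns and select expressions. An unprefixed name such as book only matches elements in no namespace. The sketch below illustrates this under stated assumptions: the namespace URI http://example.org/books, the element names and the class name NamespaceBookListSketch are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespaceBookListSketch {

    // Hypothetical namespace URI, bound to the prefix p in both documents
    static final String NS = "http://example.org/books";

    static final String BOOKS_XML =
        "<p:books xmlns:p='" + NS + "'>"
      + "<p:book><p:author>Evelyn Waugh</p:author>"
      + "<p:title>Sword of Honour</p:title><p:price>12.99</p:price></p:book>"
      + "</p:books>";

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:p='" + NS + "'"
      // exclude-result-prefixes asks the processor not to copy the
      // p declaration onto the generated HTML elements
      + " exclude-result-prefixes='p'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/p:books'>"
      + "<html><body><h1>A list of books</h1>"
      + "<table><xsl:apply-templates select='p:book'/></table>"
      + "</body></html>"
      + "</xsl:template>"
      + "<xsl:template match='p:book'>"
      + "<tr>"
      + "<td><xsl:value-of select='position()'/></td>"
      + "<td><xsl:value-of select='p:author'/></td>"
      + "<td><xsl:value-of select='p:title'/></td>"
      + "<td><xsl:value-of select='p:price'/></td>"
      + "</tr>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(BOOKS_XML, STYLESHEET));
    }
}
```

The prefix used in the stylesheet does not have to be p. Only the namespace URI has to agree with the one in the input document.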
Part 3 Running Xalan from within Java
=====================================
While command line xalan makes a very nice tool, it is
often necessary to make calls for XSLT processing from
within other programs. Here is a Java program that
performs the same transformation as above, but this
time under application program control.
This program would be executed with the command:
java ProduceHTML books.xml booklist.xsl result.html
// ProduceHTML.java is a simple program that demonstrates how XSLT programs
// can be executed from within Java.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ProduceHTML {
    public static void main(String[] a) {
        // a[0] = input XML file, a[1] = XSLT stylesheet, a[2] = output file.
        // try-with-resources closes all three streams, even on failure.
        try (FileInputStream xml = new FileInputStream(a[0]);
             FileInputStream xsl = new FileInputStream(a[1]);
             FileOutputStream out = new FileOutputStream(a[2])) {
            Source xmlDoc = new StreamSource(xml);
            Source xslDoc = new StreamSource(xsl);
            Result result = new StreamResult(out);
            // Compile the stylesheet, then apply it to the XML document
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer trans = factory.newTransformer(xslDoc);
            trans.transform(xmlDoc, result);
        }
        catch (TransformerException e) {
            System.out.println("Transformer Problem: " + e);
        }
        catch (IOException e) {
            System.out.println("An I/O problem: " + e);
        }
    }
}
Part 4. Running XSLT from within a Java servlet.
================================================
Suppose we want to use a local stylesheet called
XSLTransformerCode.xsl to process a remote XML file
at some URL.
Using Eclipse with the Tomcat plugin, add the xsl
stylesheet file to the project so that the file is
under the project but not under any subdirectory of
the project.
A doGet method might have the following code:
PrintWriter out = response.getWriter();
// Get the XSL stylesheet stored in this project
ServletContext context = getServletContext();
InputStream xsl = context.getResourceAsStream("/XSLTransformerCode.xsl");
// We need two source objects and one result.
// Get an external XML document using a URL held in
// string format (urlAsString)
Source xmlDoc = new StreamSource(urlAsString);
Source xslDoc = new StreamSource(xsl);
Result result = new StreamResult(out);
// Prepare to transform
TransformerFactory factory = TransformerFactory.newInstance();
Transformer trans = factory.newTransformer(xslDoc);
trans.transform(xmlDoc, result);
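The same steps can be tried outside a servlet container. The sketch below wraps them in a plain method so that the URL handling can be tested from the command line. The class name RemoteTransformSketch and the method signature are made up for illustration. The point it shows is that a StreamSource built from a string treats that string as a URL (system ID) and lets the parser open it.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.Writer;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RemoteTransformSketch {

    // urlAsString: where the XML lives (http:, file:, ...)
    // xsl:         the stylesheet, already opened as a stream
    // out:         where the transformed output is written
    static void transform(String urlAsString, InputStream xsl, Writer out)
            throws TransformerException {
        Source xmlDoc = new StreamSource(urlAsString); // parser opens the URL
        Source xslDoc = new StreamSource(xsl);
        Result result = new StreamResult(out);
        Transformer trans = TransformerFactory.newInstance().newTransformer(xslDoc);
        trans.transform(xmlDoc, result);
    }

    public static void main(String[] args) throws Exception {
        // args[0] = URL of the XML document, args[1] = path to the stylesheet
        try (InputStream xsl = new FileInputStream(args[1])) {
            Writer out = new PrintWriter(System.out);
            transform(args[0], xsl, out);
            out.flush();
        }
    }
}
```

In a servlet, the Writer would be the one returned by response.getWriter() and the stylesheet stream would come from the servlet context, as in the fragment above.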
Part 5. An RDF document from the W3C
====================================
The following document was accessed from the W3C's main
web page by clicking on the syndicate link. It is meant to
be read by a news reader. We will use it as our input file
for the homework problems below.
World Wide Web Consortium - Web Standards
Leading the Web to Its Full Potential...
http://www.w3.org/
2009-01-20

Future of Social Networking Workshop Begins
2009-01-15: Today began a 2-day Workshop on the Future of Social Networking, organized by W3C to explore the landscape of social networking technologies. Participants submitted 72 position papers on a wide range of topics regarding the growth and future of social networking, including, but not limited to, the mobile context. The meeting is hosted in Barcelona, Spain by Universitat Politècnica de Catalunya and ReadyPeople. Many thanks to the hosts and to Silver Sponsors Ayuntamiento de Zaragoza, Flock, and Peperoni for their support. (Permalink)
http://www.w3.org/News/2009#item3
2009-01-15

Use Cases and Requirements for Ontology and API for Media Object 1.0
2009-01-20: The Media Annotations Working Group has published the First Public Working Draft of Use Cases and Requirements for Ontology and API for Media Object 1.0. This document specifies use cases and requirements as an input for the development of the "Ontology for Media Object 1.0" and the "API for Media Object 1.0". The ontology will be a simple ontology to support cross-community data integration of information related to media objects on the Web. The API will provide read access and potentially write access to media objects, relying on the definitions from the ontology. Learn more about the Video in the Web Activity. (Permalink)
http://www.w3.org/News/2009#item5
2009-01-20

W3C Invites Implementations of CURIE Syntax 1.0
2009-01-16: The XHTML2 Working Group invites implementation of the Candidate Recommendation of CURIE Syntax 1.0. This document defines a generic, abbreviated syntax for expressing URIs. This syntax is intended to be used as a common element by language designers. The intended audience for this document is Language designers, not the users of those Languages. Track implementations in an ongoing implementation report and learn more about the HTML Activity. (Permalink)
http://www.w3.org/News/2009#item4
2009-01-16

W3C Advisory Committee Elects TAG Participants
2009-01-13: The W3C Advisory Committee has elected John Kemp (Nokia), Larry Masinter (Adobe), and T.V. Raman (Google) to the W3C Technical Architecture Group (TAG). Continuing TAG participants are Ashok Malhotra (Oracle), Noah Mendelsohn (IBM, appointed), Jonathan Rees (Science Commons, appointed), and Henry Thompson (U. of Edinburgh). The Director is expected to appoint one individual as well. The mission of the TAG is to build consensus around principles of Web architecture and to interpret and clarify these principles when necessary, to resolve issues involving general Web architecture brought to the TAG, and to help coordinate cross-technology architecture developments inside and outside W3C. (Permalink)
http://www.w3.org/News/2009#item2
2009-01-13

W3C Talks in January
2009-01-05: Browse W3C presentations and events also available as an RSS channel. (Permalink)
http://www.w3.org/News/2009#item1
2009-01-05

Element Traversal Specification Is a W3C Recommendation
2008-12-22: The Web Applications Working Group has published the W3C Recommendation of Element Traversal Specification. This specification defines the ElementTraversal interface, which allows script navigation of the elements of a DOM tree, excluding all other nodes in the DOM, such as text nodes. It also provides an attribute to expose the number of child elements of an element. It is intended to provide a more convenient alternative to existing DOM navigation interfaces, with a low implementation footprint. Learn more about the Rich Web Client Activity. (Permalink)
http://www.w3.org/News/2008#item222
2008-12-22

Web IDL Draft Published
2008-12-22: The Web Applications Working Group has published the Working Draft of Web IDL. This document defines an interface definition language, Web IDL, that can be used to describe interfaces that are intended to be implemented in web browsers. Web IDL is an IDL variant with a number of features that allow the behavior of common script objects in the web platform to be specified more readily. How interfaces described with Web IDL correspond to constructs within ECMAScript and Java execution environments is also detailed. Learn more about the Rich Web Client Activity. (Permalink)
http://www.w3.org/News/2008#item221
2008-12-22

SVG Tiny 1.2 Advances State of the Art for Web Graphics
2008-12-22: Creating beautiful and accessible interactive content was made easier today with the release of the Scalable Vector Graphics (SVG) Tiny 1.2 Recommendation. Already implemented and deployed in mobile phones, media centers, and browsers around the world, this open standard allows authors to build documents and interfaces for the Web, with open-source and commercial authoring tools that output open, reusable content. Searchable, internationalized text and user-created metadata bring the Semantic Web to graphics, and improve the experience of users everywhere, while easier programming interfaces put the power in the hands of developers. A test suite helps to ensure interoperable SVG content in modern Web browsers, making it easier than ever to develop and deploy the right look and feel. Read the testimonials and start creating content today. Learn more about the Graphics Activity. (Permalink)
http://www.w3.org/News/2008#item223
2008-12-22
Part 6 Introductory XSLT Programming
====================================
(1) 10 Points. Using command line XSLT, write an XSLT program that displays
the contents of the title, description and link fields that
are direct children of the channel element. Your output will
be marked up as HTML and will appear in a browser as follows:
W3C RDF Document
* World Wide Web Consortium
* Leading the Web to Its Full Potential...
* http://www.w3.org/
(2) 10 Points. Using command line XSLT, write an XSLT program that displays
the number of RDF list items that appear in the document. You must use
the XSLT count function in your solution. Your output will
be marked up as HTML and will appear in a browser as follows:
Counting RDF list items
8
(3) 10 Points. Using command line XSLT, write an XSLT program that
displays the content of each title element that is inside an item
element. Your output will be marked up as HTML and will appear in a
browser as follows (unordered list elements are shown with an asterisk):
Titles
* Future of Social Networking Workshop Begins
* Use Cases and Requirements for Ontology and API for Media Object 1.0
* W3C Invites Implementations of CURIE Syntax 1.0
* W3C Advisory Committee Elects TAG Participants
* W3C Talks in January
* Element Traversal Specification Is a W3C Recommendation
* Web IDL Draft Published
* SVG Tiny 1.2 Advances State of the Art for Web Graphics
(4) 10 Points. Using command line XSLT, write an XSLT program that
displays the content of each title element that is inside an item
element. Your output will be marked up as HTML and will appear in a
browser with the titles underlined as hypertext links. If the user
clicks on a link the browser will fetch the associated document that
is pointed to by the link element. The output on the browser will
appear as follows (hypertext links are shown with an underline).
Titles (with links)
Future of Social Networking Workshop Begins
-------------------------------------------
:
:
SVG Tiny 1.2 Advances State of the Art for Web Graphics
-------------------------------------------------------
(5) 50 Points. Write a JSP page that asks the user to select a topic from
a drop-down list. The three topics will be Business, Technology and
World News. Once a selection is made, the browser will call a Java
servlet, passing along the topic.
The servlet will fetch the appropriate RSS 2.0 feed from the NY Times
web site. It will apply a style sheet that generates HTML for
the browser. The HTML display will show the news title of each item.
Each news title will be displayed as a link. The user will be able to
click links to visit the associated page. Note that the core RSS 2.0
elements are not in any namespace.
New York Times feeds may be found at http://www.nytimes.com/services/xml/rss/
(6) 5 Points. Add a drop-down box for the source of feeds to the application that
you built in question 5. The user will be able to select a topic and
a source. At a minimum, you will need to provide for three sources. In
my solution, I used the BBC, the New York Times and the Sydney Morning
Herald.
The BBC feeds are available from:
http://news.bbc.co.uk/1/hi/help/3223484.stm
The Sydney Morning Herald feeds are available
from:
http://www.smh.com.au/rsschannels/
(7) Between 0 and 5 points. Add some cool feature to the application you
built in question 6 and demonstrate it in class.
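As a warm-up for the count function required in question 2, it may help to try it first on a tiny document. The sketch below is not a solution to any of the questions. It counts book elements in a small books document whose element names are assumed, using a stylesheet with text output. The class name CountSketch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CountSketch {

    // Assumed structure: a books root with two book children
    static final String BOOKS_XML =
        "<books>"
      + "<book><title>Moby Dick</title></book>"
      + "<book><title>Sword of Honour</title></book>"
      + "</books>";

    // count() takes a node-set and returns how many nodes it contains
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='count(/books/book)'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String countBooks() throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(BOOKS_XML)),
                    new StreamResult(out));
        return out.toString().trim();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(countBooks());
    }
}
```

For the namespace-qualified RDF document, the path passed to count would need prefixed names, with the prefixes declared in the stylesheet.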