October 26, 2010 (Lecture 18)

Programming for XML

Today we discussed tools available for programming with XML. In particular, we spoke a bit about the Document Object Model (DOM), the Simple API for XML (SAX), and XPath. We also briefly mentioned XQuery.

DOM parses an XML file and builds, within memory, the tree that it represents. We can imagine that it is a standard sibling tree, with parent-child and sibling pointers. The DOM API allows a programmer to traverse the tree in much the same way a 15-1xx student might traverse some tree built in memory. DOM is a nice model, but it is limited by the fact that the whole tree needs to fit into memory, hopefully without paging. And, it isn't necessarily super-useful if the end game is to process each of the subtrees exactly once, for example when loading records, rather than directly asking questions about the data the tree represents.
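As a rough illustration of that traversal, here is a minimal sketch using Python's standard xml.dom.minidom module. The sample document, the walk function, and the names XML and visited are made up for this example and are not from the lecture. It follows first-child and next-sibling pointers, just as one would with a sibling tree built by hand.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
XML = """<library>
  <book id="b1"><title>Dune</title></book>
  <book id="b2"><title>Emma</title></book>
</library>"""

def walk(node, depth=0, out=None):
    """Depth-first walk using first-child and next-sibling pointers."""
    if out is None:
        out = []
    # Record only element nodes; whitespace text nodes are skipped.
    if node.nodeType == node.ELEMENT_NODE:
        out.append((depth, node.tagName))
    child = node.firstChild
    while child is not None:
        walk(child, depth + 1, out)
        child = child.nextSibling
    return out

# The whole tree is built in memory before we touch any of it.
doc = minidom.parseString(XML)
visited = walk(doc.documentElement)
for depth, name in visited:
    print("  " * depth + name)
```

Note that parseString builds the entire tree before walk ever runs, which is exactly the memory cost described above.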

SAX is very different. It walks through an XML document, making what are, in effect, callbacks each time it meets the beginning or end of an element. These callbacks, functions provided by the application programmer, can inspect the element and its attributes and take whatever action might be appropriate, e.g. recognize the end of a record and store it within a database.
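The callback style might look something like the following sketch, written against Python's standard xml.sax module. The sample document, the RecordHandler class, and the records list (which stands in for a real database) are assumptions made for illustration only.

```python
import xml.sax

# Hypothetical sample document, invented for this sketch.
XML = """<library>
  <book id="b1"><title>Dune</title></book>
  <book id="b2"><title>Emma</title></book>
</library>"""

class RecordHandler(xml.sax.ContentHandler):
    """Collects one dict per <book>; the list stands in for a database."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None
        self._text = []

    def startElement(self, name, attrs):
        # Called when the parser meets the beginning of an element.
        if name == "book":
            self._current = {"id": attrs.get("id")}
        self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        self._text.append(content)

    def endElement(self, name):
        # Called when the parser meets the end of an element.
        if name == "title" and self._current is not None:
            self._current["title"] = "".join(self._text).strip()
        elif name == "book":
            # End of a record: "store" it and forget it.
            self.records.append(self._current)
            self._current = None

handler = RecordHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
print(handler.records)
```

Only the record currently being assembled is held by the handler, so the document itself never needs to fit in memory as a tree.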

XPath is a technology strongly associated with DOM. To understand it, we imagine a DOM tree. And, in so doing, we think of it in the same way we might think of a file system tree. Just as we can identify a file within the filesystem by its path, using a /-separated notation, we identify an element within an XML document the same way -- using a /-separated path through its DOM tree. And, just as, given the path to a file, we can ask questions about its children and attributes, we can do the same thing with an element of an XML document identified by its XPath.
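To make the path idea concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. Two caveats: ElementTree is not a DOM implementation, and it supports only a limited subset of XPath, so treat this as an approximation of the idea rather than full XPath. The sample document and variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
XML = """<library>
  <book id="b1"><title>Dune</title></book>
  <book id="b2"><title>Emma</title></book>
</library>"""

root = ET.fromstring(XML)

# A /-separated path, relative to the root element <library>.
titles = [t.text for t in root.findall("./book/title")]

# A path with a predicate on an attribute, much like picking one file.
second = root.find("./book[@id='b2']")

# Once we hold the element, we can ask about its attributes and children.
second_id = second.get("id")
second_children = [child.tag for child in second]

print(titles, second_id, second_children)
```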

XQuery is, in some ways, much like XPath. The key difference is that it allows one to ask questions, e.g. matching, and manipulate sets of nodes that contain the answers.

Tutorials

In class, we walked through bits of the following tutorials and examples -- the Web is rich with them. Look at others, too.