October 28, 2010 (Lecture 19)

October 28, 2010 (Lecture 19)

Limits of The Simple, Pure Relational Database Model

We haven't spent a whole lot of time reflecting on it this semester, but the data model for a pure relational database is quite simple: A two dimentional table of simple, primitive data. For example, we can't have a table that includes user-defined structured data, e.g. C-like structs. Today, we are going to spend some time exploring these limitations in theory, the "rich" features of modern object-relational databases that bring us past some of them, and some features that remain available only in truely object-oriented databases.
In discussing features generally available in modern object-relational databases, we'll show examples in SQL:1999 syntax. But, since most major commerical databases began implementing object-relational features well before the drafting of this standard, this syntax is not necessarily well-accepted, especially around the edges.

Limits of Simple Relational Databases: Aggregates

As I mentioned a moment ago, pure relational databases are tables of simple attributes -- not structured or aggregate data. This doesn't limit the types of prolems that can be solved. We are also already familiar with managing aggregate types using relational databases. We do one of two things: (i) Placing the aggregate type into its own relation and referencing rows by key from the original relation, or (b) unnesting and flattening the aggregate type, making its elements part of the original relation.
But, each of these techniques come at a cost: creating a new table requires another lookup, and flattening the aggregate type loses its identity and brings our implementation farther from the high-level design and business logic.
Modern relational databases aren't necessarily as restrictive as their simpler, purer ancestors. Beginning with SQL:1999, structured data, a.k.a. aggregates, were added to the language. For example, we can create an aggregate type to represent an address as follows:
  create tye Address as 
  (number integer,
   street varchar(30),
   city varchar(20),
   zipcode char(5),
   zipplus char(4)) final;
  

Limits of Simple Relational Databases: Lists

Another limit of the simple relational model is that we can't maintain a list as a single attribute. But, again, we can make it work -- and we already know how. Instead of keeping the list within a table, we place a key into the table that refers to another table that maintains the list, e.g. can maintain many items with the same key, which serves as a variable sized list of items.
SQL:1999 provides fairly significant support for lists in the form of arrays and multisets. Multisets are, in some sense, arrays without ordering. Arrays and multisets are easily declared, as follows:
  array ['Dick', 'Jane', 'Mary', 'Mike'];
  multiset ['Dick', 'Jane', 'Mary', 'Mike'];
  
We can access the elements of an array or multiset by unnesting it into a table as follows:
  insert into courses 
  values ('Databases', 'Fall 2010', 'Kesden', multiset ['Mogallapu', 'Ganguly']);

  ...
  
  select course from courses
  where 'Ganguly' in (unnest(ta_set));
  
Additionally, if, instead of using a multiset, we use an array, we can access the elements by index, as shown below:
  insert into courses 
  values ('Databases', 'Fall 2010', 'Kesden', array ['Mogallapu', 'Ganguly']);

  ...
  
  select course from courses
  where ta_array[1] = 'Mogallapu';
  

Limits of Simple Relational Databases: Subtypes and Inheritence

Simple relational databases are also missing some useful tools for describing how different real-world types relate to each other. For example, under the object-oriented model, we are able to describe sub-types and leverage this relationship to ease implementation and maintenance through inheritence. Without inheritence, we are left redescribing common aspects of data types and losing information as we translate from our high-level design to one that is representable within the database. The situation is akin to the situation we might have translating from an OO design into a C implementation, as compared to translating it into C++ or Java.
When extended to tables, the idea of inheritence becomes even more powerful. Imagine that StudentEmployee and StaffEmployee are types of Employee, e.g. they inherit from Employee. If we create a table Employees of Employee, and then create subtables StudentEmployees of StudentEmployee and StaffEmployees of StaffEmployee, any time we add a record to either StudentEmployee or StaffEmployee, a record containing the common, inherited component, also appears in the Employees table.
Again, modern relational databases provide a little bit of help here. For example, SQL:1999 allows subtyping via inheritence. Below is an example. Notice that "final" indicates that subclasses cannot be created from the defined type, whereas "not final" allows the definition to be used as the basis for defining subtypes.
  create type Employee
  (fname varchar(20),
   lname varchar (30),
   department varchar (20),
   eid  integer) not final;

  create type StudentEmployee under Employee
  (year integer;
   major varchar(20)) final;

  create type StaffEmployee under Employee
  (vacationDays integer,
   vestedYears integer) final;
  
And, similarly, we can use inheritence on tables, as follows:
  create table Employees of Employee;

  create table StudentEmployees of StudentEmployee under Employees;
  create table StaffEmployees of StaffEmployee under Employees;
  

Limits of Simple Relational Databases: Behaviors and Methods

If we move past talking about data, and start talking about objects, we also find limitations of the simple relational model. We'd like to be able to create Abstract Data Types (ADTs) that include not only data, but also methods that both limit how it can be accessed and make accessing it in common but sophisticated ways convenient. In other words, we'd like to have methods in addition to attributes.
But, at present, even the "richest" relational databases offer only very limited support for methods. In addition to fairly traditional constructors, simple methods can be defined. For example, consider the following SQL:1999 examples. In doing so, note that, as with many object-oriented langauges, constructors have the same name as the type they initialize. Also note that SQL:1999 uses the term "function" in the way that Java would use the term "static method" or "class method". And, simililarly, "self" and "this" are synonymous.
  create type Address as 
  (number integer,
   street varchar(30),
   city varchar(20),
   zipcode char(5),
   zipplus char(4),

   method getZip() returns char(10)) final;

   create instance method getZip() returns number for Address
   begin
     return zipcode + '-' + zipplus;
   end

   create function Address (number integer, street varchar(30), city varchar(20),
                            zipcode char(5), zipplus char(4)) returns Address
   begin
     set self.number = number;
     set self.street = street;
     set self.city = city;
     set self.zipcode = zipcode;
     set self.zipplus = zipplus;
   end
  

Limits of Simple Relational Databases: References

In simple relational databases, we often tie relations together by including, in the attributes of one relation, an attribute which can serve as a key into another relation. As you know, this type of reference is called a foreign key. This foreign key, in effect, serves as a reference from one object, a row, to another object, a row in another table.
Object-oriented enviornments have references as first-class language features. This makes the langauge more expressive, decreasing the complexity of the application. It also allows for better type checking. In the example below, the reference "employee" is limited to an attribute of the employees table:
  create type CConStation (
    location varchar(20),
    phone varchar(10),
    ccon_name ref(Employee) scope employees
  );

  create table CConStations of CConStation;

  

Fully Object-Oriented Databases

Thus far in today's class, we've explored a bunch of object-oriented features that are not part of traditional relational databases, but are part of modern object-relational databases. But, what is a fully object oriented database? What is the gap between here and there?
Well, I think one litmus test might be object persistence. Imagine programming in Java or C++, storing an instance of an object to a database, searching that database for the object by some attributed at a later point in time, and, upon finding it, bring it back to life by re-instantiating it. This is the promise of a fully object-oriented database. For example, imagine serializing a Java object, storing it into a database that could understand its attributes, querying this database, and then having the results of this query take the form of an array of deserialized objects within the program.
This is obviously a really complicated business for a lot of reasons. Sure, object-oriented languages are rich in features. But, beyond that, they are also very rich in their types. For example, the object-oriented data base might well need to make use of Comparators to execute queries. This is slow and complex. And, worse, every object-oriented language has its own set of rich but differing language features, etc. If any of you happen to be familiar with CORBA, think of the nightmare that we have here.
Even if a database were designed to support exactly Java, queries might need to be limited to primitive attribute types, etc. We can imagine a system that serializes Java objects to capture their attributes in XML, normalizes this to a flat (not tree-structured) format, and then stores that. Reinstantiation after a query could be accomplished by the reverse.
And, even this isn't the end of the complexity. What happens if the same instance is reinstantiated multiple times? Does the program logic still function? Do you see the can of worms? Reality dictates that object-oriented databases limit the scope of the object model that they support, the ways reinstantiated objects can be used, and the types of queries that they can execute.
At the present point in time, any truly and fully object-oriented database would be very complex, would likely only support a specific paradigm, and would be too computationally intensive for practical use.