Homework # 8 - Key

Database Design & Administration

Due: 4/17/2001

70-455

Information Resources Management

4/10/2001

50 points

You are the database administrator for the Carnegie Library System. The database contains the following information.

Double underlining indicates a primary key that is also a foreign key.

Branch(BranchID, Address, City)

Book(Call#, Seq#, BranchID, PatronID, DueDate)

BookTitle(Call#, Title)

TitleAuthor(Call#, AuthorName)

TitleSubject(Call#, Subject)

Patron(PatronID, Name, Address, BranchID, FinesDue)

Seq# is a unique number for each copy of the same book.

BranchID in Patron refers to the branch where a person got his/her library card.

The row totals are as follows:

Table

# Rows

Branch

19

Book

125,000,000

BookTitle

12,500,000

TitleAuthor

17,500,000

TitleSubject

50,000,000

Patron

350,000

Currently, the database is centralized and standard disk storage is being used.

The Director of the Library System has come to you because they are experiencing performance problems. Patrons are calling for the return of the card catalog system saying that it would be faster than the computer system that they use to locate their books.

1. Create a denormalized schema that would improve the performance of the "typical" patron queries: search for a book in the current branch by title, subject, or author with call number returned.

Branch(BranchID, Address, City)

Book(Call#, Seq#, BranchID, PatronID, DueDate)

SearchBook(Call#, Title, AuthorName, Subject)

Patron(PatronID, Name, Address, BranchID, FinesDue)

 

2. Five "warnings" about denormalization were provided in class. Apply these to the denormalized schema you developed for #1. Assume you were drafting a memo to the Director of the Library System. Use the warnings to explain the cost and impact of denormalization to the director.

The "SearchBook" table is not 4th NF.

Increases chance of errors or inconsistencies - a subject change for a book will require multiple updates, which will cause problems if not done correctly.

May result in reprogramming if business rules change - if subjects are changed so that there are main subjects with subtopics for each main subject, much more reprogramming will have to be done than if the original schema was used.

Optimizes based on current transaction mix - the denormalized schema is designed to improve the performance of the patron search. If the mix of transactions changes, this new design could be a hinderance.

Increases duplication and space required - For any title, the number of rows in SearchBook is the number of authors times the number of subjects.

Increases programming complexity - updating subjects or authors is more complex.

 

3. Assume that the database must remain centralized (stored on a server at the main branch in Oakland). Show how you would use vertical and/or horizontal partitioning of the original database schema to improve patron query performance. What are the costs or impacts of your partitioning plan?

Horizontal partitioning makes the most sense. Partition the Book table by branch and create a separate Book table for each branch. I would also partition the BookTitle, TitleAuthor, and TitleSubject tables by branch, creating separate tables for each branch that only contained the titles, authors, and subjects for books in each branch. I would not partition the Branch or Patron tables.

Costs/Impacts:

Increases programming complexity

Increases duplication and space required

Increases chance of errors or inconsistencies

Reprogramming required if business rules changed - adding/removing a branch

4. The Director has read a few books on SQL and written the following query. She wants a count of the number of over-due books on the subjects related to procrastination checked out by people who have more than $5 in fines. Currently, she manually counts the books by person that are returned. She complains to you that it "never seems to finish." You also discover that the last time she ran the query was when there were complaints from almost every branch about the performance of the database.

SELECT *

FROM Branch as Br, Book as B, BookTitle as BT, TitleAuthor as TA,

TitleSubject as TS, Patron as P

WHERE Br.BranchID = B.BranchID AND B.Call# = BT.Call# AND

BT.Call# = TA.Call# AND BT.Call# = TS.Call# AND

B.PatronID = P.PatronID AND (B.Call#, Seq#) IN

(SELECT Call#, Seq#

FROM Book

Where DueDate < Today) AND

B.Call# IN

(SELECT Call#

FROM TitleSubject

WHERE Subject LIKE "%PROCRAST%")

AND

P.PatronID IN

(SELECT PatronID

FROM Patron

WHERE FinesDue > 5.00)

ORDER BY P.NAME

4. Explain why this query suffers from performance problems.

Return less data rather than more - too much returned

Single query better than sub-queries - sub-queries not needed

Reduce number of tables joined - unused tables included

Reduce sorting - order by unnecessary

 

5. Assume that the DBMS being used does not enforce concurrency control. One patron had unpaid fines in the amount of $10 when the following three transactions related to this patron were processed at the same time:

(T1) Payment of $5 received in the mail

(T2) Payment of $3.50 made in person at the library

(T3) A book thought to still be checked out was found on the shelves and the patron was credited for a $6.00 fine.

Each of these transactions read the patron record when the total fines due was $10 (before any were committed). The updated patron record was returned to the database in the order shown above.

5A. What will be the fines due for the patron after all three transactions have been executed?

FinesDue = $4

5B. What should be the fines due for the patron after all three transactions have been executed?

FinesDue = -4.50 (credit to patron)

5C. Show the timing and lock usage (LOCK-S, LOCK-X, UPGRADE, RELEASE) for the three transactions if the "most used locking scheme" is used. (This can be a simple table with a numbers down the left for times and a column for each transaction -- for each transaction show when it will impact a lock; use "wait" if a transaction is waiting for another transaction to release a needed lock.)

If all three transactions start at the same time, this will result in a deadlock.

Time

T1

T2

T3

1

Lock-S

   

2

Read

   

3

 

Lock-S

 

4

 

Read

 

5

   

Lock-S

6

   

Read

7

Fines = Fines - 5

Fines = Fines - 3.5

Fines = Fines - 6

8

Upgrade

   

9

wait

Upgrade

 

10

wait

wait

Upgrade

11

wait

wait

wait

T1 can't upgrade because T2 and T3 have shared locks so it waits. T2 gets control, but can't upgrade because T1 and T3 have shared locks so it waits. Same with T3. In the end, all are waiting for the others to finish first.

In order for these transactions to complete successfully, you must assume that T2 doesn't start until after T1 has upgraded its lock and that T3 doesn't start until after T2 has upgraded its lock. With those assumptions, the execution is as follows.

Time

T1

T2

T3

0

T1 Starts

   

1

Lock-S

   

2

Read

   

3

Fines = Fines - 5

   

4

Upgrade

   

5

 

T2 Starts

 

6

 

Lock-S

 

7

 

wait

 

8

Write

wait

 

9

Commit

wait

 

10

 

Read

 

11

 

Fines = Fines -3.5

 

12

 

Upgrade

 

13

   

T3 Starts

14

   

Lock-S

15

   

wait

16

 

Write

wait

17

 

Commit

wait

18

   

Read

19

   

Fines = Fines - 6

20

   

Upgrade

21

   

Write

22

   

Commit