COS 597A, Fall 2008 - Problem Set 4
Due at 5:00pm, Friday October 24, 2008
You may submit
electronically
or by hardcopy: bring your hardcopy submission to my office or put it
in my mailbox in the CS mailroom on the 2nd floor.
Collaboration Policy
You may discuss problems with other students in the class. However,
each
student must write up his or her own solution to each problem
independently.
That is, while you may formulate the solutions to problems in
collaboration
with classmates, you must be able to articulate the solutions on your
own.
Late Penalties
Note atypical penalties
due to fall break and approaching test.
- 25% of the earned score if submitted by 5pm on Monday 11/3/07.
- No credit if submitted
later than Monday 11/3/07.
Chapter,
exercise and page numbers
refer to the course text Database
System Concepts, Fifth Edition,
by Silberschatz, Korth, and Sudarshan
Problem
1:
Suppose that each functional dependency in a set Φ of functional
dependencies on a relational scheme R satisfies the conditions for
Third Normal Form (3NF). Show that any functional dependency that
is derived from one
application of any of the
three rules making up Armstrong's axioms satisfies the conditions for
3NF. Be precise.
Problem 2:
Consider relational scheme R = (A,B,C,D,E) with functional
dependencies:
A→ BC
CD→E
B→D
E→A
(taken from Practice Exercise 7.1, pg 306).
The set of dependencies in the closure of these dependencies can be
characterized as:
A→ ABCDE
BC→ ABCDE
CD→ ABCDE
E→ ABCDE
All other functional dependencies in the closure are either trivial
(right side is subset of left side) or derived from one of the above by
use of Armstrong's Augmentation
rule or by taking a subset of attributes on the right-hand side
of the dependency (the derived Decomposition
rule, pg 280).
Give a lossless, dependency-preserving decomposition of R into two or
more relations satisfying 3NF (Exercise 7.27, pg 309).
Problem 3:
The eXist XML database software is free, open-source software
that provides an XQuery processing engine for XML documents. We
are going to use a very very small portion of eXist - an application
that allows one to write XQuery queries on a database consisting of 3
of Shakespeare's plays captured as XML documents. Everyone will
run under one guest account using a Web interface. You will
express queries and either print or save to a file the results of your
queries. Since everyone will be using the same account, you
should save intermediate work by cutting and pasting to a file.
Then you can cut and paste back into the Web interface when you resume
your work. The eXist Web
interface will not save state beyond your session.
You can use eXist either by accessing a local copy running on the
Computer Science Department studentdb server http://studentdb.cs.princeton.edu:8080/exist/index.xml
or by using the eXist Project public Web site at http://exist-db.org/.
Either will display a home page that looks essentially the same.
Access to the local copy is restricted to users in the cs.princeton.edu
domain. If you are not a member of the
CS department, I will email you with options, but one option is
certainly the public Web site. The advantage of
using the local copy is that it will not be changed during the rest of
the semester except to fix problems (as best we can) and response may
be faster. The advantage of the eXist organization site is
that you don't need to be in the cs.princeton.edu domain and the bugs
may be fixed before we know about them.
The assignment consists of the following steps. Note that you do not need to submit
anything until Step 5:
- Follow one of the links above to the eXist home page.
- Looking down the left hand side menu, find "Examples" and click
on "XQuery Sandbox" under "Examples". Note that the use of
Sandbox requires that Javascript is enabled. You will reach a Web
page that looks like this (pdf
file)
(except that color highlighting does not show in the pdf file.
Also, the drop-down choice box labeled "Paste Saved Query" may or
may not have a query description showing. We are not using these
"saved queries" in this assignment, but you may want to look at them
for more XQuery examples.) Sandbox allows you to
write XQuery queries on several pre-loaded XML databases. We will
use only the Shakespeare database, which consists of 3 plays: Hamlet, Macbeth and Romeo and Juliet. Each
play is a separate XML document but the Shakespeare database consists
of all 3 plays (each play is a child of the database root). The
default database for queries submitted to Sandbox is the Shakespeare
database.
- Type (or cut and paste) the following XML query into the text box below
the "Paste Saved Query" drop-down box. Note that eXist
XQuery is case sensitive.
for
$play in /PLAY
where $play/TITLE &= "Hamlet"
return $play
Then click the SEND button. You will
see in the result window the full XML text of
Hamlet. The result may take
on the order of a minute to appear
if you are
using the eXist Project public Web site (but "Found 1 in 0... seconds.
" will appear almost immediately in the color highlighted top of the
result window). Here (pdf) is what the
beginning of the result looks like. This gives you an idea of how
the plays are tagged. Note that "&=" denotes containment
and matches full words. (Try replacing "Hamlet" with "Ham"; then
replace it with "Ham*", where "*" matches any string.) As you
can see from the text of the TITLE tag, Hamlet is not the complete name of
the play.
- Type (or cut and paste) the following XML query into the text box and send
it:
for $speech in
/PLAY//SPEECH
where LINE &= 'war'
return $speech
(Note that the local site and the eXist
Project site may return the
results in different order; this is because the 3 plays are in
different orders in the different copies of the database.)
- Modify the query of Step 4 to return only the lines containing
'war'. (Hint: recall that text()
returns the
text
content of a tag.) Turn in the query and
the result of the query as part of the
submission of this assignment. You may print the Sandbox
page showing the query and result. If you wish to submit
electronically, you can either cut and paste the query and result
into a text file or you can print the Sandbox page to a pdf file (if
you have Adobe
Acrobat).
- Type (or cut and paste) the following XML query into the text box and send
it:
for $ghostspeech in
/PLAY [TITLE &= 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines
total_number="{count($ghostlines)}">
{$ghostlines}
</GhostLines>
Describe the query
in English. Why is "let $ghostlines := $ghostspeech/LINE" used? Submit your answers to
these questions; do not
submit the result of the query.
- Type (or cut and paste) the following XML query into the text box and send
it:
for $ghostspeech in
/PLAY [TITLE &= 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines
total_number="{count($ghostlines)}">
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>
</GhostLines>
Describe the query
in English. Why is "$ghostspeech/LINE[1]/text()" used in the
return for this query? Submit your answers to
these questions; do not
submit the result of the query.
Note that TAGNAME[n] is a
general form for referring to the nth occurrence of a
child element with tag TAGNAME in the order the children appear
in the document.
- Type (or cut and paste) the following XML query into the text box and send
it:
let $allgspeech := /PLAY [TITLE &=
'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $allgspeech/LINE
return
<GhostLines
total_number="{count($ghostlines)}">
{for
$ghostspeech in $allgspeech
return
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>}
</GhostLines>
How do the results
of this query differ from those in Step 7? Why is "$allgspeech" used? Why is a
nested query used in the return portion? Submit your answers to
these questions; do not
submit the result of the query.
- Write an XQuery query that returns the titles
of the plays that
contain the word "fairies" in one or more of the lines of the
play. Turn in the query and
the result of the query as part of the
submission of this assignment.
NOTE: Cognetic
Systems Inc. also has an XQuery demo
site that offers XML versions of Shakespeare plays (all 37).
Choose "Full-text Search" from the left hand side menu. If you
decide to play with this demo site, you should be aware of differences
with the eXist site:
- Unlike the eXist "XQuery Sandbox", the Cognetic Systems
"Full-text Search" demo does not use the Shakespeare plays by
default. The document must be specified at the beginning of each
path expression using doc("plays.xml"),
e.g. doc("plays.xml")/PLAYS/PLAY.
Note that PLAY is not the root of "plays.xml", but
rather a child of doc("plays.xml")/PLAYS.
- The Cognetic Systems "Full-text Search" demo insists on full
pathnames, so, for example, in 4) above, where LINE must become where $speech/LINE.
- ftcontains is
used
instead of &= to test word containment, so in 4) where LINE &= 'war' becomes
where $speech/LINE ftcontains
'war'
eXist and Cognetic Systems are the two demo sites that I happened to
find on the Web; there may be more and better. I welcome
your findings.