COS 597A, Fall 2011 - Problem Set 4

Due at 3:00pm, Wednesday  October 26, 2011


Collaboration Policy

You may discuss problems with other students in the class. However, each student must write up his or her own solution to each problem independently. That is, while you may formulate the solutions to problems in collaboration with classmates, you must be able to articulate the solutions on your own.

Late Penalties

Note 40% period ends during fall break!


Problem 1:
Suppose that each functional dependency in a set Φ of functional dependencies on a relational scheme R satisfies the conditions for Third Normal Form (3NF).  Suppose we use Armstrong's transitivity rule to obtain X→Z  from X→Y and Y→ Z in Φ, where X, Y, and Z are sets of attributes of R.   Show that X→Z satisfies the conditions for 3NF in the case that neither X nor Y is a superkey.   Be precise.

Problem 2.
Consider a relational schema R with 5 attributes A, B, C, D, and E. Answer the following questions given the functional dependences

  1. A->ABCDE
  2. AB->D
  3. BCD->E

Part a: For each of the 3 functional dependences above, say whether the dependency satisfies the conditions of BCNF and of 3NF. If it does satisfy the conditions of a normal form, state precisely which condition for the normal form it satisfies (e.g. AB->B would be a "trivial" functional dependence satisfying both BCNF and 3NF).

Part b: To the 3 dependencies above, add functional dependency E->A and repeat Part a.

Part c: To the 3 original dependencies above, now add functional dependency E->B and repeat Part a.

Part d:  For the 3 original dependencies plus E-> B, what is the lossless-join decomposition of R guided by (or, equivalently, based on) functional dependency BCD->E? Is this decomposition dependency preserving? Is the resulting set of relational schemas in BCNF? in 3NF? Justify your answers by considering each functional dependency and showing that it does or does not satisfy the conditions of preserving dependency, of BCNF, and of 3NF.

Part e:  For the 3 original dependencies plus E-> B, what is the lossless-join decomposition of R guided by (or, equivalently, based on) functional dependency E->B? Is this decomposition dependency preserving? Is the resulting set of relational schemas in BCNF? in 3NF? Justify your answers by considering each functional dependency and showing that it does or does not satisfy the conditions of preserving dependency, of BCNF, and of 3NF.



Problem 3:
Cognetic Systems Inc. is a company that sells an XML database product called XQuantum, which uses XQuery.    They have an XQuery demo site that offers XML versions of Shakespeare plays (all 37).   We will use this site to try some queries in XQuery.       

The assignment consists of the following steps.  Note that you do not need to submit anything until Step 4:
  1. Go to Cognetic System's  XQuery demo site.    From the menu at the left, choose "Full-Text Search."  You will reach a Web page with a drop-down choice box labeled "Full-text Queries."  We are not using these sample queries in this assignment, but you may want to look at them for more XQuery examples.   We will be entering our own queries in the "Query:" text box. The Shakespeare Plays database is in document plays.xml, which is the root of the XML tree.   The document must be specified at the beginning of each path expression using doc("plays.xml"),  e.g.  doc("plays.xml")/PLAYS/PLAY. 
  2. Type (or cut and paste) the following XML query into the text box labeled "Query:".   Note that this version of XQuery is case sensitive.
for $play in doc("plays.xml")/PLAYS/PLAY
where $play/TITLE ftcontains 'Hamlet'
return $play

Then click the Submit button. You will see in the result window of the returned page the full XML text of Hamlet.  This gives you an idea of how the plays are tagged.  Note that  "ftcontains"  denotes "full text contains" and matches full words.   As you can see from the text of the TITLE tag, Hamlet is not the complete name of the play.   When you are done examining the results, click on "return to query" at the top (and bottom) of the result page.   This should return you to the query entry page. 
  1. Type (or cut and paste) the following XML query into the text box and submit it:
for $speech in doc("plays.xml")/PLAYS/PLAY//SPEECH
where $speech/LINE ftcontains 'ghost'
return $speech

  1. Modify the query of Step 3 to return only the lines containing 'ghost'.   Turn in the query and the result of the query as part of the submission of this assignment.  You may print the result page showing the query and result.  If you wish to submit electronically,  you can either cut and paste the query and result into a text file or you can print the result page to a pdf file.
  1. Type (or cut and paste) the following XML query into the text box and send it:
for $ghostspeech in doc("plays.xml")/PLAYS/PLAY [TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines total_number="{count($ghostlines)}">
{$ghostlines}
</GhostLines>

Describe the query in English.  Why is "
let $ghostlines := $ghostspeech/LINE" used?  Submit your answers to these questionsdo not submit the result of the query.
  1. Type (or cut and paste) the following XML query into the text box and send it:
for $ghostspeech in doc("plays.xml")/PLAYS/PLAY [TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines total_number="{count($ghostlines)}">
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>
</GhostLines>

Describe the query in English.  Why is
"$ghostspeech/LINE[1]/text()" used in the return  for this query?  Submit your answers to these questionsdo not submit the result of the query.
Note that TAGNAME[n] is a general form for referring to the nth occurrence of  a child element with tag TAGNAME  in the order the children appear in the document.
  1. Type (or cut and paste)  the following XML query into the text box and send it:
let $allgspeech := doc("plays.xml")/PLAYS/PLAY [TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $allgspeech/LINE
return
<GhostLines total_number="{count($ghostlines)}">
{for $ghostspeech in $allgspeech
return
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>}
</GhostLines>

How do the results of this query differ from those in Step 7?  Why is "$allgspeech" used?  Why is a nested query used in the return portion?  Submit your answers to these questionsdo not submit the result of the query.
  1. Write an XQuery query that returns the speakers of the speeches in Romeo and Juliet that contain the word "hate" in one or more of the lines of the speech.  Turn in the query and the result of the query as part of the submission of this assignment.
  1. Write an XQuery query that returns for each speech in Romeo and Juliet that contain the word "hate" both the speaker of the speech and the lines of the speach that contain "hate".   Turn in the query and the result of the query as part of the submission of this assignment.  The results should look like:
<SPEAKSHATE>
<SPEAKER>TYBALT</SPEAKER>
<LINE>What, drawn, and talk of peace! I hate the word,</LINE>
<LINE>As I hate hell, all Montagues, and thee:</LINE>
</SPEAKSHATE>
<SPEAKSHATE>
<SPEAKER>PRINCE</SPEAKER>
<LINE>Canker'd with peace, to part your canker'd hate:</LINE>
</SPEAKSHATE>
...


NOTE:  eXist-db is a free, open-source XML-based database managment system providing an XQuery processing engine for XML documents. The eXist site  http://exist-db.org/  includes an XQuery demo site they call their "XQuery Sandbox".   To reach the "sandbox", look down the left hand side menu to find "Examples",  and click on "XQuery Sandbox" under "Examples".   Note that the use of Sandbox requires that Javascript is enabled.   Unfortunately,  although the Sandbox was producing results in August, in the last few days I have not been able to get results even to the provided sample queries.

NOTE: eXist and Cognetic Systems are the two demo sites that I happened to find on the Web;  there may be more and better.  I welcome your findings.