COS 597D, Fall 2013 - Assignment 10

Due 11:55pm Friday December 13, 2013


Collaboration Policy

You may discuss problems with other students in the class. However, each student must write up his or her own solution to each problem independently. That is, while you may formulate the solutions to problems in collaboration with classmates, you must be able to articulate the solutions on your own.

Late Penalty

40% of the earned score if submitted after the due date.


Problem:

Cognetic Systems Inc. is a company that sells an XML database product called XQuantum, which uses XQuery.    They have an XQuery demo site that offers XML versions of Shakespeare plays (all 37).   We will use this site to try some queries in XQuery.       

The assignment consists of the following steps.  Note that you do not need to submit anything until Step 4:
  1. Go to Cognetic System's  XQuery demo site.    From the menu at the left, choose "Full-Text Search."  You will reach a Web page with a drop-down choice box labeled "Full-text Queries."  We are not using these sample queries in this assignment, but you may want to look at them for more XQuery examples.   We will be entering our own queries in the "Query:" text box. The Shakespeare Plays database is in document plays.xml, which is the root of the XML tree.   The document must be specified at the beginning of each path expression using doc("plays.xml"),  e.g.  doc("plays.xml")/PLAYS/PLAY. 
  2. Type (or cut and paste) the following XML query into the text box labeled "Query:".   Note that this version of XQuery is case sensitive.
for $play in doc("plays.xml")/PLAYS/PLAY
where $play/TITLE ftcontains 'Hamlet'
return $play

Then click the Submit button. You will see in the result window of the returned page the full XML text of Hamlet.  This gives you an idea of how the plays are tagged.  Note that  "ftcontains"  denotes "full text contains" and matches full words.   As you can see from the text of the TITLE tag, Hamlet is not the complete name of the play.   When you are done examining the results, click on "return to query" at the top (and bottom) of the result page.   This should return you to the query entry page. 
  1. Type (or cut and paste) the following XML query into the text box and submit it:
for $speech in doc("plays.xml")/PLAYS/PLAY//SPEECH
where $speech/LINE ftcontains 'ghost'
return $speech

  1. Modify the query of Step 3 to return only the lines containing 'ghost'.   Turn in the query and the result of the query as part of the submission of this assignment.  You may print the result page showing the query and result.  If you wish to submit electronically,  you can either cut and paste the query and result into a text file or you can print the result page to a pdf file.
  1. Type (or cut and paste) the following XML query into the text box and send it:
for $ghostspeech in doc("plays.xml")/PLAYS/PLAY [TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines total_number="{count($ghostlines)}">
{$ghostlines}
</GhostLines>

Describe the query in English.  Why is "
let $ghostlines := $ghostspeech/LINE" used?  Submit your answers to these questionsdo not submit the result of the query.
  1. Type (or cut and paste) the following XML query into the text box and send it:
for $ghostspeech in doc("plays.xml")/PLAYS/PLAY [TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines total_number="{count($ghostlines)}">
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>
</GhostLines>

Describe the query in English.  Why is
"$ghostspeech/LINE[1]/text()" used in the return  for this query?  Submit your answers to these questionsdo not submit the result of the query.
Note that TAGNAME[n] is a general form for referring to the nth occurrence of  a child element with tag TAGNAME  in the order the children appear in the document.
  1. Type (or cut and paste)  the following XML query into the text box and send it:
let $allgspeech := doc("plays.xml")/PLAYS/PLAY [TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $allgspeech/LINE
return
<GhostLines total_number="{count($ghostlines)}">
{for $ghostspeech in $allgspeech
return
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>}
</GhostLines>

How do the results of this query differ from those in Step 7?  Why is "$allgspeech" used?  Why is a nested query used in the return portion?  Submit your answers to these questionsdo not submit the result of the query.
  1. Write an XQuery query that returns the speakers of the speeches in Romeo and Juliet that contain the word "hate" in one or more of the lines of the speech.  Turn in the query and the result of the query as part of the submission of this assignment.
  1. Write an XQuery query that returns for each speech in Romeo and Juliet that contain the word "hate" both the speaker of the speech and the lines of the speach that contain "hate".   Turn in the query and the result of the query as part of the submission of this assignment.  The results should look like:
<SPEAKSHATE>
<SPEAKER>TYBALT</SPEAKER>
<LINE>What, drawn, and talk of peace! I hate the word,</LINE>
<LINE>As I hate hell, all Montagues, and thee:</LINE>
</SPEAKSHATE>
<SPEAKSHATE>
<SPEAKER>PRINCE</SPEAKER>
<LINE>Canker'd with peace, to part your canker'd hate:</LINE>
</SPEAKSHATE>
...