SUBSTRING SEARCH STUDY GUIDE

Terminology and Basics

• Substring search problem: Find an instance (or all instances) of a query substring of length M in a text string of length N.
• The substring must be precisely defined (no regular expressions).
• You should be able to manually use a KMP DFA, and you should be able to manually carry out Boyer-Moore and Rabin-Karp.
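
To make the Rabin-Karp steps concrete, here is a minimal sketch of the rolling-hash search. The class name RabinKarpSketch, the radix R = 256, and the fixed prime Q are assumptions chosen for illustration (the guide does not specify them), and this variant confirms each hash match character by character rather than trusting the hash alone.

```java
// Sketch only: the class name, R, and Q are illustrative choices, not taken from the guide.
class RabinKarpSketch {
    private static final int R = 256;             // assumed radix (extended ASCII)
    private static final long Q = 1_000_000_007L; // assumed fixed prime modulus

    // Hash of the first m characters of s, computed with Horner's method.
    private static long hash(String s, int m) {
        long h = 0;
        for (int i = 0; i < m; i++) {
            h = (h * R + s.charAt(i)) % Q;
        }
        return h;
    }

    // Returns the index of the first occurrence of pat in txt, or -1 if there is none.
    static int search(String pat, String txt) {
        int m = pat.length(), n = txt.length();
        if (m > n) return -1;
        if (m == 0) return 0;

        long rm = 1;                               // R^(m-1) mod Q, used to drop the leading character
        for (int i = 1; i < m; i++) rm = (rm * R) % Q;

        long patHash = hash(pat, m);
        long txtHash = hash(txt, m);
        for (int i = 0; ; i++) {
            // On a hash match, confirm with a direct comparison to rule out a collision.
            if (txtHash == patHash && txt.regionMatches(i, pat, 0, m)) return i;
            if (i + m >= n) return -1;
            // Slide the window: remove txt[i], then append txt[i + m].
            txtHash = (txtHash + Q - rm * txt.charAt(i) % Q) % Q;
            txtHash = (txtHash * R + txt.charAt(i + m)) % Q;
        }
    }
}
```

For practice by hand, pick a small modulus and a short digit string, then compare each window hash you compute against what this sketch would produce with the same R and Q.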

KMP

• How do you construct the DFA?
• How much time does it take if you re-simulate every time you have a mismatch? It's ok if you don't fully understand the linear time construction process.
• What is the best-case running time for DFA construction and DFA simulation? The worst-case running time?
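
The questions above are easier to reason about with the construction in front of you. The following is a minimal sketch of a DFA build that tracks a restart state, followed by the simulation loop. The class name KmpDfaSketch, the method names, and the alphabet size R = 256 are assumptions for illustration; the pattern is assumed to be non-empty and all characters are assumed to have codes below R.

```java
// Sketch only: names are illustrative; R = 256 is an assumed alphabet size.
class KmpDfaSketch {
    private static final int R = 256;

    // Builds dfa[c][j]: the state reached from state j after reading character c.
    // Assumes pat is non-empty.
    static int[][] buildDfa(String pat) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];     // mismatch: copy the transition of restart state x
            }
            dfa[pat.charAt(j)][j] = j + 1; // match: advance to the next state
            x = dfa[pat.charAt(j)][x];     // update the restart state
        }
        return dfa;
    }

    // Returns the index of the first occurrence of pat in txt, or -1 if there is none.
    static int search(String pat, String txt) {
        int m = pat.length(), n = txt.length();
        if (m == 0) return 0;
        int[][] dfa = buildDfa(pat);
        int i, j;
        for (i = 0, j = 0; i < n && j < m; i++) {
            j = dfa[txt.charAt(i)][j];     // one table lookup per text character, no backup
        }
        return j == m ? i - m : -1;
    }
}
```

Note that the inner loop over c does R units of work for every pattern position, while the simulation touches each text character at most once. Calling buildDfa on a candidate pattern and printing the rows for the characters that appear in it is one way to check a hand-built table.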

Boyer-Moore

• What is the mismatched character heuristic? Why do we use the rightmost character?
• Why is the mismatched character heuristic strictly suboptimal? Why do we use it anyway? Because the basic idea behind the stronger rule is very similar to KMP, and you can learn it if you ever really need to.
• What is the best-case running time? The worst-case running time?
• Which inputs result in best and worst case performance?
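
As a reference point for the questions above, here is a minimal sketch of Boyer-Moore using only the mismatched character heuristic. The class name BoyerMooreSketch and the alphabet size R = 256 are assumptions for illustration, and text characters are assumed to have codes below R. The right[] table records the rightmost position of each character in the pattern, or -1 if the character does not occur.

```java
// Sketch only: names are illustrative; R = 256 is an assumed alphabet size.
class BoyerMooreSketch {
    private static final int R = 256;

    // Returns the index of the first occurrence of pat in txt, or -1 if there is none.
    static int search(String pat, String txt) {
        int m = pat.length(), n = txt.length();

        // right[c] = rightmost index of c in pat, or -1 if c is absent.
        int[] right = new int[R];
        for (int c = 0; c < R; c++) right[c] = -1;
        for (int j = 0; j < m; j++) right[pat.charAt(j)] = j;

        int skip;
        for (int i = 0; i <= n - m; i += skip) {
            skip = 0;
            // Compare right to left within the current alignment.
            for (int j = m - 1; j >= 0; j--) {
                char t = txt.charAt(i + j);
                if (pat.charAt(j) != t) {
                    // Line up the text character with its rightmost occurrence in pat,
                    // but always move forward by at least one position.
                    skip = Math.max(1, j - right[t]);
                    break;
                }
            }
            if (skip == 0) return i;       // no mismatch: match found at i
        }
        return -1;
    }
}
```

When the mismatched text character does not appear in the pattern at all, right[t] is -1 and the shift is j + 1, which is where the large skips come from. The Math.max(1, ...) guard covers the case where the rightmost occurrence lies to the right of the mismatch position.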

Recommended Problems

C level

1. Fall 2012, #10 (Boyer-Moore)
2. (a) Given the following KMP DFA, give the string that this DFA searches for.
j 0 1 2 3 4 5 6
A 1 1 3 1 5 1 5
B 0 2 0 4 0 6 7
(b) Below is a partially completed KMP DFA for a string s of length 6 over the alphabet {A, B}. State 6 is the accept state. Fill in the missing spots in the table.
j 0 1 2 3 4 5
pat.charAt(j)
A 1 1
B 3 3
(c) Given each of the following strings as input, what state would the DFA in (a) end in?
BABAA
ABABABA
BABABABA
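
To check hand traces for part (c), a small simulator over the table from part (a) can help. This is a minimal sketch: the class name DfaTraceSketch and the method finalState are hypothetical, the table is copied from part (a), and the simulator is assumed to stop as soon as it reaches the accept state 7.

```java
// Sketch only: names are illustrative; the table is copied from part (a).
class DfaTraceSketch {
    // Row 0 holds the transitions on A, row 1 the transitions on B, for states 0..6.
    static final int[][] DFA = {
        {1, 1, 3, 1, 5, 1, 5},
        {0, 2, 0, 4, 0, 6, 7},
    };

    // Returns the state reached after reading txt from state 0,
    // stopping early if the accept state is reached.
    static int finalState(String txt) {
        int accept = DFA[0].length;        // accept state is one past the last table column
        int j = 0;
        for (int i = 0; i < txt.length() && j < accept; i++) {
            char c = txt.charAt(i);
            if (c != 'A' && c != 'B') {
                throw new IllegalArgumentException("input must use only A and B");
            }
            j = DFA[c - 'A'][j];           // 'A' maps to row 0, 'B' to row 1
        }
        return j;
    }
}
```

For example, finalState("ABAB") follows 0, 1, 2, 3, 4 and returns 4. Work the three inputs above by hand first, then compare.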

B level

1. Fall 2011 Final, #6 (KMP)
2. Spring 2012 Final, #7 (KMP)
3. Fall 2012, #9 (KMP)
4. Give an example of when you might want to use KMP. Boyer-Moore? Rabin-Karp?

A level

1. For each algorithm (the version discussed in lecture and the textbook), give the worst-case order of growth in terms of M and N.
------ brute-force substring search for a query string of size M in a text string of size N
------ Knuth-Morris-Pratt substring search for a query string of size M in a text string of size N
------ Boyer-Moore (with only mismatch heuristic) substring search for a query string of size M in a text string of size N
------ simulating a DFA with M vertices and 2M edges on a text string of size N
2. Textbook: 5.3.22
3. Textbook: 5.3.26