### SUBSTRING SEARCH STUDY GUIDE

Terminology and Basics

• Substring search problem: Find an instance (or all instances) of a query substring of length M in a text string of length N.
• Substring must be precisely defined (no regular expressions).
• You should be able to manually use a KMP DFA, and you should be able to manually carry out Boyer-Moore and Rabin-Karp.
KMP
• How do you construct the DFA?
• How much time does it take if you re-simulate every time you have a mismatch? It's ok if you don't fully understand the linear time construction process.
• What is the best-case running time for DFA construction and DFA simulation? The worst-case running time?
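
For reference while working through the questions above, here is a minimal sketch of the linear-time DFA construction and the DFA simulation, written in the style of the textbook's `dfa[][]` table. The class name `KmpSketch`, the `radix` parameter, and the choice to return -1 when there is no match are assumptions made for this illustration, and the pattern is assumed to be non-empty with every character code below `radix`.

```java
class KmpSketch {
    // Builds dfa[c][j]: the state to move to when in state j and reading character c.
    // Assumes pat is non-empty and every character code is below radix.
    static int[][] buildDfa(String pat, int radix) {
        int m = pat.length();
        int[][] dfa = new int[radix][m];
        dfa[pat.charAt(0)][0] = 1;
        // x is the restart state: the state the DFA would be in after reading pat[1..j-1].
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < radix; c++) {
                dfa[c][j] = dfa[c][x];        // mismatch: behave like the restart state
            }
            dfa[pat.charAt(j)][j] = j + 1;    // match: advance to the next state
            x = dfa[pat.charAt(j)][x];        // update the restart state
        }
        return dfa;
    }

    // Simulates the DFA on txt; returns the index of the first match, or -1 if none.
    static int search(String pat, String txt, int radix) {
        int[][] dfa = buildDfa(pat, radix);
        int m = pat.length();
        int n = txt.length();
        int i;
        int j;
        for (i = 0, j = 0; i < n && j < m; i++) {
            j = dfa[txt.charAt(i)][j];        // one table lookup per text character
        }
        return j == m ? i - m : -1;
    }
}
```

Construction fills `radix` entries for each of the M pattern positions, and simulation reads each text character at most once, which is a useful starting point for the running-time questions.
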
Boyer-Moore
• What is the mismatched character heuristic? Why do we use the rightmost character?
• Why is the mismatched character heuristic strictly suboptimal? Why do we use only this heuristic then? Because the rest of the algorithm rests on a basic idea very similar to KMP, and you can learn it if you ever really need to.
• What is the best-case running time? The worst-case running time?
• Which inputs result in best and worst case performance?
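
The mismatched character heuristic can be sketched as follows, as an aid for carrying it out by hand. This is an illustrative version that uses only this one heuristic; the class name `BoyerMooreSketch`, the `radix` parameter, and the -1 return value for "not found" are assumptions, and every character code is assumed to be below `radix`.

```java
import java.util.Arrays;

class BoyerMooreSketch {
    // right[c] = index of the rightmost occurrence of c in pat, or -1 if c is absent.
    static int[] buildRight(String pat, int radix) {
        int[] right = new int[radix];
        Arrays.fill(right, -1);
        for (int j = 0; j < pat.length(); j++) {
            right[pat.charAt(j)] = j;         // later occurrences overwrite earlier ones
        }
        return right;
    }

    // Returns the index of the first match of pat in txt, or -1 if none.
    static int search(String pat, String txt, int radix) {
        int m = pat.length();
        int n = txt.length();
        int[] right = buildRight(pat, radix);
        int skip;
        for (int i = 0; i <= n - m; i += skip) {
            skip = 0;
            // Compare the pattern against the text from right to left.
            for (int j = m - 1; j >= 0; j--) {
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    // Align the mismatched text character with its rightmost
                    // occurrence in the pattern, always moving at least one step.
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) {
                return i;                     // no mismatch: match found at i
            }
        }
        return -1;
    }
}
```

When the mismatched text character does not occur in the pattern, `right` holds -1 and the pattern slides past that character entirely, which is where the large skips come from.
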
Rabin-Karp
• If we know mod(ABCDEFG, Q), how do we compute mod(BCDEFGH, Q) in constant time (where A through H are arbitrary digits of a number from some alphabet of radix R, and Q is the modulus)?
• What are the Las Vegas and Monte Carlo versions of Rabin-Karp?
• How would we extend Rabin-Karp to efficiently search for any one of P possible patterns in a text of length N? How would this technique compare to using KMP or Boyer-Moore for the same task?
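
A rough sketch of the rolling hash and of the Monte Carlo versus Las Vegas choice is given below. The names `RabinKarpSketch`, `leadWeight`, `roll`, and the `lasVegas` flag are hypothetical, and the fixed modulus `Q` is an assumption chosen to keep the example deterministic; an implementation would more typically pick a large random prime.

```java
class RabinKarpSketch {
    static final int R = 256;                 // radix (assumed alphabet size)
    static final long Q = 1_000_000_007L;     // modulus (assumed fixed prime)

    // Hash of the first m characters of s, computed with Horner's method.
    static long hash(String s, int m) {
        long h = 0;
        for (int j = 0; j < m; j++) {
            h = (h * R + s.charAt(j)) % Q;
        }
        return h;
    }

    // R^(m-1) mod Q: the weight of the leading digit in an m-digit window.
    static long leadWeight(int m) {
        long rm = 1;
        for (int i = 1; i < m; i++) {
            rm = (rm * R) % Q;
        }
        return rm;
    }

    // Constant-time update: drop the leading digit `out`, append the digit `in`.
    static long roll(long h, char out, char in, long rm) {
        h = (h + Q - rm * out % Q) % Q;       // remove the leading digit's contribution
        return (h * R + in) % Q;              // shift by one digit and add the new one
    }

    // lasVegas = true verifies every hash match character by character;
    // lasVegas = false (Monte Carlo) trusts the hash match.
    static int search(String pat, String txt, boolean lasVegas) {
        int m = pat.length();
        int n = txt.length();
        if (n < m) {
            return -1;
        }
        long rm = leadWeight(m);
        long patHash = hash(pat, m);
        long txtHash = hash(txt, m);
        for (int i = 0; ; i++) {
            if (txtHash == patHash && (!lasVegas || txt.regionMatches(i, pat, 0, m))) {
                return i;
            }
            if (i + m >= n) {
                return -1;                    // no further window to slide to
            }
            txtHash = roll(txtHash, txt.charAt(i), txt.charAt(i + m), rm);
        }
    }
}
```

For the multiple-pattern question, one possible direction under the same assumptions is to store the hashes of all P patterns (of equal length) in a hash set and test each window's hash for membership instead of comparing against a single `patHash`.
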

### Recommended Problems

#### C level

1. Fall 2012, #10 (Boyer-Moore)
2. (a) Given the following KMP DFA, give the string that this DFA searches for.
j 0 1 2 3 4 5 6
A 1 1 3 1 5 1 5
B 0 2 0 4 0 6 7
(b) Below is a partially-completed KMP DFA for a string s of length 6 over the alphabet {A, B}. State 6 is the accept state. Fill in the missing spots in the table.
j 0 1 2 3 4 5
pat.charAt(j)
A 1 1
B 3 3
(c) Given each of the following strings as input, what state would the DFA in (b) end in?
BABAA
ABABABA
BABABABA

#### B level

1. (KMP) Below is a partially-completed Knuth-Morris-Pratt DFA for a string s of length 11 over the alphabet { A , B }. Reconstruct the DFA and s in the space below.
```
      0 1 2 3 4 5 6 7 8 9 10
A 0 0              10 11
B       5   2          4
s                   A
```
2. Spring 2012 Final, #7 (KMP)
3. Fall 2012, #9 (KMP)
4. Give an example of when you might want to use KMP. Boyer-Moore? Rabin-Karp?

#### A level

1. For each algorithm (the version discussed in lecture and the textbook), give the worst-case order of growth in terms of M and N.
------ brute-force substring search for a query string of size M in a text string of size N
------ Knuth-Morris-Pratt substring search for a query string of size M in a text string of size N
------ Boyer-Moore (with only mismatch heuristic) substring search for a query string of size M in a text string of size N
------ Monte Carlo version of Rabin-Karp substring search (that checks only for a hash match) for a query string of size M in a text string of size N
------ regular-expression pattern matching for a pattern of size M on a text string of size N
------ simulating a DFA with M vertices and 2M edges on a text string of size N
------ simulating an NFA with M vertices and 3M edges on a text string of size N
2. Give an example of when you might prefer to use the Monte Carlo version of Rabin-Karp over the Las Vegas version.
3. Textbook: 5.3.22
4. Textbook: 5.3.26