What is the terminology for this topic?
an alphabet = {A, C, G, T} (DNA)
= ASCII, Unicode, etc
a string = ACAGTCCGGGACT
all strings indexed from position 1
Let S,T be two strings
|S| is length of S
S(i) is indexing S S(i .. j) denotes S(i)S(i+1) ..
S(j) ST is concatenation of S and T
What is a substring?
ACAGTCCGGGACTGACG
a string contained within a string a substring of S is S(i .. j) (i
What is a common substring?
TCCACTGACTGCTGC
ACAGTCCGGGACTGACG
GACTG is a common substring U is a common substring of S and T if U is a substring of both S and T
What is a subsequence?
ACAGTCCGGGACTGACG
obtained by deleting 0 or more characters from the string
What is a common subsequence?
ACAGTCCGGGACTGACG
TCCACTGACTGCTGC
U is a common subsequence of S and T if U is a subsequence of both S and T
What is a prefix?
ACAGTCCGGGACTGACG
S(1 . . k) for some k (1 k n), where n = |S|
What is a suffix?
ACAGTCCGGGACTGACG
S(k . . n) for some k (1 k n), where n = |S|
What is star / *?
* = {,A,C,G,T,AA,AC,AG,AT,CA,…}
set of all strings composed of symbols from the alphabet
What is the leaf terminology?
a leaf node: no children
a branch node: 1 or more children
a unary node: exactly 1 child
a binary node: exactly 2 children
What is the naive solution to finding longest repeated substring?

What is naive solution to common substrings?
Using trees for suffixes
What problems use a suffix tree to solve them?
What is a Suffix Trie?

What are the properties of a Suffix Trie?
Example of suffix trie
see slides
lecture 10 slide 11/14
Building a suffix trie