Terminology of text mining
a token (term): a single word or a group of words, e.g. "mining" or "text mining"
a document: one piece of text
a corpus: a collection of documents
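A minimal sketch of these three terms in Python. It assumes a naive tokenizer that lowercases the text and splits on whitespace; real tokenizers also handle punctuation. The hypothetical `bigrams` helper shows how a token can also be a group of words.

```python
# A corpus is a collection of documents; here each document is a plain string.
corpus = [
    "Text mining finds patterns in text",
    "A corpus is a collection of documents",
]

def tokenize(document):
    """Split one document into single-word tokens (naive: lowercase + whitespace split)."""
    return document.lower().split()

def bigrams(tokens):
    """Hypothetical helper: join each token with the next one to form two-word terms."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

# Tokenize every document in the corpus.
tokens_per_document = [tokenize(doc) for doc in corpus]

for doc, tokens in zip(corpus, tokens_per_document):
    print(doc, "->", tokens, bigrams(tokens))
```

Each string is one document, the list of strings is the corpus, and the items produced by `tokenize` (or `bigrams`) are the tokens.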
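Document-term matrix (DTM)
each document is a row and each term is a column; each cell holds a value for that term in that document, usually the number of times the term occurs (or a weight such as TF-IDF)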
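A small pure-Python sketch of building a document-term matrix, assuming raw term counts as the cell values and the same naive whitespace tokenizer as a simplification. The two-document corpus is made up for illustration.

```python
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat",
]

# Tokenize each document (naive whitespace split, an assumption for this sketch).
tokenized = [doc.lower().split() for doc in corpus]

# The vocabulary gives the columns: every distinct term, sorted for a stable order.
vocabulary = sorted({term for tokens in tokenized for term in tokens})

# One row per document; each cell is the raw count of that term in that document.
dtm = []
for tokens in tokenized:
    counts = Counter(tokens)
    dtm.append([counts[term] for term in vocabulary])

print(vocabulary)
for row in dtm:
    print(row)
```

With this corpus the columns are `['cat', 'dog', 'mat', 'on', 'sat', 'the']` and the rows are `[1, 0, 1, 1, 1, 2]` and `[0, 1, 0, 0, 1, 1]`. Libraries such as scikit-learn's `CountVectorizer` build the same structure as a sparse matrix, which matters because most cells in a real DTM are zero.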
Term-document matrix (TDM)
each term is a row and each document is a column; it is the transpose of the document-term matrix
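Because a term-document matrix is the transpose of a document-term matrix, one way to get it is to build the DTM and swap rows and columns. A minimal sketch, again assuming raw counts and a naive whitespace tokenizer:

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat"]
tokenized = [doc.lower().split() for doc in corpus]
vocabulary = sorted({term for tokens in tokenized for term in tokens})

# Build the document-term matrix first: rows = documents, columns = terms.
counts = [Counter(tokens) for tokens in tokenized]
dtm = [[c[term] for term in vocabulary] for c in counts]

# Transpose it with zip(*...): rows = terms, columns = documents.
tdm = [list(column) for column in zip(*dtm)]

for term, row in zip(vocabulary, tdm):
    print(term, row)
```

Here the row for "the" is `[2, 1]`: it appears twice in the first document and once in the second.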
Methods to clean and preprocess text