Structured analytics includes the following distinct operations:
Email threading
Name normalization
Textual near duplicate identification
Language identification
Repeated content identification
For a document to be recognized as an email and threaded, it must have:
The Email From field, plus at least one of the following fields:
Sent Date
Email To
Email Subject
Email CC
Email BCC
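
The rule above can be sketched as a small check. This is only an illustration, not the product's actual implementation: it assumes a document is represented as a plain dictionary keyed by the field names listed above, that a blank or whitespace-only value counts as missing, and that the requirement means Email From plus at least one other header field. The names is_threadable_email, REQUIRED_FIELD, and SUPPORTING_FIELDS are hypothetical.

```python
from typing import Any, Mapping

# Field names taken from the list above; the dictionary layout is an assumption.
REQUIRED_FIELD = "Email From"
SUPPORTING_FIELDS = (
    "Sent Date",
    "Email To",
    "Email Subject",
    "Email CC",
    "Email BCC",
)


def _has_value(doc: Mapping[str, Any], field: str) -> bool:
    """Return True if the field is present and not blank (assumed definition)."""
    value = doc.get(field)
    return value is not None and str(value).strip() != ""


def is_threadable_email(doc: Mapping[str, Any]) -> bool:
    """Check the rule: Email From plus at least one other header field."""
    if not _has_value(doc, REQUIRED_FIELD):
        return False
    return any(_has_value(doc, field) for field in SUPPORTING_FIELDS)
```

For example, a document with only Email From populated would fail the check, while one with Email From and Email Subject would pass. A document with every header field except Email From would also fail, since Email From is mandatory.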