Comprehend
TL;DR Comprehend?
NLP service: extracts entities and other insights from a document
Comprehend
Is Comprehend real-time?
Yes for small jobs. Large jobs need async interface.
Comprehend
What are 4 outputs from a Comprehend job?
Entities, language, PII, sentiment
Comprehend
What 2 ML models does Comprehend use?
Pre-trained by AWS, or your own custom model
Comprehend
How can you trust what Comprehend generated?
Every entity has a confidence score telling you how sure Comprehend is
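A minimal sketch of using that confidence score: keep only entities above a threshold. The `sample` dict is hand-written in the shape of a `detect_entities` response (it is not real output), and `confident_entities` and the 0.9 cutoff are illustrative assumptions.

```python
def confident_entities(response, min_score=0.9):
    """Return (text, type, score) for entities at or above min_score."""
    return [
        (e["Text"], e["Type"], e["Score"])
        for e in response.get("Entities", [])
        if e["Score"] >= min_score
    ]


# Hand-written sample shaped like a detect_entities response (assumption).
sample = {
    "Entities": [
        {"Text": "July 31st", "Type": "DATE", "Score": 0.99},
        {"Text": "Acme", "Type": "OTHER", "Score": 0.55},
    ]
}

print(confident_entities(sample))
```

With boto3, the response would typically come from `boto3.client("comprehend").detect_entities(Text=..., LanguageCode="en")`.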
Comprehend
What can Comprehend extract from input text?
Key phrases, places, people, brands, events, …
Comprehend
What is sentiment in Comprehend?
How positive or negative the text is
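A small sketch of reading a sentiment result, assuming the response shape of `detect_sentiment`: an overall label plus one score per label. The `sample` values are made up for illustration, and `summarize_sentiment` is a hypothetical helper.

```python
def summarize_sentiment(response):
    """Return the overall label and the score Comprehend gave that label."""
    label = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
    # Score keys are capitalized: Positive, Negative, Neutral, Mixed.
    score = response["SentimentScore"][label.capitalize()]
    return label, score


# Hand-written sample shaped like a detect_sentiment response (assumption).
sample = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Positive": 0.02,
        "Negative": 0.90,
        "Neutral": 0.07,
        "Mixed": 0.01,
    },
}

print(summarize_sentiment(sample))
```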
Comprehend
What does Comprehend do with topics?
Comprehend can organize text by topic
Comprehend
How can you ask Comprehend to group files by your own categories?
Create training data and give it to Comprehend, which builds a custom Classifier
Comprehend
What is NER in Comprehend?
Named Entity Recognition
Comprehend
What is NER doing?
Find people, places, etc.
Comprehend
How can Comprehend recognize my own (custom) entities?
Train Comprehend with your own input data as a Custom Entity Recognizer
Comprehend
What’s an example of an Entity?
“July 31st”, “Date”, 99% confidence
Comprehend
What common labels are available for Entities?
Person, Place, Organization, Date, Quantity
Comprehend
What’s a problematic Label that Comprehend uses for Entities?
“Other”
Doesn’t explain what Comprehend thinks it is, just that it’s noteworthy
Comprehend
What’s an example of a Key Phrase?
“your bank account number 234-957-928364”
It’s all these words together
Comprehend
What’s the difference between an Entity and a Key Phrase?
Entities are the building blocks, Key Phrases are higher-level constructs
Comprehend
Can Comprehend detect PII?
Sort of. It works, but not well: it flags “July 31st” and arbitrary numbers as PII.
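One way to live with the noisy PII results is to filter by type and score before redacting. The sketch below assumes the `detect_pii_entities` response shape, which gives offsets rather than the matched text. The helper names, the 0.8 cutoff, and the choice to ignore `DATE_TIME` are illustrative assumptions.

```python
def redact_pii(text, response, min_score=0.8, ignore_types=("DATE_TIME",)):
    """Replace confident PII spans with their type, skipping ignored types."""
    keep = [
        e for e in response.get("Entities", [])
        if e["Score"] >= min_score and e["Type"] not in ignore_types
    ]
    # Work from the end of the text so earlier offsets stay valid.
    for e in sorted(keep, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:e["BeginOffset"]] + "[" + e["Type"] + "]" + text[e["EndOffset"]:]
    return text


def span(text, piece, pii_type, score):
    """Build one hand-written PII entity for `piece` inside `text`."""
    begin = text.index(piece)
    return {"Type": pii_type, "Score": score,
            "BeginOffset": begin, "EndOffset": begin + len(piece)}


text = "Call Jane on July 31st about account 234-957-928364."
# Hand-written sample shaped like a detect_pii_entities response (assumption).
sample = {"Entities": [
    span(text, "Jane", "NAME", 0.99),
    span(text, "July 31st", "DATE_TIME", 0.97),
    span(text, "234-957-928364", "BANK_ACCOUNT_NUMBER", 0.95),
]}

print(redact_pii(text, sample))
```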
Comprehend
What is “Syntax” in Comprehend?
Flags each word as verb, noun, pronoun, adjective, etc.
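A short sketch of working with syntax output, assuming the `detect_syntax` response shape where each token carries a `PartOfSpeech` tag. The `sample` is hand-written, and `words_by_tag` is a hypothetical helper.

```python
from collections import defaultdict


def words_by_tag(response):
    """Group token text by part-of-speech tag (NOUN, VERB, PRON, ADJ, ...)."""
    grouped = defaultdict(list)
    for token in response.get("SyntaxTokens", []):
        grouped[token["PartOfSpeech"]["Tag"]].append(token["Text"])
    return dict(grouped)


# Hand-written sample shaped like a detect_syntax response (assumption).
sample = {
    "SyntaxTokens": [
        {"TokenId": 1, "Text": "It", "PartOfSpeech": {"Tag": "PRON", "Score": 0.99}},
        {"TokenId": 2, "Text": "works", "PartOfSpeech": {"Tag": "VERB", "Score": 0.98}},
    ]
}

print(words_by_tag(sample))
```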
Custom Classifier
What’s a custom classifier?
Way to decide what a whole document is about
Custom Classifier
Example of custom classifiers?
invoice, PR/FAQ, security review
the type of the whole document
Custom Classifier
How do you create a custom classifier?
Provide lots of labeled training data in S3, then tell Comprehend to build a model from it
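A hedged sketch of that training step using the `create_document_classifier` call. The `RecordingClient` stub stands in for a real boto3 Comprehend client so the example runs offline. The classifier name, role ARN, bucket path, and account number are all made-up placeholders.

```python
def start_classifier_training(client, name, role_arn, training_s3_uri, language="en"):
    """Ask Comprehend to train a custom classifier from labeled data in S3."""
    response = client.create_document_classifier(
        DocumentClassifierName=name,
        DataAccessRoleArn=role_arn,          # role Comprehend uses to read S3
        InputDataConfig={"S3Uri": training_s3_uri},
        LanguageCode=language,
    )
    return response["DocumentClassifierArn"]


class RecordingClient:
    """Offline stand-in for a Comprehend client (hypothetical, for illustration)."""

    def __init__(self):
        self.calls = []

    def create_document_classifier(self, **kwargs):
        self.calls.append(kwargs)
        arn = ("arn:aws:comprehend:us-east-1:111122223333:document-classifier/"
               + kwargs["DocumentClassifierName"])
        return {"DocumentClassifierArn": arn}


stub = RecordingClient()
classifier_arn = start_classifier_training(
    stub,
    name="doc-types",
    role_arn="arn:aws:iam::111122223333:role/ExampleComprehendRole",
    training_s3_uri="s3://example-bucket/training/labels.csv",
)
print(classifier_arn)
```

Against AWS, `client` would be `boto3.client("comprehend")`, and training runs asynchronously after the call returns.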
Custom Classifier
What’s the architecture for a Comprehend custom classifier?
You train a new model, deploy it behind an endpoint for real-time use, then call that endpoint
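Calling a deployed classifier might look like the sketch below, which uses `classify_document` with an endpoint ARN and assumes a multi-class model (the response lists `Classes` with scores). `StubClient`, its scores, and the endpoint ARN are invented so the example runs without AWS.

```python
def top_class(client, text, endpoint_arn):
    """Classify text through a custom-classifier endpoint; return the best class."""
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    best = max(response["Classes"], key=lambda c: c["Score"])
    return best["Name"], best["Score"]


class StubClient:
    """Offline stand-in for a Comprehend client (hypothetical, for illustration)."""

    def classify_document(self, Text, EndpointArn):
        # Made-up scores in the shape of a multi-class response.
        return {"Classes": [
            {"Name": "invoice", "Score": 0.91},
            {"Name": "security review", "Score": 0.06},
            {"Name": "PR/FAQ", "Score": 0.03},
        ]}


endpoint = "arn:aws:comprehend:us-east-1:111122223333:document-classifier-endpoint/example"
print(top_class(StubClient(), "Invoice #1234, total due ...", endpoint))
```

Against AWS, `StubClient()` would be replaced by `boto3.client("comprehend")` and `endpoint` by the ARN of your own endpoint.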