Data analysis process steps
Ask, Prepare, Process, Analyze, Share, and Act
Data preparation
Preparing the data correctly; involves understanding different types of data and data structures
Data types
A specific kind of data attribute that tells what kind of value the data is
First-party data
Data collected by an individual or group using their own resources; typically the preferred method because you know exactly where it came from
Second-party data
Data collected by a group directly from its audience and then sold
Third-party data
Data collected from outside sources who did not collect it directly; may be less reliable and needs to be checked for accuracy, bias, and credibility
Population
Refers to all possible data values in a certain data set
Sample
A part of a population that is representative of the population
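The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the run-time values and the helper name draw_sample are made up for the example, and simple random selection is just one common way to aim for a representative sample, not a guarantee of one.

```python
import random

# Hypothetical population: the run time (in minutes) of every movie in a catalog.
population = [88, 92, 95, 101, 104, 110, 115, 118, 124, 131, 142, 156]


def draw_sample(values, size, seed=0):
    """Return a simple random sample of `size` values, drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(values, size)


# The sample is a part of the population, not a copy of all of it.
sample = draw_sample(population, 4)
print(sample)
```

Every value in the sample comes from the population, and the sample is smaller than the population it stands in for.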
Historical data
Data that already exists
Qualitative data
Data that can’t be counted, measured, or easily expressed using numbers; usually listed as a name, category, or description (e.g., movie titles, cast members)
Quantitative data
Data that can be measured or counted and then expressed as a number; data with a certain quantity, amount, or range (e.g., movie budget, box office revenue)
Discrete data
Quantitative data that’s counted and has a limited number of values (e.g., money amounts)
Continuous data
Quantitative data that is measured, such as with a timer, rather than counted; its value can be shown as a decimal with several places (e.g., movie run time)
Nominal data
A type of qualitative data that’s categorized without a set order; this data doesn’t have a sequence (e.g., ‘Yes,’ ‘No,’ or ‘Not sure’ responses to watching a movie)
Ordinal data
A type of qualitative data with a set order or scale (e.g., ranking a movie from 1 to 5)
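The difference between nominal and ordinal data shows up in what can sensibly be done with each. In the sketch below, nominal responses can only be counted per category, while ordinal ratings can also be sorted along their scale. The response lists and the five rating labels are assumptions made up for the example.

```python
from collections import Counter

# Nominal: categories with no set order, so counting is the natural operation.
responses = ["Yes", "No", "Not sure", "Yes", "Yes"]
counts = Counter(responses)

# Ordinal: categories with a set order (hypothetical labels for a 1-to-5 scale).
SCALE = ["poor", "fair", "good", "great", "excellent"]
ratings = ["good", "poor", "excellent", "fair", "good"]

# Sorting by position on the scale is meaningful only because the scale is ordered.
ranked = sorted(ratings, key=SCALE.index)
middle_rating = ranked[len(ranked) // 2]

print(counts)
print(ranked, middle_rating)
```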
Internal data
Data that lives within a company’s own systems; also called primary data
External data
Data that lives and is generated outside of an organization; also called secondary data
Structured data
Data that’s organized in a certain format, such as rows and columns (e.g., spreadsheets and relational databases); easily searchable and more analysis-ready
Unstructured data
Data that is not organized in any easily identifiable manner (e.g., audio files, video files, emails, photos, and social media)
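One way to see why structured data is more analysis-ready is to compare a query against each kind. In this sketch (all titles, budgets, and the email text are invented for illustration), the structured records can be filtered by field, while the unstructured text can only be searched as raw characters.

```python
# Structured: every record has the same named fields, like rows and columns.
movies = [
    {"title": "Movie A", "budget": 1_000_000},
    {"title": "Movie B", "budget": 5_000_000},
]
# Filtering by a field is direct because the format is known in advance.
big_budget = [m["title"] for m in movies if m["budget"] > 2_000_000]

# Unstructured: free text has no fields, so only plain text search is available
# without further processing.
email_body = "Hi team, I think Movie B cost around five million to make."
mentions_movie_b = "Movie B" in email_body

print(big_budget, mentions_movie_b)
```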
Data model
A model used for organizing data elements and describing how they relate to one another
Data elements
Pieces of information, such as people’s names, account numbers, and addresses
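A very small data model might be sketched as below. The class names Customer and Account, their fields, and the sample values are all hypothetical; the point is only that the fields are the data elements and the owner field records how two kinds of element relate.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Data elements describing a person.
    name: str
    address: str


@dataclass
class Account:
    # Stored as text because account numbers are identifiers, not quantities.
    account_number: str
    # Relationship: each account belongs to one customer.
    owner: Customer


customer = Customer(name="Ada Example", address="1 Sample Street")
account = Account(account_number="000123", owner=customer)

# Following the relationship from an account back to its owner's name.
print(account.owner.name)
```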
Number data type
Data type in a spreadsheet that represents a numerical value; can be formatted as a percentage or as currency
Text/String data type
A sequence of characters and punctuation that contains textual information; can include numbers (like phone numbers) that wouldn’t be used for calculations
Boolean data type
A data type with only two possible values: true or false (e.g., ‘favorite or not favorite’ status on a playlist)
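The three spreadsheet data types above have close counterparts in most programming languages. The Python sketch below uses made-up values to show each one, including a number displayed as a percentage and as currency, and a phone number kept as text because it is never used in calculations.

```python
# Number: usable in calculations.
budget = 1500000.0
share = 0.25

# Text/String: digits that identify something rather than measure it.
phone = "555-0100"

# Boolean: only two possible values, True or False.
is_favorite = True

# The same numbers shown in different display formats; the stored values do not change.
as_percent = f"{share:.0%}"
as_currency = f"${budget:,.2f}"

print(as_percent, as_currency, phone, is_favorite)
```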