Modules 1-3 Flashcards

(67 cards)

1
Q

Data analysis process steps

A

Ask, Prepare, Process, Analyze, Share and Act

2
Q

Data preparation

A

Preparing the data correctly; involves understanding different types of data and data structures

3
Q

Data types

A

A specific kind of data attribute that tells what kind of value the data is

4
Q

First-party data

A

Data collected by an individual or group using their own resources; typically the preferred method because you know exactly where it came from

5
Q

Second-party data

A

Data collected by a group directly from its audience and then sold

6
Q

Third-party data

A

Data collected from outside sources who did not collect it directly; may be less reliable and needs to be checked for accuracy, bias, and credibility

7
Q

Population

A

Refers to all possible data values in a certain data set

8
Q

Sample

A

A part of a population that is representative of the population

9
Q

Historical data

A

Data that already exists

10
Q

Qualitative data

A

Data that can’t be counted, measured, or easily expressed using numbers; usually listed as a name, category, or description (e.g., movie titles, cast members)

11
Q

Quantitative data

A

Data that can be measured or counted and then expressed as a number; data with a certain quantity, amount, or range (e.g., movie budget, box office revenue)

12
Q

Discrete data

A

Quantitative data that’s counted and has a limited number of values (e.g., money amounts)

13
Q

Continuous data

A

Quantitative data that is measured rather than counted (for example, with a timer); its value can be shown as a decimal with several places (e.g., movie run time)

14
Q

Nominal data

A

A type of qualitative data that’s categorized without a set order; this data doesn’t have a sequence (e.g., ‘Yes,’ ‘No,’ or ‘Not sure’ responses to watching a movie)

15
Q

Ordinal data

A

A type of qualitative data with a set order or scale (e.g., ranking a movie from 1 to 5)
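The difference between nominal and ordinal data can be sketched in a few lines of Python. This is only an illustration: the survey answers and the four-step rating scale (`SCALE`) are made-up assumptions, not part of the flashcards. Nominal values can be counted but not ranked, while ordinal values can be sorted once their scale is stated.

```python
from collections import Counter

# Hypothetical survey answers, invented for this sketch.
nominal_answers = ["Yes", "Not sure", "No", "Yes"]
ordinal_answers = ["Good", "Poor", "Excellent", "Fair"]

# Nominal data: categories have no order, so counting is the natural summary.
counts = Counter(nominal_answers)

# Ordinal data: the order comes from an explicit scale (assumed here).
SCALE = ["Poor", "Fair", "Good", "Excellent"]
rank = {label: position for position, label in enumerate(SCALE)}

ordered = sorted(ordinal_answers, key=rank.get)  # lowest to highest rating
best = max(ordinal_answers, key=rank.get)        # highest rating given

print(counts)
print(ordered)
print(best)
```

Sorting the nominal answers alphabetically would run without error, but the result would carry no meaning, which is the point of the distinction.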

16
Q

Internal data

A

Data that lives within a company’s own systems; also called primary data

17
Q

External data

A

Data that lives and is generated outside of an organization; also called secondary data

18
Q

Structured data

A

Data that’s organized in a certain format, such as rows and columns (e.g., spreadsheets and relational databases); easily searchable and more analysis-ready

19
Q

Unstructured data

A

Data that is not organized in any easily identifiable manner (e.g., audio files, video files, emails, photos, and social media)

20
Q

Data model

A

A model that is used for organizing data elements and how they relate to one another

21
Q

Data elements

A

Pieces of information, such as people’s names, account numbers, and addresses

22
Q

Number data type

A

Data type in a spreadsheet that represents a numerical value; can be changed into percents or currency

23
Q

Text/String data type

A

A sequence of characters and punctuation that contains textual information; can include numbers (like phone numbers) that wouldn’t be used for calculations

24
Q

Boolean data type

A

A data type with only two possible values: true or false (e.g., ‘favorite or not favorite’ status on a playlist)
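As a rough illustration of the number, text/string, and Boolean data types, the sketch below labels the values of one record. It uses Python rather than a spreadsheet, and the `track` record and the `describe` helper are hypothetical names chosen for the example.

```python
# Hypothetical playlist record; field names are made up for illustration.
track = {
    "title": "Example Song",      # text/string
    "phone_hotline": "555-0100",  # digits stored as text: never used in calculations
    "length_seconds": 215.5,      # number
    "is_favorite": True,          # Boolean: only True or False
}

def describe(value):
    """Return a flashcard-style label for a Python value."""
    # bool is checked first because in Python bool is a subclass of int.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "Text/String"
    return "Other"

labels = {name: describe(value) for name, value in track.items()}
for name, label in labels.items():
    print(f"{name}: {label}")

# A number can be displayed as a percent or as currency without changing its value.
share = 0.256
as_percent = f"{share:.1%}"
as_currency = f"${1234.5:,.2f}"
print(as_percent, as_currency)
```

The phone number is labeled as text even though it contains digits, matching the flashcard's point that such values are not used for calculations.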

25
Data table/Tabular data
Data that has a very simple structure; arranged in rows and columns
26
Records
Another term for rows in a data table, often reserved for structured databases
27
Fields
Another term for columns in a data table, often reserved for structured databases; each record has the same fields in the same order
28
Value
The specific piece of data contained in every cell
29
Wide data
Every data subject has a single row with multiple columns to hold the values of various attributes of the subject
30
Long data
Data in which each row is one time point per subject, so each subject will have data in multiple rows
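The wide and long layouts hold the same values in different shapes. A minimal sketch of converting one into the other, assuming a small made-up table and a hypothetical helper named `wide_to_long`:

```python
# Hypothetical wide table: one row per subject, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

def wide_to_long(rows, id_field, var_name, value_name):
    """Turn every non-id column of each wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the id is repeated on every long row, not turned into one
            long_rows.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return long_rows

long_rows = wide_to_long(wide_rows, "country", "year", "population")
for row in long_rows:
    print(row)
```

Two wide rows with two year columns become four long rows, one per subject per time point.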
31
Bias
A preference in favor of or against a person, group of people, or thing; can be conscious or subconscious
32
Data bias
A type of error that systematically skews results in a certain direction
33
Sampling bias
When a sample isn't representative of the population as a whole
34
Unbiased sampling
Results in a sample that's representative of the population being measured
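One way to picture sampling bias versus unbiased sampling is to compare a convenience sample with a simple random sample. The population below is invented for the sketch, and simple random sampling is only one of several unbiased methods; a random sample is likely, not guaranteed, to be representative.

```python
import random

# Hypothetical population: 1,000 member IDs; assume IDs below 100 are "new" members.
population = list(range(1000))

# Biased: taking only the first 50 IDs means every sampled member is new.
biased_sample = population[:50]

# Simple random sampling: every member has an equal chance of being selected.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, 50)

def share_new(sample):
    """Fraction of the sample made up of new members (ID below 100)."""
    return sum(1 for member in sample if member < 100) / len(sample)

print("population:", share_new(population))
print("biased sample:", share_new(biased_sample))
print("random sample:", share_new(random_sample))
```

New members are 10% of the population but 100% of the biased sample, so any conclusion drawn from that sample would be skewed toward them.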
35
Observer bias
The tendency for different people to observe things differently; sometimes referred to as experimenter bias or research bias
36
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative way, based on different backgrounds and experiences
37
Confirmation bias
The tendency to search for or interpret information in a way that confirms preexisting beliefs
38
ROCCC
An acronym for identifying good data sources: Reliable, Original, Comprehensive, Current, and Cited
39
Reliable data
Accurate, complete, and unbiased information that's been vetted and proven fit for use
40
Original data
Data that has been validated with the original source, not just second- or third-party information
41
Comprehensive data
Data that contains all critical information needed to answer the question or find the solution
42
Current data
Data that is up to date and relevant to the task at hand; the usefulness of data decreases as time passes
43
Cited data
Data where the source is known (who created it, when it was last refreshed), making the information more credible
44
Bad data sources
Sources that don't ROCCC; may be inaccurate, incomplete, biased, flat-out wrong, or filled with human error; can be misleading
45
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do
46
Data ethics
Well-founded standards of right and wrong that dictate how data is collected, shared, and used
47
Ownership (Data Ethics)
The principle that individuals own the raw data they provide and have primary control over its usage, processing, and sharing
48
Transaction transparency (Data Ethics)
The idea that all data processing activities and algorithms should be completely explainable and understood by the individual who provides their data
49
Consent (Data Ethics)
An individual's right to know explicit details about how and why their data will be used before agreeing to provide it
50
Currency (Data Ethics)
Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions; includes the opportunity to opt out
51
Data privacy
Preserving a data subject's information and activity any time a data transaction occurs; also called information privacy or data protection
52
Openness/Open data
Refers to free access, usage, and sharing of data, while still respecting privacy and consent
53
Interoperability
The ability of data systems and services to openly connect and share data
54
Database
A collection of data stored in a computer system
55
Metadata
Data about data; acts like a reference guide providing context, telling where, when, and how data was created, and what it's all about
56
Relational database
A database that contains a series of related tables that can be connected via their relationships, using one or more of the same fields
57
Primary key
An identifier that references a column in which each value is unique; uniquely identifies a record in a relational database table; cannot be null or blank
58
Foreign key
A field within a table that is a primary key in another table; provides a link between the data in two tables; multiple foreign keys are allowed to exist in a table
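The relationship between primary and foreign keys can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are assumptions made up for this example, and SQLite is used only because it ships with Python; other databases differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,  -- primary key: unique per record
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    total       REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# The shared customer_id field links the two tables.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS order_count, SUM(o.total) AS spent
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("foreign key violation rejected:", fk_rejected)
```

Here `customer_id` is the primary key of `customers` and a foreign key in `orders`, which is what lets the join match each order to its customer.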
59
Descriptive metadata
Metadata that describes a piece of data and can be used to identify it at a later point in time (e.g., author and title of a book)
60
Structural metadata
Metadata that indicates how a piece of data is organized and whether it's part of one or more data collections (e.g., how the pages of a book are put together to create different chapters)
61
Administrative metadata
Metadata that indicates the technical source of a digital asset (e.g., file type, date/time taken, device used for a photo)
62
Data governance
A process to ensure the formal management of a company’s data assets, giving better control over security, privacy, integrity, usability, and data flows
63
CSV (Comma-separated values)
A file that saves data in a table format using plain text, with values delimited by a character, often a comma
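A short sketch of reading CSV text with Python's standard `csv` module follows. The movie rows are invented sample data. Two details are worth noticing: a value that itself contains a comma has to be quoted, and every value comes back as text until it is converted.

```python
import csv
import io

# Hypothetical CSV content; the first line holds the field names.
csv_text = """title,year,budget
Example Movie,2020,1500000
"Another, Longer Title",2021,2300000
"""

movies = list(csv.DictReader(io.StringIO(csv_text)))
for movie in movies:
    print(movie["title"], movie["year"], movie["budget"])

# CSV values are read as strings; convert before doing arithmetic.
total_budget = sum(int(movie["budget"]) for movie in movies)
print(total_budget)

# The separator is not always a comma; here a semicolon is used instead.
semicolon_rows = list(csv.reader(io.StringIO("a;b\n1;2\n"), delimiter=";"))
print(semicolon_rows)
```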
64
Sorting
Involves arranging data into a meaningful order to make it easier to understand, analyze, and visualize (e.g., ascending/descending, alphabetically/numerically)
65
Filtering
Showing only the data that meets specific criteria while hiding the rest; simplifies a spreadsheet
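Sorting and filtering are usually done through spreadsheet menus, but the same two ideas can be sketched in Python. The `movies` list and the revenue threshold of 100 are assumptions chosen for illustration.

```python
# Hypothetical data set: one dictionary per row.
movies = [
    {"title": "Gamma", "revenue": 120},
    {"title": "Alpha", "revenue": 300},
    {"title": "Beta", "revenue": 45},
]

# Sorting: arrange rows in a meaningful order.
by_title = sorted(movies, key=lambda m: m["title"])                         # alphabetical, ascending
by_revenue_desc = sorted(movies, key=lambda m: m["revenue"], reverse=True)  # numeric, descending

# Filtering: keep only the rows that meet the criteria; the rest are hidden, not deleted.
hits = [m for m in movies if m["revenue"] >= 100]

print([m["title"] for m in by_title])
print([m["title"] for m in by_revenue_desc])
print([m["title"] for m in hits])
```

Neither operation changes the original `movies` list, which mirrors how a spreadsheet filter hides rows without removing them.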
66
SQL (Structured Query Language)
A query language used by data analysts to communicate with the database
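A minimal sketch of what "communicating with the database" looks like: a query asks for specific columns, filters rows, and sets the order of the results. The example runs against an in-memory SQLite database through Python's `sqlite3` module; the `movies` table and its rows are made up, and BigQuery's SQL dialect differs in some details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Alpha", "Drama", 300.0),
        ("Beta", "Comedy", 45.0),
        ("Gamma", "Drama", 120.0),
    ],
)

# SELECT picks columns, WHERE filters rows, ORDER BY sorts the result.
query = """
    SELECT title, revenue
    FROM movies
    WHERE genre = ?
    ORDER BY revenue DESC
"""
result = conn.execute(query, ("Drama",)).fetchall()
print(result)
```

The `?` placeholder passes the genre as a parameter instead of pasting it into the query text, which is the usual practice with `sqlite3`.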
67
BigQuery Sandbox account
An account available at no charge for BigQuery use; limits include a maximum of 12 projects and no support for Data Manipulation Language (DML) operations