Data & Analytics Flashcards

(18 cards)

1
Q

What is the specific order that the 7 SQL commands must be written in? (SFJW GHO)

A
  1. SELECT – Specifies which columns or expressions to return.
  2. FROM – Indicates the table(s) to retrieve data from.
  3. JOIN (if needed) – Combines rows from two or more tables based on a related column.
  4. WHERE – Filters rows before grouping, based on conditions on individual rows.
  5. GROUP BY – Groups rows to perform aggregate functions on each group.
  6. HAVING – Filters groups after aggregation, based on conditions on aggregated data.
  7. ORDER BY – Sorts the final result set by specified columns.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a Full Backup?

A

A complete copy of all data at a point in time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a Incremental Backup?

A

Copies only the data changed since the last backup of any type.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a Differential Backup?

A

Copies all changes made since the last full backup.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is database Mirroring?

A

The process of making an exact, real-time copy of a database by copying it onto a different machine, usually at the SAME PHYSICoAL LOCATION. This creates data redundancy, so if the primary database fails, the mirrored copy can take over, helping to keep the system available and running smoothly.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is database Replication?

A

The process of copying and transferring data between different databases that are located at DIFFERENT PHYSICAL SITES. This means the data is duplicated on a separate machine in a different geographical location, which helps ensure availability and disaster recovery if one site fails.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the three types of data collection?

A

Extract, Transform, and Load (ETL) – Collecting existing data by extracting it from sources, transforming it, and loading it for analysis.

Active Data Collection – Directly asking users for new data through surveys, interviews, or forms.

Passive Data Collection – Gathering new data without direct user permission, such as tracking web usage or timestamps.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the three complexities faced when obtaining external data? (DSC)

A

DATA INTEGRITY: ensuring the data is reliable and hasn’t been tampered with.

SAFETY: making sure the data is free from viruses or malicious code.

COPYRIGHTS: legal restrictions or agreements needed to use the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

In SQL, which comparison operators are used to test for NULL values?

A

IS and IS NOT

Operators like =, <>, !=, <, > cannot be used to test for NULL because NULL represents “unknown”, not zero.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the eight steps in the data life cycle? (DCP SAP AP)

A
  1. DEFINITION – Identify what data is needed and where to get it.
  2. CAPTURE/CREATION – Obtain or create the data internally or externally.
  3. PREPARATION – Clean, validate, and ensure data integrity and usability.
  4. SYNTHESIS – (Optional) Create calculated fields or transform data for easier use.
  5. ANALYTICS AND USAGE – Use data internally for reporting and decision-making.
  6. PUBLICATION – Share data or reports with external users.
  7. ARCHIVAL – Move less-used data to passive storage to free resources.
  8. PURGING – Permanently delete data no longer needed or required to keep.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What does this describe?

A visual representation that shows the logical flow of data through a system or process. It focuses on how data moves between processes, data stores, and external entities, emphasizing the logical flow rather than physical movement.

A

Data Flow Diagram (DFD)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What does this describe?

A diagram that depicts the sequence of steps in a process, showing both the logical and physical flow of activities, decisions, and data. It uses various symbols to represent tasks, decisions, inputs, outputs, and can include physical details like who performs the task or where it happens.

A

Flowchart

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What does this describe?

A diagram that illustrates how different systems or components interact or interface with each other, focusing on the connections and communication between systems rather than the internal flow of data or process steps.

A

System Interface Diagram

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

In database dimensional modeling, what is the difference between the Star schema and Snowflake schema?

A

Star schema has a central fact table with denormalized dimension tables, while snowflake schema normalizes dimension tables into multiple related tables, adding complexity and flexibility.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the difference between denormalized and normalized?

A

Denormalized means data is stored with some intentional redundancy—so related information is combined into fewer tables rather than split into many normalized tables. This reduces the number of joins needed in queries, improving read performance and simplifying reporting, but it can increase data duplication and the risk of inconsistencies.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Which database Normal Form Rule does this describe?

Each table cell holds only one atomic value, and every row is uniquely identified by a primary key.

A

First Normal Form (1NF)

17
Q

Which database Normal Form Rule does this describe?

All non-key attributes depend on the entire primary key, eliminating partial dependencies (important if the primary key is composite).

A

Second Normal Form (2NF)

18
Q

Which database Normal Form Rule does this describe?

No non-key attribute depends on another non-key attribute (no transitive dependencies); every attribute depends only on the primary key.

A

Third Normal Form (3NF)