mastering the data Flashcards

(19 cards)

1
Q

data nerds

A

50-90% of project time is spent mastering the data
ex: scrubbing, cleaning, cleansing

2
Q

schema

A

table (rectangle)

3
Q

attributes

A

nuggets

4
Q

unified modeling language (UML)

A

notation for diagramming a relational database

5
Q

relational databases ensure data…

A
  • is complete / includes all data
  • isn't redundant, so it won't take up too much space
  • follows business rules and internal controls
  • aid communication and integration of business processes
6
Q

4 types of attributes

A
  • primary keys
  • foreign keys
  • composite keys
  • descriptive attributes
7
Q

primary key

A

unique identifier

8
Q

foreign key

A

attribute that points to the primary key in another table

9
Q

composite keys

A

combination of two foreign keys, used to identify line items

10
Q

descriptive attributes

A

include everything else
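
A minimal sketch of the four attribute types, using Python's built-in sqlite3 module. The table and column names (customer, product, sales_order, order_line) are made up for illustration and do not come from the cards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique identifier
    name        TEXT                   -- descriptive attribute
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id)  -- foreign key
);
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES sales_order(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,                                   -- descriptive attribute
    PRIMARY KEY (order_id, product_id)                    -- composite key
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO product VALUES (10, 'Widget')")
conn.execute("INSERT INTO sales_order VALUES (100, 1)")
conn.execute("INSERT INTO order_line VALUES (100, 10, 3)")


def try_insert(sql):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Same (order_id, product_id) pair again: rejected by the composite key.
duplicate_line = try_insert("INSERT INTO order_line VALUES (100, 10, 5)")
# Order pointing at a customer that does not exist: rejected by the foreign key.
orphan_order = try_insert("INSERT INTO sales_order VALUES (101, 999)")
```

Both attempted inserts are refused, which is how the keys keep the data non-redundant and consistent with business rules.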

11
Q

data dictionaries

A

legend/log that gives a full description of each column

12
Q

big 4

A

currently love Alteryx
- extract
- transform
- load
aka cleaning the data

13
Q

requesting data process

A
  1. determine purpose and scope of data request
  2. obtain data
  3. validate data for completeness and integrity
  4. clean the data
  5. load the data for data analysis
14
Q

step 1 : determine purpose and scope of data request

A
  • think ahead and plan
    –> what info do I need
    –> where can I get it
    –> how am I going to get it
15
Q

step 2: obtain the data – questions

A
  • do you grab it yourself, or does someone else grab it for you?
16
Q

step 2: obtain the data – methods

A
  • obtain yourself

** think about tables: which tables are related & which attributes are in which table

  • data request to IT dept

** if someone else obtains it, the request is more complex because you need to explain what you want, and bigger orgs have a lot of approval processes

17
Q

step 3: validate data for completeness and integrity

A

check that data transferred correctly
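
One common way to check that data transferred correctly is to compare the extract against control figures (a record count and a total) from the source system. The sketch below assumes that approach. The function name validate_extract, the field names, and the sample rows are all hypothetical.

```python
from decimal import Decimal


def validate_extract(rows, expected_count, expected_total, amount_field="amount"):
    """Compare an extract against control figures supplied with the data."""
    actual_count = len(rows)
    actual_total = sum(Decimal(row[amount_field]) for row in rows)
    # Positions of rows that have at least one empty field.
    rows_with_blanks = [
        i for i, row in enumerate(rows)
        if any(value in ("", None) for value in row.values())
    ]
    return {
        "count_ok": actual_count == expected_count,          # completeness
        "total_ok": actual_total == Decimal(expected_total),  # integrity
        "rows_with_blanks": rows_with_blanks,
    }


# Made-up extract: three invoices, one with a missing invoice number.
extract = [
    {"invoice": "1001", "amount": "250.00"},
    {"invoice": "1002", "amount": "99.50"},
    {"invoice": "", "amount": "10.25"},
]
result = validate_extract(extract, expected_count=3, expected_total="359.75")
```

Here the count and total match the control figures, but the third row is flagged because its invoice number is blank.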

18
Q

step 4: clean the data

A

make data consistent
- remove headings or subtotals
- clean leading zeros and nonprinting characters
- format negative numbers
- correct inconsistencies
–> dates (6/7/2023 or 7/6/2023 or 2023-07-06)
–> numbers (1 or I, 6 or six)
–> international character encoding (“” or <>)
–> languages and measures (currency signs)
–> human error (23 or 32)
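
A few of the cleaning steps above can be sketched with the Python standard library. The helper names are hypothetical, and the rules they apply are assumptions to adjust for the actual data: leading zeros are stripped rather than preserved, and parentheses or a trailing minus sign mean a negative number.

```python
from datetime import datetime


def clean_text(value):
    """Drop nonprinting characters, then trim surrounding whitespace."""
    return "".join(ch for ch in value if ch.isprintable()).strip()


def strip_leading_zeros(value):
    """'00123' -> '123'; a value made only of zeros becomes '0'."""
    return clean_text(value).lstrip("0") or "0"


def clean_amount(value):
    """Convert '(1,234.50)' or '1,234.50-' to a float with a leading minus."""
    text = clean_text(value).replace(",", "").replace("$", "")
    negative = (
        (text.startswith("(") and text.endswith(")"))
        or text.startswith("-")
        or text.endswith("-")
    )
    number = float(text.strip("()-"))
    return -number if negative else number


def clean_date(value, source_format="%m/%d/%Y"):
    """Rewrite a date from a known source format as YYYY-MM-DD."""
    return datetime.strptime(clean_text(value), source_format).date().isoformat()


name = clean_text("\x00 ACME\t")                  # 'ACME'
account = strip_leading_zeros("00123")            # '123'
refund = clean_amount("(1,234.50)")               # -1234.5
us_date = clean_date("7/6/2023")                  # '2023-07-06'
eu_date = clean_date("6/7/2023", "%d/%m/%Y")      # '2023-07-06'
```

The two date calls show why the source format has to be known up front: 6/7/2023 and 7/6/2023 are the same day only if you know which system wrote each one.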

19
Q

step 5: load data for data analysis

A

import data and make sure functions work properly
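
A minimal sketch of loading cleaned data and confirming that a basic function works on it, assuming the data arrives as CSV. The file contents and column names are made up, and an in-memory string stands in for a real file.

```python
import csv
import io

# Stand-in for a cleaned CSV file on disk.
raw = io.StringIO("invoice,amount\n1001,250.00\n1002,99.50\n")

rows = list(csv.DictReader(raw))

# csv reads every value as text, so convert before doing arithmetic.
for row in rows:
    row["amount"] = float(row["amount"])

# If the load worked, a simple aggregate should run and give a sensible figure.
total = sum(row["amount"] for row in rows)
```

If the amounts were left as text, the sum would fail, which is the kind of problem this last check is meant to catch before analysis starts.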