What is big data?
extremely large datasets that have grown to sizes beyond the ability of traditional data processing tools to manage and analyze them
4 Data Structures
Vector, Matrix, List, Dataframe
Vector is
one-dimensional data structure that holds elements of the same data type, used in statistical analysis and data modeling
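Python has no built-in type called "vector" the way R does. As a rough stdlib analogue, assuming no third-party libraries (NumPy arrays are the usual choice in practice), `array.array` is one-dimensional and fixes a single element type for all of its values. The sample heights are made up.

```python
from array import array

# Type code 'd': every element is stored as a double-precision float,
# and that single type applies to the whole array
heights = array("d", [1.62, 1.75, 1.80])

# Because all elements share one type, simple statistics are straightforward
mean_height = sum(heights) / len(heights)

# A value of a different type is rejected instead of being stored
try:
    heights.append("tall")
    rejected = False
except TypeError:
    rejected = True

print(round(mean_height, 3), rejected)
```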
Matrix?
two-dimensional data structure with rows and columns of elements of the same data type, used for mathematical applications
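A minimal Python sketch of a matrix as a list of equal-length rows, showing matrix multiplication as a typical mathematical use. The helpers `is_matrix` and `matmul` are hypothetical names for illustration; in R a matrix would come from `matrix()`, and in Python real work would normally use NumPy.

```python
def is_matrix(rows):
    """Hypothetical helper: True if rows form a rectangular grid."""
    return len(rows) > 0 and all(len(r) == len(rows[0]) for r in rows)


def matmul(a, b):
    """Hypothetical helper: multiply an m x n matrix by an n x p matrix."""
    if not (is_matrix(a) and is_matrix(b)) or len(a[0]) != len(b):
        raise ValueError("incompatible dimensions")
    # Entry (i, j) is the dot product of row i of a with column j of b
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
product = matmul(a, b)
print(product)  # [[19, 22], [43, 50]]
```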
List
data structure that holds elements of different data types and can be dynamically resized, used in programming for tasks like grouping related items of mixed types
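Python's built-in `list` behaves much like the list described here: it can mix types and grows or shrinks at run time. The record values below are made up for illustration.

```python
# A Python list can hold values of different types, much like an R list
record = ["Alice", 30, 1.68]

# It is resized dynamically: items can be appended...
record.append(True)
record.append(["R", "Python"])  # ...including another list (nesting)

# ...and removed
removed = record.pop(3)  # removes and returns the element at index 3 (True)

types = [type(item).__name__ for item in record]
print(types)  # ['str', 'int', 'float', 'list']
```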
Dataframe
two-dimensional data structure that stores data in a tabular format like a spreadsheet; each column is a vector, and different columns can hold different types (numeric, character, etc.). Used for data analysis and manipulation in tools such as Python (pandas) and R
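To show the idea without third-party libraries, here is a sketch that models a dataframe as a dict of equal-length columns. This is an assumption-laden stand-in, not how pandas or R implement it; with pandas installed, the same dict could be passed to `pandas.DataFrame`. Column names and values are invented.

```python
# Each key is a column name; each value is one column (a vector of one type)
df = {
    "name":   ["Ana", "Ben", "Chi"],  # character column
    "age":    [28, 35, 42],           # numeric column
    "member": [True, False, True],    # logical column
}

# Every column must have the same length so that rows line up
n_rows = len(df["name"])
assert all(len(col) == n_rows for col in df.values())

# Column operation: average of a numeric column
mean_age = sum(df["age"]) / n_rows

# Row filter: names of the rows where member is True
members = [name for name, m in zip(df["name"], df["member"]) if m]

print(mean_age, members)  # 35.0 ['Ana', 'Chi']
```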
Types of Data
Structured, unstructured, semi-structured
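Structured data is
data organized in a fixed schema of rows and columns, such as relational database tables or spreadsheets
Unstructured data
data with no predefined model or organization, such as free text, images, audio and video
Semi-structured data
data that does not follow a rigid table schema but carries tags or keys that organize it, such as JSON, XML and email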
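To illustrate the three types of data, this sketch uses only the Python standard library: a CSV table for structured data, JSON for semi-structured data, and a free-text sentence for unstructured data. All sample content is made up.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns (CSV table)
structured_text = "id,name,age\n1,Ana,28\n2,Ben,35\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing keys, but fields may differ per record (JSON)
semi_text = '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Ben"}]'
records = json.loads(semi_text)

# Unstructured: free text with no schema; structure must be extracted by analysis
unstructured_text = "Ana called on Monday and said the delivery was late."
word_count = len(unstructured_text.split())

print(rows[1]["name"], [sorted(r) for r in records], word_count)
```

Note that `csv.DictReader` returns every value as a string, so even structured data often needs type conversion before analysis.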
Levels of Measurement
2 Qualitative:
- Nominal
- Ordinal
2 Quantitative:
- Interval
- Ratio
Nominal
categorical data with no inherent order. Example: gender (male, female)
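Since nominal values have no order, the meaningful operations are equality checks and counting, which makes the mode the natural summary. A short Python sketch with made-up responses:

```python
from collections import Counter

# Nominal values are labels only: counting and equality are meaningful, ordering is not
responses = ["male", "female", "female", "male", "female"]

counts = Counter(responses)
mode, mode_count = counts.most_common(1)[0]  # most frequent category

print(dict(counts), mode)  # {'male': 2, 'female': 3} female
```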
Ordinal
data with ordered categories but no consistent intervals between them. Example: satisfaction ratings (satisfied to dissatisfied)
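A sketch of ordinal data in Python, assuming a five-point satisfaction scale (the labels and answers are illustrative). Categories are mapped to ranks so they can be compared and a median found; averaging the ranks would wrongly assume equal spacing, so `statistics.median_low` is used to return an actual category.

```python
import statistics

# Ordinal scale: a fixed order of categories, but unequal (unknown) gaps between them
SCALE = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: i for i, label in enumerate(SCALE)}

answers = ["satisfied", "neutral", "very satisfied", "dissatisfied", "satisfied"]

# Order is meaningful, so comparisons and the median are valid summaries
ranks = [RANK[a] for a in answers]
median_label = SCALE[statistics.median_low(ranks)]

print(RANK["satisfied"] > RANK["neutral"], median_label)  # True satisfied
```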
Interval
data with consistent intervals but an arbitrary zero point. Example: temperature in degrees Celsius
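The Celsius example can be checked numerically: differences survive a change of scale, but ratios do not, because 0 °C is not "no temperature". This sketch converts to Kelvin (which has a true zero) for comparison; the two readings are made up.

```python
def celsius_to_kelvin(c):
    """Kelvin has a true zero (absolute zero); Celsius does not."""
    return c + 273.15


morning, afternoon = 10.0, 20.0  # degrees Celsius

# Differences are meaningful on an interval scale: 10 degrees in either unit
diff_c = afternoon - morning
diff_k = celsius_to_kelvin(afternoon) - celsius_to_kelvin(morning)

# Ratios are not: 20 °C is not "twice as hot" as 10 °C
ratio_c = afternoon / morning
ratio_k = celsius_to_kelvin(afternoon) / celsius_to_kelvin(morning)

print(diff_c, round(diff_k, 6), ratio_c, round(ratio_k, 3))
```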
Ratio
data with consistent intervals and a true zero point. Example: weight, height, income
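With a true zero, ratios become meaningful and do not depend on the unit, since converting units only rescales values (no offset). A Python sketch with illustrative weights, using the exact definition of the international pound:

```python
KG_PER_LB = 0.45359237  # exact definition of the international avoirdupois pound

weight_a_kg, weight_b_kg = 80.0, 40.0

# Ratio scale: zero means "none", so "twice as heavy" is a valid statement
ratio_kg = weight_a_kg / weight_b_kg

# The ratio is unchanged in pounds, because conversion is pure rescaling
ratio_lb = (weight_a_kg / KG_PER_LB) / (weight_b_kg / KG_PER_LB)

print(ratio_kg, round(ratio_lb, 9))  # 2.0 2.0
```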
Characteristics of Big Data (6 V’s)