What is data?
Raw facts, figures, and symbols that have no meaning or context on their own and serve as input for processing in information systems.
What is information?
Processed and analyzed data that has meaning and context.
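As a minimal sketch of the difference between data and information, the example below turns a list of bare numbers into a statement with context. The readings, the label, and the `summarize` helper are all hypothetical and chosen only for illustration.

```python
# Raw data: bare numbers with no context attached (hypothetical values).
raw_readings = [21.5, 22.0, 23.5, 21.0]


def summarize(readings, label, unit):
    """Add context (label, unit) and a computed summary to raw numbers."""
    average = sum(readings) / len(readings)
    return f"{label}: average {average:.1f} {unit} over {len(readings)} readings"


# Information: the same numbers after processing, now with meaning and context.
information = summarize(raw_readings, "Room temperature", "degrees C")
print(information)
```

The list on its own says nothing about what was measured; the processed sentence does.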
From where can we collect data?
1- Web data and e-commerce
2- Financial transactions (bank/credit transactions)
3- Online trading and purchasing
4- Social networks
What are the categories of data?
1- Qualitative (Categorical): Non-numerical data that can be grouped into categories. E.g., colors, gender, names.
2- Quantitative (Numerical): Data that can be measured or counted and is expressed numerically. E.g., age, height, temperature.
What are the 2 classes of qualitative data?
1- Nominal: Categories that do not have a specific order. E.g., car brands: Toyota, Ford, Honda.
2- Ordinal: Categories that have a meaningful order. E.g., low, medium, high.
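The nominal/ordinal distinction can be sketched in a few lines of Python. The sample values and the `LEVEL_RANK` mapping are assumptions made for this illustration: nominal values support only equality checks and counting, while ordinal values can also be sorted and compared once their order is made explicit.

```python
# Nominal: no inherent order, so only counting and equality are meaningful.
brands = ["Toyota", "Ford", "Honda", "Ford"]
brand_counts = {brand: brands.count(brand) for brand in set(brands)}

# Ordinal: the order matters, so encode it as explicit ranks (assumed mapping).
LEVEL_RANK = {"low": 0, "medium": 1, "high": 2}
levels = ["high", "low", "medium", "low"]

# Sorting and taking the maximum are meaningful only because of the ranks.
sorted_levels = sorted(levels, key=lambda level: LEVEL_RANK[level])
highest = max(levels, key=lambda level: LEVEL_RANK[level])

print(brand_counts)
print(sorted_levels, highest)
```

Sorting the brand names alphabetically would still work mechanically, but the resulting order would carry no meaning about the brands themselves.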
What are the types of data?
1- Structured
2- Semi-structured
3- Unstructured
What are the features of structured data?
1- Organized in a specific format, usually in a tabular form with rows and columns, with a clear schema of different fields of information.
2- Easy to search, sort, filter, and analyze.
3- Can be easily interpreted by both humans and computers.
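A small sketch of these features, assuming a hypothetical three-column table (name, age, city) stored as CSV text. Because every row follows the same schema, filtering and sorting take one line each.

```python
import csv
import io

# Hypothetical table: every row has the same fields (name, age, city).
CSV_TEXT = """name,age,city
Alice,30,Cairo
Bob,25,Giza
Carol,35,Cairo
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    row["age"] = int(row["age"])  # csv reads every field as text

# Filter: names of people whose city is Cairo.
in_cairo = [row["name"] for row in rows if row["city"] == "Cairo"]

# Sort: names ordered by age, youngest first.
by_age = [row["name"] for row in sorted(rows, key=lambda r: r["age"])]

print(in_cairo)
print(by_age)
```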
What are the features of semi-structured data?
1- Has some structure but does not fit neatly into a traditional tabular format with strict relationships between fields.
2- May contain a mix of elements, including text, numbers, and multimedia.
3- May use markup languages or data formats such as XML or JSON to provide some structure to the data.
4- More challenging to process than structured data.
5- Offers greater flexibility and can be used to store a wider range of information.
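To illustrate, here is a sketch using hypothetical JSON records. Each record is labeled with keys, but the records do not share one fixed set of fields, so the code has to handle missing and nested values explicitly.

```python
import json

# Hypothetical records: all have "id" and "name", the rest varies per record.
JSON_TEXT = """
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "tags": ["vip", "new"]},
  {"id": 3, "name": "Carol", "address": {"city": "Cairo"}}
]
"""

records = json.loads(JSON_TEXT)

# Optional fields need defaults because not every record has them.
emails = [record.get("email") for record in records]
cities = [record.get("address", {}).get("city") for record in records]

# The full set of fields is only known after scanning all the records.
all_fields = sorted({key for record in records for key in record})

print(emails)
print(cities)
print(all_fields)
```

The extra handling for optional and nested fields is one reason this kind of data is harder to process than a fixed table, and the varying fields are what give it its flexibility.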
What are the features of unstructured data?
1- Doesn’t have a predefined data model or format.
2- Can include things like text documents, audio and video files, images, and social media posts.
3- Cannot be easily queried or analyzed using traditional database management techniques.
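As a rough sketch, assuming a made-up social media post: free text has no fields to query, so even a simple question such as "which words appear most often?" requires processing the raw content first.

```python
import re
from collections import Counter

# Hypothetical post: plain text with no predefined fields or schema.
post = "Loved the new phone! Battery life is great, camera is great too."

# There is no column to filter on, so extract words from the raw text first.
words = re.findall(r"[a-z']+", post.lower())
word_counts = Counter(words)

print(word_counts.most_common(2))
```

Word counting is only one simple way to impose structure on text; audio, video, and images need their own specialized processing.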
How is data generated?
1- Human-generated data
2- Machine-generated data
What is big data?
Huge volumes of data that keep arriving and being stored at a steady rate, with no upper limit on how large they can grow.
What are some applications of big data?
1- Healthcare
2- Industrial and environmental systems
3- Internet of Things (IoT)
4- Cyber-physical systems
What is data analytics?
A subset of data science focused on analyzing data to extract meaningful insights.
What is data science?
A field of science that covers the whole lifecycle of data, from data collection to decision-making.