The components of the data analytics life-cycle
GDPR
General Data Protection Regulation.
A legal framework that sets guidelines for the collection and processing of personal information from individuals who live in the European Union (EU), including when the organisation handling that information is based outside the EU.
Its aim is to give consumers control over their own personal data.
RDBMS (Relational Database Management System)
A type of database that stores and provides access to data points that are related to one another.
Based on the relational model, an intuitive, straightforward way of representing data in tables. Each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.
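A minimal sketch of the row, key, and attribute idea, using Python's built-in sqlite3 module. The table name, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table and its contents are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE apprentice ("
    " id INTEGER PRIMARY KEY,"   # the key: uniquely identifies each record
    " name TEXT,"                # attribute column
    " programme TEXT)"           # attribute column
)
conn.executemany(
    "INSERT INTO apprentice (id, name, programme) VALUES (?, ?, ?)",
    [(1, "Ada", "Programme A"), (2, "Grace", "Programme B")],
)

# Look up a single record by its key.
row = conn.execute(
    "SELECT name, programme FROM apprentice WHERE id = ?", (2,)
).fetchone()
print(row)

# A second record with an existing key is rejected, which keeps keys unique.
try:
    conn.execute(
        "INSERT INTO apprentice (id, name, programme) VALUES (1, 'Alan', 'Programme A')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Each tuple passed to the insert is one record, and the `id` column plays the role of the key described above.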
Personally Identifiable Information (PII)
Information that, when used alone or with other relevant data, can identify an individual.
May contain direct identifiers (e.g., passport information) that can identify a person uniquely, or quasi-identifiers (e.g., race) that can be combined with other quasi-identifiers (e.g., date of birth) to identify an individual.
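The way quasi-identifiers become identifying in combination can be sketched in a few lines. The records, the field names, and the helper `unique_count` are all invented for illustration and describe no real people.

```python
from collections import Counter

# Made-up records; each field on its own is only a quasi-identifier.
records = [
    {"postcode": "AB1", "birth_year": 1990, "gender": "F"},
    {"postcode": "AB1", "birth_year": 1990, "gender": "M"},
    {"postcode": "AB1", "birth_year": 1985, "gender": "F"},
    {"postcode": "CD2", "birth_year": 1990, "gender": "F"},
    {"postcode": "CD2", "birth_year": 1990, "gender": "F"},
]

def unique_count(rows, fields):
    """Count rows whose combination of values for `fields` appears exactly once."""
    combos = Counter(tuple(r[f] for f in fields) for r in rows)
    return sum(1 for r in rows if combos[tuple(r[f] for f in fields)] == 1)

# Postcode alone singles out nobody in this sample.
alone = unique_count(records, ["postcode"])
# Combining three quasi-identifiers singles out several records.
combined = unique_count(records, ["postcode", "birth_year", "gender"])
print(alone, combined)
```

In this sample no record is unique by postcode alone, while three of the five records become unique once the fields are combined.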
Modelling
The stage where models such as algorithms and predictive analysis are built.
Refine & Compare
The stage that reflects on the proposed models and solutions, considers alternatives, and iterates on current work to reach one optimised solution.
Lawfulness, fairness and transparency
One of the principles of GDPR.
Lawfulness: data is gathered and processed on a valid legal basis, for example by getting a user's consent to process their data in a certain way.
Fairness: the processing is in the best interest of the person the data is about, and its scope can be reasonably expected by that person.
Transparency: you clearly communicate what data you process, how, and why to the people whose data it is, in a way that lets data subjects easily understand what you are doing with their data.
Purpose Limitation
One of the principles of GDPR.
Data must be collected for specified, explicit, and legitimate purposes, and must not be processed further in a way that is incompatible with those purposes.
Data minimisation
One of the principles of GDPR.
Only collect data that is adequate, relevant, and limited to what is necessary.
Accuracy
One of the principles of GDPR.
Having data records that represent the current truth. Records must be kept up to date and correct, and the data processor must take reasonable measures to ensure that.
Storage Limitation
One of the principles of GDPR.
If personal data is no longer required it must be deleted. Exceptions when data can be kept for longer include data for scientific purposes or in the interest of the public (e.g. criminal records).
Accountability
One of the principles of GDPR.
Taking responsibility for your data processing. The data controller and/or processor must be responsible for proper processing of personal data and compliance with the rules of GDPR.
Integrity and confidentiality
One of the principles of GDPR.
Personal data must be processed or stored in a manner that ensures its security. This includes protection against unauthorized or unlawful processing and accidental loss, destruction or damage.
It must not be made available or disclosed to unauthorized individuals, entities or processes.
Administrative Data
Information created when people interact with services, which is collated by organisations.
It is used to support the operational services of an organisation.
Examples from Multiverse may include attendance data and OTJ% hours data, which are collected and used to track apprentice progress and to evidence compliance processes.
Structured Data
Data that is organized and formatted so that it is easy for computers to read, organize, and understand, and that can be inserted into a database in a seamless fashion.
Unstructured data
Data that cannot be stored in a traditional relational database or RDBMS. Text and multimedia are two common types of unstructured content. Many business documents are unstructured, as are email messages, videos, photos, webpages, and audio files.
Inner join
Most common type of join; includes rows in the query only when the joined field matches records in both tables.
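As a small illustration, the sketch below runs an inner join with Python's built-in sqlite3 module. The tables `apprentice` and `attendance` and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: apprentice 3 has no attendance row,
# and the attendance row for apprentice 4 has no matching apprentice.
conn.executescript("""
    CREATE TABLE apprentice (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (apprentice_id INTEGER, sessions INTEGER);
    INSERT INTO apprentice VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
    INSERT INTO attendance VALUES (1, 10), (2, 8), (4, 5);
""")

# Only rows whose join field matches in BOTH tables are returned.
inner_rows = conn.execute("""
    SELECT a.name, t.sessions
    FROM apprentice AS a
    INNER JOIN attendance AS t ON a.id = t.apprentice_id
    ORDER BY a.id
""").fetchall()
print(inner_rows)
```

Alan and the unmatched attendance row are both absent from the result because neither has a partner in the other table.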
Left Join
The LEFT JOIN keyword returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match.
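A minimal sketch of a left join using Python's built-in sqlite3 module. The tables and rows are hypothetical; SQL NULL values appear in Python as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: apprentice 3 (Alan) has no attendance row.
conn.executescript("""
    CREATE TABLE apprentice (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (apprentice_id INTEGER, sessions INTEGER);
    INSERT INTO apprentice VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
    INSERT INTO attendance VALUES (1, 10), (2, 8), (4, 5);
""")

# Every row of the left table (apprentice) is kept;
# right-side columns are NULL where there is no match.
left_rows = conn.execute("""
    SELECT a.name, t.sessions
    FROM apprentice AS a
    LEFT JOIN attendance AS t ON a.id = t.apprentice_id
    ORDER BY a.id
""").fetchall()
print(left_rows)
```

Alan stays in the result with `None` for `sessions`, while the attendance row for the unknown apprentice 4 is dropped.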
Right Join
The RIGHT JOIN keyword returns all rows from the right table (table2), with the matching rows in the left table (table1). The result is NULL in the left side when there is no match.
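The sketch below shows the right-join result with Python's built-in sqlite3 module. Because older SQLite builds may not accept the RIGHT JOIN keyword, it uses the equivalent form of a LEFT JOIN with the two tables swapped; this substitution, and the hypothetical tables, are assumptions made for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the attendance row for apprentice 4 has no matching apprentice.
conn.executescript("""
    CREATE TABLE apprentice (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (apprentice_id INTEGER, sessions INTEGER);
    INSERT INTO apprentice VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
    INSERT INTO attendance VALUES (1, 10), (2, 8), (4, 5);
""")

# "apprentice RIGHT JOIN attendance" keeps every attendance row.
# The same result is written here as "attendance LEFT JOIN apprentice".
right_rows = conn.execute("""
    SELECT a.name, t.sessions
    FROM attendance AS t
    LEFT JOIN apprentice AS a ON a.id = t.apprentice_id
    ORDER BY t.apprentice_id
""").fetchall()
print(right_rows)
```

Every attendance row survives, and the name is `None` for the row with no matching apprentice.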
Cartesian Join
Links table data so each record in the first table is matched with each individual record in the second table. Also called a Cartesian product or cross join.
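A short sketch of a Cartesian (cross) join using Python's built-in sqlite3 module. The `apprentice` and `module` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with 2 and 3 rows respectively.
conn.executescript("""
    CREATE TABLE apprentice (name TEXT);
    CREATE TABLE module (title TEXT);
    INSERT INTO apprentice VALUES ('Ada'), ('Grace');
    INSERT INTO module VALUES ('SQL'), ('Statistics'), ('Visualisation');
""")

# No join condition: every apprentice is paired with every module.
cross_rows = conn.execute("""
    SELECT a.name, m.title
    FROM apprentice AS a
    CROSS JOIN module AS m
    ORDER BY a.name, m.title
""").fetchall()
print(len(cross_rows))  # 2 apprentices x 3 modules = 6 pairings
```

The row count is the product of the two table sizes, which is why cross joins on large tables grow quickly.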
Schema
Defines how data is organised within a relational database. It includes logical constraints as well as table names, fields, data types, and the relationships between these entities.
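One way to see a schema concretely is to declare one and then ask the database to describe it. The sketch below uses Python's built-in sqlite3 module with two hypothetical tables; the `PRAGMA` statements are SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: table names, fields, data types, a constraint,
# and a relationship (attendance.apprentice_id refers to apprentice.id).
conn.executescript("""
    CREATE TABLE apprentice (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE attendance (
        apprentice_id INTEGER NOT NULL REFERENCES apprentice(id),
        sessions      INTEGER
    );
""")

# Field names and declared data types of one table.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(attendance)")]
# Relationships as (local column, referenced table, referenced column).
relationships = [
    (row[3], row[2], row[4])
    for row in conn.execute("PRAGMA foreign_key_list(attendance)")
]
print(columns)
print(relationships)
```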
Descriptive Analytics
The use of data to understand past and current business performance and make informed decisions.
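A minimal sketch of descriptive analytics: summarising what has already happened. The monthly sales figures are invented for illustration.

```python
import statistics

# Hypothetical units sold in each of the last six months.
monthly_sales = [120, 135, 150, 110, 160, 145]

# Summary statistics describe past performance; they do not predict anything.
summary = {
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "best_month": max(monthly_sales),
    "worst_month": min(monthly_sales),
}
print(summary)
```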
Predictive Analytics
This type of analytics involves analyzing historical data and using statistical and machine-learning techniques to make predictions or forecasts about future events or outcomes.
It identifies patterns and relationships in data to generate probabilistic predictions about what is likely to happen.
Predictive analytics answers questions like “What is likely to happen in the future?” or “What will be the impact of a specific action?” It helps organizations anticipate future trends, identify potential risks or opportunities, and make proactive decisions.
Example in the workplace: Forecasting customer demand for a product based on historical sales data, market trends, and external factors like seasonality or economic indicators.
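The demand-forecasting example can be sketched with a straight-line fit to past sales. This is a deliberately simple stand-in: it assumes a linear trend, ignores seasonality and external factors, and gives a single point forecast rather than a probabilistic one. The data and the helper `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical historical data: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [102, 108, 121, 129, 140]

intercept, slope = fit_line(months, sales)
# Extrapolate the fitted trend one month ahead.
forecast_month_6 = intercept + slope * 6
print(round(forecast_month_6, 1))
```

A real forecast would normally use more history, additional features, and some measure of uncertainty around the prediction.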
Prescriptive Analytics
This type of analytics goes beyond descriptive and predictive analytics by providing recommendations and suggestions on what actions to take based on the analysis of data and various possible scenarios.
It leverages advanced techniques, such as optimization algorithms, machine learning, and simulation models, to generate actionable insights. Prescriptive analytics answers questions like “What should we do?” or “What is the best course of action?” It helps in making informed decisions and optimizing outcomes by considering constraints, objectives, and potential risks.
Example in the workplace: Optimizing supply chain operations by recommending the most efficient routes for deliveries, considering factors like traffic, cost, and delivery time.
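The delivery-route example can be reduced to a toy version: pick the cheapest route that still meets a delivery-time constraint. The routes, costs, times, and the 60-minute limit are all invented, and a real system would use a proper optimisation or simulation method rather than a simple filter.

```python
# Hypothetical candidate routes with an estimated cost and travel time.
routes = [
    {"name": "A", "cost": 40, "minutes": 50},
    {"name": "B", "cost": 30, "minutes": 75},
    {"name": "C", "cost": 35, "minutes": 55},
]
max_minutes = 60  # constraint: the delivery must arrive within an hour

# Keep only routes that satisfy the constraint.
feasible = [r for r in routes if r["minutes"] <= max_minutes]
# Objective: minimise cost among the feasible routes.
best = min(feasible, key=lambda r: r["cost"])
print(best["name"])
```

Route B is cheapest overall but breaks the time constraint, so the recommendation is the cheapest of the remaining routes.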