Data Professional
A term used to describe any individual who works with data and/or has data skills.
Machine Learning
An alternative approach to automation, expressing the way you want a task done by using data instead of explicit instructions.
AKA: The use and development of algorithms and statistical models to teach computer systems to analyze patterns in data.
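A minimal sketch of the "data instead of explicit instructions" idea, assuming a small made-up set of input/output examples. Nothing in the code states the rule linking inputs to outputs; a least-squares fit estimates it from the examples. The names `examples`, `fit_line`, and `predict` are illustrative only.

```python
# Hypothetical training examples: each pair is (input, desired output).
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The "model" is just the two numbers learned from the data.
slope, intercept = fit_line(examples)


def predict(x):
    """Apply the learned pattern to an input the model has not seen."""
    return slope * x + intercept
```

Swapping in different examples changes the behavior of `predict` without editing any logic, which is the contrast with explicit instructions.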
Data Science vs Data Analytics
Data science is an entire field dedicated to making data more useful. A data scientist is a professional who uses raw data to develop new ways to model data and understand the unknown. Often, their job responsibilities incorporate various components of computer science, predictive analytics, statistics, and machine learning. The collections of information that data scientists work with can be quite large, requiring expertise to organize and navigate.
Data analytics is a subfield of the larger data science discipline. The aim of data analytics is to create methods to capture, process, and organize data to uncover actionable insights for current problems. Analysts focus on processing the information stored in existing datasets and establishing the best way to present this data. Data analysts rely on statistics and data modeling to solve problems and offer recommendations that can lead to immediate improvements.
R programming
Used by researchers and academics
Can create complex statistical models
Jupyter Notebooks
An open-source web application used to create and share documents that contain live code, equations, visualizations, and narrative text
Allows you to run code in real time and helps identify errors easily.
Data stewardship
The practices of an organization that ensure that data is accessible, usable, and safe
Edge computing
A way of distributing computational tasks over multiple nearby processors (i.e., computers). This improves speed and resiliency and avoids dependence on a single source of computational power.
Metrics
Methods and criteria used to evaluate data
Python
A general-purpose programming language
Technical Data Professionals
Machine Learning Engineers & Statisticians:
-Expertise in mathematics, statistics, and computing.
-Build models and make predictions.
Advanced Data Analyst:
-Explore datasets to identify directions worth pursuing.
Strategic Data Professionals
Business Intelligence (BI) Professionals
Technical Project Managers
-Interpret information for an organization’s operations, finance, research, and development
-Work aligns with business strategy
-Seek solutions to problems through data analytics
Open Data
Data that is available to the public and free to use, with guidance on how to navigate the datasets and acknowledge the source.
Personally Identifiable Information (PII)
Information that permits the identity of an individual to be inferred by either direct or indirect means.
Examples: Biometric records, usernames, Social Security numbers, or national identification numbers. (Information that’s often associated with medical, financial, and employment records.)
Aggregate Information
Data from a significant number of users from which personal information has been eliminated.
Sample
A segment of the population that is representative of the entire population.
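One common way to get a sample that tends to be representative is simple random sampling. A minimal sketch, assuming a made-up population of 1,000 numeric user IDs; the seed value and sample size are arbitrary choices for illustration.

```python
import random

# Hypothetical population: 1,000 user IDs.
population = list(range(1, 1001))

# A seeded generator keeps the draw repeatable between runs.
rng = random.Random(42)

# Draw 100 distinct members; every member has an equal chance of selection.
sample = rng.sample(population, k=100)

# Comparing a summary statistic gives a rough check on representativeness.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
```

A sample mean far from the population mean would suggest the sample is too small or was not drawn at random.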
Data Anonymization
The process of protecting people’s private or sensitive data by eliminating PII.
Data Aggregation
The process of collecting and combining details from a significant number of users into totals or summaries.
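A minimal sketch of aggregation, assuming made-up per-user records with hypothetical field names (`user`, `region`, `purchases`). The output keeps only counts and totals per region, so no individual user appears in it.

```python
from collections import defaultdict

# Hypothetical per-user records.
records = [
    {"user": "u1", "region": "North", "purchases": 3},
    {"user": "u2", "region": "North", "purchases": 5},
    {"user": "u3", "region": "South", "purchases": 2},
    {"user": "u4", "region": "South", "purchases": 4},
]


def aggregate_by_region(rows):
    """Combine individual rows into per-region user counts and purchase totals."""
    totals = defaultdict(lambda: {"users": 0, "purchases": 0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["users"] += 1
        bucket["purchases"] += row["purchases"]
    return dict(totals)


summary = aggregate_by_region(records)
# summary == {"North": {"users": 2, "purchases": 8},
#             "South": {"users": 2, "purchases": 6}}
```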
Data that is often anonymized:
telephone numbers, names, license plates and license numbers, social security numbers, IP addresses, medical records, email addresses, photographs, & account numbers.
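A minimal sketch of anonymization by removing PII fields, assuming a record stored as a dictionary. The field names in `PII_FIELDS` and the sample record are hypothetical. Dropping direct identifiers is only one technique, and on its own it may not prevent re-identification from the remaining fields.

```python
# Assumed names of fields that hold PII in this illustrative dataset.
PII_FIELDS = {"name", "email", "phone", "ip_address"}


def anonymize(record):
    """Return a copy of the record with every PII field removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


# Hypothetical record mixing PII with non-identifying attributes.
record = {
    "name": "Ada Example",
    "email": "ada@example.com",
    "phone": "555-0100",
    "ip_address": "192.0.2.1",
    "plan": "premium",
    "monthly_visits": 12,
}

clean = anonymize(record)
# clean == {"plan": "premium", "monthly_visits": 12}
```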
General Data Protection Regulation (GDPR)
European Union Law.
The GDPR is described on their website as the toughest privacy and security law in the world. It imposes obligations onto organizations anywhere, so long as they target or collect data related to people in the European Union.
Lei Geral de Proteção de Dados Pessoais (LGPD)
Brazil’s Law for the protection of personal data
The LGPD is a data protection law that governs how companies collect, use, disclose, and process personal data belonging to people in Brazil. LGPD applies to companies that process data about individuals in Brazil.
California Consumer Privacy Act (CCPA)
Privacy rights for California’s consumers.
The CCPA gives consumers more control over the personal information that businesses collect about them. The CCPA regulations provide guidance on how to implement the law.
Additionally, states like Colorado, Utah, Virginia, New York, and Connecticut have enacted similar legislation to protect consumer privacy in their states.
RACI
Responsible, Accountable, Consulted, Informed
Responsible: Responsible for performing the work necessary or making the decisions that are directly related to completing a task within a project. There can be several roles or groups responsible for a task.
Accountable: These individuals must approve the work performed by those who are “responsible”. As a general rule, there is usually a single person in this role, often a manager or project lead.
Consulted: Those assigned to offer input on a task. There should be a clear and open line of two-way communication between those assigned to “responsible” and “consulted”. There can be several people in this role. In many situations, they are referred to as subject matter experts (SMEs).
Informed: Those in this role need to be kept aware of progress and concerns of those working on a project. Those who are “informed” tend to be in higher levels of senior leadership. They need to understand insights from the projects rather than details of how the specific tasks are performed.
Note: Not every task on a RACI chart is assigned every letter (e.g., for access to data, you could mark the BI Engineer, Analytics Team Manager, and Data Engineer “R” and the Data Scientist “C”).
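A RACI chart can be sketched as a nested dictionary mapping each task to the letter assigned to each role. The example below encodes the access-to-data task from the note above; the helper names `people_with` and `check_chart` are hypothetical, and the "at most one A per task" check reflects the general rule stated earlier, not a strict requirement.

```python
VALID_LETTERS = {"R", "A", "C", "I"}

# Task -> {role: letter}. Not every letter has to appear for a task.
chart = {
    "Access to data": {
        "BI Engineer": "R",
        "Analytics Team Manager": "R",
        "Data Engineer": "R",
        "Data Scientist": "C",
    },
}


def people_with(chart, task, letter):
    """List the roles assigned a given letter for a task, in alphabetical order."""
    return sorted(role for role, assigned in chart[task].items() if assigned == letter)


def check_chart(chart):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for task, assignments in chart.items():
        letters = list(assignments.values())
        invalid = set(letters) - VALID_LETTERS
        if invalid:
            problems.append(f"{task}: unknown letters {sorted(invalid)}")
        # As a general rule, a single role is Accountable for a task.
        if letters.count("A") > 1:
            problems.append(f"{task}: more than one Accountable role")
    return problems


responsible = people_with(chart, "Access to data", "R")
issues = check_chart(chart)
```

Here `responsible` holds the three "R" roles and `issues` is empty, since several Responsible roles and no Accountable role are both allowed by the rules described above.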
Data Scientist
Professionals who work closely with analytics to provide meaningful insights that help improve current business operations.