Azure Active Directory
Cloud-based identity and access management (IAM) solution.
Provides single sign-on and multi-factor authentication to help protect users.
Helps manage users, groups, and devices.
Azure Synapse Analytics
Analytics service that brings together data integration, enterprise data warehousing, and big data analytics. An evolution of Azure SQL Data Warehouse. Lets you build and manage a modern data warehouse.
Strengths: Quickly run complex queries across petabytes of data.
Offers a serverless SQL pool, which adapts automatically to the current workload, and dedicated SQL pools, which can be scaled up or down on demand. Tables in a dedicated SQL pool are spread across distributions using one of three patterns: Hash, Round Robin, or Replicated.
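In a dedicated SQL pool the pattern is declared per table in T-SQL, with `DISTRIBUTION = HASH(column)`, `DISTRIBUTION = ROUND_ROBIN`, or `DISTRIBUTION = REPLICATE`. The Python sketch below only models where rows end up under each pattern. A dedicated SQL pool splits each table into 60 distributions; the use of `zlib.crc32` is an assumption standing in for Synapse's own hash function, and the function names are hypothetical.

```python
import zlib

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool splits each table into 60 distributions


def hash_distribute(rows, key):
    """HASH(key): rows that share a key value always land in the same distribution."""
    buckets = [[] for _ in range(NUM_DISTRIBUTIONS)]
    for row in rows:
        # Assumption: crc32 stands in for Synapse's internal hash function.
        index = zlib.crc32(str(row[key]).encode()) % NUM_DISTRIBUTIONS
        buckets[index].append(row)
    return buckets


def round_robin_distribute(rows):
    """ROUND_ROBIN: rows are dealt out evenly, regardless of their values."""
    buckets = [[] for _ in range(NUM_DISTRIBUTIONS)]
    for i, row in enumerate(rows):
        buckets[i % NUM_DISTRIBUTIONS].append(row)
    return buckets


def replicate(rows, compute_nodes):
    """REPLICATE: every compute node keeps a full copy of the (small) table."""
    return [list(rows) for _ in range(compute_nodes)]
```

Hash distribution keeps rows with the same key together, which helps joins and aggregations on that key; round robin balances row counts; replication avoids data movement for small dimension tables.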
Azure Blob Storage
File storage in the cloud, plus an API that lets you build apps to access the data. Holds unstructured data, with no restriction on the kinds of data it stores. Blobs have higher latency than memory and local disk and don’t have indexing features.
Frequently used in combination with databases to store non-queryable data.
E.g., profile pictures for an app could be stored in blobs.
Every blob lives inside a blob container. If you want to store data without performing analysis on it, set Hierarchical Namespace to Disabled when you set up the storage account. Also good for archiving rarely used data or storing website assets such as images and media.
Azure Synapse Analytics Serverless SQL Pool
A query service over the data in your data lake. Benefits: basic discovery and exploration, logical data warehousing, and data transformation.
Apache Spark
Processing system for big data workloads. Interface for programming entire clusters with implicit data parallelism and fault tolerance.
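To make "implicit data parallelism" concrete, here is a plain-Python sketch (not Spark itself) of a word count: the input is split into partitions, the same function runs on each partition in parallel, and the partial results are merged. In PySpark the same job is roughly `sc.parallelize(lines).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(operator.add)`, and Spark handles partitioning, scheduling, and re-running failed tasks for you. Fault tolerance is not modeled here, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(lines, num_partitions):
    """Spread the input across partitions, as Spark does with an RDD or DataFrame."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def count_partition(partition):
    """The per-partition task: the same function runs independently on every partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def word_count(lines, num_partitions=4):
    partitions = split_into_partitions(lines, num_partitions)
    # Threads stand in for executors on a cluster; the caller never schedules them by hand.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_partition, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # the "reduce" step merges per-partition results
    return total
```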
Azure Cosmos DB
Used within web and mobile applications; good for modeling social interactions. Cosmos DB is a globally distributed NoSQL database. A Cosmos DB account is the top-level organizational entity for your databases.
Azure Data Lake Storage Gen2
Capabilities dedicated to big data analytics. Built on Azure Blob Storage. File system semantics, file-level security, and scale.
Also: low-cost, tiered storage, high availability. If you’re performing analytics on data, set Hierarchical Namespace to Enabled.
Gen2 adds a hierarchical namespace, meaning it has a real (physical) folder structure. Gen1 was a separate service that was not built on Blob Storage.
Hierarchical namespace
A physical folder structure
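The practical difference shows up in directory operations. The toy in-memory model below (the class names are hypothetical, not part of any Azure SDK) contrasts the two: in a flat namespace a "folder" is just a name prefix, so renaming it means touching every blob under it, while with a hierarchical namespace a directory rename is a single atomic metadata operation.

```python
class FlatNamespace:
    """Blob Storage without HNS: a 'folder' is only a '/'-separated prefix in blob names."""

    def __init__(self):
        self.blobs = {}  # full blob name -> data

    def put(self, path, data):
        self.blobs[path] = data

    def rename_folder(self, old, new):
        """Every blob under the prefix has to be copied and deleted one by one."""
        old_prefix, new_prefix = old.rstrip("/") + "/", new.rstrip("/") + "/"
        names = [name for name in self.blobs if name.startswith(old_prefix)]
        for name in names:
            self.blobs[new_prefix + name[len(old_prefix):]] = self.blobs.pop(name)
        return len(names)  # number of per-blob operations


class HierarchicalNamespace:
    """ADLS Gen2 with HNS: directories are real objects in a tree."""

    def __init__(self):
        self.root = {}  # nested dicts are directories; other values are file contents

    def _walk(self, parts, create=False):
        node = self.root
        for part in parts:
            node = node.setdefault(part, {}) if create else node[part]
        return node

    def put(self, path, data):
        *dirs, filename = path.split("/")
        self._walk(dirs, create=True)[filename] = data

    def rename_folder(self, old, new):
        """Renaming a directory moves one tree node, however many files it holds."""
        *old_parents, old_name = old.strip("/").split("/")
        *new_parents, new_name = new.strip("/").split("/")
        subtree = self._walk(old_parents).pop(old_name)
        self._walk(new_parents, create=True)[new_name] = subtree
        return 1  # a single metadata operation
```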
Azure Databricks
Data analytics platform optimized for the Azure cloud. Uses notebooks that run on a Spark engine. Integrates with Power BI, Tableau, and similar tools.
Cannot be assigned a system-assigned managed identity; instead, use a secret scope, which lets Databricks read secrets from Azure Key Vault. Only users with the Contributor role, or higher, on the key vault can create a Key Vault-backed secret scope.
Azure Data Factory
Azure’s cloud ETL service for scale-out serverless data integration and data transformation. You can create and schedule data-driven workflows. Can create pipelines without writing code, copy and transform data, and orchestrate batch data movement and transformations. Pipeline-run data is only stored for 45 days.
Activities: copy data from source into a sink, perform transformation, or similar.
Linked services: connection tools that ADF uses to connect to services like API, storage accounts, and databases.
Pipelines: Container for activities in ADF. Contains a flow of activities that execute depending on completion status. Typically equipped with a trigger.
Integration Runtimes: the compute infrastructure used to run activities, data flows, SSIS packages, and data movement. 3 types:
Azure – runs data flow, data movement, and activities.
Self-hosted – runs data movement and activities.
Azure SSIS – SSIS-package execution
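The "execute depending on completion status" behavior of a pipeline is expressed in ADF's pipeline JSON through each activity's `dependsOn` list, where every dependency names an upstream activity and a condition: `Succeeded`, `Failed`, `Skipped`, or `Completed`. The Python sketch below mirrors that shape (trimmed to the fields used here) to show which activities would run; it is a simulation, not ADF's engine. The activity names are hypothetical, and the sketch assumes a single condition per dependency.

```python
SUCCEEDED, FAILED, SKIPPED, COMPLETED = "Succeeded", "Failed", "Skipped", "Completed"


def condition_met(condition, upstream_status):
    # "Completed" fires whether the upstream activity succeeded or failed.
    if condition == COMPLETED:
        return upstream_status in (SUCCEEDED, FAILED)
    return condition == upstream_status


def run_pipeline(activities, simulated_outcomes):
    """Decide which activities run, given each activity's (simulated) outcome.

    `activities` must be listed so that every activity comes after the ones it depends on.
    """
    status = {}
    for activity in activities:
        # An activity runs only if ALL of its dependencies are satisfied (logical AND).
        # This sketch assumes one condition per dependency.
        runs = all(
            condition_met(dep["dependencyConditions"][0], status[dep["activity"]])
            for dep in activity.get("dependsOn", [])
        )
        outcome = simulated_outcomes.get(activity["name"], SUCCEEDED)
        status[activity["name"]] = outcome if runs else SKIPPED
    return status


pipeline = [
    {"name": "CopyRaw"},
    {"name": "TransformData", "dependsOn": [{"activity": "CopyRaw", "dependencyConditions": ["Succeeded"]}]},
    {"name": "SendAlert", "dependsOn": [{"activity": "CopyRaw", "dependencyConditions": ["Failed"]}]},
    {"name": "Cleanup", "dependsOn": [{"activity": "TransformData", "dependencyConditions": ["Completed"]}]},
]
```

If `CopyRaw` fails, `SendAlert` runs while `TransformData` and `Cleanup` are skipped; if it succeeds, the alert is skipped and the other two run.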
Azure Virtual Network
VNet. Enables Azure resources to communicate securely with each other, the internet, and on-premises networks.
Azure Event Hubs
Real-time data ingestion service. Stream events to build dynamic data pipelines and respond immediately. Can process millions of events per second. Collects events: it exposes endpoints only for ingesting data, with no mechanism for sending data back to the publisher. Good for massive scale or for a series (stream) of related events.
Event Hubs Dedicated is a pricing tier billed at a fixed monthly price, with a minimum of 4 hours of usage.
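Internally an event hub is split into partitions. Each partition is an ordered, append-only log that consumers read at their own pace by tracking an offset, and events sent with the same partition key land in the same partition, which preserves their relative order. The class below is a hypothetical in-memory sketch of that idea, not the Azure SDK; with the real `azure-eventhub` package you would send through an `EventHubProducerClient`, passing `partition_key` to `create_batch`. The hashing and the round-robin placement of keyless events are assumptions, labeled in the comments.

```python
import zlib


class EventHubSketch:
    """Toy model of an event hub: an append-only log per partition, read by offset."""

    def __init__(self, partition_count=4):
        self.partitions = [[] for _ in range(partition_count)]
        self._next_partition = 0

    def send(self, body, partition_key=None):
        if partition_key is None:
            # Assumption: keyless events are spread round-robin.
            index = self._next_partition % len(self.partitions)
            self._next_partition += 1
        else:
            # Assumption: crc32 stands in for the service's own partition-key hashing.
            index = zlib.crc32(partition_key.encode()) % len(self.partitions)
        self.partitions[index].append(body)
        return index  # returned only so the example can inspect placement

    def read(self, partition, offset=0):
        """Consumers pull events from an offset they track themselves; reading does not delete."""
        return self.partitions[partition][offset:]
```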
Azure Event Grid
Build applications with event-based architectures. Good for dealing with discrete events and when the application needs to work in a publisher/subscriber model, handling events rather than data.
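The publisher/subscriber model can be sketched in a few lines of Python. The field names `eventType` and `subject` follow the Event Grid event schema, and the filters mirror the event-type and subject begins-with/ends-with filters an event subscription can have; the `Topic` and `Subscription` classes and the in-process delivery are hypothetical stand-ins for the real service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Subscription:
    handler: Callable[[dict], None]
    included_event_types: Optional[List[str]] = None  # None means all event types
    subject_begins_with: str = ""
    subject_ends_with: str = ""

    def matches(self, event):
        if self.included_event_types is not None and event["eventType"] not in self.included_event_types:
            return False
        subject = event["subject"]
        return subject.startswith(self.subject_begins_with) and subject.endswith(self.subject_ends_with)


class Topic:
    """Publishers push discrete events to a topic; the topic pushes each one to matching subscribers."""

    def __init__(self):
        self.subscriptions = []

    def subscribe(self, subscription):
        self.subscriptions.append(subscription)

    def publish(self, event):
        matched = [s for s in self.subscriptions if s.matches(event)]
        for subscription in matched:
            subscription.handler(event)
        return len(matched)  # number of subscribers the event was delivered to
```

A typical event here would be a `Microsoft.Storage.BlobCreated` event whose subject is a blob path such as `/blobServices/default/containers/images/blobs/cat.png`. The event only says what happened and where; it does not carry the blob's contents.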
Azure IoT Hub
IoT connector to the cloud. Enables solutions with reliable and secure communication between IoT devices and the cloud.
Azure SQL Database
Fully managed relational cloud database.
Azure Stream Analytics
Serverless scalable event processing engine. Can run real-time analytics on multiple streams.
Used together with Event Hubs: Event Hubs feeds events into Azure, and Stream Analytics processes them.
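A common Stream Analytics job counts events in fixed, non-overlapping time windows; in the Stream Analytics query language that is written `GROUP BY TumblingWindow(second, 10)`, usually together with `TIMESTAMP BY` on the event-time column. The plain-Python function below is a sketch of what that aggregation computes, not how the engine runs it. It assumes integer-second timestamps and puts an event that falls exactly on a boundary into the window starting there, which may not match the service's exact boundary rule.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window, keyed by window end time.

    `events` is an iterable of (timestamp_in_seconds, payload) pairs.
    """
    counts = Counter()
    for timestamp, _payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start + window_seconds] += 1  # each event falls in exactly one window
    return dict(sorted(counts.items()))
```

Windows that receive no events are simply absent from the result.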
Azure DevOps Git Repository
Set of version control tools to manage code. Helps track changes over time.
Azure Monitor
Use it to keep Data Factory pipeline-run data for longer than 45 days. Helps maximize availability and performance of applications and services.
Azure Log Analytics
To edit and run log queries from data collected by Azure Monitor Logs and interactively analyze their results.
Azure Monitor builds on top of Log Analytics: Monitor is the marketing name, and Log Analytics is the technology that powers it.
Microsoft Power BI
Business analytics service. Provides interactive visualizations and BI capabilities. Reports and dashboards for end users.
Microsoft Visual Studio
Development environment. IDE and Code Editor
Delta Lake
Efficient method of storing data. ACID-compliant; stores data as Parquet files plus a transaction log, so the data files are not human-readable. Can be used for batch data and streaming data. Delta enforces schemas.
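A Delta table is a set of Parquet data files plus a transaction log (the `_delta_log` directory) of ordered JSON commits that add or remove files. Replaying the log gives the current table, and replaying only part of it gives an older version (time travel). The toy class below models that log and the schema check; it keeps rows in memory instead of writing real Parquet, and its class and method names are hypothetical, not the `delta-spark` API.

```python
class DeltaTableSketch:
    """Toy model of a Delta table: data files plus an ordered log of add/remove actions."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> Python type, e.g. {"id": int}
        self.log = []               # log[v] = actions committed in version v (like _delta_log/*.json)
        self.files = {}             # file name -> rows (stand-in for immutable Parquet files)

    def _validate(self, rows):
        # Schema enforcement: reject the whole write if any row doesn't match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema {sorted(self.schema)}")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise ValueError(f"column {column!r} expects {expected.__name__}")

    def _write_file(self, rows):
        name = f"part-{len(self.files):05d}.parquet"
        self.files[name] = list(rows)
        return name

    def _live_paths(self, version):
        live = []
        for actions in self.log[: version + 1]:
            for action in actions:
                if "add" in action:
                    live.append(action["add"]["path"])
                elif "remove" in action:
                    live.remove(action["remove"]["path"])
        return live

    def append(self, rows):
        self._validate(rows)
        name = self._write_file(rows)
        # The write only becomes visible once its commit is added to the log.
        self.log.append([{"add": {"path": name}}])
        return len(self.log) - 1

    def overwrite(self, rows):
        self._validate(rows)
        removed = [{"remove": {"path": p}} for p in self._live_paths(len(self.log) - 1)]
        name = self._write_file(rows)
        self.log.append(removed + [{"add": {"path": name}}])
        return len(self.log) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` (default: latest); older versions enable time travel."""
        if version is None:
            version = len(self.log) - 1
        return [row for path in self._live_paths(version) for row in self.files[path]]
```

Note that `overwrite` never deletes old files; it only logs `remove` actions, which is why earlier versions stay readable.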
Blobs (3 kinds)
Block blobs: made up of blocks of data (of different sizes) that can be managed individually; good for text and binary files.
Append blobs: support only appending new data (not updating or deleting existing data). Good for scenarios like storing logs or writing streamed data.
Page blobs: for scenarios involving random-access reads and writes (e.g., virtual hard disk files for Azure VMs).
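The three blob types differ mainly in how they can be written. This in-memory Python sketch is hypothetical; its method names only loosely echo `stage_block`, `commit_block_list`, and `append_block` on the `azure-storage-blob` `BlobClient`. It shows that a block blob's staged blocks become visible only once a block list is committed, an append blob can only grow at the end, and a page blob is written in place in 512-byte-aligned pages.

```python
PAGE_SIZE = 512  # page blobs are read and written in 512-byte pages


class BlockBlob:
    def __init__(self):
        self.blocks = {}     # block id -> bytes, uploaded but invisible until committed
        self.committed = []  # ordered block ids that currently make up the blob

    def stage_block(self, block_id, data):
        self.blocks[block_id] = data

    def commit_block_list(self, block_ids):
        # Committing an ordered list of block ids defines the blob's content in one step.
        self.committed = list(block_ids)

    def read(self):
        return b"".join(self.blocks[block_id] for block_id in self.committed)


class AppendBlob:
    def __init__(self):
        self._data = bytearray()

    def append_block(self, data):
        # The only write operation: existing bytes can't be updated or deleted.
        self._data += data

    def read(self):
        return bytes(self._data)


class PageBlob:
    def __init__(self, size):
        if size % PAGE_SIZE:
            raise ValueError("page blob size must be a multiple of 512 bytes")
        self._data = bytearray(size)

    def write_pages(self, offset, data):
        # Random-access writes anywhere in the blob, as long as they are page-aligned.
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("offset and length must be multiples of 512 bytes")
        if offset + len(data) > len(self._data):
            raise ValueError("write past the end of the page blob")
        self._data[offset:offset + len(data)] = data

    def read(self, offset, length):
        return bytes(self._data[offset:offset + length])
```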
Container
Packages of software that contain everything needed to run in any environment. They virtualize the operating system.