clone Flashcards

(60 cards)

1
Q

Data cube

A

A multidimensional matrix representing high-dimensional space to show how data attributes are arranged.
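To make the definition concrete, here is a minimal sketch using NumPy (the dimension names — product, region, month — are illustrative assumptions, not part of the original card): a 3-D array serves as the cube, and collapsing axes aggregates dimensions.

```python
import numpy as np

# Hypothetical 3-D data cube: sales indexed by (product, region, month).
# Shape: 2 products x 3 regions x 4 months.
cube = np.arange(24).reshape(2, 3, 4)

# Aggregating away the region and month dimensions leaves one
# total per product -- the cube arranged along a single attribute.
per_product = cube.sum(axis=(1, 2))
print(per_product.tolist())  # [66, 210]
```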

2
Q

Columnar storage

A

Stores data by columns instead of rows to provide faster analytics and better compression.
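The difference can be sketched in plain Python (the table contents are made up): the same table stored row-wise versus column-wise, where an analytic aggregate only needs to touch one column in the columnar layout.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "city": "Cairo", "sales": 100},
    {"id": 2, "city": "Giza",  "sales": 250},
    {"id": 3, "city": "Luxor", "sales": 175},
]

# Column-oriented layout: each attribute is stored contiguously.
columns = {
    "id":    [1, 2, 3],
    "city":  ["Cairo", "Giza", "Luxor"],
    "sales": [100, 250, 175],
}

# An analytic query (total sales) scans only the "sales" column,
# instead of reading every field of every row.
total = sum(columns["sales"])
print(total)  # 525
```

Contiguous same-typed values are also what makes the better compression possible (e.g., run-length or dictionary encoding per column).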

3
Q

Data lake

A

A central storage for holding raw data (structured and unstructured) in its original format.

4
Q

Data warehouse

A

A central database that stores cleaned and structured data, optimized for analysis and reporting.

5
Q

Roll-up

A

An operation that aggregates data along a dimension, combining values into a coarser granularity (e.g., rolling daily sales up to monthly totals).

6
Q

Dicing

A

Performs a multidimensional cutting that cuts a range of more than one dimension, resulting in a subcube

7
Q

Slicing

A

Selects a single value along one dimension, filtering out the rest of the cube to focus analysis on a particular attribute.

8
Q

Drill down

A

The reverse of roll-up. It subdivides information to a finer granularity, zooming into more detail.

9
Q

Pivot

A

Rotates the view of the data cube without changing the data, allowing the user to change the analytical viewpoint.
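The OLAP operations above can be sketched on a NumPy cube (the axis meanings — year, quarter, region — are illustrative assumptions): a slice fixes one dimension, a dice takes ranges on several, and a roll-up aggregates one away.

```python
import numpy as np

# Hypothetical cube indexed by (year, quarter, region): 2 x 4 x 3.
cube = np.arange(24).reshape(2, 4, 3)

# Slice: fix one dimension to a single value (year 0) -> 2-D result.
slice_ = cube[0, :, :]

# Dice: cut ranges on more than one dimension -> a subcube.
dice = cube[:, 1:3, 0:2]

# Roll-up: aggregate away the quarter dimension -> coarser granularity.
rollup = cube.sum(axis=1)

print(slice_.shape, dice.shape, rollup.shape)
```

Drill-down is the inverse direction: going back from `rollup` to the full `cube` to see the per-quarter detail again.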

10
Q

Multidimensional data cube

A

Uses multidimensional arrays to store data; it is faster and more efficient than the relational approach.

11
Q

Relational data cube

A

Uses relational tables to store data; it is slower compared to multidimensional cubes.

12
Q

Columnar data in AWS: Amazon Redshift

A

Stores data in columnar format to speed up analytics.

13
Q

Columnar data in AWS: Amazon Athena

A

Queries data stored in columnar formats like Parquet and ORC directly on S3.

14
Q

Columnar data in AWS: AWS Glue

A

Supports ETL (Extract, Transform, Load) jobs that read and write columnar formats.

15
Q

Columnar data in AWS: Amazon S3 Select

A

Reads specific columns from Parquet or ORC files.

16
Q

Graph processing

A

The computational process of analyzing data structured as a graph (vertices and edges) to extract insights, such as finding the shortest path or influential users.
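One such computation, shortest path, can be sketched as a breadth-first search over an adjacency list in pure Python (the graph of users and "follows" edges is made up for illustration):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return the shortest path from start to goal via BFS, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Vertices are users; edges are "follows" relationships.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": [],
}
print(shortest_path(graph, "ann", "dan"))  # ['ann', 'bob', 'dan']
```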

17
Q

Graph databases

A

A type of NoSQL database that uses graph theory to store, map, and query relationships.

18
Q

AWS SageMaker

A

A fully managed service that simplifies building, training, and deploying machine learning models.

19
Q

Google Cloud AutoML

A

Used for training and developing custom machine learning models with minimal ML expertise.

20
Q

Google Cloud Speech-to-Text

A

A speech recognition service for transcribing speech to text, supporting 120 languages.

21
Q

Google Cloud Vision AI

A

Used to create machine learning models for computer vision that detect text, objects, and other content in images.

22
Q

Microsoft Azure Machine Learning

A

Used to create and deploy machine learning models on the cloud.

23
Q

Microsoft Azure Databricks

A

Provides Apache Spark-based analytics.

24
Q

Microsoft Azure Bot Service

A

Provides smart, intelligent, and scalable bot services.

25
Draw a flowchart showing how Amazon SageMaker works. Explain it briefly.
Events from AWS Lambda and Amazon CloudWatch trigger an event handler, which launches model training and model deployment in Amazon SageMaker; when a training job finishes, its output is stored in an Amazon S3 bucket.

+--------------------+
|   Amazon Lambda    |
+---------+----------+
          |
          v
+------------------+      +-------------+
|      Amazon      |----->|    Event    |
|    CloudWatch    |      |   Handler   |
+------------------+      +------+------+
                                 |
                  +--------------+--------------+
                  |                             |
                  v                             v
           Model Training              Model Deployment
                  |                             |
                  v                             v
     +--------------------------------------+
     |           Amazon SageMaker           |
     +------------------+-------------------+
                        |
                        v
            Finish Model Training Job
                        |
                        v
              +------------------+
              |    Amazon S3     |
              |      Bucket      |
              +------------------+
26
Explain each step in the AWS SageMaker workflow.
- Data preparation: Collecting, cleaning, and transforming data into the appropriate format.
- Model building: Using pre-built algorithms/frameworks or custom algorithms to build the model.
- Model training: Training the model using the prepared data, with options for distributed training.
- Model optimization: Fine-tuning hyperparameters and optimizing the architecture for performance.
- Model deployment: Deploying the model to endpoints for use in production.
- Model monitoring: Tracking performance metrics and detecting anomalies in real time.
- Model management: Managing the model over time, including updates and retraining.
27
Explain four approaches used to make AWS SageMaker secure.
* IAM integration: Uses AWS Identity and Access Management (IAM) policies to maintain security for data stored in S3 (data lake).
* Encryption: Offers optional encryption for models in transit and at rest using AWS Key Management Service (KMS).
* Secure connections: All API requests are transmitted over Secure Sockets Layer (SSL) connections.
* VPC deployment: Can be deployed within an Amazon Virtual Private Cloud (VPC) for greater control over data flow.
28
SageMaker provides options for deploying models to various endpoints. Mention three of them.
* Amazon EC2 instances.
* Lambda functions.
* API Gateway.
29
Mention four well-known graph analytics platforms.
* PuppyGraph.
* AWS Neptune.
* Neo4j.
* DataStax.
30
What do we mean by the statement “Neo4j is a highly scalable and native graph database”?
It means Neo4j is designed specifically for storing and processing graph data (native), offering a powerful and flexible data model that allows for efficient querying and analysis of complex, interconnected data.
31
Mention the two components of a graph analytics platform.
* Data storage component: Responsible for storing graph data (e.g., a graph database).
* Analytics engine: Responsible for performing the actual analysis on the graph data (e.g., path analysis).
32
Compare batch ingestion and streaming ingestion:
Batch ingestion:
* Way of ingestion: Collects data over a set period and ingests it all at once (in fixed groups).
* Data size: Efficient for large data volumes.
* Update speed: Used when delay is acceptable (e.g., hourly, daily).
Streaming ingestion:
* Way of ingestion: Data flows in as it happens; each event is sent immediately.
* Data size: Handles continuous updates (individual records/events).
* Update speed: Used when fast updates are needed (low latency, real time).
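The contrast can be sketched in a few lines of Python (the events and the per-event processing are made up for illustration):

```python
events = [1, 2, 3, 4, 5, 6]

# Batch ingestion: accumulate events, then process them in fixed groups.
batch_size = 3
batches = [events[i:i + batch_size] for i in range(0, len(events), batch_size)]
batch_results = [sum(batch) for batch in batches]   # one result per group

# Streaming ingestion: process each event the moment it "arrives".
stream_results = []
for event in events:            # imagine these arriving one at a time
    stream_results.append(event * 2)

print(batch_results)    # [6, 15]
print(stream_results)   # [2, 4, 6, 8, 10, 12]
```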
33
Draw a diagram showing the pipeline of the Amazon Kinesis Data Streams framework and explain how it works. Mention clearly the difference between data streams, shards, data record, data blob, and partition key.
How it works:
1. Producers push data records into a Kinesis stream.
2. Data is stored in shards for a retention period (24 hours by default).
3. Consumers (apps or AWS services) pull and process records in real time.

Definitions:
* Data stream: A container for data records.
* Shard: The unit of capacity in a stream; each shard handles a fixed number of reads/writes.
* Data record: The smallest unit in a stream (up to 1 MB), consisting of a data blob and a partition key.
* Data blob: The actual data payload within the record.
* Partition key: Determines which shard a record goes to.

+----------------+              +----------------+
| Event Producer |              | Event Consumer |
+----------------+              +----------------+
        \                              /
   Push Messages                 Pull Messages
          \                          /
           v                        ^
+-------------------------------------+
|        Kinesis Data Streams         |
|-------------------------------------|
|  Stream 1  |  Stream 2  |  Stream 3 |
+-------------------------------------+
                  ^
                  |
+------------------------------------+
|      Anatomy of a Data Stream      |
|------------------------------------|
| Shard 1 | █ █ █ █ █ □ □ □ □ □      |
| Shard 2 | █ █ □ □ □ □ □ □ □ □      |
| Shard 3 | █ □ □ □ □ □ □ □ □ □      |
+------------------------------------+
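The partition-key-to-shard mapping can be sketched with the standard library (this mimics Kinesis's documented scheme of hashing the key with MD5 into a 128-bit hash-key space split evenly among shards; the key names are made up):

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index, Kinesis-style:
    treat the MD5 hash of the key as a 128-bit integer and give
    each shard an equal slice of that hash-key space."""
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    return h * num_shards // 2 ** 128

# Records with the same partition key always land in the same shard,
# which is what preserves per-key ordering.
assert shard_for("sensor-42", 3) == shard_for("sensor-42", 3)
for key in ["sensor-1", "sensor-2", "sensor-3"]:
    print(key, "-> shard", shard_for(key, 3))
```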
34
Draw a diagram showing the pipeline of the Apache Kafka framework and explain how it works. Mention clearly the difference between the following elements of the framework: topic, partition, brokers, cluster, and zookeeper.
How it works:
1. Producers publish messages to a topic.
2. Kafka stores messages in partitions on brokers.
3. Consumers subscribe to topics and process messages in order.

Definitions:
* Topic: A named category for messages where data is published.
* Partition: A subdivision of a topic that enables parallelism and scalability.
* Brokers: The Kafka servers that store topic partitions.
* Cluster: Multiple brokers working together for fault tolerance and scalability.
* Zookeeper: Manages cluster metadata and leader election.

+----------------+              +----------------+
| Event Producer |              | Event Consumer |
+----------------+              +----------------+
        \                              /
   Push Messages                 Pull Messages
          \                          /
           v                        ^
+-------------------------------------+
|            Kafka Cluster            |
|-------------------------------------|
|  Topic 1   |  Topic 2   |  Topic 3  |
+-------------------------------------+
                  ^
                  |
+------------------------------------+
|      Anatomy of a Kafka Topic      |
|------------------------------------|
| Partition 0 | █ █ █ █ □ □ □ □ □ □  |
| Partition 1 | █ █ □ □ □ □ □ □ □ □  |
| Partition 2 | █ □ □ □ □ □ □ □ □ □  |
+------------------------------------+
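A toy model of topic/partition behavior in pure Python (real Kafka's default partitioner hashes keys with murmur2; the plain `hash`-mod here is only an illustration, and the class and keys are made up):

```python
from collections import defaultdict

class MiniTopic:
    """A toy Kafka topic: one append-only message log per partition."""

    def __init__(self, num_partitions: int):
        self.partitions = defaultdict(list)
        self.num_partitions = num_partitions

    def produce(self, key: str, value: str) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        partition = hash(key) % self.num_partitions
        self.partitions[partition].append(value)
        return partition

    def consume(self, partition: int):
        # Messages are read back in the order they were appended.
        return list(self.partitions[partition])

topic = MiniTopic(num_partitions=3)
p = topic.produce("order-7", "created")
topic.produce("order-7", "paid")          # same key, so same partition
print(topic.consume(p))  # ['created', 'paid']
```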
35
Mention where the job is done in the case of ETL and ELT:
Extract:
* ETL: From sources (DBs, APIs, files).
* ELT: From sources.
Load:
* ETL: Into the target system (data warehouse) after transformation.
* ELT: Raw data is loaded directly into the target system.
Transform:
* ETL: Done in an ETL engine (before loading).
* ELT: Done inside the target system using its processing power.
36
True or false: Airbyte is an open-source tool that supports customizable pipelines.
(T)
37
True or false: Sqoop is used to move data from a database to HDFS.
(T)
38
True or false: Informatica PowerCenter is an enterprise tool with governance features.
(T)
39
True or false: AWS Glue is a managed service on AWS.
(T)
40
True or false: Airbyte provides many source connectors.
(T)
41
True or false: Talend Open Studio is an open-source option.
(T)
42
True or false: Matillion is used only for on-premise systems.
(F)
43
True or false: Apache NiFi works with Hadoop environments.
(T)
44
True or false: Talend Enterprise targets enterprise needs.
(T)
45
True or false: AWS Glue is the best choice when you need an open-source tool.
(F)
46
Mention suitable batch ingestion tools for the following use case: There is a demand for open-source and customizable pipelines.
Airbyte, Apache NiFi, Talend (Open Studio).
47
Mention suitable batch ingestion tools for the following use case: You are using Hadoop or big data stack software in your project.
Sqoop (database → HDFS), NiFi.
48
Mention suitable batch ingestion tools for the following use case: There is a need for enterprise features, data governance, and reliability.
Informatica PowerCenter, Talend (enterprise), Matillion.
49
Mention suitable batch ingestion tools for the following use case: You are using the AWS cloud in your project and you want a managed service.
AWS Glue, Matillion.
50
Mention suitable batch ingestion tools for the following use case: In your project you need many source connectors and flexibility.
Airbyte.
51
Describe how ETL jobs operate in the following scenarios:
1) Sales data aggregation
* Extract: Take daily sales data from store databases.
* Transform: Clean data, unify currency, calculate totals per store/product.
* Load: Save the aggregated results into a data warehouse (Redshift/Snowflake).

2) IoT sensor data
* Extract: Collect hourly readings from IoT devices.
* Transform: Remove outliers, convert units, compute averages.
* Load: Store data in a time-series DB or cloud warehouse.

3) Healthcare data pipeline
* Extract: Get patient records from hospital systems.
* Transform: Remove personal info, standardize medical units, group by diagnosis.
* Load: Insert cleaned data into a research database.
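The first scenario can be sketched end to end in a few lines of Python (the rows, the fixed exchange rate, and the dict standing in for the warehouse are all illustrative assumptions):

```python
from collections import defaultdict

# Extract: daily sales pulled from store databases (hypothetical rows).
raw = [
    {"store": "A", "amount": "100", "currency": "USD"},
    {"store": "A", "amount": "50",  "currency": "EUR"},
    {"store": "B", "amount": "200", "currency": "USD"},
]

# Transform: clean types, unify currency, and total per store.
EUR_TO_USD = 1.1  # illustrative fixed rate, not a real quote
totals = defaultdict(float)
for row in raw:
    amount = float(row["amount"])
    if row["currency"] == "EUR":
        amount *= EUR_TO_USD
    totals[row["store"]] += amount

# Load: write the aggregates to the warehouse (a dict stands in for it).
warehouse = {store: round(total, 2) for store, total in totals.items()}
print(warehouse)  # {'A': 155.0, 'B': 200.0}
```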
52
Provide a definition of virtualization within cloud computing.
Virtualization is a technology that allows you to use one physical computer as if it were many by running multiple virtual machines on it. In cloud computing, it is used to split one big server into smaller, isolated virtual servers, allowing resources to be used efficiently so businesses only pay for what they need without buying extra hardware.
53
Define containerization in the context of cloud computing.
Containerization is the process of packing an application along with all its dependencies (libraries, configuration files) into a single, isolated unit called a "container". This ensures that the application runs consistently and efficiently in any environment (whether on a laptop or a cloud server) because it is independent of the host operating system.
54
Describe the roles of a hypervisor in virtualization:
* Resource allocation: The hypervisor controls the virtual machines' use of physical resources.
* Isolation: It creates and runs isolated virtual machines (VMs).
* Management: It serves as an intermediary between the physical computer and the virtual machines.
* Security: By isolating VMs, it ensures that if one VM fails, the others are unaffected.
55
What distinguishes Type 1 hypervisors from Type 2 hypervisors?
* Type 1 (bare-metal): Installed directly onto the computer hardware, with no operating system in between. It is highly efficient, as it has direct access to resources.
* Type 2 (hosted): Runs on top of an installed operating system (like Windows or macOS). It is used when you need to run more than one OS on a single machine.
56
Provide real-world examples of the following: Application Network Desktop Storage Server Data
* Application virtualization: Microsoft Azure (lets people use applications without installing them locally).
* Network virtualization: Google Cloud (allows companies to create networks using software).
* Desktop virtualization: Amazon WorkSpaces or Google Cloud (GCP) Virtual Desktops.
* Storage virtualization: Amazon S3 (combines storage into a single system).
* Server virtualization: VMware vSphere, Microsoft Hyper-V, or KVM.
* Data virtualization: Solutions from companies like Oracle and IBM.
57
Explain with an example how containers work in the context of cloud computing.
How it works: Containers virtualize the operating system of a server, packaging code and dependencies into a standard unit.
Example: Using Docker, you can quickly deploy applications into any environment (such as AWS).
58
Mention five ways to run containers on AWS.
1. Amazon Elastic Container Service (ECS): Highly scalable container management.
2. AWS Fargate: Runs containers without managing servers/infrastructure.
3. Amazon Elastic Kubernetes Service (EKS): Runs Kubernetes on AWS.
4. Amazon Elastic Container Registry (ECR): Stores and manages Docker container images.
5. AWS Batch: Runs batch processing workloads using Docker containers.
59
Compare containers and virtual machines:
1. Architecture
* Containers: Share the host OS kernel; isolated user spaces make them lightweight.
* Virtual machines: A hypervisor runs on the host; each VM includes a full guest OS with virtualized hardware.
2. Boot time
* Containers: Much less (seconds), as they do not need to boot a full OS.
* Virtual machines: Longer, because the full OS needs to be initialized.
3. Isolation
* Containers: Process-level isolation (less strong than VMs).
* Virtual machines: Very good isolation, because each VM is a complete system with its own OS.
4. Resource usage
* Containers: Consume fewer resources (only the necessary binaries/libraries).
* Virtual machines: Very high, as full OS overhead is incurred for each instance.
60
Write the steps needed to deploy an application on AWS using Docker.
1. Build: Create the containerized application using the Docker environment.
2. Store: Use Amazon Elastic Container Registry (ECR) to store and manage the Docker container images securely.
3. Deploy: Deploy the application from the local Docker environment to Amazon ECS.
4. Run: Use AWS Fargate to run the containers without provisioning or managing servers.