What is cloud computing?
A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management or service provider interaction.
What is cloud computing composed of?
5 characteristics, 4 deployment models and 3 service models
What are the five characteristics of cloud computing?
- On-demand self-service
- Broad network access
- Resource pooling
- Rapid elasticity
- Measured service
What are the four deployment models of cloud computing?
Private cloud, Community cloud, Public cloud and Hybrid cloud
What are the three service models of cloud computing?
Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS)
What is the difference between traditional data and big data?
Big Data refers to datasets that traditional data architectures cannot handle efficiently.
What are the characteristics of big data? (4 V’s)
- Volume
- Velocity
- Variety
- Veracity
Bonus: (New V)
- Value
(There are different ideas about how many V’s there are. Our teacher sticks to 4(5) V’s)
True or False: Machine Learning Techniques are very capable of processing large raw data
True!
Big data require a ____ architecture for efficient storage, manipulation and analysis
Big data require a SCALABLE architecture for efficient storage, manipulation and analysis
What does Data Science try to extract from data? And how?
Data science tries to extract: Actionable knowledge / Patterns
Through: Discovery or hypothesis formulation and hypothesis testing
What are the six primary components of the Data lifecycle management system? (DLMS)
It comprises six primary components, including:
- Metadata management (maintains static and dynamic characteristics of data, i.e., data about the data)
Name a couple of Big Data processing frameworks
Hadoop MapReduce, Apache Spark, Apache Beam
How does Hadoop MapReduce work? (I highly doubt this will be very relevant - See that chart in Notion)
A MR job splits data into independent chunks which are processed in-parallel by map tasks. Sorted map outputs are fed to reduce tasks.
YARN (Yet Another Resource Negotiator) offers resource management and job scheduling. It is a cluster management technology that became a core part of Hadoop 2.0
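The map → shuffle → reduce flow described above can be sketched in plain Python. This is a toy, single-process mimic of the model for illustration, not the Hadoop API; all function names here are made up:

```python
from collections import defaultdict

def map_phase(chunk):
    # Each map task processes one independent chunk and emits (word, 1) pairs.
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped):
    # Map outputs are sorted and grouped by key before reducing.
    groups = defaultdict(list)
    for key, value in sorted(mapped):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reduce task aggregates all values for one key.
    return {key: sum(values) for key, values in groups.items()}

chunks = ["big data big", "data flows"]
mapped = [pair for c in chunks for pair in map_phase(c)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'flows': 1}
```

In real Hadoop the map and reduce tasks run in parallel on different cluster nodes, with YARN scheduling them and HDFS holding the chunks.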
What are the advantages of Apache Spark?
Apache Spark offers:
- In-memory processing, much faster than MapReduce’s disk-based approach
- A unified engine for SQL, streaming, machine learning and graph processing
- High-level APIs in Scala, Java, Python and R
What are the five components of Apache Spark?
Spark Core, Spark SQL, Spark Streaming, MLlib (machine learning) and GraphX (graph processing)
How is Apache Beam different from Apache Spark?
Beam does not contain its own infrastructure for distributed processing; it only defines the pipeline and delegates execution to runners (e.g., Spark, Flink, Google Cloud Dataflow)
Dunno, don’t focus too much on this, I think
What are the primary components of Apache Beam?
Beam supports (What/Where/When/How) via Pipelines, PCollections, Transformations and Runners
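The roles of those four concepts can be mimicked in a few lines of plain Python. This is only an analogy with made-up class names, not Beam’s actual API: the pipeline records *what* to compute, while the runner decides *where and how* it executes:

```python
class Pipeline:
    def __init__(self):
        self.transforms = []            # ordered Transformations ("what")

    def apply(self, fn):
        self.transforms.append(fn)      # record a transformation step
        return self                     # allow chaining, Beam-style

class DirectRunner:
    def run(self, pipeline, pcollection):
        # The runner owns execution ("where/how"); here: in-process, eager.
        # A PCollection is modeled as a plain Python list.
        for fn in pipeline.transforms:
            pcollection = [fn(x) for x in pcollection]
        return pcollection

p = Pipeline().apply(str.upper).apply(lambda s: s + "!")
print(DirectRunner().run(p, ["beam", "spark"]))  # ['BEAM!', 'SPARK!']
```

Swapping `DirectRunner` for a distributed one would change where the work happens without touching the pipeline definition, which is exactly Beam’s selling point.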
What are the core features of TensorFlow?
Core: NumPy-like tensor operations with GPU support, automatic differentiation, and distributed execution.
Also has a lot of data processing features: tf.keras, data loading and preprocessing (tf.data, tf.io), image processing (tf.image), signal processing (tf.signal)
What is TensorFlow?
Its core is very similar to Numpy, but with GPU support
TensorFlow’s API revolves around tensors, which flow from operation to operation, hence the name TensorFlow
A tensor is like a NumPy ndarray
Note: Type conversions can significantly hurt performance
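Since tensors behave like NumPy ndarrays, the dtype point can be illustrated with NumPy itself; TensorFlow goes further and raises an error on mixed-dtype operations rather than converting silently, precisely because conversions cost performance:

```python
import numpy as np

# Two arrays of the same values but different dtypes, like mismatched tensors.
a = np.arange(4, dtype=np.float32)
b = np.arange(4, dtype=np.int32)

# Make the conversion explicit and intentional instead of relying on
# silent upcasting (NumPy would otherwise promote the result to float64).
c = a + b.astype(np.float32)
print(c.dtype)  # float32
print(c)        # [0. 2. 4. 6.]
```

In TensorFlow the equivalent explicit cast is `tf.cast(tensor, tf.float32)`.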
Describe (in very broad terms) TensorFlow’s architecture
ARCHITECTURE: High-level Python code -> Keras / Data API -> Low-level Python API / C++ core -> Local/distributed execution engine -> CPU/GPU/TPU kernels
Execution steps:
True or False: Deep Learning frameworks hide mathematics and focus on design of neural nets
True
Name some Deep Learning Frameworks
TensorFlow, PyTorch, Caffe
How do you train a Neural Network?
1) Build a computational graph from network definition
2) Input Training data and compute loss function
3) Update parameters
Define-and-run: DL Frameworks complete step one in advance (TensorFlow, Caffe)
Define-by-run: Combines steps one and two into a single step (PyTorch)
- Computational graph is not given before training, but obtained while training
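The three steps above can be sketched in a define-by-run style with a one-parameter linear model and squared loss. This is plain NumPy with hand-derived gradients, purely for illustration, not any framework’s API:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 3.0 * x + 1.0                      # target function: w=3, b=1

w, b, lr = 0.0, 0.0, 0.1
for step in range(200):
    y_hat = w * x + b                  # 1) forward pass: the "graph" is just
                                       #    ordinary code run each iteration
    loss = np.mean((y_hat - y) ** 2)   # 2) compute the loss
    grad_w = np.mean(2 * (y_hat - y) * x)
    grad_b = np.mean(2 * (y_hat - y))
    w -= lr * grad_w                   # 3) update parameters by
    b -= lr * grad_b                   #    gradient descent

print(round(w, 2), round(b, 2))  # converges to ≈ 3.0 1.0
```

In a define-and-run framework, step 1 would instead be compiled once up front into a static graph; here the forward pass is re-traced implicitly on every iteration, which is what lets define-by-run frameworks handle data-dependent control flow.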
Not important, but pretty cool: ONNX (https://onnx.ai)
Acronym for: Open Neural Network eXchange