Sending the same request any number of times has the same effect as sending it once.
Idempotency
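One common way to get this behavior is an idempotency key that the client attaches to each logical request. The sketch below is only an illustration: `PaymentService`, `deposit`, and the in-memory key store are hypothetical names, and a real service would persist the keys.

```python
class PaymentService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self):
        self.balance = 0
        self._seen = {}  # idempotency key -> stored response

    def deposit(self, idempotency_key, amount):
        # A retried request with the same key returns the stored response
        # instead of applying the deposit a second time.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.balance += amount
        response = {"status": "ok", "balance": self.balance}
        self._seen[idempotency_key] = response
        return response


service = PaymentService()
first = service.deposit("req-123", 50)
retry = service.deposit("req-123", 50)  # same key: no second deposit
```

After both calls the balance reflects a single deposit, and the retry receives the same response as the first attempt.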
A type of index where writes first go to an in-memory balanced binary search tree (the memtable). When the tree grows too large, its contents are written to disk, sorted by key, as a table file.
SSTables and LSM trees index
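A minimal sketch of the write path described above, under simplifying assumptions: a plain dict stands in for the balanced tree, the "table files" are sorted in-memory lists rather than files on disk, and compaction is omitted. `TinyLSM` and `memtable_limit` are hypothetical names.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=3):
        self.memtable = {}  # stand-in for the in-memory balanced tree
        self.memtable_limit = memtable_limit
        self.sstables = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable contents out sorted by key, then start a new one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Check the memtable first, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)  # memtable is full: flushed as a sorted table
db.put("a", 3)  # newer value lives in the fresh memtable
```

Reads consult the newest data first, so the later write to `"a"` shadows the older value in the flushed table.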
A type of DB optimized for aggregation-heavy analytical queries.
OLAP (Online Analytical Processing) DB
Politeness
Availability
A distributed algorithm used to ensure all-or-nothing outcomes (atomicity) in a distributed transaction system.
* Coordinates a global transaction across multiple nodes or databases, ensuring that all participants either commit or roll back, which maintains consistency across the distributed system.
Two-Phase Commit
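The all-or-nothing outcome can be sketched with a coordinator that first collects votes and then issues a single decision. This is an illustrative toy, not a production protocol: `Participant`, `can_commit`, and `two_phase_commit` are hypothetical names, and timeouts, durable logging, and coordinator failure are left out.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes; otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


healthy = [Participant("orders"), Participant("payments")]
mixed = [Participant("orders"), Participant("payments", can_commit=False)]
healthy_result = two_phase_commit(healthy)
mixed_result = two_phase_commit(mixed)
```

A single "no" vote in the prepare phase is enough to roll back every participant, including those that voted yes.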
Advantages:
* Reliable and ensures data integrity
* Suitable for applications where data accuracy and order are critical.
Disadvantages:
* Higher overhead due to connection management and error checking.
* Slower than UDP due to the additional overhead.
TCP (Transmission Control Protocol)
Supports node query caching: each Elasticsearch instance keeps its own cache, which holds the top 10k queries in an LRU cache.
AWS OpenSearch
Hot Shard Mitigations
A method of processing data where a large volume of data is collected, processed, and output at once, rather than in real-time. This approach is suitable for scenarios where immediate processing is not required, allowing for efficient handling of extensive datasets and complex computations.
Batch processing
Way(s) Kafka can help with fault tolerance / data integrity?
Kafka: Retention/Replayability
Embedding
A system architecture that simplifies the data processing model by eliminating the batch layer entirely. It was introduced by Jay Kreps to address the complexity of the Lambda Architecture.
Kappa Architecture
A distributed event streaming platform:
* used for building real-time data pipelines and streaming applications.
* designed to handle high throughput and low latency for data ingestion and processing, enabling the real-time processing of data streams.
Kafka
A system architecture designed to handle massive quantities of data by using both batch and real-time processing methods. It was proposed to address the challenges of latency, throughput, and fault tolerance.
Lambda Architecture
Stream processing
Dividing a single database or dataset into smaller segments
Partition
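One possible way to assign records to segments is by hashing a key. The card does not say which scheme is meant, so hash partitioning here is an assumption, and `partition_for` and the sample user IDs are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    # Use a stable hash so the same key always maps to the same partition,
    # even across processes (Python's built-in hash() is salted per process).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


partitions = {i: [] for i in range(4)}
for user_id in ["alice", "bob", "carol", "dave"]:
    partitions[partition_for(user_id, 4)].append(user_id)
```

Range-based partitioning (by key interval) is a common alternative when range scans matter more than an even spread.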
A replication strategy where multiple nodes (leaders) can accept write operations. Each leader replicates its changes to the other leaders, so writes can be processed on multiple nodes. This provides higher availability and fault tolerance, but introduces challenges in maintaining data consistency and resolving conflicts.
Multi Leader Replication
A concept used in stream processing and system design to group a continuous stream of data into fixed, non-overlapping chunks of time. Each chunk captures events that occur within a specific time period. This is useful for analyzing data streams in discrete intervals, allowing for time-based aggregations and computations.
Tumbling Window
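As a small illustration of fixed, non-overlapping windows, the sketch below counts events per window. The `(timestamp, payload)` event shape and the function name `tumbling_window_counts` are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


events = [(0, "a"), (4, "b"), (5, "c"), (9, "d"), (10, "e")]
result = tumbling_window_counts(events, 5)
# result == {0: 2, 5: 2, 10: 1}
```

With a 5-second window, the event at timestamp 5 falls into the second window rather than the first, since window boundaries are start-inclusive and end-exclusive.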
A web communication technique used to achieve near-real-time interaction between a client and a server. The client requests information from the server, and the server holds the request open until new information is available or a timeout occurs.
Long polling
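The hold-until-data-or-timeout behavior can be imitated in-process with a blocking queue. This is a rough sketch without any HTTP layer; `LongPollServer`, `publish`, and `poll` are hypothetical names.

```python
import queue
import threading


class LongPollServer:
    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, message):
        self._messages.put(message)

    def poll(self, timeout_seconds):
        # Hold the request open until a message arrives or the timeout fires.
        try:
            return self._messages.get(timeout=timeout_seconds)
        except queue.Empty:
            return None  # the client would typically re-issue the request


server = LongPollServer()
# Simulate new information becoming available shortly after the client polls.
threading.Timer(0.1, server.publish, args=["new data"]).start()
reply = server.poll(timeout_seconds=2.0)   # returns once the message is published
empty = server.poll(timeout_seconds=0.05)  # nothing arrives, so it times out
```

The first poll returns as soon as data is published, well before its timeout; the second returns `None`, which signals the client to poll again.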
Crawler Traps
Fault Tolerance
Exponential Backoff
Redundancy