What is Kafka?
Kafka is an open-source distributed event streaming platform that provides a framework for storing, reading and analysing streaming data.
It is somewhat like Redis pub/sub, but messages are persisted to disk and replicated, which gives it database-level reliability.
What is Memcached?
Memcached is an open source, high-performance, distributed memory caching system intended to speed up dynamic web applications by reducing the database load. It is a key-value dictionary of strings, objects, etc., stored in the memory, resulting from database calls, API calls, or page rendering.
(Memcached is a caching tool.)
What is Elasticsearch?
Elasticsearch is a distributed, open-source, near-real-time full-text search and analytics engine.
What is Solr?
Solr is a scalable, ready-to-deploy search and storage engine optimized for searching large volumes of text-centric data.
What is Reliability?
The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).
What is Maintainability?
Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively.
What kinds of errors can break reliability?
What examples of scalability load parameters do you know?
What is Hadoop?
Hadoop is an open-source software framework for storing and processing huge amounts of data of any kind across clusters of machines.
What is MapReduce?
MapReduce is a programming model and processing module in the Apache Hadoop open-source ecosystem. It is used to write scalable applications that process large amounts of data in parallel on large clusters of commodity servers.
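The map, shuffle and reduce steps can be sketched with a word count. This is a single-process illustration of the programming model, not Hadoop's API; the function names are chosen for this example only.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # On a real cluster the map tasks run in parallel on different nodes;
    # here they simply run one after another in a loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big cluster", "data on a cluster"])
```

Because each map call looks at only one input split and each reduce call at only one key, both phases can be spread across many machines.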
What is a rolling upgrade?
A rolling upgrade is an upgrade of a software version performed without noticeable downtime or other disruption of service. With a load balancer in front, the servers are upgraded one at a time while the remaining ones keep serving traffic.
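A rough simulation of the one-by-one procedure, assuming a load balancer pool that servers can be removed from and re-added to. The server names, version strings and the `rolling_upgrade` function are hypothetical, and a real rollout would also run health checks before re-adding a server.

```python
def rolling_upgrade(servers, new_version):
    """Upgrade servers one at a time; `servers` maps name -> version.

    Returns the minimum number of servers that stayed in the load
    balancer pool at any point during the upgrade.
    """
    in_pool = set(servers)
    min_available = len(in_pool)
    for name in list(servers):
        in_pool.remove(name)            # drain: stop routing traffic to it
        min_available = min(min_available, len(in_pool))
        servers[name] = new_version     # upgrade while it receives no traffic
        in_pool.add(name)               # put it back into the pool
    return min_available

servers = {"app-1": "1.0", "app-2": "1.0", "app-3": "1.0"}
available = rolling_upgrade(servers, "1.1")
```

With three servers, at least two remain in the pool at every step, so the service as a whole stays reachable.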
What is Shared-nothing architecture?
Shared-nothing architecture (SNA) is a distributed computing architecture that consists of multiple separate nodes that don't share resources. The nodes are independent and self-sufficient: each has its own memory, disk storage, and input/output interfaces. In such a system, the data set or workload is split into smaller subsets that are distributed across the nodes.
What is replication?
Replication is the continuous copying of data changes from one database (publisher) to another database (subscriber).
What is database table partitioning (sharding)?
Partitioning is the database process where very large tables are divided into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan.
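A small sketch of why partitioning reduces the data scanned, assuming orders are range-partitioned by year. The `orders` rows and function names are made up for illustration; a real database does this routing internally.

```python
from collections import defaultdict

partitions = defaultdict(list)   # partition key (year) -> rows

def insert(order):
    """Route each row to the partition for its year."""
    year = order["date"][:4]
    partitions[year].append(order)

def orders_in_year(year):
    """Only one partition is scanned; all others are skipped entirely."""
    return partitions.get(year, [])

for order in [
    {"id": 1, "date": "2022-03-01"},
    {"id": 2, "date": "2023-07-15"},
    {"id": 3, "date": "2023-11-30"},
]:
    insert(order)

ids_2023 = [o["id"] for o in orders_in_year("2023")]
```

A query for 2023 touches only the 2023 partition and never reads the 2022 rows.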
What replication strategies do you know?
What are the differences between synchronous, asynchronous and semi-synchronous replication?
How to add one more child node without downtime and losing data?
What replication data sending strategies do you know?
What is statement-based replication (SBR)? What are its pros and cons?
The binary log stores the SQL statements used to change databases on the master server. The slave reads this log and re-executes the SQL statements to produce a copy of the master database.
Problems:
- Non-deterministic functions such as RAND() or NOW() inside a statement can produce different values on the slave.
- Auto-incremented columns can receive different values if statements are applied in a different order.
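The non-determinism problem can be reproduced with SQLite, used here only as a convenient stand-in for a real replicated database. The sketch re-executes the same statement on a second database, and for contrast (an addition not covered in the answer above) copies the actual written row to a third one.

```python
import sqlite3

def new_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, token INTEGER)")
    return db

primary, sbr_replica, row_replica = new_db(), new_db(), new_db()

statement = "INSERT INTO t (id, token) VALUES (1, random())"
primary.execute(statement)

# Statement-based: the replica re-executes the same SQL text, so
# random() is evaluated again and almost certainly yields another value.
sbr_replica.execute(statement)

# Contrast: ship the values the primary actually wrote instead of the SQL.
row = primary.execute("SELECT id, token FROM t").fetchone()
row_replica.execute("INSERT INTO t (id, token) VALUES (?, ?)", row)

def token(db):
    return db.execute("SELECT token FROM t WHERE id = 1").fetchone()[0]
```

Comparing `token(primary)` with `token(sbr_replica)` shows the two databases have diverged, while `token(row_replica)` matches the primary.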
What is Write-Ahead Logging (WAL) replication / streaming replication?
WAL stands for Write-Ahead Logging. It is a standard technique in which every change is first appended to a log, in order of occurrence, before it is applied to the data files. In WAL-based (streaming) replication the primary ships these low-level log records to the replica, which replays them to reconstruct the same data.
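A toy model of log shipping, assuming a key-value store whose log records are JSON strings. The `Node` class and the record layout are invented for this sketch; a real WAL contains low-level, storage-engine-specific byte changes.

```python
import json

class Node:
    """Toy storage node: every change is logged before it is applied."""

    def __init__(self):
        self.wal = []     # append-only log of serialized change records
        self.data = {}    # stands in for the data files

    def write(self, key, value):
        record = json.dumps({"lsn": len(self.wal) + 1, "key": key, "value": value})
        self.wal.append(record)   # 1. log first
        self._apply(record)       # 2. then change the data

    def _apply(self, record):
        change = json.loads(record)
        self.data[change["key"]] = change["value"]

    def replay(self, records):
        """Replica side: apply shipped log records in their original order."""
        for record in records:
            self.wal.append(record)
            self._apply(record)

primary, replica = Node(), Node()
primary.write("a", 1)
primary.write("b", 2)
primary.write("a", 3)
replica.replay(primary.wal)
```

Because the replica applies the same records in the same order, it ends up with the same data as the primary without re-running any SQL.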
What is Logical replication?
Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). We use the term logical in contrast to physical replication, which uses exact block addresses and byte-by-byte replication.
What is trigger-based replication?
Trigger-based replication uses database triggers to capture data changes and hand them to application-side code, which then applies them to the target. It is useful when you replicate to a different kind of database or need custom logic.
What is replication lag?
Replication lag is the delay between the time a transaction or operation is executed on the primary (master) node and the time it is applied on the standby (slave) node. During this delay the data on the main node and the child nodes differs.
What is read-after-write consistency?
Read-after-write consistency is the ability to view changes (read data) right after making those changes (write data). For example, if you have a user profile and you change your bio on the profile, you should see the updated bio if you refresh the page. There should be no delay during which the old bio shows up.
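One common way to get this guarantee is to send a user's reads to the leader for a short time after that user's own write. The sketch below assumes one leader, one lagging follower, and a fixed upper bound on replication lag; `ProfileStore` and its methods are hypothetical names for this example.

```python
import time

class ProfileStore:
    """Leader plus one lagging follower, with read-your-writes routing."""

    def __init__(self, lag_window=1.0):
        self.leader = {}
        self.follower = {}
        self.last_write = {}           # user -> time of their latest write
        self.lag_window = lag_window   # assumed upper bound on lag (seconds)

    def write_bio(self, user, bio, now=None):
        now = time.monotonic() if now is None else now
        self.leader[user] = bio
        self.last_write[user] = now
        # The follower is not updated here; sync() models delayed replication.

    def sync(self):
        self.follower.update(self.leader)

    def read_bio(self, user, reader, now=None):
        now = time.monotonic() if now is None else now
        wrote_at = self.last_write.get(reader, float("-inf"))
        recently_wrote = now - wrote_at < self.lag_window
        # Readers who just edited their own profile are routed to the leader.
        own_recent = reader == user and recently_wrote
        source = self.leader if own_recent else self.follower
        return source.get(user)

store = ProfileStore()
store.write_bio("alice", "old bio", now=0.0)
store.sync()
store.write_bio("alice", "new bio", now=10.0)
```

Right after the second write, Alice reading her own profile sees "new bio" from the leader, while another user may still see "old bio" from the follower until replication catches up.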