Designing Data-Intensive Applications Flashcards

(31 cards)

1
Q

what are the types of applications?

A

data-intensive and compute-intensive

2
Q

which type of application is more common today?

A

data-intensive, as CPU power is rarely a limiting factor for these applications; the bigger problems are usually the amount of data, its complexity, and the speed at which it changes

3
Q

what is a data-intensive application built from?

A

Standard building blocks that provide commonly needed functionality:
• Store data so that they, or another application, can find it again later (databases)
• Remember the result of an expensive operation, to speed up reads (caches)
• Allow users to search data by keyword or filter it in various ways (search indexes)
• Send a message to another process, to be handled asynchronously (stream processing)
• Periodically crunch a large amount of accumulated data (batch processing)

4
Q

are databases, queues, and caches different?

A

they are superficially similar, as they all store data for some time

5
Q

what are the differences between databases, queues, and caches?

A

they have different access patterns, which means different performance characteristics and thus very different implementations

6
Q

why should we lump databases, queues, and caches under one umbrella term like data systems?

A

1 - many new tools have emerged that are optimized for a variety of use cases and no longer fit neatly into the traditional categories, for example:
    a - Redis is a datastore that is also used as a message queue
    b - Apache Kafka is a message queue with database-like durability guarantees
so the line between the different categories is getting blurry
2 - the requirements of an application are now so wide-ranging that no single tool can meet all of them, so we use one tool for each task of the application and stitch them together with application code

7
Q

what keeps all the tools in sync in the application?

A

the application code
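
A minimal sketch of what this can look like, assuming a plain dict named `database` standing in for the primary store and another dict named `cache` standing in for a cache server. Both names and both functions are hypothetical, made up for illustration. The point is only that the application code, not either tool, is what invalidates the cache on every write.

```python
# Hypothetical stand-ins: a real system would use a database and a cache server.
database = {}   # system of record
cache = {}      # derived copy kept only to speed up reads


def write_user(user_id, name):
    """Write to the database, then invalidate the cached copy."""
    database[user_id] = name
    cache.pop(user_id, None)  # application code keeps the cache in sync


def read_user(user_id):
    """Read through the cache, falling back to the database on a miss."""
    if user_id in cache:
        return cache[user_id]
    name = database[user_id]
    cache[user_id] = name  # populate the cache for later reads
    return name
```

If `write_user` skipped the invalidation step, later reads would keep returning the stale cached name.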

8
Q

what are the three main qualities of any system?

A

Reliability
The system should continue to work correctly (performing the correct function at
the desired level of performance) even in the face of adversity (hardware or software
faults, and even human error).
Scalability
As the system grows (in data volume, traffic volume, or complexity), there should
be reasonable ways of dealing with that growth.
Maintainability
Over time, many different people will work on the system (engineering and operations,
both maintaining current behavior and adapting the system to new use
cases), and they should all be able to work on it productively.

9
Q

what are the expectations of a reliable system?

A

1 - The application performs the function that the user expected.
2 - It can tolerate the user making mistakes or using the software in unexpected
ways.
3 - Its performance is good enough for the required use case, under the expected load and data volume.
4 - The system prevents any unauthorized access and abuse.

10
Q

what is the difference between a fault and a failure?

A

A fault is usually defined as one component of the system deviating from its spec
whereas a failure is when the system as a whole stops providing the required service to the user
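
The distinction can be sketched in code, under the assumption that each replica is modeled as a plain function and that a fault shows up as a `ConnectionError`. The function and replica names are hypothetical. One faulty replica is tolerated; only when every replica faults does the caller see a failure.

```python
def read_with_failover(replicas, key):
    """Return the value from the first replica that answers.

    A replica raising ConnectionError is a fault (one component misbehaving).
    Only if every replica faults does the read become a failure.
    """
    for replica in replicas:
        try:
            return replica(key)
        except ConnectionError:
            continue  # tolerate the fault and try the next replica
    raise RuntimeError("all replicas unavailable")  # system-level failure


def broken_replica(key):
    """Hypothetical replica that always faults."""
    raise ConnectionError("replica down")


def healthy_replica(key):
    """Hypothetical replica that always answers."""
    return f"value-for-{key}"
```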

11
Q

what are some kinds of faults?

A

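Hardware Faults:
they can be mitigated by adding redundancy to individual components (spare or backup disks, RAM, or CPUs), but once a system runs on more than one machine it is better to also use software to tolerate the loss of a whole machine

Software Errors:
unlike hardware faults, which are mostly independent, they are correlated across nodes and can cause many other parts to fail. Examples:
1 - a software bug triggered by a particular bad input
2 - a runaway process that uses up a shared resource
3 - a service that the system depends on that slows down or becomes unresponsive
4 - cascading failures, where one fault triggers another

they cannot be fully prevented, but thorough testing helps, as does having the system monitor itself and report what is happening

Human Errors (ways to reduce them):
1 - design abstractions, APIs, and admin interfaces that make it easy to do the right thing
2 - decouple the places where people make the most mistakes from the places where mistakes can cause failures
3 - test thoroughly at all levels, from unit tests to whole-system integration tests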

12
Q

what is scalability?

A

Scalability is the term we use to describe a system’s ability to cope with increased
load. Note, however, that it is not a one-dimensional label that we can attach to a system: it is meaningless to say “X is scalable” or “Y doesn’t scale.” Rather, discussing scalability means considering questions like “If the system grows in a particular way,
what are our options for coping with the growth?” and “How can we add computing
resources to handle the additional load?”

13
Q

how do we describe load?

A

Load can be described with a few numbers which we call load parameters. The best choice of parameters
depends on the architecture of your system: it may be requests per second to a web
server, the ratio of reads to writes in a database, the number of simultaneously active
users in a chat room, the hit rate on a cache, or something else.
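
As a rough illustration, a few of these load parameters can be derived from raw counters. The `Counters` class, its field names, and the idea of a fixed measurement window are assumptions made for this sketch, not something the card prescribes.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical raw counters collected over a measurement window."""
    window_seconds: float
    reads: int
    writes: int
    cache_hits: int
    cache_misses: int


def load_parameters(c):
    """Derive a few load parameters from raw counters.

    Assumes at least one write and at least one cache lookup in the window.
    """
    return {
        "requests_per_second": (c.reads + c.writes) / c.window_seconds,
        "read_write_ratio": c.reads / c.writes,
        "cache_hit_rate": c.cache_hits / (c.cache_hits + c.cache_misses),
    }
```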

14
Q

what is the difference between response time and latency?

A

Latency and response time are often used synonymously, but they
are not the same. The response time is what the client sees: besides
the actual time to process the request (the service time), it includes
network delays and queueing delays. Latency is the duration that a
request is waiting to be handled—during which it is latent, awaiting
service

15
Q

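how should you measure response time?

A

not as a single number but as a distribution of values across many requests. The median (50th percentile) shows the typical response time: half of the requests are faster than it and half are slower. Higher percentiles such as the 95th and 99th show how bad the outliers are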
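
A small sketch of computing percentiles from a list of measured response times. The nearest-rank method used here is one common definition among several, and the sample values are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up response times in milliseconds for ten requests.
response_times_ms = [12, 15, 11, 14, 250, 13, 16, 12, 900, 14]

median = percentile(response_times_ms, 50)   # typical request
p95 = percentile(response_times_ms, 95)      # how bad the slow requests are
mean = sum(response_times_ms) / len(response_times_ms)
```

With these samples the median is 14 ms while the mean is 125.7 ms: two slow outliers pull the mean far away from what a typical user experiences, which is why percentiles are preferred.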

16
Q

what happens when several backend calls are needed to serve a request?

A

it takes just a single slow backend call to slow down the entire end-user request (this effect is known as tail latency amplification)
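
The effect can be estimated with a short calculation, assuming (as a simplification) that backend calls are slow independently of each other with the same probability. The function name is made up for this sketch.

```python
def chance_of_slow_request(p_slow_call, n_calls):
    """Probability that at least one of n independent backend calls is slow."""
    return 1 - (1 - p_slow_call) ** n_calls


# If 1% of backend calls are slow, a request needing 100 calls is slow most of the time.
one_call = chance_of_slow_request(0.01, 1)
hundred_calls = chance_of_slow_request(0.01, 100)
```

Under this assumption, a request that fans out to 100 backend calls has roughly a 63% chance of hitting at least one slow call, even though each individual call is slow only 1% of the time.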

17
Q

what is scaling up?

A

vertical scaling, moving to a more powerful machine

18
Q

what is scaling out?

A

horizontal scaling, distributing the load across multiple smaller machines

19
Q

how can we minimize pain during maintenance so that we don’t end up with a legacy system?

A

Operability
Make it easy for operations teams to keep the system running smoothly.

Simplicity
Make it easy for new engineers to understand the system, by removing as much complexity as possible from the system. (Note this is not the same as simplicity of the user interface.)

Evolvability
Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change. Also known as extensibility, modifiability, or plasticity.

20
Q

what is the job of an operations team?

A

• Monitoring the health of the system and quickly restoring service if it goes into a
bad state
• Tracking down the cause of problems, such as system failures or degraded performance
• Keeping software and platforms up to date, including security patches
• Keeping tabs on how different systems affect each other, so that a problematic
change can be avoided before it causes damage
• Anticipating future problems and solving them before they occur (e.g., capacity
planning)
• Establishing good practices and tools for deployment, configuration management,
and more
• Performing complex maintenance tasks, such as moving an application from one
platform to another
• Maintaining the security of the system as configuration changes are made
• Defining processes that make operations predictable and help keep the production
environment stable
• Preserving the organization’s knowledge about the system, even as individual
people come and go

21
Q

what is probably the most important part of developing software?

A

data models, because they have such a profound effect: not only on how the software is written, but also on
how we think about the problem that we are solving.

22
Q

what are the general data models?

A

the relational model, the document model, and a few graph-based data models

23
Q

what is the relational model?

A

data is organized into relations (called tables in SQL), where each relation is an unordered collection of tuples (rows in SQL).

24
Q

what were the driving forces behind NoSQL?

A

• A need for greater scalability than relational databases can easily achieve, including
very large datasets or very high write throughput
• A widespread preference for free and open source software over commercial
database products
• Specialized query operations that are not well supported by the relational model
• Frustration with the restrictiveness of relational schemas, and a desire for a more
dynamic and expressive data model

25
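Q

what is polyglot persistence?

A

using relational databases alongside a variety of nonrelational (NoSQL) datastores in the same application, each for the job it suits best
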
26
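Q

what is the problem with the SQL representation?

A

there is a mismatch (the impedance mismatch) between objects in application code and their representation as tables, rows, and columns in the database. A one-to-many relationship, for example, either needs separate tables and many joins, or is stored inside a single row: as a structured type such as an array or JSON in newer SQL versions, or encoded as JSON or XML in a text column, in which case the database typically can't query the values inside it
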
27
Q

why do we store the ID in the database and not the name?

A

a name is meaningful to humans, which means it may change over time, and every copy of it would then need updating. An ID has no meaning to humans, so it never needs to change

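
A minimal sketch using Python's built-in `sqlite3` module. The `regions` and `users` tables and their contents are hypothetical, chosen only to show that when users store a region ID, renaming the region is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES regions(id)
    );
    INSERT INTO regions VALUES (1, 'Greater Seattle Area');
    INSERT INTO users VALUES (10, 'Ada', 1), (11, 'Grace', 1);
""")


def region_names(conn):
    """Resolve each user's region_id to the current region name with a join."""
    rows = conn.execute(
        "SELECT users.name, regions.name FROM users "
        "JOIN regions ON regions.id = users.region_id ORDER BY users.id"
    )
    return rows.fetchall()


# Renaming the region touches one row; no user record has to change.
conn.execute("UPDATE regions SET name = 'Seattle Metro' WHERE id = 1")
```

Had each user row stored the region name as text, the rename would have required updating every user in that region.
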
28
Q

what is the relational model better at?

A

joins, and handling many-to-one and many-to-many relationships

29
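Q

how are other entities referenced in relational or document models?

A

relational: foreign key | document: document reference (the ID of another document). In both cases the related item is referenced by a unique identifier that is resolved at read time
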
30
Q

Which data model leads to simpler application code?

A

If the data in your application has a document-like structure (i.e., a tree of one-to-many relationships, where typically the entire tree is loaded at once), then it’s probably a good idea to use a document model. For highly interconnected data, the document model is awkward, the relational model is acceptable, and graph models are best.

31
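Q

what is the difference in schema between relational and document?

A

relational is schema-on-write: the schema is explicit and the database enforces it while data is being written. Document is schema-on-read: the structure is implicit and only interpreted when the data is read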
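
A small sketch of the contrast, using Python's built-in `sqlite3` and `json` modules. The `users` table, the function names, and the document shapes are hypothetical. SQLite is lenient about column types by default, so a `NOT NULL` constraint stands in here for schema enforcement at write time.

```python
import json
import sqlite3

# Schema-on-write: the table definition is enforced when data is written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL)")


def insert_user(first_name):
    """Return True if the database accepted the row, False if it rejected it."""
    try:
        conn.execute("INSERT INTO users (first_name) VALUES (?)", (first_name,))
        return True
    except sqlite3.IntegrityError:
        return False


# Schema-on-read: any JSON document can be stored; structure is interpreted when reading.
def first_name_of(document_json):
    """Handle both an old document shape (name) and a new one (first_name)."""
    user = json.loads(document_json)
    if "first_name" in user:
        return user["first_name"]
    return user["name"].split(" ")[0]
```

In the first half the database rejects a nonconforming write; in the second half the reading code has to cope with every shape of document that was ever written.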