what are the types of applications?
data-intensive and compute-intensive
which type of application is more common today?
data-intensive, as CPU power is rarely a limiting factor for these applications
what is a data-intensive application built from?
• Store data so that the application, or another application, can find it again later (databases)
• Remember the result of an expensive operation, to speed up reads (caches)
• Allow users to search data by keyword or filter it in various ways (search indexes)
• Send a message to another process, to be handled asynchronously (stream processing)
• Periodically crunch a large amount of accumulated data (batch processing)
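As a small illustration of the cache building block above, here is a minimal sketch using Python's standard functools.lru_cache. The function expensive_square and the counter call_count are hypothetical names invented for this example; a real cache would usually sit in front of a database query or a remote call.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive function body really runs


@lru_cache(maxsize=128)
def expensive_square(n: int) -> int:
    """Hypothetical stand-in for an expensive operation."""
    global call_count
    call_count += 1
    return n * n


first = expensive_square(12)   # computed, result is remembered
second = expensive_square(12)  # served from the cache, body does not run again
```

After both calls, call_count is 1, because the second read was answered from the remembered result.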
are databases, queues, and caches different?
they are superficially similar, as they all store data for some time
what are the differences between databases, queues, and caches?
they have different access patterns, which means different performance characteristics and thus different implementations
why should we lump databases, queues, and caches under one umbrella like data systems?
1 - many new tools have emerged that are optimized for different use cases and no longer fit neatly into the traditional categories, for example:
    1 - Redis is a datastore that is also used as a message queue
    2 - Apache Kafka is a message queue with database-like durability guarantees
so the line between the categories is getting blurry
2 - the requirements of an application are now so wide-ranging that no single tool can meet all of them, so we use one tool for each task of the application and stitch the tools together with application code
what keeps all the tools in sync in the application?
the application code
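A minimal sketch of what "application code keeps the tools in sync" can look like, assuming a cache-aside design with invalidation on write. The two dictionaries are stand-ins for a real database and a real cache, and write_user and read_user are hypothetical helper names, not part of any library.

```python
database = {}  # stand-in for the primary database
cache = {}     # stand-in for a separate caching layer


def write_user(user_id: int, name: str) -> None:
    """Write to the database, then invalidate the cached copy."""
    database[user_id] = name
    cache.pop(user_id, None)  # without this, readers could see a stale name


def read_user(user_id: int):
    """Read from the cache first, fall back to the database on a miss."""
    if user_id in cache:
        return cache[user_id]
    name = database.get(user_id)
    if name is not None:
        cache[user_id] = name  # remember the result for later reads
    return name


write_user(1, "Ada")
read_user(1)            # fills the cache
write_user(1, "Grace")  # invalidates the cached "Ada"
latest = read_user(1)
```

Neither the database nor the cache knows about the other here; the guarantee that reads see the latest write comes only from the application code.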
what are the three main qualities of any system?
Reliability
The system should continue to work correctly (performing the correct function at
the desired level of performance) even in the face of adversity (hardware or
software faults, and even human error)
Scalability
As the system grows (in data volume, traffic volume, or complexity), there should
be reasonable ways of dealing with that growth
Maintainability
Over time, many different people will work on the system (engineering and
operations, both maintaining current behavior and adapting the system to new use
cases), and they should all be able to work on it productively
what are the expectations of a reliable system?
1 - The application performs the function that the user expected.
2 - It can tolerate the user making mistakes or using the software in unexpected
ways.
3 - Its performance is good enough for the required use case, under the expected load and data volume.
4 - The system prevents any unauthorized access and abuse.
what is the difference between a fault and a failure?
A fault is usually defined as one component of the system deviating from its spec
whereas a failure is when the system as a whole stops providing the required service to the user
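The distinction can be sketched in code, assuming a hypothetical setup with several replicas of the same service. One replica being down is a fault; the system only fails when no replica can answer. The names ReplicaDown, make_replica, and serve are invented for this example.

```python
class ReplicaDown(Exception):
    """Raised when a single replica cannot handle a request (a fault)."""


def make_replica(name: str, healthy: bool):
    """Build a hypothetical replica as a plain function."""
    def handle(request: str) -> str:
        if not healthy:
            raise ReplicaDown(name)
        return f"{name} handled {request}"
    return handle


def serve(request: str, replicas) -> str:
    """Try each replica in turn; a fault in one is tolerated by the next."""
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown:
            continue  # fault: one component deviated from its spec
    # failure: the system as a whole cannot provide the service
    raise RuntimeError("all replicas are down")


replicas = [make_replica("a", healthy=False), make_replica("b", healthy=True)]
result = serve("GET /", replicas)
```

Here replica "a" is faulty, yet the user still gets an answer from "b", so the fault did not turn into a failure.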
what are some kinds of faults?
Hardware faults:
they can be reduced by adding redundancy to the system (backups for RAM, CPU, or hard drives), but once more than one machine is running it is better to also use software to handle the faults
Software errors:
unlike hardware faults, they are correlated and can cause other parts of the system to fail, examples:
1 - a software bug triggered by a particular input
2 - a process that uses up shared resources
3 - a service that the system depends on slows down or becomes unresponsive
4 - a cascading failure, where one fault causes another
they can't be fully prevented, but testing helps, and so does a system that reports what is happening inside it
Human errors:
they can be reduced by:
1 - designing interfaces and APIs that make it hard to do the wrong thing
2 - decoupling the system, so that a mistake in one place does not bring down the rest
3 - testing thoroughly, starting with unit tests
what is scalability?
Scalability is the term we use to describe a system’s ability to cope with increased
load. Note, however, that it is not a one-dimensional label that we can attach to a system: it is meaningless to say “X is scalable” or “Y doesn’t scale.” Rather, discussing scalability means considering questions like “If the system grows in a particular way,
what are our options for coping with the growth?” and “How can we add computing
resources to handle the additional load?”
how do we describe load?
Load can be described with a few numbers which we call load parameters. The best
choice of parameters depends on the architecture of your system: it may be requests
per second to a web server, the ratio of reads to writes in a database, the number of
simultaneously active users in a chat room, or the hit rate on a cache
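To make load parameters concrete, here is a small sketch that derives three of them from a request log. The log entries, the two-second window, and all variable names are made up for illustration.

```python
# Each entry: (timestamp in seconds, kind of request, whether the cache answered it)
request_log = [
    (0.0, "read", True),
    (0.4, "read", False),
    (0.9, "write", False),
    (1.3, "read", True),
    (1.8, "read", True),
]
window_seconds = 2.0  # assumed length of the observation window

requests_per_second = len(request_log) / window_seconds

reads = sum(1 for _, kind, _hit in request_log if kind == "read")
writes = sum(1 for _, kind, _hit in request_log if kind == "write")
read_write_ratio = reads / writes

read_hits = sum(1 for _, kind, hit in request_log if kind == "read" and hit)
cache_hit_rate = read_hits / reads
```

With this log the sketch gives 2.5 requests per second, a read-to-write ratio of 4.0, and a cache hit rate of 0.75.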
what is the difference between response time and latency?
Latency and response time are often used synonymously, but they
are not the same. The response time is what the client sees: besides
the actual time to process the request (the service time), it includes
network delays and queueing delays. Latency is the duration that a
request is waiting to be handled—during which it is latent, awaiting
service
how should you measure response time?
not by a single request but as a distribution of values measured over many requests; a good summary is the median, the point at which half of the requests are served faster and half take longer
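A minimal sketch of summarizing a response-time distribution, assuming the nearest-rank definition of a percentile (other definitions interpolate and give slightly different numbers). The sample values and the helper name percentile are invented for this example; the 95th percentile is included only to show how the tail differs from the median.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


# Hypothetical response times in milliseconds for ten requests
response_times_ms = sorted([30, 32, 35, 40, 41, 45, 50, 60, 120, 900])

median = percentile(response_times_ms, 50)  # 41
p95 = percentile(response_times_ms, 95)     # 900
mean = sum(response_times_ms) / len(response_times_ms)  # 135.3
```

The single 900 ms outlier pulls the mean up to 135.3 ms, while the median stays at 41 ms, which is much closer to what a typical user experienced.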
what happens when several backend calls are needed to serve a request?
it takes just a single
slow backend request to slow down the entire end-user request.
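The effect can be estimated with a short calculation, assuming the backend calls run in parallel and are slow independently of each other (a simplification; in real systems slowness is often correlated). The function name and the example numbers are hypothetical.

```python
def chance_of_slow_request(p_slow: float, backend_calls: int) -> float:
    """Probability that at least one of the backend calls is slow."""
    return 1 - (1 - p_slow) ** backend_calls


# If 1% of backend calls are slow:
one_call = chance_of_slow_request(0.01, 1)       # 1% of user requests are slow
hundred_calls = chance_of_slow_request(0.01, 100)  # roughly 63% are slow

# With parallel calls, the user waits for the slowest one.
backend_times_ms = [12, 15, 11, 480, 14]
end_user_time_ms = max(backend_times_ms)
```

So a backend that is slow only 1% of the time can still make most end-user requests slow once each request fans out to many calls.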
what is scaling up?
vertical scaling, moving to a
more powerful machine
what is scaling out?
horizontal scaling, distributing the load
across multiple smaller machines
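One simple way to distribute load across machines is to route each key to a machine by hashing it. This is only a sketch under that assumption; the machine names and the pick_machine helper are hypothetical, and real systems often use more careful schemes so that adding a machine does not move most keys.

```python
import hashlib
from collections import Counter


def pick_machine(key: str, machines):
    """Map a key to one machine using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


machines = ["node-a", "node-b", "node-c"]  # hypothetical machine names

# The same key always lands on the same machine.
home = pick_machine("user:42", machines)

# Many keys spread out over the available machines.
spread = Counter(pick_machine(f"user:{i}", machines) for i in range(1000))
```

hashlib is used instead of the built-in hash() because Python randomizes string hashes per process, which would send the same key to different machines after a restart.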
how do we minimize the pain of maintenance so that we don't end up with a legacy system?
Operability
Make it easy for operations teams to keep the system running smoothly.
Simplicity
Make it easy for new engineers to understand the system, by removing as much complexity as possible from the system. (Note this is not the same as simplicity of the user interface.)
Evolvability
Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change. Also known as extensibility, modifiability, or plasticity.
what is the job of an operations team?
• Monitoring the health of the system and quickly restoring service if it goes into a
bad state
• Tracking down the cause of problems, such as system failures or degraded performance
• Keeping software and platforms up to date, including security patches
• Keeping tabs on how different systems affect each other, so that a problematic
change can be avoided before it causes damage
• Anticipating future problems and solving them before they occur (e.g., capacity
planning)
• Establishing good practices and tools for deployment, configuration management,
and more
• Performing complex maintenance tasks, such as moving an application from one
platform to another
• Maintaining the security of the system as configuration changes are made
• Defining processes that make operations predictable and help keep the production
environment stable
• Preserving the organization’s knowledge about the system, even as individual
people come and go
what is arguably the most important part of developing software?
data models, because
they have such a profound effect: not only on how the software is written, but also on
how we think about the problem that we are solving.
what are the general data models?
relational model, the document model, and a few graph-based data models
what is the relational model?
data is organized into relations (called
tables in SQL), where each relation is an unordered collection of tuples (rows in SQL).
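A minimal sketch of a relation using Python's built-in sqlite3 module with an in-memory database. The users table and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# A relation (table) with two attributes (columns)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Each inserted tuple becomes one row; insertion order carries no meaning
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(2, "Grace"), (1, "Ada")],
)

# Because a relation is unordered, ask for an order explicitly when it matters
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
conn.close()
```

rows is a list of Python tuples, [(1, 'Ada'), (2, 'Grace')], which mirrors the idea that a relation is a collection of tuples.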
what were the driving forces behind NoSQL?
• A need for greater scalability than relational databases can easily achieve, including
very large datasets or very high write throughput
• A widespread preference for free and open source software over commercial
database products
• Specialized query operations that are not well supported by the relational model
• Frustration with the restrictiveness of relational schemas, and a desire for a more
dynamic and expressive data model