Enterprise Computing - Operations Flashcards by Benjamin Richmond

What is a staging environment in cloud computing?

A scaled-down replica of the production environment used to test MVPs before release. The infrastructure that provides this is rented in the cloud.

What are the five key features of cloud computing success?

Broad network access
On-demand, self-service
Measured service
Rapid elasticity
Resource pooling

(Easy access, on-demand, measured, can use more or less, resource pooling)

What does “Broad Network Access” mean in cloud computing?

That the cloud is available over standard networks, including VPNs.

What is the “drug-dealer” pricing model in cloud computing?

A model where customers can easily and immediately acquire virtual machines on demand.

How and why is usage monitored in cloud computing?

The provider monitors and measures usage for optimization and billing purposes.

What is rapid elasticity in cloud computing?

The ability to scale resources up or down based on demand.
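
As a rough illustration of scaling up or down with demand, the sketch below picks an instance count from the current load. The function name `desired_instances`, the per-instance capacity, and the minimum and maximum bounds are hypothetical values chosen for this example, not any provider's API.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float = 100.0,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Return the instance count for the current load (hypothetical scaling rule)."""
    # Round up so that the available capacity always covers the demand.
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp to the assumed bounds: never below the minimum, never above the maximum.
    return max(min_instances, min(max_instances, needed))


for load in (0, 50, 450, 5000):
    print(load, "req/s ->", desired_instances(load), "instance(s)")
```

The clamp is the design choice worth noticing: the lower bound keeps the service reachable when demand drops to zero, and the upper bound caps spending when demand spikes.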

Define resource pooling in cloud computing.

Assigning virtual machines to physical ones so that several users can use the same physical machine.

What are the two phases of cloud computing?

Serverful and Serverless Computing

What are the models of serverful computing?

Infrastructure-as-a-Service (IaaS): access to bare servers. Platform-as-a-Service (PaaS): access to servers with operating systems and tools. Software-as-a-Service (SaaS): access to applications on a subscription basis.

What are the main technologies used for serverful implementation?

Virtual machines and containers.

What is the cost model of serverful computing?

Charges are based on resource allocation, just like renting a car.

What is the role of a hypervisor in VM management?

It maps virtual to physical resources, ensuring fair sharing.

How do containers differ from VMs in resource allocation?

Containers share the host OS, mapped by the OS itself.

How can microservices be deployed in serverful environments?

One microservice per VM or one microservice per container.

What are the models of serverless computing?

Backend-as-a-Service (BaaS): access to services such as authentication or database storage. Function-as-a-Service (FaaS): the cloud provider runs custom code in response to requests or events.

What does “serverless” actually mean?

Servers exist, but their management is abstracted away by the provider.

What is the serverless cost model?

Pay-as-you-go based on execution time, like hailing a taxi.
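
To make the taxi analogy concrete, the sketch below contrasts paying for allocated time (serverful) with paying for execution time (serverless). The function names and all prices are hypothetical, chosen only to show the shape of the calculation; real provider pricing has more dimensions.

```python
def serverful_cost(hours_allocated: float, price_per_hour: float) -> float:
    """Pay for the time a resource is allocated, whether it is busy or idle."""
    return hours_allocated * price_per_hour


def serverless_cost(invocations: int,
                    seconds_per_invocation: float,
                    price_per_second: float) -> float:
    """Pay only for the time the code actually executes."""
    return invocations * seconds_per_invocation * price_per_second


# Hypothetical workload: one VM held for a 30-day month versus
# 100,000 function calls of 0.2 seconds each.
vm_cost = serverful_cost(hours_allocated=24 * 30, price_per_hour=0.05)
fn_cost = serverless_cost(invocations=100_000,
                          seconds_per_invocation=0.2,
                          price_per_second=0.0001)
print(f"serverful:  {vm_cost:.2f}")
print(f"serverless: {fn_cost:.2f}")
```

Under these assumed numbers the serverless bill is lower because the code is idle most of the month; with sustained heavy traffic the comparison can reverse.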

How can microservices be implemented in serverless environments?

One microservice to one function instance, or one microservice to multiple function instances.

What are challenges with mapping one microservice to many function instances?

Maintenance problems (keeping track of instances), and performance problems (keeping ‘warm’ instances).

According to Wells, how long did it take to get a server ready for code deployment at the FT in a data centre vs AWS?

120 days in an FT data centre; minutes in an AWS data centre.

According to Wells, should one worry about vendor lock-in?

No; it’s more important to avoid delays from doing everything in-house.

According to Wells, what was the FT’s deployment frequency before and after moving to the cloud?

12 releases per year before; about 30,000 changes per year after.

According to Wells, do you have to choose between speed and stability?

No; moving fast actually helps fix issues quicker and break fewer things.

According to Wells, why should you use a queue?

So that producers and consumers don’t rely on each other.
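
A minimal sketch of this decoupling, using Python's standard `queue` and `threading` modules as a stand-in for a real message queue. The names `producer`, `consumer`, and `run_pipeline`, and the `None` end-of-stream sentinel, are assumptions made for this example and are not taken from Wells.

```python
import queue
import threading


def producer(q: queue.Queue, items: list[str]) -> None:
    """Put work on the queue without knowing who will consume it."""
    for item in items:
        q.put(item)
    q.put(None)  # Sentinel (an assumption of this sketch): no more items.


def consumer(q: queue.Queue, processed: list[str]) -> None:
    """Take work off the queue without knowing who produced it."""
    while True:
        item = q.get()
        if item is None:
            break
        processed.append(item.upper())


def run_pipeline(items: list[str]) -> list[str]:
    q: queue.Queue = queue.Queue()
    processed: list[str] = []
    t_consumer = threading.Thread(target=consumer, args=(q, processed))
    t_producer = threading.Thread(target=producer, args=(q, items))
    t_consumer.start()
    t_producer.start()
    t_producer.join()
    t_consumer.join()
    return processed


print(run_pipeline(["article-1", "article-2"]))
```

Neither function calls the other; both only know about the queue, so either side can be slowed, restarted, or replaced without changing the other.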

According to Wells, what two things should you focus on when developing a distributed system?

Resilience and redundancy.

According to Wells, why should you adopt business-focused monitoring?

Because a few key business indicators can confirm that the system is functioning properly.

According to Wells, why should you test infrastructure recovery plans?

Because you can’t be confident the plan works until it’s tested.

According to Wells, why must the team that builds a system also run it?

Because only that team understands the system well enough to fix issues, especially in urgent situations like 3am outages.

What is meant by “You build it, you run it”?

Developers are responsible for the operational side of the software they create, leading to better quality and customer feedback loops.

What does the CALMS acronym in DevOps stand for?

Culture, Automation, Lean, Measurement, Sharing.

What is the cultural principle of DevOps?

Fostering shared values and a blameless culture focused on learning and collaboration.

What lesson does the NUMMI plant story illustrate about DevOps culture?

High-trust, continuous improvement culture leads to significant performance improvements.

Why is automation emphasized in DevOps?

It reduces the chance of failure from manual tasks, increases speed and transparency, and creates repeatable processes.

What is “automation with a human touch”?

Allowing automatic or manual interruption of processes to ensure quality.

What is the core of the “Lean” principle in DevOps?

Eliminate waste to reduce delays without sacrificing product quality.

What are two ways to reduce waste in DevOps?

Limit work in progress to avoid interruptions. Reduce handoffs to minimise communication overhead.

Why is measurement critical in DevOps?

It helps detect and fix problems quickly by tracking system behavior through metrics and logs.

What is the purpose of “Sharing” in DevOps?

To enhance collaboration and learning between development and operations teams.

According to Vargo, what are the differing concerns of developers and operators?

Developers focus on agility; operators focus on stability.

How can development teams build relationships with operations?

By inviting them to both formal and informal team activities to improve communication and feedback loops.

According to Vargo, what is DevOps in its purest form?

Breaking down the wall between developers and operators.

According to Vargo, why should one reduce organizational silos?

Because success comes from cooperation between cross-functional teams.

According to Vargo, why should one accept failure as normal?

Because any human-built system is inherently unreliable.

According to Vargo, why implement gradual change?

Because bugs are harder to find in large, million-line changes.

According to Vargo, why leverage tooling and automation?

To turn work into repeatable patterns that can be automated.

According to Vargo, why measure everything?

To justify DevOps investments and define clear success metrics.

According to Vargo, how does SRE reduce organizational silos?

By sharing ownership with developers, using shared tools, and defining availability targets.

According to Vargo, how does SRE accept failure as normal?

By using Service Level Objectives (SLOs) and conducting blameless postmortems.

According to Vargo, how does SRE leverage tooling and automation?

By automating tasks that were previously done manually.

According to Vargo, how does SRE implement gradual change?

Through small, fast, iterative deployments that reduce the cost of failure.

According to Vargo, how does SRE measure everything?

By tracking both system metrics (e.g., reliability) and human metrics (e.g., toil).

How does SRE implement the DevOps culture principle?

Through a dedicated SRE team or embedded consultants in dev teams.

What characterizes the SRE postmortem culture?

It is blameless: its reports do not single out individuals or teams.

What is the goal of organizational learning in SRE?

To make the enterprise more resilient through shared learnings.

How does SRE define "toil"?

Manual, repetitive, automatable, low-value ops work that scales linearly.

What percentage of time should SREs spend on engineering work?

At least 50%, according to Google's standards.

What is the incident limit for SREs per 8–12 hour shift?

No more than two incidents to avoid pager fatigue and ensure quality postmortems.

What is an error budget in SRE?

The difference between observed and agreed reliability.

How does the error budget affect feature releases?

If positive, releases can proceed; if negative, they must pause to improve resilience.
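
The release rule above can be sketched as a small check. The function names are hypothetical, reliability is expressed as a fraction between 0 and 1, and treating a budget of exactly zero as "pause" is an assumption of this sketch, since the card only covers the positive and negative cases.

```python
def error_budget(observed_reliability: float, agreed_reliability: float) -> float:
    """Difference between observed and agreed reliability (both as fractions)."""
    return observed_reliability - agreed_reliability


def can_release(observed_reliability: float, agreed_reliability: float) -> bool:
    """Releases proceed only while the budget is positive.

    Assumption: a budget of exactly zero is treated like a negative one.
    """
    return error_budget(observed_reliability, agreed_reliability) > 0


# Hypothetical figures: the agreed reliability is 99.9%.
print(can_release(0.9995, 0.999))  # observed above the agreed level
print(can_release(0.9980, 0.999))  # observed below the agreed level
```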

How does polarizing time help reduce handoffs?

It allows SREs to focus exclusively on either dev or ops work during a shift.

How does SRE approach metric selection?

Through intuition, experience, and understanding of user needs.

What is a Service Level Indicator (SLI)?

A quantitative measure of a service aspect.

What is a Service Level Objective (SLO)?

A target value or range for an SLI.

What is a Service Level Agreement (SLA)?

A contract outlining the consequences of meeting or missing an SLO.
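
One way to see how the three terms relate is a small latency example: the SLI is measured, the SLO is the target for that measurement, and the SLA attaches a consequence to missing the target. The 300 ms threshold, the 95% target, and the 10% service credit below are hypothetical values, as are all the names.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Target value for an SLI (here: a minimum acceptable fraction)."""
    target: float


def latency_sli(latencies_ms: list[float], threshold_ms: float) -> float:
    """SLI: fraction of requests served within the latency threshold."""
    if not latencies_ms:
        raise ValueError("at least one measurement is required")
    fast = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return fast / len(latencies_ms)


def sla_credit_percent(sli: float, slo: SLO, credit_percent: float) -> float:
    """SLA: consequence of missing the SLO (hypothetical service credit)."""
    return 0.0 if sli >= slo.target else credit_percent


measurements = [100.0, 200.0, 400.0, 250.0]
sli = latency_sli(measurements, threshold_ms=300.0)
credit = sla_credit_percent(sli, SLO(target=0.95), credit_percent=10.0)
print(f"SLI = {sli:.2f}, credit owed = {credit:.0f}%")
```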

How does SRE promote knowledge sharing?

By ensuring communication between dev and ops about new functionality and performance.

How does SRE support tool and technique sharing?

By standardizing environment management and enabling self-service deployments.

According to Bisset and Horowitz, what makes a good alert, and why should SREs care?

A good alert is actionable and requires human intervention; SREs care because bad alerts disrupt their sleep and degrade focus.

According to Bisset and Horowitz, what is “reliability theatre” and why does it matter?

It refers to outdated practices like war rooms that look impressive but hinder real incident response.

According to Bisset and Horowitz, what is a “snowflake” and why is it a concern for SREs?

A manually maintained production server; it’s hard to reproduce and debug.

According to Bisset and Horowitz, what are “pets,” “cattle,” and “poultry,” and why should SREs care?

Pets = unique, high-maintenance servers. Cattle = replaceable VMs. Poultry = lightweight containers. SREs care due to differing administrative costs.

According to Bisset and Horowitz, why is autonomous > automated for SREs?

Autonomous systems reduce human intervention, easing on-call burdens.

According to Bisset and Horowitz, why embed SREs in dev teams?

It builds trust and allows SREs to influence system design early on.

According to Bisset and Horowitz, what is the “right number of nines”?

It depends on how much downtime the business can tolerate.
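
To relate a number of nines to tolerable downtime, the arithmetic can be sketched as follows. The function name and the 365-day year are assumptions of this example; "three nines" means 99.9% availability.

```python
def allowed_downtime_minutes(nines: int, period_days: int = 365) -> float:
    """Downtime allowed per period for an availability of N nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    minutes_in_period = period_days * 24 * 60
    return unavailability * minutes_in_period


for n in range(2, 6):
    print(f"{n} nines: about {allowed_downtime_minutes(n):.1f} minutes per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the business's tolerance for downtime, rather than a fixed rule, decides where to stop.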

According to Bisset and Horowitz, why is it dangerous to improve a system without revising its SLA?

Customers will assume the higher reliability is the new guaranteed standard.

(74 cards)