My Personal IT Infrastructure Knowledge Base: Platform Engineering

When building and operating platform services—whether it’s cloud hosting, APIs, or IT support—you’ll often hear the terms SLA and SLO. They might sound similar, but they serve very different purposes.

What is a Service?

A service is a self-contained, reusable capability that the platform exposes to customers. It delivers value and solves a specific problem without requiring the customer to manage the underlying complexity.

Services are technical units of functionality
Customers interact with services via APIs, UIs, or SDKs
Services can be part of a larger product

Examples of Platform Services:

Compute / Virtual Machine

Service Name: vServer

Description: Virtual servers with configurable CPU, memory, and storage

SLA:

99.999% uptime;

SLO:

VM deployment < 5 min

Compute / Bare Metal

Service Name: pServer

Description: High-performance computing, compliance workloads

SLA:

99.9% uptime;

SLO:

provisioning < 30 min

Storage / Block Storage (vDisk)

Service Name: vHDD

Description: Capacity block storage device

SLA:

99.999% uptime;
1 IOPS per GB (32 kB I/O size) at < 10 ms latency

SLO:

provisioning < 5 min

Service Name: vSSD

Description: Capacity block storage device

SLA:

99.999% uptime;
3 IOPS per GB (32 kB I/O size) at < 5 ms latency

SLO:

provisioning < 5 min

etc.
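The per-GB IOPS figures for vHDD and vSSD scale with volume size, which is easy to misread. Here is a minimal sketch of that arithmetic. The helper names and the 500 GB volume are made up for illustration, and it assumes the "32 kB" in the SLA is the I/O size used for the measurement.

```python
# Hypothetical helpers: the per-GB rates and the 32 kB I/O size come from the
# vHDD and vSSD entries above; the 500 GB volume size is an arbitrary example.
IO_SIZE_KB = 32


def guaranteed_iops(size_gb: int, iops_per_gb: int) -> int:
    """IOPS covered by the SLA for a volume of the given size."""
    return size_gb * iops_per_gb


def throughput_kb_per_s(iops: int, io_size_kb: int = IO_SIZE_KB) -> int:
    """Throughput implied by an IOPS figure at a fixed I/O size."""
    return iops * io_size_kb


for name, rate in (("vHDD", 1), ("vSSD", 3)):
    iops = guaranteed_iops(500, rate)
    print(f"{name}: {iops} IOPS, {throughput_kb_per_s(iops)} kB/s")
```

So a small volume gets a small IOPS guarantee, even on the faster tier.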

SLA – Service Level Agreement

An SLA is a formal contract between a service provider and a customer. It defines what the customer can expect from the service and usually includes consequences if the service falls short.

Example:
“Our cloud storage guarantees 99.9% uptime per month. If we fail, you receive a service credit.”
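An uptime percentage is easier to reason about when converted into allowed downtime. A minimal sketch follows. The function name is made up, and it assumes an average month of 365.25 / 12 days (about 30.44 days); a provider may define the billing month differently.

```python
# Assumption: an "average" month of 365.25 / 12 days, i.e. 43,830 minutes.
AVERAGE_MONTH_MINUTES = 365.25 * 24 * 60 / 12


def allowed_downtime_minutes(
    uptime_percent: float,
    period_minutes: float = AVERAGE_MONTH_MINUTES,
) -> float:
    """Minutes of downtime that still satisfy the given uptime percentage."""
    return period_minutes * (1 - uptime_percent / 100)


# Uptime levels that appear in this article.
for pct in (99.9, 99.95, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% uptime allows about {minutes:.2f} minutes of downtime per month")
```

For 99.9% this gives roughly 43.8 minutes per month, and each extra nine cuts the budget by a factor of ten.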

SLO – Service Level Objective

An SLO is an internal goal for service performance. It helps teams measure and improve service quality but isn’t usually legally binding.

Example:
“Our API endpoints aim for 99.95% uptime per month.”

The Key Difference

SLO = internal target
SLA = external promise

Think of it like this: you aim for an SLO, but you commit to an SLA.

Why It Matters

Understanding SLAs vs SLOs helps teams:

Set realistic goals
Measure service quality accurately
Manage customer expectations clearly

Getting this distinction right is a cornerstone of good service management.

Example of a detailed SLA: Cloud Storage Service

Product: S3 Cloud Storage as a Service

SLA (Service Level Agreement) – What the provider promises to the customer:

Uptime guarantee: 99.9% per month

Allowed downtime: about 43m 50s per month (99.9% of an average month of 30.44 days)

Data durability: 99.9999999% per year (9 nines)

If you have 1 million objects, expected loss is 0.001 per year → practically zero, very safe.
If you have 1 billion objects, expect ~1 object lost per year → still within SLA.
SLA doesn’t guarantee zero data loss, but the chance of losing any given object in a year is tiny.

Support response time: within 2 hours for critical issues

A subject matter expert (SME) is available to you within 2 hours
We guarantee the response time, not the fix time

Penalty if broken: Service credits equivalent to 10% of the monthly bill
In short: “We guarantee our service will be available 99.9% of the time each month, with almost no data loss, and our subject matter experts are ready for you. If not, you get a credit.”
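The durability numbers in this SLA can be checked with a few lines of arithmetic. This is a minimal sketch with a made-up function name. It assumes the durability percentage is the yearly probability that any single object survives, and that objects are lost independently. It uses Decimal so the nine nines are not blurred by floating-point rounding.

```python
from decimal import Decimal


def expected_annual_loss(object_count: int, durability_percent: str) -> Decimal:
    """Expected number of objects lost per year at the given durability."""
    # Probability that one object is lost within a year.
    loss_probability = 1 - Decimal(durability_percent) / 100
    return object_count * loss_probability


# Object counts used in the durability example of this SLA.
for count in (1_000_000, 1_000_000_000):
    loss = expected_annual_loss(count, "99.9999999")
    print(f"{count} objects: expected loss per year = {loss}")
```

This is an expected value, not a guarantee. In any given year the actual number of lost objects may be higher or lower.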

Cost: 100 Credits (Credit can be in USD, EUR, CZK, you name it)
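To make the penalty concrete, here is a minimal sketch of the credit calculation. It models only the uptime part of the SLA and assumes a flat 10% credit whenever the measured monthly uptime falls below 99.9%. The function and parameter names are made up, and real contracts often use tiered credits.

```python
def service_credit(
    monthly_bill: float,
    measured_uptime_percent: float,
    guaranteed_uptime_percent: float = 99.9,
    credit_rate: float = 0.10,
) -> float:
    """Credit owed to the customer for one month (same unit as the bill)."""
    if measured_uptime_percent >= guaranteed_uptime_percent:
        return 0.0  # SLA met, no credit
    return monthly_bill * credit_rate  # flat rate, an assumption of this sketch


# A bill of 100 credits, as in the example above.
print(service_credit(100, measured_uptime_percent=99.5))
print(service_credit(100, measured_uptime_percent=99.95))
```

With a bill of 100 credits, a breached month returns 10 credits to the customer and a compliant month returns none.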

Example of SLO: Internal API Endpoint

Product: Internal REST API Endpoint

SLO (Service Level Objective) – What the provider aims for internally:

Uptime target: 99.95% per month
API request latency: 95% of requests under 200ms
Backup success rate: 100% per day
In short: “Internally, we aim to exceed the SLA and keep our service as reliable and fast as possible.”
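The latency objective above can be evaluated directly from request measurements. A minimal sketch follows. The function names and the sample latencies are made up, and it assumes "under 200ms" means strictly less than 200 ms over whatever measurement window the team agrees on.

```python
def fraction_under(latencies_ms: list[float], threshold_ms: float) -> float:
    """Share of requests that finished faster than the threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples to evaluate")
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return fast / len(latencies_ms)


def meets_latency_slo(
    latencies_ms: list[float],
    threshold_ms: float = 200.0,
    target_fraction: float = 0.95,
) -> bool:
    """True if at least target_fraction of requests beat the threshold."""
    return fraction_under(latencies_ms, threshold_ms) >= target_fraction


# Made-up sample: 96 fast requests and 4 slow ones.
samples = [120.0] * 96 + [350.0] * 4
print(meets_latency_slo(samples))
```

In practice the samples would come from the monitoring system rather than a hard-coded list.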

Responsibility: Platform Engineering SRE / DevOps Engineers

Accountability: Platform Engineering Lead / Manager with Product Owner / Platform Product Manager

Well, now there is another question.

What is the difference between Responsibility and Accountability?

Quick way to remember:

Responsibility = doing
Accountability = owning

Responsibility

Definition: Being tasked with doing something or completing a specific duty.
Focus: The work itself.
Who it applies to: The person or team who actually performs the work.
Key idea: “I am responsible for completing this task.”
Example:

A developer is responsible for writing and maintaining the code behind the REST API's functions and for the validity of its data.
An SRE / Platform Engineer is responsible for API endpoint availability.

Accountability

Definition: Being answerable for the outcome of a task or decision, regardless of who did the work.
Focus: The results or impact.
Who it applies to: The person who owns the outcome and must report or justify it.
Key idea: “I am accountable if this task succeeds or fails.”
Example: The project manager is accountable for the feature being delivered on time, even if the developer does the coding.

The Platform Engineering team as a whole delivers the service (doing), but the Platform Engineering Lead or Platform Product Owner is typically accountable for ensuring the platform works (owning), meets SLAs, and enables developer productivity. The next two sections explain this in more detail.

RACI

The RACI table below defines the responsibility and accountability of the various Platform Engineering roles.

Platform Product Owner
Platform Engineering Lead (Manager)
Platform Architect
Platform Engineers
DevOps
SREs
Operations

| ---------------------- | ----------------------- | ------------------ | --------------- | ---------------- |
| Activity / Service     | Accountable (A)         | Responsible (R)    | Consulted (C)   | Informed (I)     |
| ---------------------- | ----------------------- | ------------------ | --------------- | ---------------- |
| Platform roadmap &     | Platform Product Owner  | Platform Engineers | Developer teams | CTO              |
| feature prioritization |                         |                    | Architects      | VP Engineering   |
| ---------------------- | ----------------------- | ------------------ | --------------- | ---------------- |
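A RACI row can also be kept as data, so that gaps are caught automatically. The sketch below encodes the "Platform roadmap & feature prioritization" row from the table. The class and function names are made up, and the check assumes every activity needs at least one Accountable and one Responsible role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaciRow:
    """One activity with the roles assigned to each RACI column."""
    activity: str
    accountable: tuple[str, ...]
    responsible: tuple[str, ...]
    consulted: tuple[str, ...] = ()
    informed: tuple[str, ...] = ()


ROADMAP = RaciRow(
    activity="Platform roadmap & feature prioritization",
    accountable=("Platform Product Owner",),
    responsible=("Platform Engineers",),
    consulted=("Developer teams", "Architects"),
    informed=("CTO", "VP Engineering"),
)


def missing_assignments(row: RaciRow) -> list[str]:
    """Return the mandatory RACI columns that have no role assigned."""
    missing = []
    if not row.accountable:
        missing.append("Accountable")
    if not row.responsible:
        missing.append("Responsible")
    return missing


print(missing_assignments(ROADMAP))
```

An empty result means the row names both who owns the outcome and who does the work.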

Roles

In the table below, the various platform engineering roles are explained.

| -------------------- | ------------------------ | ------------------------- | ----------------------------- |
| Role                 | Focus                    | Mindset / Goal            | Typical Work                  |
| -------------------- | ------------------------ | ------------------------- | ----------------------------- |

Clear distinction between roles

Platform Product Owner → decides why the platform exists and what problems it solves
Platform Architect → decides how the platform should be built
Platform Engineering Lead → decides what gets built and why
Platform Engineer → designs and builds the platform
DevOps Engineers → ensure platform is deployable, scalable, and maintainable
Site Reliability Engineer (SRE) → ensure platform reliability and operational excellence
Operations / NOC / Support Engineers → handle day-to-day operational support

Typical Platform Org Chart

A Platform Engineering org chart could look like the one in the drawing below.

[Figure: Platform Engineering Org Chart]


Sunday, February 1, 2026

Platform Engineering - SLA vs SLO & Platform Org Chart
