When building and operating platform services—whether it’s cloud hosting, APIs, or IT support—you’ll often hear the terms SLA and SLO. They might sound similar, but they serve very different purposes.
What is a Service?
A service is a self-contained, reusable capability that the platform exposes to customers. It delivers value and solves a specific problem without requiring the customer to manage the underlying complexity.
- Services are technical units of functionality
- Customers interact with services via APIs, UIs, or SDKs
- Services can be part of a larger product
Example of Platform Services:
- Compute / Virtual Machine
  - Service Name: vServer
    - Description: Virtual servers with configurable CPU, memory, and storage
    - SLA:
      - 99.999% uptime
    - SLO:
      - VM deployment < 5 min
- Compute / Bare Metal
  - Service Name: pServer
    - Description: High-performance computing, compliance workloads
    - SLA:
      - 99.9% uptime
    - SLO:
      - Provisioning < 30 min
- Storage / Block Storage (vDisk)
  - Service Name: vHDD
    - Description: Capacity block storage device
    - SLA:
      - 99.999% uptime
      - 1 IOPS per GB (32kB) @ <10 ms
    - SLO:
      - Provisioning < 5 min
  - Service Name: vSSD
    - Description: Capacity block storage device
    - SLA:
      - 99.999% uptime
      - 3 IOPS per GB (32kB) @ <5 ms
    - SLO:
      - Provisioning < 5 min
- etc.
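One way to keep such a catalog consistent is to store it as structured data rather than free text. The following is a minimal sketch in Python, not a description of any real platform API: the `ServiceEntry` class, the `CATALOG` list, and the `find_service` helper are hypothetical names chosen for illustration, and only the service values come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """One catalog entry (hypothetical structure, for illustration only)."""
    category: str
    name: str
    description: str
    sla: tuple[str, ...]  # external commitments to the customer
    slo: tuple[str, ...]  # internal targets for the platform team


# Values copied from the example list above.
CATALOG = [
    ServiceEntry(
        "Compute / Virtual Machine", "vServer",
        "Virtual servers with configurable CPU, memory, and storage",
        sla=("99.999% uptime",),
        slo=("VM deployment < 5 min",),
    ),
    ServiceEntry(
        "Compute / Bare Metal", "pServer",
        "High-performance computing, compliance workloads",
        sla=("99.9% uptime",),
        slo=("Provisioning < 30 min",),
    ),
    ServiceEntry(
        "Storage / Block Storage (vDisk)", "vHDD",
        "Capacity block storage device",
        sla=("99.999% uptime", "1 IOPS per GB (32kB) @ <10 ms"),
        slo=("Provisioning < 5 min",),
    ),
    ServiceEntry(
        "Storage / Block Storage (vDisk)", "vSSD",
        "Capacity block storage device",
        sla=("99.999% uptime", "3 IOPS per GB (32kB) @ <5 ms"),
        slo=("Provisioning < 5 min",),
    ),
]


def find_service(name: str) -> ServiceEntry:
    """Return the catalog entry with the given service name."""
    for entry in CATALOG:
        if entry.name == name:
            return entry
    raise KeyError(f"unknown service: {name}")


if __name__ == "__main__":
    for entry in CATALOG:
        print(f"{entry.name}: SLA={entry.sla} SLO={entry.slo}")
```

Keeping SLA and SLO as separate fields makes the distinction explicit for every service: the first is what the customer is promised, the second is what the team aims for.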
SLA – Service Level Agreement
An SLA is a formal contract between a service provider and a customer. It defines what the customer can expect from the service and usually includes consequences if the service falls short.
Example:
“Our cloud storage guarantees 99.9% uptime per month. If we fail, you receive a service credit.”
SLO – Service Level Objective
An SLO is an internal goal for service performance. It helps teams measure and improve service quality but isn’t usually legally binding.
Example:
“Our API endpoints aim for 99.95% uptime per month.”
The Key Difference
- SLO = internal target
- SLA = external promise
Think of it like this: you aim for an SLO, but you commit to an SLA.
Why It Matters
Understanding SLAs vs SLOs helps teams:
- Set realistic goals
- Measure service quality accurately
- Manage customer expectations clearly
Getting this distinction right is a cornerstone of good service management.
Example of detailed SLA: Cloud Storage Service
Product: S3 Cloud Storage as a Service
SLA (Service Level Agreement) – What the provider promises to the customer:
- Uptime guarantee: 99.9% per month
- Allowed downtime: about 43m 50s per month (0.1% of an average 30.44-day month)
- Data durability: 99.9999999% per year (9 nines)
- If you have 1 million objects, expected loss is 0.001 per year → practically zero, very safe.
- If you have 1 billion objects, expect ~1 object lost per year → still within SLA.
- SLA doesn’t guarantee zero data loss, but the chance of losing any given object in a year is tiny.
- Support response time: within 2 hours for critical issues
- Our SME is available for you within 2 hours
- We guarantee response time, not fix time.
- Penalty if broken: Service credits equivalent to 10% of the monthly bill
- In short: “We guarantee our service will be available 99.9% of the time each month, with almost no data loss, and our subject matter experts are ready for you. If not, you get a credit.”
Cost: 100 Credits (Credit can be in USD, EUR, CZK, you name it)
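The numbers in this SLA can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes an average month of 365.25 / 12 days and a flat 10% credit on the monthly bill, as in the example above.

```python
# Average month length in days (assumption: 365.25 days per year / 12).
AVG_DAYS_PER_MONTH = 365.25 / 12


def allowed_downtime_minutes(uptime_percent: float,
                             days: float = AVG_DAYS_PER_MONTH) -> float:
    """Minutes of downtime per period that still satisfy the uptime guarantee."""
    return (1 - uptime_percent / 100) * days * 24 * 60


def format_minutes(minutes: float) -> str:
    """Render a duration in minutes as 'Xm Ys', rounded to whole seconds."""
    m, s = divmod(round(minutes * 60), 60)
    return f"{m}m {s}s"


def expected_objects_lost(object_count: int, durability_percent: float) -> float:
    """Expected number of objects lost per year at the given durability."""
    return object_count * (1 - durability_percent / 100)


def service_credit(monthly_bill: float, credit_percent: float = 10.0) -> float:
    """Credit owed when the SLA is broken (flat percentage of the bill)."""
    return monthly_bill * credit_percent / 100


if __name__ == "__main__":
    print(format_minutes(allowed_downtime_minutes(99.9)))
    print(expected_objects_lost(1_000_000, 99.9999999))
    print(expected_objects_lost(1_000_000_000, 99.9999999))
    print(service_credit(100))
```

With a monthly cost of 100 credits, a broken SLA would return 10 credits under this assumption.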
Example of SLO: Internal API Endpoint
Product: Internal REST API Endpoint
SLO (Service Level Objective) – What the provider aims for internally:
- Uptime target: 99.95% per month
- API request latency: 95% of requests under 200ms
- Backup success rate: 100% per day
- In short: “Internally, we aim to exceed the SLA and keep our service as reliable and fast as possible.”
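To make these targets measurable, a team could evaluate them against monitoring data at the end of each period. The following is a rough sketch under stated assumptions, not a real monitoring integration: the function names and input shapes are hypothetical, the thresholds are the ones listed above, and the 99.9% SLA figure is taken from the storage example only to show the gap between target and commitment.

```python
SLO_UPTIME_PERCENT = 99.95   # internal target from the list above
SLA_UPTIME_PERCENT = 99.9    # external commitment, assumed for comparison
LATENCY_THRESHOLD_MS = 200
LATENCY_TARGET_FRACTION = 0.95


def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Measured availability over the period, in percent."""
    return 100 * (1 - downtime_minutes / total_minutes)


def fraction_under(latencies_ms: list[float], threshold_ms: float) -> float:
    """Share of requests that completed faster than the threshold."""
    return sum(1 for v in latencies_ms if v < threshold_ms) / len(latencies_ms)


def check_slo(total_minutes: float, downtime_minutes: float,
              latencies_ms: list[float],
              backups_ok: int, backups_total: int) -> dict[str, bool]:
    """Return one pass/fail flag per objective, plus the SLA uptime check."""
    uptime = availability_percent(total_minutes, downtime_minutes)
    return {
        "uptime_slo": uptime >= SLO_UPTIME_PERCENT,
        "uptime_sla": uptime >= SLA_UPTIME_PERCENT,
        "latency_slo": fraction_under(latencies_ms, LATENCY_THRESHOLD_MS)
        >= LATENCY_TARGET_FRACTION,
        "backup_slo": backups_ok == backups_total,
    }


if __name__ == "__main__":
    minutes_in_30_days = 30 * 24 * 60
    sample_latencies = [120.0] * 96 + [450.0] * 4  # made-up sample data
    print(check_slo(minutes_in_30_days, 30, sample_latencies, 30, 30))
```

In the sample run, 30 minutes of downtime in a 30-day month misses the 99.95% objective but still meets a 99.9% commitment. That gap is the point of setting the SLO tighter than the SLA: the team gets a warning before the customer is owed a credit.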
Responsibility: Platform Engineering SRE / DevOps Engineers
Accountability: Platform Engineering Lead / Manager with Product Owner / Platform Product Manager
Well, now there is another question.
What is the difference between Responsibility and Accountability?
Quick way to remember:
- Responsibility = doing
- Accountability = owning
Responsibility
- Definition: Being tasked with doing something or completing a specific duty.
- Focus: The work itself.
- Who it applies to: The person or team who actually performs the work.
- Key idea: “I am responsible for completing this task.”
- Example:
- A developer is responsible for writing and maintaining the code behind a REST API, including its functions and data validity.
- An SRE / Platform Engineer is responsible for API endpoint availability.
Accountability
- Definition: Being answerable for the outcome of a task or decision, regardless of who did the work.
- Focus: The results or impact.
- Who it applies to: The person who owns the outcome and must report or justify it.
- Key idea: “I am accountable if this task succeeds or fails.”
- Example: The project manager is accountable for the feature being delivered on time, even if the developer does the coding.
The Platform Engineering team as a whole delivers the service (doing), but the Platform Engineering Lead or Platform Product Owner is typically accountable for ensuring the platform works (owning), meets SLAs, and enables developer productivity. The next two sections explain this in more detail.
RACI
The RACI table below defines the responsibility and accountability of the following Platform Engineering roles:
- Platform Product Owner
- Platform Engineering Lead (Manager)
- Platform Architect
- Platform Engineers
- DevOps
- SREs
- Operations
| Activity / Service | Accountable (A) | Responsible (R) | Consulted (C) | Informed (I) |
| ---------------------- | ----------------------- | ------------------ | --------------- | ---------------- |
| Platform roadmap & feature prioritization | Platform Product Owner | Platform Engineers | Developer teams, Architects | CTO, VP Engineering |
Roles
In the table below, the various platform engineering roles are explained.
| Role | Focus | Mindset / Goal | Typical Work |
| -------------------- | ------------------------ | ------------------------- | ----------------------------- |
| | platform. | platform sustainability. | Prioritize platf. initiatives |
Clear distinction between roles
- Platform Product Owner → decides why the platform exists and what problems it solves
- Platform Architect → decides how the platform should be built
- Platform Engineering Lead → decides what gets built and why
- Platform Engineer → designs and builds the platform
- DevOps Engineers → ensure platform is deployable, scalable, and maintainable
- Site Reliability Engineer (SRE) → ensure platform reliability and operational excellence
- Operations / NOC / Support Engineers → Handle day-to-day operational support
Typical Platform Org Chart
A Platform Engineering org chart could look like the one in the drawing below.
[Figure: Platform Engineering Org Chart]