ECS vs EC2 is the wrong question

I was studying for an AWS certification when I hit a practice question that pitted ECS against EC2 as if they were two options at the same level. I picked the “right” answer, moved on, and then sat with it for a minute. Something about the framing felt off, and the more I read, the worse it got. ECS is not an alternative to EC2. ECS runs on EC2, or it runs on Fargate, but in both cases there is a virtual machine underneath. The interesting question is not which to pick, it is which layer of abstraction you want to be responsible for.

This piece is my notes from working that out. I am still studying, so I am leaning on the AWS docs more than on production experience. Where I am inferring rather than reporting, I try to say so.

What each thing actually is

EC2 is a virtual machine service. AWS gives you a VM, you put whatever you want on it. You own the OS, the patching, what runs on top, how it gets traffic, and how it dies. The unit of work is the instance.

ECS is a container orchestrator. It schedules containers, restarts them when they die, wires them into load balancers, gives them IAM roles, and rolls out new versions. It does not actually run containers itself, it tells something else to run them. The unit of work is the task, which the docs describe as “one or more containers grouped together” with shared networking.

ECS has two main launch types (setting aside the External type used for ECS Anywhere), and this is where I got confused at first:

ECS on EC2: you run a fleet of EC2 instances with the ECS agent on them, and ECS schedules tasks onto those instances. You are still responsible for the VMs. ECS is the scheduler, EC2 is the compute.
ECS on Fargate: AWS runs the VMs for you, you only see the task. There is still a VM under there, you just do not get to log into it, patch it, or know which instance type it is.

So when a practice question asks “ECS vs EC2”, I think it is really hiding one of two real questions:

1. Plain EC2 (no orchestrator) vs ECS on Fargate. Do I keep managing VMs, or do I let AWS manage them and only deal with containers?
2. Plain EC2 vs ECS on EC2. Do I want a scheduler in front of my fleet, or do I run my workload directly on the instance?

The trade-offs change completely between (1) and (2), so collapsing them into one question gives bad answers to both. That, I think, is the thing the exam question was hiding from me.
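
A small sketch of the layering, as a way to keep the two questions apart. The layer names, the `OWNERSHIP` table, and the `layers_you_own` helper are all made up for illustration. They simplify the descriptions above and are not an AWS-published responsibility matrix.

```python
# Who operates each layer for each option. A simplification of the notes
# above, not an official AWS responsibility matrix.
LAYERS = ["hardware", "virtual machine / OS", "container scheduling", "application"]

OWNERSHIP = {
    "plain EC2": {
        "hardware": "AWS",
        "virtual machine / OS": "you",
        "container scheduling": "you (if you run containers at all)",
        "application": "you",
    },
    "ECS on EC2": {
        "hardware": "AWS",
        "virtual machine / OS": "you",
        "container scheduling": "AWS (ECS)",
        "application": "you",
    },
    "ECS on Fargate": {
        "hardware": "AWS",
        "virtual machine / OS": "AWS",
        "container scheduling": "AWS (ECS)",
        "application": "you",
    },
}


def layers_you_own(option: str) -> list[str]:
    """Return the layers the customer operates for a given option."""
    return [layer for layer in LAYERS if OWNERSHIP[option][layer].startswith("you")]


for option in OWNERSHIP:
    print(f"{option}: {layers_you_own(option)}")
```

Read this way, question 1 is about the "virtual machine / OS" row and question 2 is about the "container scheduling" row, which is why one answer cannot cover both.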

When plain EC2 still makes sense

The reflex in 2026 is “containerise it, put it on Fargate.” From the docs and a few re:Invent talks I have watched, this is not always the right call. Plain EC2 seems to win when:

The workload is not a container. A long-lived process that pins itself to a host, a JVM tuned for a specific instance, a database you have a real reason to run yourself. Wrapping it in a container adds ceremony without buying you much.
You need the whole machine. GPU jobs, HPC, kernel modules, anything that wants --privileged and root on the host. ECS can do some of this on EC2 launch type, but at that point the scheduler is doing very little for you.
The instance is the unit of deployment. I came across game-server architectures as the canonical example: a single match lives in one process, and a scheduler trying to move that around is in the way.
You only have one or two long-lived boxes. A scheduler is overhead. For a tiny fleet, that overhead is most of the work.

The opposite mistake, which I want to remember not to make, is reaching for ECS because it sounds modern and ending up with an orchestration layer for a workload that has exactly one instance.

When ECS on Fargate is the obvious answer

For most stateless HTTP services, Fargate looks like the right default. The reasons are unglamorous, and that is mostly the point:

You stop being a Linux administrator. No AMIs to bake, no patches to roll, no SSM agents to update. The VM exists, you do not touch it.
Scaling is per-task, not per-instance. You set CPU and memory on the task definition, the service scales tasks against whichever metric you pick, and AWS finds the capacity.
The blast radius is the task. A misbehaving container takes down a task, not a whole instance running several of them.
IAM is per-task. Task roles, not instance roles. Your auth boundary lines up with your service boundary, which is the boundary I would want to reason about.

The trade-offs I have read about, and would want to verify the day I actually ran Fargate in anger:

Cold start. A Fargate task takes seconds to tens of seconds to come up. Image size, ECR pull, ENI attach, and ALB health-check intervals all add up. For sudden 10x bursts from zero, the docs basically say “keep some tasks warm.”
Cost at high sustained load. Fargate is priced per vCPU-second and GB-second. At very high steady-state utilisation, the AWS pricing pages suggest you pay a real premium for the abstraction.
No host-level controls. No daemon-style scheduling (what Kubernetes would call a daemonset), no sidecars that need host networking, no privileged containers.

When ECS on EC2 is the right call

ECS on EC2 is the awkward middle. From what I have read, it makes sense when:

You have a large container fleet at high sustained utilisation, where Fargate’s premium becomes real money.
You need GPU containers, very large memory, or local NVMe. Fargate tasks have a fixed size ceiling and, as far as I can tell from the docs, no GPU option; the EC2 launch type gives you the full instance type catalogue.
You need a sidecar that wants host access, like a logging agent that reads host logs.

The two things I have flagged in my notes as “watch out for these the first time you do this”:

Capacity providers. ECS Capacity Providers can scale the underlying Auto Scaling Group based on pending tasks, but the docs are honest that the feedback loop is slow. A new deploy that briefly doubles task count can sit waiting for new instances to come up. I have not had to debug this myself yet, I just want to remember it exists.
Bin packing. ECS treats the CPU and memory you set on a task as a hard reservation when placing. A task definition that over-reserves wastes a lot of fleet, even if the actual usage is low. A toy fragment from the docs:
```
{
  "containerDefinitions": [
    {
      "name": "api",
      "image": "012345678901.dkr.ecr.eu-west-1.amazonaws.com/api:abc123",
      "cpu": 1024,
      "memoryReservation": 512,
      "memory": 1024
    }
  ],
  "requiresCompatibilities": ["EC2"],
  "cpu": "1024",
  "memory": "1024"
}
```
cpu: 1024 reserves a full vCPU, because ECS counts 1024 CPU units per vCPU. On a 2-vCPU instance you fit at most two of these, regardless of what they actually do at runtime. The lesson I am taking from this is that right-sizing those reservations is probably the most important tuning knob on ECS-on-EC2, even though it is not the most exciting one.
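
The placement arithmetic can be sketched in a few lines. This is a simplified model and not how the ECS scheduler is implemented: it looks only at CPU and memory reservations and ignores ports, network interfaces, and placement strategies. The function name `tasks_per_instance` and the 3800 MiB figure for memory left over after the OS and agent are assumptions for illustration.

```python
CPU_UNITS_PER_VCPU = 1024  # ECS expresses CPU in units; 1024 units = 1 vCPU


def tasks_per_instance(
    instance_vcpus: int,
    instance_memory_mib: int,
    task_cpu_units: int,
    task_memory_mib: int,
) -> int:
    """How many copies of one task fit on one instance, by reservation alone."""
    by_cpu = (instance_vcpus * CPU_UNITS_PER_VCPU) // task_cpu_units
    by_memory = instance_memory_mib // task_memory_mib
    # Whichever resource runs out first sets the limit.
    return min(by_cpu, by_memory)


# Hypothetical instance: 2 vCPUs, 3800 MiB available to tasks (assumed figure).
print(tasks_per_instance(2, 3800, task_cpu_units=1024, task_memory_mib=1024))  # 2, CPU-bound
print(tasks_per_instance(2, 3800, task_cpu_units=256, task_memory_mib=512))    # 7, memory-bound
```

Under these assumed numbers, trimming the reservation to what the task really needs more than triples the density on the same instance, and the limiting resource flips from CPU to memory.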

The cost model is not what I first thought

The naive comparison is “Fargate vCPU-hours cost more than EC2 vCPU-hours, therefore EC2 is cheaper.” That is true at the sticker price, and I think it misses the point.

On EC2 you pay for the instance whether tasks are using it or not. On Fargate you pay for the task whether it is doing anything or not. So the right comparison is cost per utilised vCPU-hour, not cost per provisioned vCPU-hour.

I worked through this on paper: a service whose average load is a quarter of its peak, in other words a 4x peak. On ECS-on-EC2 you have to provision for the peak, so you are paying for 100% of the capacity to do about 25% of the work. On Fargate you scale tasks with load and pay roughly in proportion to demand. Above some break-even utilisation, EC2 wins. Below it, Fargate wins. I do not have a number I trust for where that break-even sits in 2026, but the shape of the trade-off is what I want to remember.
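
The shape of that trade-off can be written down as a toy model. Everything numeric here is a placeholder: the prices are invented, not taken from any AWS pricing page, memory pricing is ignored, and the model assumes Fargate tasks themselves run at some target utilisation rather than at 100%. The function names are hypothetical.

```python
def hourly_cost(price_per_vcpu_hour: float, useful_vcpus: float, utilisation: float) -> float:
    """Cost of provisioning enough vCPUs to do `useful_vcpus` of work
    when only `utilisation` (0-1] of what you pay for is doing work."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return price_per_vcpu_hour * useful_vcpus / utilisation


def break_even_ec2_utilisation(
    ec2_price: float, fargate_price: float, fargate_utilisation: float
) -> float:
    """EC2 fleet utilisation at which both options cost the same.
    Above this value EC2 is cheaper, below it Fargate is."""
    return fargate_utilisation * ec2_price / fargate_price


# Placeholder prices per vCPU-hour (NOT real AWS prices).
EC2_PRICE, FARGATE_PRICE = 0.03, 0.04

# 10 vCPUs of useful work on average. The EC2 fleet is provisioned for a
# 4x peak, so it averages 25% utilisation; Fargate tasks are assumed to
# track load at 80% utilisation.
print(f"EC2:     {hourly_cost(EC2_PRICE, 10, 0.25):.2f}")     # 1.20
print(f"Fargate: {hourly_cost(FARGATE_PRICE, 10, 0.80):.2f}")  # 0.50
print(f"Break-even: {break_even_ec2_utilisation(EC2_PRICE, FARGATE_PRICE, 0.80):.0%}")  # 60%
```

With these invented inputs the cheaper sticker price loses badly at 25% fleet utilisation and only starts winning above roughly 60%. The real break-even depends on current prices, discounts, and memory, so the numbers should be replaced before drawing any conclusion.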

So what would I actually pick

If the exam question had let me write a paragraph instead of choosing A/B/C/D, this is what I would have wanted to write:

Am I running containers at all? If no, the question is not ECS, it is plain EC2 vs something else like Lambda or App Runner.
Do I need host-level control like GPU, privileged, or a daemonset? If yes, ECS on EC2.
Am I running at very high sustained utilisation where Fargate’s premium hurts? If yes, ECS on EC2 with capacity providers.
Otherwise, ECS on Fargate. The defaults are reasonable, the operational surface is small, and the time saved not babysitting AMIs is time I get to spend learning the next thing.
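
The same checklist as a function, mostly to show that the order of the questions matters. The function name, the boolean parameters, and the returned strings are made up for illustration. A real decision has more inputs than three flags.

```python
def pick_compute(
    runs_containers: bool,
    needs_host_control: bool,
    high_sustained_utilisation: bool,
) -> str:
    """Walk the checklist above in order and return the first match."""
    if not runs_containers:
        # Not an ECS question at all.
        return "plain EC2, or something else like Lambda or App Runner"
    if needs_host_control:
        # GPU, privileged containers, daemon-style agents.
        return "ECS on EC2"
    if high_sustained_utilisation:
        # Where Fargate's premium starts to hurt.
        return "ECS on EC2 with capacity providers"
    return "ECS on Fargate"


# A stateless HTTP service with bursty traffic and no host access needs:
print(pick_compute(runs_containers=True, needs_host_control=False,
                   high_sustained_utilisation=False))  # ECS on Fargate
```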

“ECS vs EC2” collapses two different questions into one and gets both wrong.

The real question is which layer of the stack I want to own. Once I picked the layer, the service fell out of it. That, more than the practice-question answer, is what I actually learned.