placemy.cloud

Cloud Right-Sizing: The Definitive Guide

Last updated: May 2026 · 12 min read

Cloud right-sizing is the practice of matching each virtual machine to the smallest instance type that still meets its workload's actual resource requirements. Most organisations over-provision by 40–70% because they size for peak demand that rarely materialises. The result is avoidable spend: in the workloads we measured, right-sizing cut the monthly bill by 24–67%.

Why VMs are almost always oversized

Engineers size VMs at provisioning time — when they have the least information about real demand. They pick an instance type based on marketing descriptions, vendor recommendations, or a previous deployment that was itself never measured. Once running, the instance quietly consumes budget at 5–15% average CPU utilisation while the bill hits finance every month.

The problem compounds across cloud accounts. A platform team with 20 VMs across 3 accounts will typically find that 14–16 of them are oversized by at least one size step within their instance family. In the five workloads we measured during our research evaluation, the median over-provision was 63%.

Why average CPU is a terrible sizing metric

Many right-sizing tools lean on a single summary statistic of CPU utilisation, often the average, as their primary signal. A VM running at 5% average and 90% peak could be either a bursty workload that legitimately needs headroom, or a steady-state service that had a single anomalous spike. The average cannot distinguish the two.

P95 utilisation, the value below which 95% of samples fall, captures the workload's real operating envelope. It sets aside the highest 5% of samples (often transient, recoverable spikes) while preserving the sustained demand pattern. A VM at 12% P95 is genuinely oversized. A VM at 72% P95 is not.
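
To see why the two cases above diverge, here is a minimal Python sketch using the nearest-rank definition of a percentile. That definition is an assumption; monitoring services may interpolate differently. Both CPU series are made up for illustration. Each averages roughly 5% and peaks at 90%, yet their P95 values are 4% and 70%.

```python
import math
from statistics import fmean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Two made-up CPU series (percent), 100 samples each.
steady_with_spike = [4.0] * 99 + [90.0]        # steady service, one anomalous spike
bursty = [1.0] * 94 + [70.0] * 5 + [90.0]      # bursty service, repeated high intervals

for name, series in (("steady+spike", steady_with_spike), ("bursty", bursty)):
    print(f"{name:>12}: mean={fmean(series):.2f}%  p95={percentile(series, 95):.1f}%  "
          f"max={max(series):.1f}%")
```

With 5-minute samples (EC2's basic monitoring interval in CloudWatch), a 24-hour window holds 288 points. Nearest-rank P95 therefore sets aside only the busiest 14 of them, about 70 minutes of peak time. Any sustained burst longer than that shows up in the P95.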

The three inputs to a safe right-sizing decision

1. CPU utilisation (P95). Measured from CloudWatch (AWS), Azure Monitor, or Cloud Monitoring (GCP) over an observation window of at least 24 hours. Longer windows (7–14 days) capture weekly patterns, but they depend on the monitoring service retaining that history at a usable sample resolution.

2. Memory utilisation (P95). Only available when a monitoring agent (CloudWatch Agent, Azure Monitor Agent, Ops Agent) is installed. This metric is critical for avoiding a downsize that triggers out-of-memory (OOM) kills: any candidate instance whose memory falls below the workload's P95 memory usage is filtered out.

3. Instance type catalogue. AWS publishes 700+ current-generation instance types. Azure publishes 900+. GCP publishes 300+. Each has different vCPU-to-memory ratios, network throughput limits, and regional availability. A right-sizing engine must compare the workload against all candidates that match its operating system, region, and generation — not just the next size down in the same family.
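
The three inputs combine into a filter-then-rank step. The Python sketch below illustrates that step; it is not placemy's implementation. The catalogue entries, instance names, prices, and the 70% target CPU are hypothetical. The memory rule follows item 2, with an optional headroom knob that is also an assumption. The sketch treats a vCPU as equal across families, which a real engine should not assume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    # Hypothetical catalogue entry; real catalogues carry many more attributes.
    name: str
    region: str
    os: str
    current_gen: bool
    vcpus: int
    memory_gib: float
    hourly_usd: float


@dataclass(frozen=True)
class Workload:
    region: str
    os: str
    vcpus: int          # vCPUs on the current instance
    cpu_p95_pct: float  # P95 CPU utilisation of the current instance, 0-100
    mem_p95_gib: float  # P95 memory in use, reported by the monitoring agent


def candidates(w: Workload, catalogue: list[InstanceType],
               target_cpu_pct: float = 70.0, mem_headroom: float = 0.0) -> list[InstanceType]:
    """Return catalogue entries that fit the workload, cheapest first.

    target_cpu_pct and mem_headroom are assumed policy knobs, not documented defaults.
    """
    # vCPUs busy at P95, scaled so the new instance would run at about target_cpu_pct.
    needed_vcpus = w.vcpus * w.cpu_p95_pct / target_cpu_pct
    # Never offer less memory than P95 usage (plus optional headroom) to avoid OOM kills.
    needed_mem = w.mem_p95_gib * (1 + mem_headroom)
    fits = [
        it for it in catalogue
        if it.region == w.region and it.os == w.os and it.current_gen
        and it.vcpus >= needed_vcpus and it.memory_gib >= needed_mem
    ]
    return sorted(fits, key=lambda it: it.hourly_usd)


# Made-up catalogue: name, region, os, current_gen, vcpus, memory_gib, hourly_usd
catalogue = [
    InstanceType("example.medium", "region-1", "linux", True, 1, 4.0, 0.05),
    InstanceType("example.large-c", "region-1", "linux", True, 2, 4.0, 0.085),
    InstanceType("example.large", "region-1", "linux", True, 2, 8.0, 0.10),
    InstanceType("example.xlarge", "region-1", "linux", True, 4, 16.0, 0.20),
    InstanceType("example.2xlarge", "region-1", "linux", True, 8, 32.0, 0.40),
    InstanceType("legacy.large", "region-1", "linux", False, 2, 8.0, 0.08),
    InstanceType("example.large", "region-2", "linux", True, 2, 8.0, 0.09),
]
w = Workload(region="region-1", os="linux", vcpus=8, cpu_p95_pct=12.0, mem_p95_gib=5.0)
for it in candidates(w, catalogue):
    print(it.name, it.vcpus, it.memory_gib, it.hourly_usd)
```

The filter runs over the whole catalogue rather than one family, so the cheapest fitting type wins. In this example the 2-vCPU, 4 GiB type is cheaper but is rejected on memory, and the previous-generation and other-region entries never qualify.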

Cross-cloud right-sizing: the overlooked opportunity

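Most right-sizing tools only compare instance types within a single cloud provider. But a 2-vCPU, 8 GB workload might cost $0.0832/hr on AWS (m6i.large in us-east-1) and $0.0640/hr on Azure (Standard_B2ms in East US): a 23% price difference for the same vCPU and memory. The two are not identical. B-series VMs are burstable and drop to a baseline CPU level once their credits run out, so the comparison holds only for workloads whose sustained CPU stays within that baseline. If your organisation already runs on multiple clouds, moving specific workloads between providers is a lever that single-cloud tools ignore entirely.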
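
The 23% figure is the saving relative to the AWS rate. A quick Python check converts it to per-VM monthly terms. It assumes 730 billable hours per month, a common convention but an assumption here, and compares compute rates only.

```python
HOURS_PER_MONTH = 730  # assumed monthly-hours convention for on-demand estimates


def price_gap(current_hourly: float, alternative_hourly: float) -> tuple[float, float]:
    """Return (percentage saving, monthly saving in USD) from moving to the alternative."""
    pct = (current_hourly - alternative_hourly) / current_hourly * 100
    monthly = (current_hourly - alternative_hourly) * HOURS_PER_MONTH
    return pct, monthly


pct, monthly = price_gap(0.0832, 0.0640)  # hourly rates quoted above
print(f"{pct:.1f}% cheaper, about ${monthly:.2f} per VM per month")
```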

How placemy approaches right-sizing

placemy is a CLI tool that runs on your machine. It authenticates with your existing cloud SDK credentials, discovers every running VM across your AWS accounts, Azure subscriptions, and GCP projects, pulls P95 CPU and memory metrics, and compares each workload against 18,000+ instance types across all three clouds.

The output is a self-contained HTML report (and optional JSON) with per-VM recommendations ranked by savings. Each recommendation includes the evidence trail: current utilisation percentiles, the candidate instance spec, the price comparison, and the safety constraints that were checked.
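
One way to picture a report entry is as a record that carries its own evidence and can be sorted by savings. The Python sketch below is hypothetical: the field names, VM IDs, instance names, prices, check labels, and the 730-hours-per-month convention are illustrative assumptions, not placemy's actual JSON format.

```python
from dataclasses import dataclass, field
from typing import Optional

HOURS_PER_MONTH = 730  # assumed monthly-hours convention


@dataclass
class Recommendation:
    """Hypothetical evidence record for one VM; not placemy's actual JSON schema."""
    vm_id: str
    cpu_p95_pct: float                  # current utilisation percentiles
    mem_p95_pct: Optional[float]        # None when no memory agent is installed
    current_type: str
    current_hourly_usd: float
    candidate_type: str                 # candidate instance spec
    candidate_vcpus: int
    candidate_memory_gib: float
    candidate_hourly_usd: float         # price comparison
    checks_passed: list[str] = field(default_factory=list)  # safety constraints checked

    @property
    def monthly_savings_usd(self) -> float:
        return (self.current_hourly_usd - self.candidate_hourly_usd) * HOURS_PER_MONTH


def rank_by_savings(recs: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations so the largest monthly saving comes first."""
    return sorted(recs, key=lambda r: r.monthly_savings_usd, reverse=True)


recs = [
    Recommendation("vm-a", 12.0, 40.0, "example.xlarge", 0.20,
                   "example.large", 2, 8.0, 0.10, ["cpu_p95", "mem_p95", "region", "os"]),
    Recommendation("vm-b", 9.0, None, "example.2xlarge", 0.40,
                   "example.large", 2, 8.0, 0.10, ["cpu_p95", "region", "os"]),
]
for r in rank_by_savings(recs):
    print(r.vm_id, r.current_type, "->", r.candidate_type, f"${r.monthly_savings_usd:.2f}/month")
```

Keeping the checks alongside the numbers makes gaps visible. Here vm-b ranks first on savings, but its evidence lacks a memory check because no agent reported memory.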

Importantly, the scan is read-only with respect to your infrastructure: no resources are modified, no tags are written, and no instances are stopped. Your cloud data is not sent to a third party. The scan runs locally, and results are written to locations you own, such as a local file or your own storage bucket.

Measured results

We tested the engine against five production workloads during its development as a research project. The savings ranged from 23.9% to 67.4%, with a median of 32.5%. These are actual monthly cost reductions measured before and after applying the highest-confidence recommendations — not projections.

The full data is published on our savings evidence page, including instance types, hourly rates, and provider details.

Getting started

A typical first scan takes under 5 minutes from install to report:

$ curl -sSL https://placemy.cloud/install | sh
$ placemy auth login
$ placemy scan --output report.html

The report opens in any browser — no internet connection required after generation. For detailed setup instructions, see the getting started guide.