1. Introduction

Amazon RDS (Relational Database Service) for PostgreSQL is a managed solution that simplifies PostgreSQL database administration in the cloud. In this post, we’ll explore the key concepts and features you need to know to work with this powerful tool.

1.1. Performance: what really matters

Your RDS performance depends on two pillars:

  1. Instance class - CPU, memory, and network
  2. Storage type - I/O throughput

1.2. Scalability: the numbers

RDS offers independent scalability for compute and storage:

Compute limits
  • Up to 128 vCPUs
  • Up to 4,096 GB of RAM
Storage limits
  • Up to 64 TB of capacity
  • Up to 256,000 IOPS
  • Supports auto-scaling and scale-down

⏰ Important detail

  • Storage modifications DO NOT cause downtime.
  • ⚠️ BUT you need to wait 6 hours between modifications.
  • This means: plan your changes in advance. If you increased storage now, you’ll have to wait 6 hours to make another adjustment.

2. Choosing the right instance class

Choosing the right instance class is crucial to achieve the best balance between performance and cost. RDS offers different instance families, each optimized for specific scenarios.

Available instance families
  • General Purpose - Smaller workloads or with variable usage patterns
  • CPU Optimized - Tasks demanding high computational power
  • Memory Optimized - Memory-intensive queries and large cached datasets
  • Graviton - Best choice in most cases

2.1. How to choose?

The decision depends on your workload characteristics:

  • Analyze the current bottleneck: Is it CPU? Memory? I/O?
  • Consider the usage pattern: Constant or variable?
  • Evaluate cost-benefit: Graviton usually wins
  • Test in staging: Validate performance before going to production

3. Storage types: EBS in action

RDS uses EBS (Elastic Block Store) volumes as storage. Think of EBS as the HDD/SSD of your database in the cloud. There are three main options, each with different speed and cost characteristics.

3.1. Before starting: what is IOPS?

IOPS = Input/Output Operations Per Second

In simple terms: how many times per second your database can read or write data to disk.

In the database
  • Each SELECT, INSERT, UPDATE = I/O operations
  • More IOPS = database responds faster
  • Fewer IOPS = queries wait in queue
How to know how many IOPS I need?
  • Monitor in CloudWatch - Metric: ReadIOPS + WriteIOPS
  • If the sum is consistently above 80% of your limit = time to increase

πŸ’‘ Throughput vs IOPS

  • IOPS: How many operations per second (quantity)
  • Throughput: How many MB/s you transfer (transfer speed)

3.2. GP2 (General Purpose SSD) - the legacy option

How it works: performance grows with volume size

Performance examples:

  • 100 GB = ~300 IOPS
  • 500 GB = ~1,500 IOPS
  • 1 TB = ~3,000 IOPS
  • 3,334 GB = 10,000 IOPS (base maximum)

Characteristics:

  • Variable performance: larger disk, more IOPS
  • Up to 1,000 MBps throughput
  • Decent price, but GP3 is generally better

When to use: Practically never. GP3 is superior.

⚠️ GP2 problem

If you need 10,000 IOPS but only use 200 GB, you’ll have to pay for ~3 TB of storage just to get the necessary IOPS. Wasteful!

3.3. GP3 (General Purpose SSD) - standard choice ⭐

How it works: performance independent of size
  • Any size = 3,000 IOPS baseline (free)
  • Need more? Can buy up to 16,000 extra IOPS

Characteristics:

  • 3,000 IOPS baseline independent of size
  • 125 MBps throughput baseline (can increase up to 1,000 MBps)
  • ~20% cheaper than GP2 with same performance
  • You adjust IOPS and throughput separately from size

When to use: This should be your standard choice for 90% of cases.

3.4. IO1 / IO2 Block Express - maximum performance πŸš€

How it works: you provision exactly how many IOPS you need

Available options:

  • IO1: Up to 64,000 IOPS per volume
  • IO2: Up to 256,000 IOPS per volume
  • IO2 Block Express: Up to 256,000 IOPS + better durability

Characteristics:

  • Provisioned and guaranteed IOPS - no surprises
  • Ultra-low latency - single-digit milliseconds
  • Consistent performance - doesn’t depend on credits or variations
  • More expensive - you pay for provisioned IOPS
When to use IO2 Block Express

Use IO2 Block Express when:

  • Critical app that can’t have variable latency
  • Need >16,000 IOPS consistent
  • Intensive OLTP with thousands of transactions/second
  • Mission-critical database (banking, healthcare, large e-commerce)

3.5. Comparison between storage types

FeatureGP2 (Legacy)GP3 (Standard) ⭐IO2 Block Express (Premium)
Baseline IOPS3 per GB3,000 (fixed)You choose
Maximum IOPS16,00016,000256,000
ThroughputUp to 1,000 MBpsUp to 1,000 MBpsUp to 4,000 MBps
LatencyVariableLowUltra-low (consistent)
Cost$$$$$$$
Flexibility❌ Tied to sizeβœ… Independentβœ… Total control
Best forNothing (use GP3)90% of casesCritical apps

4. Auto storage scaling: let RDS manage

One of the most practical RDS features is auto-scaling storage. Configure once and let the system expand automatically when needed.

4.1. When is auto-scaling triggered?

RDS monitors your database 24/7 and triggers auto-scaling when it detects:

Auto-scaling triggers

Trigger 1: Free space ≀ 10% of total capacity

  • Simple and direct: running out of space = automatic expansion

Trigger 2: Predicted growth

  • RDS analyzes history and predicts: “Based on recent days, you’ll run out of space soon”
  • Why is this important? Prevents running out of space during an unexpected spike

4.2. Cooldown protection (6 hours)

⏰ Cooldown protection

To avoid cascading expansions that could cause instability:

Rule: Auto-scaling only adds capacity if no modification occurred in the last 6 hours.

Why does this exist?

  • Avoids rapid fluctuations that can impact performance
  • Prevents uncontrolled growth from bugs
  • Gives RDS time to stabilize the new size

⚠️ Important implication: If you’re growing VERY fast (>10% every 6h), auto-scaling may not keep up. In these cases, size manually with margin.

4.3. Essential CloudWatch alarms

πŸ’‘ Don’t rely only on the maximum limit!

Configure proactive alerts:

Low space alert (70%)

  • Detect problems before they become critical
  • Gives time to investigate abnormal growth

Rapid growth alert

  • Monitor storage growth rate
  • Identify unexpected spikes that may indicate problems

Near limit alert (90%)

  • Last warning before hitting the configured maximum
  • Time to increase maximum limit if needed

Important tip: Set a reasonable maximum limit to avoid billing surprises. If something goes wrong (like an uncontrolled process generating data), you don’t want to wake up with a 64 TB volume!

5. Cost optimization

Performance is essential, but nobody wants surprises on the AWS bill. The good news? You can have both - excellent performance AND controlled costs.

5.1. Strategy 1: Dynamic scaling (schedule-based scaling)

Implementation options

Option 1: AWS Lambda + Scheduler

  • More flexibility and programmatic control
  • Ideal for complex scaling rules

Option 2: AWS Systems Manager (no code)

  • Visual interface, no programming needed
  • Good for simple use cases

Option 3: Third-party tools

  • AWS Instance Scheduler, CloudHealth, or Terraform with cron jobs
  • More robust solutions for large environments

5.2. Strategy 2: Reserved Instances (RIs)

πŸ’° Free money on the table

If you’re SURE you’ll use RDS for 1-3 years, RIs are guaranteed savings:

  • Up to 72% discount vs On-Demand
  • No operational changes - just financial benefit
  • Payment flexibility - All upfront, partial, or none

5.3. Strategy 3: Graviton

Migrating to Graviton is probably the easiest optimization with highest ROI:

  • Up to 40% cheaper than equivalent x86 instances
  • Equal or superior performance on most workloads
  • Simple migration - usually just change instance type

5.4. Strategy 4: Storage - GP2 β†’ GP3

Simple migration with immediate ~20% savings:

  • Same performance (or better)
  • No downtime
  • Independent adjustment of IOPS and throughput

5.5. Strategy 5: Review read replicas

Optimizing read replicas

Read replicas are great for performance, but each one duplicates your compute costs.

Questions to ask:

  • Are all replicas being actively used?
  • Could you consolidate read workloads into fewer replicas?
  • Would smaller replicas meet your needs?
  • Could connection pooling reduce the need for replicas?

Action: Monitor each replica’s utilization and remove or reduce underutilized ones.

5.6. You can’t optimize what you don’t measure

πŸ“Š Essential CloudWatch metrics

Monitor constantly:

  • CPU Utilization - Identify oversizing
  • Read/Write IOPS - Optimize storage type
  • Database Connections - Size correctly
  • Free Storage Space - Prevent problems
  • Replication Lag - Read replica health

6. RDS configuration: Parameter Groups and connections

If you’ve worked with “traditional” PostgreSQL (installed on a server), you know that configuring the database involves editing text files. In RDS, AWS simplified (and in some cases complicated a bit) this process.

6.1. Traditional PostgreSQL vs RDS

Traditional PostgreSQL (self-managed)

You have direct access to configuration files:

1. postgresql.conf - Server configurations

max_connections = 100
shared_buffers = 256MB
work_mem = 4MB
maintenance_work_mem = 64MB

2. pg_hba.conf - Access control

# Who can connect from where and how
host    all    all    192.168.1.0/24    md5
host    all    all    10.0.0.0/8        trust

How to modify:

# 1. Edit the file
sudo nano /var/lib/postgresql/data/postgresql.conf

# 2. Reload configuration
sudo systemctl reload postgresql

# or if restart is needed
sudo systemctl restart postgresql
Amazon RDS

You DO NOT have direct access to files.

Why? Because RDS is managed - AWS takes care of the infrastructure.

Instead, you use:

  • Parameter Groups (replaces postgresql.conf)
  • Security Groups (replaces pg_hba.conf)

6.2. Parameter Groups

Think of Parameter Group as a configuration profile for your database.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Parameter Group    β”‚ ← Configuration set
β”‚  "prod-default"     β”‚
β”‚                     β”‚
β”‚  max_connections=200β”‚
β”‚  shared_buffers=8GB β”‚
β”‚  work_mem=16MB      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
           β”‚ applied to ↓
           β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
    β”‚ RDS Instanceβ”‚
    β”‚  my-db-prod β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

6.2.1. Types of Parameter Groups

Default Parameter Group vs Custom

Default Parameter Group (created by AWS)

  • Conservative default settings
  • Cannot be modified
  • Serves as initial template

Custom Parameter Group (created by you)

  • Fully configurable
  • You control all parameters
  • Can be shared between instances

⚠️ Important: You should ALWAYS create a custom parameter group. Never use default in production!

6.2.2. Parameter types

Not all parameters work the same way. Some can be changed “on the fly”, others require restart.

Static parameters

Definition: Require instance reboot to apply.

Common examples:

shared_buffers:
  - What it is: PostgreSQL cache memory
  - Typical: 25% of instance RAM
  - Example: On 32 GB RAM instance β†’ shared_buffers = 8 GB
  - Why reboot: Memory is allocated at startup

max_connections:
  - What it is: Maximum number of simultaneous connections
  - Typical: 100-500 (depends on workload)
  - Why reboot: Memory structures are pre-allocated

wal_buffers:
  - What it is: Buffers for Write-Ahead Logging
  - Typical: -1 (auto, based on shared_buffers)
  - Why reboot: Buffer is pre-allocated

How to apply:

  1. Modify the parameter group
  2. Apply to instance
  3. RDS requires reboot
  4. Downtime of ~5-10 minutes
Dynamic parameters

Definition: Can be applied immediately, without reboot.

Common examples:

work_mem:
  - What it is: Memory for sorting/hash operations
  - Typical: 4-16 MB (careful: per operation!)
  - Can be: Changed per session
  - Impact: Immediate

maintenance_work_mem:
  - What it is: Memory for VACUUM, CREATE INDEX, etc
  - Typical: 256 MB - 2 GB
  - Can be: Changed without reboot
  - Impact: Immediate

log_min_duration_statement:
  - What it is: Log queries above X ms
  - Typical: 1000 (log queries >1s)
  - Can be: Changed per session
  - Impact: Immediate

random_page_cost:
  - What it is: Estimated cost of random read (SSD = 1.1)
  - Typical: 1.1 for SSD, 4.0 for HDD
  - Can be: Changed immediately
  - Impact: Affects query planner

How to apply:

  1. Modify the parameter group
  2. Apply to instance with “Apply Immediately”
  3. No downtime!
  4. Change in seconds

6.3. Connections to RDS

Security Groups do the job of “who can connect from where”:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Security Group        β”‚ ← Controls IPs/networks
β”‚   "rds-prod-sg"         β”‚
β”‚                         β”‚
β”‚   Inbound Rules:        β”‚
β”‚   βœ… 10.0.1.0/24:5432   β”‚ (app network)
β”‚   βœ… 203.0.113.5/32:5432β”‚ (admin fixed IP)
β”‚   ❌ 0.0.0.0/0          β”‚ (rest blocked)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β”‚ protects ↓
         β”‚
  β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚  RDS Instance β”‚
  β”‚  my-db        β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

6.3.1. Authentication methods in RDS

Authentication options

1. Username/Password (Default)

  • Traditional and simplest method
  • Credentials stored in RDS
  • Ideal to get started

2. IAM Authentication (Recommended)

  • No passwords - uses temporary IAM tokens
  • More secure - automatic credential rotation
  • Integrated auditing - track who accesses via CloudTrail
  • Ideal for modern AWS applications

3. Kerberos (Enterprise)

  • Centralized authentication for corporate environments
  • Active Directory integration
  • For companies with existing Kerberos infrastructure

7. TLDR

Instance choice

  • Default: M6g (Graviton) - best cost-benefit
  • CPU-heavy: M6g with larger class
  • Memory-heavy: R6g
  • Dev/test: T4g (burstable)

Storage

  • Use GP3 (not GP2!) for 90% of cases
  • Free baseline: 3,000 IOPS
  • IO2 Block Express: Only if you need >16k consistent IOPS

Auto-Scaling

  • Always enable
  • Set maximum limit (3-5x current)
  • Configure CloudWatch alarms (70%, 90%)

Cost optimization

  • GP2 β†’ GP3 (30 min, 20-50% storage savings)
  • x86 β†’ Graviton (1 week test, 20-40% savings)
  • Reserved Instances (30 min, 30-54% baseline savings)
  • Remove unnecessary replicas (variable savings)
  • Downsize oversized instances (analyze CPU <30%)

Parameter Groups

  • 🚫 Never use default in production
  • Create custom group
  • Essential parameters:
    • shared_buffers = 25% RAM (static)
    • random_page_cost = 1.1 (dynamic)
    • work_mem = 16MB (dynamic)
    • log_min_duration_statement = 1000 (dynamic)

Connections

  • Security Groups control network access
  • IAM Authentication recommended (no hardcoded passwords)
  • Never expose RDS publicly (always in private VPC)