Migrating On-Prem Apps to AWS: A Real Architecture Walkthrough


Most migration guides stop at “lift and shift vs. re-architect.” This one skips the theory and focuses on the decisions to make before, during, and after moving workloads from an on-premises data centre to AWS.


1. Pre-Migration Assessment Checklist

Before touching a single EC2 instance, build a clear inventory of what is being moved and why. Skipping this phase is the single most common reason migrations stall mid-flight.

The checklist covers three areas:

- Application Discovery
- Infrastructure & Performance Baseline
- Data & Compliance

Readiness Score

After the checklist, assign each application a migration readiness score across four axes:

| Axis | Questions to answer |
| --- | --- |
| Complexity | Number of dependencies, custom kernel modules, legacy protocols (IPX, NetBIOS) |
| Risk | Revenue impact of downtime, regulatory exposure |
| Effort | Code changes needed, team familiarity with AWS |
| Business value | Cost savings, performance gains, agility unlocked |

Low complexity + high business value apps go in Phase 1. High risk + high effort apps go last, or get a parallel-run strategy.
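As an illustration, the phase assignment above can be sketched as a small scoring function. The thresholds and 1–5 axis ratings here are hypothetical — tune them to your own portfolio:

```python
# Hypothetical readiness scoring: each axis rated 1 (low) to 5 (high).
def assign_phase(complexity: int, risk: int, effort: int, business_value: int) -> int:
    """Map an app's axis ratings to a migration phase (1 = earliest)."""
    if complexity <= 2 and business_value >= 4:
        return 1   # low complexity + high value: migrate first
    if risk >= 4 and effort >= 4:
        return 3   # high risk + high effort: last, with a parallel run
    return 2       # everything else lands in the middle

apps = {
    "internal-wiki":  assign_phase(complexity=1, risk=1, effort=1, business_value=4),
    "billing-api":    assign_phase(complexity=4, risk=5, effort=4, business_value=5),
    "reporting-jobs": assign_phase(complexity=3, risk=2, effort=2, business_value=3),
}
```

Even a crude function like this forces the team to rate every application on the same scale, which is most of the value of the exercise.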


2. Migration Phases

Trying to migrate everything at once is how projects fail. Phasing builds confidence and institutional knowledge before touching critical systems.

Phase 0 — Foundation (Weeks 1–4)

This phase has no application migrations. It is purely infrastructure setup.

Phase 1 — Low-Risk Workloads (Weeks 5–10)

Target: stateless, non-customer-facing, or non-critical applications.

Good candidates: internal tools, batch processing jobs, dev/staging environments, monitoring agents, log shippers, internal wikis.

Pattern: Rehost (lift-and-shift) using AWS MGN (Application Migration Service). MGN installs an agent on the source server, continuously replicates the disk to a staging area in AWS, and then launches a test instance before cutover. Cutover time is typically under 30 minutes.

Phase 2 — Tier-2 Production (Weeks 11–20)

Target: production workloads that are important but not the most revenue-critical.

Good candidates: internal APIs, secondary databases, background workers, reporting services.

At this phase, start introducing managed services where it makes sense.

Important: Do not re-architect and migrate at the same time. If swapping MySQL for Aurora, migrate first (rehost), stabilise, then re-platform in a subsequent sprint. Combining both changes makes rollback nearly impossible.

Phase 3 — Tier-1 / Mission-Critical (Weeks 21–30+)

Target: the systems that would wake up the CTO at 3am if they went down.

These require a parallel-run or blue/green strategy:

  1. Replicate the database to AWS using AWS DMS with ongoing replication enabled.
  2. Stand up the application in AWS, pointed at the replicated database.
  3. Route a small percentage of traffic to the AWS environment (Route 53 weighted routing or ALB canary rules).
  4. Monitor error rates, latency, and database lag for at least one full business cycle.
  5. Shift 100% of traffic, then stop replication, then decommission the on-prem instance.

Rollback is straightforward as long as replication is running — flip the DNS weight back. Once replication stops, rollback becomes a restore-from-backup operation.
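The DNS weight flip in steps 3 and 5 amounts to an UPSERT of two weighted records. A minimal sketch of the change batch that boto3's `change_resource_record_sets` accepts — the record name, target IPs, and zone ID are hypothetical:

```python
def weighted_record(name: str, target: str, set_id: str, weight: int) -> dict:
    """Build one weighted A record for a Route 53 change batch."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,  # distinguishes the two weighted records
            "Weight": weight,         # relative share of traffic
            "TTL": 60,                # short TTL so cutover/rollback is fast
            "ResourceRecords": [{"Value": target}],
        },
    }

# Send 10% of traffic to AWS, 90% to on-prem; flip the weights to cut over.
change_batch = {
    "Changes": [
        weighted_record("app.example.com.", "203.0.113.10", "aws", 10),
        weighted_record("app.example.com.", "198.51.100.10", "onprem", 90),
    ]
}
# A real cutover would pass this to:
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z0HYPOTHETICAL", ChangeBatch=change_batch)
```

Keeping the TTL low matters: a 24-hour TTL turns the "flip the DNS weight back" rollback into a day of split traffic.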

Phase 4 — Decommission & Optimise

After all workloads are in AWS, decommission on-prem hardware on a defined schedule. Once decommissioned, revisit right-sizing with real AWS Cost Explorer data, move suitable workloads to Savings Plans or Reserved Instances, and evaluate which services warrant re-architecting.


3. Networking Pitfalls

Networking is where most migrations accumulate unplanned work.

Overlapping CIDR Blocks

If the on-prem network uses 10.0.0.0/8 and a VPC is created with 10.0.0.0/16, trouble follows: VPC peering between overlapping ranges is rejected outright, and traffic from the VPC to on-prem addresses that fall inside 10.0.0.0/16 never crosses the VPN or Direct Connect link — it silently routes to the local VPC instead.

Fix: Audit all RFC 1918 ranges in use before creating a single VPC. Reserve a dedicated non-overlapping CIDR range for AWS (e.g., 172.16.0.0/12 if on-prem owns all of 10.x.x.x).
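This audit is easy to automate with Python's standard `ipaddress` module — no AWS calls needed:

```python
import ipaddress

def overlapping(onprem_cidrs, proposed_vpc_cidr):
    """Return the on-prem ranges the proposed VPC CIDR would collide with."""
    vpc = ipaddress.ip_network(proposed_vpc_cidr)
    return [c for c in onprem_cidrs
            if ipaddress.ip_network(c).overlaps(vpc)]

# 10.0.0.0/16 collides with an on-prem 10.0.0.0/8; 172.16.0.0/16 is clear.
print(overlapping(["10.0.0.0/8", "192.168.0.0/16"], "10.0.0.0/16"))    # ['10.0.0.0/8']
print(overlapping(["10.0.0.0/8", "192.168.0.0/16"], "172.16.0.0/16"))  # []
```

Run it against every range in the corporate IPAM before the first VPC is created, and again before every new VPC.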

Security Groups Are Not Firewalls

Security groups are stateful but allow-only — there is no explicit deny rule. Network ACLs add stateless deny capability but are easy to misconfigure.

The common mistake: migrating a firewall ruleset literally into security groups, including rules like “deny all from 0.0.0.0/0.” That rule does nothing in a security group — the default deny is implicit.

Fix: Model security groups around application tiers (web, app, db) and allow only the specific ports each tier needs from the tier that calls it. Reference security group IDs instead of IPs wherever possible.
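Referencing a group ID instead of an IP range looks like this in the `IpPermissions` structure that boto3's `authorize_security_group_ingress` takes — the group IDs and port are hypothetical:

```python
def tier_ingress_rule(port: int, source_sg_id: str) -> dict:
    """Allow one TCP port only from members of the calling tier's security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Reference the source security group, not a CIDR: instances keep
        # access as they come and go, with no IP bookkeeping.
        "UserIdGroupPairs": [{"GroupId": source_sg_id}],
    }

# The db tier accepts MySQL only from the app tier's security group.
db_ingress = tier_ingress_rule(3306, "sg-0apptier0hypothetical")
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="sg-0dbtier00hypothetical", IpPermissions=[db_ingress])
```

The same pattern repeats per tier: web allows 443 from the load balancer's group, app allows its port from web's group, db allows its port from app's group.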

DNS Resolution Between On-Prem and VPC

Applications that resolve hostnames using on-prem DNS servers will break when migrated to a VPC, because the VPC's Route 53 Resolver handles private hosted zones and VPC-internal hostnames (e.g., *.compute.internal), and the on-prem DNS has no knowledge of them.

Fix: Configure Route 53 Resolver inbound and outbound endpoints. Outbound endpoints forward queries for on-prem domains to on-prem DNS. Inbound endpoints let on-prem DNS forward queries for AWS private zones to Route 53.
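Outbound forwarding is expressed as resolver rules. A sketch of the parameters that boto3's `route53resolver` `create_resolver_rule` call takes — the domain, DNS server IPs, and endpoint ID are hypothetical:

```python
def forward_rule(domain: str, onprem_dns_ips, endpoint_id: str) -> dict:
    """Forward queries for an on-prem domain to on-prem DNS servers."""
    return {
        "CreatorRequestId": f"fwd-{domain}",  # idempotency token
        "Name": f"forward-{domain.replace('.', '-')}",
        "RuleType": "FORWARD",
        "DomainName": domain,
        "TargetIps": [{"Ip": ip, "Port": 53} for ip in onprem_dns_ips],
        "ResolverEndpointId": endpoint_id,    # the outbound endpoint
    }

rule = forward_rule("corp.example.com", ["10.10.0.2", "10.10.0.3"],
                    "rslvr-out-hypothetical")
# boto3.client("route53resolver").create_resolver_rule(**rule)
```

One rule per on-prem domain, shared across VPCs via resolver rule associations, keeps the forwarding logic in one place instead of in per-instance resolv.conf hacks.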

NAT Gateway Costs That Surprise Teams

On-premises, east-west traffic between application tiers is free. In AWS, traffic that routes out through a NAT Gateway (e.g., a private subnet instance calling an S3 bucket via the public endpoint) incurs NAT Gateway data processing charges in addition to the data transfer charge.

Fix: Use VPC endpoints. Gateway endpoints for S3 and DynamoDB are free. Audit the NAT gateway's CloudWatch byte metrics (BytesOutToDestination, BytesInFromDestination) after Phase 1 to catch unexpected traffic patterns before they become a surprise on the bill.
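The arithmetic behind the surprise is simple. A sketch with illustrative prices (check the pricing page for your region; these numbers are assumptions, not quotes):

```python
def nat_monthly_cost(gb_processed: float,
                     hourly_rate: float = 0.045,  # illustrative per-hour price
                     per_gb_rate: float = 0.045,  # illustrative data-processing price
                     hours: int = 730) -> float:
    """Estimate one NAT gateway's monthly bill: hourly charge + data processing."""
    return hours * hourly_rate + gb_processed * per_gb_rate

# 5 TB/month through a NAT gateway that a free S3 gateway endpoint could bypass:
print(round(nat_monthly_cost(5_000), 2))
```

At these rates the data-processing line dwarfs the hourly charge, which is why rerouting bulk S3 traffic through a gateway endpoint is usually the first cost win of a migration.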


Closing Thoughts

The technical work of a migration is generally straightforward. The hard parts are organisational: getting accurate dependency data, aligning on a cutover window with multiple stakeholders, and resisting the pressure to migrate everything at once.

Phases exist to create checkpoints. Each completed phase gives real operational experience in AWS, a reduced on-prem footprint, and a smaller blast radius for the next one. Move methodically, instrument everything from day one, and treat the decommission date as a hard deadline.

Further Reading