Cloud networking confuses experienced on-prem admins for one specific reason: the concepts are the same as everything we’ve covered in this pathway, but the controls are different. There’s no rack of switches you can patch into; there’s no firewall you SSH into; there are no cables. Everything is API calls and configuration objects. The packets still travel between hosts the same way, but the boxes that shape their path are software constructs you create on a console or via Terraform.

This is lesson 11 of Networking from Scratch. The vocabulary is mostly AWS-flavoured here because AWS is the most common starting point and its naming is now industry shorthand — but Azure (VNet) and Google Cloud (VPC) follow the same model with different brand names. Once the AWS picture is clear, the others fall into place.
The VPC: your private network in the cloud
A VPC (Virtual Private Cloud) is a logically isolated network within the cloud provider’s shared infrastructure. You pick an IPv4 CIDR block when you create it — typically a private RFC 1918 range like 10.0.0.0/16 — and that block is yours. The cloud provider routes traffic within it, presents you with abstractions for subnets, gateways, and security, and gives you APIs to wire it all together.
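Because everything is API calls and configuration objects, "create a VPC" is a few lines of Terraform rather than a cabling job. A minimal sketch, assuming the AWS provider is configured; the name and CIDR are illustrative:

```hcl
# A minimal VPC; the CIDR and name are placeholders.
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"  # the primary CIDR is fixed at creation, so pick generously
  enable_dns_support   = true           # built-in DNS resolution inside the VPC
  enable_dns_hostnames = true

  tags = {
    Name = "prod-vpc"
  }
}
```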
Three things to know about choosing the VPC’s address space:
- Pick big. A /16 gives you 65,536 addresses to slice into subnets; a /20 gives you only 4,096. You can add secondary CIDR blocks later, but you can't resize or replace the primary one, so pick generously.
- Avoid overlaps with on-prem. If your office or other clouds already use 10.0.0.0/16, picking it for your new VPC makes it impossible to peer or VPN them together later. Coordinate before you create.
- Avoid the most popular ranges. Resist 192.168.0.0/24, 192.168.1.0/24, and 10.0.0.0/24: every consumer router uses one of those, and every laptop on coffee-shop Wi-Fi will sit on one. If your VPN clients land on those ranges, routing breaks.
Subnets: now they’re per-AZ
Inside a VPC you carve subnets, just like on-prem. The cloud twist: every subnet is bound to a single availability zone (AZ), an isolated power and network domain inside a region. To run a load-balanced service that survives one AZ failing, you put resources in two or more subnets, each in a different AZ.
A typical AWS layout for a single-region production app:
| Tier | AZ-1a | AZ-1b | What lives there |
|---|---|---|---|
| Public | 10.0.1.0/24 | 10.0.2.0/24 | Public load balancers, bastion hosts, NAT gateways |
| Private (app) | 10.0.10.0/24 | 10.0.11.0/24 | App servers, container nodes |
| Private (data) | 10.0.20.0/24 | 10.0.21.0/24 | RDS instances, ElastiCache, etc. |
Six subnets, two AZs, three tiers. That’s the canonical pattern. Each row is logically “the public tier” or “the data tier”; each column is a fault domain. Anything you deploy goes into a tier-and-AZ pair.
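In Terraform terms, the AZ binding is explicit on every subnet. A sketch of the public tier from the table above (the region, AZ names, and resource names are assumptions; the app and data tiers repeat the same pattern):

```hcl
# One tier, two fault domains: each subnet is pinned to exactly one AZ.
resource "aws_subnet" "public_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "eu-west-1a"
}

resource "aws_subnet" "public_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "eu-west-1b"
}
```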
One thing that confuses on-prem people: cloud subnets aren’t broadcast domains. The provider’s underlying network is L3 the whole way down; broadcast and multicast as you’d use them on a real Ethernet segment don’t work. ARP is faked — the host sees responses from the underlying SDN even though no actual broadcast was sent. Protocols that rely on L2 multicast (some clustering, some discovery) break in the cloud unless you build overlay networks.
Public vs private — what actually distinguishes them
The cloud doesn’t have a magic “public” checkbox on a subnet. The distinction is purely about routing:
- A public subnet has a route in its route table sending 0.0.0.0/0 to the internet gateway. Resources here can be reached from the internet (if they have a public IP) and can reach the internet directly.
- A private subnet has its 0.0.0.0/0 route pointed at a NAT gateway (or no default route at all). Resources here can't be reached from the internet, and they can reach the internet only via the NAT gateway, which translates their source IP to its own.
The label is the route table. Same VPC, same address-space style, same provider primitives — the only difference is what the default route does. That’s the cloud equivalent of “is this segment behind the firewall or not.”
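Expressed as configuration, the difference is a single route target. A hedged sketch (the internet gateway and NAT gateway are assumed to be defined elsewhere in the same configuration; the provider adds the local route for the VPC's own CIDR automatically):

```hcl
# Two route tables that differ only in where the default route points.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id  # this one line is what makes a subnet "public"
  }
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id   # outbound only, source-NATed by the gateway
  }
}

# The label follows the association: attach each subnet to one table or the other.
resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}
```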
The four gateways you’ll meet
| Gateway | Direction | Cost model | What it’s for |
|---|---|---|---|
| Internet Gateway (IGW) | VPC ↔ internet (both directions) | Free | Public-facing services. One per VPC. |
| NAT Gateway | Private subnet outbound to internet | Per hour + per GB processed | Letting private servers fetch updates without exposing them inbound. |
| VPC Peering | VPC ↔ VPC (point-to-point) | No hourly charge; data transfer billed per GB when it crosses AZs or regions | Two VPCs that need to talk privately. No transit through. |
| Transit Gateway | VPC ↔ VPC (hub-and-spoke), and VPC ↔ on-prem | Per hour + per GB | The grown-up answer when you have many VPCs and on-prem networks. |
Two practical notes:
- NAT gateway costs add up. If your private workloads pull large container images or do heavy outbound traffic, NAT gateway bandwidth charges are often the line item that surprises finance. VPC endpoints (private connections to AWS services) keep AWS-bound traffic off the NAT gateway and pay for themselves quickly; a sketch follows this list.
- VPC peering doesn’t do transit. If A peers with B, and B peers with C, A still cannot talk to C through B. That’s by design. Use a Transit Gateway when you need a hub.
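On the first note, the usual fix is a gateway endpoint: it adds a route to the listed route tables so traffic to the service never touches the NAT gateway. A sketch for S3, assuming the region and resource names:

```hcl
# Gateway endpoint: S3 traffic from private subnets bypasses the NAT gateway entirely.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.eu-west-1.s3"  # region is illustrative
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]  # the endpoint adds its own prefix-list route here
}
```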
Route tables: the cloud’s “routing table”
Every subnet has exactly one route table attached. The route table tells the SDN where to send traffic for a given destination. A typical public-subnet table:
| Destination | Target |
|---|---|
| 10.0.0.0/16 | local (the VPC itself, automatic) |
| 0.0.0.0/0 | igw-xxxxxxxx (the internet gateway) |
And a typical private-subnet table:
| Destination | Target |
|---|---|
| 10.0.0.0/16 | local |
| 0.0.0.0/0 | nat-xxxxxxxx (the NAT gateway) |
| 192.168.0.0/16 | tgw-xxxxxxxx (your on-prem network, via Transit Gateway) |
The route-table entries follow the longest-prefix-match rule we covered in lesson 5. The local route for the VPC’s own CIDR is always added by the provider and can’t be removed. Everything else is yours to configure.
Three or four route tables typically cover an entire VPC: one for the public subnets, one per AZ for the private subnets (each pointing at that AZ's NAT gateway), and maybe one for an isolated "data" tier with no internet route at all. Don't over-fragment them; keep tables few and consistent.
Security groups vs NACLs
The cloud’s firewall story has two layers. They sit at different scopes and behave differently.
| | Security Group (SG) | Network ACL (NACL) |
|---|---|---|
| Attached to | An instance / ENI | A subnet |
| Stateful? | Yes — return traffic is allowed automatically | No — you must allow both directions explicitly |
| Default | Deny everything inbound; allow everything outbound | Allow everything in both directions |
| Rules | Allow only (no explicit deny) | Allow and deny, evaluated in order |
| Best used for | The vast majority of access control | Coarse-grained subnet-level guardrails (e.g. blanket deny of an IP range) |
The right default is to do almost everything with security groups and use NACLs only when you need a stateless layer of defence (typically: blocking specific source IPs at a subnet boundary). Trying to do all your access control with NACLs is technically possible and operationally miserable.
Security groups also have a feature with no on-prem analogue: you can write rules that reference other security groups. “Allow inbound TCP 5432 from anything in security group sg-app” means “allow any instance attached to sg-app to connect to my Postgres.” You don’t enumerate IPs — the rule follows the membership. When the app fleet auto-scales and new instances appear, the rule covers them automatically.
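A sketch of that Postgres rule in Terraform (the app-tier group and the VPC reference are assumed to exist elsewhere; the names and port are illustrative):

```hcl
# A database security group that admits the app tier by group membership, not by IP.
resource "aws_security_group" "db" {
  name   = "db"
  vpc_id = aws_vpc.main.id

  ingress {
    description     = "Postgres from the app tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]  # every instance attached to the app SG, now and later
  }
}
```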
Connecting to on-prem
Three options, in increasing order of cost and reliability:
| Option | Speed / latency | Cost | When it’s right |
|---|---|---|---|
| Site-to-site VPN | Limited by the public internet path; usually 100–500 Mbps; latency variable | Cheap; provider charges hourly per tunnel + data | Dev/test, small production, backup path |
| Dedicated connect (AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect) | 1, 10, or 100 Gbps; predictable latency; private path | Significant fixed cost (port + cross-connect + circuit) + cheaper data | Production, any sensitive workload, anything compliance-touching |
| SD-WAN overlay (Cisco, Palo Alto, Fortinet, others on top of cloud connectivity) | Aggregated across multiple paths | Layer on top of the above | Multi-cloud, multi-site enterprises that need uniform policy |
From the cloud-network point of view, all three look like a route table entry: “10.20.0.0/16 via the on-prem gateway,” whether the gateway is a VPN, a Direct Connect, or an SD-WAN appliance.
Pets vs cattle — what changes for IPs
On-prem, IP addresses are usually pets — db01.corp.local is at 10.20.5.7 and has been for three years and will be tomorrow. You can hard-code it. You write firewall rules that reference it.
In the cloud, instances are usually cattle — i-0abc123 exists today, gets terminated and replaced tomorrow as part of a deploy or auto-scaling event, and the new one has a different IP. Hard-coding IPs falls apart. The right patterns:
- Use DNS (cloud-internal or your own). Reference services by hostname, not IP. The cloud’s service-discovery mechanisms (Route 53, Cloud DNS, internal LB DNS) update when instances change.
- Reference by tag or security group, not IP. "Allow connections from sg-app" survives any instance churn.
- Use load balancers as stable entry points. The LB has a stable DNS name; the instances behind it churn freely.
- If you need a stable IP, allocate one (Elastic IP / static public IP) and detach-reattach across deploys. Treat the address as a separate resource from the instance.
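That last pattern is easy to see in configuration: the address is its own resource, and the attachment is a separate, swappable object. A sketch (the instance reference is a placeholder; the `domain` syntax assumes a recent AWS provider):

```hcl
# The Elastic IP lives independently of whatever instance currently holds it.
resource "aws_eip" "ingress" {
  domain = "vpc"
}

resource "aws_eip_association" "ingress" {
  allocation_id = aws_eip.ingress.id
  instance_id   = aws_instance.edge.id  # re-point this at each deploy; the address never changes
}
```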
The mindset shift from on-prem to cloud is mostly “stop binding identity to IP.” Once you do, the rest follows.
Public vs private cloud, briefly
“Public cloud” is what we’ve been describing: AWS / Azure / GCP, where you rent capacity in someone else’s data centres. “Private cloud” means cloud-style automation and self-service running on your own hardware (VMware Cloud Foundation, OpenStack, Nutanix). The networking primitives (VPCs, subnets, security groups) often look similar; the difference is who runs the underlying physical infrastructure and pays for the floor space.
“Hybrid cloud” is just both, connected by VPN or dedicated interconnect, with workloads spanning the boundary. Your laptop’s view of where a service runs becomes irrelevant when DNS and routing make it transparent.
Common gotchas
| Symptom | Likely cause |
|---|---|
| Instance has a public IP but is unreachable | Subnet’s route table doesn’t point 0.0.0.0/0 at an IGW, or security group blocks the inbound port |
| Instance can’t reach the internet to install packages | Private subnet has no NAT gateway route, or NAT gateway is in a different AZ that’s unhealthy |
| Two VPCs “peered” but they can’t talk | Route tables not updated; peering is created but the routes pointing through it don’t exist |
| Latency spikes between AZs in same region | Cross-AZ traffic takes a longer physical path (and is billed per GB); keep chatty data-plane traffic within one AZ where possible |
| Connection works the first time, hangs on retry | Stateful firewall or NAT entry timed out; classic stateful-table aging |
| VPN tunnel up but no traffic flowing | Routes on cloud side or on-prem side missing; or BGP not advertising the right prefixes |
| NAT gateway data costs surprisingly high | Workloads pulling large container images; consider VPC endpoints to ECR / S3 / etc. |
| VPC CIDR collision when peering | Both VPCs picked overlapping ranges; rebuild one with a non-overlapping CIDR (annoying) |
What you can now answer
- What’s a VPC? — Your private, isolated network in the cloud. You pick the CIDR; the provider does the rest.
- Why is every cloud subnet bound to one AZ? — AZs are fault domains. Forcing the subnet to be AZ-scoped makes redundancy explicit.
- What actually makes a subnet “public”? — Its route table’s default route points at an internet gateway. Nothing else.
- Security group vs NACL — which do I use? — Security groups, almost always. NACLs only for coarse subnet-level rules.
- How do I connect cloud to on-prem? — Site-to-site VPN for cheap; dedicated interconnect (Direct Connect / ExpressRoute / Cloud Interconnect) for production.
- Why don’t I hard-code IPs in the cloud? — Instances are cattle. Bind to DNS, security groups, or load balancers, not IPs.
What’s next
One more lesson rounds out this pathway: an anatomy of common attacks — how the kill chain in the BEC article and the modern intrusion playbook map back to the protocols and middleboxes you now understand. Networking is hardest to secure when you don’t know how it works; you do, so the security layer becomes “here’s where each defence sits” rather than a separate mystery.