Cloud networking confuses experienced on-prem admins for one specific reason: the concepts are the same as everything we’ve covered in this pathway, but the controls are different. There’s no rack of switches you can patch into; there’s no firewall you SSH into; there are no cables. Everything is API calls and configuration objects. The packets still travel between hosts the same way, but the boxes that shape their path are software constructs you create on a console or via Terraform.

This is lesson 11 of Networking from Scratch. The vocabulary is mostly AWS-flavoured here because AWS is the most common starting point and its naming is now industry shorthand — but Azure (VNet) and Google Cloud (VPC) follow the same model with different brand names. Once the AWS picture is clear, the others fall into place.
The VPC: your private network in the cloud
A VPC (Virtual Private Cloud) is a logically isolated network within the cloud provider’s shared infrastructure. You pick an IPv4 CIDR block when you create it — typically a private RFC 1918 range like 10.0.0.0/16 — and that block is yours. The cloud provider routes traffic within it, presents you with abstractions for subnets, gateways, and security, and gives you APIs to wire it all together.
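Because everything is API calls and configuration objects, "create a VPC" is a few lines of Terraform rather than a cabling job. A minimal sketch, assuming the AWS provider is configured; the name and CIDR are illustrative:

```hcl
# A minimal VPC; the CIDR and name are placeholders.
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"  # the primary CIDR is fixed at creation, so pick generously
  enable_dns_support   = true           # built-in DNS resolution inside the VPC
  enable_dns_hostnames = true

  tags = {
    Name = "prod-vpc"
  }
}
```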
Three things to know about choosing the VPC’s address space:
- Pick big. A /16 gives you 65,536 addresses to slice into subnets; a /20 gives you only 4,096. You can add secondary CIDR blocks later, but you can't resize or replace the primary one, so pick generously.
- Avoid overlaps with on-prem. If your office or other clouds already use 10.0.0.0/16, picking it for your new VPC makes it impossible to peer or VPN them together later. Coordinate before you create.
- Avoid the most popular ranges. Resist 192.168.0.0/24, 192.168.1.0/24, and 10.0.0.0/24: every consumer router uses one of those, and every laptop on coffee-shop Wi-Fi will sit on one. If your VPN clients land on those ranges, routing breaks.
Subnets: now they’re per-AZ
Inside a VPC you carve subnets, just like on-prem. The cloud twist: every subnet is bound to a single availability zone (AZ), an isolated power and network domain inside a region. To run a load-balanced service that survives one AZ failing, you put resources in two or more subnets, each in a different AZ.
A typical AWS layout for a single-region production app:
| Tier | AZ-1a | AZ-1b | What lives there |
|---|---|---|---|
| Public | 10.0.1.0/24 | 10.0.2.0/24 | Public load balancers, bastion hosts, NAT gateways |
| Private (app) | 10.0.10.0/24 | 10.0.11.0/24 | App servers, container nodes |
| Private (data) | 10.0.20.0/24 | 10.0.21.0/24 | RDS instances, ElastiCache, etc. |
Six subnets, two AZs, three tiers. That’s the canonical pattern. Each row is logically “the public tier” or “the data tier”; each column is a fault domain. Anything you deploy goes into a tier-and-AZ pair.
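In Terraform terms, the AZ binding is explicit on every subnet. A sketch of the public tier from the table above (the region, AZ names, and resource names are assumptions; the app and data tiers repeat the same pattern):

```hcl
# One tier, two fault domains: each subnet is pinned to exactly one AZ.
resource "aws_subnet" "public_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "eu-west-1a"
}

resource "aws_subnet" "public_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "eu-west-1b"
}
```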
One thing that confuses on-prem people: cloud subnets aren’t broadcast domains. The provider’s underlying network is L3 the whole way down; broadcast and multicast as you’d use them on a real Ethernet segment don’t work. ARP is faked — the host sees responses from the underlying SDN even though no actual broadcast was sent. Protocols that rely on L2 multicast (some clustering, some discovery) break in the cloud unless you build overlay networks.
Public vs private — what actually distinguishes them
The cloud doesn’t have a magic “public” checkbox on a subnet. The distinction is purely about routing:
- A public subnet has a route in its route table sending 0.0.0.0/0 to the internet gateway. Resources here can be reached from the internet (if they have a public IP) and can reach the internet directly.
- A private subnet has its 0.0.0.0/0 route pointed at a NAT gateway (or no default route at all). Resources here can't be reached from the internet, and they can reach the internet only via the NAT gateway, which translates their source IP to its own.
The label is the route table. Same VPC, same address-space style, same provider primitives — the only difference is what the default route does. That’s the cloud equivalent of “is this segment behind the firewall or not.”
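Expressed as configuration, the difference is a single route target. A hedged sketch (the internet gateway and NAT gateway are assumed to be defined elsewhere in the same configuration; the provider adds the local route for the VPC's own CIDR automatically):

```hcl
# Two route tables that differ only in where the default route points.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id  # this one line is what makes a subnet "public"
  }
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id   # outbound only, source-NATed by the gateway
  }
}

# The label follows the association: attach each subnet to one table or the other.
resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}
```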
The four gateways you’ll meet
| Gateway | Direction | Cost model | What it’s for |
|---|---|---|---|
| Internet Gateway (IGW) | VPC ↔ internet (both directions) | Free | Public-facing services. One per VPC. |
| NAT Gateway | Private subnet outbound to internet | Per hour + per GB processed | Letting private servers fetch updates without exposing them inbound. |
| VPC Peering | VPC ↔ VPC (point-to-point) | No hourly charge; data transfer billed per GB when it crosses AZs or regions | Two VPCs that need to talk privately. No transit through. |
| Transit Gateway | VPC ↔ VPC (hub-and-spoke), and VPC ↔ on-prem | Per hour + per GB | The grown-up answer when you have many VPCs and on-prem networks. |
Two practical notes:
- NAT gateway costs add up. If your private workloads pull large container images or do heavy outbound traffic, NAT gateway bandwidth charges are often the line item that surprises finance. VPC endpoints (private connections to AWS services) keep AWS-bound traffic off the NAT gateway and pay for themselves quickly; a sketch follows this list.
- VPC peering doesn’t do transit. If A peers with B, and B peers with C, A still cannot talk to C through B. That’s by design. Use a Transit Gateway when you need a hub.
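On the first note, the usual fix is a gateway endpoint: it adds a route to the listed route tables so traffic to the service never touches the NAT gateway. A sketch for S3, assuming the region and resource names:

```hcl
# Gateway endpoint: S3 traffic from private subnets bypasses the NAT gateway entirely.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.eu-west-1.s3"  # region is illustrative
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]  # the endpoint adds its own prefix-list route here
}
```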
Route tables: the cloud’s “routing table”
Every subnet has exactly one route table attached. The route table tells the SDN where to send traffic for a given destination. A typical public-subnet table:
| Destination | Target |
|---|---|
| 10.0.0.0/16 | local (the VPC itself, automatic) |
| 0.0.0.0/0 | igw-xxxxxxxx (the internet gateway) |
And a typical private-subnet table:
| Destination | Target |
|---|---|
| 10.0.0.0/16 | local |
| 0.0.0.0/0 | nat-xxxxxxxx (the NAT gateway) |
| 192.168.0.0/16 | tgw-xxxxxxxx (your on-prem network, via Transit Gateway) |
The route-table entries follow the longest-prefix-match rule we covered in lesson 5. The local route for the VPC’s own CIDR is always added by the provider and can’t be removed. Everything else is yours to configure.
Three or four route tables typically cover an entire VPC: one for the public subnets, one per AZ for the private subnets (each pointing at that AZ's NAT gateway), and maybe one for an isolated "data" tier with no internet route at all. Don't over-fragment them; keep tables few and consistent.
Security groups vs NACLs
The cloud’s firewall story has two layers. They sit at different scopes and behave differently.
| | Security Group (SG) | Network ACL (NACL) |
|---|---|---|
| Attached to | An instance / ENI | A subnet |
| Stateful? | Yes — return traffic is allowed automatically | No — you must allow both directions explicitly |
| Default | Deny everything inbound; allow everything outbound | Allow everything in both directions |
| Rules | Allow only (no explicit deny) | Allow and deny, evaluated in order |
| Best used for | The vast majority of access control | Coarse-grained subnet-level guardrails (e.g. blanket deny of an IP range) |
The right default is to do almost everything with security groups and use NACLs only when you need a stateless layer of defence (typically: blocking specific source IPs at a subnet boundary). Trying to do all your access control with NACLs is technically possible and operationally miserable.
Security groups also have a feature with no on-prem analogue: you can write rules that reference other security groups. “Allow inbound TCP 5432 from anything in security group sg-app” means “allow any instance attached to sg-app to connect to my Postgres.” You don’t enumerate IPs — the rule follows the membership. When the app fleet auto-scales and new instances appear, the rule covers them automatically.
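A sketch of that Postgres rule in Terraform (the app-tier group and the VPC reference are assumed to exist elsewhere; the names and port are illustrative):

```hcl
# A database security group that admits the app tier by group membership, not by IP.
resource "aws_security_group" "db" {
  name   = "db"
  vpc_id = aws_vpc.main.id

  ingress {
    description     = "Postgres from the app tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]  # every instance attached to the app SG, now and later
  }
}
```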
Connecting to on-prem
Three options, in increasing order of cost and reliability:
| Option | Speed / latency | Cost | When it’s right |
|---|---|---|---|
| Site-to-site VPN | Limited by the public internet path; usually 100–500 Mbps; latency variable | Cheap; provider charges hourly per tunnel + data | Dev/test, small production, backup path |
| Dedicated connect (AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect) | 1, 10, or 100 Gbps; predictable latency; private path | Significant fixed cost (port + cross-connect + circuit) + cheaper data | Production, any sensitive workload, anything compliance-touching |
| SD-WAN overlay (Cisco, Palo Alto, Fortinet, others on top of cloud connectivity) | Aggregated across multiple paths | Layer on top of the above | Multi-cloud, multi-site enterprises that need uniform policy |
From the cloud-network point of view, all three look like a route table entry: “10.20.0.0/16 via the on-prem gateway,” whether the gateway is a VPN, a Direct Connect, or an SD-WAN appliance.
Pets vs cattle — what changes for IPs
On-prem, IP addresses are usually pets — db01.corp.local is at 10.20.5.7 and has been for three years and will be tomorrow. You can hard-code it. You write firewall rules that reference it.
In the cloud, instances are usually cattle — i-0abc123 exists today, gets terminated and replaced tomorrow as part of a deploy or auto-scaling event, and the new one has a different IP. Hard-coding IPs falls apart. The right patterns:
- Use DNS (cloud-internal or your own). Reference services by hostname, not IP. The cloud’s service-discovery mechanisms (Route 53, Cloud DNS, internal LB DNS) update when instances change.
- Reference by tag or security group, not IP. "Allow connections from sg-app" survives any instance churn.
- Use load balancers as stable entry points. The LB has a stable DNS name; the instances behind it churn freely.
- If you need a stable IP, allocate one (Elastic IP / static public IP) and detach-reattach across deploys. Treat the address as a separate resource from the instance.
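That last pattern is easy to see in configuration: the address is its own resource, and the attachment is a separate, swappable object. A sketch (the instance reference is a placeholder; the `domain` syntax assumes a recent AWS provider):

```hcl
# The Elastic IP lives independently of whatever instance currently holds it.
resource "aws_eip" "ingress" {
  domain = "vpc"
}

resource "aws_eip_association" "ingress" {
  allocation_id = aws_eip.ingress.id
  instance_id   = aws_instance.edge.id  # re-point this at each deploy; the address never changes
}
```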
The mindset shift from on-prem to cloud is mostly “stop binding identity to IP.” Once you do, the rest follows.
Public vs private cloud, briefly
“Public cloud” is what we’ve been describing: AWS / Azure / GCP, where you rent capacity in someone else’s data centres. “Private cloud” means cloud-style automation and self-service running on your own hardware (VMware Cloud Foundation, OpenStack, Nutanix). The networking primitives (VPCs, subnets, security groups) often look similar; the difference is who runs the underlying physical infrastructure and pays for the floor space.
“Hybrid cloud” is just both, connected by VPN or dedicated interconnect, with workloads spanning the boundary. Your laptop’s view of where a service runs becomes irrelevant when DNS and routing make it transparent.
Common gotchas
| Symptom | Likely cause |
|---|---|
| Instance has a public IP but is unreachable | Subnet’s route table doesn’t point 0.0.0.0/0 at an IGW, or security group blocks the inbound port |
| Instance can’t reach the internet to install packages | Private subnet has no NAT gateway route, or NAT gateway is in a different AZ that’s unhealthy |
| Two VPCs “peered” but they can’t talk | Route tables not updated; peering is created but the routes pointing through it don’t exist |
| Latency spikes between AZs in same region | Cross-AZ traffic takes a longer physical path (and is billed per GB); keep chatty data-plane traffic within one AZ where possible |
| Connection works the first time, hangs on retry | Stateful firewall or NAT entry timed out; classic stateful-table aging |
| VPN tunnel up but no traffic flowing | Routes on cloud side or on-prem side missing; or BGP not advertising the right prefixes |
| NAT gateway data costs surprisingly high | Workloads pulling large container images; consider VPC endpoints to ECR / S3 / etc. |
| VPC CIDR collision when peering | Both VPCs picked overlapping ranges; rebuild one with a non-overlapping CIDR (annoying) |
What you can now answer
- What’s a VPC? — Your private, isolated network in the cloud. You pick the CIDR; the provider does the rest.
- Why is every cloud subnet bound to one AZ? — AZs are fault domains. Forcing the subnet to be AZ-scoped makes redundancy explicit.
- What actually makes a subnet “public”? — Its route table’s default route points at an internet gateway. Nothing else.
- Security group vs NACL — which do I use? — Security groups, almost always. NACLs only for coarse subnet-level rules.
- How do I connect cloud to on-prem? — Site-to-site VPN for cheap; dedicated interconnect (Direct Connect / ExpressRoute / Cloud Interconnect) for production.
- Why don’t I hard-code IPs in the cloud? — Instances are cattle. Bind to DNS, security groups, or load balancers, not IPs.
What’s next
One more lesson rounds out this pathway: an anatomy of common attacks — how the kill chain in the BEC article and the modern intrusion playbook map back to the protocols and middleboxes you now understand. Networking is hardest to secure when you don’t know how it works; you do, so the security layer becomes “here’s where each defence sits” rather than a separate mystery.