Advanced Networking — The Ones That Aren’t in the Manual
Beyond the basic “ping fails / DNS broken” class of problems lies a category of network errors that show up only at scale or under load: PMTU black-holes in tunnels, conntrack table exhaustion on busy load balancers, VLAN tag mismatches that break only some traffic, bonding interfaces that don’t actually fail over. These are the ten you’ll see in production after the easy ones are ruled out.
#101 PMTU black-hole in tunnels
Description: Small packets work; large packets time out; common with GRE/IPsec/VPN.
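Solution: ping -M do -s 1472 host sends a full 1500-byte packet (1472 bytes of payload plus 28 bytes of ICMP and IPv4 headers) with the Don’t Fragment bit set. If it times out while smaller sizes succeed, a hop with a lower MTU is dropping the packet and the ICMP “fragmentation needed” reply is not making it back; lower -s until the ping passes to find the real path MTU. Clamp TCP MSS on the router or tunnel endpoint: iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu.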
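The header arithmetic behind that test can be sketched as a few shell helpers. This is a minimal sketch, assuming IPv4 without options and iputils ping; the function names and the candidate MTU list are illustrative assumptions, not part of any standard tool.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# TCP MSS for a given MTU over IPv4: MTU minus 20 (IPv4) minus 20 (TCP),
# assuming no IP or TCP options.
tcp_mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Try a descending list of candidate MTUs with the DF bit set and print the
# first (largest) one that gets a reply. The candidates are assumptions
# chosen for illustration; the result is only as precise as the list.
probe_path_mtu() {
    host=$1
    for mtu in 1500 1492 1480 1476 1460 1400 1280; do
        size=$(icmp_payload_for_mtu "$mtu")
        if ping -M do -c 1 -W 2 -s "$size" "$host" >/dev/null 2>&1; then
            echo "$mtu"
            return 0
        fi
    done
    return 1
}
```

Running probe_path_mtu host from a machine behind the tunnel gives a rough upper bound on the path MTU; tcp_mss_for_mtu then gives the MSS a client would need if clamping is set by hand rather than with --clamp-mss-to-pmtu.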
#102 conntrack table full
Description: New connections fail on busy boxes; dmesg shows “nf_conntrack: table full, dropping packet”.
Solution: Compare /proc/sys/net/netfilter/nf_conntrack_count against nf_conntrack_max in the same directory; raise the limit with sysctl -w net.netfilter.nf_conntrack_max=1048576; tune timeouts (nf_conntrack_tcp_timeout_*) so idle entries expire sooner.
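A minimal sketch of a check that could run from cron or a monitoring agent, assuming the nf_conntrack module is loaded so the /proc files exist. The function names and the default 80% threshold are assumptions made for illustration.

```shell
# Integer percentage of the conntrack table in use: count * 100 / max.
conntrack_pct() {
    [ "$2" -gt 0 ] || return 1
    echo $(( $1 * 100 / $2 ))
}

# Read the live counters, print a summary, and return non-zero when usage
# is at or above the threshold (first argument, default 80).
conntrack_check() {
    threshold=${1:-80}
    dir=/proc/sys/net/netfilter
    count=$(cat "$dir/nf_conntrack_count") || return 2
    max=$(cat "$dir/nf_conntrack_max") || return 2
    pct=$(conntrack_pct "$count" "$max") || return 2
    echo "conntrack: $count/$max (${pct}%)"
    [ "$pct" -lt "$threshold" ]
}
```

Calling conntrack_check 90 returns success while the table is under 90% full, so it can gate an alert before the kernel starts dropping packets.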
#103 VLAN tag mismatch
Description: Some traffic flows fine; tagged-VLAN traffic doesn’t.
Solution: ip -d link show eth0.10 shows VLAN ID; verify switch trunk config matches; native VLAN must align on both ends.
#104 bond not failing over
Solution: cat /proc/net/bonding/bond0 shows mode and active slave; for LACP (mode 4) verify switch is configured for LACP; for active-backup (mode 1) check primary/backup link state.
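The fields worth watching in that file can be pulled out with awk. This is a sketch that assumes the usual /proc/net/bonding layout ("Currently Active Slave:", then one "Slave Interface:" block per member, each followed by its own "MII Status:" line); the function names are hypothetical.

```shell
# Print the currently active slave from a bonding status file.
# The file argument defaults to bond0.
bond_active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}

# Print "interface status" for every slave. The bond-level "MII Status"
# line comes before any "Slave Interface" line, so it is skipped.
bond_slave_status() {
    awk -F': ' '
        /^Slave Interface:/ { iface = $2 }
        /^MII Status:/ && iface != "" { print iface, $2; iface = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Comparing bond_active_slave before and after pulling a cable (or running ip link set on one member) shows whether failover actually happened.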
#105 bridge isolation lost (STP)
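Solution: bridge link shows each port’s state (forwarding, blocking, disabled); verify STP is enabled wherever redundant links are intentional; watch BPDUs and topology changes with tcpdump -i br0 -n stp.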
#106 NAT not working (masquerade)
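Solution: sysctl net.ipv4.ip_forward must report 1 (enable with sysctl -w net.ipv4.ip_forward=1); iptables -t nat -L POSTROUTING -n -v lists the rules with their counters; the packet counter on the MASQUERADE rule should increment as new connections pass through.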
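The counter check can be scripted by sampling the rule twice. This is a sketch, assuming iptables (not a native nftables ruleset) and root privileges; masq_packets and nat_check are hypothetical names. The -x flag prints exact counters instead of abbreviated ones like 12K.

```shell
# Sum the packet counters of all MASQUERADE rules in the output of
# "iptables -t nat -L POSTROUTING -n -v -x" read from stdin.
masq_packets() {
    awk '$3 == "MASQUERADE" { sum += $1 } END { print sum + 0 }'
}

# Confirm forwarding is on, then sample the counter twice, N seconds apart
# (default 5), and succeed only if it grew in between.
nat_check() {
    if [ "$(cat /proc/sys/net/ipv4/ip_forward)" != "1" ]; then
        echo "net.ipv4.ip_forward is off"
        return 1
    fi
    before=$(iptables -t nat -L POSTROUTING -n -v -x | masq_packets)
    sleep "${1:-5}"
    after=$(iptables -t nat -L POSTROUTING -n -v -x | masq_packets)
    echo "MASQUERADE packets: $before -> $after"
    [ "$after" -gt "$before" ]
}
```

Rules in the nat table are evaluated only for the first packet of each connection, so generate new connections from a client behind the NAT box while nat_check runs; an existing long-lived flow will not move the counter.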
#107 IPv6 RA confusion
Description: Host gets unexpected IPv6 from a rogue Router Advertisement.
Solution: sysctl net.ipv6.conf.eth0.accept_ra=0 on hosts that shouldn’t auto-configure; investigate the rogue RA source.
#108 ssh slow on connect
Description: SSH stalls 10+ seconds before banner.
Solution: UseDNS no in sshd_config; reverse-DNS lookup is the usual culprit. Restart sshd.
#109 traceroute shows asymmetric path
Description: Outbound and return packets take different routes; firewalls hate this.
Solution: Source-based routing with ip rule; or rebuild routing on both ends to converge paths.
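A minimal sketch of the source-based routing option: replies sourced from a given address are looked up in a dedicated routing table whose default route points at the matching uplink. The helper only prints the commands so they can be reviewed first; its name, the addresses from the documentation range, and table number 100 in the usage note are assumptions for illustration.

```shell
# Print (do not run) the iproute2 commands that route traffic sourced from
# SRC via gateway GW on device DEV, using a separate routing table TABLE.
# Arguments: SRC GW DEV TABLE
policy_route_cmds() {
    src=$1
    gw=$2
    dev=$3
    table=$4
    echo "ip route add default via $gw dev $dev table $table"
    echo "ip rule add from $src/32 table $table"
}
```

For example, policy_route_cmds 203.0.113.10 203.0.113.1 eth1 100 prints two commands that can be piped to sh as root once checked; ip rule show and ip route show table 100 then confirm the result.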
#110 jumbo frames not taking effect
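Solution: ip link show eth0 reports the current MTU; ip link set eth0 mtu 9000 raises it; the switch and ALL devices in the path must support jumbo frames, otherwise oversized frames are dropped at the first hop that cannot carry them. Verify end to end with ping -M do -s 8972 host (9000 minus 28 bytes of headers).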
Conclusion
- PMTU issues hide behind “works for small data, fails for big.” Test with ping -M do -s.
- Monitor conntrack on every load balancer / NAT box; full table = silent drops.
- UseDNS no on every sshd. The 10-second stall is unforgivable.
- Test failover BEFORE you need it. After ip link set eth0 down, a multi-NIC host should stay reachable.
- tcpdump beats theorizing. Capture, look at the actual packets.