
Linux Advanced Networking Errors: PMTU, conntrack, VLAN, bonding

Part of pathway: Linux Troubleshooting: 150 Common Errors

Advanced Networking — The Ones That Aren’t in the Manual

Beyond the basic “ping fails / DNS broken” class of problems lies a category of network errors that show up only at scale or under load: PMTU black-holes in tunnels, conntrack table exhaustion on busy load balancers, VLAN tag mismatches that break only some traffic, bonding interfaces that don’t actually fail over. These are the ten you’ll see in production after the easy ones are ruled out.

#101 PMTU black-hole in tunnels

Description: Small packets work; large packets time out; common with GRE/IPsec/VPN.

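Solution: ping -M do -s 1472 host sends a full 1500-byte packet with Don’t Fragment set (1472 bytes of payload plus 28 bytes of IPv4 and ICMP headers). If it fails, lower -s until replies return; the largest working size plus 28 is the path MTU. If oversized packets vanish silently instead of triggering an ICMP “fragmentation needed” reply, something upstream is filtering ICMP and Path MTU Discovery is broken. Work around it on the router or tunnel endpoint by clamping the TCP MSS: iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu.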
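The 1472 in that command is 1500 minus 20 bytes of IPv4 header and 8 bytes of ICMP header. A minimal Python sketch, assuming IPv4 without options and the iputils ping found on Linux, that binary-searches for the largest packet size that gets through with Don’t Fragment set. The function names are hypothetical, and the probe is passed in so the search can be tried without a network:

```python
import subprocess

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes
TCP_HEADER = 20   # bytes, assuming no TCP options


def ping_payload_for_mtu(mtu):
    """Largest `ping -s` payload that fits in an IPv4 packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tcp_mss_for_mtu(mtu):
    """TCP MSS that keeps an IPv4 segment within `mtu` bytes (no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ping_df(host, payload):
    """Send one ping with Don't Fragment set; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(payload), "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_path_mtu(probe, low=576, high=1500):
    """Binary-search the largest MTU in [low, high] whose payload probe() accepts.

    `probe` takes a ping payload size and returns True or False.
    Returns 0 if even `low` does not get through.
    """
    if not probe(ping_payload_for_mtu(low)):
        return 0
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(ping_payload_for_mtu(mid)):
            low = mid       # mid fits: the answer is mid or larger
        else:
            high = mid - 1  # mid is too big: the answer is below mid
    return low
```

On a real host, find_path_mtu(lambda p: ping_df("192.0.2.1", p)) probes a placeholder tunnel peer. A result of 1476 is what a plain GRE tunnel over a 1500-byte link typically gives, because GRE and the outer IPv4 header add 24 bytes. The search assumes that once a size fails, every larger size fails too.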

#102 conntrack table full

Description: New connections fail on busy boxes; dmesg shows “nf_conntrack: table full, dropping packet”.

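Solution: compare cat /proc/sys/net/netfilter/nf_conntrack_count with nf_conntrack_max in the same directory. Raise the limit with sysctl -w net.netfilter.nf_conntrack_max=1048576 (persist it in /etc/sysctl.d/; every entry costs kernel memory), and shorten the timeouts (net.netfilter.nf_conntrack_tcp_timeout_*) so idle entries expire sooner.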
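A small watchdog along these lines turns the count/max comparison into a percentage you can alert on. It assumes the nf_conntrack module is loaded so both files exist. The helper names and the 80% threshold are illustrative choices, not kernel defaults:

```python
from pathlib import Path

# Standard location of the conntrack sysctls (present once nf_conntrack is loaded).
NETFILTER_DIR = Path("/proc/sys/net/netfilter")


def read_int(path):
    """Read a single integer from a /proc-style file."""
    return int(path.read_text().strip())


def conntrack_usage(base=NETFILTER_DIR):
    """Return (count, max, percent_used) from nf_conntrack_count and nf_conntrack_max."""
    count = read_int(base / "nf_conntrack_count")
    limit = read_int(base / "nf_conntrack_max")
    return count, limit, 100.0 * count / limit


def check_conntrack(base=NETFILTER_DIR, warn_pct=80.0):
    """One-line status; warn_pct is an arbitrary alerting threshold."""
    count, limit, pct = conntrack_usage(base)
    level = "WARN" if pct >= warn_pct else "OK"
    return f"{level}: {count}/{limit} conntrack entries ({pct:.1f}%)"
```

Called with no arguments it reads the live /proc files, for example print(check_conntrack()) from cron or a monitoring agent; the base argument exists only so the functions can be pointed at test data.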

#103 VLAN tag mismatch

Description: Some traffic flows fine; tagged-VLAN traffic doesn’t.

Solution: ip -d link show eth0.10 prints the VLAN ID (a line such as vlan protocol 802.1Q id 10). Verify that the switch port is a trunk that allows that ID, and that the native (untagged) VLAN matches on both ends.
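To compare the host side with the switch side, a sketch like the following parses the vlan protocol 802.1Q id 10 style detail line that iproute2 prints. It assumes that output format. The function names are hypothetical, and the trunk’s allowed-VLAN list is an input you copy from the switch configuration by hand:

```python
import re
import subprocess

# Matches the detail line `ip -d link show` prints for VLAN subinterfaces,
# e.g. "vlan protocol 802.1Q id 10 <REORDER_HDR>".
VLAN_RE = re.compile(r"vlan protocol (\S+) id (\d+)")


def parse_vlan(ip_d_output):
    """Return (protocol, vlan_id) from `ip -d link show <dev>` text, or None."""
    match = VLAN_RE.search(ip_d_output)
    if match is None:
        return None  # not a VLAN device
    return match.group(1), int(match.group(2))


def vlan_of(dev):
    """Run `ip -d link show <dev>` (requires iproute2) and parse its VLAN ID."""
    out = subprocess.run(
        ["ip", "-d", "link", "show", dev],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vlan(out)


def vlans_not_on_trunk(host_vlans, trunk_allowed):
    """VLAN IDs tagged on the host that the switch trunk does not carry."""
    return sorted(set(host_vlans) - set(trunk_allowed))
```

Any ID returned by vlans_not_on_trunk is tagged on the host but dropped by the switch, which matches the “some traffic works, tagged traffic doesn’t” symptom.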

#104 bond not failing over

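Solution: cat /proc/net/bonding/bond0 shows the mode, the currently active slave and each slave’s MII status. Link monitoring must be enabled (miimon or arp_interval nonzero), or the driver never notices a dead link. For LACP (mode 4, 802.3ad) verify that the switch ports are configured as an LACP group; for active-backup (mode 1) check the link state of the primary and backup slaves.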
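The same file can be checked automatically. This Python sketch parses /proc/net/bonding/bond0 into bond-level fields and per-slave sections, then flags the usual reasons failover never happens: no link monitoring, a slave that is already down, or only one slave. It assumes the field names the bonding driver prints (Bonding Mode, MII Polling Interval (ms), Slave Interface, MII Status); the function names are hypothetical:

```python
def parse_bonding(text):
    """Parse /proc/net/bonding/<bond> text into bond fields plus a list of slave dicts."""
    info = {"slaves": []}
    section = info  # lines before the first "Slave Interface" describe the bond
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # blank line or not a key/value line
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            section = {"name": value}  # start a new per-slave section
            info["slaves"].append(section)
        else:
            section.setdefault(key, value)  # keep the first value seen in a section
    return info


def failover_problems(info):
    """List obvious reasons the bond will not fail over (heuristic, not exhaustive)."""
    problems = []
    miimon = info.get("MII Polling Interval (ms)", "0")
    arp_interval = info.get("ARP Polling Interval (ms)", "0")
    if miimon == "0" and arp_interval == "0":
        problems.append("no link monitoring: miimon and arp_interval are both 0")
    for slave in info["slaves"]:
        status = slave.get("MII Status", "unknown")
        if status != "up":
            problems.append(f"slave {slave['name']} is {status}")
    if len(info["slaves"]) < 2:
        problems.append("fewer than two slaves: nothing to fail over to")
    return problems
```

Typical use is failover_problems(parse_bonding(open("/proc/net/bonding/bond0").read())). An empty list only rules out the obvious misconfigurations; it does not prove failover works, so still pull a cable.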

#105 bridge isolation lost (STP)

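Solution: bridge link lists each port and its state (forwarding, blocking and so on). If STP is meant to be on, ip -d link show br0 should report stp_state 1; enable it with ip link set br0 type bridge stp_state 1. Look for loops and competing root bridges with tcpdump -i br0 -n stp.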
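The same information is exposed in sysfs, which is easier to script than parsing bridge link output. A sketch, assuming the standard /sys/class/net/<bridge>/bridge/stp_state and brif/<port>/state files and the kernel’s numeric port states; bridge_report and loop_risk are hypothetical names:

```python
from pathlib import Path

# Numeric port states used by the Linux bridge (BR_STATE_* in linux/if_bridge.h).
PORT_STATES = {0: "disabled", 1: "listening", 2: "learning", 3: "forwarding", 4: "blocking"}


def bridge_report(bridge, sysfs=Path("/sys/class/net")):
    """Return whether STP is on and the state of every port of `bridge`."""
    br = sysfs / bridge
    stp_state = int((br / "bridge" / "stp_state").read_text())  # 0 = off, 1/2 = kernel/user STP
    ports = {}
    for port_dir in sorted((br / "brif").iterdir()):
        state = int((port_dir / "state").read_text())
        ports[port_dir.name] = PORT_STATES.get(state, f"unknown({state})")
    return {"stp_enabled": stp_state != 0, "ports": ports}


def loop_risk(report):
    """True if STP is off while two or more ports forward: nothing would block a loop."""
    forwarding = [name for name, state in report["ports"].items() if state == "forwarding"]
    return not report["stp_enabled"] and len(forwarding) >= 2
```

With STP enabled and a loop present, one of the redundant ports should show up as blocking; with STP off, every port forwards and a loop turns into a broadcast storm.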

#106 NAT not working (masquerade)

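Solution: enable forwarding with sysctl -w net.ipv4.ip_forward=1. Then run iptables -t nat -L POSTROUTING -nv and check that the packet counter on the MASQUERADE rule increments while a client generates traffic. If it stays at zero, traffic is not reaching the rule: check the rule’s -o interface and the FORWARD chain.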
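Checking whether the counter moves is easier with two snapshots. A sketch, assuming the usual iptables -L listing layout (packets, bytes, target as the first three columns) and exact counters via -x; the function names are hypothetical:

```python
import subprocess
from pathlib import Path


def ip_forward_enabled(proc=Path("/proc/sys/net/ipv4/ip_forward")):
    """True if IPv4 forwarding is switched on."""
    return proc.read_text().strip() == "1"


def masquerade_packets(iptables_output):
    """Sum the packet counters of MASQUERADE rules in an `iptables -L -nvx` listing."""
    total = 0
    for line in iptables_output.splitlines():
        fields = line.split()
        # Rule lines start with pkts, bytes, target; -x makes the counters plain integers.
        if len(fields) >= 3 and fields[2] == "MASQUERADE" and fields[0].isdigit():
            total += int(fields[0])
    return total


def nat_snapshot():
    """Current MASQUERADE packet count (needs root and the iptables binary)."""
    out = subprocess.run(
        ["iptables", "-t", "nat", "-L", "POSTROUTING", "-n", "-v", "-x"],
        capture_output=True, text=True, check=True,
    ).stdout
    return masquerade_packets(out)
```

Take nat_snapshot(), generate traffic from a client behind the NAT box, and take it again. No increase means packets never reach the rule; ip_forward_enabled() covers the other half of the checklist.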

#107 IPv6 RA confusion

Description: Host gets unexpected IPv6 from a rogue Router Advertisement.

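Solution: sysctl -w net.ipv6.conf.eth0.accept_ra=0 on hosts that shouldn’t autoconfigure (persist it in /etc/sysctl.d/). Find the rogue sender with tcpdump -i eth0 -n 'icmp6 and ip6[40] == 134', which captures Router Advertisements (ICMPv6 type 134) along with their source address.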
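To find which interfaces still listen to RAs, a sketch that reads accept_ra for every entry under /proc/sys/net/ipv6/conf. It assumes the usual meanings of the values (0 ignore, 1 accept unless forwarding is on, 2 accept even when forwarding); the function name is hypothetical:

```python
from pathlib import Path

# accept_ra: 0 = ignore RAs, 1 = accept unless forwarding is enabled,
# 2 = accept even when forwarding is enabled.
IPV6_CONF = Path("/proc/sys/net/ipv6/conf")


def ra_accepting_interfaces(conf_dir=IPV6_CONF):
    """Map each conf entry (interfaces plus 'all'/'default') with nonzero accept_ra to its value."""
    result = {}
    for entry in sorted(conf_dir.iterdir()):
        setting = entry / "accept_ra"
        if not setting.is_file():
            continue  # skip anything that is not a per-interface conf directory
        value = int(setting.read_text())
        if value != 0:
            result[entry.name] = value
    return result
```

The default entry is the template for interfaces created later, so a nonzero value there means new interfaces will accept RAs too.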

#108 ssh slow on connect

Description: SSH stalls 10+ seconds before banner.

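Solution: set UseDNS no in sshd_config; the reverse-DNS lookup of the client’s address is the usual culprit. UseDNS has defaulted to no since OpenSSH 6.8, so on newer systems check whether something turns it back on: sshd -T | grep -i usedns shows the effective value. Validate the config with sshd -t, then restart sshd.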
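To confirm that reverse DNS is the culprit before touching sshd, time a PTR lookup of the client’s address from the server. This sketch uses Python’s socket.gethostbyaddr, which goes through the system resolver; sshd’s own lookup differs in detail (it also confirms the name with a forward lookup), so treat the result as an approximation. The function name is hypothetical:

```python
import socket
import time


def time_reverse_lookup(ip):
    """Time a reverse (PTR) lookup of `ip`; returns (hostname or None, seconds)."""
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror / socket.gaierror: no PTR record or resolver failure
        name = None
    return name, time.monotonic() - start
```

Run it on the server with the client’s address, for example time_reverse_lookup("192.0.2.10") with a placeholder IP. Several seconds of elapsed time, often ending in None, points at a slow or unreachable DNS server.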

#109 traceroute shows asymmetric path

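Description: Outbound and return packets take different routes; firewalls hate this, because a stateful firewall that sees only one direction of a connection drops it.

Solution: traceroute only shows the outbound path, so run it from the far end as well to see the return path. Fix it with source-based routing (ip rule plus a routing table per uplink), or rebuild routing on both ends so the paths converge.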
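A common case is a host with two uplinks where replies to traffic arriving on the second uplink leave through the first one’s default route. This sketch builds the two iproute2 commands that send replies from a given source address out through its own gateway, assuming IPv4. The addresses are from documentation ranges, the table number 100 is arbitrary, and the function name is hypothetical; the commands are returned rather than run so they can be reviewed first:

```python
import ipaddress


def source_routing_commands(src_ip, gateway, dev, table):
    """Build `ip route`/`ip rule` commands so traffic from src_ip leaves via gateway on dev."""
    src = ipaddress.IPv4Address(src_ip)   # raises ValueError on a malformed address
    gw = ipaddress.IPv4Address(gateway)
    if not 1 <= table <= 252:
        # 0 and 253-255 (unspec, default, main, local) are reserved routing tables.
        raise ValueError("pick a table number between 1 and 252")
    return [
        f"ip route add default via {gw} dev {dev} table {table}",  # per-uplink default route
        f"ip rule add from {src}/32 table {table}",                # select it by source address
    ]
```

For example, source_routing_commands("198.51.100.10", "198.51.100.1", "eth1", 100) yields the route and rule for the second uplink. After applying them as root, ip rule show and ip route show table 100 confirm the result; they do not survive a reboot, so move them into the distribution’s network configuration once they work.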

#110 jumbo frames not taking effect

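Solution: ip link show eth0 shows the current MTU; change it with ip link set eth0 mtu 9000. The switch and ALL devices in the path must support jumbo frames: nothing on a layer-2 segment fragments an oversized frame, so it is simply dropped. Verify end to end with ping -M do -s 8972 host (9000 minus 28 bytes of IPv4 and ICMP headers).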
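Because one interface left at 1500 is enough to break things, a quick local audit helps. A sketch, assuming the MTU is read from /sys/class/net/<iface>/mtu; the function and interface names are hypothetical, and switch ports still have to be checked on the switch itself:

```python
from pathlib import Path

IPV4_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def mtu_report(ifaces, sysfs=Path("/sys/class/net")):
    """Read the current MTU of each named interface from sysfs."""
    return {name: int((sysfs / name / "mtu").read_text()) for name in ifaces}


def mismatched(report, expected=9000):
    """Interfaces whose MTU differs from the jumbo size you meant to configure."""
    return sorted(name for name, mtu in report.items() if mtu != expected)


def jumbo_ping_payload(mtu=9000):
    """Payload for `ping -M do -s N` that exactly fills an IPv4 packet of `mtu` bytes."""
    return mtu - IPV4_ICMP_OVERHEAD
```

For instance, mismatched(mtu_report(["eth0", "eth1", "bond0"])) lists the interfaces not at 9000, and jumbo_ping_payload() gives 8972, the size for the end-to-end ping test.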

Conclusion

  1. PMTU issues hide behind “works for small data, fails for big.” Test with ping -M do -s.
  2. Monitor conntrack on every load balancer / NAT box; full table = silent drops.
  3. UseDNS no on every sshd. The 10-second stall is unforgivable.
  4. Test failover BEFORE you need it. After ip link set eth0 down, a bonded or multi-NIC host should stay reachable.
  5. tcpdump beats theorizing. Capture, look at the actual packets.
