
Linux Troubleshooting: 150 Common Errors

150 common Linux production errors, organized into 15 sections of 10 errors each: filesystem and disk, process and memory, networking, authentication, systemd, package management, Docker, kernel and boot, storage and RAID, security, advanced networking, programming and build, database, CI/CD, and performance.

15 articles • follow them in order

  1.
    Linux Admin

    Linux Filesystem & Disk Errors: 10 Common Problems and Fixes

    Working reference for the 10 most common Linux filesystem and disk errors. No space left on device, Read-only filesystem (errors=remount-ro after I/O failure), inode exhaustion (df -i), Permission denied (and the SELinux/AppArmor twin), Bad magic number in super-block, Disk quota exceeded, Stale file handle (NFS ESTALE), generic I/O errors with SMART correlation, /tmp full, fsck failure during boot. Each error covers description, root cause, common scenarios, and step-by-step fix. Includes cross-links to the Linux Disk & Filesystem command reference and to related troubleshooting topics (Storage/RAID, Performance).
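The inode-exhaustion case above (the one diagnosed with df -i) can be sketched in a few lines. This is a minimal illustration, not taken from the linked article: the function names `inode_usage_percent` and `check_mount` are hypothetical, and the calculation only approximates the IUse% column that df -i prints.

```python
import os


def inode_usage_percent(total_inodes: int, free_inodes: int) -> float:
    """Return the share of inodes in use, as a percentage.

    Hypothetical helper: roughly what `df -i` shows as IUse%.
    """
    if total_inodes <= 0:
        # Some filesystems report 0 total inodes; treat that as "not applicable".
        return 0.0
    used = total_inodes - free_inodes
    return 100.0 * used / total_inodes


def check_mount(path: str = "/") -> float:
    """Read inode counters for the filesystem holding `path` via statvfs."""
    st = os.statvfs(path)
    # f_files = total inodes, f_ffree = free inodes
    return inode_usage_percent(st.f_files, st.f_ffree)


if __name__ == "__main__":
    print(f"inode usage on /: {check_mount('/'):.1f}%")
```

A filesystem can report "No space left on device" while df shows free blocks if this figure is at or near 100%, which is why the inode check is listed alongside the plain disk-full case.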

  2.
    Linux Admin

    Linux Process & Memory Errors: OOM, ulimit, fork failures

    Working reference for the 10 most common Linux process and memory errors. OOM killer victim selection (oom_score / oom_score_adj), ENOMEM and vm.overcommit_memory, EMFILE too-many-open-files (ulimit -n vs systemd LimitNOFILE vs fs.file-max), EAGAIN on fork (nproc limit), SIGSEGV with core-dump analysis, swap-thrash kills, D-state uninterruptible sleep diagnosis (/proc/PID/wchan + stack), system-wide ENFILE, pthread_create failures, silent cron failures. Cross-links to Process Management and System Monitoring command references.

  3.
    Linux Admin

    Linux Networking Errors: 10 Common Connection Problems and Fixes

    Working reference for the 10 most common Linux networking errors. Connection refused vs timed out (RST vs silent drop), No route to host, NXDOMAIN DNS failures (dig +short / resolv.conf / nsswitch), SSL certificate verify failed (expired / CN mismatch / clock skew), Network unreachable, Address already in use (EADDRINUSE with ss -tlnp 'sport = :PORT'), TIME_WAIT exhaustion, ARP resolution, SSH permission denied (publickey). Each error covers description, root cause, and step-by-step fix using the OSI-ladder diagnostic order. Cross-linked to the Linux networking command reference and the advanced-networking-errors deep-dive.

  4.
    Linux Admin

    Linux Authentication & User Errors: 10 Common Login Problems

    Working reference for the 10 most common Linux authentication errors. PAM authentication failure (auth.log diagnosis), user not in sudoers (visudo + /etc/sudoers.d/ drop-ins), UNPROTECTED PRIVATE KEY (SSH perms: 700 on ~/.ssh, 600 on the key file), pam_faillock account lockout (faillock --reset), sudo no-tty for cron (NOPASSWD scoped), KEX algorithm mismatch with old servers, /sbin/nologin shell trap, stale group memberships requiring re-login or newgrp, sudo's "3 incorrect password attempts" failure, SSH wrong-key-offered diagnosis. Cross-links to User & Service Management, File Permissions, and Security Errors.

  5.
    Linux Admin

    Linux Systemd & Service Errors: Failed to start, Unit not found, dependency cycles

    Working reference for the 10 most common Linux systemd service errors. Failed to start (with journalctl -xeu diagnosis), Unit not found (daemon-reload after custom-unit drop), control process exited non-zero, Start request repeated too quickly (StartLimitBurst), Dependency cycle, Unit is masked, bad ExecStart values, false 'running' status without health checks, permission denied on low ports (CAP_NET_BIND_SERVICE), daemon-reload required after edits. Cross-links to the User & Service Management command reference.

  6.
    Linux Admin

    Linux Package Management Errors: apt, dpkg, yum, dnf

    Working reference for the 10 most common Linux package management errors. dpkg lock contention, unmet dependencies (--fix-broken), NO_PUBKEY GPG (apt-key vs trusted.gpg.d/), Hash sum mismatch from mirror sync, held broken packages, dpkg processing errors with postinst scripts, unreachable repository (DNS/proxy/firewall), insufficient disk during install, expired signing keys, conflicting packages. apt vs dpkg vs yum vs dnf nuances throughout.

  7.
    Linux Admin

    Linux Docker & Container Errors: daemon, image pull, exit 137, port allocation

    Working reference for the 10 most common Linux container errors. Cannot connect to Docker daemon (group / socket diagnosis), pull access denied (registry login + token expiry), overlay2 No space left on device (docker system prune), exit code 137 (128 + 9: SIGKILL, most often from the OOM killer), port allocation conflicts, manifest unknown, restart loops, iptables chain conflicts with firewalld, SELinux :Z mount label, cgroups memory limits. Cross-links to the systemctl / journalctl reference and to CI/CD troubleshooting.
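The exit code 137 entry above follows the usual shell convention that a status above 128 means "terminated by signal (status - 128)". A small sketch of that decoding, assuming a Linux host; `describe_exit_code` is a hypothetical helper name, not something from Docker or the linked article.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a process or container exit status into a readable cause.

    Statuses above 128 conventionally encode 128 + signal number.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    for status in (0, 1, 137, 143):
        print(status, "->", describe_exit_code(status))
```

A 137 on its own only shows that the process received SIGKILL. Whether the OOM killer sent it still has to be confirmed separately, for example from the OOMKilled field in docker inspect or from the kernel log.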

  8.
    Linux Admin

    Linux Kernel & Boot Errors: GRUB, kernel panic, initramfs, emergency mode

    Working reference for the 10 most common Linux boot failures. Kernel panic - not syncing (UUID/initramfs/hardware), GRUB file not found, VFS unable to mount root (post-disk-replace UUID change), systemd emergency mode from failed mount, no init found, missing initramfs modules, fsck during boot, OOM in initramfs on small VMs, kernel command line parse errors, silent boot loops. Cross-linked to disk/filesystem and storage/RAID.

  9.
    Linux Admin

    Linux Storage & RAID Errors: mdadm degraded, LVM, SMART, multipath

    Working reference for the 10 most common Linux storage and RAID failures. mdadm degraded array (remove + add), the worst-case dual-disk failure during rebuild, LVM volume group not found (vgscan + vgchange), thin pool 100% full, SMART pre-fail attributes (replace before catastrophic), multipath all-paths-failed, snapshot CoW pool overflow, force-assemble of broken arrays as last resort, iSCSI session loss, fstrim discard support. Cross-linked to disk/filesystem command reference and performance troubleshooting.

  10.
    Linux Admin

    Linux Security Errors: SELinux denies, expired certs, firewall blocks, fail2ban

    Working reference for the 10 most common Linux security errors. SELinux AVC denial (ausearch + audit2allow + restorecon), AppArmor DENIED (aa-complain + aa-genprof), TLS cert expired (certbot timer), firewall blocking expected traffic with iptables -L -nv counter diagnosis, fail2ban banned-IP recovery, SSH brute force defenses (key-only / fail2ban / port move), sudo timeout in long scripts, NET::ERR_CERT_REVOKED, audit log full, GPG signature verification. Cross-linked to auth errors, file permissions, and networking errors.

  11.
    Linux Admin

    Linux Advanced Networking Errors: PMTU, conntrack, VLAN, bonding

    Working reference for 10 advanced Linux networking issues that show up at scale. PMTU black-hole in tunnels (clamp MSS), conntrack table full on busy NAT/LB boxes (sysctl + timeout tuning), VLAN tag mismatches, bond failover not working (LACP vs active-backup), bridge / STP, NAT masquerade not forwarding, rogue IPv6 RA, ssh slow connect (UseDNS no), asymmetric routing, jumbo frame end-to-end requirements. Cross-linked to networking command reference.

  12.
    Linux Admin

    Linux Programming & Build Errors: gcc, ld, missing headers, ABI mismatch

    Working reference for the 10 most common Linux build/compile/link errors. fatal error: header not found (install -dev / -devel), undefined reference (link order matters), library not found at runtime (ldd / LD_LIBRARY_PATH / ld.so.conf.d), GLIBC ABI mismatch (build in target container), segfault in tests, missing make/cmake (build-essential), Python ModuleNotFoundError, npm ENOSPC, exec on noexec mount, OOM during compile (lower -j, swap).

  13.
    Linux Admin

    Linux Database Errors: connection refused, deadlocks, replication lag

    Working reference for the 10 most common Linux database errors at the OS layer. PostgreSQL/MySQL/Redis: connection refused (systemctl status), too many connections (PgBouncer / pool), password auth (pg_hba.conf), deadlocks (pg_stat_activity + pg_blocking_pids), replication lag, archive_command failures, MySQL Lost connection (max_allowed_packet, wait_timeout), Redis maxmemory eviction policy, disk full on data dir, long-running stuck transactions. Cross-linked to process/memory and networking troubleshooting.

  14.
    Linux Admin

    Linux CI/CD & Automation Errors: runners, secrets, flaky tests, deploy failures

    Working reference for the 10 most common Linux CI/CD pipeline errors. Runner offline (gitlab-runner / github-actions-runner systemctl), secret env var not injected, docker login failures (use ephemeral CI tokens), flaky tests (env vs hardware vs race), build cache miss, disk full on runner, job timeouts, deployment connection refused, git pull auth, Ansible host key verification. Cross-linked to docker, build, and systemd troubleshooting.

  15.
    Linux Admin

    Linux Performance & Observability: load, latency tails, perf top, eBPF

    Working reference for the 10 most common Linux performance problems. High load with low CPU (D-state iowait), p99 latency tails (averages lie), high context switch rate, disk I/O saturation (%util / await), network at line rate, memory pressure without OOM (vmstat si/so), slow boot (systemd-analyze blame), hung app (strace / wchan / stack), kernel CPU vulnerability mitigations, cloud noisy neighbor (%steal). Tools: vmstat / iostat / sar / perf top / bcc-eBPF / biolatency / execsnoop. Cross-linked to monitoring command reference.
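The "p99 latency tails (averages lie)" point above is easy to see with a worked example. The sketch below uses made-up latency samples and a nearest-rank percentile; the `percentile` helper and the sample values are illustrative assumptions, not data from the linked article.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies: 98 fast requests and 2 very slow ones.
latencies_ms = [10] * 98 + [1000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"mean={mean_ms} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

With these samples the mean is 29.8 ms and the median is 10 ms, while the 99th percentile is 1000 ms. The average matches neither the typical request nor the slow tail that users actually notice.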