Process and Memory Errors — The OOM Killer and Friends
When a Linux system runs out of memory, the kernel’s OOM killer scores every process and terminates the highest scorer. Related resource-exhaustion errors such as “Cannot allocate memory”, “Resource temporarily unavailable”, and “Too many open files” come from per-process and system-wide limits rather than from the OOM killer itself; together they are the second-most-common Linux production issues after disk problems. This article walks through the ten you’ll see most often.
#011 Out of memory: Killed process X
Description: The kernel’s OOM killer terminated a process to reclaim memory.
Root cause: Total RAM + swap was insufficient for the working set; kernel picked the process with highest oom_score and sent SIGKILL.
Solution: dmesg | grep -i oom shows what was killed; cat /proc/<PID>/oom_score shows the current score; protect a critical process with echo -1000 > /proc/<PID>/oom_score_adj (-1000 exempts it from the OOM killer entirely, so use it sparingly); add RAM or swap; investigate a suspected leak with pmap -x <PID>.
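To see which processes the OOM killer would pick next, the per-process scores can be ranked in one pass. The following is a minimal sketch, not a standard tool: the function name top_oom_scores and its optional second argument (an alternative proc root, useful for testing) are illustrative assumptions.

```shell
#!/bin/sh
# top_oom_scores [COUNT] [PROC_ROOT]
# Print "score pid name" for the COUNT processes with the highest oom_score.
# PROC_ROOT defaults to /proc; the override is a hypothetical testing hook.
top_oom_scores() {
    count="${1:-10}"
    root="${2:-/proc}"
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process can exit between the glob and the read, so tolerate failures.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || name="?"
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -rn | head -n "$count"
}
```

Running top_oom_scores 5 on a live system lists the five most likely victims; a critical daemon near the top of that list is a candidate for a lower oom_score_adj.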
#012 Cannot allocate memory (ENOMEM)
Description: malloc() returns NULL or fork() returns -1 with errno set to ENOMEM, even though free memory appears to be available.
Root cause: Either RAM is genuinely exhausted, OR vm.overcommit_memory=2 is restricting allocations, OR per-process limits are hit.
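Solution: free -h; cat /proc/sys/vm/overcommit_memory to check the overcommit mode; in mode 2, compare CommitLimit and Committed_AS in /proc/meminfo; ulimit -a in the process’s environment (or cat /proc/<PID>/limits); sysctl -w vm.overcommit_memory=1 switches to always-overcommit, which removes these allocation failures but makes OOM kills more likely later.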
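When strict overcommit is suspected, the remaining commit headroom is the number to look at. A minimal sketch, assuming the CommitLimit and Committed_AS fields of /proc/meminfo (both reported in kB); the function names and the optional file argument are illustrative.

```shell
#!/bin/sh
# overcommit_mode_name MODE
# Translate the numeric value of vm.overcommit_memory into a short label.
overcommit_mode_name() {
    case "$1" in
        0) echo "heuristic" ;;
        1) echo "always" ;;
        2) echo "strict" ;;
        *) echo "unknown" ;;
    esac
}

# commit_headroom_kb [MEMINFO_FILE]
# Print CommitLimit minus Committed_AS in kB. The kernel only enforces
# CommitLimit in mode 2; there, a value near or below zero means new
# allocations are about to fail with ENOMEM.
commit_headroom_kb() {
    file="${1:-/proc/meminfo}"
    limit=$(awk '/^CommitLimit:/ {print $2}' "$file")
    used=$(awk '/^Committed_AS:/ {print $2}' "$file")
    echo "$(( limit - used ))"
}
```

For example, overcommit_mode_name "$(cat /proc/sys/vm/overcommit_memory)" followed by commit_headroom_kb shows both the active mode and how much address space can still be committed.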
#013 Too many open files (EMFILE)
Description: A process cannot open additional file descriptors.
Root cause: The process hit its ulimit -n soft limit on file descriptors.
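Solution: cat /proc/<PID>/limits; raise the soft limit for the current shell and its children with ulimit -Sn 65536 (it cannot exceed the hard limit); permanent: edit /etc/security/limits.conf for login sessions or set LimitNOFILE=65536 in the systemd unit for services; check the global ceiling with sysctl fs.file-max.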
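Before raising a limit, it helps to confirm how close the process actually is to it. A minimal sketch that compares the number of open descriptors with the soft limit; the function name fd_usage and the optional proc-root argument are illustrative assumptions, and reading another user’s /proc/<PID>/fd generally requires root.

```shell
#!/bin/sh
# fd_usage PID [PROC_ROOT]
# Print "open_count soft_limit" for one process.
fd_usage() {
    pid="$1"
    root="${2:-/proc}"
    # Each entry in /proc/<PID>/fd is one open descriptor.
    open=$(ls "$root/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # The "Max open files" row holds the soft limit in field 4, hard in field 5.
    soft=$(awk '/^Max open files/ {print $4}' "$root/$pid/limits")
    echo "$open $soft"
}
```

A result such as "1019 1024" confirms the process is at its ceiling; a much lower count suggests the EMFILE came from a short-lived burst or from a different process.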
#014 Resource temporarily unavailable (EAGAIN on fork)
Description: fork() fails because the user has too many running processes.
Solution: ps -L -u username | wc -l (the nproc limit counts threads as well as processes); raise nproc in /etc/security/limits.conf; check for fork-bomb behavior in misbehaving shell loops.
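To find which account is closest to its nproc limit, tasks can be counted per user. A minimal sketch: the helper name tasks_per_user is illustrative, and it only aggregates whatever list of user names it is given on standard input.

```shell
#!/bin/sh
# tasks_per_user
# Read one user name per line on stdin (one line per task) and print
# "user count", busiest user first.
tasks_per_user() {
    sort | uniq -c | sort -rn | awk '{print $2, $1}'
}

# Typical use with procps ps, where -L lists every thread as its own line:
#   ps -eLo user= | tasks_per_user | head
# Note that ps may abbreviate or substitute very long user names.
```

Comparing the top entry with that user’s ulimit -u value shows whether the EAGAIN is explained by the task count.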
#015 Segmentation fault (SIGSEGV)
Description: A process accessed memory it didn’t own and was killed.
Solution: Enable core dumps: ulimit -c unlimited; analyze with gdb /path/to/binary core; for repeating production crashes, run under strace or valgrind.
#016 Killed (out of swap)
Description: Process killed without explicit OOM message; system was thrashing.
Root cause: Swap was exhausted; the kernel killed processes to recover.
Solution: vmstat 1 shows swap activity; sustained high si/so values mean the system is swapping. Add swap (fallocate -l 4G /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile); add RAM; tune vm.swappiness.
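The same swap-in/swap-out signal can be sampled without vmstat by reading the kernel’s cumulative counters. A minimal sketch, assuming the pswpin and pswpout fields of /proc/vmstat (pages swapped in and out since boot); the function names and the file argument are illustrative.

```shell
#!/bin/sh
# swap_counters [VMSTAT_FILE]
# Print "pswpin pswpout", the cumulative pages swapped in and out.
swap_counters() {
    file="${1:-/proc/vmstat}"
    in_pages=$(awk '$1 == "pswpin" {print $2}' "$file")
    out_pages=$(awk '$1 == "pswpout" {print $2}' "$file")
    echo "${in_pages:-0} ${out_pages:-0}"
}

# swap_delta "BEFORE" "AFTER"
# Given two swap_counters samples, print the pages swapped in and out
# during the interval between them.
swap_delta() {
    set -- $1 $2    # split the two samples into four positional fields
    echo "$(( $3 - $1 )) $(( $4 - $2 ))"
}

# Typical use:
#   before=$(swap_counters); sleep 5; after=$(swap_counters)
#   swap_delta "$before" "$after"
```

Deltas that stay well above zero across several intervals indicate sustained swapping rather than a one-off eviction.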
#017 Process hung in D state (uninterruptible sleep)
Description: A process shows D in ps output and won’t respond to signals (not even SIGKILL).
Root cause: Stuck in a kernel I/O syscall, usually waiting on disk or NFS.
Solution: cat /proc/<PID>/wchan shows the kernel function it is waiting in; cat /proc/<PID>/stack (as root) shows the kernel stack trace; usually fixing the underlying I/O (NFS server, dead disk) is the only option.
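When it is not yet known which processes are stuck, the D-state ones can be filtered out of a full ps listing. A minimal sketch: the helper name d_state_only is illustrative, and it assumes the process state is in the second column of its input.

```shell
#!/bin/sh
# d_state_only
# Filter ps output on stdin, keeping the header line and every row whose
# second column (STAT) starts with D (uninterruptible sleep).
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Typical use with procps ps; wchan:32 widens the wait-channel column:
#   ps -eo pid,stat,wchan:32,comm | d_state_only
```

Several processes sharing the same wait channel usually point at one underlying device or mount.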
#018 Too many open files in system (ENFILE)
Description: System-wide file descriptor table exhausted.
Solution: cat /proc/sys/fs/file-nr (allocated handles, unused handles, maximum); raise fs.file-max with sysctl -w; find the leaking process: lsof | awk '{print $2}' | sort | uniq -c | sort -rn | head.
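The three numbers in file-nr are easier to alert on as a percentage. A minimal sketch, assuming the usual three-field layout of /proc/sys/fs/file-nr; the function name and the file argument are illustrative.

```shell
#!/bin/sh
# file_table_usage [FILE_NR]
# Print the percentage of the system-wide file handle table in use,
# rounded down to a whole number.
file_table_usage() {
    # Fields: allocated handles, allocated-but-unused handles, maximum.
    read -r allocated unused max < "${1:-/proc/sys/fs/file-nr}"
    echo "$(( allocated * 100 / max ))"
}
```

On systems where fs.file-max is set very high the result is effectively always 0, in which case ENFILE is unlikely to be the real problem.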
#019 Stack overflow / pthread_create failed
Description: A multi-threaded process fails to spawn additional threads.
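Solution: Identify which limit was hit: the per-user task limit (ulimit -u, which counts threads as well as processes), a virtual memory limit (ulimit -v; each thread reserves its own stack, sized by ulimit -s), or the kernel-wide kernel.threads-max ceiling (cat /proc/sys/kernel/threads-max). Raise the one that is exhausted.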
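To judge how close a process is to those limits, its current thread count can be read directly. A minimal sketch, assuming the Threads: field of /proc/<PID>/status; the function name and the optional proc-root argument are illustrative.

```shell
#!/bin/sh
# thread_count PID [PROC_ROOT]
# Print the number of threads currently in the given process.
thread_count() {
    awk '/^Threads:/ {print $2}' "${2:-/proc}/$1/status"
}
```

Comparing thread_count <PID> with ulimit -u and /proc/sys/kernel/threads-max shows which ceiling is the likely one.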
#020 Process never starts (silent fail in cron)
Description: A scheduled job appears not to run.
Solution: Check journalctl -u cron (the unit is named crond on RHEL-family systems); verify $PATH in the cron environment, which is much shorter than a login shell’s; redirect both stdout and stderr to a log: * * * * * cmd >>/var/log/myjob.log 2>&1.
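Both fixes, a predictable PATH and captured output, can be bundled into a small wrapper that the crontab entry calls. A minimal sketch: the function name run_logged and the directories prepended to PATH are illustrative assumptions, not cron requirements.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND with the standard binary directories on PATH, append its
# stdout and stderr to LOGFILE with start/exit markers, and return the
# command's exit status.
run_logged() {
    log="$1"
    shift
    # cron starts jobs with a minimal PATH; prepend the usual directories.
    PATH="/usr/local/bin:/usr/bin:/bin:$PATH"
    export PATH
    {
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] start: $*"
        "$@"
        status=$?
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] exit status: $status"
    } >>"$log" 2>&1
    return "$status"
}
```

With the function saved in a script that ends by calling run_logged "$@", a job that silently fails still leaves a start line and an exit status in the log, which distinguishes "never ran" from "ran and failed".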
Conclusion
Five habits:
- Always run dmesg | grep -i oom after an unexplained process death.
- Set explicit LimitNOFILE in systemd unit files for any service that might open many sockets.
- Monitor vmstat 1 for swap-in/out spikes; that’s the leading indicator before OOM.
- Set oom_score_adj to negative values for critical daemons.
- Enable core dumps in production for post-mortem analysis of segfaults.