Filesystem and Disk Errors Are the Most Frequent Linux Outages
Of the 150 most-common Linux production errors, the first ten are filesystem and disk problems for one reason: when storage breaks, everything breaks. A 100% full disk takes down logging, monitoring, package management, and the application all at once. This article walks through the ten errors you will see most often, with description, root cause, and remediation for each.
#001 No space left on device
Description: A write fails because the filesystem is full.
Root cause: The filesystem reached 100% byte utilization. Even small writes (logs, lockfiles, /tmp scratch) fail.
Common cause: Uncontrolled log growth, a runaway process generating output, large temp files left behind by a crashed job.
Solution:
- df -h to identify the full filesystem.
- du -sh /var/log/* or ncdu /var to find the offender.
- journalctl --vacuum-size=1G if systemd-journal is large.
- apt clean or yum clean all to clear package cache.
- Configure logrotate if logs grew unchecked.
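Spotting the full filesystem can be scripted. What follows is a minimal sketch rather than a finished tool: over_threshold is a hypothetical helper name, and the 90 in the usage line is an arbitrary cut-off. It reads df -P output on standard input, so it also works on df -Pi (inode usage), where the percentage sits in the same column. It assumes device names contain no spaces.

```shell
# over_threshold LIMIT
# Hypothetical helper: reads `df -P` (or `df -Pi`) output on stdin and prints
# "<percent>% <mount point>" for every filesystem at or above LIMIT percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {          # skip the header row
        pct = $5                          # 5th column: Capacity / IUse%
        sub(/%$/, "", pct)
        # Ignore rows without a numeric percentage (for example "-").
        if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) {
            mount = $6
            for (i = 7; i <= NF; i++) mount = mount " " $i   # mount points with spaces
            print pct "% " mount
        }
    }'
}
```

Usage: df -P | over_threshold 90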
#002 Read-only file system
Description: Writes fail with EROFS even though the disk has space.
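Root cause: The kernel detected I/O errors or filesystem corruption and remounted the filesystem read-only to prevent further damage. For ext filesystems this behavior comes from the errors=remount-ro mount option, which is commonly set in /etc/fstab on Debian-based systems.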
Common cause: Failing disk, sudden power loss mid-write, or filesystem corruption from a kernel bug.
Solution:
- dmesg -T | tail to confirm the I/O error trigger.
- smartctl -a /dev/sda to check disk health.
- Unmount and run fsck -y /dev/sdaN (root filesystem requires boot from rescue).
- Replace failing hardware if SMART shows reallocated sectors.
- mount -o remount,rw / only as a last-resort emergency; the kernel did the remount for a reason.
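Before digging through dmesg, it can help to confirm which filesystems are actually read-only right now. This sketch assumes the standard /proc/mounts layout (device, mount point, type, options). ro_mounts is a hypothetical helper name, and restricting the check to ext2/3/4, XFS, and Btrfs is an assumption made to skip mounts that are read-only by design, such as squashfs images.

```shell
# ro_mounts [MOUNTS_FILE]
# Hypothetical helper: lists disk-backed filesystems currently mounted read-only.
# Reads /proc/mounts by default; fields are device, mount point, type, options.
ro_mounts() {
    awk '$3 ~ /^(ext[234]|xfs|btrfs)$/ {   # assumption: only these types matter
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") {         # exact match, so errors=remount-ro is not a hit
                print $2 " (" $3 " on " $1 ")"
                break
            }
    }' "${1:-/proc/mounts}"
}
```

Usage: ro_mounts prints nothing when every matching filesystem is mounted read-write.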
#003 No space left on device (inode exhaustion)
Description: Writes fail with ENOSPC even though df -h shows free space.
Root cause: Each file consumes one inode. Millions of tiny files (mail spool, session cache, npm packages) can exhaust the inode table while bytes remain.
Common cause: Mail queue runaway, session/cache directories not pruned, build artifacts piling up in CI runners.
Solution:
- df -i to confirm inode exhaustion.
- Locate the directory with the most files: find /var -xdev -type d -exec sh -c 'echo "$(ls -A "$1" | wc -l) $1"' _ {} \; | sort -rn | head
- Bulk-delete: find /var/spool/postfix/ -type f -delete (or whichever offender).
- Long-term: reformat with mkfs.ext4 -i 4096 /dev/X to allocate more inodes (one per 4096 bytes of capacity). Reformatting destroys the existing data, so back up first.
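Spawning one shell per directory is slow on a tree with millions of entries. A single-pass alternative is sketched below, under stated assumptions: files_per_dir is a hypothetical helper name, and it counts only regular files, although directories and symlinks also consume inodes. File names containing newlines would be miscounted.

```shell
# files_per_dir ROOT [TOP]
# Hypothetical helper: prints "<count><TAB><directory>" for the TOP (default 10)
# directories under ROOT that directly contain the most regular files.
files_per_dir() {
    find "$1" -xdev -type f |
        awk '{ sub("/[^/]*$", ""); n[$0]++ }       # strip the file name, count per directory
             END { for (d in n) printf "%d\t%s\n", n[d], d }' |
        sort -rn |
        head -n "${2:-10}"
}
```

Usage: files_per_dir /var 5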
#004 Permission denied
Description: A process cannot read or write a file despite the file existing.
Root cause: The UNIX permission model rejected the operation because the permission bits that apply to the process (owner, group, or other, selected by its UID and GIDs) do not grant the requested access.
Common cause: Wrong file ownership, a restrictive umask, missing execute (search) permission on a parent directory, or an SELinux/AppArmor MAC denial that looks like a regular permission error.
Solution:
- ls -l file to see actual permissions.
- id user to confirm the running user’s UID/groups.
- chown user:group file or chmod 644 file to correct.
- If permissions look fine: ausearch -m AVC -ts recent; an SELinux denial gives the same error.
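A file can have correct permissions and still be unreachable if a directory above it denies traversal. This sketch prints the permissions of every component of a path, similar in spirit to namei -l from util-linux. path_perms is a hypothetical helper name, and the -c format option assumes GNU stat.

```shell
# path_perms PATH
# Hypothetical helper: prints mode, owner:group and name for PATH and each of
# its parent directories, starting at the top. Assumes GNU stat (-c).
path_perms() {
    # Recurse into the parent first, unless we have reached / or the current directory.
    [ "$1" = "/" ] || [ "$1" = "." ] || path_perms "$(dirname "$1")"
    stat -c '%A %U:%G %n' "$1"
}
```

Usage: path_perms /var/www/html/index.html, then look for a directory line that lacks x for the class the process falls into.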
#005 Bad magic number in super-block
Description: mount or fsck rejects a partition with “wrong fs type, bad option, bad superblock”.
Root cause: Filesystem superblock corruption or wrong filesystem type passed to mount.
Common cause: Bit rot, sudden power loss during a write to the superblock, or pointing mount at the wrong device.
Solution:
- blkid /dev/sdaN to confirm what filesystem actually exists.
- For ext: mke2fs -n /dev/sdaN lists backup superblocks; restore with e2fsck -b BACKUP_NUM /dev/sdaN.
- For XFS: xfs_repair /dev/sdaN.
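The backup superblock numbers can be pulled out of the mke2fs -n report mechanically. This is a sketch under two assumptions: backup_superblocks is a hypothetical helper name, and the report uses the "Superblock backups stored on blocks:" heading followed by comma-separated numbers. Note that mke2fs -n only reports the right locations when it is run with the same parameters (block size in particular) that were used to create the filesystem.

```shell
# backup_superblocks
# Hypothetical helper: reads `mke2fs -n` output on stdin and prints the backup
# superblock numbers, one per line.
backup_superblocks() {
    awk 'grab && /^[ \t]*[0-9]/ { print; next }   # number lines that follow the heading
         { grab = 0 }
         /Superblock backups stored on blocks/ { grab = 1 }' |
        tr -s ', \t' '\n\n\n' |
        grep -E '^[0-9]+$'
}
```

Usage: mke2fs -n /dev/sdaN | backup_superblocks | head -n 1 gives a candidate value for e2fsck -b.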
#006 Disk quota exceeded
Description: A user’s write fails despite the filesystem having free space.
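Root cause: A per-user or per-group quota caps how many blocks or inodes a user may consume, and the write would exceed that cap.
Common cause: Quotas are enabled through the usrquota,grpquota mount options in /etc/fstab, and the user has reached the configured limit.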
Solution:
- quota -u alice to see usage.
- edquota -u alice to raise limits.
- repquota -a for an org-wide view.
#007 Stale file handle (NFS)
Description: Reads or writes to an NFS-mounted file fail with ESTALE.
Root cause: The file’s inode on the server changed (deleted or replaced) while the client still held a handle to it.
Common cause: Another client deleted the file; or the NFS server was restored from backup, changing inode numbers.
Solution:
- umount -f /mnt/nfs && mount /mnt/nfs to refresh handles.
- For long-running daemons: restart them; they cached the stale handle.
#008 I/O error during read/write
Description: EIO returned when reading from or writing to a file.
Root cause: The block device returned an error to the kernel — bad sector, drive failure, or transient cable issue.
Solution: Check dmesg for the failing device and block, run smartctl -a for disk health, and replace the disk if SMART pre-fail attributes are tripping.
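When dmesg is noisy, a per-device tally shows which disk is producing the errors. This sketch assumes the kernel logs block-layer failures with the phrase "I/O error, dev NAME,"; the exact wording varies between kernel versions, so treat the pattern as a starting point. io_errors_by_dev is a hypothetical helper name.

```shell
# io_errors_by_dev
# Hypothetical helper: reads kernel log text on stdin and prints
# "<count> <device>" for each device named in an "I/O error, dev X" message.
io_errors_by_dev() {
    grep -o 'I/O error, dev [^,]*' |
        awk '{ n[$NF]++ } END { for (d in n) print n[d], d }' |
        sort -rn
}
```

Usage: dmesg | io_errors_by_dev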
#009 Cannot create temp file (ENOSPC on /tmp)
Description: Applications fail with cryptic errors because /tmp is full.
Solution: find /tmp -type f -mtime +1 -delete; or set tmpfiles.d rules; or move /tmp to tmpfs (RAM).
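Deleting from /tmp blind can remove files a running job still needs, so it is worth previewing the match first. The sketch below wraps the same find test without -delete. stale_files is a hypothetical helper name, and -xdev is an added assumption that you do not want to cross into other mounted filesystems.

```shell
# stale_files [DIR] [DAYS]
# Hypothetical helper: lists regular files under DIR (default /tmp) that match
# -mtime +DAYS (default 1), without deleting anything. find truncates a file's
# age to whole days, so +1 means at least two days old.
stale_files() {
    find "${1:-/tmp}" -xdev -type f -mtime +"${2:-1}" -print
}
```

Usage: stale_files /tmp 1 to review the list; once it looks right, the find command with -delete removes the same files.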
#010 fsck failed during boot
Description: Boot stops at “fsck failed; entering emergency shell”.
Root cause: Filesystem check found errors it couldn’t auto-repair.
Solution: Run fsck -y /dev/sdaN at the emergency shell (-y answers yes to every prompt). If the check keeps failing across reboots, suspect a failing disk and replace it.
Conclusion
Five habits that prevent most filesystem incidents:
- Monitor df -h AND df -i; both can fail independently.
- Configure logrotate on every server with logs.
- Run smartctl -a /dev/sdX as a periodic health check; replace pre-fail disks.
- Keep a separate /var mount so logs filling up doesn’t take down /.
- Test backups by restoring; don’t learn during an outage.