
Linux Filesystem & Disk Errors: 10 Common Problems and Fixes

Part of pathway: Linux Troubleshooting: 150 Common Errors

Filesystem and Disk Errors Are the Most Frequent Linux Outages

Of the 150 most-common Linux production errors, the first ten are filesystem and disk problems for one reason: when storage breaks, everything breaks. A 100% full disk takes down logging, monitoring, package management, and the application all at once. This article walks through the ten errors you will see most often, with description, root cause, and remediation for each.

#001 No space left on device

Description: A write fails because the filesystem is full.

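Root cause: The filesystem reached 100% byte utilization, so even small writes (logs, lockfiles, /tmp scratch files) fail. On ext2/3/4, mkfs reserves 5% of blocks for root by default, so unprivileged processes hit ENOSPC (and df reports 100%) while root can still write into the reserve.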

Common cause: Uncontrolled log growth, a runaway process generating output, large temp files left behind by a crashed job.

Solution:

  • df -h to identify the full filesystem.
  • du -sh /var/log/* or ncdu /var to find the offender.
  • journalctl --vacuum-size=1G if the systemd journal is large (check with journalctl --disk-usage).
  • apt clean / yum clean all to clear the package cache.
  • Configure logrotate if logs grew unchecked.
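
The du/ncdu step boils down to summing file sizes per top-level entry without crossing into other mounted filesystems. A minimal Python sketch of that idea, assuming Python 3.9+ and using apparent size (closer to du --apparent-size than to plain du, which counts allocated blocks); largest_entries and _tree_size are hypothetical names for illustration:

```python
import os


def _tree_size(path: str, dev: int) -> int:
    """Apparent size of everything under path, skipping other filesystems (like du -x)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path, onerror=lambda err: None):
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == dev:
                    kept.append(d)  # same filesystem: descend into it
            except OSError:
                pass  # vanished or unreadable: skip it
        dirnames[:] = kept  # os.walk only descends into the names left here
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass
    return total


def largest_entries(root: str, top: int = 5) -> list[tuple[int, str]]:
    """Return (size_in_bytes, path) for the biggest entries directly under root."""
    dev = os.lstat(root).st_dev
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            if st.st_dev != dev:
                continue  # a separate mount point; du -x would skip it too
            if entry.is_dir(follow_symlinks=False):
                size = _tree_size(entry.path, dev)
            else:
                size = st.st_size
            sizes.append((size, entry.path))
    sizes.sort(reverse=True)
    return sizes[:top]
```

For example, largest_entries("/var/log") returns up to five (bytes, path) pairs, biggest first. In practice du and ncdu are faster and better tested; the sketch only shows what they compute.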

#002 Read-only file system

Description: Writes fail with EROFS even though the disk has space.

Root cause: The kernel detected I/O errors or filesystem corruption and remounted the filesystem read-only to prevent further damage. This happens when the filesystem is mounted with errors=remount-ro, which many distributions set in /etc/fstab.

Common cause: Failing disk, sudden power loss mid-write, or filesystem corruption from a kernel bug.

Solution:

  • dmesg -T | tail to confirm the I/O error trigger.
  • smartctl -a /dev/sda to check disk health.
  • Unmount and run fsck -y /dev/sdaN (for the root filesystem, boot from rescue media first).
  • Replace failing hardware if SMART shows reallocated sectors.
  • mount -o remount,rw / only as an emergency last resort; the kernel remounted read-only for a reason.
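
After the kernel remounts a filesystem read-only, the ro flag appears in its option list in /proc/mounts, so a monitoring check can catch the remount before users do. A minimal Python sketch, assuming the standard whitespace-separated /proc/mounts layout (device, mount point, type, options, ...); readonly_mounts is a hypothetical helper name:

```python
def readonly_mounts(mounts_text: str) -> list[str]:
    """Mount points whose option list contains 'ro', from /proc/mounts-style text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:  # exact match, so errors=remount-ro does not count
            # /proc/mounts escapes a space inside a path as \040
            result.append(mount_point.replace("\\040", " "))
    return result
```

Feed it the contents of /proc/mounts. Expect some legitimately read-only entries (squashfs snap images, for instance) and filter those out before alerting.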

#003 No space left on device (inode exhaustion)

Description: Writes fail with ENOSPC even though df -h shows free space.

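Root cause: Every file, directory, and symlink consumes one inode, and on ext4 the number of inodes is fixed when the filesystem is created. Millions of tiny files (mail spool, session cache, npm packages) can exhaust the inode table while plenty of bytes remain.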

Common cause: Mail queue runaway, session/cache directories not pruned, build artifacts piling up in CI runners.

Solution:

  • df -i to confirm inode exhaustion.
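  • find /var -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head (GNU find) to locate the directories holding the most entries.
  • Bulk-delete once you have confirmed the files are disposable: find /var/spool/postfix/ -type f -delete (or whichever offender). For a Postfix queue, postsuper -d ALL is safer than deleting files behind the MTA's back.
  • Long-term: reformat with mkfs.ext4 -i 4096 /dev/X to allocate one inode per 4 KiB instead of the usual one per 16 KiB. Reformatting destroys the data, so back up first.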
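
Locating the directory with the most files amounts to counting directory entries per directory, since each entry costs an inode. The same count as a minimal Python sketch (busiest_dirs is a hypothetical name; like find -xdev, it does not descend into other mounted filesystems):

```python
import os
from collections import Counter


def busiest_dirs(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Return (entry_count, directory) pairs for the directories with the most entries."""
    dev = os.lstat(root).st_dev
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        # every subdirectory, file, and symlink listed here uses one inode
        counts[dirpath] = len(dirnames) + len(filenames)
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == dev:
                    kept.append(d)  # same filesystem: keep walking
            except OSError:
                pass
        dirnames[:] = kept
    return [(n, path) for path, n in counts.most_common(top)]
```

For example, busiest_dirs("/var") lists the ten most crowded directories on the /var filesystem, which is where a runaway spool or cache usually shows up first.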

#004 Permission denied

Description: A process cannot read or write a file despite the file existing.

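Root cause: The kernel's permission check rejected the operation. It compares the process UID with the file's owner, then the process's groups with the file's group, and applies only the matching class of mode bits (owner, group, or other). Opening a path also requires execute (search) permission on every parent directory.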

Common cause: Wrong file ownership, restrictive umask, or SELinux/AppArmor MAC denial that LOOKS like a regular permission error.

Solution:

  • ls -l file to see actual permissions, and namei -l /full/path/to/file to check every parent directory along the way.
  • id user to confirm the running user’s UID/groups.
  • chown user:group file or chmod 644 file to correct.
  • If permissions look fine, run ausearch -m AVC -ts recent: an SELinux denial produces the same error. On AppArmor systems, look for apparmor="DENIED" lines in dmesg or the audit log.
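
The owner/group/other rule is easy to misread: the kernel picks exactly one class and checks only that class's bits, so an owner can be denied even when the group bits would allow access. A simplified Python model of that classic check, ignoring root's override (CAP_DAC_OVERRIDE), POSIX ACLs, and SELinux/AppArmor; may_access is a hypothetical name:

```python
# Requested-access bits, as used by access(2): read=4, write=2, execute=1
R, W, X = 4, 2, 1


def may_access(mode: int, file_uid: int, file_gid: int,
               uid: int, gids: set[int], want: int) -> bool:
    """Simplified DAC check: choose ONE class (owner, group, other), test only its bits.

    gids must contain the process's primary and supplementary group IDs.
    Ignores root's override, POSIX ACLs, and MAC (SELinux/AppArmor).
    """
    if uid == file_uid:
        bits = (mode >> 6) & 0o7  # owner class; group/other bits are never consulted
    elif file_gid in gids:
        bits = (mode >> 3) & 0o7  # group class
    else:
        bits = mode & 0o7         # other class
    return (bits & want) == want
```

For a file with mode 0o640 owned by 1000:100, a process running as UID 1001 in group 100 can read but not write it, and a process in neither the owner nor the group class gets nothing.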

#005 Bad magic number in super-block

Description: mount or fsck rejects a partition with “wrong fs type, bad option, bad superblock”.

Root cause: Filesystem superblock corruption or wrong filesystem type passed to mount.

Common cause: Bit rot, sudden power loss during a write to the superblock, or pointing mount at the wrong device.

Solution:

  • blkid /dev/sdaN to confirm what filesystem actually exists.
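  • For ext2/3/4: mke2fs -n /dev/sdaN is a dry run (the -n matters; without it mke2fs formats the device) that lists where backup superblocks would be, assuming the filesystem was created with default parameters. Restore with e2fsck -b BACKUP_NUM /dev/sdaN; with 4 KiB blocks, 32768 is usually the first backup.
  • For XFS: xfs_repair /dev/sdaN on the unmounted device; it searches for a secondary superblock on its own if the primary is bad.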
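
The numbers mke2fs -n prints are predictable: with the sparse_super feature, ext2/3/4 keep superblock backups only in block group 1 and in groups whose number is a power of 3, 5, or 7. A Python sketch that reproduces the list, assuming default geometry (blocks per group = 8 × block size, no custom -g, no sparse_super2); backup_superblocks is a hypothetical name, and mke2fs -n or dumpe2fs remain the authoritative sources:

```python
def _is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblocks(total_blocks: int, block_size: int = 4096) -> list[int]:
    """Block numbers of backup superblocks under sparse_super, assuming default geometry."""
    blocks_per_group = 8 * block_size  # one bitmap block covers 8 * block_size blocks
    first_data_block = 1 if block_size == 1024 else 0  # 1 KiB filesystems start at block 1
    groups = -(-(total_blocks - first_data_block) // blocks_per_group)  # ceiling division
    return [
        g * blocks_per_group + first_data_block
        for g in range(1, groups)  # group 0 holds the primary superblock
        if g == 1 or any(_is_power_of(g, b) for b in (3, 5, 7))
    ]
```

For a filesystem with 4 KiB blocks this yields 32768, 98304, 163840, 229376, 294912, and so on, which is why e2fsck -b 32768 /dev/sdaN is the usual first attempt.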

#006 Disk quota exceeded

Description: A user’s write fails despite the filesystem having free space.

Root cause: Quotas (per-user or per-group) cap how many blocks and inodes a user or group may consume.

Common cause: A user or group hit its hard limit, or stayed above its soft limit past the grace period, on a filesystem mounted with the usrquota/grpquota options in /etc/fstab.

Solution:

  • quota -u alice to see usage.
  • edquota -u alice to raise limits.
  • repquota -a for a report covering every quota-enabled filesystem.
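
The soft/hard distinction explains why a quota can start failing writes without anyone changing a limit: usage over the soft limit is tolerated only until the grace period expires. A simplified Python model of that decision, not the kernel's code (the kernel tracks blocks and inodes separately, each with its own limits and grace period); quota_allows is a hypothetical name:

```python
from typing import Optional


def quota_allows(used: int, request: int, soft: int, hard: int,
                 grace_deadline: Optional[float], now: float) -> bool:
    """Simplified block-quota check; a limit of 0 means 'no limit'.

    grace_deadline is when the soft-limit grace period ends, or None if the
    grace clock has not started (the user was not over the soft limit yet).
    """
    new_usage = used + request
    if hard and new_usage > hard:
        return False  # never allowed past the hard limit
    if soft and new_usage > soft and grace_deadline is not None and now > grace_deadline:
        return False  # over the soft limit and the grace period has run out
    return True  # within limits, or over soft but still inside the grace period
```

edquota -t shows and sets the grace periods, which is worth checking when writes start failing days after a user crossed the soft limit.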

#007 Stale file handle (NFS)

Description: Reads or writes to an NFS-mounted file fail with ESTALE.

Root cause: The file’s inode on the server changed (deleted or replaced) while the client still held a handle to it.

Common cause: Another client deleted the file; or the NFS server was restored from backup, changing inode numbers.

Solution:

  • umount -f /mnt/nfs && mount /mnt/nfs to refresh handles.
  • For long-running daemons: restart them; they cached the stale handle.
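
Restarting helps because a fresh open() looks the path up again and usually gets a new handle. A daemon can do the same thing itself instead of being restarted: on ESTALE, reopen by path and retry. This is a swapped-in technique, not something the NFS tools do for you; a minimal Python sketch, where read_fresh is a hypothetical name and the opener parameter exists only so the behaviour can be tested without an NFS mount:

```python
import errno
from typing import IO, Callable


def read_fresh(path: str, opener: Callable[[str], IO[str]] = open, retries: int = 1) -> str:
    """Read a file, reopening it by path if the NFS handle has gone stale (ESTALE)."""
    for attempt in range(retries + 1):
        try:
            with opener(path) as f:
                return f.read()
        except OSError as err:
            if err.errno != errno.ESTALE or attempt == retries:
                raise  # a different error, or out of retries
            # Otherwise loop: opening by path again forces a new lookup on the
            # server, which yields a fresh handle if the file was replaced.
    raise RuntimeError("retries must be >= 0")
```

If the file was deleted rather than replaced, the retry raises FileNotFoundError instead, which is the correct signal for the caller.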

#008 I/O error during read/write

Description: EIO returned when reading from or writing to a file.

Root cause: The block device returned an error to the kernel — bad sector, drive failure, or transient cable issue.

Solution:

  • dmesg -T | grep -i 'i/o error' to find the failing device and sector.
  • smartctl -a /dev/sdX for disk health.
  • Replace the disk if SMART pre-fail attributes have crossed their thresholds.
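
Pulling the device and sector out of the kernel log tells you which disk to run smartctl against. A Python sketch, assuming messages of the form "I/O error, dev sda, sector 2048 ..." (the exact wording differs between kernel versions, so check your own dmesg output first); failing_sectors is a hypothetical name:

```python
import re
from collections import defaultdict

# Assumed message shape; e.g. "blk_update_request: I/O error, dev sda, sector 2048 op 0x0:(READ) ..."
IO_ERROR = re.compile(r"I/O error, dev ([^,\s]+), sector (\d+)")


def failing_sectors(log_text: str) -> dict[str, list[int]]:
    """Group the distinct sector numbers from kernel I/O error lines by device name."""
    found = defaultdict(list)
    for match in IO_ERROR.finditer(log_text):
        device, sector = match.group(1), int(match.group(2))
        if sector not in found[device]:
            found[device].append(sector)  # keep first-seen order, drop repeats
    return dict(found)
```

Feed it the output of dmesg -T, for example subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout. Sector numbers in these messages are in 512-byte units.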

#009 Cannot create temp file (ENOSPC on /tmp)

Description: Applications fail with cryptic errors because /tmp is full.

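Solution:

  • find /tmp -type f -mtime +1 -delete removes files not modified in the last 48 hours; make sure no running process still needs them.
  • Add a systemd tmpfiles.d rule with an age field so systemd-tmpfiles --clean prunes old files automatically.
  • Move /tmp to tmpfs (RAM-backed), capped with the size= mount option so it cannot consume all memory.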
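
Before running a bulk delete on /tmp, it helps to see what would go. A dry-run Python sketch of the age-based cleanup that only lists candidates; stale_files is a hypothetical name, and max_age_seconds=2*86400 approximates find's -mtime +1:

```python
import os
import stat
import time
from typing import Optional


def stale_files(root: str, max_age_seconds: float, now: Optional[float] = None) -> list[str]:
    """List regular files under root unmodified for max_age_seconds. Deletes nothing."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # removed while we were walking
            # S_ISREG on lstat results skips symlinks, sockets, and FIFOs (like -type f)
            if stat.S_ISREG(st.st_mode) and now - st.st_mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

Review the list, then remove the files with os.remove, or run the find ... -delete command once you trust the selection.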

#010 fsck failed during boot

Description: Boot stops at “fsck failed; entering emergency shell”.

Root cause: Filesystem check found errors it couldn’t auto-repair.

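Solution:

  • Run fsck -y /dev/sdaN at the rescue prompt; -y answers yes to every repair question.
  • On XFS, run xfs_repair /dev/sdaN instead; fsck.xfs normally does nothing.
  • If the filesystem keeps breaking, the disk is probably failing: check it with smartctl and replace it.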
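
fsck's exit status is a bit mask documented in fsck(8), and it says exactly why the boot stopped: 4 means errors were left uncorrected, which is the case this entry describes. A small Python decoder, with decode_fsck_status as a hypothetical name:

```python
# Condition codes from fsck(8); the exit status is their bitwise OR
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(code: int) -> list[str]:
    """Split an fsck exit status into the conditions it encodes."""
    if code == 0:
        return ["no errors"]
    return [msg for bit, msg in FSCK_EXIT_BITS.items() if code & bit]
```

In the rescue shell, echo $? right after fsck prints the status to decode.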

Conclusion

Five habits that prevent most filesystem incidents:

  1. Monitor df -h AND df -i; bytes and inodes can run out independently.
  2. Configure logrotate on every server with logs.
  3. Run smartctl -a /dev/sdX as a periodic health check; replace pre-fail disks.
  4. Keep a separate /var mount so logs filling up doesn’t take down /.
  5. Test backups by actually restoring them; don't find out they are broken during an outage.
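
Monitoring both numbers needs nothing beyond statvfs(3). A Python sketch that computes byte and inode usage the way df does; f_bavail excludes root-reserved blocks, which is why df can report 100% while root still has room. fs_usage and check are hypothetical names, and the 90% threshold is an arbitrary example:

```python
import os
from typing import Optional, Tuple


def fs_usage(path: str) -> Tuple[float, Optional[float]]:
    """Return (byte use %, inode use %) for the filesystem holding path, computed like df.

    Inode % is None on filesystems that report no fixed inode count (f_files == 0).
    """
    st = os.statvfs(path)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize  # space available to unprivileged users
    byte_pct = 100.0 * used / (used + avail) if used + avail else 0.0
    inode_pct = None
    if st.f_files:
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return byte_pct, inode_pct


def check(path: str, threshold: float = 90.0) -> list[str]:
    """Return a warning for each of bytes and inodes at or above threshold percent."""
    byte_pct, inode_pct = fs_usage(path)
    warnings = []
    if byte_pct >= threshold:
        warnings.append(f"{path}: bytes {byte_pct:.0f}% used")
    if inode_pct is not None and inode_pct >= threshold:
        warnings.append(f"{path}: inodes {inode_pct:.0f}% used")
    return warnings
```

Run it from cron or a monitoring agent for every mounted filesystem and alert on any non-empty result.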

