Storage and RAID Errors
RAID and LVM errors are the highest-stakes Linux issues because data integrity is on the line. Most follow the same pattern: a disk returns errors, the array degrades, a rebuild begins, and the rebuild either succeeds (good) or fails halfway when another disk drops out (bad). The ten errors below are what the on-call engineer sees when storage hardware misbehaves.
#081 RAID array degraded
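Solution: cat /proc/mdstat shows the array status; mdadm --detail /dev/md0 gives the full diagnosis. To replace a failed disk, run mdadm /dev/md0 --fail /dev/sdc first if the kernel has not already marked it faulty, then mdadm /dev/md0 --remove /dev/sdc && mdadm /dev/md0 --add /dev/sde.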
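A degraded array shows an underscore in the member-status field of /proc/mdstat (for example [UU_] on a three-disk RAID 5). The sketch below, a hypothetical helper named degraded_md_arrays, prints the name of every such array so a cron job or monitoring check can alert on it. It assumes the usual /proc/mdstat layout, where the status field appears on the line after the array's header.

```bash
# Hypothetical helper: list md arrays whose member-status field (e.g. [UU_])
# contains "_", i.e. a missing or failed member. Reads /proc/mdstat-style text on stdin.
degraded_md_arrays() {
    awk '
        # Header line, e.g. "md1 : active raid5 sdc1[0] sdd1[1] sde1[3](F)"
        /^md[0-9]+ *:/ { name = $1 }
        # Status line carries a field made only of U and _, e.g. [UU] or [UU_]
        name != "" && match($0, /\[[U_]+]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
            name = ""
        }
    '
}
```

Usage: degraded_md_arrays < /proc/mdstat prints nothing when every array is healthy, so a check can simply alert on non-empty output.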
#082 RAID rebuild failed (second disk dropped during recovery)
Description: The worst case: the array is now in an inconsistent state.
Solution: Stop writes immediately. Image the failing disk with ddrescue if possible. Restore from backup. Don’t guess — this is when you call the storage vendor.
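A common GNU ddrescue workflow images the disk in two passes that share a mapfile, so an interrupted run can resume and the second pass only revisits the areas that failed. The helper below is a sketch of that workflow; the function name and the example paths are hypothetical, and the image and mapfile should live on a healthy disk, never on the failing array.

```bash
# Hypothetical helper: two-pass GNU ddrescue imaging of a failing disk.
# Usage: image_failing_disk /dev/sdX /mnt/rescue/sdX.img /mnt/rescue/sdX.map
image_failing_disk() {
    local src=$1 img=$2 map=$3
    if [ -z "$src" ] || [ -z "$img" ] || [ -z "$map" ]; then
        echo "usage: image_failing_disk DEVICE IMAGE MAPFILE" >&2
        return 2
    fi
    # Pass 1: grab everything that reads cleanly; -n skips the slow scraping phase.
    ddrescue -n "$src" "$img" "$map" || return
    # Pass 2: revisit only the unfinished areas recorded in the mapfile,
    # using direct disc access (-d) and up to 3 retry passes (-r3).
    ddrescue -d -r3 "$src" "$img" "$map"
}
```

Copying the healthy areas first gets the most data off the disk before the slow retries on bad sectors put more stress on it.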
#083 LVM: Volume group not found
Solution: Rescan with vgscan --cache, activate with vgchange -ay vg_name, and check pvs for a missing physical volume.
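pvs flags a missing physical volume with m as the third character of its attribute field, and usually shows its name as [unknown]. A minimal sketch, assuming that pvs(8) attribute layout, which lists the affected PVs and their volume groups; the helper name is hypothetical. Putting pv_attr first keeps the columns aligned even when a PV has no VG.

```bash
# Hypothetical helper: print PVs flagged missing (third pv_attr character is "m").
# Usage: pvs --noheadings -o pv_attr,pv_name,vg_name | missing_pvs
missing_pvs() {
    awk 'substr($1, 3, 1) == "m" { print $2, ($3 == "" ? "(no VG)" : $3) }'
}
```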
#084 LVM thin pool 100% full
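Description: Filesystem and OS appear fine (df inside the thin volume still shows free space) but writes fail.
Solution: lvs shows the pool's Data% and Meta% usage; writes stall or fail when either reaches 100%. Immediate fix: extend the pool with lvextend -L+50G vg/pool (the VG needs free extents), or grow full metadata with lvextend --poolmetadatasize +1G vg/pool. Long-term: enable autoextend in /etc/lvm/lvm.conf by setting thin_pool_autoextend_threshold below 100 (for example 80) and thin_pool_autoextend_percent (for example 20); this relies on dmeventd monitoring the pool.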
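Because the filesystem on a thin volume keeps reporting free space even when the pool is full, it is worth alerting on lvs directly. The sketch below is a hypothetical check that flags any thin pool whose data or metadata usage reaches a threshold (80% by default, an arbitrary choice). It assumes an LVM2 release that supports --select (-S) and that lvs prints plain decimal percentages, as in the C locale.

```bash
# Hypothetical helper: flag thin pools whose Data% or Meta% reaches a threshold.
# Input columns: vg_name lv_name data_percent metadata_percent
thin_pools_over() {
    awk -v limit="${1:-80}" '
        $3 + 0 >= limit || $4 + 0 >= limit {
            printf "%s/%s data=%s%% meta=%s%%\n", $1, $2, $3, $4
        }
    '
}
```

Usage: lvs --noheadings -o vg_name,lv_name,data_percent,metadata_percent -S segtype=thin-pool | thin_pools_over 80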
#085 SMART: Pre-fail attribute
Description: smartctl -a /dev/sdX shows the raw value of Reallocated_Sector_Ct (attribute 5) or Current_Pending_Sector (attribute 197) increasing.
Solution: Replace the disk preemptively. SMART pre-fail is the warning shot before catastrophic failure.
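A sketch for pulling those two counters out of smartctl -A output, assuming the usual ATA attribute table where the raw value is the tenth column; NVMe drives report a different format and are not covered. The function name is hypothetical. It prints both counters and returns non-zero when either is above zero, which makes it easy to wire into a periodic check.

```bash
# Hypothetical helper: report reallocated/pending sector raw values from the
# ATA attribute table; exit status 1 if either counter is non-zero.
# Usage: smartctl -A /dev/sdX | smart_sector_warnings
smart_sector_warnings() {
    awk '
        BEGIN { bad = 0 }
        $2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
            print $2 "=" $10
            if ($10 + 0 > 0) bad = 1
        }
        END { exit bad }
    '
}
```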
#086 multipath device: failed all paths
Solution: Run multipath -ll; check the fabric (FC switches, iSCSI portals); verify the LUN is still mapped on the array; rescan iSCSI sessions with iscsiadm -m session -R.
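multipath -ll lists each map followed by its path lines, where the field after the major:minor number is the device-mapper path state (active or failed). The sketch below, with a hypothetical function name, prints the maps that have no active path left. It assumes the default multipath -ll layout shown in its comments, which can vary between multipath-tools versions.

```bash
# Hypothetical helper: print multipath maps with no path in the "active"
# device-mapper state. Expects default `multipath -ll` output on stdin, e.g.
#   mpathb (3600...) dm-1 VENDOR,PRODUCT
#   size=50G features='0' hwhandler='0' wp=rw
#   `-+- policy='service-time 0' prio=0 status=enabled
#     |- 1:0:0:2 sdd 8:48 failed faulty running
maps_without_active_path() {
    awk '
        function report() { if (map != "" && active == 0) print map }
        # Map header: starts in column 1 and is not the "size=" line.
        /^[^|` ]/ && !/^size=/ { report(); map = $1; active = 0; next }
        # Path line: H:C:T:L, device, major:minor, then the dm path state.
        {
            for (i = 1; i <= NF - 3; i++) {
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                    if ($(i + 3) == "active") active++
                    break
                }
            }
        }
        END { report() }
    '
}
```

Usage: multipath -ll | maps_without_active_path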
#087 LVM snapshot exceeded its CoW pool
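Solution: A classic (non-thin) snapshot becomes invalid once its CoW space fills, and it cannot be recovered. Either extend it with lvextend before it fills (or set snapshot_autoextend_threshold in /etc/lvm/lvm.conf), or accept the loss and remove it with lvremove.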
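In lvs output, a classic snapshot has s as the first lv_attr character and an invalidated one has S. A hypothetical check that reports invalid snapshots, plus valid ones above a fill threshold (80% by default, an arbitrary choice) so they can be extended before they reach 100%:

```bash
# Hypothetical helper: report invalid snapshots (lv_attr starts with "S")
# and valid ones (lv_attr starts with "s") at or above a usage threshold.
# Usage: lvs --noheadings -o lv_attr,vg_name,lv_name,snap_percent | snapshot_status 80
snapshot_status() {
    awk -v limit="${1:-80}" '
        substr($1, 1, 1) == "S" { print $2 "/" $3 " INVALID"; next }
        substr($1, 1, 1) == "s" && $4 + 0 >= limit { print $2 "/" $3 " " $4 "%" }
    '
}
```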
#088 mdadm: insufficient devices to start array
Description: Two disks lost from a RAID 5; not enough members remain to reconstruct the data from parity.
Solution: Force-assemble at your own risk: mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1. This is a last resort; data integrity is not guaranteed.
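Before forcing, it helps to compare the event counters that mdadm --examine reports for each member: devices with equal or nearly equal counts hold consistent metadata, while one with a much lower count dropped out earlier and holds stale data. A sketch, assuming v1.x superblocks that print an "Events : N" line; the helper name is hypothetical.

```bash
# Hypothetical helper: list md members by superblock event count, highest first.
# Input is concatenated `mdadm --examine` output, one "/dev/xxx:" header per device.
md_event_counts() {
    awk '
        /^\/dev\/[^ ]*:$/ { dev = substr($0, 1, length($0) - 1); next }
        $1 == "Events" && $2 == ":" { print dev, $3 }
    ' | sort -k2,2nr
}
```

Usage: mdadm --examine /dev/sd[abd]1 | md_event_counts. The members at the top of the list are the candidates for the --force command; an array left half-assembled from an earlier attempt typically has to be stopped with mdadm --stop /dev/md0 first.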
#089 iSCSI session lost
Solution: journalctl -u iscsid; iscsiadm -m session to list; iscsiadm -m node -T target -p portal --login to reconnect.
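When several targets are configured, it helps to see which node records have no live session. The sketch below, a hypothetical helper, compares iscsiadm -m node output against iscsiadm -m session output and prints (rather than runs) the matching login command for each missing session, so the commands can be reviewed first. It assumes the usual line formats shown in its comments.

```bash
# Hypothetical helper: print a login command for every configured iSCSI node
# record (portal + target) that has no active session.
# Usage: missing_iscsi_sessions <(iscsiadm -m node) <(iscsiadm -m session 2>/dev/null)
missing_iscsi_sessions() {
    local nodes=$1 sessions=$2
    awk '
        # Session lines: "tcp: [1] 10.0.0.5:3260,1 iqn.2001-05.com.example:vol1 (non-flash)"
        FILENAME == ARGV[1] { if ($1 == "tcp:") active[$3 " " $4] = 1; next }
        # Node lines: "10.0.0.5:3260,1 iqn.2001-05.com.example:vol1"
        !(($1 " " $2) in active) {
            portal = $1
            sub(/,[0-9]+$/, "", portal)    # drop the ",TPGT" suffix for -p
            print "iscsiadm -m node -T " $2 " -p " portal " --login"
        }
    ' "$sessions" "$nodes"
}
```

Comparing by FILENAME rather than the common NR == FNR idiom keeps the check correct when there are no sessions at all and the session list is empty.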
#090 fstrim / discard not supported
Description: SSDs need TRIM to maintain performance; fstrim fails on certain filesystems/drivers.
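Solution: Verify with lsblk -d -o NAME,DISC-GRAN,DISC-MAX; zeros in both columns mean the device does not support discard. For LVM, note that issue_discards = 1 in /etc/lvm/lvm.conf only controls whether LVM discards space freed by lvremove or lvreduce; a thin pool has its own discard mode (ignore, nopassdown, or passdown), shown by lvs -o lv_name,discards and changed with lvchange --discards.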
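A small sketch that lists whole disks reporting no discard support, assuming lsblk is run with -b so the columns are plain byte counts; a DISC-MAX of 0 means the device accepts no discard requests. The function name is hypothetical.

```bash
# Hypothetical helper: print devices whose DISC-MAX is 0 (no discard support).
# Usage: lsblk -d -n -b -o NAME,DISC-GRAN,DISC-MAX | no_discard_devices
no_discard_devices() {
    awk 'NF >= 3 && $3 + 0 == 0 { print $1 }'
}
```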
Conclusion
- Monitor mdadm --detail output via Prometheus/check_mk; a degraded array is silent without monitoring.
- Replace SMART pre-fail disks BEFORE they fail completely.
- Test backup restores quarterly. The first time you find your backup is broken should not be during an outage.
- Use ddrescue, not dd, when imaging dying disks.
- RAID is not backup. Snapshots aren’t backups either. Real backups go off-host.
Related Linux Admin articles
- Linux Disk & Filesystem — the lsblk/mount/fstab reference
- Linux Filesystem & Disk Errors — for higher-level FS issues
- Linux Performance & Observability — for slow-but-not-broken storage