Two Jobs: Bundle and Squeeze
Linux separates archiving (bundling many files into one) from compression (making a single file smaller). tar bundles. gzip, bzip2, xz, and zstd compress. The classic .tar.gz file is one of each, layered: a tar archive that has then been run through gzip. zip and 7z do both jobs at once, and zip is supported almost everywhere, which is part of why it’s the cross-platform choice.
This article is a working reference for Linux archives and compression: tar in all its forms, picking the right compressor for the trade-off you care about, and the recipes that come up daily.
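To make the bundle-then-squeeze layering concrete, here is a minimal sketch that performs the two steps separately and then reads the result as an ordinary .tar.gz. The directory and file names (demo, a.txt, b.txt) are made up for illustration, and gzip -k assumes a gzip new enough to support it.

```shell
set -eu
work=$(mktemp -d)            # scratch directory for the demo
cd "$work"
mkdir -p demo
printf 'hello\n' > demo/a.txt
printf 'world\n' > demo/b.txt

# Step 1: bundle. tar produces one uncompressed file.
tar -cf demo.tar demo/

# Step 2: squeeze. gzip compresses that single file (-k keeps demo.tar).
gzip -k demo.tar             # writes demo.tar.gz

# The layered result is a normal .tar.gz that tar can list directly.
listing=$(tar -tzf demo.tar.gz)
printf '%s\n' "$listing"
```

In day-to-day use tar -czf does both steps in one command; splitting them here only shows what the single command is doing.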
The Compressors at a Glance
| Tool | Speed | Ratio | Best for |
|---|---|---|---|
| gzip / .gz | fast | OK | everywhere; the ubiquitous default |
| bzip2 / .bz2 | slow | good | tighter than gzip; legacy |
| xz / .xz | slowest | best | distribution archives, long-term storage |
| zstd / .zst | fast | good | modern default: near-gzip speed, near-xz ratio |
| lz4 / .lz4 | fastest | worst | real-time / streaming where latency matters |
For new work, prefer zstd. It’s the modern Pareto-optimal choice. For compatibility (sending to older systems or to recipients whose tooling you don’t know), use gzip.
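The ratings in the table are easy to check against your own data. The sketch below compresses one sample file with whichever of the five tools are installed and prints the resulting sizes. The sample (a run of numbers from seq) is only a stand-in; real ratios depend heavily on the input, so substitute a file that resembles what you actually archive.

```shell
set -eu
work=$(mktemp -d)
sample="$work/sample.txt"
# Stand-in sample data: repetitive text, which compresses well.
seq 1 200000 > "$sample"

printf '%-8s %s\n' tool bytes
printf '%-8s %s\n' none "$(wc -c < "$sample")"
for tool in gzip bzip2 xz zstd lz4; do
    # Skip compressors that are not installed on this machine.
    command -v "$tool" >/dev/null 2>&1 || continue
    # -c writes to stdout, so the sample file itself is left untouched.
    size=$("$tool" -c "$sample" | wc -c)
    printf '%-8s %s\n' "$tool" "$size"
done
```

This compares size only, at each tool's default level. Prefixing the compress command with time gives a rough view of the speed column.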
tar — the Tape ARchive
Create
tar -cf out.tar dir/ # bundle, no compression
tar -czf out.tar.gz dir/ # gzip
tar -cjf out.tar.bz2 dir/ # bzip2
tar -cJf out.tar.xz dir/ # xz (capital J)
tar -caf out.tar.zst dir/ # auto-detect from extension (modern tar)
tar -czvf out.tar.gz dir/ # add v for verbose
tar -czf out.tar.gz --exclude='*.log' dir/
tar -czf out.tar.gz --exclude-from=excludes.txt dir/
Extract
tar -xf in.tar # auto-detects compression (modern tar)
tar -xzf in.tar.gz # explicit gzip
tar -xJf in.tar.xz # explicit xz
tar -xzf in.tar.gz -C /tmp/ # extract TO /tmp/
tar -xzvf in.tar.gz # verbose
tar -xzf in.tar.gz path/to/file # extract a single file
Inspect
tar -tzf in.tar.gz # list contents (no extract)
tar -tzvf in.tar.gz # list with sizes/perms
tar -tzf in.tar.gz | head # peek at top entries
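The create, inspect, and extract commands above can be tried end to end in a scratch directory. This is a small sketch with made-up names (site, index.html, app.log): it builds an archive that excludes log files, lists it, and then extracts a single member into a separate directory.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p site/logs out
printf '<h1>hi</h1>\n' > site/index.html
printf 'noise\n'       > site/logs/app.log

# Create: gzip-compressed archive, leaving out anything matching *.log.
tar -czf site.tar.gz --exclude='*.log' site/

# Inspect: list members without extracting.
members=$(tar -tzf site.tar.gz)
printf '%s\n' "$members"

# Extract: pull one member into out/ instead of the current directory.
tar -xzf site.tar.gz -C out site/index.html
cat out/site/index.html
```

The member path given to -x has to match the name shown by -t exactly, including the leading site/.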
Memorizing the Flags
The flags are infamous. Mnemonic:
- c = create
- x = extract
- t = list
- f = file (always followed by filename)
- v = verbose
- z = gzip
- j = bzip2 (think "bzjp2"; a mnemonic stretch)
- J = xz (think "xJ"; capital)
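Modern tar detects the compression format from the archive’s contents whenever it reads one, so you can usually skip the z/j/J on extract and list. tar -xf works for any compression your tar supports. The file extension only matters on create, where -a picks the compressor from the output filename.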
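A quick way to confirm that your own tar auto-detects compression is to create an archive with an explicit flag and read it back without one. The names below (pkg, VERSION, restored) are placeholders for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p pkg restored
printf 'v1\n' > pkg/VERSION

tar -czf pkg.tar.gz pkg/        # compression named explicitly on create

# No z/j/J here: tar works out that the archive is gzip-compressed.
tar -tf pkg.tar.gz
tar -xf pkg.tar.gz -C restored
cat restored/pkg/VERSION
```

If either of the flagless commands fails on an older system, fall back to the explicit -z, -j, or -J form.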
Standalone Compressors
gzip file.txt # creates file.txt.gz, removes original
gzip -k file.txt # keep original
gunzip file.txt.gz # decompress
zcat file.txt.gz # cat without decompressing to disk
zless file.txt.gz # less, but compressed
zgrep "ERROR" app.log.gz # grep without decompressing first
bzip2 / bunzip2 / bzcat / bzless / bzgrep # same family for bz2
xz / unxz / xzcat / xzless / xzgrep # same for xz
zstd / unzstd / zstdcat / zstdless / zstdgrep # same for zstd
The zcat family is genuinely useful. Need to grep through last week’s rotated logs?
zgrep -h "ERROR" /var/log/app.log.*.gz | sort | uniq -c | sort -rn
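To see what that pipeline produces without touching real logs, the sketch below fabricates two small rotated log files (the contents are invented for the demo), compresses them, and runs the same zgrep chain against them.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Made-up rotated logs standing in for /var/log/app.log.*.gz
printf 'INFO start\nERROR disk full\nERROR disk full\n' > app.log.1
printf 'ERROR timeout\nINFO done\nERROR disk full\n'    > app.log.2
gzip app.log.1 app.log.2        # leaves app.log.1.gz and app.log.2.gz

# -h drops the filename prefix so identical lines from different files
# collapse together in uniq -c; sort -rn puts the most frequent first.
report=$(zgrep -h "ERROR" app.log.*.gz | sort | uniq -c | sort -rn)
printf '%s\n' "$report"
```

The report is a count-prefixed list of distinct error lines, most frequent first, which is usually the fastest way to spot the dominant failure in a week of logs.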
zip — When You Need Cross-Platform
zip -r archive.zip dir/ # create
zip -r archive.zip dir/ -x '*.git/*' # exclude
unzip archive.zip # extract
unzip -l archive.zip # list contents
unzip -d /tmp/ archive.zip # extract to dir
zip -er secret.zip files/ # password-protected (weak); -r recurses into files/
Use zip when sharing with Windows users. The encryption is weak; for real protection use gpg or 7-Zip’s AES.
7z — Strong Encryption + Best Compression
7z a archive.7z dir/ # add
7z a -p archive.7z dir/ # password-protected (AES-256)
7z a -p -mhe=on archive.7z dir/ # encrypt headers too (filenames hidden)
7z x archive.7z # extract
7z l archive.7z # list
Useful Recipes
Backup with timestamp
tar -czf backup-$(date +%Y%m%d).tar.gz /var/www/
Verify an archive
tar -tzf backup.tar.gz > /dev/null && echo "archive is OK"
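The check above is worth seeing fail at least once. This sketch builds a good archive, cuts a copy in half to imitate an interrupted transfer, and runs both through a small helper. The helper name check is hypothetical, and it adds gzip -t (which tests the compressed stream) in front of the tar listing.

```shell
set -u
work=$(mktemp -d)
cd "$work"
seq 1 50000 > data.txt
tar -czf backup.tar.gz data.txt

# Simulate an interrupted transfer: keep only the first half of the archive.
size=$(wc -c < backup.tar.gz)
head -c $((size / 2)) backup.tar.gz > truncated.tar.gz

# check is a hypothetical helper: gzip -t tests the compressed stream,
# then tar -t walks every member header.
check() {
    if gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1; then
        echo "$1: OK"
    else
        echo "$1: CORRUPT"
    fi
}
good=$(check backup.tar.gz)
bad=$(check truncated.tar.gz)
printf '%s\n%s\n' "$good" "$bad"
```

Listing an archive proves it is readable, not that its contents match the source; for that, compare checksums or extract to a scratch directory and diff.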
Stream a backup over SSH (no temp file)
tar -czf - /data | ssh remote "cat > /backup/data-$(date +%F).tar.gz" # push: local /data into a file on remote
ssh remote "tar -czf - /data" | tar -xzf - -C /local/ # pull: remote /data unpacked under /local/
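The streaming pattern can be rehearsed on one machine before pointing it at a real host. In this sketch the ssh hop is simply left out, so the two tar processes talk over a local pipe; the paths are scratch locations created for the demo.

```shell
set -eu
work=$(mktemp -d)
mkdir -p "$work/data/sub" "$work/local"
printf 'payload\n' > "$work/data/sub/file.txt"

# Sender writes the archive to stdout (-f -); receiver reads it from stdin.
# In the SSH form, an ssh invocation sits on one side of this pipe.
tar -czf - -C "$work" data | tar -xzf - -C "$work/local"

cat "$work/local/data/sub/file.txt"
```

Because nothing is written to disk between the two tars, the same shape works for trees larger than the free space on either side's temp directory.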
Compress with parallel CPUs
tar -cf - dir/ | pigz > out.tar.gz # parallel gzip
tar -cf - dir/ | pbzip2 > out.tar.bz2 # parallel bzip2
tar --use-compress-program='zstd -T0' -cf out.tar.zst dir/
pigz (parallel gzip) is a near-drop-in replacement that uses all your CPU cores. Worth installing on big-archive boxes.
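Scripts that run on several machines cannot always assume pigz is present. One possible approach, sketched here with placeholder file names, is to pick the compressor at run time and fall back to plain gzip.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p dir
seq 1 100000 > dir/numbers.txt

# Prefer pigz when it is installed; otherwise fall back to plain gzip.
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

# tar only bundles here; the chosen compressor handles the squeeze.
tar -cf - dir/ | "$compressor" > out.tar.gz

# pigz writes standard gzip data, so a stock tar -tzf can read it.
tar -tzf out.tar.gz
```

Either branch yields a file that any gzip-aware tool can open, so the recipient never needs pigz installed.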
Common Pitfalls
- tar flag confusion. tar c (create) vs tar x (extract). Running tar -cf against the name of an existing archive overwrites it: wrong flag + matching filename = data loss.
- tar creating from /. tar -czf /backup/full.tar.gz / recursively includes the backup itself. Use --exclude=/backup.
- Absolute path warnings. By default, tar strips the leading / for safety. To keep absolute paths: tar --absolute-names (rarely the right choice).
- Compression of already-compressed. Re-gzipping a JPEG or PDF saves nothing and burns CPU. Use tar -cf without compression for media-heavy archives.
- xz on a memory-constrained box. At the default level (-6) xz needs on the order of 100 MB to compress, but -9 needs several hundred MB, and multi-threaded mode (-T0) multiplies the cost per thread, which can reach several GB. Use xz -T1 or a lower compression level on small VMs.
- Forgetting -k with gzip. gzip file removes the original. gzip -k file keeps it.
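The first pitfall can be guarded against in scripts with a thin wrapper. The function below, safe_create, is a hypothetical helper and not a tar feature: it refuses to create an archive when a file of that name already exists.

```shell
set -u
# safe_create is a hypothetical wrapper, not part of tar.
# usage: safe_create ARCHIVE PATH...
safe_create() {
    archive=$1
    shift
    if [ -e "$archive" ]; then
        echo "refusing to overwrite $archive" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}

work=$(mktemp -d)
cd "$work"
mkdir -p dir
printf 'keep me\n' > dir/file.txt

safe_create backup.tar.gz dir/ && first=created
# A second run with the same name is rejected instead of clobbering it.
safe_create backup.tar.gz dir/ 2>/dev/null || second=refused
echo "${first:-} ${second:-}"
```

Timestamped archive names, as in the backup recipe, sidestep the same problem without a wrapper.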
Conclusion
Five habits:
- For new work: zstd. For compatibility: gzip.
- Always verify an archive after creating it: tar -tzf foo.tar.gz > /dev/null.
- Use --exclude patterns for log/cache directories.
- Stream backups over SSH instead of staging through temp files.
- Use pigz on multi-core machines for a free speedup.