Linux Archives & Compression: tar, gzip, xz, zstd, zip

Part of pathway: Linux Mastery: 300 Commands

Two Jobs: Bundle and Squeeze

Linux separates archiving (bundling many files into one) from compression (making a single file smaller). tar bundles; gzip, bzip2, xz, and zstd compress. The classic .tar.gz file is one of each, layered: a tar bundle that has then been gzipped. zip and 7z do both at once, and zip's presence on every platform is part of why it's the cross-platform choice.
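
To make the layering concrete, here is a minimal sketch that builds the same archive in two steps (tar, then gzip) and in one step (tar -z), then compares the listings. It assumes tar and gzip are installed; the directory and file names are made up for the demo.

```shell
# Assumption: throwaway temp directory; file names are hypothetical.
work=$(mktemp -d)
mkdir "$work/demo"
printf 'hello\n' > "$work/demo/a.txt"
printf 'world\n' > "$work/demo/b.txt"

# Step 1: bundle only (no compression).
tar -cf "$work/demo.tar" -C "$work" demo
# Step 2: compress the single bundle; -c writes to stdout so demo.tar is kept.
gzip -c "$work/demo.tar" > "$work/demo.tar.gz"

# One-step equivalent: tar runs gzip for you.
tar -czf "$work/onestep.tar.gz" -C "$work" demo

# Both archives should list the same members.
two_step=$(tar -tzf "$work/demo.tar.gz")
one_step=$(tar -tzf "$work/onestep.tar.gz")
if [ "$two_step" = "$one_step" ]; then echo "same contents"; fi
rm -rf "$work"
```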

This article is a working reference for Linux archives and compression: tar in all its forms, picking the right compressor for the trade-off you care about, and the recipes that come up daily.

The Compressors at a Glance

Tool            Speed    Ratio  Best for
gzip / .gz      fast     OK     everywhere; the ubiquitous default
bzip2 / .bz2    slow     good   tighter than gzip; legacy
xz / .xz        slowest  best   distribution archives, long-term storage
zstd / .zst     fast     good   modern default: near-gzip speed, near-xz ratio
lz4 / .lz4      fastest  worst  real-time / streaming where latency matters

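For new work, prefer zstd. It is fast in both directions and compresses nearly as well as xz, which makes it the modern Pareto-optimal choice. For compatibility (sending to older systems, or to people whose tooling you don't know), use gzip.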
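
Ratios depend heavily on the input, so it is worth measuring on your own data. The sketch below compares output sizes for whichever compressors happen to be installed, using generated log-like text as a stand-in; the sample content is an assumption, and it measures size only, not speed.

```shell
# Assumption: generated sample text; real ratios depend on your data.
work=$(mktemp -d)
i=0
while [ "$i" -lt 20000 ]; do
  echo "2024-01-01 12:00:00 INFO request id=$i status=200 path=/api/items"
  i=$((i + 1))
done > "$work/sample.log"

orig_size=$(wc -c < "$work/sample.log" | tr -d ' ')
echo "original: $orig_size bytes"

for tool in gzip bzip2 xz zstd lz4; do
  if command -v "$tool" >/dev/null 2>&1; then
    # -c writes to stdout, so the input file is left untouched.
    size=$("$tool" -c "$work/sample.log" 2>/dev/null | wc -c | tr -d ' ')
    echo "$tool: $size bytes"
  else
    echo "$tool: not installed, skipped"
  fi
done

# Kept separately so the gzip result can be checked afterwards.
gz_size=$(gzip -c "$work/sample.log" | wc -c | tr -d ' ')
rm -rf "$work"
```

Swap sample.log for a representative file of your own and add `time` in front of each command if speed matters as much as size.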

tar — the Tape ARchive

Create

tar -cf out.tar dir/                     # bundle, no compression
tar -czf out.tar.gz dir/                 # gzip
tar -cjf out.tar.bz2 dir/                # bzip2
tar -cJf out.tar.xz dir/                 # xz (capital J)
tar -caf out.tar.zst dir/                # auto-detect from extension (modern tar)
tar -czvf out.tar.gz dir/                # add v for verbose
tar -czf out.tar.gz --exclude='*.log' dir/
tar -czf out.tar.gz --exclude-from=excludes.txt dir/
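
The exclude options are easier to trust once you have seen what they drop. A small sketch, assuming GNU tar pattern matching; the directory layout and the contents of excludes.txt are hypothetical.

```shell
# Assumption: GNU tar; all names below are made up for the demo.
work=$(mktemp -d)
mkdir -p "$work/site/cache" "$work/site/src"
echo 'code'  > "$work/site/src/app.py"
echo 'noise' > "$work/site/debug.log"
echo 'tmp'   > "$work/site/cache/blob.bin"

# One pattern per line.
cat > "$work/excludes.txt" <<'EOF'
*.log
site/cache
EOF

# Put the exclude option before the paths it should apply to.
tar -czf "$work/site.tar.gz" --exclude-from="$work/excludes.txt" -C "$work" site

# List the result to confirm what was kept.
members=$(tar -tzf "$work/site.tar.gz")
printf '%s\n' "$members"
rm -rf "$work"
```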

Extract

tar -xf in.tar                           # auto-detects compression (modern tar)
tar -xzf in.tar.gz                       # explicit gzip
tar -xJf in.tar.xz                       # explicit xz
tar -xzf in.tar.gz -C /tmp/              # extract TO /tmp/
tar -xzvf in.tar.gz                      # verbose
tar -xzf in.tar.gz path/to/file          # extract a single file
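
Single-file extraction only works if the path matches the member name exactly as stored, so list first and extract second. A minimal sketch with hypothetical file names:

```shell
# Assumption: throwaway temp directory; file names are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/out"
echo 'port=8080' > "$work/src/etc/app.conf"
echo 'readme'    > "$work/src/README"
tar -czf "$work/in.tar.gz" -C "$work" src

# Find the exact member name as stored in the archive.
tar -tzf "$work/in.tar.gz" | grep 'app.conf'

# Extract only that member, into a different directory.
tar -xzf "$work/in.tar.gz" -C "$work/out" src/etc/app.conf

extracted=$(cat "$work/out/src/etc/app.conf")
# The rest of the archive should not have been unpacked.
if [ -e "$work/out/src/README" ]; then readme=present; else readme=absent; fi
echo "extracted: $extracted (README $readme)"
rm -rf "$work"
```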

Inspect

tar -tzf in.tar.gz                       # list contents (no extract)
tar -tzvf in.tar.gz                      # list with sizes/perms
tar -tzf in.tar.gz | head                # peek at top entries

Memorizing the Flags

The flags are infamous. Mnemonic:

  • c = create
  • x = extract
  • t = list
  • f = file (always followed by filename)
  • v = verbose
  • z = gzip
  • j = bzip2 (a stretch: think "bzjp2")
  • J = xz (capital J: think "xJ")

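GNU tar and bsdtar detect compression automatically on extract, by inspecting the archive's contents rather than its extension, so you can usually skip the z/j/J when extracting: tar -xf works for any format your tar supports. The extension only matters on create with -a, and zstd needs a reasonably recent release (GNU tar 1.31 or later).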
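
A quick way to see the detection at work is to extract a gzip-compressed archive without the z flag. In this sketch the archive is also given a misleading name, which GNU tar and bsdtar should tolerate because they look at the file's contents; other tar implementations may behave differently.

```shell
# Assumption: GNU tar or bsdtar; file names are hypothetical.
work=$(mktemp -d)
mkdir "$work/d" "$work/out"
echo 'data' > "$work/d/file.txt"
tar -czf "$work/real.tar.gz" -C "$work" d

# Give the gzip-compressed archive a misleading name.
cp "$work/real.tar.gz" "$work/misnamed.tar"

# No z flag: tar identifies gzip from the file's leading bytes.
tar -xf "$work/misnamed.tar" -C "$work/out"

recovered=$(cat "$work/out/d/file.txt")
echo "recovered: $recovered"
rm -rf "$work"
```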

Standalone Compressors

gzip file.txt                            # creates file.txt.gz, removes original
gzip -k file.txt                         # keep original
gunzip file.txt.gz                       # decompress
zcat file.txt.gz                         # cat without decompressing to disk
zless file.txt.gz                        # less, but compressed
zgrep "ERROR" app.log.gz                  # grep without decompressing first

bzip2 / bunzip2 / bzcat / bzless / bzgrep    # same family for bz2
xz / unxz / xzcat / xzless / xzgrep          # same for xz
zstd / unzstd / zstdcat / zstdless / zstdgrep  # same for zstd

The zcat family is genuinely useful. Need to grep through last week’s rotated logs?

zgrep -h "ERROR" /var/log/app.log.*.gz | sort | uniq -c | sort -rn
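
The same pipeline can be tried on throwaway data before running it against /var/log. A minimal sketch with two fabricated rotated logs; the log lines are invented for the demo.

```shell
# Assumption: invented log lines in a temp directory.
work=$(mktemp -d)
printf 'ERROR disk full\nINFO ok\nERROR disk full\n' > "$work/app.log.1"
printf 'ERROR timeout\nINFO ok\n' > "$work/app.log.2"
gzip "$work/app.log.1" "$work/app.log.2"

# -h drops the filename prefix so identical messages collapse in uniq -c.
report=$(zgrep -h "ERROR" "$work"/app.log.*.gz | sort | uniq -c | sort -rn)
printf '%s\n' "$report"

# Count on the most frequent line.
top=$(printf '%s\n' "$report" | head -n 1 | awk '{print $1}')
rm -rf "$work"
```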

zip — When You Need Cross-Platform

zip -r archive.zip dir/                  # create
zip -r archive.zip dir/ -x '*.git/*'     # exclude
unzip archive.zip                        # extract
unzip -l archive.zip                     # list contents
unzip archive.zip -d /tmp/               # extract to dir
zip -er secret.zip files/                # password-protected (weak); -r to include the directory's contents

Use zip when sharing with Windows users. The encryption is weak; for real protection use gpg or 7-Zip’s AES.

7z — Strong Encryption + Best Compression

7z a archive.7z dir/                     # add
7z a -p archive.7z dir/                  # password-protected (AES-256)
7z a -p -mhe=on archive.7z dir/          # encrypt headers too (filenames hidden)
7z x archive.7z                          # extract
7z l archive.7z                          # list

Useful Recipes

Backup with timestamp

tar -czf backup-$(date +%Y%m%d).tar.gz /var/www/

Verify an archive

tar -tzf backup.tar.gz > /dev/null && echo "archive is OK"
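
To see the check fail as well as pass, the sketch below wraps it in a small function and runs it against a good archive and a deliberately truncated copy. The function name check_archive is hypothetical, and the extra gzip -t step is an addition that tests the compressed stream on its own.

```shell
# Assumption: check_archive is a made-up helper name for this demo.
work=$(mktemp -d)
mkdir "$work/d"
echo 'payload' > "$work/d/f.txt"
tar -czf "$work/good.tar.gz" -C "$work" d

# Simulate an interrupted transfer by keeping only the first 20 bytes.
head -c 20 "$work/good.tar.gz" > "$work/bad.tar.gz"

check_archive() {
  # gzip -t tests the compressed stream; tar -t then walks every tar header.
  gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

if check_archive "$work/good.tar.gz"; then good=ok; else good=broken; fi
if check_archive "$work/bad.tar.gz";  then bad=ok;  else bad=broken;  fi
echo "good: $good, bad: $bad"
rm -rf "$work"
```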

Stream a backup over SSH (no temp file)

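tar -czf - /data | ssh remote "cat > /backup/data-$(date +%F).tar.gz"   # push: local /data to a file on remote (date expands locally)
ssh remote "tar -czf - /data" | tar -xzf - -C /local/                   # pull: remote /data unpacked under /local/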
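
Before pointing either recipe at a real host, the pipe can be rehearsed locally. In this sketch a subshell stands in for the remote side, so no SSH connection is needed; the directory names are hypothetical.

```shell
# Assumption: a local subshell plays the role of the remote end.
work=$(mktemp -d)
mkdir -p "$work/data/sub" "$work/restore"
echo 'row' > "$work/data/sub/table.csv"

# Same shape as the SSH recipes: tar writes to stdout (-f -),
# the other side reads the stream from stdin and unpacks it.
tar -czf - -C "$work" data | (cd "$work/restore" && tar -xzf -)

restored=$(cat "$work/restore/data/sub/table.csv")
echo "restored: $restored"
rm -rf "$work"
```

Note the explicit z on the receiving side: when reading from a pipe, GNU tar cannot seek back to sniff the format, so it is safest to name the compressor.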

Compress with parallel CPUs

tar -cf - dir/ | pigz > out.tar.gz       # parallel gzip
tar -cf - dir/ | pbzip2 > out.tar.bz2    # parallel bzip2
tar --use-compress-program='zstd -T0' -cf out.tar.zst dir/

pigz (parallel gzip) is a near-drop-in replacement that uses all your CPU cores. Worth installing on big-archive boxes.
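
Scripts that run on mixed machines can pick pigz when it is present and fall back to gzip otherwise. A minimal sketch; the output is a normal .tar.gz either way, and the variable name compressor is arbitrary.

```shell
# Assumption: gzip is always available; pigz is optional.
work=$(mktemp -d)
mkdir "$work/dir"
echo 'content' > "$work/dir/f.txt"

# Prefer pigz when installed, otherwise use plain gzip.
if command -v pigz >/dev/null 2>&1; then
  compressor=pigz
else
  compressor=gzip
fi
echo "using $compressor"

tar -cf - -C "$work" dir | "$compressor" > "$work/out.tar.gz"

# Either tool produces gzip format, so ordinary tar -z can read it.
listing=$(tar -tzf "$work/out.tar.gz")
printf '%s\n' "$listing"
rm -rf "$work"
```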

Common Pitfalls

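  • tar flag confusion. tar c (create) vs tar x (extract). Running tar -cf against an existing archive name silently overwrites it, so the wrong flag plus a matching filename means data loss.
  • tar creating from /. tar -czf /backup/full.tar.gz / sweeps up everything already under /backup (older backups included), along with pseudo-filesystems such as /proc and /sys. Use --exclude=/backup, and consider --one-file-system.
  • Absolute path warnings. By default, tar strips leading / for safety. To keep absolute paths: tar --absolute-names (rarely the right choice).
  • Compression of already-compressed. Re-gzipping a JPEG or PDF saves nothing and burns CPU. Use tar -cf without compression for media-heavy archives.
  • xz on a memory-constrained box. xz's memory use grows quickly with the compression level and thread count: the default level (-6) needs roughly 100 MB to compress, -9 needs several hundred MB, and multithreaded mode (-T0) multiplies that per thread. On small VMs, use a lower level, stick to -T1, or cap usage with --memlimit-compress.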
  • Forgetting -k with gzip. gzip file removes the original. gzip -k file keeps it.
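
The backup-inside-the-backup pitfall can be reproduced safely in a scratch directory. The sketch below archives a tree into its own backup/ subdirectory while excluding that subdirectory; it assumes GNU tar pattern matching, and the layout is invented for the demo.

```shell
# Assumption: GNU tar; "root" here is a scratch directory, not the real /.
work=$(mktemp -d)
mkdir -p "$work/root/etc" "$work/root/backup"
echo 'conf' > "$work/root/etc/app.conf"
echo 'old'  > "$work/root/backup/old.tar.gz"

# Write the archive into backup/ and exclude that directory from the walk.
( cd "$work/root" && tar -czf backup/full.tar.gz --exclude=./backup . )

# Confirm that nothing from backup/ ended up inside the archive.
members=$(tar -tzf "$work/root/backup/full.tar.gz")
printf '%s\n' "$members"
rm -rf "$work"
```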

Conclusion

Five habits:

  1. For new work: zstd. For compatibility: gzip.
  2. Always test an archive after creating it: tar -tzf foo.tar.gz > /dev/null reads it end to end.
  3. Use --exclude patterns for log/cache directories.
  4. Stream backups over SSH instead of staging through temp files.
  5. Use pigz on multi-core machines for a nearly free speedup.
