The Unix Philosophy in One Pipeline
The Linux command line is built on a 50-year-old idea: small tools that do one thing well, connected by pipes that pass text streams between them. Master four utilities — grep, awk, sed, and sort — and you can transform almost any text data without writing a script. Add uniq, wc, cut, and tr for the supporting cast.
This article is a working reference for Linux text processing on the command line, with the option flags that matter and the pipeline patterns that come up in production work.
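To see the idea in miniature, here is the classic word-frequency pipeline (a sketch, assuming a plain-text file named essay.txt), built entirely from the tools covered below:
tr -cs '[:alpha:]' '\n' < essay.txt \
| tr 'A-Z' 'a-z' \
| sort \
| uniq -c \
| sort -rn \
| head -10
Each stage does one job: split into words, lowercase, group, count, rank, trim. That composition is the whole philosophy.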
grep — Find Lines That Match
grep "ERROR" app.log # lines containing ERROR
grep -i "error" app.log # case-insensitive
grep -v "DEBUG" app.log # invert: lines NOT matching
grep -r "TODO" src/ # recursive search in directory
grep -n "def main" *.py # show line numbers
grep -E "^(WARN|ERROR)" app.log # extended regex (alternation)
grep -A 3 -B 1 "ERROR" app.log # show 3 lines after, 1 before
grep -c "404" access.log # just the count
grep -l "TODO" *.py # filenames only
grep -L "TODO" *.py # filenames that DON'T match
For most modern work, prefer ripgrep (rg) if installed — it’s faster, respects .gitignore, and uses sane defaults. But every Linux system ships with grep, so know it cold.
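A few rough grep equivalents, assuming rg is installed (recursion and line numbers are on by default):
rg "ERROR" app.log # like grep "ERROR" app.log
rg -i "error" # case-insensitive, recursive from the current directory
rg -l "TODO" # filenames only
rg --type py "def main" # restrict the search to Python files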
awk — Field-Oriented Filtering
awk treats each input line as a record split into fields. $1 through $NF are the fields, $0 is the whole line, and the default delimiter is whitespace.
awk '{print $1}' access.log # first column
awk '{print $1, $7}' access.log # IP and URL
awk -F: '{print $1}' /etc/passwd # custom field separator
awk '$3 > 1000 {print $1}' data.txt # filter by numeric condition
awk 'NR==1 || NR==10' file.txt # rows 1 and 10
awk 'END {print NR}' file.txt # total line count
awk '/error/ {count++} END {print count}' log # count matching lines
awk -F'[,;]' '{print $2}' data.csv # regex separator: comma or semicolon
awk is a full programming language, but for one-liners the core form is condition { action }. The default action is print $0 (print the line); the default condition is true, matching every line.
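Two minimal sketches of those defaults, against an arbitrary file.txt:
awk 'NR % 2 == 1' file.txt # condition only: the default action prints the line
awk '{print NF}' file.txt # action only: the default condition matches every line
awk '$1 == "root" {print $NF}' file.txt # both: filter by first field, print the last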
sed — Stream Editor
sed 's/old/new/' file.txt # replace first occurrence per line
sed 's/old/new/g' file.txt # replace all occurrences
sed -i 's/old/new/g' file.txt # in-place edit (modifies file!)
sed -i.bak 's/old/new/g' file.txt # in-place with backup
sed '/^#/d' config # delete comment lines
sed '5,10d' file.txt # delete lines 5-10
sed -n '5,10p' file.txt # print only lines 5-10
sed 's|/old/path|/new/path|g' file # alternate delimiter (avoid escaping /)
Use sed -i with caution: it silently overwrites the file. Test the substitution without -i first, or use -i.bak so you have a backup if it goes wrong.
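One safe workflow, sketched against a hypothetical file.txt:
sed 's/old/new/g' file.txt | diff file.txt - # preview the edit as a diff (no output = no change)
sed -n 's/old/new/gp' file.txt # print only the lines that would change
sed -i.bak 's/old/new/g' file.txt # commit, keeping file.txt.bak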
sort and uniq
sort file.txt # alphabetical
sort -n file.txt # numeric (so 100 comes after 9)
sort -r file.txt # reverse
sort -k2 file.txt # by second column
sort -k2 -n -r file.txt # second column, numeric, reverse
sort -t: -k3 -n /etc/passwd # delimiter colon, third field, numeric
sort -u file.txt # unique (also see uniq)
uniq file.txt # remove adjacent duplicates
uniq -c file.txt # prefix each line with its repeat count
uniq -d file.txt # show only duplicates
sort file.txt | uniq -c | sort -rn # frequency-sort (the classic)
uniq only removes adjacent duplicates — you almost always want to sort first. The classic sort | uniq -c | sort -rn pipeline produces a frequency-ranked list, ubiquitous in log analysis.
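A two-line demonstration of why the sort matters:
printf 'a\nb\na\n' | uniq # prints a, b, a (the two a's aren't adjacent)
printf 'a\nb\na\n' | sort | uniq # prints a, b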
Counting and Slicing
wc -l file.txt # line count
wc -w file.txt # word count
wc -c file.txt # byte count
head -20 file.txt # first 20 lines (default 10)
tail -20 file.txt # last 20 lines
tail -f app.log # follow new appends (live)
tail -f app.log | grep ERROR # live filtered tail
cut -d: -f1 /etc/passwd # field 1, colon-delimited
cut -c1-10 file.txt # first 10 characters per line
tr — Character-Level Transforms
echo "hello" | tr 'a-z' 'A-Z' # uppercase
echo "a,b,c" | tr ',' '\n' # comma to newline
tr -d '\r' < dos.txt > unix.txt # strip Windows CR characters
tr -s ' ' < file.txt # squeeze runs of spaces to one
tr -cd '[:print:]' < binary > ascii # keep only printable chars
Putting It Together — Real Pipelines
Top 10 IPs hitting your nginx with 404s
grep " 404 " /var/log/nginx/access.log \
| awk '{print $1}' \
| sort \
| uniq -c \
| sort -rn \
| head -10
Users sorted by login count (via last)
last \
| awk '$1 != "" && $1 != "wtmp" && $1 != "reboot" {print $1}' \
| sort \
| uniq -c \
| sort -rn
Find every file containing “TODO” modified in the last week
find . -mtime -7 -type f -exec grep -l "TODO" {} \;
Replace a config value across many files
find /etc/nginx -name "*.conf" \
-exec sed -i.bak 's/listen 80;/listen 443 ssl;/g' {} \;
xargs — Feed Output as Arguments
find . -name "*.tmp" | xargs rm # delete found files
find . -name "*.tmp" -print0 | xargs -0 rm # null-separated for filenames with spaces
echo "file1 file2" | xargs touch # create both files
ls *.log | xargs -I {} mv {} archive/ # placeholder substitution
Always use -print0 with xargs -0 when dealing with files. Filenames with spaces, newlines, or special characters break the default whitespace-separated mode.
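To see the failure mode, assume a scratch directory containing a single file named "old report.tmp":
find . -name "*.tmp" | xargs rm # xargs splits on the space: rm gets ./old and report.tmp, two bogus args
find . -name "*.tmp" -print0 | xargs -0 rm # one NUL-terminated argument: deletes the file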
Common Pitfalls
- grep regex vs literal. Use grep -F "1.2.3.4" for a fixed-string match (no regex interpretation of dots); otherwise 1.2.3.4 matches almost anything. See the demo below.
- sed -i destroys files silently. Test without -i first.
- uniq needs sorted input. Pipe through sort first or it only removes adjacent duplicates.
- awk fields reset on every line. Don't expect $1 from one line to persist into the next.
- tail -f doesn't follow rotation. Use tail -F (capital) to handle log-rotated files.
- xargs without -0 on filenames. One file with a space in the name and the whole pipeline misbehaves.
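The fixed-string pitfall from the first bullet, demonstrated against a hypothetical access.log:
grep "1.2.3.4" access.log # regex: each dot matches any character, so 1020304 matches too
grep -F "1.2.3.4" access.log # fixed string: only the literal 1.2.3.4 matches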
Conclusion
Five compounding habits:
- Pipe everything. command | head -20 tests cheaply; drop the head when you're confident.
- Always grep -i by default for log searches. Cases vary unpredictably.
- Reach for awk when the data is column-oriented, sed for line-oriented edits, cut for the simplest field extraction.
- sort | uniq -c | sort -rn is your friend. Memorize it.
- Always test sed -i changes on a copy first, or use sed -i.bak.