grep is a command-line tool for searching for lines that match a regular expression. In Unix-like systems, like Linux or macOS, everything is a file, and more precisely everything is a stream of bytes. A file is a collection of bytes that you can read and/or write. A process refers to an open file through a file descriptor (fd). This approach allows us to use the same set of system calls to access any given resource and, consequently, the same set of tools, like grep.
Write programs to handle text streams, because that is a universal interface.
(Doug McIlroy, Unix pioneer)
If you know how to handle text streams, you can make your Linux life a lot easier.
Of the great Unix text-processing tools, grep is one of the oldest.
Everything it does is based on finding patterns in lines of text and printing the lines that match. Yet, despite its age and simplicity, grep gives you a great deal of power and flexibility.
As you learn to use grep, you’ll find more and more problems that have easy grep solutions.
Here are some typical uses of grep:
- Finding where an expression occurs in a file or directory
- Counting how many times an expression occurs
- Filtering output from a different program
grep combines very well with other programs. The last part of this how-to section gives you an example of how to use grep to make a simple reporting tool.
Tutorials: searching for text patterns in Romeo and Juliet
Finding text patterns is probably the most common use of grep.
First, we need a source file to work with. Any text that has multiple lines is a good place to start. How about a play by Shakespeare? (Server logs are a more traditional example.)
If you want to follow along, you can use Project Gutenberg’s public-domain version of Romeo and Juliet. You can download the file from your shell using curl, like this:
curl -o romeo-and-juliet.txt https://www.gutenberg.org/files/1513/1513-h/1513-h.htm
Now, let’s define a goal. In the play, “Nurse” is an important character. Our task is to answer these questions:
- How many times is the word “nurse” spoken in the play?
- How many times does the character “Nurse” speak?
Tutorial 1: Find all lines where the word “nurse” is spoken
Inspect the file for patterns.
To make precise queries, it helps to know something about the structure of the file. A quick look at romeo-and-juliet.txt tells us that:
- A person’s name is introduced in all capitals
- A person’s speech begins with their name in all caps followed by a period, e.g. ROMEO.
- All dialogue is separated by a blank line
- Stage directions are written in brackets with underscores, e.g. [_Exit Nurse._]
Find a simple expression.
Let’s look for the expression nurse:
grep "nurse" romeo-and-juliet.txt
Surprisingly, this yields only two results, a very small number for a major character!
But the results make sense. In this play, “Nurse” is a proper name, so the character’s name starts with a capital N, and grep is case-sensitive by default.
Search again with proper capitalization:
grep "Nurse" romeo-and-juliet.txt
Now we’ve got many more results.
Search for multiple case patterns
If you want to find all times that the word “Nurse” appears in the play’s dialogue, the first obvious thing to do would be a case-insensitive search. grep has an -i flag, which makes the search ignore case:
grep -i "Nurse" romeo-and-juliet.txt
But wait: now we have too many results! Many of the output lines just say NURSE. These lines indicate who’s speaking the dialogue. For now, we want to find only lines where the word “nurse” is spoken inside the text. That is, we want to print lines with the expressions “Nurse” and “nurse”, but not “NURSE”.
Use bracket expressions to find multiple combinations
To search for Nurse and nurse together, the simplest way is with a bracket expression. A bracket expression matches any one of the characters inside the brackets, so [nN] matches either n or N:
grep "[nN]urse" romeo-and-juliet.txt
Almost there! But there are still the stage directions, like this one:
[_Exit Nurse._]
Filter output using regex, pipes, and the -v option.
We already have most of the output we need. We just need to filter out lines where Nurse appears between the expressions [_ and _]. To do this, we can use the regular expression \[_.*Nurse.*_]. Here, .* means zero or more of any character. The opening bracket is escaped as \[ so that grep treats it as a literal [ instead of the start of a bracket expression.
A simple way to narrow further is to pipe our output into another grep command. The -v flag inverts matches: it prints only lines that do not contain the expression.
grep "[nN]urse" romeo-and-juliet.txt | grep -v "\[_.*Nurse.*_]"
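You can check the bracket expression and the -v filter the same way. The sketch below is minimal and runs on invented sample lines, not quotations from the play. It assumes any POSIX grep. The first command keeps “Nurse” and “nurse” but not “NURSE”. The second command drops the bracketed stage direction.

```shell
#!/bin/bash
# Invented sample in the play's layout (not quotations).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'NURSE.' 'Where is the nurse?' 'Nurse, come here.' '[_Exit Nurse._]' > "$sample"

# [nN] matches n or N, so "NURSE." (capital U) is not selected.
grep "[nN]urse" "$sample"

# \[ is a literal "[", .* is any run of characters, and -v keeps
# only the lines that do NOT match the stage-direction pattern.
grep "[nN]urse" "$sample" | grep -v "\[_.*Nurse.*_]"
```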
Great! We have successfully…oh, wait.
It seems that Gutenberg uses a special format for stage directions when a person enters. So, we also need to filter out lines that look like this:
Re-enter Nurse.
Enter Nurse.
Use the -e flag for multiple expressions.
Each -e adds another pattern, and -v drops every line that matches any of them:
grep "[Nn]urse" romeo-and-juliet.txt \
  | grep -v -e " Re-enter\| Enter" -e "\[_.*Nurse.*_]"
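👉 We must escape the | character. Otherwise, grep will interpret it literally. In grep’s default basic regular expressions, \| means “or” (a GNU extension), while a bare | is just a pipe character.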
Congrats! Now we’ve really found every line where the word “nurse” is spoken in Romeo and Juliet. To do this, we used a combination of search, simple regex, and filters. This script is not very efficient or robust, but for the purpose of our task, it’s acceptable.
The text is not going to change, and it’s not very long. Once our search prints what we need, we probably don’t need to run it many more times.
If the text were long and dynamically changing, and the script needed to be run often, we’d probably need a more robust solution. See the section “When is grep not so great?” below.
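For comparison, here is the final filter from this tutorial next to an equivalent -E version. Both run on invented sample lines. The entrance lines are indented, on the assumption that this is why the patterns start with a space. With -E, | means “or” without a backslash. The \| form relies on GNU grep, so this sketch assumes GNU grep.

```shell
#!/bin/bash
# Invented sample lines; the entrance lines are indented on purpose.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Where is the nurse?' 'Nurse, come here.' '[_Exit Nurse._]' \
  '    Enter Nurse.' '    Re-enter Nurse.' > "$sample"

# Basic regex (grep's default): two -e patterns, GNU-style \| in the first.
grep "[Nn]urse" "$sample" \
  | grep -v -e " Re-enter\| Enter" -e "\[_.*Nurse.*_]"

# Extended regex (-E): a bare | is alternation, so one pattern is enough.
grep "[Nn]urse" "$sample" \
  | grep -v -E " Re-enter| Enter|\[_.*Nurse.*_]"
```

Both pipelines should print only the two dialogue lines.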
Tutorial 2: Find all times Nurse speaks
This task is simpler. We’ve already discovered that the text introduces dialogue by printing the speaker’s name in all caps, followed by a period.
Use grep to print all broad matches.
grep "NURSE" romeo-and-juliet.txt
Inspect output for false positives
Everything looks good, except the first printed line:
NURSE to Juliet.
We need to include the period after the character’s name.
Use grep to print more specific matches.
In this case, escape the . character. If you don’t, the . matches any single character, so grep will still match NURSE followed by a space, as in “NURSE to Juliet.”
grep "NURSE\." romeo-and-juliet.txt
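The difference between an unescaped and an escaped dot is easiest to see on two lines. The sketch uses an invented two-line file with a cast-list entry and a speaker tag. It also shows -F, which treats the whole pattern as a fixed string, so the dot needs no escaping. This assumes any POSIX grep.

```shell
#!/bin/bash
# Invented two-line sample: a cast-list entry and a speaker tag.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'NURSE to Juliet.' 'NURSE.' > "$sample"

grep "NURSE." "$sample"     # "." matches any character: both lines print
grep "NURSE\." "$sample"    # "\." matches only a literal period: just "NURSE."
grep -F "NURSE." "$sample"  # -F: fixed string, no regex at all: just "NURSE."
```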
Use the -c option to count matching lines.
grep -c "NURSE\." romeo-and-juliet.txt
Now you know: the Nurse speaks 90 times in the play.
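Keep in mind that -c counts matching lines, not matches: a line that contains the word twice counts once. That is fine for speaker tags, which sit alone on their lines. For counting every occurrence of a word, one common approach is to print each match on its own line with -o and count those lines with wc -l. GNU and BSD grep support -o, but POSIX does not require it. The sample lines are invented.

```shell
#!/bin/bash
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Invented sample: the second line mentions the word twice.
printf '%s\n' 'Where is the nurse?' 'Nurse, tell the nurse to wait.' > "$sample"

grep -c "[nN]urse" "$sample"           # counts matching lines: 2
grep -o "[nN]urse" "$sample" | wc -l   # counts every occurrence: 3
```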
How to grep
To become a grep master, there are two things you need to memorize:
- The command options
- The regex patterns
Learning these is mostly a matter of memorization and practice.
After you’ve memorized these, you’ll develop your own method for using grep effectively. The steps in the second tutorial demonstrate a common process for problem-solving with grep (and with regex in general):
- First, make a general search. Don’t get too complicated!
- Inspect your output for false positives and for missing lines.
- Make a more precise query, inspect again.
- Repeat until grep prints exactly what you need
Now that you’ve used grep to solve some specific problems, it’s time to look at how to use grep in general cases.
How to grep over multiple files
There are a few ways to grep over multiple files:
- With multiple arguments
- With file globbing
- With recursive search
To grep over multiple files, just pass the files as arguments.
grep "bash" file1.sh file2.sh
You can also grep using file globs. This command checks for the expression bash across all .sh files in the current directory. The shell expands *.sh into a list of file names before grep runs.
grep "bash" *.sh
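When grep searches more than one file, it prefixes each matching line with the file name. The -h option suppresses that prefix, and -l prints only the names of matching files. The sketch below runs in a throwaway directory from mktemp -d with two hypothetical scripts and one text file. It assumes GNU or BSD grep, since -h is not in POSIX.

```shell
#!/bin/bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
# Hypothetical files: only file1.sh and notes.txt contain "bash".
printf '#!/bin/bash\necho one\n' > file1.sh
printf '#!/bin/sh\necho two\n'   > file2.sh
printf 'bash notes\n'            > notes.txt

grep "bash" file1.sh file2.sh   # several files: output is "file1.sh:#!/bin/bash"
grep -h "bash" *.sh             # the shell expands *.sh; -h drops the file names
grep -l "bash" *                # -l lists the matching files: file1.sh, notes.txt
```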
The -r option searches recursively through directories. It is one of the handiest options.
This command searches every file in your scripts directory for lines that contain bin/ followed, somewhere later on the line, by sh:
grep -r "bin/.*sh" ~/scripts
This should match the interpreter lines of most shell scripts, including /bin/bash, /bin/dash, /usr/bin/env bash, etc.
To add a little context to this search, use the -A option to print the two lines after each match.
grep -rA 2 "bin/.*sh" ~/scripts
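The sketch below builds a small stand-in for ~/scripts in a temporary directory, with one nested subdirectory, and runs both commands against it. The file names and contents are hypothetical, and the sketch assumes GNU grep. Matching lines are shown as path:line. Context lines printed by -A are shown as path-line, and GNU grep separates non-adjacent groups with a "--" line.

```shell
#!/bin/bash
scripts=$(mktemp -d)   # hypothetical stand-in for ~/scripts
trap 'rm -rf "$scripts"' EXIT
mkdir -p "$scripts/sub"
printf '#!/usr/bin/env bash\nset -eu\necho hi\nexit 0\n' > "$scripts/a.sh"
printf '#!/bin/dash\necho nested\n' > "$scripts/sub/b.sh"

# -r descends into sub/ as well; both shebang lines match.
grep -r "bin/.*sh" "$scripts"

# -A 2 adds the two lines after each match ("set -eu" and "echo hi" for a.sh).
grep -rA 2 "bin/.*sh" "$scripts"
```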
How to exclude results
There are multiple ways to exclude results.
You might want to use the -v option to exclude lines. This was demonstrated in the first tutorial.
In a recursive search, you might want to skip files with a certain extension. In these cases, you can use the --exclude option.
For example, to exclude YAML files, do something like this:
grep -rA 2 --exclude="*.yaml" "bin/.*sh" ~/scripts
Or, to exclude all .git directories, use --exclude-dir:
grep -rA 2 --exclude-dir=".git" "bin/.*sh" ~/scripts
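To see what each option skips, the sketch below uses -l, which lists matching files instead of lines. It runs on a hypothetical stand-in directory that holds a script, a YAML file, and a file inside .git/, and it assumes GNU grep. --exclude compares its glob against file names. --exclude-dir skips any directory whose name matches.

```shell
#!/bin/bash
scripts=$(mktemp -d)   # hypothetical stand-in for ~/scripts
trap 'rm -rf "$scripts"' EXIT
mkdir -p "$scripts/.git"
printf '#!/bin/bash\n'       > "$scripts/deploy.sh"
printf 'shell: /bin/bash\n'  > "$scripts/ci.yaml"
printf 'hook uses /bin/sh\n' > "$scripts/.git/hook"

grep -rl "bin/.*sh" "$scripts"                        # all three files
grep -rl --exclude="*.yaml" "bin/.*sh" "$scripts"     # ci.yaml is skipped
grep -rl --exclude-dir=".git" "bin/.*sh" "$scripts"   # nothing under .git/ is searched
```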
How to combine grep with other programs
grep can read from standard input, so it’s often handy to use grep to filter the output of another program. For example, to print all Firefox processes, you could run this command:
ps -ax | grep "firefox"
You can also pipe grep output to another program. For example, if the output is too large to fit on your screen, you might want to pipe it to less:
grep -rA 2 "bin/.*sh" ~/scripts | less
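One common gotcha with the ps example: grep’s own process often appears in the ps output, and its command line contains the word it is searching for. If you wrap one letter in a bracket expression, as in [f]irefox, the pattern still matches “firefox” but no longer matches grep’s own command line. The sketch simulates both ps outputs with invented lines, with made-up PIDs and paths, so it runs without Firefox or ps.

```shell
#!/bin/bash
# Invented ps-style output. Each variable includes the line that grep's own
# process would contribute for the corresponding search pattern.
ps_plain=$(printf '%s\n' '1201 /usr/lib/firefox/firefox' '1400 grep firefox')
ps_trick=$(printf '%s\n' '1201 /usr/lib/firefox/firefox' '1401 grep [f]irefox')

# Plain pattern: grep also matches its own "grep firefox" entry (2 lines).
printf '%s\n' "$ps_plain" | grep "firefox"

# Bracket trick: the text "[f]irefox" does not contain "firefox",
# so only the browser line is printed.
printf '%s\n' "$ps_trick" | grep "[f]irefox"
```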
How to use grep with regex
Knowing how to use regex patterns can be really useful for grepping. Here’s the last Shakespeare example: how would you search for every derivative form of the word love in Romeo and Juliet?
A match should print lines with forms like “lovers” or “loving,” but not lines with only the base word, “love.”
First, you can make your search case-insensitive.
grep -i "love" romeo-and-juliet.txt
However, this prints false positives, like “glove”.
Add the word-boundary anchor, \b, which matches at the edge of a word, between a word character and a non-word character:
grep -i "\blove\b" romeo-and-juliet.txt
Now you’ve got all matches of love, including the word itself. But the task requires only derivatives of the word.
Extend the base pattern with the . character.
The base expression is lov. This expression is in all derivatives, like “loving” and “lover”. To match any one character after it, use the . character:
grep -i "\blov.\b" romeo-and-juliet.txt
Unfortunately, this command prints mostly what we don’t want: lines with the four-letter word “love”.
Use the \w word character to search for other word characters.
\w matches any word character: a letter, a digit, or an underscore.
grep -i "\blov.\w\b" romeo-and-juliet.txt
Much better! But this matches only 5-letter derivations, like “lover”.
Expand the pattern with \+.
\+ matches one or more instances of the preceding item. In this instance, it searches for one or more instances of a \w character (i.e., any word character).
grep -i "\blov.\w\+\b" romeo-and-juliet.txt
Use the -E flag to avoid escaping special characters like +. With -E, the same search is grep -Ei "\blov.\w+\b" romeo-and-juliet.txt.
Bingo! Let’s look at the first five results:
lovers
loving
lovers
lovers
lov’d
Besides predictable forms, like “loving”, there are also surprising antiquated forms, like “lov’d”. A good grep can be pleasantly surprising.
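You can replay the progression above on a small invented word list, one word per line, using -o so that only the matched words are printed. The sample uses an ASCII apostrophe in “lov'd” for simplicity. The sketch assumes GNU grep, since \b, \w and \+ are GNU extensions.

```shell
#!/bin/bash
# Invented one-word-per-line sample (ASCII apostrophe in lov'd).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' glove love lover loving lovers "lov'd" > "$sample"

grep -io "\blove\b" "$sample"       # whole word only: love (glove is excluded)
grep -io "\blov.\w\+\b" "$sample"   # lov + any char + 1+ word chars: lover, loving, lovers, lov'd
grep -Eio "\blov.\w+\b" "$sample"   # same pattern with -E: + needs no backslash
```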
How to use grep with command substitution
Maybe you want to search for a regular expression that changes dynamically. In these cases, you can use command substitution, $(...), and variables to pass expressions to grep.
For example, consider a log file where each line starts with a date, in the format YYYY-MM-DD. Something like:
2021-12-02 <more-text>
2021-12-02 <more-text>
2021-12-01 <more-text>
2021-11-29 <more-text>
...
1970-01-01 <more-text>
Perhaps you want to know how many times an event was logged in the current month. With command substitution, you could use grep to create a dynamic report:
#!/bin/bash
# Print out how many times an event occurred this month
month=$(date +%B)                     # gets the name of the month
search=$(date +%Y-%m)                 # makes a search term from the date, in YYYY-MM format
file=long-text.log
count=$(grep -c "^$search" "$file")   # uses the expression to count events in $file
echo "This month, $month, has logged $count events."
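The report depends on today’s date, which makes it awkward to check. The sketch below moves the grep into a function that takes the month as a YYYY-MM argument and runs it on the sample lines shown above. The function name count_events_for_month is hypothetical, and the sketch assumes bash.

```shell
#!/bin/bash
# count_events_for_month is a hypothetical helper. It takes the month as a
# YYYY-MM argument instead of calling `date` itself, so it can be tried on a fixed file.
count_events_for_month() {
  local month=$1 file=$2
  # ^ anchors the pattern to the start of the line, where the date lives.
  # If nothing matches, grep -c still prints 0 but exits with status 1.
  grep -c "^$month" "$file"
}

log=$(mktemp)
trap 'rm -f "$log"' EXIT
# The sample lines from above; <more-text> stays a placeholder.
printf '%s\n' \
  '2021-12-02 <more-text>' \
  '2021-12-02 <more-text>' \
  '2021-12-01 <more-text>' \
  '2021-11-29 <more-text>' \
  '1970-01-01 <more-text>' > "$log"

echo "2021-12 has logged $(count_events_for_month 2021-12 "$log") events."
echo "This month has logged $(count_events_for_month "$(date +%Y-%m)" "$log") events."
```

With this sample, the first line reports 3 events. Passing the month in as an argument keeps the dynamic part (the date) separate from the grep call.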
When is grep not so great?
The beauty of grep is its simplicity. Don’t get too complicated.
In the examples of the tutorial and how-to, the data is well-structured, and the queries are relatively simple.
For example, the Romeo and Juliet text has a very precise way of marking when dialogue happens and how stage directions happen. The log file might be very long, but it also has very regular patterns: every line begins with a date in one format.
For simple searches, or for matches across a large set of files, grep is very powerful.
However, if you want to manipulate text, or work with specific fields of a file, you’ll probably want to use a more specific tool, like sed or awk.
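Here is that distinction applied to the log format from the previous section. grep answers “how many lines match this one pattern?”, while a per-month breakdown needs a field from every line, which is a natural job for awk. The sketch runs on the same invented sample lines and assumes a POSIX awk and sort.

```shell
#!/bin/bash
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '2021-12-02 <more-text>' \
  '2021-12-02 <more-text>' \
  '2021-12-01 <more-text>' \
  '2021-11-29 <more-text>' \
  '1970-01-01 <more-text>' > "$log"

# grep: one count for one pattern.
grep -c "^2021-12" "$log"

# awk: take the first 7 characters of field 1 (YYYY-MM) and count lines per month.
# for-in order is unspecified, so the result is sorted afterwards.
awk '{ count[substr($1, 1, 7)]++ } END { for (m in count) print m, count[m] }' "$log" | sort
```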
For advanced text searching, like natural language processing, it’s probably time to use a language with dedicated libraries to help you achieve your task.
Supplementary links
Want more grep? Here are some grep-related links:
- Video: Brian Kernighan talks about the origins of grep. One night in 1971.
- The GNU grep manual. Everything that’s possible with GNU’s implementation of grep.
- Why is GNU grep so fast? A technical discussion of an implementation.
The whole point with “everything is a file” is not that you have some random filename (indeed, sockets and pipes show that “file” and “filename” have nothing to do with each other), but the fact that you can use common tools to operate on different things.
[…]
The UNIX philosophy is often quoted as “everything is a file”, but that really means “everything is a stream of bytes”.
(Linus Torvalds, Newsgroups: fa.linux.kernel)