Command Line Trick: Filter text files based on a list of strings
Let's say you want to remove some lines from a large text file programmatically. For example, you want to remove every line that contains certain IDs, but you want to keep the rest of the lines intact.
You can use the command-line utility grep with the -v option to find all the lines in the file that do NOT contain your search term(s). You can also make a file with a list of several search terms and pass that file to grep with the -f option.
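A minimal sketch of such a command, assuming the file names target_file.tsv, your_list.txt, and out_file.tsv used in the explanation below. The scratch directory and the tiny sample inputs are hypothetical and only there so the command runs as written:

```shell
# Work in a scratch directory so nothing real gets overwritten.
cd "$(mktemp -d)"

# Tiny hypothetical inputs, only here so the command can be run as-is.
printf 'keep this 1\nthis line is undesired 1\nkeep this 3\n' > target_file.tsv
printf 'undesired 1\n' > your_list.txt

# -v keeps only the lines that do NOT match; -f reads one pattern per line.
# tee prints the result and also writes it to out_file.tsv.
grep -v -f your_list.txt target_file.tsv | tee out_file.tsv
```

The original target_file.tsv is left untouched, so you can compare it with out_file.tsv before deciding to replace anything.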
Explanation
- The target file is the text file from which you want to remove lines. It can be any plain-text type, such as csv, tsv, or obo. For example, suppose you want to filter a file with these lines:

    keep this 1
    this line is undesired 2, so you do not wish to keep it
    keep this 3
    keep this 4
    keep this 5
    keep this 6
    something undesired 2
    this line is undesired 1
    keep this 7
- The file your_list.txt is a text file with your list of search terms, one search term per line. For example:

    undesired 1
    undesired 2
- The utility tee writes the output of grep to both the terminal and a file.
- You expect out_file.tsv to contain these lines:

    keep this 1
    keep this 3
    keep this 4
    keep this 5
    keep this 6
    keep this 7
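The example above can be reproduced end to end. This sketch recreates the two sample files from the text in a scratch directory (an assumption, so no real data is touched) and runs the filter. It also adds -F, which is not part of the original command, so that each search term is treated as a literal string rather than a regular expression; that is usually what you want for IDs containing characters such as dots. Note that a blank line in your_list.txt matches every line, so grep -v would then remove everything.

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

# The sample target file from the explanation.
cat > target_file.tsv <<'EOF'
keep this 1
this line is undesired 2, so you do not wish to keep it
keep this 3
keep this 4
keep this 5
keep this 6
something undesired 2
this line is undesired 1
keep this 7
EOF

# The list of search terms, one per line, with no blank lines.
cat > your_list.txt <<'EOF'
undesired 1
undesired 2
EOF

# -F: treat each search term as a fixed string, not a regular expression.
grep -v -F -f your_list.txt target_file.tsv | tee out_file.tsv

# Sanity check: no line in the output should contain any search term.
if grep -q -F -f your_list.txt out_file.tsv; then
    echo "some undesired lines remain" >&2
else
    echo "all undesired lines removed"
fi
```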
Do the filtering and updating of your target file in one step
You can also do a one-step filter-update when you are confident that your filtering works as expected, or if you have a backup copy of your target_file.tsv.
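Use cat to pipe the contents of your text file into grep, then send the results to tee so that they are shown in your terminal and also overwrite your original file, which will then contain only the filtered lines. Be aware that a pipeline which reads from and writes to the same file can truncate that file before it has been fully read, so keep the backup until you have checked the result.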
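A cautious sketch of the in-place update, assuming the same file names as before. Instead of piping straight back into target_file.tsv, it writes to a temporary file (the name target_file.tsv.tmp is hypothetical) and then moves it over the original. This is a swapped-in variant, not the exact pipeline described above; the direct form is shown in a comment for comparison.

```shell
# Work in a scratch directory with hypothetical sample inputs.
cd "$(mktemp -d)"
printf 'keep this 1\nthis line is undesired 1\nkeep this 3\n' > target_file.tsv
printf 'undesired 1\n' > your_list.txt

# Keep a backup before overwriting anything.
cp target_file.tsv target_file.tsv.bak

# The direct form described in the text would be:
#   cat target_file.tsv | grep -v -f your_list.txt | tee target_file.tsv
# Here the result goes to a temporary file first, so the original is only
# replaced after grep has finished reading it.
grep -v -f your_list.txt target_file.tsv | tee target_file.tsv.tmp \
    && mv target_file.tsv.tmp target_file.tsv
```

If the filtered result is not what you expected, restore the original with cp target_file.tsv.bak target_file.tsv.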