Command Line Trick: Filter text files based on a list of strings¶
Let's say you want to remove some lines from a large text file programmatically. For example, you want to remove every line that contains certain IDs, but you want to keep the rest of the lines intact.
You can use the command line utility
grep with option
-v to find all the lines in the file that do NOT contain your search term(s). You can make a file with a list of several search terms and use that file with
grep using the
-f option as follows:
grep -v -f your_list.txt target_file.tsv | tee out_file.tsv
The target file is your text file from which you wish to remove lines. The text file can be of type
oboetc. For example, you wish to filter a file with these lines:
keep this 1 this line is undesired 2, so you do not wish to keep it keep this 3 keep this 4 keep this 5 keep this 6 something undesired 2 this line is undesired 1 keep this 7
your_list.txtis a text file with your list of search terms. Format: one search term per line. For example:
undesired 1 undesired 2
teewill redirect the standard output to both the terminal and write it out to a file.
You expect the
out_file.tsvto contain lines:
keep this 1 keep this 3 keep this 4 keep this 5 keep this 6 keep this 7
Do the filtering and updating of your target file in one step¶
You can also do a one-step filter-update when you are confident that your filtering works as expected, or if you have a backup copy of your
cat and pipe the contents of your text file as the input for
grep. Redirect the results to both your terminal and overwrite your original file so it will contain only the filtered lines.
cat target_file.tsv | grep -v -f your_list.txt | tee target_file.tsv