Asked Wednesday, July 20, 2022

I would like to know if there is a way to combine a series of grep statements so that the matching expressions are combined with "and" rather than "or".



Demo below:



./script  
From one grep statement, I want output like this
a b c

not like this
a
c
a b
a b c
a b c d


Here is a look at the script.



#!/bin/bash
string="a
b
c
d
a b
a b c
a b c d"

echo -e " From one grep statement I want output like this"
echo "$string" |
grep a |grep c |grep -v d #Correct output but pipes three grep statements

echo -e "
Not like this"
echo "$string" |
grep -e'a' -e'c' -e-v'd' #One grep statement but matching expressions are "or" versus "and"


 Answers

You cannot transform the filter grep a | grep c | grep -v d into a single simple grep. There are only complicated and inefficient ways to do it: the result is slow, and the meaning of the expression is obscured.



Single command combination of the three greps



If you just want to run a single command, you can use awk, which also works with regular expressions and can combine them with logical operators. Here is the equivalent of your filter:



awk '/a/ && /c/ && $0 !~ /d/'
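As a quick check, the awk filter above can be run against the sample data from the question. This is only a minimal sketch; the variable names sample and result are illustrative and do not come from the original script.

```shell
#!/bin/bash
# Sample lines taken from the question.
sample='a
b
c
d
a b
a b c
a b c d'

# Keep lines that contain "a" AND "c" but do NOT contain "d".
result=$(printf '%s\n' "$sample" | awk '/a/ && /c/ && $0 !~ /d/')

# Prints the single matching line: a b c
printf '%s\n' "$result"
```

Only the line "a b c" satisfies all three conditions, which is the same output as the three piped greps.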


I think in most cases there is no reason to collapse a pipe into a single command, except when the combination results in a relatively simple grep expression that could be faster (see the results below).



Unix-like systems are designed around pipes that connect various utilities together. Pipe communication is not the most efficient mechanism possible, but in most cases it is sufficient. Because most new computers have multiple CPU cores, you get CPU parallelization "naturally" just by using a pipe, since each stage of the pipeline runs as its own process.



Your original filter works very well and I think that in many cases the awk solution would be a little bit slower even on a single core.



Performance comparison



Using a simple program, I generated a random test file of 200,000,000 lines, each consisting of 4 characters drawn at random from a, b, c and d. The file is 1 GB in size. During the tests it was fully held in the cache, so no disk operations affected the measurements. The tests were run on a dual-core Intel CPU.
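The generator program itself is not shown, so the following is only a hypothetical sketch of how a comparable file could be produced with awk. The function name gen_testfile, the fixed seed and the small default line count are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical generator, NOT the program used for the measurements above.
# Usage: gen_testfile <number-of-lines> <output-file>
gen_testfile() {
    awk -v n="$1" 'BEGIN {
        srand(42)                      # fixed seed so runs are repeatable
        split("a b c d", ch, " ")      # ch[1]..ch[4] hold the four letters
        for (i = 0; i < n; i++) {
            line = ""
            for (j = 0; j < 4; j++)
                line = line ch[int(rand() * 4) + 1]
            print line
        }
    }' > "$2"
}

# Small file for a quick try; 200000000 lines would approximate the 1 GB file.
gen_testfile 1000 testfile
```

Each line is 4 characters plus a newline, so 200,000,000 lines come to roughly 1 GB, which matches the size described above.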



Single grep



$ time ( grep -E '^[^d]*a[^d]*c[^d]*$|^[^d]*c[^d]*a[^d]*$' testfile >/dev/null )
real 3m2.752s
user 3m2.411s
sys 0m0.252s


Single awk



$ time ( awk '/a/ && /c/ && $0 !~ /d/' testfile >/dev/null )
real 0m54.088s
user 0m53.755s
sys 0m0.304s


The original three greps piped



$ time ( grep a testfile | grep c | grep -v d >/dev/null )
real 0m28.794s
user 0m52.715s
sys 0m1.072s


Hybrid - positive greps combined, negative piped



$ time ( grep -E 'a.*c|c.*a' testfile | grep -v d >/dev/null )
real 0m15.838s
user 0m24.998s
sys 0m0.676s


Here you can see that the single grep is very slow because of the complex expression. The original pipe of three greps is fairly fast thanks to good parallelization. Without parallelization, on a single core, the original pipe would run only slightly faster than awk (compare the user times), because awk is a single process and is not parallelized. Awk and grep probably use the same regular expression code, and the logic of the two solutions is similar.



The clear winner is the hybrid, which combines the two positive greps and leaves the negative one in the pipe. It seems that a regular expression with | carries no performance penalty.
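Before comparing speed, it is worth confirming that the four variants really select the same lines. The sketch below runs them on a small sample; the temporary file, the two extra sample lines ("c a" and "d c a") and the helper function check are assumptions added for illustration.

```shell
#!/bin/bash
# Small sample: the lines from the question plus two extra cases.
sample=$(mktemp)
printf '%s\n' a b c d 'a b' 'a b c' 'a b c d' 'c a' 'd c a' > "$sample"

# The four variants measured above.
piped=$(grep a "$sample" | grep c | grep -v d)
single=$(grep -E '^[^d]*a[^d]*c[^d]*$|^[^d]*c[^d]*a[^d]*$' "$sample")
awked=$(awk '/a/ && /c/ && $0 !~ /d/' "$sample")
hybrid=$(grep -E 'a.*c|c.*a' "$sample" | grep -v d)

# Report whether a variant produced the same lines as the original pipe.
check() {
    if [ "$2" = "$piped" ]; then
        echo "$1: same output as the pipe"
    else
        echo "$1: output differs"
    fi
}

check "single grep" "$single"
check "single awk" "$awked"
check "hybrid" "$hybrid"

rm -f "$sample"
```

On this sample every variant keeps "a b c" and "c a" and drops the rest, so the timing differences come from how the work is done, not from what is matched.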


Answered Thursday, July 21, 2022
musining