Asked Sunday, June 26, 2022

I often use grep to find files containing a certain string, like this:



grep -R 'MyClassName'


The good thing is that it returns the files and their contents, and marks the matched string in red. The bad thing is that I also have huge files in which the entire text is written on one single line, so grep prints far too much when it finds a match in them. Is there a way to limit the output to, for instance, 5 words to the left and right of the match? Or to 30 characters on either side?



 Answers

grep itself only has options for context based on lines. An alternative is suggested by this Super User post:




A workaround is to enable the option 'only-matching' and then to use
RegExp's power to grep a bit more than your text:



grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath


Of course, if you use color highlighting, you can always grep again to
only color the real match:



grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath | grep "WHAT_I_M_SEARCHING"
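Applied to the question, the same idea could be wrapped in two small shell functions. This is only a sketch: the names grep_chars and grep_words are made up for illustration, -E is used so that the braces need no backslashes, and a "word" is assumed to mean a run of non-whitespace characters.

```shell
# Hypothetical helpers, not part of grep itself.

# Print each match with up to N characters of context on either side.
# Usage: grep_chars N PATTERN [grep options and files...]
grep_chars() {
    n=$1 pattern=$2
    shift 2
    # -o: print only the matching part; -E: extended regex, so {0,N} needs no backslashes
    grep -oE ".{0,$n}$pattern.{0,$n}" "$@"
}

# Print each match with up to N whitespace-separated words on either side.
# Usage: grep_words N PATTERN [grep options and files...]
grep_words() {
    n=$1 pattern=$2
    shift 2
    word='[^[:space:]]+' gap='[[:space:]]+'
    grep -oE "($word$gap){0,$n}[^[:space:]]*$pattern[^[:space:]]*($gap$word){0,$n}" "$@"
}
```

With GNU grep, something like `grep_chars 30 MyClassName -R .` or `grep_words 5 MyClassName -R .` should then cover the two cases from the question. Piping the result into `grep --color MyClassName` highlights only the real match.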



As another alternative, I'd suggest folding the text and then grepping it, for example:



fold -sw 80 input.txt | grep ...


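The -s option makes fold break lines at spaces, so a word is pushed to the next line instead of being cut in two. A match can still be split across two lines if the text near the break contains no spaces.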
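For a recursive search, the fold step has to run once per file. A rough sketch, assuming GNU grep (for --label) and a made-up function name fold_grep:

```shell
# Hypothetical helper: fold every regular file under DIR to WIDTH columns, then grep it.
# Usage: fold_grep PATTERN WIDTH DIR
fold_grep() {
    find "$3" -type f -exec sh -c '
        pattern=$1 width=$2
        shift 2
        for f in "$@"; do
            # --label makes grep report the real file name instead of (standard input)
            fold -sw "$width" "$f" | grep -H --label="$f" -- "$pattern"
        done
    ' sh "$1" "$2" {} +
}
```

For example, `fold_grep MyClassName 80 .` prints each folded line that contains the match, prefixed with the file it came from.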



Or use some other way to split the input into lines based on its structure. The Super User post, for example, dealt with JSON, so pretty-printing with jq and then grepping, or doing the filtering in jq itself, would be better than either of the two alternatives given above.
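When no format-aware tool is at hand, even a crude split on one delimiter may be enough. A minimal sketch, assuming the input separates its statements with a single character such as ";" (as minified JavaScript or CSS often does); split_grep is a made-up name:

```shell
# Hypothetical helper: break standard input on one delimiter character, then grep the pieces.
# Usage: split_grep PATTERN DELIM < file
split_grep() {
    # tr turns every DELIM into a newline, so grep sees one statement per line
    tr "$2" '\n' | grep -- "$1"
}
```

For example, `split_grep MyClassName ';' < input.txt` prints only the statements that mention MyClassName.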






This GNU awk method might be faster:



gawk -v n=50 -v RS='MyClassName' '
# every record after the first starts right after a match
FNR > 1 { printf "%s: %s\n", FILENAME, p prt substr($0, 1, n) }
# keep the last n characters and the matched text for the next record
{ p = substr($0, length - n + 1); prt = RT }
' input.txt



  • Tell awk to split records on the pattern we're interested in (-v RS=...), and pass the number of characters of context (-v n=...).

  • Each record after the first (FNR > 1) is one where awk found a match for the pattern.

  • So we print the n trailing characters of the previous record (p), the text that matched at the end of the previous record (prt), and the n leading characters of the current record (substr($0, 1, n)).


    • We set p and prt after printing, so the values we set are used by the next record.

    • substr positions start at 1, hence substr($0, 1, n) and length - n + 1.

    • RT is a GNUism; that's why this is GNU awk-specific.
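Where gawk is not available, a similar result can probably be had from any POSIX awk by using match() with RSTART and RLENGTH instead of RS and RT. This is a sketch under assumptions, not a drop-in replacement: awk_ctx is a made-up name, input is processed line by line, and re-slicing the line after each match may be slow on very long lines with many matches.

```shell
# Hypothetical helper: print each match with N characters of context, using POSIX awk only.
# Usage: awk_ctx PATTERN N FILE...
awk_ctx() {
    pat=$1 n=$2
    shift 2
    awk -v pat="$pat" -v n="$n" '
    {
        line = $0
        pos = 0                          # characters of $0 already consumed
        while (match(line, pat)) {
            if (RLENGTH == 0) break      # guard against patterns that match nothing
            start = pos + RSTART         # where the match starts within $0
            from = start - n
            if (from < 1) from = 1
            # leading context + matched text + trailing context
            print FILENAME ": " substr($0, from, start - from + RLENGTH + n)
            pos += RSTART + RLENGTH - 1
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}
```

For example, `awk_ctx MyClassName 50 input.txt`. Because the pattern is passed with -v, backslashes in it go through awk's escape processing first.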




For recursive search, maybe:



find . -type f -exec gawk -v n=50 -v RS='MyClassName' 'FNR>1{printf "%s: %s\n", FILENAME, p prt substr($0, 1, n)} {p = substr($0, length-n+1); prt = RT}' {} +

[#8624] Tuesday, June 28, 2022
ndaavi
