It's admission time in India, and I am trying my best to get into the best engineering college I can.
I have a PDF file which contains a table of colleges.
It contains 2500+ entries, and I have only 3 days.
So, to work smartly and shortlist the right colleges for me, I need to match each entry against multiple regexps, for example:
- should contain either of the words "computer" or "information"
- should contain both GE and FALSE
- should match the regexp [0-9]{5,}
I first tried opening it in LibreOffice Calc, but it opens in LibreOffice Draw. I then tried pdftohtml and pdftotext, but both mess up the table badly.
Finally I came across pdfgrep, but it does not work in combination with grep. Running
pdfgrep regexp1 ./locn to file | grep regexp2 | grep regexp3
gives the error
Binary file (standard input) matches
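That message is printed by grep, not by pdfgrep: when GNU grep decides its input is binary (for example because it contains a NUL byte or bytes that are invalid in the current locale's encoding), it suppresses the matching lines and reports only that the input matches. Presumably pdfgrep's output contains such bytes here; that is an assumption, since the PDF itself was not inspected. grep's `-a` (`--text`) option forces the input to be treated as text, which lets the chained filters work. Below is a minimal sketch in which `rows` is a hypothetical function standing in for pdfgrep's output and the sample rows are invented; each grep stage encodes one of the criteria listed above.

```shell
#!/bin/sh
# Hypothetical stand-in for pdfgrep's output: the first row carries a NUL
# byte, which GNU grep takes as a sign of binary input.
rows() {
  printf '12345 Institute of Computer Engg GE FALSE\000\n'
  printf '23456 College of Civil Engg GE FALSE\n'
}

# The failing form: without -a, grep only reports that the binary input matches.
# rows | grep -iE 'computer|information' | grep -w GE

# With -a on every stage, each grep treats its input as text and passes lines on:
#   stage 1: contains "computer" or "information" (case-insensitive)
#   stage 2: contains the word GE
#   stage 3: contains the word FALSE
#   stage 4: contains a run of 5 or more digits
rows | grep -a -iE 'computer|information' \
     | grep -a -w GE \
     | grep -a -w FALSE \
     | grep -a -E '[0-9]{5,}'
```

Applied to the real file, the same `-a` flags would go on each grep after `pdfgrep regexp1 ./locn to file`.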
So it seems I have to do everything with a single regexp passed to pdfgrep, one that matches all the regexps I need.
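If a single pattern really is required, the "all of these must hold" logic can be expressed with lookaheads. POSIX extended regexps do not have lookaheads, but PCRE does. Here is a minimal sketch run through GNU grep's `-P` option, which is not available in every grep build (BSD grep on macOS lacks it). The `rows` helper and its sample rows are hypothetical. Recent pdfgrep versions also list a `-P` (`--perl-regexp`) option when built with PCRE support; check `pdfgrep --help` before relying on the commented-out last line, whose file name is a placeholder.

```shell
#!/bin/sh
# Hypothetical sample rows; the real column layout of the PDF table is unknown.
rows() {
  printf '%s\n' \
    '12345 Institute of Computer Engg GE FALSE' \
    '23456 College of Civil Engg GE FALSE' \
    '34567 Information Tech Institute OBC FALSE' \
    '4567 Computer Science College GE FALSE'
}

# Each (?=.*X) lookahead asserts that X occurs somewhere on the line without
# consuming it, so the lookaheads combine into a logical AND.
# (?i:...) makes only that group case-insensitive; \b keeps GE and FALSE
# from matching inside longer words.
pattern='^(?=.*(?i:computer|information))(?=.*\bGE\b)(?=.*\bFALSE\b)(?=.*[0-9]{5,})'

rows | grep -P "$pattern"

# Assumed equivalent against the PDF (untested; needs a PCRE-enabled pdfgrep):
# pdfgrep -P "$pattern" ./file.pdf
```

Only the first sample row passes. The second lacks "computer"/"information", the third lacks GE, and the fourth has no run of five digits. Because lookaheads do not consume input, the order of the conditions does not matter.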
EDIT: You can download the pdf here.