How can I turn photos of paper documents into a scanned document? is related, but not the same, as I'm talking about pdf files. The processing of images seems complicated in the answers under the linked question, especially because it involves processing each image separately: given my pdf has hundreds of pages, the solution I expect is not that of processing/editing images, but simply of scanning digital photos and documents the way real ones are. I mean something like a "virtual scanner" for which the input would be a photo-based pdf or collection of photos and the output a "normal" scanned document. (Also the Scantailor tool recommended - also here - seems to lack a Linux version now.)
This is not about OCR and not about converting image to text.
To clarify what I mean I will post a few examples.
There are pdf files based on text, not image, and they are text files (let's say docx or odt) exported to pdf. They look ready to be printed:
The above is not what I discuss here.
What I'm interested in are the pdfs in the images below, namely the difference between scanned text pages that look too much like images and scanned text pages that look like digitized text.
The first are formed of images that look like pictures taken of book pages:
or
Such copies can hardly be re-printed on paper, as the background will be printed too.
The second ones are what one would expect from scanned text, and can be printed:
or
The picture-like pdf may already be OCR-processed and its text searchable, and still look like a collection of (page) photos: OCR is not the problem here.
What I want is the clear black-on-white look of the "scanned" pdf and the removal of all the "real" details (especially shadows) that are normal in a photo but should be absent in a printed page.
As @vanadium noticed in a comment, I am looking for a software solution that automatically cleans up pictures of a document, much alike Google Scan on a smartphone.
As @user535733 said in a comment, the problem here seems to be, at least to some extent, that of converting the greyscale (scanned/image) text to black-and-white.