Some text files I come across have little squares with numbers in them in place of certain characters. I am unable to copy and paste these in Ubuntu, but I can search and replace
each character individually in gedit (replacing it with what I think is its best match). Obviously, this is only feasible if there are only a few types of square.
I'm led to believe that these squares are displayed because I am missing certain fonts. My aim is to convert the text into an ePub or PDF file.
My questions are:
- What type of encoding is this, and why does this happen?
- If it is missing fonts, can I install them, and will this solve the problem (i.e. allow me to convert the symbols to PDF, e.g. using Calibre)?
- Is there an application to convert my text file to a text file without these squares, replacing each one with a similar character? For example, one of the symbols is pretty much a y, so I would like this function to replace each instance of it with a y.
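To make the last point concrete, here is a minimal sketch of the kind of replacement I have in mind. The code point U+F079 is only a placeholder I made up for illustration, not the actual character from my file, and the function name is hypothetical too.

```python
# Hypothetical mapping from "square" characters to plain look-alikes.
# U+F079 is a made-up placeholder, NOT taken from the example file.
REPLACEMENTS = {
    "\uf079": "y",
}


def replace_squares(text, replacements=REPLACEMENTS):
    """Return text with every mapped character swapped for its look-alike."""
    # str.maketrans accepts a dict of single characters to replacement strings.
    return text.translate(str.maketrans(replacements))


if __name__ == "__main__":
    sample = "m\uf079 text"
    print(replace_squares(sample))
```

The mapping would still have to be filled in by hand, one entry per type of square, which is the part I am hoping an existing tool can do for me.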
An example txt file is here and it originally looked like this (note the inaccuracies following OCR).
Note: I couldn't get either uni2ascii or iconv to work (though I may not have been using the correct options), so please check with the given file before posting a solution!
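In case it helps with checking, a rough sketch like the following could list which non-ASCII characters a piece of text contains, with their code points and how often each occurs. The function name and the sample string are made up for illustration; the sample does not come from my file.

```python
from collections import Counter
import unicodedata


def summarize_non_ascii(text):
    """Return (char, 'U+XXXX', name, count) for each non-ASCII character,
    most frequent first."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        # unicodedata.name falls back to the default for unnamed code points
        # (for example private-use characters).
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"), n)
        for ch, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Made-up sample text; U+F079 is a placeholder private-use character.
    sample = "caf\u00e9 m\uf079 da\uf079"
    for ch, code, name, n in summarize_non_ascii(sample):
        print(code, name, n)
```

For a real file, the text would first need to be read with the right encoding (for example `open(path, encoding="utf-8")`, assuming the file is UTF-8, which I have not confirmed).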