Sun, January 9, 2022, 10:30:20 / answers: 1 / hits: 4593

Some text files I come across have little squares with numbers in them (in place of certain characters). I am unable to copy and paste these in Ubuntu, but I can search and replace each character individually in gedit (substituting what I think is its best match). Obviously this is only feasible if there are only a few types of square.



An example of several of the squares



I'm led to believe that these squares are displayed because I am missing certain fonts... My aim is to convert this into an ePub or PDF file.



My questions are:




  • What type of encoding is this? And why does this happen?

  • If it is missing fonts, can I install them, and will this solve the problem (i.e. allow me to convert the symbols to PDF, e.g. using Calibre)?

  • Is there an application that can convert my text file into one without these squares, replacing each with a similar character? For example, one of the squares (shown as an image in the original post) is pretty much a y, so I would like to replace each instance of it with a y.



An example txt file is here, and it originally looked like this (note that the inaccuracies are the result of OCR).



Note: I couldn't get either uni2ascii or iconv to work (though I may not have been using the correct [options]), so please check with the given file before posting a solution!



 Answers
1

The boxes mean "glyph not found"; the characters inside each box are the hexadecimal representation of that character's Unicode code point.
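To see which code points are actually in the file, something like the following Python sketch may help. It is only an illustration: the function name `describe_codepoints` and the sample string are made up here, and it simply reports every character that is non-ASCII or non-printable, labelled in the U+XXXX form.

```python
import unicodedata
from collections import Counter


def describe_codepoints(text):
    """Count non-ASCII or non-printable characters, keyed by (U+XXXX, name)."""
    counts = Counter()
    for ch in text:
        if ch in "\n\r\t":
            continue  # ordinary whitespace, not interesting here
        if ord(ch) > 0x7E or not ch.isprintable():
            label = "U+%04X" % ord(ch)
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<no name>")
            counts[(label, name)] += 1
    return counts


# Made-up sample containing two control characters.
sample = "m\u001fstery te\u001dxt"
for (label, name), n in describe_codepoints(sample).items():
    print(label, name, n)
```

For a real file, you would read it first (for example with `open(path, encoding="utf-8").read()`, assuming the file is UTF-8) and pass the result in.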



There are two possibilities: the character encoding is garbled, or the font you are using doesn't have a glyph for that character. This is a great overview of character encoding if you really want to understand it: http://trochee.net/2011/05/character-encoding-tutorial/
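One rough way to probe the first possibility is to decode the raw bytes with a few candidate encodings and compare the results by eye. The sketch below is an assumption-laden illustration: `try_decodings` is a hypothetical helper, and the candidate list (UTF-8, Windows-1252, Latin-1) is just a guess at what might be plausible for the file.

```python
def try_decodings(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return {encoding: decoded text, or None if the bytes are invalid in it}."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# A lone 0xE9 byte is not valid UTF-8 but is "é" in the single-byte encodings.
for enc, text in try_decodings(b"caf\xe9").items():
    print(enc, repr(text))
```

If the file decodes cleanly as UTF-8 and the squares are still there, the encoding is probably fine and the odd characters are really in the data (or the font lacks the glyph).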



Curiously, U+001F and U+001D are control characters (the unit separator and group separator, respectively), which are really just glorified line breaks. It seems odd that OCR would return those.
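If the goal is just a clean text file, a small script can substitute or drop these characters in one pass. This is a minimal sketch, not a tested solution for the example file: `clean_text` is a hypothetical helper, and the mapping of U+001F to "y" is purely illustrative, since which square stands for which letter has to be worked out from the original scan.

```python
def clean_text(text, replacements=None):
    """Apply explicit replacements, then drop remaining C0 control characters."""
    replacements = replacements or {}
    keep = {"\n", "\r", "\t"}  # control characters that are normal in text files
    out = []
    for ch in text:
        if ch in replacements:
            out.append(replacements[ch])
        elif ord(ch) < 0x20 and ch not in keep:
            continue  # unmapped control character: drop it
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical mapping: treat U+001F as a misread "y"; anything unmapped is dropped.
mapping = {"\u001f": "y"}
print(clean_text("m\u001fster\u001d\n", mapping))
```

Explicit replacements are checked first so that a control character you have identified is substituted rather than silently removed.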


[#44776] Sunday, January 9, 2022
velelf