Intelligence is simply talking to many people

Convert image to text (tesseract-ocr)

sudo apt install tesseract-ocr

or (although I don't think this is necessary)

sudo apt install tesseract-ocr libtesseract-dev tesseract-ocr-eng

Try an example. Name your image existingimage.png, open a terminal in that folder, and run

tesseract -l eng existingimage.png output_from_ocr

cat output_from_ocr.txt

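(where -l specifies the language, and tesseract adds the .txt extension to the output name itself. To see the language codes, do man tesseract; to see which languages are installed, do tesseract --list-langs)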
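
If you have a whole folder of images, the same command can go in a loop. This is only a sketch: the function names ocr_base and ocr_all_png are made up for illustration, and it assumes the images are PNG files with English text.

```shell
# Hypothetical helper: strip the extension to get the output base name,
# because tesseract appends .txt to the output name itself.
ocr_base() {
    printf '%s\n' "${1%.*}"
}

# OCR every PNG in the current folder (assumes English text).
ocr_all_png() {
    for img in ./*.png; do
        [ -e "$img" ] || continue    # no PNG files here: skip the unexpanded pattern
        tesseract -l eng "$img" "$(ocr_base "$img")"
    done
}
```

After pasting this into a terminal, calling ocr_all_png should leave one .txt file next to each image (scan.png gives scan.txt).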


OCR means Optical Character Recognition

Convert image to pdf (not to txt)

tesseract -l eng input_for_ocr.png output_from_ocr pdf
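
If you want both the text file and the PDF from the same image, one simple way is to run tesseract twice with the same output name. A minimal sketch, where the function names pdf_for_base and image_to_txt_and_pdf are made up for illustration:

```shell
# Hypothetical helper: name of the PDF that tesseract writes for a given output name.
pdf_for_base() {
    printf '%s.pdf\n' "$1"
}

# Run OCR twice on the same image: once for plain text, once for a PDF.
# $1 is the image, $2 is the output name without extension.
image_to_txt_and_pdf() {
    img=$1
    base=$2
    tesseract -l eng "$img" "$base" &&
    tesseract -l eng "$img" "$base" pdf &&
    printf 'wrote %s.txt and %s\n' "$base" "$(pdf_for_base "$base")"
}
```

For example, image_to_txt_and_pdf input_for_ocr.png output_from_ocr should give output_from_ocr.txt and output_from_ocr.pdf.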


If you get the error 'Tesseract couldn't load any languages!', see: https://github.com/tesseract-ocr/tesseract/issues/1309

Spanish: download spa.traineddata from here https://github.com/tesseract-ocr/tessdata/blob/c2b2e0df86272ce11be323f23f96cf656565ed41/spa.traineddata

and put it in the folder /usr/share/tesseract-ocr/4.00/tessdata/, next to eng.traineddata (you will have to open that folder as root)

Or maybe you can just use: sudo apt-get install tesseract-ocr-spa (although it might not save it to the location you want)
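
Either way, you can check whether tesseract actually sees the new language before running OCR. A small sketch: has_lang is a made-up helper name, and it assumes tesseract --list-langs prints one language code per line.

```shell
# Hypothetical helper: succeeds if the language code given as $1
# appears as a whole line on standard input.
has_lang() {
    grep -qx -- "$1"
}

# Usage (needs tesseract installed; 2>&1 because some versions
# may print the list to stderr):
#   tesseract --list-langs 2>&1 | has_lang spa && echo "Spanish is installed"
```

If the file ended up somewhere else, tesseract also has a --tessdata-dir option to point it at the folder that holds the .traineddata files, which might save you from copying things as root.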


NOTE: After you install a language (or even if you don't), you might overwrite an existing output file and see an error message, but it's working anyway.
