(If you want to take the screenshots first with the screen rotated: xrandr -o left rotates it, and xrandr -o normal returns to the regular view.)
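Open Terminal in the folder with all the images and run:
convert *.png -shave 90x280 shaved-%02d.png
(-shave 90x280 trims 90 pixels off the left and right edges and 280 pixels off the top and bottom of each image.)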
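A single convert call over *.png numbers its outputs instead of keeping the input names. If you would rather keep each file's name, a loop along these lines may help. It is only a sketch: the shaved- prefix and the shaved_name helper are made up for illustration, and the 90x280 geometry is the one used above.

```shell
#!/usr/bin/env bash
# Sketch: shave every PNG in the current folder, one output file per input.
# The "shaved-" prefix and the shaved_name helper are illustrative assumptions.
shopt -s nullglob   # an empty folder gives zero loop iterations

# Build the output name for one input file (hypothetical helper).
shaved_name() {
  printf 'shaved-%s\n' "$(basename "$1")"
}

for img in *.png; do
  case "$img" in shaved-*) continue ;; esac   # skip results of an earlier run
  convert "$img" -shave 90x280 "$(shaved_name "$img")"
done
```

With this, screenshot1.png becomes shaved-screenshot1.png and the originals are left untouched.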
To turn a range of pages from a scanned pdf into searchable text: uses qpdf, imagemagick, ocrmypdf, and pdftotext.
0 Rename the pdf to orig.pdf
1 Extract the pages you want from the pdf (just-the-chapters.pdf)
2 Create vr.pdf (VeryReadable): render the pages to images, boost the contrast, and combine them into a pdf
3 Add a text layer to vr.pdf, saving the result as a new pdf
4 Convert that new pdf to .txt
5 Delete everything except the new txt file and the new layered pdf: rm output* && rm final* && rm vr.pdf && rm just-the-chapters.pdf
All at once:
qpdf orig.pdf --pages . 318-419 -- just-the-chapters.pdf && convert -density 288 just-the-chapters.pdf output-%03d.jpg && convert output*.jpg -level 25% final-%03d.jpg && convert final*.jpg vr.pdf && ocrmypdf -l spa vr.pdf bookonech9-10.pdf && pdftotext -layout bookonech9-10.pdf bookonech9-10.txt && rm output* && rm final* && rm vr.pdf && rm just-the-chapters.pdf
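If you run this often with different page ranges or languages, the same steps could be wrapped in a function. This is a rough sketch, not a tested tool: ocr_pages, run, and DRY_RUN are names invented here, and the intermediate file names are the ones from the steps above. It pads the image numbers to three digits (%03d) so that the shell's alphabetical glob order still matches page order when the range is longer than 100 pages (318-419 is 102 pages).

```shell
#!/usr/bin/env bash
# Sketch: the extract -> images -> contrast -> pdf -> OCR -> txt steps as a function.
# ocr_pages, run and DRY_RUN are hypothetical names, not part of any of the tools.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# ocr_pages SRC RANGE OUT LANG
#   SRC   source pdf (e.g. orig.pdf)
#   RANGE qpdf page range (e.g. 318-419)
#   OUT   output name without extension (e.g. bookonech9-10)
#   LANG  tesseract language code (e.g. spa)
ocr_pages() {
  local src=$1 range=$2 out=$3 lang=$4
  run qpdf "$src" --pages . "$range" -- just-the-chapters.pdf &&
  run convert -density 288 just-the-chapters.pdf output-%03d.jpg &&
  run convert output-*.jpg -level 25% final-%03d.jpg &&
  run convert final-*.jpg vr.pdf &&
  run ocrmypdf -l "$lang" vr.pdf "$out.pdf" &&
  run pdftotext -layout "$out.pdf" "$out.txt" &&
  run rm -f output-*.jpg final-*.jpg vr.pdf just-the-chapters.pdf
}

# Only run when called with exactly four arguments.
if [ "$#" -eq 4 ]; then
  ocr_pages "$@"
fi
```

Saved as, say, ocr-pages.sh, it would be called as: bash ocr-pages.sh orig.pdf 318-419 bookonech9-10 spa. Setting DRY_RUN=1 first only prints the commands, which is handy for checking the page range before a long OCR run.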
ocrmypdf relies on tesseract for its OCR (https://ocrmypdf.readthedocs.io/en/latest/languages.html), so you need tesseract's language packs (http://tttthis.com/blog/convert-image-to-text-tesseract-ocr) to OCR languages other than English.
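To check whether a language pack is already there before running a long OCR job, something like the following should work. It is a sketch: has_lang is a helper name made up here, and the suggested package name assumes Debian/Ubuntu, where the packs are called tesseract-ocr-<code>.

```shell
#!/usr/bin/env bash
# Sketch: check whether tesseract has the Spanish pack installed.
# has_lang is a hypothetical helper; it reads a language list on stdin.

# has_lang CODE: succeeds if CODE appears as a whole line on stdin.
has_lang() {
  grep -qx -- "$1"
}

if command -v tesseract >/dev/null 2>&1; then
  # 2>&1 because older tesseract versions print the list on stderr.
  if tesseract --list-langs 2>&1 | has_lang spa; then
    echo "spa is installed"
  else
    echo "spa is missing; on Debian/Ubuntu try: sudo apt-get install tesseract-ocr-spa"
  fi
fi
```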
"OCRmyPDF that will add a text layer to a scanned PDF making it searchable"
FOSS
sudo apt-get install ocrmypdf
ocrmypdf input.pdf output.pdf
SPANISH (needed for Spanish characters; otherwise characters such as ¿ won't copy-paste correctly)
ocrmypdf -l spa input.pdf output-spa.pdf
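For a document that mixes languages, ocrmypdf accepts several codes joined with + in the -l option (for example spa+eng). A small sketch of that, where join_langs and the output file name are invented for illustration, and each listed language still needs its tesseract pack installed:

```shell
#!/usr/bin/env bash
# Sketch: OCR a pdf that mixes Spanish and English.
# join_langs and output-spa-eng.pdf are hypothetical names.

# join_langs CODE...: join language codes with "+", the form -l expects.
join_langs() {
  local IFS=+
  printf '%s\n' "$*"
}

langs=$(join_langs spa eng)

# Only run when the input file and ocrmypdf are actually present.
if [ -f input.pdf ] && command -v ocrmypdf >/dev/null 2>&1; then
  ocrmypdf -l "$langs" input.pdf output-spa-eng.pdf
fi
```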