Blog: Example

Convert pdf to text (several-step process)

Uses qpdf, imagemagick, ocrmypdf, and pdftotext.


0 Rename pdf to orig.pdf

1 Extract the chapter's pages from the pdf

  • qpdf orig.pdf --pages . 100-110 -- just-the-chapter.pdf

2 Create vr.pdf (VeryReadable)

  • convert -density 288 just-the-chapter.pdf output-%02d.jpg
  • convert output*.jpg -level 25% final-%02d.jpg
  • convert final*.jpg vr.pdf

3 Create text layer on vr.pdf as a new pdf

  • ocrmypdf -l spa vr.pdf vr-layer-spa.pdf

4 Convert that new pdf to .txt

  • pdftotext -layout vr-layer-spa.pdf chapter-spa.txt

5 Delete everything except the new txt file and the new layered pdf

  • rm output* && rm final* && rm vr.pdf && rm just-the-chapter.pdf

All at once:

qpdf orig.pdf --pages . 318-419 -- just-the-chapters.pdf && convert -density 288 just-the-chapters.pdf output-%03d.jpg && convert output*.jpg -level 25% final-%03d.jpg && convert final*.jpg vr.pdf && ocrmypdf -l spa vr.pdf bookonech9-10.pdf && pdftotext -layout bookonech9-10.pdf bookonech9-10.txt && rm output* && rm final* && rm vr.pdf && rm just-the-chapters.pdf
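
A sketch of the same one-liner as a shell function, so the page range, language, and output name are arguments instead of being edited by hand. pdf_pages_to_text, run, and DRY_RUN are names made up for this sketch; they are not part of qpdf, imagemagick, or ocrmypdf. It uses %03d numbering on the assumption that a range can run past 99 pages, where two-digit names stop globbing in page order.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the steps above.
# usage: pdf_pages_to_text ORIG.pdf PAGE-RANGE LANG OUTNAME
pdf_pages_to_text() {
  # $1 = source pdf, $2 = page range, $3 = OCR language, $4 = output base name
  run qpdf "$1" --pages . "$2" -- just-the-chapters.pdf &&
  # %03d keeps output-100.jpg from sorting between output-10 and output-11
  run convert -density 288 just-the-chapters.pdf output-%03d.jpg &&
  run convert output-*.jpg -level 25% final-%03d.jpg &&
  run convert final-*.jpg vr.pdf &&
  run ocrmypdf -l "$3" vr.pdf "$4.pdf" &&
  run pdftotext -layout "$4.pdf" "$4.txt" &&
  # clean up the intermediate files only if every step succeeded
  run rm -f output-*.jpg final-*.jpg vr.pdf just-the-chapters.pdf
}
```

Example: pdf_pages_to_text orig.pdf 318-419 spa bookonech9-10. Setting DRY_RUN=1 first only prints the commands, which is a cheap way to check the arguments before a long OCR run.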

TTTThis

Convert pdf to text (OCRmyPDF)

OCRmyPDF relies on tesseract for its OCR (https://ocrmypdf.readthedocs.io/en/latest/languages.html), so you need tesseract's language packs (http://tttthis.com/blog/convert-image-to-text-tesseract-ocr) to OCR languages other than English.

"OCRmyPDF that will add a text layer to a scanned PDF making it searchable"

FOSS

sudo apt-get install ocrmypdf

ocrmypdf input.pdf output.pdf

Spanish (use -l spa for Spanish characters; otherwise the text layer won't copy-paste characters like ¿ correctly)

ocrmypdf -l spa input.pdf output-spa.pdf
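
A small sketch for checking that the language pack is there before running ocrmypdf with -l. has_lang is a hypothetical helper name. It assumes `tesseract --list-langs` prints one language code per line (some older versions print the list to stderr, hence the 2>&1 in the usage comment), and that on Debian/Ubuntu the Spanish pack is the tesseract-ocr-spa package.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if language code $1 appears as a whole line
# on stdin, e.g. in the output of `tesseract --list-langs`.
has_lang() {
  grep -qx -- "$1"
}

# Intended use (assumes tesseract is installed):
#   tesseract --list-langs 2>&1 | has_lang spa \
#     || sudo apt-get install tesseract-ocr-spa
```

Matching whole lines keeps a short code from matching inside a longer one, such as a script-specific variant of the same language.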



Colorizing black and white photos (GIMP, the GNU Image Manipulation Program)

  • Make a gradient of dark blue and light blue
  • Colors > Map > Gradient Map
  • (NOTE you can also do this by Windows > Dockable Dialogs > Palettes, and drag over some colors dark to light, then do Colors > Map > Palette Map, which gives you a more varied color effect)
  • (NOTE or you can do Colors > Colorize) THEN
  • Right-click the blue layer > Add Layer Mask > Black (full transparency)
  • Now you can use any tool (pen, airbrush, square select and fill) to paint on the mask in white and bring that color in where you want it


Speech to text (and mp3?) (nerd-dictation)

https://news.ycombinator.com/item?id=29972579



Create a photo montage (ImageMagick)

montage -geometry +0+0 -tile 10x *.jpg result.jpg

Creates a montage of all the jpgs in the folder, 10 images wide. With -tile 10x the number of rows is left unspecified, so the montage grows as tall as it needs to fit all the images.
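
If you do want to know (or pin) the height, the row count is just the image count divided by the width, rounded up. A sketch: montage_rows is a hypothetical helper, and the usage comment assumes -tile also accepts a full COLSxROWS value such as 10x3.

```shell
#!/bin/sh
# Hypothetical helper: rows needed to fit COUNT images at COLS per row.
# usage: montage_rows COUNT COLS
montage_rows() {
  # integer ceiling division: (count + cols - 1) / cols
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Intended use (assumes ImageMagick's montage is installed):
#   n=$(ls *.jpg | wc -l)
#   montage -geometry +0+0 -tile "10x$(montage_rows "$n" 10)" *.jpg result.jpg
```

For example, 25 jpgs at 10 per row need 3 rows, with the last row half empty.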

https://stackoverflow.com/questions/37709879/how-to-generate-a-collage-image-like-shown

