Lighten, darken or increase the contrast of (text) images for readability (ImageMagick)

Lighten those pages

  • convert output.pdf -function polynomial 1,0,0,0 darkened.pdf
  • convert output.pdf -contrast-stretch 2%x20% music1C.pdf
  • convert -density 600 output.pdf output-%02d.jpg
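A minimal sketch of the first two commands as bash functions, assuming ImageMagick 6's convert is installed. The names darken_pdf and stretch_pdf and the 300 dpi default are my own choices. -density goes before the input file because it sets the resolution used to rasterize the PDF; ImageMagick's default is 72 dpi, which usually looks blurry.

```shell
# Hypothetical helpers around ImageMagick 6 `convert` (assumed to be installed).

# darken_pdf IN.pdf OUT.pdf [DENSITY]
# -function polynomial 1,0,0,0 maps each value u (0..1) to u^3, so greys get darker.
darken_pdf() {
  local in="$1" out="$2" density="${3:-300}"
  if [ -z "$in" ] || [ -z "$out" ]; then
    echo "usage: darken_pdf IN.pdf OUT.pdf [DENSITY]" >&2
    return 2
  fi
  # -density must come before the input file to affect how the PDF is rasterized
  convert -density "$density" "$in" -function polynomial 1,0,0,0 "$out"
}

# stretch_pdf IN.pdf OUT.pdf [BLACKxWHITE] [DENSITY]
# -contrast-stretch 2%x20% blacks out at most 2% of the pixels and whites out at most 20%.
stretch_pdf() {
  local in="$1" out="$2" amount="${3:-2%x20%}" density="${4:-300}"
  if [ -z "$in" ] || [ -z "$out" ]; then
    echo "usage: stretch_pdf IN.pdf OUT.pdf [BLACKxWHITE] [DENSITY]" >&2
    return 2
  fi
  convert -density "$density" "$in" -contrast-stretch "$amount" "$out"
}
```

Paste the functions into a terminal (or source them from a file), then run e.g. darken_pdf output.pdf darkened.pdf 600.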

(If ImageMagick refuses to read or write PDFs because of its security policy, use mv to move policy.xml out of the way, then put it back afterwards.)

sudo mv /etc/ImageMagick-6/policy.xml /etc/ImageMagick-6/

When done, you can restore the original with:

sudo mv /etc/ImageMagick-6/ /etc/ImageMagick-6/policy.xml
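A sketch of the same move-aside-and-restore idea as bash functions. On Debian/Ubuntu the stock ImageMagick 6 policy.xml typically disables the PDF coder, which is what produces errors like "not allowed by the security policy `PDF'". The backup name policy.xml.bak is a hypothetical choice, and POLICY and SUDO are variables introduced here so you can try the functions on a scratch copy first.

```shell
# Hypothetical helpers: the backup name policy.xml.bak is an assumption.
POLICY="${POLICY:-/etc/ImageMagick-6/policy.xml}"
SUDO="${SUDO-sudo}"   # set SUDO= (empty) to run mv without sudo

# Move the policy file aside so ImageMagick stops applying its restrictions.
policy_off() {
  if [ -e "$POLICY.bak" ]; then
    echo "$POLICY.bak already exists; not overwriting it" >&2
    return 1
  fi
  $SUDO mv -- "$POLICY" "$POLICY.bak"
}

# Restore the original policy file.
policy_on() {
  $SUDO mv -- "$POLICY.bak" "$POLICY"
}

# Run one command with the policy moved aside, then always restore it.
with_policy_off() {
  policy_off || return
  "$@"
  local status=$?
  policy_on
  return "$status"
}
```

For example, with_policy_off convert -density 600 output.pdf output-%02d.jpg puts the policy back even if convert fails.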

3-step process:

  • convert your_pdf_filename.pdf output-%02d.jpg
  • convert output*.jpg -level 25% final-%02d.jpg
  • convert final*.jpg very_readable.pdf

(Change the -level value to make the result darker or lighter.)
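The three steps above as one bash function, a sketch assuming ImageMagick 6's convert and a policy that allows PDF (see the policy.xml note above). The name make_readable, the temporary folder and the 300 dpi default are my own choices; the level defaults to the same 25%. %02d gives two-digit page numbers, so the shell glob keeps pages in order for up to 100 pages; use %03d for longer documents.

```shell
# make_readable IN.pdf OUT.pdf [LEVEL] [DENSITY]   (hypothetical helper)
make_readable() {
  local in="$1" out="$2" level="${3:-25%}" density="${4:-300}" work status
  if [ -z "$in" ] || [ -z "$out" ]; then
    echo "usage: make_readable IN.pdf OUT.pdf [LEVEL] [DENSITY]" >&2
    return 2
  fi
  work=$(mktemp -d) || return 1

  # 1) one JPEG per page, 2) -level on every page, 3) JPEGs back into one PDF.
  # Each step runs only if the previous one succeeded.
  convert -density "$density" "$in" "$work/page-%02d.jpg" &&
    convert "$work"/page-*.jpg -level "$level" "$work/final-%02d.jpg" &&
    convert "$work"/final-*.jpg "$out"
  status=$?

  rm -rf -- "$work"   # delete the intermediate JPEGs
  return "$status"
}
```

Usage: make_readable your_pdf_filename.pdf very_readable.pdf 30%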

With -threshold you get a pure black-and-white image (for example convert output*.jpg -normalize -threshold 80% final-%02d.jpg). To keep the grey scale, use -level instead: the greys are kept, just made darker or lighter.
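To see the difference on one page, a small sketch that writes both versions next to the original image. compare_level_threshold and the -level/-threshold file suffixes are hypothetical; the 25% and 80% defaults are the values used above. The outputs are PNG on purpose, since JPEG compression would add grey pixels back around the edges of the thresholded version.

```shell
# compare_level_threshold IMAGE [LEVEL] [THRESHOLD]   (hypothetical helper)
# Writes IMAGE-level.png (greys kept) and IMAGE-threshold.png (black and white only).
compare_level_threshold() {
  local img="$1" level="${2:-25%}" threshold="${3:-80%}" base
  if [ -z "$img" ]; then
    echo "usage: compare_level_threshold IMAGE [LEVEL] [THRESHOLD]" >&2
    return 2
  fi
  base="${img%.*}"   # strip the extension: output-03.jpg -> output-03
  # first file: -level; second file: -normalize then -threshold
  convert "$img" -level "$level" "${base}-level.png" &&
    convert "$img" -normalize -threshold "$threshold" "${base}-threshold.png"
}
```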


Extract some pages from a pdf (qpdf)

Install qpdf (FOSS): sudo apt install qpdf

qpdf originalDoc.pdf --pages . 1-10 -- outputDoc.pdf

For several sets of pages, list the ranges separated by commas:

qpdf input.pdf --pages . 1-8,53-70 -- output.pdf
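A small wrapper, assuming qpdf is installed, that takes any number of page ranges and joins them with commas the way qpdf expects. extract_pages is a hypothetical name.

```shell
# extract_pages IN.pdf OUT.pdf RANGE...   (hypothetical helper)
# e.g. extract_pages input.pdf output.pdf 1-8 53-70
extract_pages() {
  if [ "$#" -lt 3 ]; then
    echo "usage: extract_pages IN.pdf OUT.pdf RANGE..." >&2
    return 2
  fi
  local in="$1" out="$2"
  shift 2
  # "$*" joins the remaining arguments with the first character of IFS
  local IFS=,
  # "." means "the primary input file"; pages are written in the order given
  qpdf "$in" --pages . "$*" -- "$out"
}
```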


Convert images to pdf (img2pdf)

(There is a way to do this with ImageMagick, but it didn't work for me.)

sudo apt-get install img2pdf

Open a Terminal in the folder with the images and do

img2pdf *.png -o output_imgs.pdf

(assuming the images are pngs.)
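One catch with *.png: the shell sorts names character by character, so page10.png lands before page2.png. A sketch that fixes the page order, assuming GNU sort (for -V, natural number ordering) and bash 4+ (for mapfile); images_to_pdf is a hypothetical name, and file names containing newlines aren't handled.

```shell
# images_to_pdf [OUT.pdf]   (hypothetical helper; run it inside the image folder)
images_to_pdf() {
  local out="${1:-output_imgs.pdf}" f
  local -a files=()
  for f in *.png; do
    [ -e "$f" ] && files+=("$f")   # skip the literal "*.png" when nothing matches
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no .png files in $(pwd)" >&2
    return 1
  fi
  # sort -V: page1.png, page2.png, ..., page10.png
  mapfile -t files < <(printf '%s\n' "${files[@]}" | sort -V)
  img2pdf "${files[@]}" -o "$out"
}
```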


Convert image to text (tesseract-ocr)

sudo apt install tesseract-ocr

or (although I don't think this is necessary)

sudo apt install tesseract-ocr libtesseract-dev tesseract-ocr-eng

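Example: name your file existingimage.png, open a Terminal in that folder and do

tesseract -l eng existingimage.png output_from_ocr

tesseract adds the .txt extension itself, so this writes output_from_ocr.txt; view it with cat output_from_ocr.txt

(-l specifies a language. To see all the languages, do man tesseract; tesseract --list-langs shows the ones installed.)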

OCR stands for Optical Character Recognition.
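To OCR a whole folder, a sketch of a bash loop, assuming the images are PNGs and the language data is installed; ocr_all is a hypothetical name. Because tesseract appends .txt to the output base name itself, each scan.png becomes scan.txt.

```shell
# ocr_all [LANG]   (hypothetical helper; run it inside the image folder)
ocr_all() {
  local lang="${1:-eng}" f
  for f in *.png; do
    [ -e "$f" ] || continue   # no .png files: the glob stays literal
    # "${f%.png}" drops the extension; tesseract adds .txt to it
    tesseract -l "$lang" "$f" "${f%.png}" || return
  done
}
```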

Convert image to pdf (not to txt)

tesseract -l eng input_for_ocr.png output_from_ocr pdf
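For a scanned document with one image per page, tesseract can also take a plain-text file listing the images (one path per line) as its input and write a single searchable PDF. A sketch assuming your tesseract version accepts such a list file (tesseract 4 does) and GNU sort -V for page order; ocr_to_pdf is a hypothetical name.

```shell
# ocr_to_pdf [OUTBASE] [LANG]   (hypothetical helper; run it inside the image folder)
ocr_to_pdf() {
  local out="${1:-output_from_ocr}" lang="${2:-eng}" list status
  local -a files=(*.png)
  if [ ! -e "${files[0]}" ]; then
    echo "no .png files in $(pwd)" >&2
    return 1
  fi
  list=$(mktemp) || return 1
  # one image path per line, in natural page order
  printf '%s\n' "${files[@]}" | sort -V > "$list"
  # the "pdf" config makes tesseract write OUTBASE.pdf instead of OUTBASE.txt
  tesseract -l "$lang" "$list" "$out" pdf
  status=$?
  rm -f -- "$list"
  return "$status"
}
```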

If you get the error 'Tesseract couldn't load any languages!', the data file for that language is missing:

Spanish: download from here

Put the downloaded .traineddata file in /usr/share/tesseract-ocr/4.00/tessdata/ (English, for example, is /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata and Spanish is spa.traineddata). You will have to open that folder as root.

Or maybe you can just use sudo apt-get install tesseract-ocr-spa (although it might not save the file to the location you want).

NOTE: After you install a language (or even if you don't), re-running the command overwrites the same output file and may print an error message, but it works anyway.
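Before running OCR in a new language it can help to check what tesseract actually sees. A sketch built on tesseract --list-langs, which prints a header line followed by one language code per line; have_lang and need_lang are hypothetical names, and the suggested package name just follows the tesseract-ocr-spa pattern above (some codes may map to differently named packages).

```shell
# have_lang CODE   (hypothetical helper): succeeds if tesseract lists CODE as installed
have_lang() {
  # 2>&1 in case your version prints the list on stderr
  tesseract --list-langs 2>&1 | grep -qx -- "$1"
}

# need_lang CODE   (hypothetical helper): say what to install when CODE is missing
need_lang() {
  if ! have_lang "$1"; then
    # package name guessed from the tesseract-ocr-spa pattern
    echo "no '$1' data for tesseract; try: sudo apt-get install tesseract-ocr-$1" >&2
    return 1
  fi
}
```

Usage: need_lang spa && tesseract -l spa existingimage.png output_from_ocr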


Convert PDF to TXT with pdftotext (poppler-utils)

  1. sudo apt install poppler-utils
  2. open Terminal in folder
  3. pdftotext -layout pdfname.pdf documenttocreate.txt (-layout is optional; it tries to preserve the layout of the pdf)
  4. pdftotext -layout -f 1 -l 20 pdfname.pdf documenttocreate.txt (-f and -l set the first and last page, so this creates a txt file from pages 1-20 of the pdf)

pdftotext converts one PDF per call, so to convert a whole folder of PDFs use a Bash for loop:

for file in *.pdf; do pdftotext -layout "$file"; done

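This converts every PDF in that folder to a .txt file with the same name, since pdftotext names the output after the input when no output file is given. (I haven't tested this)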
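A slightly more careful version of the loop, as a sketch: it does nothing when there are no PDFs, skips PDFs that already have a .txt next to them (so you can re-run it after adding files), and reports failures instead of stopping. pdfs_to_txt is a hypothetical name.

```shell
# pdfs_to_txt   (hypothetical helper; run it inside the PDF folder)
pdfs_to_txt() {
  local f txt failed=0
  for f in *.pdf; do
    [ -e "$f" ] || continue     # no .pdf files: the glob stays literal
    txt="${f%.pdf}.txt"
    [ -e "$txt" ] && continue   # already converted
    if ! pdftotext -layout "$f" "$txt"; then
      echo "failed: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```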