Install qpdf (FOSS), for example with sudo apt install qpdf. To keep only pages 1-10:
qpdf originalDoc.pdf --pages . 1-10 -- outputDoc.pdf
For multiple sets of pages (the input file goes before --pages, the output file after --)
qpdf input.pdf --pages . 1-8 . 53-70 -- output.pdf
(There is a way described for ImageMagick, but it didn't work for me: https://linuxhint.com/convert-image-to-pdf-command-line/ )
sudo apt-get install img2pdf
Open a Terminal in the folder with the images and do
img2pdf *.png -o output_imgs.pdf
(assuming the images are PNGs. sudo is not needed here; with sudo the output file ends up owned by root.)
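One thing to watch with *.png: the shell expands it in plain alphabetical order, so page10.png lands before page2.png in the PDF. Below is a minimal sketch that sorts the names in natural order first. It assumes GNU sort (for the -V option) and file names without newline characters. The function name imgs_to_pdf is made up for these notes.

```shell
# Hypothetical helper: combine PNGs into one PDF in natural (version) order.
# Assumes GNU sort (-V) and file names that contain no newline characters.
imgs_to_pdf() {
    out=$1
    # Start with an empty argument list, then append the sorted names one
    # by one so that names containing spaces stay intact.
    set --
    while IFS= read -r f; do
        set -- "$@" "$f"
    done <<EOF
$(printf '%s\n' *.png | sort -V)
EOF
    img2pdf "$@" -o "$out"
}
```

Usage would be: imgs_to_pdf output_imgs.pdf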
sudo apt install tesseract-ocr
or (although I don't think this is necessary)
sudo apt install tesseract-ocr libtesseract-dev tesseract-ocr-eng
Do an example. Name your file existingimage.png, open a Terminal in that folder and do
tesseract -l eng existingimage.png output_from_ocr
cat output_from_ocr.txt
(where -l specifies a language and the text is written to output_from_ocr.txt. To see the installed languages, do tesseract --list-langs)
OCR means Optical Character Recognition
Convert image to pdf (not to txt)
tesseract -l eng input_for_ocr.png output_from_ocr pdf
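To OCR a whole folder of images instead of one file, the same command can go in a loop. This is only a sketch: the function name ocr_all_pngs is made up, it assumes the images are PNGs in the current folder, and it names each PDF after its image.

```shell
# Hypothetical helper: OCR every PNG in the current folder to a PDF.
# Output names reuse the image name, e.g. scan1.png -> scan1.pdf.
ocr_all_pngs() {
    lang=${1:-eng}                    # language code, defaults to English
    for img in *.png; do
        [ -e "$img" ] || continue     # skip if no PNGs matched
        base=${img%.png}              # strip the extension for the output base
        tesseract "$img" "$base" -l "$lang" pdf
    done
}
```

Usage would be: ocr_all_pngs (English) or ocr_all_pngs spa (Spanish, once that language is installed).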
If you get the error 'Tesseract couldn't load any languages!', see https://github.com/tesseract-ocr/tesseract/issues/1309
Spanish: download spa.traineddata from here https://github.com/tesseract-ocr/tessdata/blob/c2b2e0df86272ce11be323f23f96cf656565ed41/spa.traineddata
Put it in the folder that holds eng.traineddata, /usr/share/tesseract-ocr/4.00/tessdata/ (you will have to open that folder as root, and the version number in the path may differ on your system)
Or maybe you can just use: sudo apt-get install tesseract-ocr-spa (although it might not save it to the location you want)
NOTE: After you install a language (or even if you don't), you might write over an existing output file and see an error message, but it still works.
pdftotext doesn't support batch conversion. To convert a whole folder full of PDFs you have to use a Bash for loop.
To convert all PDFs in that folder to text files (I haven't tested this). Each file.pdf gets a file.txt next to it.
for file in *.pdf; do pdftotext -layout "$file"; done
https://askubuntu.com/questions/150100/extracting-embedded-images-from-a-pdf/1187844#1187844
Check you have it (often pre-installed) with
pdftoppm -v
Example you can try without overwriting anything. Open a Terminal in the folder and do (you can, if you want, change your pdf name to imgtesttt.pdf and just copy-paste these 2 commands).
mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 imgtesttt.pdf images/pg
(This is the highest quality JPEG available, although you can set quality anywhere from 0 to 100. JPEGs will be around 0.2 to 2 MB for 8.5x11" pages.)
or for png
mkdir -p images && pdftoppm -png -r 300 imgtesttt.pdf images/newimagename
(Note that this does two things. First it creates a folder called 'images', then it writes the page images into that folder. So to use a folder called 'book' instead, you need to change 'images' in both places in the command.)
To do several pdfs that are in the same folder
mkdir -p XXXXX && pdftoppm -png -r 300 XXXXX.pdf XXXXX/XXXXX
mkdir -p YYYYY && pdftoppm -png -r 300 YYYYY.pdf YYYYY/YYYYY
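Rather than typing one line per PDF, the same pattern can be looped over every PDF in the folder. A minimal sketch, assuming you want one folder per PDF named after the file (as in the XXXXX and YYYYY lines); the function name pdfs_to_pngs is made up.

```shell
# Hypothetical helper: for every PDF in the current folder, create a folder
# named after it and write one 300 dpi PNG per page into that folder.
pdfs_to_pngs() {
    for pdf in *.pdf; do
        [ -e "$pdf" ] || continue    # skip if no PDFs matched
        name=${pdf%.pdf}             # e.g. XXXXX.pdf -> XXXXX
        mkdir -p "$name" && pdftoppm -png -r 300 "$pdf" "$name/$name"
    done
}
```

Usage would be: pdfs_to_pngs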
A simpler command
mkdir -p images && pdftoppm imgtesttt.pdf images/pg
(This creates a folder called 'images' inside the current folder and makes a .ppm image file of every page. In the other commands, -r 300 means 300 dpi; the default is 150 dpi if you don't specify it.)
Note: you can also make other formats.
A tiff example:
mkdir -p images && pdftoppm -tiff -r 300 mypdf.pdf images/pg
(300 dpi, where each image takes 15-45 seconds. It is a single-core process, so extra cores don't make it any faster.)
A simpler jpeg example:
mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg
I just did png this way:
mkdir -p images && pdftoppm -png -r 300 mypdf.pdf images/pg
[Image: JPEG at 300 dpi and PNG at 300 dpi, compared at 100% and 200% zoom]