Add OCR capability so PDFs containing image-based text can be processed usefully
At present it handles only text and pictures. Provided the alignment of these is simple, the output is very accurate. However, a large group of PDFs, those whose text is stored as images, cannot currently be processed usefully.