PDF Importer Problems

Some PDF documents (for example, certain OCR-ed documents, LaTeX-generated documents, or improperly encoded documents) may appear perfectly formatted onscreen even though their text cannot be extracted correctly. In some cases the extracted text is completely garbled; in others it contains extraneous spaces inside words, or spaces are missing between words. In all cases, there are two consequences to this:

See "Avoid large non-linguistic textual data" to identify these files and work around the problem.

By default, FoxTrot uses Spotlight's metadata importer to extract text from PDF documents, but an alternate method is available: Xpdf. Xpdf gives better results than Spotlight's importer for some documents, and worse results for others. To change which method is used to index PDF documents:

You will need to rebuild your index for this change to take effect. Before rebuilding your main index, you may want to create a small test index and compare both methods on a few documents.