Indexing Large Amounts of Data

On a contemporary Mac, FoxTrot can easily index dozens of gigabytes of textual data (and much larger volumes of mixed data types). There is no hard limit on how much data FoxTrot can handle, and no magic number either, as this depends on a variety of factors such as:

In addition, carefully organizing your indexed data and your indices can greatly improve FoxTrot's performance.


Store your index files on an SSD

Indices are stored, by default, in the Library folder of your home folder, which is typically on your Mac's internal storage. FoxTrot Pro lets you change the location of an index.

Large index files (e.g. > 1 GB) perform much better when stored on an SSD rather than a hard disk. Also avoid storing your index files on a network drive, especially over Wi-Fi.


Use adequate hardware

FoxTrot will probably not benefit from huge amounts of RAM (e.g. 32 GB or more) or from massively multi-core CPUs (e.g. 12 cores or more). However, if you want to index gigabytes of textual data, a reasonably powerful Mac will help: 4 to 8 CPU cores, 8 to 16 GB of RAM, a large enough SSD and, if the data you index is stored on a NAS, a fast and reliable network.


Create multiple indices

FoxTrot Pro lets you create multiple indices, instead of indexing all your data in a single index. This has multiple benefits:


Only index data you really need to search

It can be tempting to index your whole drive or your whole NAS. However, it typically contains lots of data you don't need to search, such as applications, their internal data and system files. If you really want to index everything, create a secondary index for this purpose, and keep your main index focused on your most important, relevant or personal data.


Avoid large non-linguistic textual data

A FoxTrot index contains the list of "words" contained in your data, a "word" being a string of adjacent alphabetical or numerical characters. If you index large amounts of non-linguistic textual data (e.g. numerical data from spreadsheets or .csv files, hexadecimal, base64, encoded, encrypted, XML or JSON data, log files, source code, database dumps…), the index will contain millions of unique "words", and this will severely degrade FoxTrot's performance.
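To get a feel for why such data inflates the word list, the following Python sketch splits text into runs of letters and digits and compares the share of unique "words" in ordinary prose with that in encoded data. It is only an illustration: the regular expression, the case folding and the function names are assumptions, and FoxTrot's actual tokenizer may behave differently.

```python
import re

# Assumption: a "word" is a run of adjacent letters or digits (underscore
# excluded). This only approximates the definition above; FoxTrot's real
# tokenizer may differ.
WORD_RE = re.compile(r"[^\W_]+")


def unique_words(text):
    """Return the set of distinct words in text, folded to lower case."""
    return {word.lower() for word in WORD_RE.findall(text)}


def unique_word_ratio(text):
    """Return distinct words divided by total words (0.0 for empty text)."""
    total = len(WORD_RE.findall(text))
    if total == 0:
        return 0.0
    return len(unique_words(text)) / total


prose = "the cat sat on the mat and the dog sat on the rug"
encoded = "3f9a1c 7b22e0 a41d9f 0c5e77 d8b310 6f2a4e"

# Prose repeats a small vocabulary; encoded data almost never repeats.
print(f"prose:   {unique_word_ratio(prose):.2f}")
print(f"encoded: {unique_word_ratio(encoded):.2f}")
```

In natural language the ratio drops as a text grows, because the same vocabulary keeps recurring. In encoded or numerical data nearly every token is new, so the number of unique words grows with the size of the data.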

This non-linguistic textual data may also come from incorrectly parsed file formats (including some PDF documents; see below). To check how a specific file has been parsed and indexed, search for it by filename, then option-click it in the search results list. This will show the plain text data that has been indexed.

Be careful to avoid indexing large amounts (e.g. > 100 MB) of such data. If you really need to, use a dedicated index for it, or split the data across multiple indices.
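Before indexing a folder, you can roughly estimate how much of this kind of data it holds. The Python sketch below sums the sizes of files whose extensions suggest non-linguistic content. The extension list, the function names and the byte threshold are assumptions chosen for illustration, not something FoxTrot defines, and an extension alone does not prove that a file is problematic.

```python
from pathlib import Path

# Hypothetical list, loosely based on the kinds of data mentioned above.
SUSPECT_EXTENSIONS = {".csv", ".json", ".xml", ".log", ".sql"}

# The 100 MB rule of thumb, expressed in bytes.
THRESHOLD_BYTES = 100 * 1000 * 1000


def suspect_files(root):
    """Return (path, size_in_bytes) pairs for suspect files, largest first."""
    found = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUSPECT_EXTENSIONS:
            found.append((path, path.stat().st_size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def exceeds_threshold(root, threshold=THRESHOLD_BYTES):
    """Return True if the suspect files under root total more than threshold."""
    return sum(size for _, size in suspect_files(root)) > threshold
```

For example, `suspect_files(Path.home() / "Documents")[:10]` would list the ten largest candidates in your Documents folder, which you can then review one by one.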

Use the resource hogs button (in the indexed data pane of the manage indices window) to check if you have such files amongst your indexed data. You can remove these files from your index in multiple ways:


Make sure your data files are correctly parsed

If some files have an incorrect filename extension (e.g. .txt), or if some third-party Spotlight metadata importers are behaving incorrectly, some of your files may be parsed as plain text even though they contain binary or encoded data. See "Avoid large non-linguistic textual data" above to identify these files and fix the problem.
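As a complement, a short script can flag files that carry a text extension but look like binary data. The Python sketch below uses a common heuristic: a NUL byte, or a high share of control bytes, in the first few kilobytes. The function name, sample size and 30% cutoff are assumptions. The check says nothing about how FoxTrot or Spotlight importers actually classify a file, and it will not catch encoded text such as base64, which consists of ordinary printable characters.

```python
# Bytes that commonly occur in text files: a few control characters
# (bell, backspace, tab, newline, form feed, carriage return, escape)
# plus everything from 0x20 upward except DEL (0x7F).
_CONTROL_OK = {7, 8, 9, 10, 12, 13, 27}
_PRINTABLE = set(range(0x20, 0x100)) - {0x7F}
TEXT_BYTES = bytes(sorted(_CONTROL_OK | _PRINTABLE))


def looks_binary(path, sample_size=8192, max_nontext_ratio=0.3):
    """Guess whether the file at path holds binary rather than textual data.

    Heuristic only. Note that UTF-16 text contains NUL bytes and is
    therefore reported as binary.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample:
        return False  # an empty file is not treated as binary
    if b"\x00" in sample:
        return True
    # Delete every text-like byte; whatever remains is non-text.
    nontext = sample.translate(None, TEXT_BYTES)
    return len(nontext) / len(sample) > max_nontext_ratio
```

Running `looks_binary` over the `.txt` files in a folder gives a shortlist of files whose indexed text is worth inspecting.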

If these files are handled by a third-party Spotlight metadata importer, you can also disable that specific importer in FoxTrot. To do so:

You will need to rebuild your index for this change to take effect.


Beware of certain types of PDF documents

Some PDF documents are not correctly imported, which leads to non-linguistic textual data being indexed. See "Avoid large non-linguistic textual data" above, and also "PDF Importer Problems".