Indexing Large Amounts of Data
On a contemporary Mac, FoxTrot can easily index dozens of gigabytes of textual data (and much larger volumes of mixed data types). There is no hard limit on how much data FoxTrot can handle, and no magic number either: performance depends on a variety of factors, such as:
- version of FoxTrot used (version 7 is considerably faster)
- hardware resources:
  - CPU speed
  - number of CPU cores
  - amount of RAM available
  - storage type: SSD, Fusion Drive, or hard drive
  - network speed, when the indexed data is on a network drive
- nature of the indexed data:
  - number of files
  - size of files
  - proportion of indexable text (as opposed to graphics and other data) in indexed documents
  - type of files
  - nature of text (e.g. literary text versus collections of numerical data)
In addition, carefully organizing your indexed data and your indices can greatly improve FoxTrot's performance.
Store your index files on an SSD
Indices are stored, by default, in the Library folder of your home folder, which is typically on your Mac's internal storage. FoxTrot Pro lets you change the location of an index.
Large index files (e.g. > 1 GB) perform much better when stored on an SSD rather than on a hard disk. Also, avoid storing your index files on a network drive, especially one accessed over Wi-Fi.
Use adequate hardware
FoxTrot will probably not benefit from huge amounts of RAM (e.g. 32 GB or more) or from massive multi-core CPUs (e.g. 12 cores or more). However, if you want to index gigabytes of textual data, a reasonably powerful Mac will help: for example, 4 to 8 CPU cores, 8 to 16 GB of RAM, a sufficiently large SSD, and a fast, reliable network if the data you index is stored on a NAS.
Create multiple indices
FoxTrot Pro lets you create multiple indices, instead of indexing all your data in a single index. This has multiple benefits:
- you can easily focus your searches on one or more indices
- if parts of your indexed data are frequently modified (e.g. current projects) and other parts only occasionally (e.g. archives or reference data), you may schedule a daily update for one small index and only update your larger indices manually
- updating multiple indices simultaneously may make better use of multi-core CPUs (although version 7 already takes advantage of multi-core CPUs when updating a single index)
Only index data you really need to search
It may be tempting to index your whole drive or your whole NAS; however, these typically contain lots of data you don't need to index, such as applications, their internal data, and system files. If you really want to index everything, create a secondary index for this, and keep your main index focused on your most important, relevant, or personal data.
Avoid large non-linguistic textual data
A FoxTrot index contains the list of “words” contained in your data, a “word” (or a “textual entity”, to use a term with no linguistic meaning) being a string of adjacent alphabetical or numerical characters. If you index large amounts of non-linguistic textual data (e.g. numerical data from spreadsheets or .csv files, hexadecimal, base64, encoded, encrypted, XML or JSON data, log files, source code, database dumps…), the index will contain millions of unique “words”, and this will severely degrade FoxTrot's performance.
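To see why such data is costly, here is a minimal Python sketch that approximates the “word” definition above (a run of adjacent letters or digits) and counts the distinct words in a piece of text. The regular-expression tokenizer is an assumption for illustration only; FoxTrot's real tokenizer is not described here and likely differs in details. Repetitive prose produces only a handful of distinct words, while random hexadecimal values produce roughly one new “word” per value.

```python
import random
import re

# Assumption: a "word" is a maximal run of Unicode letters or digits.
# [^\W_] means "word character other than underscore", i.e. letters/digits.
WORD_PATTERN = re.compile(r"[^\W_]+")


def unique_words(text: str) -> set[str]:
    """Return the set of distinct lower-cased "words" found in text."""
    return {match.lower() for match in WORD_PATTERN.findall(text)}


# Repetitive literary text: the vocabulary stays tiny however long it gets.
prose = "the quick brown fox jumps over the lazy dog " * 1000
print(len(unique_words(prose)))  # 8

# 1000 random 32-bit values written as hexadecimal: nearly every value
# becomes a new distinct "word" that the index must store.
rng = random.Random(0)
hex_data = " ".join(f"{rng.getrandbits(32):08x}" for _ in range(1000))
print(len(unique_words(hex_data)))  # about 1000 distinct words
```

Scaled up to hundreds of megabytes, the second pattern is what fills an index with millions of unique entries.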
This non-linguistic textual data may also come from incorrectly parsed file formats (including some PDF documents; see below). To check how a specific file has been parsed and indexed, search for it by filename, then option-click it in the search results list. This will show the plain text data that has been indexed.
Be careful to avoid indexing large amounts (e.g. > 100 MB) of such data; if you really need to, use a dedicated index for it, or split the data across multiple indices.
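To estimate how much data of this kind a folder holds before you add it to an index, a short Python sketch can total file sizes by extension. The extension list and the 100 MB threshold are illustrative assumptions, not FoxTrot settings, and file size is only a rough proxy for the amount of text actually indexed; FoxTrot's resource hogs button remains the authoritative check.

```python
import os
from collections import Counter

# Hypothetical list of extensions that often hold non-linguistic text;
# adjust it to your own data. This is not FoxTrot's own classification.
SUSPECT_EXTENSIONS = {".csv", ".tsv", ".json", ".xml", ".log", ".sql"}
LIMIT_BYTES = 100_000_000  # the rough "> 100 MB" rule of thumb


def suspect_bytes_by_extension(root: str) -> Counter:
    """Total size in bytes of suspect files under root, grouped by extension."""
    totals: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in SUSPECT_EXTENSIONS:
                try:
                    totals[ext] += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # unreadable or vanished file: skip it
    return totals


def report(root: str) -> None:
    """Print per-extension totals and warn when the rule of thumb is exceeded."""
    totals = suspect_bytes_by_extension(root)
    for ext, size in totals.most_common():
        print(f"{ext:6} {size / 1_000_000:10.1f} MB")
    if sum(totals.values()) > LIMIT_BYTES:
        print("Consider a dedicated index for this data, or excluding it.")
```

For example, `report(os.path.expanduser("~/Documents"))` prints the totals for your Documents folder, largest first.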
Use the resource hogs button (in the indexed data pane of the manage indices window) to check if you have such files amongst your indexed data. You can remove these files from your index in multiple ways:
- blacklist some files individually from the resource hogs window
- uncheck the corresponding file type, in the index contents of files pane
- add the folder containing them to the skipping these subfolders pane
- move these files outside of your indexed folders
- change the filename extension of these files
Make sure your data files are correctly parsed
If some files have an incorrect filename extension (e.g. .txt), or if some third-party Spotlight metadata importers behave incorrectly, some of your files may be parsed as plain text even though they contain binary or encoded data. See avoid large non-linguistic textual data above to identify these files and fix the problem.
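As a rough first pass, the Python sketch below flags .txt files whose first few kilobytes contain NUL bytes or many control characters, a common sign of binary content hiding behind a text extension. The heuristic and its 10% threshold are assumptions for illustration, not how FoxTrot or Spotlight decides how to parse a file. It will not catch base64 or other encoded data made of printable characters; option-clicking a file in the search results, as described above, remains the way to see what was actually indexed.

```python
import os

SAMPLE_SIZE = 8192  # only inspect the beginning of each file
ALLOWED_CONTROLS = {9, 10, 12, 13}  # tab, line feed, form feed, carriage return


def looks_binary(path: str, threshold: float = 0.1) -> bool:
    """Heuristic: NUL bytes or many control characters suggest binary data."""
    with open(path, "rb") as f:
        sample = f.read(SAMPLE_SIZE)
    if not sample:
        return False  # empty files are harmless
    if b"\x00" in sample:
        return True
    controls = sum(1 for byte in sample if byte < 32 and byte not in ALLOWED_CONTROLS)
    return controls / len(sample) > threshold


def find_suspicious_txt(root: str) -> list[str]:
    """Return paths of .txt files under root that look like binary data."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".txt"):
                path = os.path.join(dirpath, name)
                try:
                    if looks_binary(path):
                        hits.append(path)
                except OSError:
                    continue  # unreadable file: skip it
    return hits
```

Files this reports are good candidates for a corrected filename extension, or for one of the exclusion methods listed above.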
If these files are handled by a third-party Spotlight metadata importer, you may also disable the use of this specific importer in FoxTrot. To do so:
- quit FoxTrot
- relaunch it while holding down the command and option keys
- enable the manage third-party metadata importers checkbox
- disable the misbehaving importer
You will need to rebuild your index for this change to take effect.
Beware of certain types of PDF documents
Some PDF documents are not correctly imported, leading to non-linguistic textual data being indexed. See avoid large non-linguistic textual data above, and also PDF Importer Problems.