Microsoft has made significant strides toward enabling easy searching and collecting of relevant data for eDiscovery purposes. Being able to search mailboxes and SharePoint/OneDrive sites from one straightforward search form – to set up litigation holds, conduct internal investigations, or prepare to make a document production in litigation – is certainly convenient. However, there are limitations that need to be anticipated when creating an eDiscovery collection plan involving Office 365 data.

When using keywords to search for relevant data within a custodian’s email or SharePoint/OneDrive accounts, several factors require consideration. First, in both email and SharePoint/OneDrive searching, there is a limit of 20 terms per query. If the search requires more than 20 terms, the terms must be split into groups of 20 or fewer and each group run as its own search. This creates substantial work when the list of search terms is much longer than 20. Second, while there is no limit on the number of mailboxes or sites that can be included in a single search, only results from the top 1,000 mailboxes (those with the most hits) will be returned. If results from more than 1,000 mailboxes are expected, the mailbox list must likewise be divided into groups and run as separate searches.
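The batching described above can be scripted rather than done by hand. The following is a minimal Python sketch, not a Microsoft tool: the helper names `chunked` and `build_queries` are hypothetical, the limits of 20 terms and 1,000 mailboxes are taken from the text above, and joining terms with OR is an assumption about how the keyword list is meant to be combined.

```python
from __future__ import annotations

from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_queries(terms: Sequence[str], max_terms: int = 20) -> list[str]:
    """Build one OR-joined query string per group of terms.

    Multi-word phrases are wrapped in double quotes so they are
    treated as a single term (an assumption about the query syntax).
    """
    queries = []
    for group in chunked(terms, max_terms):
        quoted = [f'"{t}"' if " " in t else t for t in group]
        queries.append(" OR ".join(quoted))
    return queries


if __name__ == "__main__":
    # Placeholder data: 45 terms and 2,500 mailbox addresses.
    terms = [f"term{i}" for i in range(45)]
    mailboxes = [f"user{i}@example.com" for i in range(2500)]

    for number, query in enumerate(build_queries(terms), start=1):
        print(f"Keyword search {number}: {query}")

    for number, group in enumerate(chunked(mailboxes, 1000), start=1):
        print(f"Mailbox group {number}: {len(group)} mailboxes")
```

Each resulting query string and mailbox group would then be entered as its own search. Keeping a record of which terms went into which search helps when the results are later combined and de-duplicated.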

Next, Office 365 does not fully index all file types and items for searching purposes. For example, non-Microsoft file types, such as .bmp (bitmap images) and .mp3, and some emails, such as those with very large attachments, are not fully indexed. Likewise, non-searchable PDFs and encrypted files are not indexed. These files can be exported along with the indexed results, but only if the option to include them is selected at the export stage.

Another consideration when using keywords to search the data is that some fields may not be indexed. By default, most fields that are commonly searched for eDiscovery purposes are indexed in Office 365. However, before beginning a project, confirm with the administrator that the default fields (called “Crawled Properties”) are in fact included. Additional fields may be included and indexed by using “Managed Properties” [1].

Notably, and perhaps surprisingly, the file paths for documents are not included in a keyword search. For example, a search for the term “Acme” will return folders whose names include that term, but not the contents of those folders. Therefore, if a folder named “Acme” exists in a user’s SharePoint site, documents within that folder will be returned only if they also contain the keyword themselves.

There is a way to work around this, but it requires a few extra steps. Microsoft has provided a PowerShell script that can be run to return a listing of all folders in a site [2]. The script is run per user, so if a list of all folders for all users is needed, it must either be run for each user individually or be modified accordingly. The script returns folder information only for a user’s active mailbox, not for any archived folders. Its output is a list containing “Folderid” information for mailboxes and “DocumentLink” information for sites.

Keywords or search terms can then be used to search folder contents using “Folderid” or “DocumentLink.” Note that the script returns “DocumentLink” information, rather than the “Path” property information. Although both properties can be used to view the contents of a folder within a PowerShell script, the “Path” property in Office 365 cannot be used to locate or export media files (e.g., audio and video files such as .wav or .mpeg).
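As an illustration of combining the script output with keywords, the following Python sketch builds a query string that restricts a keyword search to one folder. It rests on several assumptions: the helper name `scope_to_folder` is hypothetical, the folder ID and site URL shown are placeholders rather than real values, and the `property:value AND (keywords)` form should be checked against Microsoft's current query syntax documentation before use.

```python
ALLOWED_PROPERTIES = ("folderid", "documentlink")


def scope_to_folder(keyword_query: str, property_name: str, value: str) -> str:
    """Return a query limiting `keyword_query` to a single folder.

    `property_name` is "folderid" for a mailbox folder or
    "documentlink" for a site folder; `value` is the corresponding
    entry copied from the folder-listing script's output.
    """
    name = property_name.lower()
    if name not in ALLOWED_PROPERTIES:
        raise ValueError(f"property must be one of {ALLOWED_PROPERTIES}")
    if not value:
        raise ValueError("a folder ID or document link is required")
    if name == "documentlink":
        # URLs contain characters such as ':' and '/', so quote them.
        value = f'"{value}"'
    scope = f"{name}:{value}"
    if not keyword_query.strip():
        # No keywords: return everything in the folder.
        return scope
    return f"{scope} AND ({keyword_query})"


if __name__ == "__main__":
    # Placeholder values, not real identifiers.
    print(scope_to_folder("Acme OR merger", "folderid", "ABC123"))
    print(scope_to_folder("Acme", "documentlink",
                          "https://example.sharepoint.com/sites/legal/Acme"))
```

Wrapping the keywords in parentheses keeps the folder restriction applied to every term, rather than only to the first one in an OR list.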

For many reasons, Office 365 can be a very useful tool in eDiscovery collection. However, each matter has its own requirements, and these must be considered in order to search effectively for potentially relevant data and to ensure a defensible collection. Understanding the nuances and limitations of Office 365 will help to safeguard the results and support a defensible implementation.