Deduplication & Hashing
You can configure hashing and deduplication while setting up a new project in Forensic Email Intelligence (FEI). The configuration user interface looks as follows:

Hashing & Deduplication Options
FEI presents the following options regarding hashing and deduplication:
1. Calculate Forensic Hash: When this option is enabled, FEI calculates a hash based on the binary contents of the entire email. The resultant hashes are reported on the Evidence Grid in a column called Forensic Hash.
2. Calculate eDiscovery Hash: When this option is enabled, FEI calculates a hash based on a user-configurable subset of metadata fields for each email. The resultant hashes are reported on the Evidence Grid in a column called eDiscovery Hash.
3. eDiscovery Hashing Fields: The following metadata fields can be used to calculate the eDiscovery hash:
- Message-ID (required): The Message-ID identification field as defined in RFC 5322 (e.g., <[email protected]>).
- Subject: Email subject.
- Date Sent: Origination date (i.e., Sent Date) of the email.
- Date Received: The date extracted from the latest received header.
  IMPORTANT: If you would like to differentiate between the sender’s and the receiver’s copy of an email, you may want to include this field in your eDiscovery hash list. The sender’s copy of a message would not have its Date Received field populated while the receiver’s copy should.
- Participants: The sender and recipients of the email.
- Body: Message body (text + HTML body, if available) normalized using the relaxed canonicalization algorithm as defined in RFC 6376.
- Attachment Names: Names of the attachments of the email.
- Attachment Hashes: Hashes calculated based on the binary contents of the email’s attachments.
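To make the idea of a field-based hash concrete, here is a minimal Python sketch. It is not FEI's implementation: the field keys, the fixed field order, the separator bytes, and the sample messages (including the Message-ID `<abc@example.com>`) are all assumptions made for illustration. The body normalization only approximates the relaxed body canonicalization of RFC 6376 (collapse whitespace runs, drop trailing whitespace on each line, drop trailing empty lines).

```python
import hashlib
import re

# Hypothetical field keys, hashed in a fixed order so that the order in which
# the user selects fields does not change the result.
FIELD_ORDER = (
    "message_id", "subject", "date_sent", "date_received",
    "participants", "body", "attachment_names", "attachment_hashes",
)

def relaxed_body(body: str) -> str:
    """Approximate the RFC 6376 'relaxed' body canonicalization."""
    lines = body.replace("\r\n", "\n").split("\n")
    # Collapse runs of spaces/tabs to one space, then drop trailing spaces.
    lines = [re.sub(r"[ \t]+", " ", line).rstrip(" ") for line in lines]
    # Ignore empty lines at the end of the body.
    while lines and lines[-1] == "":
        lines.pop()
    return "".join(line + "\r\n" for line in lines)

def ediscovery_hash(email: dict, fields, algorithm: str = "sha256") -> str:
    """Hash only the selected metadata fields of one email."""
    if "message_id" not in fields:
        raise ValueError("Message-ID is required")
    h = hashlib.new(algorithm)
    for name in FIELD_ORDER:
        if name not in fields:
            continue
        value = email.get(name) or ""
        if isinstance(value, (list, tuple)):
            value = "\n".join(value)
        if name == "body":
            value = relaxed_body(value)
        # Field name and NUL separators keep adjacent fields from running together.
        h.update(name.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00")
    return h.hexdigest()

# Hypothetical sender's and receiver's copies of the same message.
sender_copy = {
    "message_id": "<abc@example.com>",
    "subject": "Hello",
    "date_sent": "2024-01-01T10:00:00Z",
    "date_received": "",
    "body": "Hi  there \r\n\r\n",
}
receiver_copy = dict(
    sender_copy,
    date_received="2024-01-01T10:00:05Z",
    body="Hi there\r\n",
)

basic_fields = ["message_id", "subject", "body"]
same_without_received = (
    ediscovery_hash(sender_copy, basic_fields)
    == ediscovery_hash(receiver_copy, basic_fields)
)
same_with_received = (
    ediscovery_hash(sender_copy, basic_fields + ["date_received"])
    == ediscovery_hash(receiver_copy, basic_fields + ["date_received"])
)
```

In this sketch the two copies hash identically when only Message-ID, Subject, and Body are selected, because the whitespace differences in the body are normalized away. Adding Date Received separates them, which mirrors the note above about telling the sender's copy from the receiver's copy.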
4. Family Level Deduplication: This option controls whether duplicate identification should be performed at the individual email level, or at the attachment family level. FEI defaults to family-level deduplication.
Let’s consider the following example data set, in which indented items are attachments of the email above them:
- Email A
  - Email X
  - Email Y
  - Email Z
- Email B
  - Email Z
- Email Y
- Email A
  - Email X
- Email B
  - Email Z
With family-level deduplication, only the repeated Email B family is identified as a duplicate:
- Email A
  - Email X
  - Email Y
  - Email Z
- Email B
  - Email Z
- Email Y
- Email A
  - Email X
- Email B (Duplicate of earlier Email B family)
  - Email Z (Duplicate of earlier Email B family)
With deduplication at the individual email level, every repeated item is identified as a duplicate:
- Email A
  - Email X
  - Email Y
  - Email Z
- Email B
  - Email Z (Duplicate of Email A → Email Z)
- Email Y (Duplicate of Email A → Email Y)
- Email A (Duplicate of Email A)
  - Email X (Duplicate of Email A → Email X)
- Email B (Duplicate of Email B)
  - Email Z (Duplicate of Email A → Email Z)
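The difference between the two modes in the example above can be sketched in Python. This is an illustration under stated assumptions, not FEI's algorithm: item names stand in for content (identical names mean identical hashes), and the way a family key is built from the parent and attachment hashes is hypothetical.

```python
import hashlib

def item_hash(name: str) -> str:
    # Stand-in for a real content hash: equal names represent equal content.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

def family_hash(parent: str, attachments) -> str:
    # Hypothetical family key: the parent's hash followed by each attachment's hash.
    h = hashlib.sha256()
    for name in [parent, *attachments]:
        h.update(item_hash(name).encode("ascii"))
    return h.hexdigest()

# The example data set: (parent email, attachments), in processing order.
FAMILIES = [
    ("Email A", ["Email X", "Email Y", "Email Z"]),
    ("Email B", ["Email Z"]),
    ("Email Y", []),
    ("Email A", ["Email X"]),
    ("Email B", ["Email Z"]),
]

def duplicate_families(families):
    """Family level: a family is a duplicate only if the whole family was seen before."""
    seen, duplicates = set(), []
    for index, (parent, attachments) in enumerate(families):
        key = family_hash(parent, attachments)
        if key in seen:
            duplicates.append(index)
        seen.add(key)
    return duplicates

def duplicate_items(families):
    """Individual level: every item is compared on its own, regardless of family."""
    seen, duplicates = set(), []
    for index, (parent, attachments) in enumerate(families):
        for name in [parent, *attachments]:
            key = item_hash(name)
            if key in seen:
                duplicates.append((index, name))
            seen.add(key)
    return duplicates

family_level = duplicate_families(FAMILIES)
individual_level = duplicate_items(FAMILIES)
```

Here `family_level` flags only the last family (the second Email B with Email Z), while `individual_level` flags six items, including the second Email A even though its family differs from the first. Family-level deduplication therefore keeps parents and attachments together at the cost of retaining more repeated content.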
5. Hash Algorithm: You can choose between the MD5, SHA-1, SHA-256, and SHA-512 algorithms. FEI defaults to SHA-256.
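As an illustration of how the algorithm choice interacts with a hash over the entire binary contents of an email, here is a short Python sketch using the standard `hashlib` module. The function name, the label-to-algorithm mapping, and the sample bytes are assumptions for illustration; FEI's internal implementation is not shown in this document.

```python
import hashlib

# Map the option labels used above to hashlib algorithm names.
SUPPORTED_ALGORITHMS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
}

def forensic_hash(raw_email: bytes, algorithm: str = "SHA-256") -> str:
    """Hash the complete raw bytes of an email with the chosen algorithm."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unsupported algorithm: {algorithm}")
    h = hashlib.new(SUPPORTED_ALGORITHMS[algorithm])
    h.update(raw_email)
    return h.hexdigest()

# Hypothetical raw message bytes; a single changed byte yields a different hash.
original = b"Subject: Hello\r\n\r\nHi there\r\n"
altered = b"Subject: Hello\r\n\r\nHi there!\r\n"

default_digest = forensic_hash(original)
digests_differ = forensic_hash(original) != forensic_hash(altered)
```

Because this kind of hash covers every byte, two copies of the same message that differ only in transport headers would not match, which is why a metadata-based hash is the better fit for identifying duplicates across custodians.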