Thought Leadership  •  November 15, 2022

What is Dark Data? How to Manage and Protect It

The term dark data doesn't mean data that is sinister or harmful. It refers to data that organizations collect but do not use, such as data they're required to store for compliance purposes. While dark data is benign in itself, there are reasons to be concerned about it.

Read on to learn about the risks and how to protect dark data, plus how to find and identify this data within your organization.

What Is Dark Data?

As mentioned, dark data is data that is collected but not used or analyzed. Put simply, it's all the other data your business collects alongside the data you actually use for decision-making purposes.

Businesses have far more data than they could ever use or analyze, and storing it costs money. Some dark data must be retained for compliance reasons, but much of it does not need to be maintained. Holding onto unneeded dark data costs the business money in data storage. Identifying dark data that can be disposed of allows the business to conserve resources.

By now you might be wondering how much of a business's data is considered dark data. Studies suggest that the typical rate is around 50%. However, some reports have suggested that as much as 90% of a business's data is dark data!

Types of Dark Data

From a discoverability standpoint, dark data may be structured, unstructured or semi-structured.

As a reminder, structured data refers to data that was formatted before being stored. Think of a financial statement or customer profile where fields like address, credit card or bank account information are stored in clearly defined fields.

Unstructured data refers to data that has not been processed. This data is stored in its native format, such as an email, a chat log or a PowerPoint presentation.

Semi-structured data is unstructured data that carries some metadata, meaning a few defined data fields. It isn't as discoverable as structured data, but it can be searched or catalogued.
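To make the three categories concrete, here is a small illustrative sketch in Python. The customer record, email and call note are invented for this example, and real systems will look different. It only shows why structured fields are easy to query, why metadata makes semi-structured data searchable, and why unstructured text is not.

```python
from email import message_from_string

# Structured: every value sits in a clearly defined field.
# (Hypothetical record, invented for illustration.)
structured = {"customer_id": 1042, "city": "Chicago", "balance_usd": 250.00}

# Semi-structured: an email has defined header fields (metadata)
# wrapped around a free-text body.
raw_email = (
    "From: alice@example.com\n"
    "Subject: Q3 invoice\n"
    "\n"
    "Hi, the invoice for Q3 is attached.\n"
)
semi_structured = message_from_string(raw_email)

# Unstructured: free text with no fields to query.
unstructured = "Customer called about a late invoice and asked for a callback."

print(structured["city"])           # direct field lookup
print(semi_structured["Subject"])   # header metadata can be searched
print("invoice" in unstructured)    # only a full-text search is possible
```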

Dark Data Examples

While companies make use of different kinds of data, let's look at some typical examples of data that falls in the dark category:

  • Downloaded attachments: Downloads often clutter employee computers and can be safely cleaned out on a regular basis.
  • Emails: Similar to downloads, emails often stay on company servers long after they're needed. Pruning email archives is wise.
  • Call records, service tickets, etc.: Customer-facing businesses often have call records, tickets and other service records, which do not need to be kept in perpetuity.
  • HR records on previous employees: The Equal Employment Opportunity Commission (EEOC) requires that employers keep records on previous employees for one year after they leave. However, many companies hold extensive data on previous employees that serves no purpose and no longer needs to be kept.
  • Presentations, spreadsheets and reports: While many reports need to be kept for compliance purposes, there are others that no longer need to be kept. Knowing the difference is key to digital records management.
  • Financial data: Some financial data must be kept for compliance purposes, but companies often have old or outdated financial statements stored on servers. These pose a risk if discovered by hackers.

The Risks of Dark Data


Dark data is often out of sight and out of mind for companies. Unfortunately, the same can't be said for hackers. Dark data reservoirs are often attractive to hackers, who believe they can get away with a data breach since the source data is neglected.

If dark data were exposed during a leak, it could have negative consequences for the business.

Consider a business that is preparing for an IPO, or initial public offering. Leading up to the IPO, any news about the business could have outsized implications for the final share price. If financial data were exposed to the public as the result of a dark data leak, the deal could be jeopardized.

In-house IT teams are stretched managing unprecedented amounts of data. Bad actors are out there searching for vulnerabilities. It's not a matter of if there's a security breach, but when.


Not knowing the data is there does not exempt a business from compliance with financial reporting regulations around data management, privacy protection and storage. Nor does it absolve the business from addressing Data Subject Access Requests, or DSARs, in a timely manner.

Dark data AI initiatives help a business map out its data resources, understand what compliance measures need to be applied and quickly take the right action to protect stored data.

Regulators are issuing fines to companies that are not in compliance, adding a measure of urgency to the quest to protect dark data.


Dark data costs your business money. It also negatively impacts ESG reporting on sustainability initiatives. Not only is the business paying for data storage and resources it does not need, it also has a larger carbon footprint than necessary. By addressing dark data now, businesses can reduce both risk and resource consumption.

Put another way, deleting unneeded dark data can be an easy ESG win for your business.

Protecting Dark Data

AI and machine learning can help protect your dark data, including sensitive financial data.

AI-powered software enables companies to scan millions of pages quickly. Pattern detection capabilities allow the software to not only classify types of dark data but to pinpoint sensitive information, such as phone numbers, names, or Social Security numbers. These tools can even locate the unstructured dark data hiding in plain sight.
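As a rough illustration of pattern detection, the Python sketch below flags two kinds of sensitive values using regular expressions. The patterns, the `find_sensitive` helper and the sample text are assumptions made for this example, not how any particular product works. Commercial tools use far more robust detection, and spotting names generally requires techniques beyond simple patterns.

```python
import re

# Illustrative patterns only: US-style SSNs and 10-digit phone numbers.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_sensitive(text):
    """Return (label, value) pairs in the order they appear in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    # Sort by position so results follow the reading order of the document.
    return [(label, value) for _, label, value in sorted(hits)]

sample = "Call 555-867-5309 about SSN 123-45-6789."
for label, value in find_sensitive(sample):
    print(label, value)
```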

After detection, software applies data redaction using pattern recognition capabilities. Redacting information preserves privacy and supports regulatory compliance. When permanent redaction isn't desired, software can temporarily redact information during transfer, then restore it afterward.
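One simple way to picture temporary redaction is to swap each sensitive value for a placeholder token and keep the originals in a separate lookup table. The sketch below assumes that approach. The `redact` and `unredact` helpers and the token format are hypothetical, and a production system would also need to encrypt and access-control the lookup table.

```python
import re

# Illustrative pattern: US-style Social Security numbers only.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text):
    """Replace each SSN with a token; return the redacted text and a vault."""
    vault = {}

    def _swap(match):
        token = f"[REDACTED-{len(vault) + 1}]"
        vault[token] = match.group()  # remember the original value
        return token

    return SSN.sub(_swap, text), vault

def unredact(text, vault):
    """Restore the original values using the vault produced by redact()."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text

original = "SSN 123-45-6789 and 987-65-4321"
redacted, vault = redact(original)
print(redacted)
print(unredact(redacted, vault) == original)
```

The redacted text can be transferred on its own, while the vault stays behind and is used to restore the values afterward.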

Finding & Utilizing Dark Data

Before you can implement these dark data solutions, you have to discover the dark data.

The first step in the discovery process is identification. Machine learning can perform the heavy lifting that would have taken employees hundreds of hours.
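Even without machine learning, a basic inventory can surface likely candidates. The following Python sketch lists files that have not been modified within a given number of days. The `stale_files` helper, the one-year threshold and the `/shared/drive` path are assumptions chosen for illustration, and file age alone does not prove that data is unused.

```python
import time
from pathlib import Path

def stale_files(root, max_age_days, now=None):
    """Return sorted (path, size_bytes) for files not modified recently."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86,400 seconds per day
    results = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            if info.st_mtime < cutoff:
                results.append((str(path), info.st_size))
    return sorted(results)

# Example usage (hypothetical path): files untouched for over a year.
# for path, size in stale_files("/shared/drive", 365):
#     print(path, size)
```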

Once the software ingests data from all sources and extracts it into an asset catalog, you can begin working with the dark data. The next step after discovery is classification, which uses dark data analytics to show what you actually have.

Could this data be harmful if leaked? Is it necessary (for compliance, for example) to keep this? Could it be useful to your business to dig into the data? These answers will guide decision-making to reduce risk, increase compliance and shed what is no longer needed.
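The three questions above can be sketched as a simple triage rule. The `Asset` fields, the `triage` function and the resulting actions are a hypothetical policy written for illustration. Your own retention rules, and the advice of your compliance team, should determine the real outcomes.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    sensitive: bool                 # Could it be harmful if leaked?
    required_for_compliance: bool   # Must it be kept?
    business_value: bool            # Could it be useful to analyze?

def triage(asset):
    """Map the three classification answers to a suggested action."""
    if asset.required_for_compliance:
        return "retain and protect" if asset.sensitive else "retain"
    if asset.business_value:
        return "redact, then analyze" if asset.sensitive else "analyze"
    return "dispose"

# Hypothetical catalog entries.
catalog = [
    Asset("2015 financial statements", True, True, False),
    Asset("old download folder", False, False, False),
    Asset("support call transcripts", True, False, True),
]
for asset in catalog:
    print(asset.name, "->", triage(asset))
```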

Machine learning and AI can assist with categorizing dark data by performing analysis on the data that may contain valuable insight. Machine learning can also automate the redaction of sensitive information so that stored data complies with privacy regulations.

Get Serious About Dark Data Protection

While the amount of dark data the typical business holds is significant, protecting it and reducing risk is easier than you might imagine when you use the right tools.

At DFIN, we provide tools to help businesses locate dark data, protect information and bring control to this vulnerable area. On average, clients who adopt our solution realize time savings of 93% and $190,000 cost savings per year. Explore how our data protection solutions have helped companies locate, secure and control their data.