What is Dark Data? The Upside Down of the Data Revolution

Even the most casual follower of astronomy has heard the term “dark matter.” A form of mass, it is estimated to make up roughly 80 to 85% of the matter in the universe, yet humans cannot observe it directly. It is the “apps running in the background” of astrophysics: it explains and causes a great deal, but we never see it on screen.

The dark matter idea resonated in the industrial and business worlds, which have their own universe composed of data instead of mass. All the structures, services and goods that are manufactured, consumed and traded depend on information assets. Not all of those assets get factored into analysis and decision making, however, because until now much of that information has been indecipherable, especially when produced on a large scale.

What is Dark Data?

Imagine the data universe as an iceberg. Companies moving along can see huge chunks of information floating on the surface. Beneath the surface of the business waters, however, those chunks extend into behemoths of data that have never been seen or used. Gartner coined the term “dark data” for the information assets below the waterline, which traditional data methods miss.

Dark data overlaps heavily with what is often called “unstructured data.” It may be collected as a byproduct of other needs, saved for liability coverage and then neglected, or simply missed, so it goes unprocessed and unharnessed. A classic example is e-mail: if employees have no deletion protocols or automatic inbox cleanups, their e-mail servers store everything indefinitely, with no guarantee of security. Those e-mails could contain useful information, like addresses and demographics, but they could also hold sensitive information that is improperly secured and open to theft.
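To make the e-mail example concrete, here is a minimal sketch of how a stored message could be checked for potentially sensitive content. It uses only Python's standard library. The two patterns and the sample message are assumptions made up for illustration; a real audit would need far broader, carefully tested rules and would also have to handle multipart messages and attachments.

```python
import re
from email import message_from_string

# Hypothetical patterns, chosen only for illustration.
PATTERNS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_message(raw):
    """Return pattern matches found in the body of a raw e-mail message."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages are skipped to keep the sketch short.
        return {}
    hits = {name: pattern.findall(body) for name, pattern in PATTERNS.items()}
    # Keep only the patterns that actually matched something.
    return {name: found for name, found in hits.items() if found}


# Made-up sample message.
SAMPLE = """From: alice@example.com
To: bob@example.com
Subject: New customer

Please ship to jane.doe@example.org. Her ID number is 123-45-6789.
"""

if __name__ == "__main__":
    for kind, values in scan_message(SAMPLE).items():
        print(kind, values)
```

Run over an archive of old messages, a scan like this would serve both purposes described above: it surfaces information that might be useful and flags messages that should be secured or deleted.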

Another example of dark data would be video footage of a business, like a grocery store. This footage counts as “dark data” because it is already being collected, for security and monitoring purposes, but it could be processed in new ways using AI or other analytic methods. An AI program could identify products on film that get picked up and then put back almost immediately, offering insight into how customers respond to marketing or packaging.
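As a rough sketch of the analysis step in the grocery store example, suppose a video model has already turned the footage into a list of pick-up events. That detection step, the `ShelfEvent` structure and the ten-second threshold are all assumptions for illustration, not an existing product or API. Counting how often each product is put back quickly then takes only a few lines of Python:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShelfEvent:
    """Hypothetical record produced by an upstream video-analysis step."""
    product: str
    picked_up_at: float            # seconds since the start of the footage
    put_back_at: Optional[float]   # None if the customer kept the item


def quick_return_rates(events, max_hold_seconds=10.0):
    """Share of pick-ups per product that were put back within the threshold."""
    pickups = Counter()
    quick_returns = Counter()
    for event in events:
        pickups[event.product] += 1
        if (event.put_back_at is not None
                and event.put_back_at - event.picked_up_at <= max_hold_seconds):
            quick_returns[event.product] += 1
    return {product: quick_returns[product] / count
            for product, count in pickups.items()}


# Made-up events for two products.
EVENTS = [
    ShelfEvent("cereal", 0.0, 4.0),
    ShelfEvent("cereal", 10.0, None),
    ShelfEvent("cereal", 20.0, 50.0),
    ShelfEvent("cereal", 60.0, 65.0),
    ShelfEvent("soup", 5.0, None),
    ShelfEvent("soup", 15.0, None),
]

if __name__ == "__main__":
    print(quick_return_rates(EVENTS))
```

A product with an unusually high quick-return rate would be a candidate for a closer look at its packaging, price label or shelf placement.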

Gartner stated that dark data, like dark matter, accounts for roughly 80% of the data produced by the business interactions and scenarios happening in the world. If that percentage and IDC's estimates are correct, that amounts to 5.26 zettabytes of dark data. Awareness of dark data has become more relevant with the explosion of the Internet of Things (IoT) and Industrial Internet of Things (IIoT) markets. In 2016, Gartner Group forecast that 26 billion devices would be connected to the Internet of Things in 2020, contributing even more to this enormous base layer of information.

Is dark data bad?

Dark data is not necessarily bad, but it can go one of two ways: it can be immensely helpful, or it can be a serious drain on a company’s bottom line. Without proper data management techniques, dark data could overwhelm storage capacity over time. For companies that rely on fast, well-organized storage structures, that could be a prohibitive problem.

Another major vulnerability associated with ignored dark data is cybertheft. If cybersecurity teams focus only on what is seen, hackers have an open path to the hidden information. That scenario would be especially damaging in security and governmental operations.

How do you use dark data?

Just because data is dark now does not mean it always will be. The first step to illuminating what lies below the surface is recognizing what counts as data beyond traditional sources: text messages, documents, audio files and still images are all potential treasure troves. Physical records, while the most private and least accessible, could also provide useful direction.
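One simple way to begin that recognition step is to take an inventory of what is already sitting on a file share. The sketch below walks a directory tree and totals file counts and sizes by broad category. The extension-to-category mapping is an assumption chosen for illustration and would need to be adapted to what an organization actually stores.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from file extension to a broad data category.
CATEGORIES = {
    ".txt": "documents", ".pdf": "documents", ".docx": "documents",
    ".mp3": "audio", ".wav": "audio",
    ".jpg": "images", ".png": "images",
}


def inventory(root):
    """Count files and total bytes per category under the given directory."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in Path(root).rglob("*"):
        if path.is_file():
            category = CATEGORIES.get(path.suffix.lower(), "other")
            totals[category]["files"] += 1
            totals[category]["bytes"] += path.stat().st_size
    return dict(totals)


if __name__ == "__main__":
    # Inventory the current directory as a quick demonstration.
    for category, stats in sorted(inventory(".").items()):
        print(f"{category}: {stats['files']} files, {stats['bytes']} bytes")
```

Even a crude report like this shows where the largest untouched piles of documents, audio and images sit, which helps decide what to analyze first.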

There are tools available to make sure dark data becomes a cash cow instead of dead weight. Distributed data architecture, machine learning, visualization, natural language processing and other AI tools and cognitive analytics are all modern methods of harnessing dark data. Hiring IT talent with the appropriate analytical skills is another great way to keep a finger on the pulse of the data.

New technology, diverse analytic tools and a capable workforce could transform these mounds of information. A great place to start would be improving marketing and updating products. Another is the workplace itself: imagine identifying the facial expressions of employees in a manufacturing plant from surveillance footage. A machine learning model could estimate company morale or worker health from those images, giving leaders the opportunity to improve workplace life.

It’s not enough to accommodate or protect dark data. Falling behind on harnessing it could eventually prove the most expensive mistake of all. While some companies will rapidly change perspective and reclassify usable data, others will not. Analyzing that information sooner will give companies a competitive edge in the market. Slow adopters will be propelled into the iceberg or, worse, made redundant.