Global society is entering a new era of data ownership, and the General Data Protection Regulation (GDPR), Europe's stringent new privacy law, is just the beginning. As the Internet of Things (IoT) continues to proliferate, industrial enterprises in particular must consider how data ownership affects not only privacy, but their own machine-learning initiatives.
While current models of data ownership inhibit the potential of machine learning, there is hope on the horizon—if the industry can create new conditions for safe, secure, and even proprietary data sharing.
Current Industrial Models Limit and Usurp Data
Industrials benefit from big data and machine learning in three primary ways. The first is predictive maintenance (PdM): applying machine-learning models to predict when and how equipment might malfunction so that it can be maintained before it breaks. The second, emerging approach is Equipment-as-a-Service (EaaS): the ability to pay for actual asset uptime, rather than spending capital to own, manage, and maintain equipment like pumps or compressors.
The third benefit comes from using data to innovate new machine-learning-driven business models that will aid business in unforeseen ways. For example, a shipping business could apply sensor data to a machine-learning model that predicts which shipping lanes would be most efficient during certain weather patterns.
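To make the predictive-maintenance idea above concrete, here is a minimal, hypothetical sketch: a classifier trained on historical sensor readings that flags assets likely to fail soon. The feature names, the failure-horizon label, and the synthetic data are all invented for illustration; a real PdM pipeline would be trained on actual equipment histories.

```python
# A minimal, hypothetical predictive-maintenance sketch (illustrative only).
# Feature names, data, and the stand-in failure rule are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Each row: [vibration_rms, bearing_temp_c, run_hours_since_service]
# Label: 1 if the asset failed within the following 30 days, else 0.
rng = np.random.default_rng(0)
X = rng.normal(loc=[2.0, 65.0, 400.0], scale=[0.5, 5.0, 150.0], size=(1_000, 3))
y = ((X[:, 0] > 2.5) & (X[:, 1] > 68.0)).astype(int)  # stand-in failure rule

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Flag assets whose predicted failure probability crosses a maintenance threshold.
failure_prob = model.predict_proba(X_test)[:, 1]
needs_maintenance = failure_prob > 0.7
print(f"{needs_maintenance.sum()} of {len(X_test)} assets flagged for inspection")
```

The point of the sketch is simply that the quality of such a model depends directly on how much (and how varied) the training data is, which is where data ownership enters the picture.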
From a data-ownership standpoint, the trouble starts in exactly the same place as the opportunity. EaaS and PdM are experiencing a strong wave of interest and adoption from industry. Yet in their current form, these models also create data-ownership asymmetries.
The companies renting the machines see only operational data that essentially confirms the equipment is working. End users, in the best case, will see data for only a small set of equipment; in the worst case, they may see only minimal raw sensor data, without any of the performance insights derived from it.
End users thus receive only partial data, which prevents their own machine-learning and AI models from ingesting the wide and varied body of data needed for accurate predictive maintenance. Additionally, new regulations such as GDPR may impose more complex data governance, making it difficult to remain compliant when only a portion of the data is available.
For these reasons, as well as competitive concerns and long-standing habits, many companies choose to keep all of their industrial data proprietary. They retain control of data and can govern how that data is used. It’s more secure and typically makes compliance easier. It also means, however, that companies work with a smaller body of data. AI and machine learning capabilities will be limited—and those limitations will hinder the innovation of new predictive models, potentially resulting in a competitive disadvantage.
AI platforms might span industries and provide help in ways we cannot yet anticipate. For example, it might be possible to make maintenance predictions for a motor more accurate by learning from that motor's behavior across all of the different equipment in which it is used.
How Can This Conundrum Be Addressed?
First, let’s dive into why data is (and is not) protected, and how it came to be that way. In the early days of the internet, everything was sharable, and often shared, especially in consumer-facing businesses; this was a sharp contrast to the days when data was all kept in-house. Then came SaaS models, in which providers kept data on public clouds and ownership was mixed on private clouds. With the arrival of big data, two new economic models emerged: a shared-everything model of mixed data ownership, as seen with ad tech platforms, and a proprietary model in which businesses and providers keep data on lockdown.
Both models have their own sets of opportunities and costs to weigh. As mentioned above, keeping data proprietary reduces compliance and security risks. Storing and processing data requires in-house machines and manpower, however, and the data lake stays small, yielding less accurate predictions and hindering further AI innovation.
In contrast, data-ownership models that entail sharing everything produce better, more accurate insights. It is easier to identify false positives, cross-correlate, and find redundancies within a large shared body of data. This model also puts less strain on company resources, as the tools and expertise needed to process data are available in the cloud.
Also, there is a huge and ever-growing body of data for AI to ingest, which makes AI smarter and faster. The more data a business can “see,” the better its data models will be. For example, if one Tesla Model S experiences a drive shaft problem, machine learning can diagnose it based on the intelligence gleaned from all other Model S vehicles. On the flip side, data sharing also means giving up full control over data governance, as well as raising compliance and security risks.
More data means more accurate simulations.
Conceptualizing Data for the Future
In order to progress, industrials must seriously consider a third way: sharing data for the sole purpose of feeding AI models while retaining ownership of the data itself. Businesses would give providers rights, under something like an NDA for machine learning and AI, to access data in order to feed machine-learning models. The data itself would remain proprietary, but the models that use it would be openly available.
In theory, sharing models would provide an open-source knowledge base for every industry—with no need to share the data, just the model. Such models could be applied to asset-heavy industries such as oil and shipping. The whole industry could benefit from the learnings associated with each compressor issue in every global plant, for instance.
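As a rough illustration of “share the model, not the data,” here is one possible mechanism (a sketch, not necessarily the author's specific proposal): a provider trains on proprietary sensor data it is allowed to access under an NDA-like agreement, then publishes only the fitted model artifact to an exchange. The function names, file path, and model choice are invented for illustration.

```python
# Hypothetical "share the model, not the data" workflow (illustrative sketch).
import joblib
from sklearn.linear_model import LogisticRegression

def train_on_proprietary_data(X_private, y_private):
    """Runs inside the data owner's environment; X_private never leaves it."""
    model = LogisticRegression(max_iter=1_000)
    model.fit(X_private, y_private)
    return model

def publish_model(model, path="compressor_failure_model.joblib"):
    """Only the learned parameters are exported to the shared model exchange."""
    joblib.dump(model, path)
    return path

# Any other plant could then load the shared model and apply it to its own data:
# shared_model = joblib.load("compressor_failure_model.joblib")
# predictions = shared_model.predict(X_local)
```

The design point is that the model file carries the learning gleaned from the data without carrying the data itself, which is what lets the knowledge circulate while the raw records stay behind the NDA boundary.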
With ledgers and decentralized tokens becoming more widely available through technologies such as blockchain and IOTA, data model exchanges could document and explain all machine-to-machine and machine-to-human communications. In other words, there would be clear sharing, tracking, and tracing of data and models. (Who provided it? What was the quality? How was it used? How was it transformed?)
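To picture what such a ledger entry might record, here is a hypothetical provenance record answering the questions above. The field names and values are invented; an actual schema would depend on the exchange or ledger technology used.

```python
# Hypothetical provenance record for a shared model, mirroring the questions above.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelProvenanceRecord:
    provider: str                 # Who provided it?
    data_quality_score: float     # What was the quality?
    intended_use: str             # How was it used?
    transformations: list[str]    # How was it transformed?
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = ModelProvenanceRecord(
    provider="Plant A (anonymized)",
    data_quality_score=0.92,
    intended_use="compressor predictive-maintenance model v3",
    transformations=["resampled to 1-minute intervals", "normalized per sensor"],
)
print(record)
```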
Proprietary data would also remain under NDA, but AI models derived from data would be available to providers. Everyone would benefit from everyone else’s experience, but the data stream could still be controlled, and compliance risk eliminated. The models themselves could be shared on an open exchange—sort of like an app store for data models.
While this is one path forward, it’s certainly not the only one. The bottom line is that data ownership is an issue, and it will only be resolved through further innovation. We live in a golden era of technology, and a solution will come. The bigger question is: will it be the best one for everyone involved?
Tor Jakob Ramsøy is founder & CEO of Arundo Analytics.