Who Owns Our Data?

October 25, 2021

Published almost a century ago, Upton Sinclair’s novel Oil! offered readers a vivid panorama of speculators’ scramble to acquire western lands and then dig for petroleum at all costs. Sinclair’s portrayal spared nothing: the trickery and deceit used to acquire land, the bribes doled out to coax favorable policies from President Harding’s Washington circle, or the insidious, symbiotic relationship between the industry and American empire. The rapacious quest for capitalist profits from oil, Sinclair decried, was “crippling the bodies of men and women, and luring the nations to destruction by visions of unearned wealth, and the opportunity to enslave and exploit labor.”

Today there is a new rush to extract and exploit a previously untapped asset. The financial rewards from this asset are so great that pundits have rushed to dub it a “new oil.” But rather than tapping the remains of ancient algae and zooplankton, we produce this asset through our daily jaunts across social networks such as Facebook and TikTok. We exude it when we browse the Internet, triggering electronic tracking “cookies” that advertisers have cunningly strewn across the web. It is pumped forth from our muscles as soon as we strap on a Fitbit or turn on a directional service while walking or driving to work or school. And increasingly, it will bubble up from “smart” devices laced through our homes, neighborhoods, and cities, many with the capacity to geolocate and track our actions and habits.

The slightly misleading name for this resource is “personal data.” Whether handed over intentionally or unwittingly, the data captured by social media, cookies, and the internet of things now records, second by second, granular details of behavior, temperament, and even thinking. It is an enormously valuable asset because it can be used to draw inferences about the expected future behavior of the producing subject.

Perhaps as importantly, personal data derived from many sources, and then aggregated, can be used to train machine-learning instruments. These offer more general predictions about whole demographic slices. (It is in this sense that “personal” data is a misnomer, since my data can easily enable inferences about you or someone else.) Massed in its totality, personal data becomes a commercially valuable asset. It circulates among data brokers, targeted advertisers, political campaigns, and even foreign states as the valuable fuel powering predictive interventions. As of 2019, the U.S. data brokerage industry alone was worth more than $200 billion. At the same time, that data enables new kinds of harm, such as the manipulative electioneering engaged in by Cambridge Analytica, the divisive and radicalizing effects of Facebook’s algorithm design, and the serious financial and privacy losses that accrue to individuals when a company’s data banks are breached.

As many have argued, the high stakes of this “new oil” call for radically new forms of governance. A central concern in this effort—as much a matter of power as of equity—is the question of ownership: Who owns personal data? Our problem today lies in finding a model of data ownership that recognizes the collective interest we have in how personal data is used, that avoids the profound costs of free-wheeling private exploitation by individual firms, and that does not slip into authoritarian forms of state control.

To see why data ownership is such an urgent question, consider what Sinclair would have made of the brave new data economy of the twenty-first century. No doubt he would have keenly perceived the price tag of commodifying personal data. And looking beyond it, he might have traced several uncanny parallels with the hydrocarbon economy of a century ago.

Begin with exploitation. In the personal data economy, as in Sinclair’s oil rush, the race to acquire assets leads firms to take shortcuts—sharing data despite having committed not to do so, or secretly acquiring data in ways that few customers would have tolerated. One especially insidious example turns on how some smart speakers have used “lidar” to track physical movements without disclosing as much. Meanwhile, Facebook’s cookies, distributed across the web, secretly track our online movements even if we have never signed on to the social network. Consent, informed or not, takes a back seat to profit maximization.

The personal data economy is characterized by another form of exploitation that Sinclair would have recognized: American workers are increasingly subject to what perceptive scholars call “an all seeing Argus Panoptes built from technology.” This captures and quantifies their movement, attention, and efforts—all in the name of extracting as much profit from labor as possible. In the gig economy, algorithmic pricing further squeezes labor’s share of profits to a vanishing rind. Data might also displace labor entirely. There is ongoing debate as to whether the data economy will lead to significant job losses: The most sophisticated observers think this has not happened yet—but is a colorable possibility in coming years.

Like the oil economy of Sinclair’s California, the personal data economy further inflicts costs beyond its immediate exploitation of producers and workers. The growing concentration of economic activity in the data economy increases economic inequality at the level of the nation. Google and Facebook are unlike the industrial titans of the early twentieth century insofar as they employ far fewer people—and hence spread their wealth around far less than, say, a Ford or a General Motors. Inequality in turn places pressure on the possibility of inclusive and equitable democratic rule. And when social media firms use personal data to drive engagement with polarizing, false, and hateful content, they more specifically privilege profit over the democratic process. Even the physical environment itself can be compromised by the enormous energy demands created by the careless use of instruments such as the natural-language processing models used by Google for translation, speech recognition, and sentiment analysis.

At its core, the personal data economy of the twenty-first century presents a question of power. Exploitation, inequality, and environmental spillovers arise because of its one-to-many structure, its vast economies of scale, and the large knowledge asymmetries between those who produce and those who exploit data. Given these conditions, is it any surprise that this economy is prone to relationships of dependency and subordination? When Shoshana Zuboff condemns “surveillance capitalism” or McKenzie Wark critiques the “vectoralist” class, they are eking out a new language to capture a new kind of hierarchy. This novel order is based not on hieratic knowledge of the divine, brute force of arms or gold ducats, but on information. New ways of exercising power over us, whether by firms or states, are the yield on our personal data trails.

One important divergence between oil and personal data is the very basic matter of ownership. A landowner in Sinclair’s day had “a right to an unlimited production” from her wells under legal rules regarding real property that went back to Roman times. As a result, the California landowners Sinclair described had free rein to decide how and when to extract and then use petroleum. The property system insulated resource extraction and use from direct state control.

U.S. law today, however, provides no clear answer to the question of who owns personal data. A few years before Oil! was published, the U.S. Supreme Court resolved a fight between two competing wire services as to whether the “facts” making up the news could be treated as property. The Court firmly rejected the idea that a private individual “who might happen to be the first to report a historic event” has “the exclusive right for any period to spread the knowledge of it.” That is, there is no such thing as a private property interest in facts under American law. Facebook’s quarterly revenues of $26.17 billion, therefore, derive from the processing of data that does not and cannot belong to Facebook, but that Facebook extracts and controls using a mix of contract law, trade secrets, and other legal workarounds.

Even if the law recognized individual property interests in data, it is not clear that a system of individual ownership would produce just and equitable outcomes. For one thing, it has been tried and found wanting. In 1996 information systems expert Kenneth C. Laudon proposed a system of individual entitlements to data. Several firms tried to bring his idea to market but were unable to drum up sufficient interest. Asking people to manage their own data, even in the pre-Internet age, turned out to be too demanding of their time and attention. With the increased variety of personal data sources and the growing complexity and unpredictability of their uses, this problem of management has only grown since.

Further, as Laudon and others who have advanced the idea of individualized ownership rights since him have recognized, a useable system of such entitlements would not be directly managed by individuals. Instead, individuals would have to contract with a third party who would collect and then commodify all the resulting data. This solution merely recreates the initial difficulty, which economists call an “agency” problem: We rely now on commercial firms to act as our agents in abiding by our contract-based expectations about collection and use limits. But, as Facebook in particular constantly reminds us, we can’t trust them to act as our faithful agents. In practice, the model of individual data rights suffers from the same agency problem, just for a different set of actors. It’s not clear why we should expect these new firms to behave any differently from our old agents.

There are other reasons we should doubt that individual assignments provide a tractable model for personal data. To begin with, it is often hard to assign specific pieces of data to single individuals. The information produced by social media platforms, in particular, is often relational: it captures the flow of interactions, rather than something distinct about a single user. An individualized form of property right would require not just joint permissions for data use, which are easy enough to imagine but potentially tricky in practice if the people involved are disinclined to cooperate. Individual ownership would also require some way to slice up the value implicit in data between the parties depending on their relative contributions, which in both theory and practice will be fiendishly hard. Data from the Internet of Things, for example, will frequently have a collective rather than an individual quality: consider the information streaming from your smart thermostat about when and how a family unit engages in activity, or the data piping from a smart fridge about the aggregate consumption habits of a whole family.

Another problem with individual ownership is the disconnect between the notion of slicing data into individual allotments and the manner in which personal data is used. Machine learning draws inferences about you by crunching other people’s data. Controlling only your own data does not prevent such aggregate data from being used against you.

The absence of an individual right to data opens the door to another possibility. Where an asset stands to generate tremendous wealth, but at the same time might precipitate severe social and individual harms, one effective response has been to turn to some form of collective ownership. Perhaps the best-known example is in Norway—where oil exploitation has not had all of the same ruinous effects that Sinclair described. Collectively owned petroleum has funded a trillion-dollar capital fund owned by the Norwegian state, the Government Pension Fund Global, that helps to sponsor a robust welfare system. Oil production in Norway is subject to a comparatively rigorous environmental regime. Meanwhile, the domestic economy has been tilted toward renewable energy sources.

Carefully done, in short, collective ownership of the nation’s oil reserves has been a way to manage oil resources democratically while taking fair account of their hazardous spillovers.

Collective ownership also neatly fits the particular circumstances of personal data’s creation. On this point, it is worth reconsidering the English theorist John Locke’s famous account of property as arising from a “mixing” of one’s own labor with something else (say, the product of someone else’s labor). Locke assumes a single actor, who is treating and transforming a specific object or thing. For Locke, this mixture entitles one to individual ownership. To be sure, the price of that thing then depends on how much others value it, how easily others can recreate it, and many other considerations. But in its origin, property for Locke arises from discrete and individual labor—and hence takes a discrete and isolated form.

Personal data comes about in a different way: it is essentially a function of our collaborations and interactions. As Marion Fourcade and Daniel Kluttz have forcefully argued, social media platforms, in particular, subtly exploit a “natural compulsion to reciprocate” and the “existing solidaristic bond” between people. The conditions of production on platforms, they show, are intrinsically social in nature. That production is communal, and hence distinct from the solitary and atomistic model of labor that Locke assumes. Even a classical liberal in the Lockean vein, then, ought to recognize a moral case for collective ownership on the part of those whose interactions, not discrete actions, create personal data in the first place.

Translating the Norwegian model over to the personal data context, of course, presents a number of practical challenges. Obviously, it is one thing for state officials to have direct and unmediated access to, and control over, an oil reserve. It is quite another matter for them to have freewheeling access to our personal data—as is now the case in China under a data security law passed earlier this year. Unfettered state access to personal data presents grave risks to political freedoms. However malign Facebook and Google sometimes seem, it is surely far better that personal data be held in private hands rather than at the mercy of state officials. Sometimes, you have to pick your poison.

But collective ownership need not translate into the direct state control of data. As anyone who has ever rented an apartment or checked their coat at a restaurant knows, physical control and legal ownership are two distinct things. They can easily peel apart. I can own something but not have physical control of it, or, vice versa, I can have physical control of something but not own it. In particular, a model of collective data ownership can be imagined that does not involve placing data in storage facilities controlled by the state or giving the state any more access to data than it already has. Existing physical arrangements of private routers and servers do not need to be disturbed in a move to collective ownership of data.

What then might a collective ownership regime for personal data look like? Inklings of such a regime can already be glimpsed in certain cities in the United States and Europe. These are now requiring firms in the personal data economy to share their data for public uses. For example, the Spanish city of Barcelona uses a platform called “Decidim” as a vehicle for the governance of personal data. A company wishing to operate a service using personal locational data—say, a bike-sharing firm—must agree to give its data to Decidim, where its uses will be subject to public debate and decision. This mechanism allows the city to employ that data for public ends, although it doesn’t prevent the company from applying the same data to undesirable purposes. Or consider New York City. Since January 2019, it has mandated that ride-sharing companies such as Uber and Lyft disclose operational data on the date, time, and location of pickups and drop-offs, and more, as a condition of operating. Again, the price of playing is making the data you gather public.

Neither the Barcelona nor the New York regime, however, takes the further step of declaring data a collective resource in order to constrain the ways commercial actors can exploit it. But it is not hard to tease out the steps that would be required to achieve that end. For one thing, it is nothing new for a city to place a collective resource into public ownership. In U.S. law, there is a long tradition of cities and states declaring that environmental resources such as lakes, rivers, water-sources, and parks are owned by the public at large. Historically, this has been accomplished through an ancient legal device with roots in Roman and medieval law called the “public trust.”

To understand how a public trust works, consider the land abutting Chicago’s Millennium Park, just under the waterline of Lake Michigan. This land was found to be held in the “public trust” in 1892. What this meant was that the land belongs to the public at large, while the state acts as a trustee for the public, deciding how the land can be used or sold. So when Illinois’s legislature tried to broker a sweetheart deal selling the lakeshore to a railroad company, the U.S. Supreme Court stepped in and reversed that deal because it was inconsistent with the state’s trustee obligations toward the general public.

This same model can be applied to personal data by a municipal or state ordinance. A city such as New York or Barcelona, for example, might declare that any data gathered by spatial tracking or smart devices within its borders was subject to its public trust. A more ambitious approach would be to sweep in all data generated by or associated with social-media accounts of residents, as well as any data arising from internet access from within the jurisdiction. That data, to be sure, would remain on the servers of private firms such as Google or Facebook: no state official need have immediate access or control over it. Nevertheless, the data would, by law, be the property of the public at large, and the city would be its trustee. Firms acquiring, exploiting, or selling that data would be able to do so solely with the trustee’s permission.

Notice how radically the business model of a monopolist such as Facebook would have to change. First, it would be under a legal duty to disclose how it uses data—something that currently doesn’t exist. To be sure, firms can simply lie or mislead. But many firms are today already subject to extensive disclosure mandates as a result of environmental, securities, and anti-fraud law, and many other kinds of regulation. Those mandates are enforced via reporting rules imposed on corporate officers, by government inspections, and by outside auditors. If Goldman Sachs can turn a profit while complying with this sort of disclosure regime, there is no basis for Facebook or Google to object to something similar.

The data-deploying firm would also be subject to legal constraints on downstream data uses. The city would have the power to use the courts to stop firms from objectionable uses. And because the city would do so as a trustee for the public at large, its decisions could also be challenged in court: a private citizen could lodge a legal challenge to a specific arrangement or data-related affordance on the ground that it was not in the interests of the people at large. Private litigation could also target state efforts to access personal data improperly, or otherwise abuse the public trust in data.

What, more positively, might a city demand? It could not demand unfettered physical access to data: that would trigger privacy concerns. A company such as Facebook, which does not disclose much information about how it uses data, would at least be required to offer details of the ways that it planned to analyze and apply data (a good idea, even if collective ownership is not adopted). A platform firm such as Uber or Lyft, which generates profits by arbitraging between suppliers and the public, might be required to ensure that drivers obtain a certain return on their efforts. It might also be barred from using its data to create surge pricing—for example, in periods of inclement weather. And its data might be used to help improve traffic flow.

Commercial deployments of personal data could also be conditioned on agreeing to share a fraction of profits with the trust. In effect, use taxes would ensure that those creating data benefitted from its transformation into a commercial asset. Rather than paying individuals for data up front, a public trust is a way for the public to recoup some return from the emotional, intellectual, and even physical labor that allowed its creation on the back end. The trust would be legally obliged to apply its funds to the general benefit of a city or state’s residents. Local labor hence would yield a fiscal foundation for local public services and amenities.

Access to data for commercial use could also be conditioned on agreement to forgo certain harmful transformations. For example, a social media platform subject to the public trust regime might be required to demonstrate that its network architecture did not facilitate the dissemination of false political information or deliberately polarizing propaganda. It might also be compelled to show that it did not, even inadvertently, present different interfaces to men and women, or to different racial and ethnic groups. A company that builds smart devices capturing visual data—think of smart doorbells such as the Ring—might be prohibited from allowing their yield to be used as training data for controversial facial recognition instruments.

To be clear, this “public trust” approach to personal data would not address all concerns raised by the personal data economy. One of the most startling features of data science today is the way that predictive instruments can be trained to draw insights about out-of-sample individuals. A prediction tool trained on data from Pennsylvania or Poland, in this way, can be used to make inferences about people in Barcelona or Baltimore.

Yet the personal data economy of today, like the oil economy Sinclair described, is still “crippling the bodies of men and women,” while “luring the nations to destruction,” at least if they call themselves democratic. At a historical moment at which the federal Congress seems singularly unlikely to act, collective ownership by cities is a way to seize back a measure of control over data’s destiny. It is a means of addressing at least some of the harms of the personal data economy. And it is simply a recognition that unlike oil, personal data was always ours to make of what we, democratically, will.
