Why Big Data Is A Big Bluff (A 3-Step Deconstruction)

With the emergence of big data, magical thinking has made a powerful comeback. For some it is the key to a utopian world; for others it is the promise of an Orwellian future. Yet both opposing sides agree on the same dogma: big data is going to take possession of reality. Reality, however, seems to be resisting. What if the fantasies of omnipotence attached to big data were mostly an illusion made up of dollars and myths? Here is a three-step deconstruction.

Paul Vacca
Sep 17, 2017
The Precog of “Minority Report” directed by Steven Spielberg

1. The comeback of magical thinking

In the beginning was the data. For centuries, it accumulated gradually and continuously across the Earth’s surface in analogue form. Then came digital technology and the Internet. As the power of computers doubled every two years for fifty years, according to the famous Moore’s Law, the quantity of data skyrocketed exponentially too. With the smartphone, the Internet of Things and the quantified self, connected (indeed hyperconnected) objects and beings started to broadcast their own data. The slightest move, act or mood became a data point. Multiplication then turned into a big bang: an expansion so rapid that 90% of the data produced since the beginning of human history was produced during the last two years.

This is how data became big data.

Magical thinking, the comeback

There is no doubt that producing more data can give us a better knowledge of our world; that it is a good thing for the progress of science and medicine; that it feeds the development of artificial intelligence and smart cities. Knowledge and science have always progressed thanks to the emergence of new data and new cross-references.

But with big data, as if intoxicated by the torrent of data, we rapidly move from science to science fiction. Magical thinking makes its comeback. Guided by a kind of digital animism, some have been quick to attribute infinite powers to this infinite production of data. In their eyes, the mass of data becomes an all-powerful force. Some (the owners of clouds, the start-ups, the Internet giants, the NSA, etc.) venerate it, multiplying symposiums, conferences and social media posts that boast of the radiant and thriving future big data promises; others (hackers, libertarians, Luddites, etc.) brandish it as an absolute threat that humanity should fear. Big data thus achieves the feat of gathering two sides with totally opposed doxa around the same dogma: the belief in its omnipotence.

The fantasies of omnipotence

Its first alleged power is omniscience. As we are now capable of accessing any data at any time, we can supposedly know everything. This capacity to see and know everything through the collection of data is called the data-panopticon. This “omniscience” of big data was even theorized in 2008 in the magazine Wired, in an article entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”. In this manifesto, Chris Anderson claimed that the torrent of data would make the scientific method obsolete. Indeed, why seek to understand reality through necessarily risky hypotheses now that reality can be delivered in its entirety by data? According to him, the torrent of data gives us access to complete knowledge of reality without bothering with science. Big data is omniscience without science.

It is also supposed to open the doors of prescience. Between knowing everything and foreseeing everything there is just one step, merrily taken by the proponents of predictive big data, the discipline that claims that knowing everything about a person makes it possible to foresee his or her future acts on the basis of statistical models or signatures.

Finally, the last boast of this allegedly almighty big data is power. If it is true that “to govern is to foresee”, then foreseeing everything confers full authority. And the strong arm of this power is the algorithm, the key that supposedly solves any problem.

Utopia and dystopia in the same boat

Big data would therefore be this almighty trinity of omniscience, prescience and power, for better and for worse. On the one hand, it outlines a utopian future: a world where we will be able to anticipate all diseases, banish class struggle and inequalities, eradicate poverty and famine, a world where even immortality becomes conceivable (as we know, people are working hard on this in Silicon Valley). On the other hand, it sketches a future of dystopian threats: a world of generalized cyber surveillance, data totalitarianism, determinism… Two facets of the same fantasy.

Because, for now, reality still seems to resist the supposed omnipotence of big data…

2. Reality vs. big data: 1-0

For now, amid its dreams of omniscience, big data runs into an obstacle: reality. The example of the NSA is a real textbook case. It is now well known that the American intelligence agency collects data on a massive scale through its wiretaps and surveillance systems, then mines it with data analysis tools (algorithms) in the hope of detecting suspicious statistical patterns (signatures) that would flag terrorist behavior. With what result? Keith Alexander, the NSA’s director, asserted in 2013 that his agency’s surveillance programs, after more than 10 years of mass data collection, had helped to foil dozens of plots. A few months later, he mentioned 13 events, before admitting that in fact only two threats had been averted…

The panoptic illusion

According to Grégoire Chamayou, a researcher at the CNRS, this catastrophic record can be perfectly explained. In an article published in June 2015 in La Revue du Crieur[1], he dismantles point by point the panoptic illusion in which the NSA is immersed. He recalls the remark of an American researcher who pointed out that “the only foreseeable thing regarding terrorist datamining is its permanent failure”. This inevitable failure rests on two major illusions. The first is blind faith in massive data collection (“Collect it all”), which, rather than “looking for a needle in a haystack, consists in collecting the whole haystack”, and thereby multiplies the difficulty of analyzing all the data. The second is the belief that there exists a detectable “terrorist signature”, i.e. a succession of actions that would lead to an attack. This belief is false, and it carries a double risk. On the one hand, there is the risk of letting some “true” terrorist acts happen, because the principle of terrorism consists precisely in foiling preset schemes by developing unprecedented modus operandi. On the other hand, there is the risk of seeing terrorists everywhere. If the pattern “a person owning a truck, driving to a sensitive place and having bought ammonium nitrate” can identify a potential terrorist act, it also applies to almost all farmers in Nebraska who own a truck and buy ammonium nitrate (which is used in the production of fertilizers). In short, either the NSA fails to detect the terrorist attack, or it detects too many.
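To make the Nebraska farmers problem concrete, here is a minimal sketch in Python of the base-rate arithmetic behind it. All the numbers (population size, number of real plotters, hit rate, false alarm rate) and the function name are hypothetical assumptions chosen for illustration; none of them comes from Chamayou’s article or from the NSA.

```python
def signature_screening(population, true_threats, hit_rate, false_alarm_rate):
    """Return (missed, false_alarms, precision) for a signature-based screen.

    hit_rate: share of real plotters who match the signature (assumed).
    false_alarm_rate: share of innocent people who also match it (assumed).
    """
    innocents = population - true_threats
    detected = true_threats * hit_rate
    missed = true_threats - detected
    false_alarms = innocents * false_alarm_rate
    flagged = detected + false_alarms
    # Precision: among everyone flagged, the share who are real plotters.
    precision = detected / flagged if flagged else 0.0
    return missed, false_alarms, precision


# Hypothetical inputs: 300 million people, 10 real plotters, a signature
# that catches 90% of them and misfires on 0.1% of everyone else.
missed, false_alarms, precision = signature_screening(
    population=300_000_000,
    true_threats=10,
    hit_rate=0.9,
    false_alarm_rate=0.001,
)
print(f"missed plotters: {missed:.0f}")
print(f"innocent people flagged: {false_alarms:,.0f}")
print(f"share of flags that are real: {precision:.6f}")
```

Under these assumed figures, even a signature that looks excellent on paper still misses a plotter while flagging roughly 300,000 innocent people, so almost every alert is a false one. Tightening the signature to reduce false alarms would raise the number of missed plots, which is the dilemma described above.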

“Prediction is hard, especially when it concerns the future”

Google experienced the same kind of failure with “Google Flu Trends”. This “revolutionary” application, launched in 2008, made it possible to follow the flu epidemic in real time simply by tracking search queries containing terms such as “paracetamol”, “flu” or “headache”. Initially, everyone, including the prestigious scientific journal Nature, understandably believed in a miracle: the results were reliable, close to those given by the CDC, the official American body for disease prevention, but faster and without needing an armada of researchers. Except that the application rapidly seized up. In 2013, the media announced an epidemic risk and search queries went haywire, which distorted the results and overestimated the risk of an epidemic. The application then became a reflection of Internet users’ hypochondria more than of reality. A victim of an epidemic of queries, it was totally disrupted. Google closed the service in August 2015.

Yes, even for big data, “prediction is hard, especially when it concerns the future”, as the quip attributed to Groucho Marx has it. For now, predictive marketing built on the data we scatter across the Internet mostly amounts to retrology, the art of guessing the past: suggesting, for instance, that we discover the hotel we booked two weeks ago, or recommending a book we have already bought or even read.

Smart data or big data: Auguste Dupin vs. the Paris police

As big data grows ever bigger (through hyperconnection, the Internet of Things, open data and clouds), rather than helping to reveal reality with its billions of data points and dreams of exhaustiveness, it seems to bury it, like a haystack covering the needle we are looking for. The idea that exhaustiveness would let us take control of reality, and somehow duplicate it, embodies a bookkeeper’s idea of reality, a denial of reality, a misinterpretation. It is like a map on a 1:1 scale that would be indistinguishable from the territory: it would offer every guarantee of accounting accuracy, yet prove to be an inadequate guide.

In fact (and this is the decisive contribution of philosophers, from Descartes to the phenomenologists by way of Kant) reality is not a mere compilation of data, however exhaustive; it is a hypothesis. It is not given, delivered “as such”; it is a construction of our intellect. This is why some recommend putting intelligence and the human factor back at the heart of the clouds and data torrents, and opting for smart data, which favors relevance and discernment in data collection and intelligence in its analysis.

The perfect illustration of the difference between the smart data approach and the big data approach, and of their respective effectiveness, is given by Edgar Allan Poe’s short story “The Purloined Letter”[2]. While the “big data” team of the Paris police struggles desperately to comb every last millimeter of the apartment in search of the compromising letter, the detective Auguste “smart data” Dupin, working from a few pertinent facts, works out where it is before even entering the apartment… But we will not reveal the solution, so as not to spoil it for those who have not yet read this little jewel on intelligence.

3. Big data: dollars and myths

A mystery remains. How is it that, in spite of disappointing results and even fiascos, the frenzy surrounding big data keeps growing? We may doubt the concrete effects achieved by big data, but its craze on the market is obviously very real. Internet giants, start-ups, cloud players, brands, advertising agencies: all of them are stepping into this business, with its overwhelming rates of return. AWS (Amazon Web Services), Amazon’s cloud computing division, is the only one in the group to make a profit, which allows Jeff Bezos to cover some of his debts and to satisfy his hunger for external growth. Not to mention the omnipresent conferences, symposiums and other keynotes, imbued with messianic evangelism and big business, that are always sold out. It is undeniable: big data is a big business that sells and drives sales.

A fantasy economy

In fact, the paradox is only apparent. The link between hypothetical results and monetized results has become a basic equation of our times. We live in a virtual society, not only in the “digital” sense, but also because whatever has not yet come into being (which is precisely the state of virtuality) possesses more value than what already exists. Any company is worth more through the shadow it projects onto the future than in its present state. All that matters is desirability, which in turn can arouse investors, consumers, media, social networks and street cafés all at once…

And desire is more a matter of fantasies than of tangible proof. On this score, big data radiates a strong phantasmagoric aura that it draws from three great contemporary myths.

Three structuring myths

Big data is, first, an Eldorado: the perfect programmatic vector to sharpen greed and start a gold rush. There is obviously the “big”, which authorizes every superlative and opens up horizons of immensity, like the loot from a heist. But there is also the “data”, which constitutes a form of digitized gold. Like the yellow metal, it has a dual character. It has a physical, quantifiable dimension: it is counted in terabytes, each a trillion bytes, and it can be tallied and stored in clouds. And it has a symbolic aura, as an occult source of power, control and force, because data is also information.

Then big data inevitably evokes Big Brother, the myth created by George Orwell in 1984. It is a constant reference used to illustrate the data-panopticon, the capacity that big data players, first and foremost the GAFA (Google, Apple, Facebook and Amazon), seemingly have to know and control everything through geolocation and the multiplication of digital traces that each of us leaves on the Web, in applications and on social networks…

And lastly, big data is experienced as an echo of The Minority Report, the science-fiction short story by Philip K. Dick, widely popularized by Steven Spielberg’s film of the same name starring Tom Cruise. The story is set in a future where the police, thanks to mutants with divinatory powers (the Precogs, short for precognition), are able to arrest suspects before they even commit the act. It is a perfect illustration of the fantasy underlying the hypothesis of predictive big data: being able, thanks to user profiling, to anticipate users’ future desires. In the 1970s, it was the myth of the subliminal message that frightened enlightened minds: the belief in a hidden injunction inside the advertising message, like an undetectable pattern in a carpet, all the more effective since it was meant to target the unconscious directly, without any filter. Today we suspect another occult force to be at work. Since everything about us can be known through our digital traces (profiling), it should be possible to predict what we desire.

Ascending spiral

We might think that the last two myths, 1984 and The Minority Report, generally brandished by big data’s detractors, bother the players of big data. Yet quite the opposite is happening: these “totalitarian myths” only sustain and feed the fantasies about the omnipotence of big data. The more we cast its players as Big Brother or as the Sphinx, the more we end up crediting them with real power. This in turn increases their attractiveness in the eyes of their clients and, consequently, their valuation. And the more the latter grows, the more it becomes, in turn, proof of their power. And so on… an ascending spiral in which everything helps reinforce the power of big data. A virtuous circle, so to speak. Which is also sometimes (need we recall?) the origin of speculative bubbles.

[1] Do not hesitate to read the excellent long-form article “In the head of the NSA: a philosophical history of American intelligence” by Grégoire Chamayou, in La Revue du Crieur no. 1, June 2015 (La Découverte/Médiapart).

[2] “The Purloined Letter”, in Extraordinary Stories, Edgar Allan Poe (Livre de Poche).

Paul Vacca

Author. Columnist for Les Échos Week-end. Lecturer at the Institut Français de la Mode (IFM Paris) and at ISG Luxury Geneva (Switzerland).