25 February 2014

Data as Reality Made Legible

A fragment written for an upcoming paper on the construction of student data in higher education.

The ubiquity of data in contemporary society hides its peculiarity. Data is a very specific form of information, one in which the subject is broken down atomistically, measured precisely (in the sense of being measured to quite specific standards that may or may not involve a high level of quantitative precision), and represented consistently so that it can be compared to and aggregated with other cases. That this form of knowledge is more common in highly structured institutions and rose to ubiquity with the modern, bureaucratic state and the capitalist enterprise should surprise no one. Creating data should be regarded as a social process in which reality is made legible to the authorities of an institutional structure.

Scott (1998) argues that the driving force behind the creation of data is the need to make the subjects governed by an institution legible.

Certain forms of knowledge and control require a narrowing of vision. The great advantage of such tunnel vision is that it brings into sharp focus certain limited aspects of an otherwise far more complex and unwieldy reality. This very simplification, in turn, makes the phenomenon at the center of the field of vision more legible and hence more susceptible to careful measurement and calculation. Combined with similar observations, an overall, aggregate, synoptic view of a selective reality is achieved, making possible a high degree of schematic knowledge, control, and manipulation. (Scott 1998, 11)

Legible knowledge transforms reality into standardized, aggregated, static facts that are capable of consistent documentation and limited to the matters in which there is official interest. Such facts emerge from a process in which common representations are created into which cases are classified and which can then be aggregated to create new facts on which the state will rely in making decisions. (1998, 80–81)

The importance of legibility for governance can be seen most clearly when Scott contrasts legible knowledge with local knowledge. The latter, with all of the specific practices, details, and dynamics of reality, is impossible to use for the kind of broad governance characteristic of the modern state; it lacks commonality with other localities and is not objective to outsiders. This obstructs governance in two ways, first by preventing synoptic understanding by authorities and then by denying the governing algorithms of the bureaucracy the standardized inputs they need to produce a standardized output. “A legible, bureaucratic formula which a new official can quickly grasp and administer from the documents in his office” (1998, 45) is a necessity for modern governance in both the state and the enterprise.

The need for legibility defines not only the form but also the substantive nature of data. It is common to regard data from a scientific realist perspective in which data is a technical artifact, a representation of information about some subject that is stored such that it can be related to other such representations. This is, for example, the approach used in the United Kingdom’s Data Protection Act 1998. The act defines data as a qualified synonym for information: “Data is information which …” followed by a list of technical conditions relating to storage and processing; personal data is defined by the data’s relation to an individual identifiable either in the data itself or in relation to other data, and sensitive personal data includes information about a specific list of personal characteristics.

This is a quite problematic view of data, as it suggests that the process of representing reality is an automatic, even algorithmic process. Such a view is naïve; like virtually all technologies, data is a socio-technical construct in which human agency is essential (Nissenbaum 2010, 4–6). Rather than being an automatic process with a one-to-one relationship between reality and data, data states are underdetermined, with a one-to-many relationship between reality and data: one state of the world can give rise to many possible data states, some of which are incommensurable with others. In order to make the world legible to human authorities and algorithmic bureaucracies, one data state must be chosen to represent a state of the world from among many possibilities. Reality constrains those possibilities, but it does not, by itself, fully reduce the state of the world to a single data state. The contingency of the final forms of data requires some external source of stability in order for data to bring legibility to the world (Mitev 2005). What is needed is a process of translation from reality to data that constructs a single representation by serving as the external source of stability for representation.

Such a process is inherently endogenous to the creation of the data as long as multiple possible data states remain even after those that are inconsistent with the exogenous reality to be represented are already eliminated. In a realist view of data, all but one of these states must be regarded as errors or biases in the data, which can be corrected by validating the data against itself or the reality it purports to represent until a single data state that is fully consistent with reality remains. But the self-correcting process of scientific realism cannot do this. All possible final data states will appear consistent with reality because they follow the rules of a specific process that leads to them, which has itself legitimized its data as the only acceptable representation of reality, all else—local knowledge in particular—being dismissed as anecdotal evidence. Data, in essence, only exists once socially created rules for choosing among the possible data states are in place.

A paradigmatic case is that of classifying individuals within a system of gender relations. Even within the binary gender system common in Western cultures, people might be represented within a data system either by sex or by gender. These categories are not reducible to each other; the existence of transgender and intersex people is sufficient to make sex and gender incommensurable within such binary systems. Moreover, there is no inherent reason that a data system needs to be limited to the gender binary: Facebook recently introduced more than 50 custom gender descriptions from which its members can choose. In spite of this, most data systems rely on binary systems, with biological sex being the most common (despite “gender” being the most common data field name). The representation of individuals’ place in the system of gender relations is thus determined neither by reality nor by the technical requirements of the data system. It is a choice on the part of developers to reduce an exceptionally complex reality to a specific legible form.
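The one-to-many relationship between reality and data can be made concrete with a small sketch. Everything in it is hypothetical and chosen only for illustration: the `Person` record, its field names, and the three encoding rules are assumptions, not a description of any real system. The point is that the same three people produce three different, mutually incommensurable aggregates depending on which rule the developers adopt.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical stand-in for the far richer "local knowledge" about a person.
    name: str
    sex_recorded_at_birth: str
    self_described_gender: str


# Three hypothetical encoding rules. Each is a socially chosen rule for
# selecting one data state; none is dictated by the people being recorded.
def encode_binary_sex(p: Person):
    # Binary schema keyed on sex; returns None when the schema has no slot.
    return {"female": "F", "male": "M"}.get(p.sex_recorded_at_birth)


def encode_binary_gender(p: Person):
    # Binary schema keyed on gender; returns None when the schema has no slot.
    return {"woman": "F", "man": "M"}.get(p.self_described_gender)


def encode_self_described(p: Person):
    # Open schema: store whatever the person reports.
    return p.self_described_gender


def aggregate(people, rule):
    # Classify each case with the rule, then count: the "synoptic view".
    return Counter(rule(p) for p in people)


people = [
    Person("A", "female", "woman"),
    Person("B", "male", "woman"),
    Person("C", "intersex", "non-binary"),
]

for rule in (encode_binary_sex, encode_binary_gender, encode_self_described):
    print(rule.__name__, dict(aggregate(people, rule)))
```

Under the first two rules the field could equally be labelled “gender” and hold `F` or `M`, yet person B is counted differently in each, and person C cannot be represented in either. Nothing in the people themselves selects among the three outputs; the choice of rule does.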


  • Mitev, Nathalie N. 2005. “Are Social Constructivist Approaches Critical? The Case of IS Failure.” In Handbook of Critical Information Systems Research: Theory and Application, 70–103. Elgar Original Reference. Northampton, Mass: E. Elgar Pub.
  • Nissenbaum, Helen. 2010. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford, California: Stanford Law Books.
  • Scott, James C. 1998. Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. New Haven: Yale University Press.
