16 September 2013

Rendering Students Legible: Translation Processes in Private, Institutional, and Government Higher Educational Data Systems

This is an extended version of the proposal I submitted to the 2014 WPSA conference today. It builds on, and contributes to, the general ideas of my information justice paper from the 2013 MPSA conference.

To say that data is constructed seems trivially true; data architecture is a basic competency of data scientists and database design a core concern of information technology staffs. But embedded in this conventional view is an understanding of data consistent with scientific realism: the data that is available to us may be limited, but it nonetheless objectively represents the reality of a datized moment, interaction, or condition. In this view, the choices made in data architecture have at most minor substantive influence on the data itself.

This paper subjects educational data to a constructivist analysis that goes much further than the mechanics of database design. Students’ interactions with the institution, the state, and private actors present a problem of legibility for those actors, which is solved by datizing those interactions through a series of translations in the data process. The information that student databases contain is structured primarily by transformations in the data process and not the datized moment itself. I show how data is substantively constructed through its collection and management by conducting structural analyses of the data systems commonly used at Utah Valley University: Canvas, Banner, the Utah System of Higher Education reporting process, and IPEDS. I describe a series of characteristic transformations that take place in the collection, storage, and retrieval of data in these systems:
  1. from relevance to existence,
  2. from contingent to essential,
  3. from narrative to nominal,
  4. from complex to categorical (or plural to monistic), and
  5. from diversity to normalcy.

These transformations reduce the many ways of understanding reality to a single interpretation embodied in the data. Though reality both provides inputs into the transformational process and constrains the ways that the process can transform those inputs into outputs, the choices made in developing data processes make impossible a database in which there is a one-to-one correspondence between reality and data. The information output is underdetermined by the datized moment itself, resulting in a Rashomon-like one-to-many correspondence that is reduced to a single output only with the conclusion of the data process.
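The underdetermination described above can be made concrete with a small sketch of the fourth transformation, from complex to categorical. Everything in it is hypothetical: the `Student` record, the two collapsing rules, and the category labels are illustrations only and are not taken from Canvas, Banner, the Utah System of Higher Education reporting process, or IPEDS. The point is only that the same datized moment yields different data depending on which rule the data process adopts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """A hypothetical intake record, not a schema from any real system."""
    narrative: str       # what the student actually said
    identities: tuple    # the identities extracted from it, in the order given


def collapse_to_multiple(student):
    """Hypothetical rule A: any plural identification becomes one catch-all code."""
    if not student.identities:
        return "Unknown"
    if len(student.identities) == 1:
        return student.identities[0]
    return "Two or more"


def collapse_to_first_listed(student):
    """Hypothetical rule B: keep only the first identity the student listed."""
    if not student.identities:
        return "Unknown"
    return student.identities[0]


# One datized moment (invented for illustration).
student = Student(
    narrative="My mother is Tongan and my father is white; I grew up in both communities.",
    identities=("Pacific Islander", "White"),
)

# The same moment passed through two equally rule-governed data processes.
codes = {
    rule.__name__: rule(student)
    for rule in (collapse_to_multiple, collapse_to_first_listed)
}
print(codes)
```

Neither output is an error in the realist sense: each follows its rule exactly, and neither retains the narrative. Which one ends up in the database is settled by the design of the process, not by the student's statement.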

This is more than merely suggesting that there are errors or biases in the data. In a realist view of data, errors and biases can be corrected by validating the data against itself or the reality it purports to represent. But the self-correcting process of scientific realism cannot correct the kinds of transformations described in this paper. The transformations are consistent with reality because they follow the rules chosen for a specific data process. Actions are consistent with the conclusions drawn from that data because the data process has legitimized that data as the only acceptable representation of reality; all else can be dismissed as anecdote. Constructed data can only be challenged in confrontation with other constructions that may have been possible given alternative data processes.

Policymakers can gain much from understanding the transformations that govern the data that they use. Understanding them helps, first and foremost, determine the extent to which a data point, or combination of points, is an appropriate operationalization of a concept of concern: for example, when analyzing along a combined race-gender classification is more useful than analyzing by race and then by gender sequentially. Such understanding also makes us aware of blind spots in our data where deviation or perceived irrelevance has been excluded. Finally, understanding the constructive nature of data is an antidote to a number of scientistic fallacies that can undermine data analysis and action.
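The race-gender example can be illustrated with a minimal sketch. The counts below are invented and the group labels `A`, `B`, `F`, and `M` are placeholders; nothing here comes from Utah Valley University or any real data set. The numbers are deliberately chosen so that analyzing by race and then by gender shows no differences at all, while the combined classification does.

```python
from collections import defaultdict

# Hypothetical counts: (race, gender) -> (students retained, students enrolled).
cells = {
    ("A", "F"): (8, 10),
    ("A", "M"): (4, 10),
    ("B", "F"): (4, 10),
    ("B", "M"): (8, 10),
}


def retention_rates(key_func):
    """Aggregate the cells under a chosen classification and return rates."""
    retained = defaultdict(int)
    enrolled = defaultdict(int)
    for cell, (kept, total) in cells.items():
        key = key_func(cell)
        retained[key] += kept
        enrolled[key] += total
    return {key: retained[key] / enrolled[key] for key in enrolled}


# Sequential analysis: one dimension at a time.
by_race = retention_rates(lambda cell: cell[0])
by_gender = retention_rates(lambda cell: cell[1])

# Combined analysis: the race-gender pair is the category.
combined = retention_rates(lambda cell: cell)

print("by race:    ", by_race)
print("by gender:  ", by_gender)
print("combined:   ", combined)
```

In this constructed case every race and every gender has the same retention rate, so a sequential analysis would report no gap, yet two of the four combined groups are retained at half the rate of the other two. Whether the combined category exists in the data at all is a choice made in the data process.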

But the deeper concern is for the injustices that can enter higher education through these translations. These translations exercise and distribute power between students and institutions, between institutions and the state, and among institutions by creating privileged representations and embedding such representations, both new and existing, in the data that is used in decisionmaking.