So there's another aspect to "the digital" that gets dragged out, which I am going to pretend I was saving for a later post rather than outright forgot about. That aspect is that the digital photograph is machine readable. Which isn't, in and of itself, a huge paradigm shift. You could probably have built something or other in the Victorian era that would "read" a picture in some whacky sense or another, although I dare say nobody did.
Be that as it may, whether by a series of tiny steps or in one giant and shocking step, we do find ourselves increasingly in a world in which we, and our things, are photographed automatically and the pictures fed to a machine. Your automobile license plate might be photographed at toll points on roads or bridges, or by police officers, the numbers on the plate fed directly to a computer for billing or for various law-enforcement searches. We are either at, or on the cusp of, having our faces photographed and fed into facial recognition systems constantly, and for various purposes.
Something I probably won't talk about now is that much of what we shoot ourselves is also fed directly into the machine, but let's stick to the pictures machines take for the nonce.
At this point we've arrived at the familiar juncture, in which boffins are inventing new ways to ingest photographs and process them for no reason except that boffins gotta boffin and "I have a cool idea!" The shady buggers hanging around the boffins are yoinking the ideas and deploying them for shady purposes: extending state control, expanding the neoliberal agenda, or the fascist one, or simply trying to turn it into (more) money.
This can be viewed, and is in fact very neatly viewed, through the ideas of index and representation.
In general, the index is in good shape. These photos are automatically shot, and they index whatever they're pointed at. The indexing is probably a bit shoddy, since the quality of these pictures is likely to be lousy: while the facial recognition program by god gets a fully indexical thingy of your face, it's done in bad light, it's low resolution, and so on.
Hold on to this point, it's going to become important in a moment.
At this point we have something like representation, but it's of a different sort.
Rather than worrying about how the index (which lies by omission) hits the human/social mind, we worry about how it hits the opaque algorithm. The algorithm works nothing like a human mind. Despite the cries of "AI!" and "neural network!", the algorithm in fact resembles a mind in almost no meaningful way. But, as with the mind, the photograph interacting with the algorithm -- call this action machine representation -- can produce real-world results.
Let us not lose track of the fact that the algorithm is in general but one piece of a human/machine system. There are always people involved, someplace. The point, though, is that the picture hits the algorithm first.
A friend of mine was driving his vehicle in Pennsylvania, when some cops pulled him over. They were pretty sure the car was stolen, their license plate scanner had pulled up a "stolen car" notification. Things got tense for a bit. Then the cops fiddled with their computer, grumbled, waved guns around, fiddled some more, and then told my friend he could go. He said, because he is smart, "Wrong state, right?" and the cops grumbled some more and nodded sourly.
So what happened here is that the index was perfectly adequate. The numbers were sufficiently rendered, the missing material inherent in the photograph was not relevant. The representation sucked, though. Interestingly, in computer science we also use the word representation to describe how a real thing, like a license plate, might be summarized in a chunk of computer data. So in this case, the representations were misaligned. Either the scanner or the database failed to note the state, and my friend suffered from having a Pennsylvania plate with the same numerical component as a Virginia plate on a car that had been stolen.
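To make that misalignment concrete, here's a minimal sketch in Python. Everything in it is invented for illustration: a made-up hotlist, with the plate number from the story used as a stand-in.

```python
# Hypothetical stolen-vehicle hotlist, keyed two different ways.

# Data representation 1: the plate number alone is the key.
hotlist_by_number = {"AUC4915"}  # the stolen Virginia car

# Data representation 2: the (state, number) pair is the key.
hotlist_by_state_and_number = {("VA", "AUC4915")}

def check_plate(state: str, number: str) -> None:
    """Check one observed plate against both keying schemes."""
    if number in hotlist_by_number:
        print(f"{state} {number}: HIT under the number-only key")
    if (state, number) in hotlist_by_state_and_number:
        print(f"{state} {number}: HIT under the state+number key")

# A Pennsylvania plate sharing its numbers with the stolen Virginia plate:
check_plate("PA", "AUC4915")  # hits only under the number-only key
check_plate("VA", "AUC4915")  # hits under both
```

Under the first scheme, two different physical plates collapse onto one record, which is exactly the misalignment that got my friend pulled over.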
The index is the photograph. It is not "Washington State AUC4915."
The photograph, which is just a picture of the ass-end of a car, hits the algorithm. The algorithm then attempts the first steps of representation, and might come up with "AUC4915" or "Washington State AUL4915" or any number of things that are not in fact my license plate number. The action of representation proceeds through the system, representations being processed, altered, and matched against a database that might contain "Virginia AUC4915" or just "AUC4915", and continues with "STOLEN", at which point the human part of the system wakes up and starts waving guns around. An arrest may ensue, or not.
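Here's a hedged sketch of that chain, again in Python, with every stage faked: the recognition step just returns a list of candidate readings, where a real system would be doing something enormously more complicated and more opaque.

```python
# Hypothetical pipeline: photograph -> machine representation ->
# match against a data representation -> wake up the humans.

HOTLIST = {"AUC4915"}  # data representation: stolen plates, state omitted

def fake_ocr(photo: bytes) -> list[str]:
    """Stand-in for the recognition algorithm: its first attempt at
    representation. Bad light and low resolution mean it yields
    several candidate readings, not one certain answer."""
    return ["AUC4915", "AUL4915", "4UC4915"]

def match(candidates: list[str]) -> list[str]:
    """Match every candidate reading against the hotlist."""
    return [c for c in candidates if c in HOTLIST]

def process(photo: bytes) -> None:
    hits = match(fake_ocr(photo))
    if hits:
        # Here the human part of the system wakes up.
        print(f"ALERT: possible stolen vehicle, matched {hits}")

process(b"picture of the ass-end of a car")
```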
In general we're going to see more of this. Face recognition will yield mistaken identity, and the wrong people will be detained, sometimes arrested, and occasionally shot.
You can draw a somewhat shaky line from this notion of machine representation to the traditional one, because in these degenerate times people are doing all this stuff with neural networks, which are trained on sets of existing pre-analyzed photographs. Those photos, the so-called training sets, are analyzed by humans, with all the problems of representation that go into that. Famously, Google was identifying black people as gorillas for a while, a case where the generally accepted theory is that human representation informed machine representation, leading to real-world results. More generally, machine representations are designed by people.
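A toy example of that dependency, with made-up numbers standing in for image features: the classifier below reproduces whatever the human labelers said, mistakes included. This is nothing like a real neural network; it's the smallest sketch that shows where the labels enter.

```python
# Toy nearest-neighbor classifier. The algorithm is beside the point;
# the point is that predictions are built entirely from human labels.

training_set = [
    ((0.9, 0.1), "cat"),
    ((0.8, 0.2), "cat"),
    ((0.1, 0.9), "dog"),
    ((0.2, 0.8), "cat"),  # a human labeling mistake, baked into the model
]

def classify(features):
    """Return the label of the nearest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_set, key=lambda ex: dist(ex[0], features))
    return label

print(classify((0.18, 0.82)))  # "cat" -- the mislabel propagates straight out
```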
First of all, note how neatly the old and tired theory from the late 20th century seems to be working here, once we disassociate representation from its contemporary day job of supporting identity politics.
So, really, we have three things in play here that seem relevant:
- Representation, in the human-mind sense.
- Machine representation, the analog of representation with the mind replaced by the algorithm.
- Data representation, the actual structure of database records referred to by the algorithm.
These are all in play, given that the first one tends to inform the third one and, to an extent, the second one; the two machine-side senses are sketched below.
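Here is the distinction in miniature, with invented names throughout:

```python
from dataclasses import dataclass

@dataclass
class PlateRecord:
    """Data representation: the shape of the stored database record."""
    state: str
    number: str

def machine_represent(photo: bytes) -> str:
    """Machine representation: whatever the algorithm extracts from the
    photograph. Faked here, and opaque in the real thing."""
    return "AUC4915"

# The first sense, representation in the human mind, has no code analog;
# it happens in whoever looks at the picture.
record = PlateRecord(state="VA", number="AUC4915")
print(machine_represent(b"photo") == record.number)  # True; the state plays no part
```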
So what? Who cares and why should we care?
Well, as usual, I think there's value in having things parsed out to teeny little details, because I am me. Secondly, though, this gives us a pretty firm framework for understanding the social/digital mechanisms here. A dash-mounted license plate scanner in a police cruiser isn't a singular object. There are the policemen in the car, there is the database off someplace with the list of all the stolen cars, there are the contractors who wrote all the software, and so on. There is a complex of people and machines involved which, from time to time, produces a traffic stop, an arrest.
The photograph is the input that brings the machine to life, leading to the arrest. In the process, representations collide and interact. If they interact properly, the person arrested in fact committed the crime. If they interact badly, the wrong person is arrested. Or shot.
In computing, we call this systems architecture, but in these cases we need to be roping the human beings and the raw elements of how photographs function into our architecture. We need to understand how these things work. The ideas of the photograph, the index, and the three representations above all need to be grasped at some level. The interactions between them need to be explicitly mapped out and understood if we are to understand where errors are going to creep in, and how to fix them.
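One way to start that mapping, as a sketch: list the stages, the kind of representation each involves, and where its characteristic errors creep in. Every entry here is illustrative, not a claim about any particular deployed system.

```python
# An illustrative architecture map for a plate-scanner system.
STAGES = [
    ("camera",     "index",                  "bad light, low resolution"),
    ("recognizer", "machine representation", "misread characters"),
    ("database",   "data representation",    "lossy keys, e.g. the state omitted"),
    ("officers",   "human representation",   "overtrusting the alert"),
]

for stage, representation, failure in STAGES:
    print(f"{stage:>10}: {representation:<22} -- errors via {failure}")
```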
And that's just to get the system to work as the designers intend.
We also have questions of whether such systems are good ideas at all. Again, a thorough understanding of what's actually going on, how it works, how it fails and how it succeeds, is a useful tool in the arguments against (or for) such systems.
In terms, for instance, of license plate scanning systems we can probably argue that the ship has sailed. The license plate is explicitly designed and intended as a key into a database (a filing system), so arguments that it should not be used as such are probably going to go nowhere. Arguing, though, for correct design, as well as enabling correct design, is still in play.
Perhaps the system my friend was caught up in was designed, in part, by people in a country where there is no state designation on license plates. Perhaps in India, or the Philippines, license plates are issued at a federal level and the numerical content is indeed a unique nationwide key. Perhaps they designed a database record format that did not include the state at all, and instead simply had the plate number marked as a UNIQUE KEY. When the system was deployed, the American users simply shrugged and worked around it, but not well enough.
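That hypothetical schema is easy enough to write down. A sketch using SQLite from Python; the tables, plates, and the scenario itself are all invented, following the story above rather than any real system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The flawed (hypothetical) schema: the plate number is the unique key,
# and there is no state column at all.
conn.execute("CREATE TABLE stolen (plate TEXT PRIMARY KEY)")
conn.execute("INSERT INTO stolen VALUES ('AUC4915')")  # the Virginia car

# The scanner reads a Pennsylvania plate with the same numbers...
hit = conn.execute("SELECT 1 FROM stolen WHERE plate = ?",
                   ("AUC4915",)).fetchone()
print("STOLEN" if hit else "clear")  # STOLEN -- wrong car, wrong state

# The repaired data representation makes the state part of the key:
conn.execute("CREATE TABLE stolen_fixed (state TEXT, plate TEXT, "
             "PRIMARY KEY (state, plate))")
conn.execute("INSERT INTO stolen_fixed VALUES ('VA', 'AUC4915')")
hit = conn.execute("SELECT 1 FROM stolen_fixed WHERE state = ? AND plate = ?",
                   ("PA", "AUC4915")).fetchone()
print("STOLEN" if hit else "clear")  # clear
```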
This would absolutely be a failure of representation, possibly in all three senses. An understanding that representation, as discussed in this essay, is as much social as technical could have uncovered that (hypothetical) problem earlier. An understanding of index might lead to the realization that the photographs weren't good enough to reliably capture the state information, which in the USA can be a bit of a bugger. And so on.
In a way, Google's problem with identifying black people as gorillas is falsely comforting. It suggests that if only we had a more diverse community of engineers training our neural networks, the system would work perfectly.
This is untrue. Machine representation proceeds through a dizzying set of transformations and computations. Google's problem was easy to spot. The sheer complexity, and the sheer mechanicalness, of machine representation is likely to produce far more subtle problems in representation, problems that are deeply inhuman, problems we will not recognize.
Just sitting here mulling this framework around, I can imagine pretty much endless scenarios in which face recognition systems could go wrong, and I can point to exact causes of the failures.