Thursday, October 3, 2019

Photography and Machine Learning

Nifty article in The Guardian here.

The short form is that there's a large cache of burned scrolls that were cooked when Vesuvius erupted. The ink The Ancients used on them was carbon based; that is, it consists of smears of a material onto the substrate that is now more or less identical to the charred state of the substrate itself. Everything is very brittle, nothing can be unrolled, and the signs left by the writing process are very very subtle anyways. The ink markings remain as, maybe, changes in texture in the charred material.

Ok, so they're gonna image these things with something kind of like a CAT scanner, in their rolled up state, and try to discern these subtle textural changes, and try to work out the writing on the scrolls, without unrolling anything. Bold move. I favor it.

They're going to apply machine learning, at which point things get dicey, and become an illustration of the kinds of problems we're going to see more and more of.

First let's imagine a worst case scenario. They train their computer program on a bunch of burned papyrus or whatever with Greek written on it, and accidentally train their system to turn goddamn near anything into convincing-looking Greek text. This isn't going to happen, because in the first place we may assume that these scientists are not idiots, and in the second place the output would just wind up being a kind of word-salad version of their training text.

Four score and by the grace of God, King of England, conceived in Liberty, and dedicated to the Archbishop of Canterbury... etcetera
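This word-salad failure mode is easy to demonstrate with a toy: a word-level bigram model trained on a mash-up corpus happily emits text that is locally plausible and globally meaningless. Everything below (the corpus, the function names) is invented for illustration; it's a sketch of the failure mode, not of anything the scroll project is actually doing.

```python
import random

def train_bigrams(text):
    """Build a word-level bigram table: word -> list of observed next words."""
    words = text.split()
    table = {}
    for a, b in zip(words, words[1:]):
        table.setdefault(a, []).append(b)
    return table

def babble(table, start, n, seed=0):
    """Generate up to n words of locally-plausible, globally meaningless text."""
    rng = random.Random(seed)  # seeded so the toy is reproducible
    out = [start]
    for _ in range(n - 1):
        nxt = table.get(out[-1])
        if not nxt:  # dead end: word never seen with a successor
            break
        out.append(rng.choice(nxt))
    return " ".join(out)

# A deliberately mashed-up "training corpus".
corpus = ("four score and seven years ago our fathers brought forth "
          "by the grace of god king of england and of france")
table = train_bigrams(corpus)
print(babble(table, "four", 12))
```

Every output word comes straight from the training text, and every adjacent pair of words is a pair the model has seen, which is exactly why the result *looks* like language while meaning nothing.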

This sort of blockheaded error is going to jump out at you, and anyways, it does not seem likely to me that they're going to try to teach it to recognize text. They might well try to teach it to recognize strokes of ink, instead. In this case, they're going to get out a sort of cloud of lines, ink that might be on this surface, or on a surface nearby. I would bet you a dollar that at some point they get out pictures that look a bit like several layers of text laid atop one another, with some ink strokes simply missing, some incorrect, and some real, with no real way of knowing which is which.

This mess gets analyzed by hand, and probably also with machine learning tools, because by god once you've got a hammer, you're going to bash some bloody screws in with it.

While we are not generating Greek Salad here, we are at real risk of generating a salad of ink strokes, and imagining that we see text in them. Or, probably and, training a computer to imagine text in the cloud of ink strokes.

I dare say that they're going to produce at least two separate systems, so they can cross-check results, because — and this is terribly important — there seems to be no way to check the damned thing's work. You guess that there's ink here, here, and here, and that it might say "one dozen eggs" but there's no way to tell if it really says that or if the computer is just generating suspiciously meaningful noise. If you could physically check it, then you could just go ahead and read the thing in the first place.

If you get two systems, and you're really really careful not to accidentally cross-pollinate, then you have a couple scenarios. The first case is that both systems are pretty sure that it says "one dozen eggs" in Greek. The notional ink strokes are pretty similar, and pretty clear (pro-tip: you probably cross-pollinated the two systems, and they're both getting it wrong the same way). Another scenario is that you're getting two completely different sets of ink strokes out, and one reads "one dozen eggs" in Greek, the other says "Titus, god what a shit" in Latin.

The most likely scenario, in my cynical opinion, is that you get two sets of ink strokes out that look vaguely similar, and if you stick to the strokes they agree on there's no text that you can make out, while each system taken alone produces some runs of sensible text, but not the same text.
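One way to make "vaguely similar" concrete is to score how much of the claimed ink the two systems actually agree on: intersection over union of the two stroke masks. The masks and any threshold for "similar enough" below are entirely hypothetical; this is a sketch of the cross-check, not anyone's actual pipeline.

```python
def stroke_agreement(mask_a, mask_b):
    """Fraction of claimed ink the two models agree on:
    intersection over union of two flattened binary stroke masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0  # two empty masks agree trivially

# Hypothetical flattened stroke masks from two separately trained systems.
system_1 = [1, 1, 0, 1, 0, 0, 1, 0]
system_2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(stroke_agreement(system_1, system_2))  # 3 agreed / 5 claimed = 0.6
```

A score near 1.0 is the happy case (or, per the pro-tip above, the cross-pollinated case); a middling score is precisely the grey area where each system alone reads cleanly but they don't read the same.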

But it's worse than that.

Suppose we make out what appears to be a letter of Cicero based on a couple of words here or there. If we had an actual written, albeit damaged, text in front of us, we could reason thus: Look, it's text, and it's signed by Cicero. We can reasonably extrapolate a collection of random words with indecipherable chunks between as being correct Latin sentences, in the style of Cicero. We cannot reasonably make the same extrapolations in this case, because we're not looking at a piece of paper with writing on it. We are looking at the output of a computer program which did, well, something we can't quite be sure what the hell it did.

For all we know it pulled out the Cicero because we trained it on some of his writing, or because it accidentally worked out how Latin names go together, or some damn thing.

This is not going to stop people performing the above extrapolation.

The best case scenario here is that we are about to acquire a whole bunch of classical texts which live in a brand new category of reliability. The worst case is that we're going to expand the canon of classical writing with a bunch of completely made-up bullshit. How to tell which world we live in seems to be, at best, difficult. Not only does the technology make it very hard to tell what's real and what's Memorex, there is the ugly reality that nobody ever got a PhD for 7 years of work that results in "we were unable to establish any significant results."

Computational photography has, across the board, the same problems.

The pictures you get out will, in general, look pretty convincing. Non-convincing pictures simply get tossed as "glitches" so the only "genuine" output will be, by definition, convincing. The pictures are, in general, based on reality but are not an index of reality in any meaningful way.

As long as there is an objective reality that can be checked, things are not so bad, but in those cases in which we cannot check against reality, we run into a problem of trust. We get something out, to be sure, which might be the right thing. How can we know, though? How much trust can we place in this output? Given that the output looks convincing, people with agendas are going to strive pretty hard to get us to accept this or that result. People will stare into the depth of the pixels, and discern what they want to see, and they will lean hard on the rest of us to see the same way.

It is only a matter of time, perhaps weeks, perhaps a decade, before someone mounts a substantive challenge to some photographic evidence on the grounds that computational photography can and does produce convincing-looking artifacts.

The evidentiary and scientific value of photographs is about to take a nose dive. I remember back when you couldn't use JPEG for medical images (maybe you still can't) because JPEG artifacts could be misread. This makes that issue look like nothing whatever.

An interesting side remark here is that contemporary medical imaging often is computational photography in a fairly strictly defined sense ("CAT" means Computerized Axial Tomography) in that a bunch of not-very image-like data is combined by data processing into one or more pictures. What it does not use is Machine Learning. There is no AI bullshit, it's straight up deterministic computation, with a well-defined (and small) envelope of potential artifact generation.

It remains unclear to me how, or if, this will affect vernacular or art photography. My current guess: not very much. Which leads to an interesting bifurcation in how photographs are treated.


  1. I'm always suspicious of arguments that are based on, 'Oh, the engineers will never be able to solve this.'
    But I like the points you've been making about AI and digital imaging, that perhaps a line is being crossed.

    1. It is certainly possible that the engineers will just make it work. The differences between "inked on charred surface" and "not inked on charred surface" might just pop right out, and maybe machine learning is just the right method to scale up the computations.

      There are certainly cases where it's computationally infeasible to Just Calculate It but a machine learning model can produce essentially the same results much more cheaply.

      Verifiability becomes a problem in this case, but if they can really make two clean-room-separate models that produce substantially the same ink-traces, then they've got that covered pretty well too.

      It might Just Work. But it might not. There's a *huge* grey area in the target region, and if the project lands in it, there's gonna be problems.

      And if this project works out, well, there will likely be other projects that land in their corresponding grey areas.

      It is the existence and the essentially new character of the Grey Areas that makes machine learning applied to photography interesting!

  2. The problem with machine learning (and especially deep learning) is that people can provide you with the algorithm, but the results it produces are so convoluted that nobody can interpret how good or bad they are, except indeed if there is a reality check (e.g. you consistently win chess games against a world champion). So I really dislike the idea of AI as a way to gain understanding of things; it is just trained and fit for purpose, nothing more. Yes, it can beat chess champions, but it will not explain to you how it did it. Very limited benefit for understanding beside the pure performance. And a sad step for humanity too with respect to knowledge; real progress is made by increasing our understanding of the world, not by outperforming existing algorithms/persons (IMHO).