A sting in the digital tale
“I was working in an archive of a 250-year-old business, reading correspondence from about the time of the American Revolution. Incoming letters were stored in wooden boxes about the size of a standard Styrofoam picnic cooler, each containing a fair portion of dust as old as the letters. As opening a letter triggered a brief asthmatic attack, I wore a scarf tied over my nose and mouth. Despite my bandit’s attire, my nose ran, my eyes wept, and I coughed, wheezed and snorted. I longed for a digital system that would hold the information from the letters and leave paper and dust behind.
“One afternoon, another historian came to work on a similar box. He read barely a word. Instead, he picked out bundles of letters and, in a move that sent my sinuses into shock, ran each letter beneath his nose and took a deep breath, at times almost inhaling the letter itself but always getting a good dose of dust. Sometimes, after a particularly profound sniff, he would open the letter, glance at it briefly, make a note and move on.
“Choking behind my mask, I asked him what he was doing. He was, he told me, a medical historian. (A profession to avoid if you have asthma.) He was documenting outbreaks of cholera. When that disease occurred in a town in the eighteenth century, all letters from that town were disinfected with vinegar to prevent the disease from spreading. By sniffing for the faint traces of vinegar that survived 250 years and noting the date and source of the letters, he was able to chart the progress of the cholera outbreaks.”
Paul Duguid, Trip report from Portugal, quoted in Seeley Brown and Duguid’s The Social Life of Information
In the first post of this series, I argued that the first problem of identity is to establish which set of properties is necessary and/or sufficient to make a valid claim of identity. I also asserted that there is no single set of properties that will always be necessary and sufficient for every situation: in other words, identity is always context-dependent.
In the real world, we can always make a reality check: if we can’t confirm the identity of something or someone, we can always go back for more information, look for further properties useful in establishing identity: “IS that really my car? The number plate’s wrong, it’s someone else’s but I could have sworn it was mine…”.
In the digital world things are a little more complicated, however.
When we “digitize” something from the real world, what we create in the digital world is always a poor copy at best and often lacks many of the properties of the original – we inevitably lose something. When you take a digital photo (and even traditional film is digital, once you get down to the individual photosensitive grains), the shot captures a determined visual subset of reality – where the camera is pointing, the image resolution, the focus, the film speed – all these factors and others necessarily limit what is captured.
On its own, this is not a particular problem: we tend to take a photograph to capture something we’ve seen and wish to record. How often, however, once the photo is developed and examined, do we wish we could have gone back and captured some detail that we hadn’t originally seen? Used our zoom to capture more of a building’s ornament? Shot a little more to the left to capture the whole of that tree with the fascinating shape?
These are all examples of what I call the digital one-way street: when you go along the street from the analogue, real world, to the digital one, there is no turning back. Once you have encapsulated – digitized – a pre-determined subset of the real world, there is no going back. The best you can hope for is a second pass, by attempting to re-create exactly the same situation again, only this time with your focus of attention shifted. And how many times will we want to revisit because something else comes into our field of examination? Anyway, as all amateur philosophers know, you never can step in the same river twice.
The lengthy quotation alongside from Paul Duguid serves as a warning to us all: you never know if there is some other property of a real world object “out there” that is not captured in the digital, and that you might one day need. This has always been my main concern about metadata and where I have some sympathy with the approach taken by, for example, Google – don’t just try to extract or attribute metadata to each web page you find, but throw the whole thing and the kitchen sink in for good measure. You never know when some crackpot will want to do research on the average number of commas used per paragraph on Web pages (and why not? a PhD thesis topic awaits…). The point is this: you can never prefigure future requirements, even if you can take a good guess.
Why is all this relevant to the question of digital identity?
Because if any reference authority – say, a central government department – wishes to establish a formal identity scheme, it needs to think very carefully about which set of properties is considered necessary and sufficient to uniquely identify someone. Once you have chosen your set and, for example, rolled out a complete electronic identity card system, it is impossible – short of recalling all the cards issued and starting over again – to modify that initial property set: so you had better get it right or accept that it is possibly inadequate on its own.
In over-zealous attempts to “future-proof” some systems, there might be a temptation to throw in as many properties as can be technically managed, “just in case”. Aside from digital identity, many information and document management projects have driven full-speed into the unforgiving wall of reference metadata, realising too late or on impact that the ideal metadata set just keeps on changing…
And that’s just for starters… Throw in the additional complications posed, for example, when you want two or more systems to interoperate using the same “identifiers”, and yet more problems arise: “Service A” identifies by a combination of family name, date of birth and, in the case of multiple births, also by first given name, whereas “Service B” identifies by a birth registration number alone. You might be able to find a solution within a particular jurisdiction by prescribing the data elements required, but what about situations where the data is to be used across administrative and jurisdictional boundaries? Or when one service offers name and DoB and another states that, because of its data protection legislation, it is not allowed to exchange names? Believe me, it happens.
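To make the mismatch concrete, here is a minimal sketch in Python. The class and field names (`ServiceAKey`, `ServiceBKey`, `birth_registration_number` and so on) are illustrative assumptions and are not taken from any real scheme. It shows two things: the two services' identifying property sets have nothing in common to match on, and Service A's key stops distinguishing twins as soon as names may not be exchanged.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ServiceAKey:
    """Hypothetical identifier for "Service A": name plus date of birth."""
    family_name: str
    date_of_birth: date
    first_given_name: Optional[str] = None  # only used for multiple births


@dataclass(frozen=True)
class ServiceBKey:
    """Hypothetical identifier for "Service B": registration number alone."""
    birth_registration_number: str


def shared_properties(a_cls, b_cls):
    """Return the property names that both identifier types carry."""
    return {f.name for f in fields(a_cls)} & {f.name for f in fields(b_cls)}


def without_names(key):
    """What is left of a Service A key if names may not be exchanged."""
    return (key.date_of_birth,)


# No property in common: records cannot be linked without a mapping
# held by some third party.
print(shared_properties(ServiceAKey, ServiceBKey))

# Twins are distinct under Service A's full key...
twin_1 = ServiceAKey("Silva", date(1990, 5, 1), "Ana")
twin_2 = ServiceAKey("Silva", date(1990, 5, 1), "Rita")
print(twin_1 == twin_2)

# ...but collide once the name fields are withheld.
print(without_names(twin_1) == without_names(twin_2))
```

Neither service's property set is wrong in its own context; the difficulty only appears at the boundary between them, which is exactly where a fixed set chosen in advance cannot be renegotiated.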
The decision therefore regarding the set of properties to be chosen is, to use a common euphemism, non-trivial. How can one possibly go about choosing a set that is reliable, stable and doesn’t face the problem of the “digital one-way street”? And is the exercise in any case worthwhile? Is there a better approach?