One thing I didn’t mention in my blog yesterday, but which with hindsight I should have, is that we don’t actually know that the AOL searches attributed to a particular anonymised user are all by the same person. They may all have been performed on the same AOL account, but of course many people share an account with their family, lodgers and so on, and may even let a PC-less neighbour come round from time to time to do some surfing.

Just over a week ago AOL published the search histories of around 650,000 user accounts over a three-month period, ostensibly in the name of search research. It didn’t publish users’ names, but instead assigned a random number to each AOL account, the aim being to make it harder (or in AOL’s opinion impossible) to identify users from their search histories.

What that means is that multiple users of a single AOL account are given just one number between them.
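
To make that concrete, here is a rough sketch of what per-account numbering does to a search log. It is only an illustration: the account names, people and queries are made up, and the function `pseudonymise` is my own invention, not anything AOL has described about how it actually prepared the data.

```python
import random


def pseudonymise(log, seed=None):
    """Swap each account name for a random number, one number per account.

    `log` is a list of (account, person, query) tuples. The person who typed
    the query is dropped, because only the account is known to the publisher.
    This is a hypothetical sketch, not AOL's actual procedure.
    """
    rng = random.Random(seed)
    ids = {}       # account name -> random number
    used = set()   # numbers already handed out, to avoid collisions
    published = []
    for account, _person, query in log:
        if account not in ids:
            number = rng.randrange(1, 10_000_000)
            while number in used:
                number = rng.randrange(1, 10_000_000)
            used.add(number)
            ids[account] = number
        published.append((ids[account], query))
    return published


# Made-up log: two people share the first account, one person uses the second.
log = [
    ("family_account", "daughter", "film star pictures"),
    ("family_account", "father", "coursework materials"),
    ("other_account", "lodger", "cheap flights"),
]

published = pseudonymise(log, seed=1)
for number, query in published:
    print(number, query)
```

The first two rows come out with the same number even though different people typed them, and nothing in the published rows records who was at the keyboard.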

So in practice AOL is probably right: it probably is nigh on impossible to prove someone’s identity from the search histories. Unless there is only one person using that AOL account, and you couldn’t know that from the data AOL published, it is impossible to conclude that two searches, for ‘A’ and then ‘B’, mean that a single person is interested in or even linked to both A and B. One member of the family may be interested in ‘A’, a neighbour or lodger in ‘B’.

There’s another problem that this highlights, and it’s one of context. Just because someone searches for “Leonardo DiCaprio pictures” and then for “teacher’s curriculum coursework materials”, it does not follow that we are looking at a teacher with a possibly unhealthy interest in Leo, because we have no way of knowing that the first search was not made by the teacher’s teenage daughter looking for pictures of her favourite star.

Does all this mean that it was OK for AOL to publish all that information? Actually, no. People did not give their express permission for this data to be published, even with their names omitted. There are search histories that could cause embarrassment, especially given the context problem discussed above: a search may appear to link someone to embarrassing or even illegal acts, whether or not that impression is merely a product of missing context, and whether or not it was even them doing the searching.

Plus, although it is difficult to prove that all the searches in a history are by one user, it is still possible to build up a mosaic of information that could breach someone’s privacy, or their family’s privacy. Also, there are social security numbers, names and addresses in the data that could be useful to spammers and other ne’er-do-wells.

So it was right and proper that AOL yesterday made this apology in a statement: “This was a screw-up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant… Although there was no personally identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.”

Hear, hear.