News that AOL has released massive amounts of search history data has already got users up in arms if various blog comment boards are anything to go by. People don’t take kindly to the notion that a firm like AOL should suddenly decide to make all of their web searches public. I’m not surprised.
Start to look at the search histories of individual users and you realise that they are simply a window into people’s private worlds. All of life is here: the lonely and the sick, the happy and the sad and, yes, the depraved and the criminal.
Someone is looking for help with anorexia; someone else wants to know about herbal remedies that could help with depression. Someone wants to know which drug rehab clinics accept Medicaid payment, while someone else wants to know how to “counsel someone who is going threw a crisis”.
For another, “this just doesn’t look like being my day” seems more like a statement than a search term. Someone else’s 20-month-old baby has separation anxiety when its parents go out for dinner. Someone’s cat won’t use its litter tray.
We know all this because AOL keeps a log of all of the web searches that its users conduct. For some reason it decided that for the purposes of search research, it would be useful to publish a copy of three months’ worth of those logs, featuring around 20 million web queries collected from around 650,000 users.
While AOL published the lists with the actual user names replaced by a supposedly meaningless number to protect people’s identities, observers have noted that some search histories would make it possible for the user to be identified.
At the time of writing, the original page with the download (all 439MB of it) was unavailable, but the data is already widely available from various so-called ‘mirror’ sites.
One problem, noted by Michael Arrington at TechCrunch, is that users might have searched for their own name to see whether they are mentioned on any other web sites. A glance at their search history could then make their identity obvious.
I’ve so far only had a quick glance at the raw data. As AOL noted in a ReadMe file attached to the data, “Please be aware that these queries are not filtered to remove any content. Pornography is prevalent on the Web and unfiltered search engine logs contain queries by users who are looking for pornographic material.” Yes, there are quite a few users searching for porn. That’s unlikely to surprise anyone.
More surprising is that the data also shows the web sites that the search engine came back with for various search terms. One user who had searched for various pornographic terms and then did a search for “girl that pay for sex” may have been surprised that his search returned the web site for the Guardian newspaper, for instance.
But it is also possible to start to build quite a detailed picture of someone’s identity from their search history. We know from the user who was looking for a girl who “pay for sex” that he is looking for such a girl in “fayetteville nc”. He’s also looking for a good engine oil for his “acura legend” motorcar – perhaps not surprising as “acura legend smoke when frist start up”. He’s also got a problem with his “bbq”, apparently.
So if police wanted to trace this man, all they would have to do is find the person in Fayetteville whose Acura Legend smokes less than his barbecue.
From some entries, you can find out far more than this and possibly even enough to identify the person. The implications for privacy and the possibility of subsequent criminal investigations are very real.
At the very least, some of the histories could cause much embarrassment, if not criminal proceedings, if the user were ever identified. TechCrunch noted that one user had done lots of searches for terms like “how to kill your wife”.
There are others that could be evidentiary. I found one searcher had looked for “revenge tactics”, “the woman’s book of revenge”, “how to torment someone”, “how to humiliate someone”, “how to get revenge on an old lover” and “things to send your old lover via email”. Perhaps most disturbing of all, one of her last search terms was for “jame blunt goodbye my lover lyrics”.
The worry is that there could be a knock-on effect on people’s attitudes to their privacy online. Although there are clearly a lot of people using search engines for nefarious ends, there are just as many using them to ask legitimate, but equally private, questions.
If they stay within the law, users should be able to search for information of a private or embarrassing nature without the fear that some day their search history could become common knowledge. If someone wants to do a web search for “nude resorts clothing optional”, or “butt implants”, or even “tent pole replacements” behind closed doors, then so be it. But people who search for James Blunt lyrics – they are the ones we need to worry about.
UPDATE: AOL has apologised – I blogged it here.