They know all about you
Every time you use an internet search engine, your inquiry is stored in a huge database. Would you like such personal information to become public knowledge? Yet for thousands of AOL customers, that nightmare has just become a reality. Andrew Brown reports on an incident that has exposed how much we divulge to Google & co
Guardian August 28, 2006
In March this year, a man with a passion for Portuguese football, living in a city in Florida, was drinking heavily because his wife was having an affair. He typed his troubles into the search window of his computer. "My wife doesnt love animore," he told the machine. He searched for "Stop your divorce" and "I want revenge to my wife" before turning to self-examination with "alchool withdrawl", "alchool withdrawl sintoms" (at 10 in the morning) and "disfunctional erection". On April 1 he was looking for a local medium who could "predict my futur".
But what could a psychic guess about him compared with what the world now knows? This story is one of hundreds, perhaps tens of thousands, revealed this month when AOL published the details of 23m searches made by 650,000 of its customers during a three-month period earlier in the year. The searches were actually carried out by Google - from which AOL buys in its search functions.
The gigantic database detailing these customers' search inquiries was available on an AOL research site for just a few hours before the company realised that substituting numbers for users' names did not really protect their identities enough. The company apologised for its mistake - and removed the database from the internet. The researcher who published the material has been sacked, as has his manager, and last week AOL's chief technology officer, Maureen Govern, resigned. But those few hours online were enough for the raw data files to be copied all over the internet, and there are now four or five sites where anyone can search through them using specialised software.
What was published by AOL represents only a tiny fraction of the accumulated knowledge warehoused within Google's records - but it has given all of us, as users, a dramatic and unsettling glimpse of how much, and in what intimate detail, the big search engines know about us.
The number of searches Google carries out is a secret, but comScore, an independent firm, reckons that the search engine performed 2.7bn searches by American users alone in July this year. Yahoo, its main rival, conducted around 1.8bn American searches in the same month; Microsoft's MSN around 800m and AOL 366m.
All of this information is stored. Google identifies every computer that connects to it with an implant (known as a cookie) which will not expire until 2038. If you also use Gmail, Google knows your email address - and, of course, keeps all your email searchable. If you sign up to have Google ads on a website, then the company knows your bank account details and home address, as well as all your searches. If you have a blog on the free Blogger service, Google owns that. The company also knows, of course, the routes you have looked up on Google Maps. Yahoo operates a similar range of services.
All this knowledge has been handed over quite freely by us as users. It is the foundation of Google's fortune because it allows the company to target very precisely the advertising it sends in our direction. Other companies have equally ambitious plans: an application lodged on August 10 with the US Patent & Trademark Office showed that Amazon is hoping to patent ways of interrogating a database that would record not just what its 59 million customers have bought - which it already knows - or what they would like to buy (which, with their wish lists, they tell the world) but their income, sexual orientation, religion and ethnicity. The company, of course, already knows who we are and where we live.
Even though the search logs that AOL released were made anonymous, by assigning a number to each user, it is not difficult in many cases to discover somebody's name from their search queries. And it is easy to follow exactly what users were thinking as they sat at their computers, in the apparent privacy of their own homes, since the time and date of every search is given.
On April 4, for instance, user 14162375, the melancholy Portuguese-American in Florida, seems to have passed out on the keyboard at 6.20pm, when he asked, suddenly, "llllfkkgjnnvjjfokrb" then "vvvvbmkmjk" and "vvglhkitopppfoppr". An hour later he had recovered enough to search for variations on his wife's name - he thought she might have moved to New England. On the evening of April 16, matters came to a head. "My cheating wife," he typed; and then, five times, "I want to kill myself," and then "I want to make my wife suffer," followed quickly by "Kill my wifes mistress," "My wifes ass," "A cheating wife". Two days after that he was back looking for audio surveillance and bugging equipment and four weeks later he seemed to have cheered up and was looking for motorcycle insurance.
The story stops abruptly there, at the end of May, because that is when the three months' worth of released AOL search records came to an end.
One of the first researchers to demonstrate that we will tell anything, however intimate, to a computer, was Joseph Weizenbaum of MIT, who in 1966 wrote a program called "Eliza" that parodied non-directional psychotherapy. If the user typed anything in, Eliza would appear to ask a question based on that cue. In no time at all, unhappy students were telling the computer all their troubles as if there were a real and sympathetic person behind the screen. Stories and jokes about this circulated for decades, but the men most successful at turning this concept into a fortune were the founders of Google, Larry Page and Sergey Brin. As users, we think that the Google search engine is a way of supplying us with information about what's on the web. But the flow of information is two-way. We ask Google things that we would hesitate to ask anyone living. The price for the answers is that Google remembers it all.
Take user 11110859 of New York City, who fell in love and then was sorry. She was up early on March 7 to buy hip-hop clothes from G-Unit; by March 26, however, there was more excitement in her life. Searches on "losing your virginity" were followed by three weeks of frantic worry about whether she was pregnant: stuff she might have hesitated to tell her best friend or her mother is all quite clear from the Google searches. But by the end of April the pregnancy scare was over and had been replaced by a broken heart. Even before she had stopped asking "Can you still be pregnant even though your period came?" she was asking "Why do people hurt others" and this was the theme of almost all her questions throughout May, culminating on the afternoon of the 19th, when she asked "How to love someone who mistreated you?"; "What does Jesus say about loving your enemies?" "What does God mean when he says bless those who spitefully use you?" Then she spent a couple of days trying to buy Betty Boop postage stamps, and the next thing we know, she was asking first for directions to the New York prison on Rikers Island, then "What items are we allowed to bring at Rikers Island" and finally for "uncoated playing cards".
User 11110859 was not the only person interested in the prison but she seems to have been the youngest and, in some senses, the most innocent. User 3745417 laid out her thoughts in detail just as graphic: on March 6 she made eight searches on child molestation and similar phrases. A week later she was trying to find a prisoner in Rikers Island - nine searches in one evening - a subject she returned to at 9.30am on March 25, when she made another eight searches. Between March 27 and March 29 she made 34 successive searches for M&M chocolates in the early evening, followed on the 30th, at 10pm, by four searches for "Kid Party Games". By 10.15pm she was searching for "Whitney Houston"; then, in the course of the next hour, 29 searches on "black porn for women" and similar subjects.
By the end of April, she was looking for a legal aid lawyer in New York City, a swimsuit, a credit card and a holiday in the Bahamas.
These stories, with all the revealing information they contain, cannot always easily be tied to a specific individual, but sometimes they can. The social security number, with which all Americans are issued, conforms to a recognisable pattern which is easy to search for in the data that AOL released. So, too, do telephone numbers. On the internet, you can buy anything from anywhere, but there are some things, such as pet care, which people mainly buy locally, so it is easy to spot where they live. People often search for their own names, which can then be cross-referenced with the telephone book.
At least one person in the AOL group, a blameless grandmother in Alabama, was identified by the New York Times within days of the AOL data release. And though it may be hard to identify complete strangers, it is very much easier to recognise in the AOL data details of someone you may already know. A church lady in the midwest, whose quest for Christian quilted wall hangings was interspersed with inquiries about vibrators and arousing frigid wives, is probably easy for anyone in her congregation to identify.
This is knowledge beyond the dreams of any secret police in history. Earlier this year Google fought a lawsuit to keep a week's worth of random search data out of the hands of the US government, but other search companies have handed over their data without complaint and nobody has yet discovered what deals have been struck between search engines and the Chinese government. China is generally thought of as attempting to censor the internet, which it does; search engines that do business in China must censor their own results if they are to succeed. But the real power for a totalitarian government is no longer just censorship. It is to allow its citizens to search for anything they want - and then remember it.
No western government, so far as we know, has gone that far. But if one ever does, it will know where the information is kept that will tell it almost everything about almost everyone. This morning, as I logged in to Google Talk to chat with my sister, the program silently upgraded itself. "Would you like to show friends what music you're playing now?" it asked.