Sameer Siruguri


The Unsolved Mysteries Of The World

I was inspired by this blog post by Keir Clarke [1] on how Google Auto Suggest was completing the query “Why Is X So…” for various state names (in the United States), and it made me wonder what equivalent wisdom of the crowds Google had gathered for other place names associated with where people live.

I thought it’d be a simple matter of feeding a list of city or country names to a Google API, but realized that there isn’t one. It turns out that, unlike Bing, Google isn’t so happy for the common consumer to share in all its indexy goodness. Well, no matter: I wasn’t trying to build a commercial-grade application, so I could do just as well by driving a browser with Selenium and Ruby and scraping the suggestions it displays.

The Conclusions

I scraped the suggestions for the 50 largest cities in the US, ordered by population [2]. Here’s what people thought of the top 20 cities (well, 21: I had to throw Seattle in, where loud overtook rainy, possibly because this was the year of the Seahawks’ miracle Super Bowl run):

[simple_table]

why is New York so expensive
why is Los Angeles so popular
why is Chicago so windy
why is Houston so big
why is Philadelphia so ghetto
why is Phoenix so big
why is San Antonio so hot
why is San Diego so expensive
why is Dallas so boring
why is San Jose so boring
why is Austin so liberal
why is Jacksonville so bad
why is Indianapolis so ghetto
why is San Francisco so expensive
why is Columbus so gay
why is Charlotte so boring
why is Detroit so dangerous
why is El Paso so safe
why is Memphis so bad
why is Boston so great
why is Seattle so loud

[/simple_table]

Nothing too surprising: the West Coast cities are expensive, Detroit is dangerous while El Paso is safe [3], and Chicago is windy. These are the stereotypes the crowd knows and wants to clarify (with the exception of the Seattle 12th man thing). That’s what Clarke’s state data tells us too, and it follows from the construction of the question: “I heard X is very Y, why is it so Y?”

I took this a step further and gathered all four auto-suggestions that Google generates in the browser (Firefox) for each city, then aggregated the completions by frequency. The number of “bad” and “great” completions caught my attention, and I wanted to see which was more on the crowd’s collective mind:

[simple_table]

19 | boring
14 | ghetto
12 | bad
10 | hot
7 | windy
7 | great
7 | expensive
7 | dangerous
7 | cheap
7 | big
6 | cold

[/simple_table]

Boring, and ghetto: Take that, America! That’s what you look like, to inquiring minds on Google. More cities are bad than are great. About as many are expensive as are cheap. Global warming has 10 cities in its grip. And perhaps not surprisingly, none are conservative.
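For anyone who wants to redo the tally, here is a minimal Ruby sketch of the aggregation step. It is not the script I ran: the method name `tally_completions` and the hash layout are assumptions for illustration, and the sample input only uses a handful of the top suggestions from the first table, so the counts it prints are not the real totals.

```ruby
# Count how often each completion ("boring", "ghetto", ...) shows up
# across all cities. `completions_by_city` maps a city name to the list
# of completions scraped for it (a hypothetical layout, not the format
# of the original scrape files).
def tally_completions(completions_by_city)
  counts = Hash.new(0)
  completions_by_city.each_value do |completions|
    completions.each { |word| counts[word] += 1 }
  end
  # Most frequent first; ties are broken alphabetically so the order is stable.
  counts.sort_by { |word, count| [-count, word] }
end

# Sample input: only the top suggestion for six cities from the table above.
sample = {
  'Dallas'       => ['boring'],
  'San Jose'     => ['boring'],
  'Charlotte'    => ['boring'],
  'Philadelphia' => ['ghetto'],
  'Indianapolis' => ['ghetto'],
  'Chicago'      => ['windy']
}

# Print in the same "count | word" layout as the table.
tally_completions(sample).each do |word, count|
  puts "#{count} | #{word}"
end
```

Feeding it all four suggestions per city, rather than just the top one, should give a table shaped like the one above.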

The Data

Want all the initial scrapes? Go right ahead and play around with them, and let me know if you find anything more interesting. I haven’t cleaned them up: for example, when I searched for Mesa, AZ, Google felt I really should be looking for information about MRSA.

The Code

Want to generate your own scrapes? Here’s the very, very simple Ruby script that does this. Fork and pull with abandon!
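If you would rather start from scratch, the overall shape of such a scrape might look like the sketch below. Treat it as a rough outline under stated assumptions, not a copy of my script: the helper names (`prompt_for`, `completion_for`, `scrape_suggestions`) are made up for this example, the search box is assumed to be the input named `q`, and the CSS selector for the suggestion list is deliberately left as a parameter because it has to be found by inspecting the live page and tends to change.

```ruby
# Hypothetical helpers for illustration only.
PROMPT_TEMPLATE = 'why is %s so '

# Build the partial query that gets typed into the search box.
def prompt_for(place)
  format(PROMPT_TEMPLATE, place)
end

# Extract the completing word(s) from one suggestion. Returns nil when the
# suggestion does not follow the "why is X so Y" pattern (for example, when
# Google decides the search was really about something else).
def completion_for(place, suggestion)
  pattern = /\Awhy is #{Regexp.escape(place)} so (.+)\z/i
  match = pattern.match(suggestion.strip)
  match && match[1].downcase
end

# Type the prompt into the search page and return the visible suggestions.
# `driver` is a Selenium::WebDriver instance. `suggestion_css` is an
# ASSUMPTION: pass whatever selector matches the suggestion rows today.
def scrape_suggestions(driver, place, suggestion_css:, search_url: 'https://www.google.com/')
  require 'selenium-webdriver' # needs the gem plus a browser driver installed
  driver.get(search_url)
  # Assumes the search box is the input named "q".
  driver.find_element(name: 'q').send_keys(prompt_for(place))
  wait = Selenium::WebDriver::Wait.new(timeout: 10)
  wait.until { driver.find_elements(css: suggestion_css).any? }
  driver.find_elements(css: suggestion_css).map(&:text).reject(&:empty?)
end

# The pure helpers can be tried without a browser:
puts prompt_for('Chicago').inspect
puts completion_for('Chicago', 'why is chicago so windy').inspect
```

With a browser available, usage would be along the lines of `driver = Selenium::WebDriver.for(:firefox)` followed by `scrape_suggestions(driver, 'Chicago', suggestion_css: '...')`, mapping each result through `completion_for` and dropping the `nil`s.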

References

1. Keir Clarke, incidentally, has more map visualizations of Google Auto Suggest behavior.
2. This is where I got the city name list: http://www.infoplease.com/ipa/A0763098.html
3. Here’s more about El Paso’s own little miracle.
