It seems like whenever the Dodgers play the Giants at AT&T Park – just a block from StumbleUpon’s headquarters in San Francisco – our employees from both ends of the state get into heated discussions over which city is hipper: Los Angeles or San Francisco. Since we’re a data-centric company with terabytes of data in our Hadoop cluster, I thought it would be interesting to do some mining and let the numbers speak for themselves. By looking at the rating behavior of our users in SF and LA across all of our 500-plus interests, we are able to highlight the differences in the cultures of the two cities.

You can draw your own conclusions here. I’ll stay out of the debate, especially since I’m partial to my hometown of Oakland, California. :P

If you’re a data nerd and want to hear the details on how this data was mined, crunched and analyzed, read on (and if you’re an engineer or Data Scientist, you should consider applying for our team):

The topic of each page in StumbleUpon’s index is crowdsourced by our users. We have millions of pages within each Interest, added by our users, that we recommend to other Stumblers.

First, I took all of our users who live in either Los Angeles or San Francisco. Then, for each of our topics, I found all the pages thumbed up by these users. Next, I mined the title text of these pages to construct n-gram frequencies for each word that appears – this was my proxy for determining what the thumbed-up pages for LA and SF users in a given topic are actually about.
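For the curious, here is a minimal sketch of the counting step in Python. It is not the actual Hadoop job: the `ngram_counts` helper, the tokenizer regex, and the toy titles are all made up for illustration, and the real pipeline presumably ran per topic over far more data.

```python
import re
from collections import Counter

def ngram_counts(titles, n=1):
    """Count n-grams across a list of page titles (hypothetical helper)."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep runs of letters, digits, and apostrophes as tokens.
        tokens = re.findall(r"[a-z0-9']+", title.lower())
        # Slide a window of n tokens across the title.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Toy titles invented for this example, standing in for thumbed-up pages.
sf_titles = ["Best Burrito Spots in the Mission", "Fog City Bike Routes"]
la_titles = ["Best Taco Trucks in Echo Park", "Beach Bike Paths"]

sf_counts = ngram_counts(sf_titles)
la_counts = ngram_counts(la_titles)
```

Each city ends up with one counter per topic, and those counters are what get compared in the next step.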

To determine which words were most popular in a given city, relative to their popularity in the other city, I borrowed a technique from information theory: I calculated the relative entropy of each word I had an n-gram count for, then ranked the words by that score. The top-ranked words are the ones you see in the infographic.
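Here is one way the ranking could look in Python. Treat it as a sketch under assumptions rather than the exact formula behind the infographic: it scores each word by its term in the relative entropy (Kullback-Leibler divergence) sum, p(w) · log(p(w) / q(w)), where p is the word's share in one city and q its share in the other. The add-one smoothing, the `rank_by_relative_entropy` name, and the toy counts are my own additions.

```python
import math
from collections import Counter

def rank_by_relative_entropy(city_counts, other_counts, smoothing=1.0):
    """Rank words by their per-word relative entropy term (hypothetical helper)."""
    vocab = set(city_counts) | set(other_counts)
    # Smoothing (an assumption) keeps q(w) above zero for words unseen in one city.
    city_total = sum(city_counts.values()) + smoothing * len(vocab)
    other_total = sum(other_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p = (city_counts.get(word, 0) + smoothing) / city_total
        q = (other_counts.get(word, 0) + smoothing) / other_total
        # Large when the word is both common here and rare in the other city.
        scores[word] = p * math.log(p / q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy counts invented for this example.
sf = Counter({"fog": 5, "bike": 3})
la = Counter({"beach": 5, "bike": 3})

sf_ranked = rank_by_relative_entropy(sf, la)
la_ranked = rank_by_relative_entropy(la, sf)
```

Words that are equally popular in both cities score zero and drop out of the top of the list, which is the point: only the words that set a city apart float up.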

I love how at StumbleUpon we are able to toss speculation to the wind and source the collective intelligence of our users to see what they really, truly like. Data is power!

Thanks for reading this far. Enjoy the infographic and have fun discussing it!

