Wrangling Data with Open Source

This is the first in a new series of posts where we’ll be regularly updating you on the cool stuff our engineers are busy working on. Feel free to ask questions in the comments section! And if you’re interested in developer positions we have available, check out our jobs page.

2011 – The Year of Data Growth (2.5 Billion Data Points a Week!)

Our engineering team has been working around the clock to manage and monitor ever-growing amounts of data as our user base climbs past 15 million. We’re lucky to have a stellar team of HBase specialists, including Michael Stack, one of the creators of HBase (an Apache project), and Jean-Daniel “J-D” Cryans, database engineer and HBase committer, who help StumbleUpon manage 2.5 billion data points every week. (Since there are only eight HBase committers in existence, Michael and J-D take their jobs very seriously!) We couldn’t manage our data without open source projects like HBase and Hadoop: they’re critical to our ability to build a fast and reliable user experience.

OpenTSDB: Debugging Downtimes

OpenTSDB is a time series database built on top of HBase and developed entirely in-house by our own Benoît Sigoure, Site Reliability Engineer. It lets us monitor second-by-second changes in our traffic and connectivity and detect spam and malicious activity as quickly as possible, capabilities that have become crucial as we expand to support more stumbling activity. Since Benoît first released OpenTSDB last September, several major companies have begun using it for their own monitoring needs, finding that it copes with whatever kind of data they throw at it.
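
To make “second-by-second” a little more concrete: OpenTSDB ingests data points through a simple line-oriented `put` command sent over TCP (port 4242 by default), where each point is a metric name, a Unix timestamp, a value, and at least one tag. The minimal sketch below formats and sends such lines. The metric name `stumbles.served` and the tag `host=web01` are hypothetical examples, not StumbleUpon’s actual metrics.

```python
import socket
import time


def format_put(metric, value, tags, timestamp=None):
    """Build one line of OpenTSDB's plain-text `put` command:

        put <metric> <timestamp> <value> <tagk=tagv> [<tagk=tagv> ...]
    """
    if not tags:
        # OpenTSDB rejects data points that carry no tags.
        raise ValueError("OpenTSDB requires at least one tag per data point")
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch, whole seconds
    # Sort tags so the same point always renders to the same line.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}\n"


def send_points(lines, host="localhost", port=4242):
    """Write pre-formatted `put` lines to a TSD over a plain TCP socket."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical metric: stumbles served during one second on one web host.
    line = format_put("stumbles.served", 1523, {"host": "web01"}, timestamp=1300000000)
    print(line, end="")  # put stumbles.served 1300000000 1523 host=web01
```

Reporting a point like this every second for every host is what produces the fine-grained series described above; the tags are what let you later slice the same metric by host.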

OpenTSDB helps StumbleUpon’s operations team pinpoint the exact second an outage begins, which gives them a head start on understanding why it happened. Represented here in red is the number of stumbles occurring at the moment a recent outage began, and in green is the number of data requests that got backlogged during those seconds.

The real-time, extremely granular data that OpenTSDB provides helps reduce the Mean Time To Resolution (MTTR) of outages. It has cut our diagnosis time from hours to minutes, simply by giving us visibility into second-by-second glitches. It has even revealed problems we hadn’t noticed before, or that were lying dormant until traffic spiked.

For example, recently we had a configuration problem on some of our databases that was causing a small fraction of our stumbles to be served extremely slowly (over 3 seconds, which is unacceptably slow for anyone waiting to see the best of the web). OpenTSDB helped us spot and resolve the issue because it doesn’t overlook even the smallest fraction of activity.
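
Why does a small fraction of slow stumbles stand out at all? Granularity: if requests are counted per second against a latency cutoff, rather than folded into an average, even a single slow request shows up. The following is a minimal sketch of that idea, assuming hypothetical `(timestamp, latency)` request records; it illustrates the reasoning and is not StumbleUpon’s actual pipeline or a built-in OpenTSDB feature.

```python
from collections import defaultdict

SLOW_THRESHOLD_S = 3.0  # the "over 3 seconds" cutoff mentioned above


def slow_counts_per_second(requests):
    """Group (unix_timestamp, latency_seconds) records into one-second
    buckets and return {second: (slow_count, total_count)}."""
    buckets = defaultdict(lambda: [0, 0])
    for ts, latency in requests:
        bucket = buckets[int(ts)]  # truncate to the whole second
        bucket[1] += 1
        if latency > SLOW_THRESHOLD_S:
            bucket[0] += 1
    return {sec: (slow, total) for sec, (slow, total) in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical samples: one slow stumble hidden among fast ones.
    samples = [(100.1, 0.2), (100.5, 0.3), (100.9, 3.4), (101.2, 0.25), (101.7, 0.2)]
    print(slow_counts_per_second(samples))  # {100: (1, 3), 101: (0, 2)}
```

The overall average latency of these five samples is under one second, so an averaged view would look healthy; the per-second slow count is what exposes the outlier.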

ElasticSearch: Making Data Easy to Find

StumbleUpon is using ElasticSearch, an open-source, scalable search engine, to power some upcoming new features. Here’s one example of why we love it: say you wanted to retrieve web pages that you’d stored in HBase. HBase looks rows up by key, so you’d have to assign each web page a unique ID and remember that ID in order to retrieve the page later. ElasticSearch, by contrast, can take a web page you’ve submitted and index every word on it. To find the page again, all you need to remember is one keyword that appears on it, and ElasticSearch will find what you need.
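
The difference comes down to lookup by key versus lookup by content. ElasticSearch is built on Apache Lucene, which keeps an inverted index mapping each term to the documents that contain it. The toy class below contrasts the two access patterns in plain Python. It is a simplified sketch (no text analysis beyond lowercasing, no relevance ranking, no persistence), and the class name `KeywordIndex` and the page IDs are hypothetical.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index: maps each word to the IDs of pages containing it."""

    def __init__(self):
        self.pages = {}                   # page_id -> text (ID-keyed store)
        self.postings = defaultdict(set)  # word -> {page_id, ...}

    def add(self, page_id, text):
        self.pages[page_id] = text
        # Index every word on the page, lowercased.
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(page_id)

    def get(self, page_id):
        # Key lookup: you must already know the page's ID.
        return self.pages.get(page_id)

    def search(self, keyword):
        # Keyword lookup: any word that appears on the page is enough.
        return sorted(self.postings.get(keyword.lower(), set()))


if __name__ == "__main__":
    idx = KeywordIndex()
    idx.add("p1", "Stumble upon the best of the web")
    idx.add("p2", "HBase stores web pages by row key")
    print(idx.search("web"))    # ['p1', 'p2']
    print(idx.search("HBase"))  # ['p2']
```

A real ElasticSearch index adds tokenization rules, scoring, and sharding on top of this basic term-to-document mapping, but the access pattern is the same: you query by what is on the page, not by an ID you had to remember.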

ElasticSearch also enables our engineering teams to work cross-functionally. Even engineers who aren’t trained in querying data stores like HBase can run searches with ElasticSearch. For example, our research team uses ElasticSearch when they run natural language processing tests to improve our recommendation methods. Plus, it’s fast and easy to implement: I was able to integrate ElasticSearch into our analytics dashboard in just an hour.

Thanks for reading! If you want more technical detail on what’s on our engineers’ plates these days, head to our Developer’s Blog.

Josh Eichorn