I am pleased to announce the launch of Gravity Labs, an initial peek into our underlying interest graph infrastructure as well as a showcase of some of our Open Source projects.
For the last 2+ years we have been working on productionizing a web-scale system that draws on a wide range of disciplines, from natural language processing, to large-scale semantics and ontology development, to real-time behavioral algorithms, all the way to a variety of machine learning techniques.
We didn’t set out to build a complex system that encompasses so many different disciplines; we set out to personalize the web. Unfortunately, it didn’t take long for us to realize that the generally accepted collaborative filtering and behavioral targeting algorithms available today didn’t meet many of our core requirements. There are quite a few, but the primary requirements for a cloud-based, web-scale personalization engine are:
Real time capable:
- New user events occur tens, hundreds, and even millions of times per second. A user’s personalized experience needs to update in real time as each event, or group of events, occurs.
- New content is created across the web at similar rates. Content needs to be available for recommendation to a user immediately upon its creation.
- Signal generated by a user on one site needs to be applicable to that user’s recommendations on all other sites across the web. Just as a user is now able to interact with their social graph as they use the broader web, they need to be able to take their interest graph (or “personalization profile”) with them to every website they visit, and be able to both apply it and augment it anywhere.
- We are all unique. You can’t put everyone in a bucket. While neighborhood/bucketing-based algorithms do work (and are one, albeit small, component of our infrastructure), they generalize about people’s actions, gaining scalability at the cost of accuracy. A true personalization engine should group users together as little as possible, and treat each individual as a unique entity with a unique set of interests.
- The fears of the filter bubble are real, and existing personalization and contextual recommendation engines often drive users down an increasingly narrow content discovery path. A successful personalization engine needs to be able to inject serendipity into a user’s experience at an individual level. Both the general, real-time consensus of content that is important across the global web regardless of a user’s interests, and the semantic relationships between very different but highly connected interests, need to be taken into account.
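To make the requirements above a little more concrete, here is a minimal Python sketch of the general idea. It is an illustration only, not the Gravity Interest Service: the names `InterestProfile` and `recommend`, the exponential decay factor, and the fixed number of "serendipity slots" are all assumptions made for this example. It shows a per-user profile that updates on every event, and a recommendation step that reserves room for globally trending content outside the user's known interests.

```python
import random
from collections import defaultdict


class InterestProfile:
    """Hypothetical per-user interest profile: topic -> decayed weight."""

    def __init__(self, decay=0.9):
        self.decay = decay  # assumed decay factor applied on every new event
        self.weights = defaultdict(float)

    def record_event(self, topics):
        # Fade older interests, then boost the topics seen in the new event,
        # so the profile reflects each event as soon as it arrives.
        for topic in self.weights:
            self.weights[topic] *= self.decay
        for topic in topics:
            self.weights[topic] += 1.0

    def score(self, item_topics):
        # Score an item by the user's own weights only (no user bucketing).
        return sum(self.weights.get(t, 0.0) for t in item_topics)


def recommend(profile, items, trending, k=3, serendipity_slots=1, rng=None):
    """Pick k items: mostly personal matches, plus a few trending extras.

    items: dict of item_id -> set of topics
    trending: list of item_ids that are popular across the web right now
    """
    rng = rng or random.Random()
    ranked = sorted(items, key=lambda i: profile.score(items[i]), reverse=True)
    personal = ranked[: k - serendipity_slots]
    # Serendipity: fill the remaining slots from trending items the user
    # would not otherwise have been shown.
    pool = [i for i in trending if i not in personal]
    extras = rng.sample(pool, min(serendipity_slots, len(pool)))
    return personal + extras


profile = InterestProfile(decay=0.5)
profile.record_event(["python"])
profile.record_event(["python", "scala"])

items = {
    "a": {"python"},
    "b": {"scala"},
    "c": {"cooking"},
    "d": {"travel"},
}
recs = recommend(profile, items, trending=["d", "a"], k=3,
                 serendipity_slots=1, rng=random.Random(0))
print(recs)
```

In this toy version the profile is a plain dictionary that could travel with the user from site to site, and the trending list stands in for the "real-time consensus" of important content. A real system would also need the semantic relationships between interests, which this sketch leaves out.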
It has been (and will continue to be) quite a challenge. It has required the minds of very different people with many different core skillsets.
And it took a long time. Candidly, one reason we have been so quiet about our development efforts is that we wanted to make sure we could get far enough ahead of everyone else :). We’ve popped in and out of the news with test/data acquisition products here and there, but the goal has always been a system that can accurately process all of the interest-based signal data across the entire web, and leverage it to personalize every user’s internet experience.
We are proud to announce that the above system, or the “Gravity Interest Service” as we call it internally, officially went live at production scale 6 months ago.
Since then we have:
* Created over 400 million user interest graphs
* Served over 13 million pieces of personalized content per day
* Personalized the daily internet experience of tens of millions of users per month
* Processed over 25 million inbound interest signals per day
And with our current growth rate we will be handling 10X all of these numbers in under 6 months.
It’s an exciting time for us here, so we have decided to give a (small) peek under the hood, as well as open source some of our non-core components. We leverage a significant amount of open source software for a good portion of our data storage and processing, and want to contribute what we can back to the community.
Thanks for your interest in Gravity. There is a lot more coming in the very near future, but our new Beta Labs Section should give you enough to play with until then.