Gravity processes more than 100 million tweets and status updates, tens of thousands of articles from the WordPress firehose, thousands of RSS feeds, and hundreds of millions of user actions each day. To keep pace with these high-volume data feeds, Gravity has designed a fault-tolerant ingestion pipeline that handles, normalizes, sanitizes, and routes incoming data for real-time behavioral and content processing.
Gravity processes many streams of user actions in real-time, including user clicks, reads, shares, tweets, updates, comments, and likes, as well as custom actions through partner sites and apps. Different user actions dictate different computational models for determining user interests. Each user action is associated with a particular piece or set of content, which is simultaneously processed by our semantic and virality engines.
Gravity processes and analyzes every piece of content that we ingest. Incoming links are fully resolved to unique URLs, which are indexed by our web crawlers in real-time. Gravity's intelligent web crawlers are designed to identify and extract the content a user intends to consume upon arriving at a webpage and to disregard or devalue peripheral information. All new content, whether ingested directly or obtained through a crawl, is sanitized and structured prior to semantic analysis. For existing content, time series data is recorded and, on occasion, content is re-indexed.
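As an illustration of the "resolved to unique URLs" step, here is a minimal sketch of string-level URL canonicalization. It is not Gravity's implementation: the function name `canonicalize` and the `TRACKING_PREFIXES` list are assumptions made for this example, and following redirects (which full resolution would also require) is left out because it needs network access.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of tracking-parameter prefixes to drop (an assumption,
# not taken from the source text).
TRACKING_PREFIXES = ("utm_",)


def canonicalize(url: str) -> str:
    """Map equivalent spellings of a URL to one canonical string."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it differs from the scheme's default.
    port = parts.port
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"

    path = parts.path or "/"

    # Drop tracking parameters and sort the rest so ordering does not matter.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    query = urlencode(sorted(pairs))

    # The fragment is discarded because it does not identify a separate document.
    return urlunsplit((scheme, host, path, query, ""))
```

Two links that differ only in letter case, default port, parameter order, tracking parameters, or fragment then collapse to the same key, which is what lets a crawler index each page once.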
Gravity semantically analyzes all content we ingest in real-time to determine its important topics of interest. Our semantic engine is meticulously tuned to handle both highly conversational, 140-character tweets and long-form, formal documents through the same pipeline. Unlike other solutions, which focus on entity extraction, our semantic engine aims to derive the underlying intent of the consumer of a piece of content. Gravity's semantic engine leverages our proprietary interest ontology as well as proprietary natural language processing algorithms. Our interest ontology layers multiple open-source ontologies together with a massive proprietary dataset of discovered properties and attributes and an ever-growing dataset of colloquial terms and phrases. Our ontology consists of billions of edges and is structured as a directed graph superimposed on an undirected graph, which enables novel interest-based traversal and convergence algorithms.
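The ontology and its algorithms are proprietary, so the following is only a toy sketch of what "a directed graph superimposed on an undirected graph" could look like in code. Everything here is an assumption made for illustration: the `Ontology` class, the reading of directed edges as "narrower topic to broader interest", and the `converge` query, which simply finds interests reachable from every input topic along directed edges.

```python
from collections import defaultdict, deque


class Ontology:
    """Toy two-layer graph: directed 'broader' edges over undirected 'related' edges."""

    def __init__(self):
        self.broader = defaultdict(set)  # directed layer: node -> broader nodes
        self.related = defaultdict(set)  # undirected layer: symmetric links

    def add_broader(self, node, parent):
        self.broader[node].add(parent)

    def add_related(self, a, b):
        # Undirected edges are stored in both directions.
        self.related[a].add(b)
        self.related[b].add(a)

    def ancestors(self, node):
        """Return {reachable node: hop count}, following directed edges only (BFS)."""
        dist = {node: 0}
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for parent in self.broader.get(current, ()):
                if parent not in dist:
                    dist[parent] = dist[current] + 1
                    queue.append(parent)
        return dist

    def converge(self, topics):
        """Nodes reachable from every topic, ordered by total hop count, then name."""
        if not topics:
            return []
        maps = [self.ancestors(topic) for topic in topics]
        common = set(maps[0]).intersection(*maps[1:])
        return sorted(common, key=lambda n: (sum(m[n] for m in maps), n))
```

For example, if "peloton" and "tour de france" both point to "cycling", and "cycling" points to "sports", then `converge(["peloton", "tour de france"])` ranks "cycling" ahead of "sports" because it is fewer hops from both topics. The undirected layer is only stored in this sketch; a fuller version might use it to widen the search sideways.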
An Interest Graph is a graph representation of the connection between a person, piece of content, or website (each, an object), and all of its interests. Gravity creates, maintains, and utilizes Interest Graphs for users, pieces of content, and even for websites. An Interest Graph evolves in real-time as new events occur that impact object-interest connections. Each object-interest connection has a variety of attributes, including the current and historic strength of connection between object and interest. In practice, one or many Interest Graphs can be used to identify a set of related Interest Graphs, in the same way that one or more search terms can be used to identify search results containing the search terms. Each of Gravity's applications relies on a different set of graph queries to power its functionality.
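To make the object-interest connection concrete, here is a minimal sketch of an Interest Graph as a data structure that keeps both a current strength and a history per interest. The class names, the additive update rule, and the event weights are assumptions made for this example; the source does not say how strengths are actually computed.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Connection:
    """One object-interest edge with its current and historic strength."""
    strength: float = 0.0
    history: list = field(default_factory=list)  # (timestamp, strength) snapshots


class InterestGraph:
    """Interests of a single object (a user, a piece of content, or a website)."""

    def __init__(self, object_id):
        self.object_id = object_id
        self.connections = defaultdict(Connection)

    def record_event(self, interest, weight, timestamp):
        # Assumed update rule: each event adds its weight to the current strength.
        conn = self.connections[interest]
        conn.strength += weight
        conn.history.append((timestamp, conn.strength))

    def top_interests(self, n=3):
        ranked = sorted(
            self.connections.items(),
            key=lambda item: item[1].strength,
            reverse=True,
        )
        return [name for name, _ in ranked[:n]]
```

Each call to `record_event` stands in for a new event reaching the graph; the `history` list is what would let an application ask how a connection's strength has changed over time.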
Gravity Personalization delivers a rank-ordered set of content to each user based on their interests. We measure the semantic distance between a user's Interest Graph and all available content to provide a uniquely personalized experience. This approach avoids the cold start and scalability issues inherent in collaborative filtering and provides results more granular than taxonomy-based behavioral targeting technology. We also inject important content based on our virality analysis to avoid the information winnowing effect inherent in most personalization solutions.
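The source does not define "semantic distance", so the sketch below substitutes plain cosine distance between sparse interest-weight vectors as a stand-in. The function names and the dictionary representation (interest name to strength) are assumptions made for illustration, and virality-based injection is not modeled.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors given as {interest: weight} dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def rank_content(user_interests, content_interests):
    """Return content ids ordered from nearest to farthest from the user."""
    return sorted(
        content_interests,
        key=lambda content_id: cosine_distance(user_interests, content_interests[content_id]),
    )
```

Because the ranking depends only on the user's own interests and each item's interests, a brand-new article can be scored as soon as it has been analyzed, which is one way to read the cold-start claim above.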
Gravity Analytics provides website operators a window into the Interest Graphs of their websites, their users and the web as a whole. Through aggregation of users' Interest Graphs, Gravity Analytics surfaces current and historical interests of a site's audience. By comparing the aggregation of user Interest Graphs on a site with the aggregation of content Interest Graphs on a site, Gravity Analytics surfaces discrepancies between site-publishing behavior and user-consumption behavior. In addition, by aggregating content Interest Graphs across the social web, Gravity Analytics surfaces trending areas of interest and trending URLs across the entire web or constrained to specific high-level subject areas.
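The comparison between what a site publishes and what its audience consumes can be sketched as follows. This is a simplified illustration under stated assumptions: each Interest Graph is reduced to a dictionary of interest strengths, aggregation is a plain sum normalized to shares, and the function names are hypothetical.

```python
from collections import Counter


def aggregate(graphs):
    """Sum per-interest strengths across graphs and normalize to shares of the total."""
    total = Counter()
    for graph in graphs:
        total.update(graph)  # adds the values for matching keys
    overall = sum(total.values())
    if not overall:
        return {}
    return {interest: value / overall for interest, value in total.items()}


def discrepancies(user_graphs, content_graphs):
    """Per-interest gap: audience share minus published share.

    Positive values suggest interests the audience has that the site
    under-serves; negative values suggest the opposite.
    """
    audience = aggregate(user_graphs)
    published = aggregate(content_graphs)
    interests = set(audience) | set(published)
    return {i: audience.get(i, 0.0) - published.get(i, 0.0) for i in interests}
```

For instance, if half of the aggregated audience interest is in cycling but none of the site's content covers it, cycling shows a gap of 0.5.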
Gravity calculates Interest Graphs for each user, so that they can enjoy dynamic experiences tailored to their interests on sites using Gravity Personalization.
How it works: We calculate what's interesting on the web in real-time. As new content is published across the web or shared on the social web, Gravity crawls, indexes and semantically analyzes each web page, tweet, status update or post. Gravity organizes structured and unstructured content by interest, so that we can calculate trending topics or surface the most viral stories about any topic on demand. Finally, we deliver personalized recommendations to each user based on the topics they engage with most and what's exciting right now.