What is HPaste?
HPaste unlocks the rich functionality of HBase for a Scala audience. In so doing, it attempts to achieve the following goals:
What isn't HPaste?
You'll notice that HPaste has a lot of convenience classes for MapReduce jobs. This is to make it painless to use your domain objects and tables in the context of MR jobs. HPaste has no aspirations to replace tuple-based frameworks like Pig or Cascading (both of which we use for complex log parsing). HPaste is intended to hug the Hadoop MapReduce API very closely, building convenience functions where necessary, without abstracting too far away from the base concept.
The goal of HPaste's MapReduce support is to let you build rich functionality into your Table and Row objects, and to make it painless for those tables and rows to participate in MapReduce jobs. In HBase you often have a combination of OLTP-style operations (client gets data, client serves data) and OLAP-style operations (pivot one table around a particular piece of data, and output to another table). That is where HPaste comes in handy, because there is often an impedance mismatch in Pig and/or Cascading between HBase-friendly, binary-serialized objects and the tuple framework that makes those libraries so awesome to use for ad-hoc log-style data.
(It is a mini-goal of HPaste to integrate into Cascading's tuple framework.)
Where is HPaste?
HPaste is free and can be found on GitHub. Check out the Quickstart.
This project uses Maven. To use HPaste in your own Maven project, simply add it as a dependency:
HPaste was open sourced by Gravity.com in 2011 and is available from GitHub.
Many contributions, both philosophical and code, by...
HPaste is available for free and released by Gravity.com under the Apache 2.0 license. See the LICENSE file for all the details.
This project is actively developed and maintained. It is used in a large production codebase in high-throughput, memory-intensive scenarios, and has many months of bug fixes under its belt. Because a great deal of code already depends on it, there will not be many breaking changes to the API. Instead, we usually provide an upgraded API that sits next to the old one, then deprecate the old one.