Gravity Labs

HPaste

HPaste unlocks the rich functionality of HBase for a Scala audience.

Overview

What is HPaste?

HPaste unlocks the rich functionality of HBase for a Scala audience. In so doing, it attempts to achieve the following goals:

  • Provide a strong, clear syntax for querying and filtration
  • Perform as fast as possible while maintaining idiomatic Scala client code -- the abstractions should not show up in a profiler!
  • Re-articulate HBase's data structures rather than force them into an ORM-style atmosphere.
  • Provide a rich set of base classes for writing MapReduce jobs in Hadoop against HBase tables.
  • Maximize code re-use between general HBase client usage and operation from within a MapReduce job.
  • Use Scala's type system to its advantage--the compiler should verify the integrity of the schema.
  • Be a verbose DSL--minimize boilerplate code, but be human readable!
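
To make the "compiler should verify the integrity of the schema" goal concrete, here is a minimal sketch of the general idea in plain Scala. It is not HPaste's API: Column, Row and PageSchema are hypothetical names invented for this illustration, and the "table" is just an in-memory map. The point is only that when a column carries its value type, a mismatched read or write fails at compile time rather than at runtime.

```scala
// Hypothetical sketch: none of these names come from HPaste.
// A column carries its value type as a type parameter, so the compiler
// rejects reads and writes that disagree with the declared schema.
final case class Column[V](family: String, qualifier: String)

// A row stores cells untyped internally but only exposes typed access.
final class Row private (private val cells: Map[(String, String), Any]) {
  // Writing requires a value of the column's declared type.
  def put[V](col: Column[V], value: V): Row =
    new Row(cells + ((col.family, col.qualifier) -> value))

  // Reading returns the column's declared type. The cast is safe as long
  // as each (family, qualifier) pair is declared with a single value type,
  // because put is the only way to add a cell.
  def get[V](col: Column[V]): Option[V] =
    cells.get((col.family, col.qualifier)).map(_.asInstanceOf[V])
}

object Row {
  val empty: Row = new Row(Map.empty)
}

// The schema is declared once; every client shares these definitions.
object PageSchema {
  val title: Column[String] = Column("meta", "title")
  val views: Column[Long]   = Column("stats", "views")
}
```

With these definitions, Row.empty.put(PageSchema.views, 42L) compiles, while Row.empty.put(PageSchema.views, "many") is rejected by the compiler because views is declared as a Column[Long].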

What isn't HPaste?

You'll notice that HPaste has a lot of convenience classes for MapReduce jobs. This is to make it painless to use your domain objects and tables in the context of MR jobs. HPaste has no aspirations to replace tuple-based frameworks like Pig or Cascading (both of which we use for complex log parsing). HPaste is intended to hug the Hadoop MapReduce API very closely, building convenience functions where necessary, without abstracting too far away from the base concept.

The goal of HPaste's MapReduce support is to let you build rich functionality into your Table and Row objects, and to make it painless for those tables and rows to participate in MapReduce jobs. Oftentimes in HBase you have a combination of OLTP-style operations (client gets data, client serves data) and OLAP-style operations (pivot one table around a particular piece of data, and output to another table). That is where HPaste comes in handy, because in Pig and/or Cascading there is often an impedance mismatch between HBase-friendly, binary-serialized objects and the tuple framework that makes those libraries so awesome to use for ad-hoc log-style data.

(It is a mini-goal of HPaste to integrate into Cascading's tuple framework.)

Where is HPaste?

HPaste is free and can be found on GitHub

Check out the Quickstart

Get HPaste

Installation

This project uses Maven. To use HPaste in your own Maven project, add it as a dependency in your pom.xml.

Description

HPaste was open sourced by Gravity.com in 2011 and is available from GitHub.

Developer:

Contributors:

Many contributions, both philosophical and code, by...

Licensing

HPaste is available for free and released by Gravity.com under the Apache 2.0 license. See the LICENSE file for all the details.

Project Status

This project is actively developed and maintained. It is used in a large production codebase in high-throughput, memory-intensive scenarios, and has many months of bug fixes under its belt. Because a great deal of code already depends on it, there will not be many breaking changes to the API. Instead, we usually provide an upgraded API that sits next to the old API, then deprecate the old one.
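
As an illustration of that side-by-side upgrade pattern, here is a small sketch using Scala's standard @deprecated annotation. The Titles object, its methods, and the version string are all hypothetical and are not taken from HPaste; the sketch only shows how an old entry point can stay in place, delegate to its replacement, and warn callers at compile time.

```scala
// Hypothetical example: Titles, find and lookup are made-up names.
object Titles {
  // Stand-in for a real data source.
  private val store: Map[String, String] =
    Map("a" -> "Alpha", "b" -> "Beta")

  // Upgraded API: a missing key is reported as None instead of null.
  def find(key: String): Option[String] = store.get(key)

  // Old API, kept next to the new one so existing callers still compile.
  // It delegates to find, and the compiler warns wherever it is used.
  @deprecated("use find, which returns an Option", "hypothetical-2.0")
  def lookup(key: String): String = find(key).orNull
}
```

Existing code that calls Titles.lookup keeps working but gets a deprecation warning at compile time, which gives callers a migration window before the old method is eventually removed.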