Start building your big data application today!

Kiji is a framework for building big data applications that provides a Java API, command-line tools, and DSLs (domain-specific languages) for developers. Built on top of HBase, Kiji scales linearly to support the largest workloads, while also providing simple interfaces and an easy-to-manage environment for building big data applications. With Kiji, you can:

Start developing with HBase in under 15 minutes

Using our Bento Box distribution, getting your development environment running is as simple as:

bento start

This command starts up a local HBase server which can host a Kiji application. All of this is self-contained in a single directory, so you don’t need to go through a complicated system installation process. Bento also includes all of the Kiji client tools. You can download the latest Bento Box here.

Already have a Hadoop cluster with HBase? You can download the Kiji client binaries and add Kiji dependencies to your Maven project. Source code is also available through GitHub.

Easily import your existing data for online use

Kiji’s Bulk Importer command-line tool allows you to easily import CSV and JSON data into Kiji with simple configuration files.

This snippet, from our Music Tutorial, maps data from a JSON file (containing user_id, song_id, and play_time fields) into a Kiji table:

{
  name : "users",
  families : [ {
    name : "info",
    columns : [ {
      name : "track_plays",
      source : "song_id"
    } ]
  } ],
  entityIdSource : "user_id",
  overrideTimestampSource : "play_time",
  version : "import-1.0"
}

This data is then available for use in your Java application or for processing with MapReduce using Kiji’s Java API.
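
To make the mapping concrete, the sketch below applies the same rules to a single parsed JSON record in plain Java. It does not use the Kiji bulk importer: the ImportMappingSketch class, its Cell type, and the sample values are hypothetical and only illustrate which source field ends up as the row key, the column value, and the timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics the descriptor's field mapping in plain Java.
// None of these names come from the Kiji API.
final class ImportMappingSketch {

  /** One write destined for the "users" table (hypothetical type). */
  static final class Cell {
    final String entityId;
    final String column; // "family:qualifier"
    final long timestamp;
    final String value;

    Cell(String entityId, String column, long timestamp, String value) {
      this.entityId = entityId;
      this.column = column;
      this.timestamp = timestamp;
      this.value = value;
    }

    @Override
    public String toString() {
      return entityId + " " + column + " @" + timestamp + " = " + value;
    }
  }

  /** Applies the descriptor's rules to one parsed JSON record. */
  static Cell toCell(Map<String, String> record) {
    String entityId = record.get("user_id");                   // entityIdSource
    String value = record.get("song_id");                      // source of info:track_plays
    long timestamp = Long.parseLong(record.get("play_time"));  // overrideTimestampSource
    return new Cell(entityId, "info:track_plays", timestamp, value);
  }

  public static void main(String[] args) {
    // Sample record with made-up values.
    Map<String, String> record = new LinkedHashMap<>();
    record.put("user_id", "user-1");
    record.put("song_id", "song-42");
    record.put("play_time", "1325725200000");
    System.out.println(toCell(record));
  }
}
```

Each record therefore becomes one timestamped cell in the info:track_plays column of the row identified by user_id.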

Learn more about Bulk Importing in our Music Tutorial and our user guide. You can also author your own importers using our bulk importer API in KijiMR.

Power your Java application with scalable, real-time storage

Java developers will love Kiji for the rich API it provides to access data from HBase.

Applications can request fine-grained subsets of data through high-level data request objects:

// Specify the row and column data to read.
// The column names are specified as constants in the Fields.java class.
final EntityId entityId = table.getEntityId(mFirst + "," + mLast);
final KijiDataRequestBuilder reqBuilder = KijiDataRequest.builder();
reqBuilder.newColumnsDef()
     .add(Fields.INFO_FAMILY, Fields.FIRST_NAME)
     .add(Fields.INFO_FAMILY, Fields.LAST_NAME)
     .add(Fields.INFO_FAMILY, Fields.EMAIL)
     .add(Fields.INFO_FAMILY, Fields.TELEPHONE)
     .add(Fields.INFO_FAMILY, Fields.ADDRESS);
final KijiDataRequest dataRequest = reqBuilder.build();
final KijiRowData rowData = reader.get(entityId, dataRequest);

Individual cells can then be accessed directly:

rowData.getMostRecentValue(Fields.INFO_FAMILY, Fields.FIRST_NAME)
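
The name getMostRecentValue reflects that a Kiji cell, like the HBase cell beneath it, can hold several timestamped versions. The plain-Java sketch below models that idea with a NavigableMap sorted newest-first. VersionedCellSketch and its methods are hypothetical stand-ins for illustration, not Kiji classes.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative model of one versioned cell; not part of the Kiji API.
final class VersionedCellSketch {
  // Versions keyed by timestamp, ordered newest first.
  private final NavigableMap<Long, String> versions =
      new TreeMap<>(Collections.<Long>reverseOrder());

  void put(long timestamp, String value) {
    versions.put(timestamp, value);
  }

  /** Returns the value with the highest timestamp, or null if the cell is empty. */
  String getMostRecentValue() {
    return versions.isEmpty() ? null : versions.firstEntry().getValue();
  }

  /** Returns a read-only view of all versions, newest first. */
  NavigableMap<Long, String> getValues() {
    return Collections.unmodifiableNavigableMap(versions);
  }

  public static void main(String[] args) {
    VersionedCellSketch email = new VersionedCellSketch();
    email.put(1000L, "old@example.com");
    email.put(2000L, "new@example.com");
    System.out.println(email.getMostRecentValue()); // prints new@example.com
  }
}
```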

You can learn more about building an application using the Java API in the Phonebook Tutorial.

Construct complex, evolvable data schemas

Kiji Schema Shell provides a familiar DDL (Data Definition Language) for defining column names, field types, and storage parameters for storing data in HBase using Kiji. Our simple, declarative language makes it easy to quickly create a schema that ensures you can find data when you need it.

Here’s an example:

CREATE TABLE users 
    WITH DESCRIPTION 'A table for user names and email addresses'
    ROW KEY FORMAT HASHED
    WITH LOCALITY GROUP default WITH DESCRIPTION 'main storage' (
      MAXVERSIONS = INFINITY,
      TTL = FOREVER,
      INMEMORY = false,
      COMPRESSED WITH GZIP,
      FAMILY info WITH DESCRIPTION 'basic information' (
        name "string" WITH DESCRIPTION 'the user\'s name',
        email "string",
        { "type" : "record", "name" : "foo", "fields" : [
          { "name" : "x",  "type" : "int" },
          { "name" : "y",  "type" : "string" }
      ]}));

Don’t worry if you’re not sure what your final schema will look like: Kiji supports updating your schema without downtime and even using versioned schemas from different applications. Kiji encourages using a non-relational, entity-centric data model (similar to object databases), which you can learn more about here.

Learn more about using the Kiji Schema DDL for table definition here.

Process web-scale data with MapReduce

KijiMR provides a Java API for performing MapReduce operations over Kiji tables. Kiji rows can be processed individually, using a gather method:

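public void gather(KijiRowData row, GathererContext context)
    throws IOException {
  // All timestamped track plays recorded for this user.
  NavigableMap<Long, CharSequence> trackPlays = row.getValues("info", "track_plays");
  // Emit a count of one for every play of every track.
  for (CharSequence trackId : trackPlays.values()) {
    context.write(new Text(trackId.toString()), ONE);
  }
}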

KijiMR provides stock reducers for standard operations, which can be used with your gatherers. You can learn more about using MapReduce with Kiji by running through the Music Tutorial, or you can read our user guide.
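
To show how the gatherer's output could feed a summing reducer, the sketch below runs the same logic in memory with plain Java collections. It is a simplified stand-in, not KijiMR code: PlayCountSketch, gather, and sumReduce are hypothetical names, the choice of a summing reducer is an assumption, and a real job would run distributed over a Kiji table.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory illustration of gather-then-sum; not part of KijiMR.
final class PlayCountSketch {

  /** Gather step: emit (trackId, 1) for every play in one user's row. */
  static void gather(NavigableMap<Long, String> trackPlays,
                     List<Map.Entry<String, Integer>> emitted) {
    for (String trackId : trackPlays.values()) {
      emitted.add(new AbstractMap.SimpleEntry<>(trackId, 1));
    }
  }

  /** Reduce step: sum the emitted counts per track. */
  static Map<String, Integer> sumReduce(List<Map.Entry<String, Integer>> emitted) {
    Map<String, Integer> totals = new TreeMap<>();
    for (Map.Entry<String, Integer> pair : emitted) {
      totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    // Two made-up user rows: timestamp -> track id.
    NavigableMap<Long, String> userA = new TreeMap<>();
    userA.put(1L, "song-1");
    userA.put(2L, "song-2");
    NavigableMap<Long, String> userB = new TreeMap<>();
    userB.put(3L, "song-1");

    List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
    gather(userA, emitted);
    gather(userB, emitted);
    System.out.println(sumReduce(emitted)); // prints {song-1=2, song-2=1}
  }
}
```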

In addition to these capabilities, the Kiji framework provides many more tools for managing data in HBase, including running MapReduce jobs, authoring stored procedures, and training predictive models. Take a look at our tutorials to learn more.