Kiji Project

Modular, open-source framework for collecting, analyzing, and serving entity data in real time.  Apache 2.0 Licensed.



About the Kiji Project

What is Kiji?

The Kiji Project is an Apache 2.0-licensed, modular, open-source framework built on Apache Hadoop and HBase that lets you collect, analyze, and serve data in real time. Kiji supports batch model training and real-time model scoring, so the user experience can be adapted with each interaction. With Kiji, developers can create a flexible, customer-centric schema that enables a 360-degree view of each customer. Data is stored in a compressed, binary Avro format, which lets the application support complex data types. Kiji handles all aspects of serialization and deserialization, maintains schema metadata to ensure backward compatibility as an application's schema evolves, and captures real-time application interactions.
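To make the backward-compatibility point concrete, here is a minimal sketch of the idea behind Avro-style schema resolution: a record written under an older schema is read through a newer one, with new fields filled from defaults and dropped fields ignored. This is a simplified illustration in plain Python, not the Avro library or the Kiji API; the `resolve` helper, the `MISSING` sentinel, and the field names are all hypothetical.

```python
# Simplified illustration of Avro-style schema resolution.
# Not the Avro library or the Kiji API; all names here are hypothetical.

MISSING = object()  # marks a reader field that declares no default


def resolve(record, reader_fields):
    """Project a record written under an older schema onto a reader schema.

    reader_fields maps field name -> default value, or MISSING when the
    field has no default.
    """
    resolved = {}
    for name, default in reader_fields.items():
        if name in record:
            # Field exists in the written data: keep the stored value.
            resolved[name] = record[name]
        elif default is not MISSING:
            # Field was added later: fall back to the reader's default.
            resolved[name] = default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields present in the record but absent from the reader are ignored.
    return resolved


# A record written under a hypothetical v1 schema.
old_record = {"name": "Ada", "legacy_flag": True}

# A hypothetical v2 reader schema: drops legacy_flag, adds email with a default.
reader_v2 = {"name": MISSING, "email": None}

new_view = resolve(old_record, reader_v2)
```

Here `new_view` contains `name` from the stored record and `email` from the default, while `legacy_flag` is dropped. Adding a field without a default would make old records unreadable, which is why tracking schema metadata alongside the data matters.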

Why Kiji?

Many organizations collect and store data in distributed file systems, such as HDFS, and in key-value stores, such as HBase and Cassandra, to better serve individual customers. However, these tools are low-level, difficult to use, and provide no framework for integrating with customer-facing applications. Kiji is the middleware needed to ingest detailed data, stream real-time data, build predictive models, and deploy those models on the fly.

Kiji’s unified interface makes it easy to build Big Data Applications. Example use cases include recommendation systems, fraud detection, micro-segmentation, anomaly detection and more.

Kiji Components

The Kiji Project is split into separate components to support a wide range of uses and to encourage clean separation of functionality. The Bento Box contains all Kiji modules assembled in a self-contained download. Each module can also be downloaded individually from GitHub.

Kiji Bento Box

  • KijiSchema: simplifies real-time storage and retrieval of diverse data, from primitive types to objects, time series, and event streams. KijiSchema handles the challenges of serialization, schema design and evolution, and metadata management that are common in NoSQL storage solutions.

    KijiSchema DDL Shell: provides a Data Definition Language that allows for the creation, inspection, and modification of schemas for KijiSchema.

  • KijiMapReduce: provides a powerful paradigm for applying MapReduce to both batch and real-time workloads. KijiMapReduce introduces producers, which perform record-wise analytics, and gatherers, which build predictive models by analyzing aggregate behaviors. It also includes a library of examples and reusable MapReduce job implementations.

    Kiji Hive: provides HiveQL access to Kiji data through a familiar SQL shell.

    KijiExpress: provides a Scala interface for analyzing Kiji data via Scalding.

  • Kiji Model Repository: a library of machine learning tools built on top of KijiExpress.

    KijiScoring: provides real-time scoring of predictive models within your application.

  • KijiREST: provides an HTTP REST API for front-end developers to access Kiji data and to trigger model scoring.
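
The distinction between producers and gatherers in the KijiMapReduce entry above can be sketched in a few lines. This is a conceptual illustration in plain Python, not KijiMapReduce code: the in-memory `table`, the function names, and the assumption that a producer's result is stored back on the row it was computed from are all hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory "table": entity ID -> row of column values.
table = {
    "user1": {"purchases": [12.0, 30.0]},
    "user2": {"purchases": [5.0]},
    "user3": {"purchases": []},
}


def produce_total_spend(row):
    """Producer-style step: derive a value from a single row only."""
    return {"total_spend": sum(row["purchases"])}


def gather_purchase_counts(row):
    """Gatherer-style step: emit (key, value) pairs from a single row."""
    yield ("purchase_count", len(row["purchases"]))


def run_producer(table, producer):
    # Record-wise: each row is processed independently and (by assumption)
    # the derived columns are stored back on that same row.
    for row in table.values():
        row.update(producer(row))


def run_gatherer(table, gatherer):
    # Aggregate: pairs emitted from all rows are combined per key.
    totals = defaultdict(int)
    for row in table.values():
        for key, value in gatherer(row):
            totals[key] += value
    return dict(totals)


run_producer(table, produce_total_spend)
summary = run_gatherer(table, gather_purchase_counts)
```

After running, each row carries its own `total_spend`, while `summary` holds a single aggregate across all rows. The producer never looks beyond one row, which is what makes that kind of computation a natural fit for per-entity work; the gatherer's output only becomes meaningful once the emitted pairs are combined.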
