Filed under KijiSchema, Releases.

A new version of KijiSchema (1.0.0-rc4) has been released!

This version of KijiSchema includes a number of powerful new features. To incorporate them, we had to make some changes to the API, so you will need to update your code when you move to this version. We apologize for the inconvenience, but we believe these changes make the API dramatically more usable and maintainable going forward. We do not anticipate making such radical changes in the future, and we are moving rapidly toward a stable 1.0.0 release.

Major new features

We added several new features to KijiSchema in this version:

  • Support for composite row keys allows you to create row keys (entity IDs) from a tuple of string and numeric values, creating hierarchical data sets.
  • Builder-pattern constructors for many key classes (KijiURI, KijiDataRequest).
  • Improved paging capability through the KijiPager API.
  • Stricter validation of layout version numbers, which lets you specify particular data format compatibility levels.
  • Automatic upgrade notification when a new BentoBox version is available.
  • Bug fixes throughout.

We also made several API changes that will require changes to your code:

  • KijiConfiguration has been removed: you must use a KijiURI to specify which Kiji instance you connect to.
  • KijiDataRequest is immutable: you must use a KijiDataRequestBuilder (call KijiDataRequest.builder() to get one) or use KijiDataRequest.create().
  • KijiURIs are immutable: you must use a KijiURIBuilder (call KijiURI.newBuilder() to get one).
  • KijiAdmin has been removed, and its methods have been added to the Kiji interface.
  • Some HBase-specific classes have been moved to org.kiji.schema.hbase.
  • Several classes that should not have been in the public API were removed or made private.

Composite Row Keys

The biggest new feature in KijiSchema 1.0.0-rc4 is support for composite row keys. EntityIds can now be composed of a tuple of string, int, and long values. By default, composite row keys are hash-prefixed (using the hash of the first element of the tuple). This allows you to specify hierarchical row keys like [ “entertainment”, “xbox360” ] or [ “CA”, 94110 ] and efficiently perform sequential scans over elements that have the same parent in the hierarchy.

Support for composite row keys is included at the JSON layout level; DDL support for this functionality will be provided in the next release.
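To make the hash-prefix idea concrete, here is a minimal, self-contained Java sketch. It is not KijiSchema's actual row-key encoding: the class name CompositeKeySketch, the two-byte MD5 prefix, the null-terminated strings, and the sign-flipped fixed-width integers are all assumptions chosen for illustration. What it does show is why hashing only the first component keeps children of the same parent adjacent: every key whose first component is "CA" begins with the bytes of the key for ["CA"] alone.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/**
 * Hypothetical hash-prefixed composite row key, for illustration only.
 * This is NOT KijiSchema's actual encoding; the prefix size and the
 * per-type encodings are assumptions.
 */
final class CompositeKeySketch {
    /** Assumed number of hash bytes placed in front of every key. */
    static final int HASH_PREFIX_BYTES = 2;

    /** Encodes hash(first component) followed by every component in order. */
    static byte[] encode(List<?> components) {
        if (components.isEmpty()) {
            throw new IllegalArgumentException("a composite key needs at least one component");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Only the first component is hashed, so all keys that share it get the
        // same prefix and therefore sort next to each other.
        byte[] hash = md5(componentBytes(components.get(0)));
        out.write(hash, 0, HASH_PREFIX_BYTES);
        for (Object component : components) {
            byte[] bytes = componentBytes(component);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    /**
     * Strings become null-terminated UTF-8; Integers and Longs become fixed-width
     * big-endian values with the sign bit flipped, so unsigned byte order matches
     * numeric order.
     */
    static byte[] componentBytes(Object component) {
        if (component instanceof String) {
            byte[] utf8 = ((String) component).getBytes(StandardCharsets.UTF_8);
            byte[] bytes = new byte[utf8.length + 1]; // last byte stays 0x00
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            return bytes;
        } else if (component instanceof Integer) {
            return ByteBuffer.allocate(4).putInt((Integer) component ^ Integer.MIN_VALUE).array();
        } else if (component instanceof Long) {
            return ByteBuffer.allocate(8).putLong((Long) component ^ Long.MIN_VALUE).array();
        }
        throw new IllegalArgumentException("unsupported component: " + component);
    }

    /** True if key starts with prefix; a scan over one parent continues while this holds. */
    static boolean hasPrefix(byte[] key, byte[] prefix) {
        if (prefix.length > key.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must provide MD5", e);
        }
    }
}
```

Because a child key begins with its parent's encoding, and keys sharing a byte prefix are contiguous in sorted order, a scan that starts at encode(["CA"]) and stops at the first key lacking that prefix visits exactly the rows under "CA", while the hash still spreads different first components across the cluster.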

The command-line tools have been updated to support the new entity ID forms:

  • --entity-id=hbase=hex:deadc0de as hexadecimal HBase keys
  • --entity-id=hbase=[utf8:]'hbase\x0akey\x00' as UTF-8 encoded HBase keys
  • --entity-id=[kiji=]'kiji-key' for raw, hashed or hash-prefixed Kiji keys
  • --entity-id="['component1', 2, 'comp3']" for Kiji formatted row keys.
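The four --entity-id forms above can be told apart by their prefixes alone. The following Java sketch is a hypothetical helper, not the Kiji command-line parser: the class EntityIdFormSketch, its Form enum, and the decodeHex method are illustrative names, and it does not handle quoting or escape sequences such as \x0a. It only classifies a value and decodes the hbase=hex: form into raw row-key bytes.

```java
/**
 * Hypothetical helper that sorts an --entity-id value into the four forms
 * listed above. Not the Kiji CLI parser: quoting and \x escapes are not handled.
 */
final class EntityIdFormSketch {
    enum Form { HBASE_HEX, HBASE_UTF8, KIJI_COMPOSITE, KIJI }

    static final String HEX_PREFIX = "hbase=hex:";

    static Form classify(String value) {
        if (value.startsWith(HEX_PREFIX)) {
            return Form.HBASE_HEX;
        }
        if (value.startsWith("hbase=")) {
            return Form.HBASE_UTF8;     // the "utf8:" marker is optional
        }
        if (value.startsWith("[")) {
            return Form.KIJI_COMPOSITE; // e.g. ['component1', 2, 'comp3']
        }
        return Form.KIJI;               // the "kiji=" marker is optional
    }

    /** Decodes the payload of an hbase=hex: value into raw HBase row-key bytes. */
    static byte[] decodeHex(String value) {
        if (!value.startsWith(HEX_PREFIX)) {
            throw new IllegalArgumentException("not an hbase=hex: entity ID: " + value);
        }
        String hex = value.substring(HEX_PREFIX.length());
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hex);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("not a hex digit in: " + hex);
            }
            bytes[i] = (byte) ((high << 4) | low);
        }
        return bytes;
    }
}
```

The order of the checks matters: hbase=hex: must be tested before the more general hbase= prefix, and anything that matches none of the explicit markers falls through to a plain Kiji key.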

Kiji URIs

All Kiji addresses have been updated to use the KijiURI interface. A KijiURI identifies an HBase cluster (technically, its ZooKeeper quorum) and a Kiji instance name, and can optionally specify a table name. For example, kiji://zkhost:2181/default/sometable refers to the table named “sometable” in the default Kiji instance on the HBase cluster whose ZooKeeper quorum is the single ZooKeeper node at zkhost:2181.

Rather than specifying a ZooKeeper quorum explicitly, you can use the ZooKeeper nodes listed in hbase-site.xml by specifying a “quorum” of .env, like this: kiji://.env/default. This is the default behavior of the KijiURI builder unless you explicitly override it with a different ZooKeeper address.

You can create a KijiURI by calling KijiURI.newBuilder(). For convenience, one overload of newBuilder() takes a URI string to parse. KijiURI objects themselves are now immutable.

Kiji URIs are used by all command-line tools. The kiji ls tool no longer uses a --table argument; instead, you specify a URI containing both a Kiji instance and a table name, like so: kiji ls --kiji=kiji://.env/default/sometable.

The KijiConfiguration class has been removed entirely. To connect to a Kiji instance, you now pass a KijiURI, like so:

Kiji.Factory.open(KijiURI.newBuilder().withInstanceName("default").build());
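The URI layout described in this section can be modeled with a small immutable value class. The Java sketch below is a simplified, hypothetical stand-in for KijiURI (the class KijiUriSketch and its parse method are not part of KijiSchema). It assumes the form kiji://QUORUM/INSTANCE[/TABLE], treats the quorum as a single opaque host:port string, and records .env as a flag meaning "read the quorum from hbase-site.xml" without actually reading that file.

```java
/**
 * Hypothetical, simplified model of a kiji:// URI of the form
 * kiji://QUORUM/INSTANCE[/TABLE]. Not the real KijiURI class.
 */
final class KijiUriSketch {
    final String quorum;          // e.g. "zkhost:2181"; null when ".env" is used
    final boolean quorumFromEnv;  // true: take the quorum from hbase-site.xml
    final String instance;        // null if absent
    final String table;           // null if absent

    private KijiUriSketch(String quorum, boolean quorumFromEnv, String instance, String table) {
        this.quorum = quorum;
        this.quorumFromEnv = quorumFromEnv;
        this.instance = instance;
        this.table = table;
    }

    static KijiUriSketch parse(String uri) {
        final String scheme = "kiji://";
        if (!uri.startsWith(scheme)) {
            throw new IllegalArgumentException("not a kiji:// URI: " + uri);
        }
        String[] parts = uri.substring(scheme.length()).split("/");
        if (parts.length == 0 || parts[0].isEmpty() || parts.length > 3) {
            throw new IllegalArgumentException("expected kiji://quorum/instance[/table]: " + uri);
        }
        boolean fromEnv = parts[0].equals(".env");
        return new KijiUriSketch(
            fromEnv ? null : parts[0],
            fromEnv,
            parts.length > 1 ? parts[1] : null,
            parts.length > 2 ? parts[2] : null);
    }
}
```

All fields are final and there are no setters, which mirrors the immutability of the real KijiURI: to point at a different table you build a new object rather than modify an existing one.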

Updated KijiDataRequest API

Another major change in this version is a revamp of the KijiDataRequest API. To avoid problems with shared mutable state, KijiDataRequest objects are now immutable. You can no longer create a KijiDataRequest and then add columns to it, or modify the properties of existing KijiDataRequest.Column objects; instead, you must create KijiDataRequests through a KijiDataRequestBuilder. This builder also validates data requests more thoroughly and ensures that you do not submit an ambiguous request.

For example, to request the 3 most recent versions of the cells in column “bar” of family “foo” within the time range [123, 456]:

KijiDataRequestBuilder builder = KijiDataRequest.builder()
    .withTimeRange(123L, 456L);
builder.newColumnsDef().withMaxVersions(3).add("foo", "bar");
KijiDataRequest request = builder.build();

A KijiDataRequestBuilder.ColumnsDef object can be used to attach multiple columns with the same properties to the data request by calling add() multiple times.

An equivalent way to build the same data request is as follows:

KijiDataRequest dataRequest = KijiDataRequest.builder()
    .withTimeRange(123L, 456L)
    .addColumns(KijiDataRequestBuilder.ColumnsDef.create()
        .withMaxVersions(3).add("foo", "bar"))
    .build();

This form can be used to construct arbitrarily complex data requests in a single expression.

For convenience, you can also build KijiDataRequests for a single cell using the KijiDataRequest.create() method:

KijiDataRequest dataRequest = KijiDataRequest.create("info", "foo");
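The immutable-request-plus-builder design can be illustrated in isolation. The Java sketch below is hypothetical (DataRequestSketch, its Builder, and its ColumnsDef are illustrative stand-ins, not KijiSchema classes) and makes one labeled assumption about what "ambiguous" means: requesting the same family:qualifier twice. It shows a ColumnsDef applying one set of properties to several columns, a build() step that validates and freezes the result, and a create() shortcut for a single column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an immutable data request built through a builder. */
final class DataRequestSketch {
    /** One requested column together with the properties it was added with. */
    static final class Column {
        final String family;
        final String qualifier;
        final int maxVersions;

        Column(String family, String qualifier, int maxVersions) {
            this.family = family;
            this.qualifier = qualifier;
            this.maxVersions = maxVersions;
        }
    }

    final long minTimestamp;
    final long maxTimestamp;
    final List<Column> columns; // unmodifiable: the request cannot change after build()

    private DataRequestSketch(long minTimestamp, long maxTimestamp, List<Column> columns) {
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
        this.columns = Collections.unmodifiableList(columns);
    }

    static Builder builder() {
        return new Builder();
    }

    /** Shortcut for a request of a single column with default properties. */
    static DataRequestSketch create(String family, String qualifier) {
        Builder b = builder();
        b.newColumnsDef().add(family, qualifier);
        return b.build();
    }

    /** A group of columns that all share the same properties. */
    static final class ColumnsDef {
        private int maxVersions = 1;
        private final List<String[]> names = new ArrayList<>();

        ColumnsDef withMaxVersions(int maxVersions) {
            if (maxVersions < 1) throw new IllegalArgumentException("maxVersions must be >= 1");
            this.maxVersions = maxVersions;
            return this;
        }

        ColumnsDef add(String family, String qualifier) {
            names.add(new String[] {family, qualifier});
            return this;
        }
    }

    /** Mutable builder; only build() produces a frozen request. */
    static final class Builder {
        private long minTs = 0L;
        private long maxTs = Long.MAX_VALUE;
        private final List<ColumnsDef> defs = new ArrayList<>();

        Builder withTimeRange(long minTs, long maxTs) {
            if (minTs > maxTs) throw new IllegalArgumentException("empty time range");
            this.minTs = minTs;
            this.maxTs = maxTs;
            return this;
        }

        ColumnsDef newColumnsDef() {
            ColumnsDef def = new ColumnsDef();
            defs.add(def);
            return def;
        }

        DataRequestSketch build() {
            // Assumed validation rule: the same column may not be requested twice.
            Map<String, Column> byName = new LinkedHashMap<>();
            for (ColumnsDef def : defs) {
                for (String[] fq : def.names) {
                    String key = fq[0] + ":" + fq[1];
                    if (byName.containsKey(key)) {
                        throw new IllegalArgumentException("column requested twice: " + key);
                    }
                    byName.put(key, new Column(fq[0], fq[1], def.maxVersions));
                }
            }
            return new DataRequestSketch(minTs, maxTs, new ArrayList<>(byName.values()));
        }
    }
}
```

Because maxVersions is stored on the ColumnsDef and read only at build time, it applies to every column in that group, whether add() was called before or after withMaxVersions().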

The KijiPager API

The paging API has been rebuilt in this version. The KijiColumnPager class has been removed in favor of a new API, KijiPager. A KijiPager lets you iterate over a column's timestamped values one page at a time, with each page holding multiple values. This is useful if a row may have dozens or even hundreds of timestamped values in a single column and you can't (or don't want to) retrieve them all at once.

The KijiPager API has drastically simplified semantics: a KijiPager iterates over the values in a single column. If a column has paging enabled (a nonzero page size in its KijiDataRequest), you can create a pager by calling KijiRowData.getPager("family", "qualifier"). KijiPager implements the java.util.Iterator interface and behaves like any other iterator.
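The pager contract described above amounts to an Iterator whose elements are pages. The Java sketch below is hypothetical (PagerSketch is not the real KijiPager) and, unlike a real pager, it slices an in-memory list instead of fetching each page from HBase on demand; it is meant only to show the iteration contract.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical pager: returns a column's values one page at a time.
 * The in-memory list stands in for cells that a real pager would fetch lazily.
 */
final class PagerSketch<T> implements Iterator<List<T>> {
    private final List<T> cells;
    private final int pageSize;
    private int nextIndex = 0;

    PagerSketch(List<T> cells, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("paging requires a nonzero page size");
        }
        this.cells = cells;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        return nextIndex < cells.size();
    }

    /** Returns the next page; the last page may be shorter than pageSize. */
    @Override
    public List<T> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int end = Math.min(nextIndex + pageSize, cells.size());
        List<T> page = Collections.unmodifiableList(new ArrayList<>(cells.subList(nextIndex, end)));
        nextIndex = end;
        return page;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("pages are read-only");
    }
}
```

A caller loops with hasNext() and next() as with any other iterator, processing one page before asking for the next.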

Stricter layout version validation

The addition of composite row keys introduces a backwards-compatible change to the table layout format. The JSON layout format includes a version attribute that specifies which version of the layout syntax to use. The original version string for this attribute was kiji-1.0. This deprecated version string will be supported indefinitely, but we do not want to use the name “kiji” to identify the layout format going forward, so this format is now also referred to as layout-1.0.

To use composite row keys, you must specify the new layout format version, like so:

"version": "layout-1.1"

Previously, the version string was not checked. Starting with 1.0.0-rc4, only the layout-1.0 and layout-1.1 version strings (plus the deprecated kiji-1.0 alias for layout-1.0) are accepted. New layouts should use layout-1.0 or layout-1.1 rather than the deprecated kiji-1.0 string. If you declare a requirement on layout-1.0, its semantics will be maintained compatibly going forward; to use the latest features, specify layout-1.1. We will announce future layout version numbers as they are associated with additional features or new layout interpretation semantics. We are committed to maintaining backward compatibility with existing data formats: all KijiSchema 1.x versions will support all previous layout formats.
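The acceptance rules described above can be summarized as a small check. This Java sketch is hypothetical (LayoutVersionSketch is not KijiSchema code). It assumes that the deprecated kiji-1.0 string is treated as an alias for layout-1.0, as described above, and that composite row keys are the only feature gated on layout-1.1.

```java
/** Hypothetical layout-version check; not KijiSchema's implementation. */
final class LayoutVersionSketch {
    /** Returns the minor layout version (0 or 1) for an accepted version string. */
    static int minorVersion(String version) {
        switch (version) {
            case "kiji-1.0":   // deprecated alias, treated as layout-1.0
            case "layout-1.0":
                return 0;
            case "layout-1.1":
                return 1;
            default:
                throw new IllegalArgumentException("unsupported layout version: " + version);
        }
    }

    /** Rejects unknown versions, and composite row keys on layouts older than 1.1. */
    static void validate(String version, boolean usesCompositeRowKeys) {
        int minor = minorVersion(version);
        if (usesCompositeRowKeys && minor < 1) {
            throw new IllegalArgumentException(
                "composite row keys require layout-1.1, found " + version);
        }
    }
}
```

Mapping every accepted string to a single numeric level keeps feature checks simple: a later layout version would add one case to the switch, and features compare against the level rather than against strings.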

Automatic Upgrade Notifications

Starting with BentoBox 1.0.0-rc4, we have included a background task that checks for updated versions of the SDK. When you run the bin/kiji script, you will be advised if a new version of the BentoBox is available. It will display such a message at most once a day.
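The "at most once a day" rule is a simple rate limit on when the message is shown. The Java sketch below is hypothetical: UpdateNoticeSketch is not part of the bin/kiji script, and it assumes the time of the last notice is persisted somewhere between runs (not shown) and passed in as lastShown.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch of showing an upgrade notice at most once a day.
 * Assumes the caller persists lastShown between runs (not shown here).
 */
final class UpdateNoticeSketch {
    static final Duration MIN_INTERVAL = Duration.ofDays(1);

    /** Returns true if the notice may be shown now; lastShown is null if it never was. */
    static boolean shouldShow(Instant lastShown, Instant now) {
        if (lastShown == null) {
            return true;
        }
        return Duration.between(lastShown, now).compareTo(MIN_INTERVAL) >= 0;
    }
}
```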

Hidden APIs

Several components that were not intended to be part of the public API have been hidden (or had their constructors hidden) or removed entirely. You should not have depended on these APIs to begin with, and client code can no longer access them. The affected classes, along with a few related changes, are:

  • KijiConfiguration (use KijiURI)
  • KijiAdmin (use Kiji)
  • KijiTableLayout’s constructor (use KijiTableLayout.newLayout())
  • KijiInstaller is no longer static (use KijiInstaller.get())
  • KijiURIException is no longer a checked exception
  • CounterManager
  • NumberParser
  • KijiDataBuffer
  • TableLayoutSerializer
  • OperatorRowFilter

In summary…

We believe this release is a major upgrade to KijiSchema's functionality and lays the groundwork for many new capabilities. Check back for another release in a few weeks.

Ready to try Kiji? Go download the BentoBox today!
