Tabled

From Hail Cloud Computing Wiki

tabled - distributed key/value lookup table service

Overview

A key storage service of Project Hail, tabled provides an infinitely scalable, lexicographically sorted key/value lookup table. Keys cannot exceed 1024 bytes; values can be any size, including several gigabytes or more.
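As an illustration of these constraints, the following is a minimal in-memory sketch of a lexicographically sorted table with the 1024-byte key limit. It is not tabled code: the names MAX_KEY_BYTES, SortedTable and list_prefix are hypothetical, and comparing keys as raw UTF-8 bytes is an assumption made for the example.

```python
import bisect

MAX_KEY_BYTES = 1024  # key size limit stated in the overview


class SortedTable:
    """Toy in-memory model of a sorted key/value table (hypothetical, not tabled's code)."""

    def __init__(self):
        self._keys = []    # keys as bytes, kept in sorted order
        self._values = {}  # key bytes -> value bytes

    def put(self, key, value):
        raw = key.encode("utf-8")
        if len(raw) > MAX_KEY_BYTES:
            raise ValueError("key exceeds %d bytes" % MAX_KEY_BYTES)
        if raw not in self._values:
            bisect.insort(self._keys, raw)  # insert while preserving sort order
        self._values[raw] = value

    def get(self, key):
        return self._values[key.encode("utf-8")]

    def list_prefix(self, prefix):
        """Return all keys that start with prefix, in sorted order."""
        raw = prefix.encode("utf-8")
        start = bisect.bisect_left(self._keys, raw)
        found = []
        for k in self._keys[start:]:
            if not k.startswith(raw):
                break  # sorted order means no later key can match
            found.append(k.decode("utf-8"))
        return found
```

Because keys are kept sorted, a prefix listing is a single contiguous scan, which is what makes S3-style bucket listings cheap on a sorted table.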

tabled's user interface is an HTTP REST API, intended to be compatible with existing Amazon S3 clients.
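For readers unfamiliar with how S3 clients authenticate REST requests, here is a hedged sketch of the classic S3 request signature (HMAC-SHA1 over a canonical string, sent in an "Authorization: AWS key:signature" header). It assumes tabled accepts this scheme unchanged, ignores x-amz-* headers for brevity, and uses hypothetical function names (sign_request, build_headers); it is not taken from the tabled source.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_request(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Return the base64 HMAC-SHA1 signature for a request without x-amz-* headers."""
    # Canonical string: verb, Content-MD5, Content-Type, Date, then the resource path.
    string_to_sign = "\n".join([verb, content_md5, content_type, date]) + "\n" + resource
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def build_headers(access_key, secret_key, verb, resource, date=None):
    """Build the Date and Authorization headers for a signed request."""
    date = date or formatdate(usegmt=True)  # RFC 1123 date in GMT
    signature = sign_request(secret_key, verb, resource, date)
    return {"Date": date, "Authorization": "AWS %s:%s" % (access_key, signature)}
```

A client would attach these headers to, for example, a GET of /bucket/key sent to the tabled host. In practice an existing S3 client library pointed at that host should handle this step itself.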

tabled uses CLD and Chunkd to provide cloud scalability and high availability.

Status

Beta. Data and metadata are successfully replicated, and recovery occurs after failure. Recovery is more time-consuming, and less immediate, than we would like.

Resources

Download releases here.

Developers: browse the git repo, or check out from git://git.kernel.org/pub/scm/daemon/distsrv/tabled.git

Open projects

A wealth of projects large and small awaits interested contributors. Programmers will need to learn git, participate on the hail-devel mailing list, check out the source code, build and set up the project. Contributions from non-programmers are welcome as well -- documentation, feedback, and in general using the software.

Here are some suggestions for projects:

  • document setup and system administration procedures
  • scaling and testing
  • working on the single-endpoint problem, whereby -- due to our use of Berkeley DB -- a single tabled server is always master, and therefore is the lone entity able to write to the database

S3 API compliance

The following notes detail what tabled lacks for full S3 API compliance. Unless otherwise noted (as with the SOAP API), we would like to support these missing elements of the S3 API. As the short length of the list suggests, tabled's API support is quite complete and usable.

Outstanding API compliance issues (widely used 2006 S3 API):

  • Range HTTP header (partial object retrieval)
  • ACL support is limited. We support certain ACL grants (the canned access policies), but not the full suite.
  • Server access logging, and associated API
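To give a sense of what supporting the Range header would involve, the sketch below parses a single-range "bytes=first-last" header value against an object of known size. It is only an illustration: parse_byte_range is a hypothetical helper, multi-range requests are deliberately rejected, and nothing here reflects how tabled would actually implement the feature.

```python
def parse_byte_range(header, size):
    """Parse a single 'bytes=first-last' Range value for an object of `size` bytes.

    Returns an inclusive (start, end) tuple, or None if the value is
    malformed, multi-range, or cannot be satisfied.
    """
    if not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):]
    if "," in spec or "-" not in spec:
        return None  # multi-range requests are out of scope for this sketch
    first, last = spec.split("-", 1)
    try:
        if first == "":
            # Suffix form "bytes=-N": the last N bytes of the object.
            count = int(last)
            if count <= 0 or size == 0:
                return None
            return (max(size - count, 0), size - 1)
        start = int(first)
        end = int(last) if last else size - 1  # "bytes=N-" runs to the end
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return (start, min(end, size - 1))  # clamp to the object size
```

For a 1000-byte object, "bytes=0-99" selects the first hundred bytes and "bytes=-100" selects the last hundred.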

Outstanding API compliance issues (2010 S3 API updates):

  • POST HTTP method
  • Efficient object copying (x-amz-copy-*)
  • Object versioning

Areas where we do not intend API compliance:

  • We plan to add an APPEND operation, to atomically append data to an existing object. A byte offset is returned to the client upon successful completion.
  • We do not limit object size to five gigabytes.
  • We are open to supporting site-specific authentication such as Kerberos, in addition to the spec-dictated authentication scheme.
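The intended semantics of the planned APPEND operation can be modelled with a small in-memory sketch. This is an assumption-laden illustration, not a design: AppendableObject is a hypothetical class, and the text above does not say which offset is returned, so the sketch assumes it is the offset at which the appended data begins.

```python
import threading


class AppendableObject:
    """Toy model of an object that supports atomic appends (hypothetical)."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def append(self, chunk):
        """Atomically append chunk and return the offset where it was written."""
        with self._lock:
            offset = len(self._data)  # assumed meaning of the returned offset
            self._data += chunk
            return offset

    def read(self):
        """Return a snapshot of the object's current contents."""
        with self._lock:
            return bytes(self._data)
```

With this model, concurrent writers each receive a distinct offset and their chunks never interleave, which is the property an atomic append is meant to provide.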

API compliance we might not implement, or may implement in a non-standard way:

  • Location constraints (US, EU, etc.)
  • Bucket payment configuration
  • x-amz-request-id, x-amz-id-2 HTTP headers. Presumably we want to invent our own transaction ids, as x-tabled-XXX.

Items we do not intend to implement:

  • SOAP API support. Forums seem to indicate the S3 SOAP API is only used by a tiny minority, compared to the well-known S3 REST API.
  • BitTorrent support.