Chunkd

From Hail Cloud Computing Wiki
Revision as of 20:10, 5 November 2009 by Jgarzik (Talk | contribs)

Jump to: navigation, search

chunkd: single-node data storage service

Contents

Overview

chunkd is a low-level key/value storage service for anonymous data objects. The intended use is as a low-level piece of large-scale distributed data storage infrastructure, for Project Hail.

Keys are opaque binary data, from 1 byte to 1024 bytes in size.

Values ("the data") may be any size, be it 1 byte, 1 megabyte, 100 gigabytes, anything. Object size is bounded only by available disk storage on the local node.

Currently the service is TCP-based. But its message structure is intended to be portable, making Infiniband/RDMA, SCTP, and other protocols a future possibility.

From chunkd's simple GET/PUT/DELETE API, you can build cluster-wide redundant storage services, exporting data storage capability from any device supported by your OS, via a single, standard, easy to use, device-agnostic interface.

API

The network protocol consists of TCP-based messages. Each message has a fixed-size binary header, followed by optional data. There is no separate authentication step; each message is signed and checksummed using HMAC.

Operation Description Inputs Outputs
NOP Verify request/response path (including auth) user response code
GET Retrieve object data, metadata user, key, key length response code, SHA1 checksum, last-modified time, data length, data
GET metadata Retrieve object metadata user, key, key length response code, SHA1 checksum, last-modified time, data length
PUT Store object user, key, key length, data, data length response code, SHA1 checksum
DELETE Delete object user, key, key length response code
LIST Retrieve list of all objects user response code, XML document listing all known object metadata associated with the user

Given that the protocol is message-based, it should be easy to add support for non-TCP network protocols such as RDMA, SCTP or AMQP.

Design goals

A single chunkd instance is generally designed to be run on a "cloud node", generally a "1U data center server", which probably equates to a physical or virtualized instance of: single multi-core CPU, 2-4GB RAM, gige, single disk. That example lends itself to 1000's of such chunkd storage nodes.

But it is also quite valid to have much larger chunkd nodes, perhaps tied to larger servers, 10gige, multiple disks and/or SAN networks.

Current version 1.0 technical design goals include

  • Multiple worker threads, to enable I/O parallelism. I/O parallelism is necessary because it
    • is the only way to max out storage hardware command and completion queues
    • enables greater optimizations on a TCQ/NCQ-enabled storage device, compared to slower command-at-a-time solutions
  • No internal data caching. Leverage kernel pagecache.
  • Use POSIX filesystem API for our "database". Avoid sql, db4, sqlite, etc.
  • Register ourselves in a CLD cell, for discovery by higher level services.

Status

Early beta. Protocol implementation is feature complete, but server is not yet optimized for performance. Most notably, the server is still single-threaded.

Strong cryptographic authentication is built into chunkd's network protocol, but the server's implementation of authenication is essentially a placeholder, no-op implementation. It should be easy to plug in SASL, Kerberos, or PAM into the current, modular framework.

Resources

chunkd is loosely modelled on the chunk server described in the GoogleFS paper (PDF).

Download releases here.

Developers: browse the git repo, or check out from git://git.kernel.org/pub/scm/daemon/distsrv/chunkd.git

Open projects

A wealth of projects large and small awaits interested contributors. Programmers will need to learn git, participate on the hail-devel mailing list, check out the source code, build and set up the project. Contributions from non-programmers are welcome as well -- documentation, feedback, and in general using the software.

Here are some suggestions for projects:

  • document and debug chunkd network protocol
  • document setup and system administration procedures
  • write chunkd client libraries for Java, Python, C++, or other languages
Personal tools