Twitter has open-sourced its log service DistributedLog (DL) on GitHub under the Apache 2 license. DL is a high-performance replicated log service that provides the durability, replication, and consistency guarantees needed to build reliable distributed systems such as replicated state machines, general-purpose publish/subscribe systems, distributed databases, and distributed queues.

DistributedLog maintains sequences of records and calls them logs (also called log streams). A process that writes records to a DL log is called a writer. A process that reads and processes records from a log is called a reader. Its overall software stack looks like this:

Specifically, it contains the following components:

Log

A log is an ordered, immutable sequence of log records. Its data structure is shown as follows:

Log Records

Each log record is a sequence of bytes. Log records are written to the log stream sequentially, and each one is assigned a unique sequence number called the DLSN (DistributedLog Sequence Number). In addition to the DLSN, an application can attach its own sequence number when it constructs a log record. This application-defined number is called the TransactionID (TxID). Either the DLSN or the TransactionID can be used to position a reader so that it starts reading from a particular log record.
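The two addressing schemes can be illustrated with a minimal in-memory sketch. This is not the DistributedLog API: the names `LogRecord` and `InMemoryLog` are hypothetical, the DLSN is simplified to a plain integer, and TransactionIDs are assumed to be non-decreasing so that a binary search can find the starting record.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    dlsn: int       # assigned by the log on append (simplified to an int here)
    txid: int       # supplied by the application; assumed non-decreasing
    payload: bytes  # the record itself is just a sequence of bytes


class InMemoryLog:
    """Toy append-only log for illustration; not the DistributedLog API."""

    def __init__(self):
        self._records = []

    def append(self, txid, payload):
        # The log, not the caller, assigns the DLSN in append order.
        record = LogRecord(dlsn=len(self._records), txid=txid, payload=payload)
        self._records.append(record)
        return record.dlsn

    def read_from_dlsn(self, dlsn):
        # Position the reader by the log-assigned sequence number.
        return self._records[dlsn:]

    def read_from_txid(self, txid):
        # Position the reader at the first record whose txid is >= the request.
        txids = [r.txid for r in self._records]
        return self._records[bisect_left(txids, txid)]  if False else self._records[bisect_left(txids, txid):]
```

A reader that knows only an application-level TransactionID can still find its starting point, even if that exact ID was never written: it simply starts at the next record at or after it.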

Log Segments

A log is broken down into log segments, each of which contains a subset of its records. Log segments are distributed and stored in a log segment store (such as BookKeeper). DistributedLog rolls log segments according to a configured policy, either after a configurable period of time (such as every two hours) or once a configurable maximum size is reached (such as every 128MB). As a result, log data is divided into segments of similar size that are spread evenly across the log segment storage nodes. In this way, log storage is not limited to the capacity of a single server, and read traffic can be spread across the cluster.
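The rolling policy can be sketched as follows. This is an illustrative model only, assuming a hypothetical `SegmentedLog` class whose `max_bytes` and `max_age_seconds` parameters stand in for the configured size and time limits; the real configuration keys and storage layout are not shown in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    opened_at: float
    records: list = field(default_factory=list)
    size_bytes: int = 0


class SegmentedLog:
    """Toy log that starts a new segment when the current one is too old or too big."""

    def __init__(self, max_bytes=128 * 1024 * 1024, max_age_seconds=2 * 60 * 60):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.segments = []

    def _needs_roll(self, now, incoming_bytes):
        if not self.segments:
            return True
        current = self.segments[-1]
        too_old = now - current.opened_at >= self.max_age_seconds
        # Never roll an empty segment just because one record is oversized.
        too_big = bool(current.records) and (
            current.size_bytes + incoming_bytes > self.max_bytes
        )
        return too_old or too_big

    def append(self, payload, now):
        # `now` is passed in explicitly so the policy is easy to test.
        if self._needs_roll(now, len(payload)):
            self.segments.append(Segment(opened_at=now))
        current = self.segments[-1]
        current.records.append(payload)
        current.size_bytes += len(payload)
```

Because every segment is capped by the same limits, segments end up with comparable sizes and can be placed on different storage nodes independently.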

Log data can be kept forever until the application explicitly truncates it, or it can be retained for a configurable period of time. Explicit truncation is useful for building replicated state machines, such as distributed databases, which tend to need tight control over when data may be discarded. Time-based retention is better suited to real-time analytics scenarios that only care about data within a certain time window.
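The difference between the two retention modes can be shown with a small sketch. The class `RetainedLog` and its methods are hypothetical, and records are dropped one at a time here for brevity; a real system would more plausibly discard whole log segments.

```python
class RetainedLog:
    """Toy log supporting explicit truncation and time-based expiry."""

    def __init__(self):
        self._records = []  # list of (sequence_number, written_at, payload)
        self._next_seq = 0

    def append(self, payload, now):
        self._records.append((self._next_seq, now, payload))
        self._next_seq += 1

    def truncate_before(self, seq):
        # Explicit truncation: the application decides what is safe to drop.
        self._records = [r for r in self._records if r[0] >= seq]

    def expire(self, now, retention_seconds):
        # Time-based retention: drop anything older than the retention window.
        self._records = [r for r in self._records if now - r[1] < retention_seconds]

    def sequence_numbers(self):
        return [r[0] for r in self._records]
```

A database would call something like `truncate_before` only after it has checkpointed its state, whereas an analytics pipeline would rely on periodic expiry.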

Namespace

Log streams that belong to the same organization are usually grouped under one namespace and managed together. A DL namespace is essentially how applications locate their log streams. An application can create and delete streams in a namespace, or truncate a stream to a given sequence number (DLSN or TransactionID).

Writer

Writers write data to the logs of their choice. All records are appended to a log in order. Sequence numbers are assigned by the writer, which means that a log can have only one active writer at any given time. If two writers attempt to write to the same log because of a network partition, DL guarantees the correctness of the log through a fencing mechanism.
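One common way to enforce a single active writer is epoch-based fencing, sketched below. This is a generic illustration of the technique, not a description of how DistributedLog or BookKeeper implements fencing internally; `LogStore`, `Writer`, and `FencedError` are hypothetical names.

```python
class FencedError(Exception):
    """Raised when a writer that has lost ownership tries to append."""


class LogStore:
    def __init__(self):
        self.epoch = 0
        self.records = []

    def acquire(self):
        # Granting ownership bumps the epoch, which fences all earlier writers.
        self.epoch += 1
        return self.epoch

    def append(self, epoch, payload):
        if epoch != self.epoch:
            raise FencedError(
                f"writer epoch {epoch} was fenced by epoch {self.epoch}"
            )
        self.records.append(payload)
        return len(self.records) - 1


class Writer:
    def __init__(self, store):
        self.store = store
        self.epoch = store.acquire()

    def write(self, payload):
        return self.store.append(self.epoch, payload)
```

If an old writer comes back after a partition, its appends are rejected by the store rather than interleaved with the new writer's records, so the log keeps a single, consistent order.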

Writers are provided and managed by a service layer named Write Proxy, which accepts fan-in writes from a large number of clients.

Reader

Readers read records from the logs of their choice, starting at a given position. This position can be a DLSN or a TransactionID. Readers read records in exactly the order in which they were written to the log. Different readers can read the same log from different starting positions.

Unlike other publish/subscribe systems, DistributedLog does not record or manage reader positions. It leaves that tracking to the application, because different applications have different requirements for tracking and coordinating positions, and it is hard to satisfy them all with a single approach. At the application level, reader positions are easy to track in a variety of stores, such as ZooKeeper, a file system, or a key/value store.
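Application-side position tracking can be sketched as follows. The example assumes a hypothetical `CheckpointingReader` class, uses a Python list in place of a log, and uses a plain dictionary in place of an external store such as ZooKeeper or a key/value service.

```python
class CheckpointingReader:
    """Toy reader that stores its own position in an external checkpoint store."""

    def __init__(self, log, checkpoints, reader_id):
        self.log = log                  # list of payloads; index = position
        self.checkpoints = checkpoints  # dict standing in for a key/value store
        self.reader_id = reader_id

    def poll(self, max_records=10):
        # Resume from the last committed position, or from the start of the log.
        start = self.checkpoints.get(self.reader_id, 0)
        batch = self.log[start:start + max_records]
        return start, batch

    def commit(self, next_position):
        # The application decides when a batch counts as processed.
        self.checkpoints[self.reader_id] = next_position
```

Because the checkpoint lives outside the reader, a restarted reader with the same identifier picks up where the previous one committed, and each application remains free to choose how often to commit and where to store the position.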

Log records can be cached in a service layer called Read Proxy to handle a large number of readers.

Fan-in and Fan-out

The core of DistributedLog supports single-writer, multiple-reader semantics. The service layer is built on top of the DistributedLog core and supports large numbers of writers and readers. It consists of Write Proxy and Read Proxy. Write Proxy manages the log writers and can fail them over when a machine goes down. It aggregates writes from many sources, so that clients do not need to care about log ownership (fan-in). Read Proxy optimizes the read path by caching records, which handles cases where hundreds or thousands of readers read the same log stream (fan-out).
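The division of labor between the two proxies can be sketched in a few lines. This is a toy model under stated assumptions: `WriteProxy` and `ReadProxy` here are hypothetical in-process classes, each stream is a Python list, and failover, networking, and cache eviction are left out entirely.

```python
from collections import defaultdict


class WriteProxy:
    """Funnels writes from many clients through one ordered log per stream."""

    def __init__(self):
        self.streams = defaultdict(list)  # stream name -> ordered records

    def write(self, stream, payload):
        # Clients never own a log; the proxy appends on their behalf.
        log = self.streams[stream]
        log.append(payload)
        return len(log) - 1  # position assigned by the single writer


class ReadProxy:
    """Serves many readers from a cache so the backend is read once per record."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}
        self.backend_reads = 0

    def read(self, stream, position):
        key = (stream, position)
        if key not in self.cache:
            self.backend_reads += 1
            self.cache[key] = self.backend.streams[stream][position]
        return self.cache[key]
```

In this model, any number of clients can call `write` on the same stream and still get a single total order, while repeated reads of the same record hit the storage backend only once.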

The advantages of DistributedLog as a logging service can be summarized as follows:

  • High performance: DL can provide millisecond latency on durable writes across a large number of concurrent logs, while handling a high volume of reads and writes per second from thousands of clients.
  • Durability and consistency: Messages are persisted to disk and stored as multiple replicas to avoid loss. Strict ordering guarantees consistency between writers and readers.
  • Various workloads: DL supports a variety of workloads, including latency-sensitive online transaction processing (OLTP) applications (such as the write-ahead log for distributed databases and in-memory replicated state machines), real-time stream ingestion and computation, and analytical processing.
  • Multi-tenancy: DL is designed with I/O isolation for real-world workloads, so that it can support a large number of logs for multiple tenants.
  • Layered architecture: DL has a modern layered design that separates the stateful storage layer from the stateless serving layer. Storage can therefore scale independently of CPU and memory, which supports large-scale writes (fan-in) and reads (fan-out).

 

From: http://www.infoq.com/cn/news/2016/05/Twitter-Github-DistributedLog