Preface

I first got to know Raft in May 2019, when I was poking around the Internet for material on distributed systems. I downloaded the MIT 6.824 code from GitHub and followed the course's detailed explanations up to Lab 2B, then gave up and drifted along for a while. Recently I felt I couldn't keep going like that, so this article came into being. The code address

Raft definition

Raft is a distributed consensus algorithm for keeping multiple replicas in sync. The most famous consensus algorithm in academia is Paxos, but it is famous mostly within academia because it is so hard to understand; the best-known one in engineering is Raft, whose design goal is understandability. Raft may look simple, but it is still a little hard to implement well. Its simplicity comes from taking the problem apart: Raft decomposes distributed consensus into leader election, log replication, safety, and membership changes.

Raft election

There are only three states per node in Raft:

  • leader
  • follower
  • candidate

A node starts in the follower state by default. The transitions between the three states work as follows: a follower becomes a candidate when its election timeout expires without hearing from a leader; a candidate becomes leader when it wins votes from a majority of nodes; and a candidate or leader falls back to follower when it observes a higher term.
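As a quick sketch, the three states can be modeled as a small enum in Go (the type and constant names here are my own, not taken from the original code):

type StateType int

const (
	StateFollower StateType = iota
	StateCandidate
	StateLeader
)

func (s StateType) String() string {
	return [...]string{"follower", "candidate", "leader"}[s]
}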

Reference

There are a number of open-source projects that implement the Raft algorithm, the best known of which is etcd, so I borrow the architecture and key design decisions from etcd.

Choosing a language

I come from a Java background, so Java would have been the natural choice, but MIT 6.824 and etcd both use Go, and TiDB, the open-source distributed database, uses Go as well. Go also has built-in concurrency support and is easy to learn. It does have a GC, which is a drawback, but I believe I can work around it. I also considered Rust, but the language is genuinely hard and I was afraid it would hurt my coding experience. In the end, I chose Go.

The project structure

cli
config
member
raft
transport
types

Implementation

Config File & Config Structure (RaftConfig)

Figure 2 shows the simple configuration file for MyRaft. localAddr is the local address (one IP:port pair) and clusterAddr is the cluster address (multiple IP:port pairs separated by commas). The RaftConfig structure has just two fields, LocalAddr and ClusterAddr, both of type string.
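The figure isn't reproduced here; based on the keys read by the parsing code below, the config file presumably looks something like this (the exact key=value syntax is my assumption):

localAddr=127.0.0.1:9011
clusterAddr=127.0.0.1:9009,127.0.0.1:9010,127.0.0.1:9011

The code: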

type RaftConfig struct {
	LocalAddr   string // e.g. "127.0.0.1:6379"
	ClusterAddr string // comma-separated cluster addresses, e.g. "127.0.0.1:6379,127.0.0.1:6380"
}

// NewConfig generates a RaftConfig instance from the config file path
func NewConfig(configPath string) (rf *RaftConfig, e error) {
	rf = &RaftConfig{}
	// Parsing the configuration file returns a map
	configKv := InitConfig(configPath)
	rf.LocalAddr = configKv["localAddr"]
	rf.ClusterAddr = configKv["clusterAddr"]
	return rf, nil
}
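InitConfig itself isn't shown in the excerpt; as a hedged sketch, a minimal key=value parser matching the usage above could look like this (error handling kept deliberately simple):

import (
	"os"
	"strings"
)

// InitConfig parses a simple key=value config file into a map.
// A hypothetical sketch; the original implementation isn't shown.
func InitConfig(configPath string) map[string]string {
	kv := make(map[string]string)
	data, err := os.ReadFile(configPath)
	if err != nil {
		return kv // on read failure, return an empty map
	}
	for _, line := range strings.Split(string(data), "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		if i := strings.Index(line, "="); i > 0 {
			kv[strings.TrimSpace(line[:i])] = strings.TrimSpace(line[i+1:])
		}
	}
	return kv
}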

Member management

A Raft cluster contains multiple members, so I abstract two types: Member and Cluster; one represents a single member of the cluster and the other represents the cluster itself. The Member structure has two fields: ID and PeerAddr. PeerAddr is the address of a raft instance from the configuration file above, and ID is that instance's unique identifier, of type uint64.

type ID uint64

// addr is in IP:port format
func GenerateID(addr string) ID {
	var b []byte
	b = append(b, []byte(addr)...)
	hash := sha1.Sum(b) // hash the address with SHA-1

	return ID(binary.BigEndian.Uint64(hash[:8])) // take the first 8 bytes of the hash as a uint64
}
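NewMember is called in NewRaftCluster below but isn't shown in the excerpt; given the Member fields described above, a plausible sketch looks like this:

type Member struct {
	ID       types.ID
	PeerAddr string
}

// NewMember derives the member's ID from its peer address.
// A sketch based on the description above; the original code isn't shown.
func NewMember(peerAddr string) *Member {
	return &Member{
		ID:       types.GenerateID(peerAddr),
		PeerAddr: peerAddr,
	}
}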

For clusters, I define an interface: Cluster

type Cluster interface {
	// ID returns the cluster ID  
	ID() types.ID 
	// Members returns a slice of members sorted by their ID
	Members() []*Member
	// Member retrieves a particular member based on ID, or nil if the
	// member does not exist in the cluster
	Member(id types.ID) *Member
}

The RaftCluster structure is defined to implement the Cluster interface:

type RaftCluster struct {
	localID types.ID             // Uniquely identifies the current node
	cid     types.ID             // Unique identifier of the current cluster
	members map[types.ID]*Member // Raft cluster member
}
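The three interface methods aren't shown in the excerpt; on top of the fields above they could be implemented roughly like this (a sketch using the standard sort package, not the original code):

func (rc *RaftCluster) ID() types.ID { return rc.cid }

// Members returns the members as a slice sorted by ID.
func (rc *RaftCluster) Members() []*Member {
	ms := make([]*Member, 0, len(rc.members))
	for _, m := range rc.members {
		ms = append(ms, m)
	}
	sort.Slice(ms, func(i, j int) bool { return ms[i].ID < ms[j].ID })
	return ms
}

// Member returns the member with the given ID, or nil if it is absent.
func (rc *RaftCluster) Member(id types.ID) *Member {
	return rc.members[id]
}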

Create the RaftCluster instance and its Member instances:

// clusterAddr like "127.0.0.1:9009,127.0.0.1:9010,127.0.0.1:9011", localAddr like "127.0.0.1:9011"
func NewRaftCluster(localAddr string, clusterAddr string) *RaftCluster {
	// Create a RaftCluster instance
	rc := &RaftCluster{
		members: make(map[types.ID]*Member),
	}
	// Generate the current raft node's ID from localAddr
	rc.localID = types.GenerateID(localAddr)
	// Loop to create a Member instance for each peer and store it in members
	clusterAddrArrs := strings.Split(clusterAddr, ",")
	for _, peerAddr := range clusterAddrArrs {
		m := NewMember(peerAddr)
		rc.members[m.ID] = m
	}
	// The cluster ID is generated from the whole cluster address string
	rc.cid = types.GenerateID(clusterAddr)
	return rc
}
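Putting the configuration and membership pieces together, bootstrapping a cluster might look like this (the config file path is hypothetical):

cfg, err := NewConfig("conf/raft.conf") // hypothetical path
if err != nil {
	log.Fatal(err)
}
cluster := NewRaftCluster(cfg.LocalAddr, cfg.ClusterAddr)
fmt.Printf("cluster %d has %d members\n", cluster.ID(), len(cluster.Members()))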

Transport between Raft nodes

The Raft paper defines two main types of RPC: RequestVote, used during elections, and AppendEntries, which the leader uses to replicate log entries to followers. The leader also broadcasts heartbeats to followers as AppendEntries RPCs with an empty log. My transport implementation borrows from etcd's. Let's start with the RaftMessage structure, which looks like this:

type RaftMessage struct {
	From     uint64 // sender's member ID
	To       uint64 // receiver's member ID
	Type     MessageType
	Success  bool // whether the request succeeded
	Term     uint64
	LogIndex uint64
	Entries  []Entry
}

I've abstracted all RPC requests and responses into a single RaftMessage structure:

  • From and To identify who sent the message and who it is for; their values come from the ID field of the Member struct.
  • Type is the message type (MessageType). The Raft algorithm has several RPC calls, such as vote requests and heartbeat requests (see the sketch after this list).
  • Success indicates whether the request succeeded.
  • Term is the leader's term number for the current round; almost every request and response carries the sender's term number.
  • LogIndex is a log entry index whose meaning depends on the RPC. E.g., in a vote request it is the candidate's last log index, while in an append request it is the index of the log entry immediately preceding the new entries.
  • Entries holds the log entries to append; it is a slice so that multiple entries can be batched in one message for performance.
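MessageType and its values aren't defined in the excerpt, so here is a hedged sketch of what the enum and a heartbeat (an AppendEntries message with an empty log) could look like; all names below are placeholders of my own:

type MessageType int

const (
	MsgVoteRequest MessageType = iota // candidate asks for a vote
	MsgVoteResponse
	MsgAppendEntries // leader replicates entries or sends heartbeats
	MsgAppendEntriesResponse
)

// heartbeat builds an AppendEntries message with no entries,
// which is exactly how the leader's heartbeat is described above.
func heartbeat(from, to, term, logIndex uint64) RaftMessage {
	return RaftMessage{
		From:     from,
		To:       to,
		Type:     MsgAppendEntries,
		Term:     term,
		LogIndex: logIndex,
		Entries:  nil, // empty log marks this as a heartbeat
	}
}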