Background

Remote Procedure Call (RPC).

It is difficult to define RPC. Some say RPC is the idea of calling functions on a remote server as if they were local. Others argue that RPC is a concept rather than a specific technical implementation or protocol. Both views are valid, but neither fully describes RPC. Before we get into what RPC really is, let's look at the background.

Evolution of server application architecture

RPC became a mainstream technology term about five or six years ago, and the main reason it caught on was the rise of microservices architecture. So let's start with server application architecture.

Currently, there are three dominant server application architectures: monolith, microservices, and serverless.

Here is a brief introduction to monolithic and microservices architectures.

A monolithic architecture can be understood simply as a single package in which all code and functionality is built, tested, and deployed together, running in a single process. Microservices architecture is an evolution of the monolith in which the application is split into services running in multiple processes.

Many people probably got their first taste of microservices from Martin Fowler's Microservices article, so many mistakenly think Martin Fowler invented microservices. He didn't; that article was written in 2014. Microservices were first presented by Fred George at Agile India in March 2012, in the talk Micro (U)Services Architecture - Small, Short Lived Services Rather than SOA.

Microservices architecture mainly solves three problems of the monolithic architecture.

  1. Deployment: any change to a monolith requires the whole application to be recompiled, retested, and redeployed; a microservice can be deployed independently.
  2. Scalability: microservices scale more easily and give each service enough freedom to choose its own technology.
  3. Business decoupling: microservices decouple better. In a monolith, load tends to be unbalanced across modules, wasting resources; microservices avoid this.

As the Internet has grown, more and more applications, unable to withstand massive user traffic, have turned to microservices architecture for more convenient horizontal scaling.

However, while microservices architecture solves the problems of the monolith, it creates many problems of its own. These require the introduction of additional concepts or technologies, such as DDD and DevOps, to solve.

Because microservices require splitting services, each service may run in a different process on a different server. Any microservices implementation must therefore first face the problem of inter-process communication.

The SOA architectures prevailing in earlier years are somewhat similar to microservices, but SOA is centralized and has scalability bottlenecks. Today's microservices therefore advocate decentralization: remove the ESB and communicate point to point, and the technology that achieves point-to-point communication is RPC.

RPC and distributed systems

In fact, RPC was born much earlier than microservices; the earliest problem it solved was distributed computing. It originated in the early 1980s, giving it a history of about 40 years. The Lupine framework, based on the Cedar language and developed at the Xerox Palo Alto Research Center, was the world's first RPC framework.

At that time, Xerox defined RPC as:

RPC is a language-level communication protocol that allows a program running on one computer to call a program in another address space over a channel.

The RPC we talk about today differs from the original definition, but the overall direction remains the same.

Microservices architecture is itself a kind of distributed architecture; it is a subset of distributed architecture.

In summary, RPC is the cornerstone of building distributed systems.

RPC process

The overall RPC process is as follows:

In the whole RPC process, there are two roles, namely the caller (Client) and the provider (Server).

A service that calls another service is a caller; if the same service is also invoked by other services, it is also a provider. That is, a service in a complete system may be both a caller and a provider, but it plays only one role during a given RPC.

To make the call experience seamless, RPC is generally very simple at the code level: specify the provider's IP, the method name, and the parameters, and call the remote function as if it were local.

Because the two parties run in different processes, possibly on different machines, the calls are made over the network.

In a microservices architecture, each service registers itself in a service registry when it starts. When an RPC is initiated, the caller picks the real provider from the IP list it periodically fetches from the registry and serializes the parameters. The serialized data is then binary encoded, because data transmitted over the network is always in binary form.

After receiving the network request, the provider performs the reverse operations: it first decodes the binary, then deserializes the parameters, and finally finds the corresponding method and invokes it through reflection. The method's return value is serialized, binary encoded, and transmitted back to the caller, which decodes it the same way to obtain the usable return data.
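
As a minimal sketch of the provider-side dispatch just described (names like dispatch and EchoService are made up for illustration), serialized arguments arrive, are deserialized, and the target method is invoked via reflection:

package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

type EchoService struct{}

func (EchoService) Echo(msg string) string { return "echo: " + msg }

// dispatch deserializes the arguments, finds the named method by
// reflection, invokes it, and serializes the return value.
func dispatch(svc interface{}, method string, rawArgs []byte) ([]byte, error) {
	var args []string
	if err := json.Unmarshal(rawArgs, &args); err != nil {
		return nil, err
	}
	m := reflect.ValueOf(svc).MethodByName(method)
	if !m.IsValid() {
		return nil, fmt.Errorf("unknown method %q", method)
	}
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		in[i] = reflect.ValueOf(a)
	}
	out := m.Call(in) // the reflective invocation
	return json.Marshal(out[0].Interface())
}

func main() {
	// The caller side would serialize the arguments like this...
	rawArgs, _ := json.Marshal([]string{"hello"})
	// ...and the provider side decodes and dispatches them.
	resp, _ := dispatch(EchoService{}, "Echo", rawArgs)
	fmt.Println(string(resp)) // "echo: hello"
}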

How RPC works

Protocol

Unlike TCP and UDP, which are transport-layer protocols, RPC protocols, like HTTP, sit at the application layer.

Protocol definition

Data between the caller and provider travels as a byte stream that must be framed into complete packets. The provider needs to know where in a packet to read the parameters from, which requires a clearly defined protocol.

Private RPC protocol

HTTP is a readily available application-layer protocol. However, HTTP messages are large and carry information that is useless here, such as newline and carriage-return characters, making them unsuitable for RPC scenarios with high performance requirements. Therefore, we usually design a private protocol ourselves.

Common practices are fixed-length binary protocol headers and extensible variable-length protocol bodies.

The protocol header is the service-agnostic common part, containing data such as the protocol identifier, message ID, message type, protocol version, and serialization mode.

The protocol body (Payload) is an extensible part that contains service information, such as method names and parameter values.
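
As a rough sketch of such a protocol, here is one possible fixed-length header in Go; the field names and sizes are illustrative, not taken from any real RPC protocol:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header is the fixed-length, service-agnostic part of each frame.
type Header struct {
	Magic         uint16 // protocol identifier
	Version       uint8  // protocol version
	MsgType       uint8  // request / response / heartbeat
	Serialization uint8  // codec of the body, e.g. 1 = JSON, 2 = protobuf
	MsgID         uint64 // matches responses to requests
	BodyLen       uint32 // length of the variable-length body that follows
}

func main() {
	h := Header{Magic: 0xCAFE, Version: 1, Serialization: 1, MsgID: 42, BodyLen: 128}
	buf := new(bytes.Buffer)
	// Fixed-size fields make the header trivially encodable and decodable.
	binary.Write(buf, binary.BigEndian, h)
	fmt.Println("header bytes:", buf.Len()) // 17
}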

Serialization

Serialization refers to the transformation of objects in a program into transportable binary data.
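
As a minimal illustration (using the standard library's encoding/json rather than a compact binary codec), here is serialization and deserialization of a request object in Go:

package main

import (
	"encoding/json"
	"fmt"
)

type AddRequest struct {
	A int `json:"a"`
	B int `json:"b"`
}

func main() {
	// Serialize: in-memory object -> transportable bytes (caller side).
	data, _ := json.Marshal(AddRequest{A: 1, B: 2})
	fmt.Println(string(data)) // {"a":1,"b":2}

	// Deserialize: bytes -> object (provider side).
	var req AddRequest
	_ = json.Unmarshal(data, &req)
	fmt.Println(req.A + req.B) // 3
}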

When serializing, you need to consider both cross-language and language-specific situations.

Cross-language

If multiple services are written in different languages, there are cross-language considerations when designing RPC.

For example, if service A is written in Java and service B in Go, objects created from Java classes and Go structs are not compatible.

A complete RPC solution contains the RPC framework and data exchange format.

Common cross-language RPC solutions include gRPC, Thrift, and Avro.

gRPC uses the Protobuf format, Thrift uses the Thrift IDL format, and Avro defines its schemas in JSON.

In the early days of RPC, the most popular data exchange formats were JSON and XML. However, XML is verbose and slow to parse, and JSON is comparatively weak in expressiveness. Protobuf and Thrift IDL emerged to balance size and expressiveness, solving the pain points of both.

Data interchange formats also have new schemes to improve performance, such as ProtoStar based on Protobuf.

gRPC is currently the more popular choice in China.

Language-specific

If multiple services are developed in the same programming language, a specific serialization framework can be used to solve this problem.

Examples include Hessian for Java (Hessian also has implementations in other languages), Finagle for the JVM, rpcx for Go, and so on.

Such frameworks have two advantages.

  1. Higher performance.
  2. Lower development costs.

Service registration and discovery

In the simplest RPC, there are only two roles: caller and provider.

But in a mature RPC architecture, there is also a role called a Register.

Before introducing the registry, consider a question.

What problems arise if the provider's IP is hardcoded into the code when initiating an RPC?

The answer is that there is a high degree of coupling between caller and provider. When the provider’s IP changes, the caller changes with it.

With few services, say only two, this problem is barely felt, and IP hardcoding even seems efficient and convenient. But with tens, hundreds, or thousands of services, hardcoding IPs becomes nearly impossible.

Instead, a component can be responsible for maintaining the providers' IP addresses: the caller periodically syncs the service's IP list from it and uses an IP from that list when initiating an RPC. This decouples the caller from the provider.

This component is the registry.

The provider registers its IP with the registry when its service starts.

The caller obtains the provider's IP from the registry and then initiates the RPC.
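
A minimal in-memory sketch of the registry's two core operations follows; the Registry type and method names are illustrative, and real registries such as etcd or Consul add leases, watches, and health checks on top of this idea:

package main

import (
	"fmt"
	"sync"
)

type Registry struct {
	mu    sync.RWMutex
	addrs map[string][]string // service name -> provider IP list
}

func NewRegistry() *Registry {
	return &Registry{addrs: make(map[string][]string)}
}

// Register is called by a provider when it starts.
func (r *Registry) Register(service, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.addrs[service] = append(r.addrs[service], addr)
}

// Discover is called (and cached) periodically by callers.
func (r *Registry) Discover(service string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.addrs[service]
}

func main() {
	reg := NewRegistry()
	reg.Register("user-service", "10.0.0.1:8000")
	reg.Register("user-service", "10.0.0.2:8000")
	fmt.Println(reg.Discover("user-service"))
}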

There is a saying in computer science that captures this perfectly: any problem in computer science can be solved by adding another level of indirection.

As you can see from the description above, a registry is conceptually similar to DNS. Yet very few people use DNS as a registry, because DNS has several characteristics that make it unsuitable:

  • DNS registration is manual.
  • DNS caches records, so it cannot propagate changes within seconds.
  • Registries also need automatic registration, discovery, and health check mechanisms, which DNS lacks.

Mature open source solutions include Zookeeper, Eureka, Consul, Etcd and Nacos.

Etcd and Consul are currently popular in China.

Health check

The caller obtains provider IPs from the registry and then initiates RPCs against them. However, we cannot guarantee that every provider node is always healthy. For example, a node may hit peak CPU, run under high load, respond slowly, and time out frequently. Such a node can be marked sub-healthy so that traffic to it is reduced; if it returns to normal processing speed, it is marked healthy again.

The provider maintains a heartbeat with the registry; if the heartbeat stops, the provider is considered down and marked dead, and callers stop invoking it. The registry will still probe the node and try to restore it to service.
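
A minimal sketch of such a provider-side heartbeat loop; the send callback and the interval are illustrative assumptions:

package main

import (
	"fmt"
	"time"
)

// heartbeat periodically reports liveness to the registry. If the registry
// misses several beats in a row, it marks the node dead and stops handing
// its IP to callers.
func heartbeat(send func() error, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := send(); err != nil {
				fmt.Println("heartbeat failed:", err)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	stop := make(chan struct{})
	go heartbeat(func() error {
		fmt.Println("beat") // in reality: a small RPC/HTTP call to the registry
		return nil
	}, 500*time.Millisecond, stop)
	time.Sleep(2 * time.Second)
	close(stop)
}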

Load balancing

When a service node cannot support the current traffic, multiple service nodes can be deployed to form a cluster and distribute requests to each service node in the cluster through Load balancing. In this way, multiple service nodes can be used to share the request pressure.

RPC load balancing differs from Web load balancing in that it is implemented at the code level: the caller establishes long-lived connections to all service nodes delivered by the registry and selects a node through the load balancing algorithm each time it initiates an RPC.

Load balancing algorithms fall into two categories: static and dynamic.

Static load balancing algorithm

Round-robin algorithm

Round-robin tries to balance requests exactly by recording a global position. But under concurrency, a locking mechanism is needed to keep reads and writes of that global value safe, so there is a throughput bottleneck.

Random algorithm

Randomly pick an index into the back-end server list. The advantage is simplicity; the disadvantage is that it does not suit servers with unequal capacity. The higher the throughput, the closer the distribution approaches round-robin's.

Source address hashing algorithm

Using the client IP address as input, compute a hash value and take it modulo the size of the server list; the result is the index of the server to access.
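
A minimal sketch, hashing the client IP with FNV and representing the servers as plain strings for brevity:

package loadbalance

import "hash/fnv"

// sourceHash maps a client IP to a fixed server index, so the same
// client always reaches the same node.
func sourceHash(clientIP string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return servers[h.Sum32()%uint32(len(servers))]
}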

Weighted round-robin algorithm

In practice, servers are configured differently, so their capacity differs. Both round-robin and random selection face this problem. A better approach is to assign each server a weight based on its actual hardware and capacity, and then allocate requests by round-robin on top of those weights.

Weighted random algorithm

Similar to weighted round-robin, but the selection within the weights is random rather than sequential.

Dynamic load balancing algorithm

Since server nodes are often configured differently, the most commonly used static algorithms are the weighted kind.

However, when the number of nodes reaches hundreds or thousands, setting weights by hand is impractical. A more convenient approach is to obtain each provider's current state during health checks and balance the load dynamically.

Least connections algorithm

Dynamically obtain the current number of connections of each server and call the node with the fewest.

Fastest response algorithm

Estimate each machine's current load from recent requests and response latencies, and call the server with the shortest response time.

Beyond least connections and fastest response, finer-grained metrics can be used, such as dynamically computed CPU and memory state.

Implementation of common load balancing algorithms

This article is not primarily about load balancing, but as an important part of RPC it is worth understanding. The following implements a few simple load balancing algorithms in Go.

Simulate 3 servers.

// main_test.go
package loadbalance

type Balanceable struct {
	Name       string
	Weight     int
	connection int
}

var server1 = Balanceable{Name: "1", Weight: 1}
var server2 = Balanceable{Name: "2", Weight: 2}
var server3 = Balanceable{Name: "3", Weight: 3}

var servers = []*Balanceable{
	&server1,
	&server2,
	&server3,
}

func (server *Balanceable) getActive() int {
	return server.connection
}

func (server *Balanceable) conn() {
	server.connection++
}

Random algorithm

The random algorithm is very simple to implement and performs well, because it keeps no shared state and therefore has no concurrency concerns.

package loadbalance

import (
	"fmt"
	"math/rand"
	"testing"
	"time"
)

func random(servers []*Balanceable) *Balanceable {
	rand.Seed(time.Now().UnixNano())
	idx := rand.Intn(len(servers))
	return servers[idx]
}

func TestRandom(t *testing.T) {
	counter := map[string]int{"1": 0, "2": 0, "3": 0}
	for i := 0; i < 100; i++ {
		server := random(servers)
		counter[server.Name]++
	}
	fmt.Println("counter:", counter)
}

Run tests:

go test -timeout 30s -run ^TestRandom$ load_balance -count=1 -v

5 test results:

counter: map[1:35 2:34 3:31]
counter: map[1:31 2:31 3:38]
counter: map[1:35 2:34 3:31]
counter: map[1:41 2:25 3:34]
counter: map[1:30 2:36 3:34]

You can see that the random algorithm's results differ on each run, with some deviation, but not much: each count stays near 33.

Round-robin algorithm

Round-robin requires maintaining a counter and locking it to keep the operations atomic.

package loadbalance

import (
	"fmt"
	"sync"
	"testing"
)

var pos = 0
var m sync.RWMutex

func roundRobin(servers []*Balanceable) *Balanceable {
	m.Lock()
	defer m.Unlock()
	if pos >= len(servers) {
		pos = 0
	}
	ret := servers[pos]
	pos++

	return ret
}

func TestRoundRobin(t *testing.T) {
	counter := map[string]int{"1": 0, "2": 0, "3": 0}
	wg := sync.WaitGroup{}
	wg.Add(100)
	for i := 0; i < 100; i++ {
		go func(i int) {
			server := roundRobin(servers)
			m.Lock()
			defer m.Unlock()
			counter[server.Name]++
			wg.Done()
		}(i)
	}
	wg.Wait()
	fmt.Println("counter:", counter)
}

Run tests:

go test -timeout 30s -run ^TestRoundRobin$ load_balance -count=1 -v

5 test results:

counter: map[1:34 2:33 3:33]
counter: map[1:34 2:33 3:33]
counter: map[1:34 2:33 3:33]
counter: map[1:34 2:33 3:33]
counter: map[1:34 2:33 3:33]

Round-robin allocates requests by recording a position, so it is predictable and produces the same result every run.

Weighted random algorithm

package loadbalance

import (
	"fmt"
	"math/rand"
	"testing"
	"time"
)

func weightedRandom(servers []*Balanceable) *Balanceable {
	max := 1
	min := servers[0].Weight
	for _, server := range servers {
		w := server.Weight
		max += w
		if min > w {
			min = w
		}
	}
	rand.Seed(time.Now().UnixNano())
	idx := rand.Intn(max-min) + min
	tmp := 0
	for _, server := range servers {
		tmp += server.Weight
		if tmp >= idx {
			return server
		}
	}
	return servers[0]
}

func TestWeightedRandom(t *testing.T) {
	counter := map[string]int{"1": 0, "2": 0, "3": 0}
	for i := 0; i < 100; i++ {
		server := weightedRandom(servers)
		counter[server.Name]++
	}
	fmt.Println("counter:", counter)
}

Run tests:

go test -timeout 30s -run ^TestWeightedRandom$ load_balance -count=1 -v

5 test results:

counter: map[1:19 2:34 3:47]
counter: map[1:14 2:29 3:57]
counter: map[1:22 2:26 3:52]
counter: map[1:16 2:37 3:47]
counter: map[1:15 2:30 3:55]

The weighted random algorithm's results are also nondeterministic, but they stay very close to the 1:2:3 weight ratio.

Least connections algorithm

package loadbalance

import (
	"fmt"
	"sync"
	"testing"
)

func leastConnections(servers []*Balanceable) *Balanceable {
	ret := servers[0]
	for _, server := range servers {
		if server.getActive() <= ret.getActive() {
			ret = server
		}
	}
	return ret
}

func TestLeastConnections(t *testing.T) {
	counter := map[string]int{"1": 0, "2": 0, "3": 0}
	var mu sync.Mutex
	wg := sync.WaitGroup{}
	wg.Add(100)
	for i := 0; i < 100; i++ {
		go func() {
			// Selecting a node and bumping its connection count must be
			// atomic together, or concurrent goroutines race on the counters.
			mu.Lock()
			server := leastConnections(servers)
			counter[server.Name]++
			server.conn()
			mu.Unlock()
			wg.Done()
		}()
	}
	wg.Wait()
	fmt.Println("counter:", counter)
}

Run tests:

go test -timeout 30s -run ^TestLeastConnections$ load_balance -count=1

5 test results:

counter: map[1:33 2:33 3:34]
counter: map[1:33 2:33 3:34]
counter: map[1:33 2:33 3:34]
counter: map[1:33 2:33 3:34]
counter: map[1:33 2:33 3:34]

The least connections algorithm behaves much like round-robin here, and its results are very stable.

Rate limiting and circuit breaking

The main application scenarios of RPC are distributed systems. Distributed systems often face high concurrency, and RPC also faces high concurrency scenarios.

To ensure stability and high availability, a service needs to protect itself according to its own CPU and memory usage.

Rate limiting and circuit breaking can be built into the RPC framework or deployed as standalone components.

Rate limiting

Rate limiting is a common means of service self-protection. When the traffic a provider receives exceeds its threshold, it stops running the business logic and returns a rate-limiting exception instead.

Rate limiting can be implemented in the RPC framework or in the business system. But an architecture that implements it in the RPC framework, letting business developers focus on the business itself, is the better one.

The rate limiting function can be placed on the caller, the provider, or both.

When limiting traffic, the first step is to identify the source of the call. Traffic can be divided by dimension, from the application level down to the IP level, so that policies can be set per caller. For example, requests from an IP whose rate exceeds a certain QPS can be rejected.

Rate limiting algorithms

Common rate limiting algorithms include the Counter, Sliding Window, Leaky Bucket, and Token Bucket.

Counter algorithm

The counter algorithm's advantage is its simple implementation.

The counter algorithm divides time into windows, for example one window per minute, and processes at most X requests per window. Once the count exceeds X, further requests are rejected; when the next window begins, the count resets.

Here is the counter algorithm implemented using the Go language.

Simulate service entry.

// main_test.go
package limiting

import "fmt"

type Server struct {
	limit func() bool
}

func (s *Server) api() {
	pass := s.limit()
	if pass {
		fmt.Println("normal")
	} else {
		fmt.Println("overload")
	}
}

Counter algorithm implementation.

package limiting

import (
	"sync"
	"testing"
	"time"
)

// Counter allows at most `limit` requests per `interval` milliseconds.
func Counter(interval int, limit int) func() bool {
	var mu sync.Mutex
	startTime := makeTimestamp()
	requestCount := 0
	grant := func() bool {
		mu.Lock()
		defer mu.Unlock()
		if makeTimestamp() < startTime+int64(interval) {
			requestCount++
			return requestCount <= limit
		}
		// A new window begins: reset the counter.
		startTime = makeTimestamp()
		requestCount = 1
		return true
	}
	return grant
}

func makeTimestamp() int64 {
	return time.Now().UnixNano() / int64(time.Millisecond)
}

func TestCounter(t *testing.T) {
	limit := Counter(100, 3)
	server := Server{limit}
	for i := 0; i < 5; i++ {
		go func() {
			server.api()
		}()
	}
	time.Sleep(100 * time.Millisecond)
	for i := 0; i < 5; i++ {
		go func() {
			server.api()
		}()
	}
	time.Sleep(1 * time.Second)
}

Test results:

normal
normal
normal
overload
overload
normal
normal
normal
overload
overload

One problem with this algorithm is that it is easy to exceed the limit at the boundary between time windows. A malicious caller who knows where the boundary falls can send twice the per-window maximum within a short burst straddling it, potentially overwhelming the server. Counter-based limiting is therefore not recommended in production.

Sliding window algorithm

The sliding window algorithm divides the time axis into N slots and slides forward one slot as time passes; each slot has its own counter, which closes the loophole in the counter algorithm.

The finer the grid, the smoother the sliding window and the more accurate the limiting.

A minimal implementation is sketched below.
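
This sketch assumes the window is divided into fixed slots, each with its own counter; the slot bookkeeping here is illustrative:

package limiting

import (
	"sync"
	"time"
)

// SlidingWindow allows at most `limit` requests within the trailing window.
func SlidingWindow(window time.Duration, slots int, limit int) func() bool {
	var mu sync.Mutex
	slot := window / time.Duration(slots)
	counts := make(map[int64]int) // slot index -> request count
	return func() bool {
		mu.Lock()
		defer mu.Unlock()
		now := time.Now().UnixNano() / int64(slot)
		total := 0
		for idx, c := range counts {
			if idx <= now-int64(slots) {
				delete(counts, idx) // this slot fell out of the window
			} else {
				total += c
			}
		}
		if total >= limit {
			return false
		}
		counts[now]++
		return true
	}
}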

Leaky bucket algorithm

The leaky bucket algorithm compares requests to water and the service to a leaky bucket: water flows into the bucket and drips out at a fixed rate. When water arrives faster than it can drain, the bucket overflows, which corresponds to rejecting requests.
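
A minimal leaky-bucket sketch, treating the bucket as a meter that drains continuously; names and units are illustrative:

package limiting

import (
	"sync"
	"time"
)

// LeakyBucket holds at most `capacity` requests and leaks `rate` per second.
func LeakyBucket(capacity float64, rate float64) func() bool {
	var mu sync.Mutex
	water := 0.0
	last := time.Now()
	return func() bool {
		mu.Lock()
		defer mu.Unlock()
		now := time.Now()
		// Drain the water that leaked out since the last request.
		water -= now.Sub(last).Seconds() * rate
		if water < 0 {
			water = 0
		}
		last = now
		if water+1 > capacity {
			return false // bucket full: overflow, reject the request
		}
		water++
		return true
	}
}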

Some scenarios require not only a steady average processing rate but also a degree of burst handling.

The leaky bucket cannot support bursts; the token bucket algorithm addresses this.

Token bucket algorithm

The token bucket algorithm has the system generate tokens at a constant rate and place them in a bucket of fixed capacity; once the bucket is full, additional tokens are discarded. When a request arrives, it first takes a token from the bucket: if it gets one, processing continues; if the bucket is empty, the request is rejected.

Go's quasi-standard golang.org/x/time/rate package implements a rate limiter based on the token bucket algorithm.

Refer to the time/rate implementation.
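
As a usage sketch (assuming the golang.org/x/time/rate package), here is a limiter producing 3 tokens per second with a burst capacity of 5:

package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// 3 tokens per second, bucket capacity (burst) of 5.
	limiter := rate.NewLimiter(rate.Limit(3), 5)
	for i := 0; i < 10; i++ {
		if limiter.Allow() {
			fmt.Println(i, "normal")
		} else {
			fmt.Println(i, "overload")
		}
		time.Sleep(100 * time.Millisecond)
	}
}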

Single-node and cluster rate limiting

There are two approaches to rate limiting: single-node and cluster.

Single-node limiting: smooth. Because it is handled in process memory, it controls a single node's traffic very well; the drawback is that as the cluster expands it cannot control a caller's aggregate request traffic.

Cluster limiting: typically a centralized store such as Redis. It can control cluster-wide traffic, but introduces problems such as synchronization, race conditions, latency, and performance bottlenecks.

The essence of rate limiting is to prevent the current service from being overwhelmed by traffic peaks.

Common limiting schemes include Guava RateLimiter, Sentinel, Redis, etc.

Circuit breaking

Call chains in distributed systems are usually very long. When a downstream service times out, a badly handled failure can spread until the entire call chain becomes unusable. Limiting traffic on the provider side alone is therefore not enough; the caller must also handle problems caused by a failing provider. This mechanism is usually called a circuit breaker.

The circuit breaker mechanism is relatively simple: it is a state machine with three states, closed, open, and half-open. Normally the breaker is closed. When the provider misbehaves, failure information is recorded and collected; once it meets the breaking condition, the breaker switches to open, and requests to the service are intercepted by the breaker and sent down the failure path. After the breaker has been open for a while, it automatically switches to half-open and lets one request through to the provider. If that request is processed normally, the breaker closes; otherwise it switches back to open.
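
A minimal sketch of such a three-state breaker; the threshold and cooldown values are illustrative:

package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

type Breaker struct {
	mu        sync.Mutex
	state     State
	failures  int
	threshold int           // consecutive failures before tripping
	openedAt  time.Time
	cooldown  time.Duration // how long to stay open before probing
}

// Call runs f through the breaker, failing fast while the breaker is open.
func (b *Breaker) Call(f func() error) error {
	b.mu.Lock()
	if b.state == Open {
		if time.Since(b.openedAt) < b.cooldown {
			b.mu.Unlock()
			return errors.New("circuit open: failing fast")
		}
		b.state = HalfOpen // cooldown elapsed: let one probe through
	}
	b.mu.Unlock()

	err := f()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		// A failed probe, or too many consecutive failures, opens the breaker.
		if b.state == HalfOpen || b.failures >= b.threshold {
			b.state = Open
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	b.state = Closed // a success closes the breaker again
	return nil
}

func main() {
	b := &Breaker{threshold: 3, cooldown: time.Second}
	for i := 0; i < 5; i++ {
		fmt.Println(i, b.Call(func() error { return errors.New("provider timeout") }))
	}
}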

The essence of circuit breaking is to prevent the current service from being dragged down by a downstream service.

Here, again, is the adage: any problem in computer science can be solved by adding another level of indirection.

The common circuit breakers include Hystrix and Resilience4j.

Security authentication

RPCs between services usually run on the same intranet and are not exposed to the public network, so they are relatively secure.

When the overall system is small, RPC security authentication can look like a redundant step, since every caller and provider already knows the others.

However, as the system grows, one RPC interface may be used by N lines of business, which means more call volume and a higher probability of failure. To make troubleshooting possible, you need to know who the caller is.

The implementation idea of RPC security authentication is very similar to that of a Web Server.

First come the two basic flows: registration and login.

Which callers may invoke an RPC? A caller first registers with the provider; once the provider approves, the caller is a legitimate user. Since registration records little more than an IP, subsequent calls only need to carry that IP, which is effectively login-free.

The provider verifies that the IP address is legitimate to decide whether to continue processing the request.
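
A minimal sketch of that provider-side check, with a hypothetical allowlist:

package main

import (
	"errors"
	"fmt"
)

// approvedCallers is the set of caller IPs that registered and were approved.
var approvedCallers = map[string]bool{
	"10.0.0.5": true,
}

func authenticate(callerIP string) error {
	if !approvedCallers[callerIP] {
		return errors.New("unauthorized caller: " + callerIP)
	}
	return nil
}

func main() {
	fmt.Println(authenticate("10.0.0.5")) // <nil>
	fmt.Println(authenticate("10.0.0.9")) // unauthorized caller: 10.0.0.9
}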

Authorization can also be done on the basis of authentication.

With authorization added, the granularity can be refined further, for example down to the method level: which methods an authorized caller may invoke. This is similar to the RBAC model and depends on business requirements.

The above is the general idea; the concrete mechanism depends on how the RPC is implemented. For RPC over HTTP, you can use TLS or carry a token in the header. Other implementations can use general solutions such as asymmetric-key signatures or OAuth.

The only cost of adding security authentication is the extra performance overhead.

Similarities and differences between RPC and other technologies

RPC, LPC, and IPC

The counterpart of RPC is the Local Procedure Call (LPC). LPC refers to two processes on the same computer calling each other through shared memory. RPC is essentially the same as LPC, except that the two processes sit on different computers, which introduces the network.

Both RPC and LPC are forms of inter-process communication (IPC). Wikipedia defines IPC as follows:

Inter-process communication (IPC) refers to techniques or methods for transmitting data or signals between at least two processes or threads. A process is the smallest unit to which a computer system allocates resources (strictly speaking, the thread is). Each process has its own independent set of system resources, isolated from the others. IPC exists so that different processes can access each other's resources and coordinate their work. Typically, two applications using IPC are classified as client and server (see master-slave architecture), where the client requests data and the server responds to the client's data requests. Some applications are both server and client, which is common in distributed computing. These processes can run on the same computer or on different computers connected by a network. IPC is very important to the design of microkernels and nanokernels, which reduce the functionality provided by the kernel; that functionality is then obtained by communicating with servers via IPC, a significant increase in the number of IPCs compared with an ordinary monolithic kernel. [1]

We can think of RPC as a special type of IPC.

RPC, HTTP and MQ

In distributed systems, there are HTTP and MQ communication modes in addition to RPC.

The three differ mainly in their characteristics; different technical solutions suit different application scenarios.

MQ's biggest strength is peak shaving and valley filling, so it is naturally asynchronous and well suited to application decoupling. Most communication in distributed systems requires synchronous invocation, so MQ is not suitable for service-to-service communication there.

Both RPC and HTTP support synchronization and asynchrony, so the mainstream distributed system communication solutions are RPC and HTTP.

RPC tends not to be user-facing, and both parties know of each other's existence, so performance matters more than semantics and readability. RPC performs better than HTTP in most cases, which is why RPC is usually the first choice for communication within distributed systems.

RPC and RESTful

RESTful is the most mainstream style for designing remote service interfaces, applied chiefly to HTTP interface design.

RPC and REST differ in many ways, most fundamentally in design philosophy: RPC is procedure-oriented, while REST is resource-oriented.

In addition, there are many differences in the specific technology implementation.

Coupling

RPC couples the two sides tightly: the caller and provider usually must use the same RPC framework. REST has no such requirement and can be invoked from any client.

Message protocol

RPC typically uses very compact binary message protocols such as Protobuf and Thrift. REST uses loose text message protocols such as JSON and XML.

Communication protocol

RPC is most often implemented directly over TCP, while REST most commonly runs over HTTP and HTTP/2.

Performance

Because RPC uses binary message protocols and TCP for communication, its performance is much higher than REST's. This is one of the important reasons RPC has become the mainstream communication solution inside distributed systems.