Original: Little sister taste (WeChat public account: XJjdog). Feel free to share; please keep the attribution when reprinting.

Updated for 2020: the descriptions of several components have been refreshed since the 2019 edition. If you work on technology selection, or want a survey of the technologies that are popular right now, this article is for you.

This article covers 14 areas and hundreds of frameworks and tools. There will be some you like, and probably some you hate. These are the tools I work with most often, and they apply to companies large and small. If you know a better one, please leave a comment.

1. Message queue 2. Cache 3. Sub-database and sub-table 4. Data synchronization 5. Communication 6. Microservices 7. Distributed tools 8. Monitoring system 9. Scheduling 10. Entry tools 11. OLT(A)P 12. CI/CD 13. Troubleshooting 14. Local tools

1. Message queue

  • √ Recommended: (1) Throughput first: Kafka
  • (2) Stability first: RocketMQ
  • (3) Internet of Things: VerneMQ

A large distributed system is usually asynchronous, running on a message bus. As one of the most important basic components, the message queue plays a central role in the overall architecture. Asynchrony usually means a change in the programming model and weaker timeliness guarantees.
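To make the programming-model change concrete, here is a minimal JDK-only sketch of the "fire and forget" style: the producer returns as soon as the message is enqueued, and the consumer processes it on its own schedule. The `MiniBus` class is an in-process analogy invented for illustration, not the API of Kafka or any real broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// In-process analogy of a message bus: the producer returns immediately
// after enqueueing; the consumer drains messages on its own schedule.
public class MiniBus {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

    // "Fire and forget": the caller does not wait for processing.
    public boolean publish(String msg) {
        return queue.offer(msg);
    }

    // Drain whatever has arrived so far; a real consumer would loop forever.
    public List<String> consumeAll() {
        List<String> out = new ArrayList<>();
        queue.drainTo(out);
        return out;
    }

    public static void main(String[] args) {
        MiniBus bus = new MiniBus();
        bus.publish("order-created");
        bus.publish("order-paid");
        System.out.println(bus.consumeAll()); // [order-created, order-paid]
    }
}
```

The caller never learns when (or whether) processing finished; that is exactly the timeliness trade-off mentioned above.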

Kafka is by far the most commonly used message queue, especially in big data, with extremely high throughput. RocketMQ and RabbitMQ, both carrier-grade message queues, are more common in business services. By comparison, ActiveMQ is the least used and belongs to an older generation of messaging frameworks.

Pulsar is a young messaging system designed to solve some of Kafka's problems. Its toolchain is still limited. Some aggressive teams have tried it with good feedback, but not many actually use it.

MQTT is a protocol, mainly used in the Internet of Things; it supports bidirectional communication and falls within the message-queue category. VerneMQ is the recommended implementation.

Related articles:
  • “Holistic” Distributed messaging systems: design essentials
  • “Kafka” A 360-degree test: will Kafka lose data? Does its high availability meet requirements?
  • “Kafka” Using multithreading to increase Kafka’s consumption power
  • “AMQ” ActiveMQ architecture design and best practices, in 10,000 words
  • “MQ” Open-sourcing a Kafka enhancement: OKMQ-1.0.0

2. Cache

  • √ Recommended: (1) In-heap cache: the default, Caffeine
  • (2) Distributed cache: Redis in cluster mode, but mind its restrictions

Data caching is an effective way to reduce pressure on the database. There are standalone in-process Java caches, and there are distributed caches.

For standalone use, Guava’s LoadingCache and EhCache are the familiar names, but Spring Boot chose Caffeine as its default in-heap cache because Caffeine is faster.
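To show what an in-heap cache does at its core, here is a toy LRU cache built on `LinkedHashMap`'s access-order mode, using only the JDK. This is a sketch of the eviction idea only; Caffeine's actual W-TinyLFU policy is far more sophisticated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny in-heap LRU cache: the least-recently-used entry is evicted
// once the size cap is exceeded. Illustration only, not Caffeine's API.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: gets reorder entries
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry
    }
}
```

With a capacity of 2, putting `a`, `b`, touching `a`, then putting `c` evicts `b`, since `b` is the least recently used.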

For distributed caches, Redis is the first choice; don’t hesitate. Since Redis processes commands on a single thread (6.0 adds I/O threading, disabled by default), it is not suitable for time-consuming operations. For caching large values such as images and videos, memcached performs much better.

JetCache is a Java cache abstraction that provides a unified API and annotations to simplify cache usage. Like Spring Cache, it supports both local and distributed caches, and it is a useful tool for simplifying development.

Related articles:
  • “Redis” This may be the most pertinent Redis specification
  • “Redis” Getting up close and personal with the native Redis Cluster
  • “Redis” How powerful is Redis’s zset? Lend me your ear
  • “Redis” I’m so flustered: Redis has so many clustering schemes, which should I use?
  • “Protocol” Architecture secrets: Graft
  • “Redis” Redis is getting old — what antique client are you still using?
  • “In-heap” The new cache is indeed faster than Guava’s cache

3. Sub-database and sub-table

  • √ Recommended: Sharding-JDBC in ShardingSphere

Almost every company of any size has its own sharding solution. Currently, the recommendation is the driver-layer sharding-jdbc (now in Apache), or the proxy-layer MyCat. If you don’t have a dedicated ops team and don’t want to spend money on extra machines, go with the former.
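The heart of driver-layer sharding is routing: given a sharding key, pick the physical database and table. A minimal sketch of modulo routing follows; the `ShardRouter` class and its naming scheme are hypothetical, invented here to show the idea behind tools like sharding-jdbc, whose real routing rules are far richer.

```java
// Hypothetical driver-layer routing: map a sharding key to a physical
// database and table. sharding-jdbc supports far more routing strategies.
public class ShardRouter {
    private final int dbCount;
    private final int tablesPerDb;

    public ShardRouter(int dbCount, int tablesPerDb) {
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    // e.g. shardKey 11 with 2 dbs x 4 tables: 11 % 8 = 3 -> "db0.t_order_3"
    public String route(String logicalTable, long shardKey) {
        long slot = Math.floorMod(shardKey, (long) dbCount * tablesPerDb);
        long db = slot / tablesPerDb;       // which physical database
        long table = slot % tablesPerDb;    // which table suffix inside it
        return "db" + db + "." + logicalTable + "_" + table;
    }
}
```

Note that once data is distributed by such a rule, changing `dbCount` or `tablesPerDb` means re-distributing everything, which is exactly why the choice is so hard to walk back.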

If only a few projects are involved in sharding, Spring’s dynamic data sources are a good choice. The routing is written directly in code, which is intuitive but not easily extensible.

If only read/write separation is required, the replication protocol support in the official MySQL driver is a more lightweight option.
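The core decision in read/write separation can be sketched in a few lines: reads go to a replica, everything else goes to the master. The `ReadWriteRouter` below is a naive illustration invented here; real drivers also account for transactions, hints, and replication lag.

```java
import java.util.Locale;

// A naive read/write splitter: SELECTs round-robin across replicas,
// everything else goes to the master. Illustration only.
public class ReadWriteRouter {
    private final String master;
    private final String[] replicas;
    private int next = 0; // round-robin cursor over the replicas

    public ReadWriteRouter(String master, String... replicas) {
        this.master = master;
        this.replicas = replicas;
    }

    public String pick(String sql) {
        String head = sql.strip().toLowerCase(Locale.ROOT);
        if (head.startsWith("select")) {
            String replica = replicas[next % replicas.length];
            next++;
            return replica;
        }
        return master; // writes and DDL always hit the master
    }
}
```

A real implementation must also pin statements inside a transaction to the master, or reads may see stale data.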

The sharding components above are the survivors of their category. Unlike other components, once the solution is decided there is almost no way back, so choose carefully.

Sharding itself is the easy part; the preparation stage is the key — that is, data synchronization.

Related articles:
  • “Sharding” Sub-database and sub-table? Be careful with selection and process, or things get out of control
  • “Data synchronization” One data synchronization to cure all diseases
  • “Sharding” A sub-database sub-table HA practice
  • “MySQL official driver” The mystery of master/slave separation
  • “Sharding” Routing rules in reality may be more complex than you think
  • “Sharding” Non-standard SQL sharding-jdbc practices

4. Data synchronization

  • √ Recommended: canal

Domestically, MySQL dominates, but PostgreSQL usage has been increasing due to its superior performance.

Whatever the database, real-time data synchronization tools work by posing as a replica, pulling and parsing the change stream. Specifically, MySQL synchronizes via the binlog; PostgreSQL uses WAL logs.

For MySQL, Canal is the most widely used solution in China; Databus is also a useful tool.

Tools such as Canal and Maxwell can now write the data to be synchronized directly into MQ for downstream processing, which makes things much easier.

ETL (Extract, Transform, Load) tools basically follow the source → task → sink pattern, corresponding to the steps above. Gobblin, DataX, Logstash, Sqoop and others are just such tools.

Their main work lies in making configuration files easy to define and in writing adaptation interfaces for all kinds of data sources. These ETL tools can also serve as data synchronization tools (especially for full synchronization), usually keyed on an ID or a last-update timestamp.
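The source → task → sink shape shared by these tools can be sketched in a few lines of Java. The `MiniEtl` class below is a hypothetical skeleton for illustration: each stage is pluggable, and the pipeline just wires them together, which is essentially what the configuration files of DataX or Logstash describe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Sketch of the source -> task -> sink pipeline shape shared by ETL tools.
public class MiniEtl<I, O> {
    private final Supplier<List<I>> source; // where records come from
    private final Function<I, O> task;      // per-record transformation
    private final List<O> sink = new ArrayList<>(); // where they end up

    public MiniEtl(Supplier<List<I>> source, Function<I, O> task) {
        this.source = source;
        this.task = task;
    }

    public List<O> run() {
        List<O> batch = source.get().stream()
                .map(task)
                .collect(Collectors.toList());
        sink.addAll(batch); // a real sink would write to MQ, ES, a DB...
        return batch;
    }
}
```

Swapping the source for a JDBC reader and the sink for a Kafka producer turns this toy into the skeleton of a real synchronization job.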

Binlog-based tools handle the real-time increments, with ETL tools assisting on full loads. A complete data synchronization function usually requires several components working together as a whole.

Related articles:
  • “Cloud database” MySQL can’t take it — it can’t hold this much data!
  • “Data synchronization” Analyzing the general flow of an integrated middleware architecture through the Canal component
  • “Cloud database” A record of a messy solution downgrade (the rough road of hot/cold separation on the cloud)

5. Communication

  • √ Recommended: HTTP + JSON for easy debugging; a binary protocol when performance matters

In Java, Netty has become the undisputed network development framework, socketio included. Don’t talk to me about Mina. For the HTTP protocol, there is common-httpclient, plus the more lightweight okhttp.

An RPC needs a communication mode and a serialization format. JSON is the most commonly used serialization format, but its transmission and parsing costs are high; XML and similar text protocols carry a lot of redundant information. Avro and Kryo are binary serialization tools without these drawbacks, but they are inconvenient to debug.

RPC stands for remote procedure call. Thrift, Dubbo and gRPC are socket communication frameworks with binary serialization by default; Feign and Hessian are remote invocation frameworks built on HTTP.

By the way, gRPC’s serialization tool is Protobuf, a binary serialization tool with a high compression ratio.
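To see why binary beats text on the wire, here is the same record hand-encoded both ways using only the JDK. This is a size intuition only: real codecs like Avro, Kryo and Protobuf add schemas, varints and other tricks, so the exact numbers differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Compare the wire size of one record serialized as JSON vs. raw binary.
public class WireSize {
    public static int jsonBytes(long id, String name, int age) {
        // Field names travel with every record in JSON.
        String json = "{\"id\":" + id + ",\"name\":\"" + name
                + "\",\"age\":" + age + "}";
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    public static int binaryBytes(long id, String name, int age) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeLong(id);  // 8 bytes, no field name needed
            out.writeUTF(name); // 2-byte length prefix + UTF-8 payload
            out.writeInt(age);  // 4 bytes
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen in-memory
        }
    }
}
```

For `(1L, "Alice", 30)` the binary form is 19 bytes while the JSON form is 32 bytes, and the gap widens as field names get longer — but the JSON payload can be read with `cat`, which is exactly the debugging trade-off.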

In general, the response time of a service is mainly spent on the business logic and the database, with the communication layer taking a small proportion of the time. You can choose according to your company’s research and development level and business scale.

Related articles:
  • “Web development” With Netty, what exactly are we developing?
  • “WS” Eight questions on the WebSocket protocol

6. Microservices

  • √ Recommended: (1) Registry: Consul
  • (2) Gateway: nginx+Gateway
  • (3) Configuration center: Apollo
  • (4) Call chain: Skywalking
  • (5) Circuit breaker: resilience4j

We’ve talked about microservices more than once; this time let’s look at the architecture through the pile of supporting frameworks around it. Yes, we’re still talking about Spring Cloud.

The default registry, Eureka, is no longer maintained; Consul has become the preferred choice. Built on the Raft protocol, it works out of the box. Nacos, ZooKeeper and others can serve as alternatives. Nacos, with its admin console, better suits Chinese usage habits.

For circuit breaking, the official Hystrix is no longer maintained; resilience4j is recommended. Recently, Alibaba’s Sentinel has also shown strong momentum.
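The idea behind all of these libraries fits in a few lines: stop calling a failing dependency after a threshold, and retry after a cooldown. The `MiniBreaker` below is a deliberately simplified sketch invented here; resilience4j adds sliding windows, a proper half-open state with trial calls, metrics, and much more.

```java
// A minimal circuit breaker: trips open after N consecutive failures,
// lets traffic through again after a cooldown. Illustration only.
public class MiniBreaker {
    enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private final long cooldownMillis;
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private State state = State.CLOSED;

    public MiniBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    // Time is passed in explicitly so the logic is easy to test.
    public boolean allowRequest(long now) {
        if (state == State.OPEN && now - openedAt >= cooldownMillis) {
            state = State.CLOSED; // simplistic half-open: just allow retries
            consecutiveFailures = 0;
        }
        return state == State.CLOSED;
    }

    public void recordSuccess() { consecutiveFailures = 0; }

    public void recordFailure(long now) {
        if (++consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // stop hammering the failing dependency
            openedAt = now;
        }
    }
}
```

The point of the pattern: a fast local rejection is far cheaper than a slow remote timeout, and it gives the downstream service room to recover.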

Thanks to the rise of OpenTracing, there are many new faces in call-chain tracing. Jaeger or SkyWalking are recommended. Spring Cloud’s Sleuth + Zipkin combination is less capable than even the traditional, intrusive CAT.

A configuration center is a great tool for managing configuration files across multiple environments, especially when you don’t want to restart servers for every change. By far the best open-source option is Apollo, which provides support for Spring Boot. Disconf is also widely used. Spring Cloud Config is relatively limited in features and rarely used.


The most popular gateway is Nginx. On top of Nginx there is OpenResty, based on Lua scripting. Because OpenResty is quite cumbersome to use, higher-level gateways such as Kong have appeared.

For Spring Cloud, of the Zuul series Zuul 2 is recommended; Zuul 1 is multithreaded-blocking and struggles. Spring Cloud Gateway is Spring Cloud’s own child and is strongly promoted by the project. It is built on Spring 5.0’s new WebFlux, with Netty as the underlying network framework, giving it high throughput.

Related articles:
  • “Integral” Microservices are not everything, only a subset of a specific domain
  • “SCG” Spring Cloud Gateway 2.0, a future-facing technology — do you understand it?
  • “Trace” A 20k-word long read that instantly gives you “call chain” development experience
  • “Circuit breaker” The big boss of microservice circuit breaking

7. Distributed tools

As everyone knows, ZooKeeper can be used in many distributed scenarios; etcd and Consul, based on the Raft protocol, are similar.

Because they guarantee strong consistency, they are ideal coordination tools. Their usage concentrates in: configuration center, distributed locks, naming service, distributed coordination, leader election, and the like.

For distributed transactions, Alibaba’s Fescar (now Seata) provides support. But unless strictly necessary, it is better to use flexible transactions that pursue eventual consistency.

8. Monitoring system

  • √ Recommended: Prometheus + Grafana + Telegraf
  • Log collection: mostly ELKB, occasionally Loki

There are many kinds of monitoring components. At present, the combinations above are probably the most popular.

Zabbix is a great choice when the number of hosts is limited.

Prometheus has arrived with force and looks poised to dominate. Pair it with Grafana for a prettier front end.

InfluxData’s InfluxDB and Telegraf components are both easy to use, mainly because their features are complete.

Using the ELKB toolchain, with storage in ES, is also a good option; many companies I know use it.

Related documents:

  • “Overall” So many monitoring components — there is always a set that suits you
  • “Log” ELKB practice experience, with a set of complex configuration files thrown in
  • “Log” The “DNA” of log collection
  • “Log” Trying out Loki, a featherweight experience
  • “Log” Your wild logs: a logback-based log “specification” and “desensitization”
  • “Monitor” Prometheus, who once taught humans to use fire, now tries his hand at alerting
  • “APM” A 20k-word long read that instantly gives you “call chain” development experience
  • “APM” This round, SkyWalking won
  • “Low-level” The unpopular instrument package, with explosive capabilities
  • “Low-level” What’s yours is mine: three multithreading cases of local-variable passthrough

9. Scheduling

  • √ Recommended: xxl-job

You’ve probably all used cron expressions. The syntax originally came from the Linux crontab tool.
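For intuition, here is a toy matcher for a single cron field, handling only `*`, `*/step`, and comma-separated values. It is a sketch invented for illustration; real parsers such as Quartz's also handle ranges, names, and special characters like `L` and `W`.

```java
// Match one field of a cron expression against a value,
// e.g. the minute field "*/15" against minute 30. Toy version only.
public class CronField {
    public static boolean matches(String field, int value) {
        for (String part : field.split(",")) {
            if (part.equals("*")) {
                return true; // wildcard matches everything
            }
            if (part.startsWith("*/")) {
                int step = Integer.parseInt(part.substring(2));
                if (value % step == 0) return true; // every N units
            } else if (Integer.parseInt(part) == value) {
                return true; // exact value in a comma list
            }
        }
        return false;
    }
}
```

A full cron evaluation simply ANDs this check across the minute, hour, day, month, and weekday fields.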

Quartz is the old-school Java scheduler. Its distributed scheduling relies on database locks, and the management interface must be developed yourself.

Elastic-Job-Cloud is widely used, but its operation and maintenance is complex and the learning curve steep. By comparison, xxl-job is more lightweight, and as a system developed in China its admin console suits local tastes.

10. Entry tools

  • √ Recommended: LVS

To give users a unified access entry point, some ingress tools are generally used.

Among them, HAProxy, LVS, keepalived and so on are widely used.

Servers generally run the stable CentOS, managed with Ansible, which works great.

11. OLT(A)P

  • √ Recommended: ES

In today’s enterprises, data volumes are very large, and a data warehouse is a must.

For search, Solr and ElasticSearch are both popular, and both are based on Lucene. Solr is more mature and stable, but its real-time search is weaker than ES.

As for column storage, HBase on Hadoop is the most widely used. LevelDB, based on LSM trees, has excellent write performance, but is currently mostly used as an embedded engine.

TiDB is a domestic upstart compatible with the MySQL protocol; the company trains and exports DBAs, and its future looks promising.

For time-series databases, OpenTSDB is used more in very large monitoring systems. Druid and Kudu are better at real-time aggregation of multidimensional data.

Cassandra was popular for a while when it first came out, and despite news that Facebook abandoned it, its ecosystem is well established; it has stayed in the top 15 database engines for years.

12. CI/CD


To support continuous integration and virtualization, we have other tools besides the familiar Docker.

For packaging and release, Jenkins is the first choice; after all, it has been the big brother for years. The company behind IDEA also makes a tool called TeamCity, whose interface is silky smooth.

Sonar has to be called an artifact: after we adopted it, a friend’s code lit up red all over, and I nearly drowned in the flood of issues it reported.

Internally, GitLab is used to build the Git server. In fact, the GitLab CI that comes with it is also very easy to use.

Harbor, based on Docker Registry, adds governance capabilities such as permission control, auditing, image synchronization and a management interface. Recommended.

For orchestration there is k8s, open-sourced by Google, with strong community backing and plenty of production deployments. Rancher extends k8s with convenient tools for interacting with clusters: running command lines, managing multiple k8s clusters, and viewing node status. Recommended for integration.

Related articles:
  • “Continuous integration” Is releasing really so hard?
  • “Process” Technical reviews — what are you kidding about?
  • “Process” The invisible hand in R&D hurts a lot
  • With MinIO, would you still use FastDFS?

13. Troubleshooting

Java services often hit memory overflow problems. After exporting the heap with jmap, I usually use MAT for in-depth analysis.

For real-time online analysis, there are the Arthas and perf tools.

Of course, there are a large number of Linux tools to support this.

Related articles:
  • The most commonly used batch of Linux commands, parsed (10 years of selections)
  • The most commonly used set of Vim techniques
  • The most commonly used set of sed techniques
  • The most commonly used set of awk techniques

14. Local tools

There are many jars and tools for local use. Here are just a few of the most common ones.

For database connection pools, Druid is the most used. There is also HikariCP, which claims to be the fastest, plus the old-school DBCP and C3P0.
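What all of these pools share is a simple core: pre-create resources, borrow with a timeout, return after use. The `MiniPool` below is a bare-bones sketch of that idea using only the JDK; Druid and HikariCP add connection validation, leak detection, statistics, and lock-free fast paths on top.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// The skeleton of a connection pool: a fixed set of pre-created
// resources, borrowed with a timeout and returned after use.
public class MiniPool<T> {
    private final BlockingQueue<T> idle;

    public MiniPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.offer(factory.get()); // eagerly create all resources
        }
    }

    public T borrow(long timeoutMillis) {
        try {
            T t = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (t == null) throw new IllegalStateException("pool exhausted");
            return t;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public void release(T t) {
        idle.offer(t); // hand the resource back to other borrowers
    }
}
```

The timeout matters: under load, failing fast with "pool exhausted" beats letting request threads pile up forever, which is the same philosophy the real pools follow.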

For JSON, fastjson is the most used domestically, though it turns up a vulnerability every few days; abroad, Jackson is used more. The APIs are similar; Jackson has more features, but fastjson is easier to use. Given fastjson’s frequent security issues, there has been a wave of migration away from it.

In terms of toolkits, although there are various Commons packages, Guava is preferred.

End

Today is September 08, 2020.

I compile an article like this every year. Some entries are new faces; others I have personally battle-tested. Architecture selection, beyond trusting a technology you already know well, is mostly a matter of extensive research, comparison, and mastery.

Technology changes day by day: new bottles, old wine, a laundry list of nouns — programmers have it hard. Only the underlying principles and the idea of simplicity endure.

About the author: Little sister taste (XJjdog), a public account that won’t let programmers take detours. Focused on infrastructure and Linux. Ten years of architecture, ten billion requests of daily traffic; come discuss the high-concurrency world with me for a different taste.