Original article by Little Sister Taste (WeChat public account ID: XJjdog). You are welcome to share and reprint it; please keep the source attribution.

A few days ago, a friend from a training course asked me for a Java backend learning roadmap, so I sent him this article: “Must see! Java backend, bright sword takes the hindmost”. Today he asked for the most commonly used tools and frameworks for the Java backend. I had drawn a diagram of exactly that before, so I sent it to him. It is not exhaustive, but I was hoping for some praise. I did not expect…

1. Message queue
2. Cache
3. Database and table sharding
4. Data synchronization
5. Communication
6. Microservices
7. Distributed tools
8. Monitoring system
9. Scheduling
10. Entry tools
11. OLT(A)P
12. CI/CD
13. Troubleshooting
14. Local tools

1. Message queue

A large distributed system is usually asynchronous and runs on a message bus.
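To make the asynchronous idea concrete, here is a minimal in-process sketch, assuming a plain JDK `BlockingQueue` as a stand-in for the message bus. The class `MiniBus` and its stop marker are hypothetical names for illustration only; a real system would put Kafka, RocketMQ or similar between producer and consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-process stand-in for a message bus: the producer enqueues and moves on,
// while a separate consumer thread drains the queue at its own pace.
class MiniBus {
    private static final String STOP = "__stop__"; // hypothetical shutdown marker
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> handled = new ArrayList<>();
    private final Thread consumer;

    MiniBus() {
        consumer = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();       // blocks until a message arrives
                    if (msg.equals(STOP)) {
                        return;
                    }
                    synchronized (handled) {
                        handled.add(msg);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
    }

    // Asynchronous from the caller's point of view: returns right after enqueueing.
    void publish(String msg) {
        queue.add(msg);
    }

    // Sends the stop marker, waits for the consumer to finish, returns what it handled.
    List<String> shutdown() {
        queue.add(STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (handled) {
            return new ArrayList<>(handled);
        }
    }

    public static void main(String[] args) {
        MiniBus bus = new MiniBus();
        bus.publish("order-created");
        bus.publish("order-paid");
        System.out.println(bus.shutdown());
    }
}
```

The producer never waits for the consumer's work to finish, which is the property a message bus gives a distributed system; persistence, partitioning and delivery guarantees are what the real products add on top.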

Kafka is by far the most commonly used message queue, especially for big data, thanks to its extremely high throughput. RocketMQ and RabbitMQ, which are carrier-grade message queues, are more common in business services. In 2019, stop paying attention to JMS (that is, the bloated ActiveMQ).

Pulsar is a young messaging system designed to solve some of Kafka's problems. Its toolchain is still limited. Some adventurous teams have tried it and report good results.

MQTT is a specific protocol, used mainly for two-way communication in the Internet of Things; it also belongs to the message queue category.

2. Cache

Caching data is an effective way to reduce pressure on the database. There are in-process (single-machine) Java caches and distributed caches.

For in-process caching, Guava’s cache and EhCache are the familiar choices.
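As a rough illustration of what an in-process cache does, here is a minimal LRU sketch built only on the JDK's `LinkedHashMap`. It is not how Guava or EhCache are implemented; the class name `LruCache` is made up for this example, and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny in-process LRU cache built on LinkedHashMap's access-order mode.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order, oldest first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("user:1", "Alice");
        cache.put("user:2", "Bob");
        cache.get("user:1");           // touch user:1 so user:2 becomes the eldest
        cache.put("user:3", "Carol");  // exceeds capacity, evicts user:2
        System.out.println(cache.keySet());
    }
}
```

Production caches add expiry, statistics and concurrency control on top of this basic eviction idea, which is why a library is usually the better choice.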

For distributed caches, the preferred option is Redis; don’t hesitate. Because Redis executes commands on a single thread, it is not suitable for time-consuming operations. So for caching large objects, such as images and videos, memcached is a much better fit.

JetCache is a Java-based cache wrapper that provides a unified API and annotations to simplify cache usage. Like Spring Cache, it supports both local and distributed caches and is a great tool for simplifying development.

3. Database and table sharding

Almost every company of any size has its own plan for database and table sharding. For now, Sharding-JDBC at the driver layer or MyCAT at the proxy layer is recommended. If you have no extra operations team and do not want to pay for additional machines, go with the former.
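To show what routing at the driver layer boils down to, here is a minimal sketch of modulo-based shard routing. It does not use the Sharding-JDBC API; the class `ShardRouter`, the `db<N>.t_order_<M>` naming scheme and the choice of user ID as the sharding key are all assumptions made for illustration.

```java
// Hypothetical router: maps a user ID to one of dbCount databases,
// each holding tablesPerDb physical tables.
class ShardRouter {
    private final int dbCount;
    private final int tablesPerDb;

    ShardRouter(int dbCount, int tablesPerDb) {
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    // Global slot in [0, dbCount * tablesPerDb); floorMod keeps negative IDs in range.
    private int slot(long userId) {
        return (int) Math.floorMod(userId, (long) dbCount * tablesPerDb);
    }

    int dbIndex(long userId) {
        return slot(userId) / tablesPerDb;
    }

    int tableIndex(long userId) {
        return slot(userId) % tablesPerDb;
    }

    // Physical target such as "db1.t_order_3"; the naming scheme is made up for this sketch.
    String target(long userId) {
        return "db" + dbIndex(userId) + ".t_order_" + tableIndex(userId);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(2, 4);
        System.out.println(router.target(13L));
    }
}
```

With plain modulo routing, changing the number of databases or tables remaps most keys, which is one reason a sharding scheme is so hard to back out of once data has been written.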

Spring’s dynamic data sources are a good choice if only a few projects involve sharding. The routing is written directly in code, which is intuitive but not easy to extend.

If only read/write splitting is required, the replication protocol of the official MySQL driver is a more lightweight option.

The sharding components above are the ones that won out in the end. Unlike other components, once a sharding solution is chosen there is almost no way back, so choose carefully.

Sharding itself is the easy part; the key is the preparation stage, which is data synchronization.

4. Data synchronization

Most companies in China use MySQL, but PostgreSQL is increasingly used because of its excellent performance.

Whatever the database, real-time synchronization tools work by posing as a replica, then pulling and parsing the change data. Specifically, MySQL synchronizes through the binlog and PostgreSQL through the WAL.

For MySQL, Canal is the most used solution in China; Databus is also a useful tool.

Tools such as Canal and Maxwell can now write the synchronized data to an MQ for downstream processing, which makes things much easier.

ETL (extract, transform, load) tools basically follow a source, task, sink pipeline, which maps onto those three functions. Gobblin, DataX, Logstash, Sqoop, and others are just such tools.

Their main job is making configuration files easy to define and providing adapters for many kinds of data sources. These ETL tools can also be used for data synchronization (especially full synchronization), usually driven by ID, last update time, and so on.
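The "last update time" approach can be sketched as a watermark loop. This is a simplified in-memory illustration, not the mechanism of any particular ETL tool: the `Row` shape, the `syncOnce` method and the in-memory lists standing in for source and sink tables are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of watermark-based sync: copy only rows updated after the previous run.
class IncrementalSync {
    // Hypothetical row shape: an ID, a payload, and a last-update timestamp in millis.
    static final class Row {
        final long id;
        final String data;
        final long updatedAt;

        Row(long id, String data, long updatedAt) {
            this.id = id;
            this.data = data;
            this.updatedAt = updatedAt;
        }
    }

    private long watermark = Long.MIN_VALUE; // highest updatedAt already synchronized

    // Stands in for "SELECT ... WHERE updated_at > ?" followed by a write to the sink.
    List<Row> syncOnce(List<Row> source, List<Row> sink) {
        List<Row> batch = new ArrayList<>();
        long newWatermark = watermark;
        for (Row row : source) {
            if (row.updatedAt > watermark) {
                batch.add(row);
                newWatermark = Math.max(newWatermark, row.updatedAt);
            }
        }
        sink.addAll(batch);
        watermark = newWatermark; // advance only after the batch has been written
        return batch;
    }

    public static void main(String[] args) {
        List<Row> source = new ArrayList<>();
        List<Row> sink = new ArrayList<>();
        source.add(new Row(1, "a", 100));
        source.add(new Row(2, "b", 200));
        IncrementalSync sync = new IncrementalSync();
        System.out.println(sync.syncOnce(source, sink).size()); // first run is a full copy
        source.add(new Row(3, "c", 300));
        System.out.println(sync.syncOnce(source, sink).size()); // later runs copy only newer rows
    }
}
```

The first run behaves like a full synchronization because the watermark starts at the minimum value. Polling by timestamp cannot see deletes and can miss rows written with a timestamp equal to the current watermark, which is part of why it is usually paired with a log-based tool for real-time changes.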

Binlog-based tools handle real-time incremental changes, with ETL tools as a supplement. A data synchronization feature usually needs several components working together as a whole.

5. Communication

In Java, Netty has become the unrivaled framework for network development, socketio included (don’t get me started on Mina). For HTTP, there is commons-httpclient and the more lightweight OkHttp.

An RPC framework requires you to choose a communication mode and a serialization format. JSON is the most commonly used serialization format, but it is expensive to transmit and parse; XML and other text protocols are similar and carry a lot of redundant information. Avro and Kryo are binary serialization tools that avoid these drawbacks, but they are inconvenient to debug.
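The size difference between text and binary encodings is easy to see with the JDK alone. The sketch below hand-builds a JSON string and compares it with a fixed-layout binary encoding written through `DataOutputStream`. This is not the Avro or Kryo wire format; the record fields (`id`, `name`, `age`) and the class `EncodingSize` are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EncodingSize {
    // Text encoding: field names travel with every record (hand-built JSON, no escaping).
    static byte[] toJson(long id, String name, int age) {
        String json = "{\"id\":" + id + ",\"name\":\"" + name + "\",\"age\":" + age + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: both sides agree on field order and types, so no names are sent.
    static byte[] toBinary(long id, String name, int age) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(id);   // 8 bytes
            out.writeUTF(name);  // 2-byte length prefix plus modified UTF-8 bytes
            out.writeInt(age);   // 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("json bytes:   " + toJson(1234567890123L, "alice", 30).length);
        System.out.println("binary bytes: " + toBinary(1234567890123L, "alice", 30).length);
    }
}
```

The binary form is smaller because the schema lives in code rather than in each message, and that is also why it is harder to debug: the bytes mean nothing without the schema.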

RPC stands for remote procedure call. Thrift, Dubbo and gRPC are all socket-based communication frameworks that use binary serialization by default. Feign and Hessian are remote invocation frameworks over HTTP.

By the way, gRPC’s serialization tool is Protobuf, a binary serialization tool with a high compression ratio.

In general, the response time of a service is mainly spent on the business logic and the database, with the communication layer taking a small proportion of the time. You can choose according to your company’s research and development level and business scale.

6. Microservices

We’ve talked about microservices more than once, but this time let’s look at the architecture through the supporting frameworks that surround it. Yes, we’re still talking about Spring Cloud.

The default registry, Eureka, has stalled (work on its 2.x line was discontinued), so Consul has become the preferred choice. Nacos, ZooKeeper, and others can be used as alternatives. Nacos comes with an admin console and suits the habits of Chinese users better.

For circuit breaking, the official Hystrix is in maintenance mode and no longer actively developed. Resilience4j is recommended. Alibaba’s Sentinel has also been making a strong showing recently.
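For readers who have not met the pattern, here is a minimal circuit breaker sketch. It is not the Hystrix, Resilience4j or Sentinel API; the class `SimpleCircuitBreaker`, its two parameters and the consecutive-failure rule are simplifying assumptions, and it holds a lock during the call, which a real library would avoid.

```java
import java.util.function.Supplier;

// Minimal circuit breaker: opens after N consecutive failures, then fails fast
// until a cool-down has passed, after which one trial call is allowed through.
class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before a trial call
    private int failures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback.get();      // open: fail fast without touching the service
        }
        try {
            T result = action.get();    // closed, or a half-open trial call
            failures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (failures >= failureThreshold) {
                openedAt = now;         // trip (or re-trip) the breaker
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 5_000);
        Supplier<String> flaky = () -> { throw new IllegalStateException("service down"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(flaky, () -> "cached default"));
        }
        System.out.println("open: " + breaker.isOpen());
    }
}
```

The point of the pattern is the fail-fast branch: once the breaker is open, callers get the fallback immediately instead of piling up threads on a dependency that is already struggling.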

For call-chain tracing, the rise of OpenTracing has brought many new faces. Jaeger or SkyWalking are recommended. Spring Cloud’s Sleuth + Zipkin integration is less capable than even the traditional, intrusive CAT.

A configuration center is a great tool for managing configuration files across multiple environments, especially if you don’t want to restart the server. By far the best open source option is Apollo, which provides support for Spring Boot. Disconf is also widely used. Spring Cloud Config is relatively limited in features and is rarely used.


The most popular gateway is Nginx. On top of Nginx there is OpenResty, which adds Lua scripting. Because OpenResty is fairly cumbersome to use, gateways with a higher level of encapsulation, such as Kong, have appeared.

Within Spring Cloud, if you use the Zuul family, Zuul 2 is recommended; Zuul 1 uses a multi-threaded blocking model and has serious limitations. Spring Cloud Gateway is Spring Cloud’s own offspring, but it is not yet widely used.

7. Distributed tools

ZooKeeper is well known and fits many distributed scenarios; etcd and Consul, which are based on the Raft protocol, are similar.

Because they guarantee strong consistency, they are ideal as coordination tools. Typical uses include configuration centers, distributed locks, naming services, distributed coordination, and master election.

For distributed transactions, Alibaba’s Fescar provides support. But unless it is truly necessary, it is better to use flexible transactions and aim for eventual consistency.

8. Monitoring system

There are many kinds of monitoring components. At present, the four below are probably the most popular.

Zabbix is a great choice for a limited number of hosts.

Prometheus is advancing fast and looks poised to dominate. It can be paired with Grafana for a better-looking front end.

The InfluxDB and Telegraf components from InfluxData are both easy to use, largely because they are feature-complete.

The ELKB toolchain, with data stored in ES, is also a good choice. Many companies I know use it.

9. Scheduling

You’ve probably all used cron expressions. This expression originally came from the Linux crontab tool.
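For anyone who has only copied cron expressions from elsewhere, here is a minimal sketch of how a five-field crontab expression (minute, hour, day of month, month, day of week) can be matched against a point in time. It is a simplification, not the parser used by crontab or Quartz: it supports only `*`, single values, ranges, steps and comma lists, accepts 0-6 for day of week, and requires all five fields to match. The class `CronSketch` is a made-up name.

```java
import java.time.LocalDateTime;

class CronSketch {
    // Matches one field. Supports "*", "*/n", "a", "a-b", "a-b/n" and comma lists.
    // min is the smallest legal value of the field (0 for minutes, 1 for days).
    static boolean fieldMatches(String field, int value, int min) {
        for (String part : field.split(",")) {
            String range = part;
            int step = 1;
            int slash = part.indexOf('/');
            if (slash >= 0) {
                range = part.substring(0, slash);
                step = Integer.parseInt(part.substring(slash + 1));
            }
            if (range.equals("*")) {
                if ((value - min) % step == 0) {
                    return true;
                }
                continue;
            }
            int dash = range.indexOf('-');
            int lo = Integer.parseInt(dash >= 0 ? range.substring(0, dash) : range);
            int hi = dash >= 0 ? Integer.parseInt(range.substring(dash + 1)) : lo;
            if (value >= lo && value <= hi && (value - lo) % step == 0) {
                return true;
            }
        }
        return false;
    }

    // Matches "minute hour day-of-month month day-of-week" against a timestamp.
    static boolean matches(String expr, LocalDateTime t) {
        String[] f = expr.trim().split("\\s+");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + expr);
        }
        int dow = t.getDayOfWeek().getValue() % 7; // Sunday = 0, as in crontab
        return fieldMatches(f[0], t.getMinute(), 0)
            && fieldMatches(f[1], t.getHour(), 0)
            && fieldMatches(f[2], t.getDayOfMonth(), 1)
            && fieldMatches(f[3], t.getMonthValue(), 1)
            && fieldMatches(f[4], dow, 0);
    }

    public static void main(String[] args) {
        // Every 15 minutes, 09:00 to 17:59, Monday to Friday.
        String expr = "*/15 9-17 * * 1-5";
        System.out.println(matches(expr, LocalDateTime.of(2019, 8, 13, 9, 30)));
    }
}
```

A scheduler essentially evaluates a check like this once per minute (or computes the next matching time ahead of schedule); the frameworks in this section add persistence, clustering and a management console around that core.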

Quartz is the veteran scheduling solution in Java. Its distributed scheduling relies on database locks, and you have to build the management UI yourself.

Elastic-Job-Cloud is widely used, but it is complex to operate and has a steep learning curve. XXL-JOB is more lightweight by comparison. As a system built by Chinese developers, its admin console looks good.

10. Entry tools

To give users a unified access entry point, some entry tools are generally used.

Among them, HAProxy, LVS, Keepalived and so on are widely used.

Servers generally run the stable CentOS and are managed with Ansible, which works very well.

11. OLT(A)P

Enterprises today hold very large amounts of data, so a data warehouse is a must.

For search, Solr and ElasticSearch are popular, both based on Lucene. Solr is more mature and more stable, but its real-time search is not as good as ES’s.

As for column storage, HBase, which is based on Hadoop, is the most widely used. LevelDB, which is based on LSM trees, has excellent write performance, but is currently used mostly as an embedded engine.

TiDB is a Chinese newcomer that is compatible with the MySQL protocol. The company cultivates DBAs through its training programs, and its future looks promising.

Among time-series databases, OpenTSDB is used more in very large monitoring systems. Druid and Kudu are better at real-time aggregation of multi-dimensional data.

Cassandra was popular for a while when it first came out, and despite the news that Facebook abandoned it, its ecosystem is established and it has been among the top 15 database engines for years.

12. CI/CD

To support continuous integration and virtualization, we have other tools besides the familiar Docker.

Jenkins is the first choice for building and releasing; after all, it has been the big brother for many years. The company behind IDEA also makes a tool called TeamCity, whose interface is very smooth.

Sonar (note the typo in the picture) is, I have to say, a godsend. After we adopted it, my colleagues’ code lit up red, and I nearly drowned in their spit.

For internal use, GitLab is the usual way to host Git servers. Its built-in GitLab CI is also very easy to use.

13. Troubleshooting

Java applications often run into out-of-memory problems. After exporting a heap dump with jmap, I usually use MAT for in-depth analysis.

For real-time online analysis, there are tools such as Arthas and perf.

Of course, there are a large number of Linux tools to support this. Like these:

Parsing the Most Frequently Used Batch of Commands on Linux (10 Years of Picks)

14. Local tools

There are many jars and tools for local use. Here are just a few of the most common ones.

For database connection pools, Druid is the most used. There is also HikariCP, which claims to be the fastest, as well as the old-school DBCP and C3P0.

For JSON, fastjson is used most in China, though a new vulnerability seems to turn up every few days; abroad, Jackson is used more. Their APIs are similar; Jackson has more features, but fastjson is easier to use.

In terms of toolkits, although there are various Commons packages, Guava is preferred.

End

Today is August 13, 2019. Typhoon Lekima has just passed.

I put together an article like this every year. Some entries are new faces; some old ones I dropped myself. Architecture selection is partly about using the technology you already know well and trust, but it is even more about doing plenty of research, comparison, and hands-on learning.

Technology changes by the day: old wine in new bottles, an endless list of buzzwords, and a hard life for programmers. Only the underlying principles and simple ideas endure.

About the author: Little Sister Taste (XJjdog), a public account that does not let programmers take detours. It focuses on infrastructure and Linux. With ten years of architecture experience and 10 billion requests of daily traffic, I explore the high-concurrency world with you and offer a different flavor. My personal WeChat is xJJdog0; feel free to add me for further discussion.
