This is the 23rd day of my participation in the November Gwen Challenge, the last Gwen Challenge of 2021.

Preface

Soul is a business gateway. It has since been submitted to Apache for incubation and renamed ShenYu.

This article still uses the name Soul. When I rolled this gateway out, the version was org.dromara:soul:2.2.1, which predates the rename. We still run that version because we have had no time to upgrade and it is now too far behind the latest major version.

The descriptions and problems below apply only to this release. Some of them may already be resolved in the latest version, so do not assume they carry over.

Some of our code was tidied into a more general form and submitted to the community. Much of it was not, either because it was written hastily or because it only fits our own scenarios. For other parts I have been meaning to clean them up and submit them, but the implementation felt too rough and I had no time, so they were never submitted.

Gateway selection

Why Soul

When selecting an open source component, we may weigh many factors: business scenarios, the team's existing technical expertise, component features, component performance, community activity, and so on.

We chose this gateway based on our group's situation at the time.

When colleagues asked why we chose Soul, I explained that staffing cost was one of the reasons. Among the open source business gateways in the industry (Zuul aside), published performance comparisons show differences, but not of several orders of magnitude. After all, this is not a traffic gateway, and performance is not the only consideration.

What our group valued was Soul's extensibility, its relatively complete console, and its flexible routing rules, because these save us development time.

By the way, here is why we did not choose Spring Cloud Gateway. If stability were the only concern, it would be the first choice: it is backed by the Spring community and is well known as a Spring Cloud component.

At that time our group had three people: the TL (tech lead), one other developer, and me. I was responsible for rolling out the gateway, and I could not have built a console quickly in such a short time.

The gateway selection

The TL was mainly responsible for the gateway selection. After analyzing more than a dozen gateways, the TL narrowed the choice down to two, Fizz and Soul. I then did the final research to decide between them.

I only had a brief look at the two gateways at the time. Here is the comparison. Some of it may no longer apply, and some descriptions may be inaccurate because I did not spend much time on them, so treat it as a reference only.

| Feature | Soul | Fizz |
| --- | --- | --- |
| Visual management platform | Yes | Yes |
| Routing rules | Routes according to the URL requested by the caller (the original URL), supporting fuzzy matching, regular expressions, and so on. Plug-ins are provided to make simple adjustments to the URL mapping | Gateway nodes can be grouped, and different gateway nodes support different routing rules. URL alias mapping is supported on top of URL routing. Route types include service orchestration, service discovery, and reverse proxy |
| Plug-in scope | Application level and interface level | Service (interface) level |
| Service information exposure | Automatic registration (code intrusion); manual registration is relatively cumbersome | Manual maintenance; no code-level intrusion on the service provider |
| Multilingual support | Java supports automatic registration; other languages are configured manually | No code intrusion; HTTP interfaces are registered manually, language independent |
| Supported protocols | http, spring cloud, dubbo, tars, sofa | http, spring cloud |
| Custom plug-in configuration | Supported and quite flexible; custom plug-ins are Maven dependencies, so the code is decoupled | Supported, but plug-ins must be developed inside Fizz, so the code is coupled |
| Service orchestration | Not supported | Relatively complete, but costly to use for complex business logic |
| Default plug-ins | Authentication, circuit breaking, rate limiting, and traffic whitelist | Only authentication and flow control |
| Interface authentication | The waf plug-in supports whitelist authentication and is disabled by default | MD5 signature by default; supports custom plug-in authentication and application whitelist |
| User permission configuration | Not supported or incomplete | A relatively complete permission module, supporting role configuration, menu permissions, data permissions, personnel and organization information, etc. |
| Open source | Open source | The console is commercially licensed |
| Monitoring | Supports Prometheus | Built-in monitoring panel |

The most attractive thing about Fizz compared with Soul is service orchestration, but its console is not open source. Since it requires a commercial license, I could not consider it at all, so I chose Soul.

Some other factors

Existing technical expertise can also weigh heavily in a selection, since many people prefer what they already know. Most of the microservice components I worked on at previous companies were developed in-house. I studied open source components mainly for benchmarking and rarely used them in real projects, so I did not judge any of them by "emotion".

Community activity also matters, and so far it has been great. While I was rolling out the gateway, the individual bugs and PRs I submitted were handled quickly. By comparison, some of my earlier PRs to RocketMQ waited up to half a year before being processed, and I had to patch some special-case bugs locally until the next release. RocketMQ may simply be relatively stable, which could be one reason for the slow processing.

Putting it into practice

Once we had a basic understanding and had confirmed the choice, the routine work followed:

SpringCloud plug-in menu configuration enhancements

By default, routing rules for the springCloud plug-in are created either by automatic registration from Java services or by manual configuration in the web console (configured on the springCloud plug-in only). The divide plug-in plays a role similar to that of a registry. The configuration was a bit cumbersome (I was new to this and not familiar enough with it at the time).

Our company mainly uses Java and Go, with a little PHP and Python. At that time only Java supported automatic registration, so the gateway was not friendly to use for the other languages.

In addition, automatic registration for Java projects requires extra JAR dependencies and annotations in the code. That is a degree of code intrusion, and business teams were reluctant to accept it when we promoted the gateway to them.

The registry we used was Nacos, so we adapted the Spring Cloud plug-in to the registry to allow quick manual configuration.

To reduce intrusion into the original Soul code, I used aspects (AOP) to intercept requests. My thinking was that when I upgraded Soul to a new version I could copy the code over directly, something like this:
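The idea of hooking in from the outside, rather than editing the Soul sources, can be illustrated with a minimal sketch. This is not the code from our project: it uses a plain JDK dynamic proxy as a stand-in for a Spring AOP aspect so that it runs without extra dependencies, and `SelectorService`, `DefaultSelectorService`, and `createSelector` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a service inside the gateway that we do not want to edit.
interface SelectorService {
    String createSelector(String serviceName);
}

// The "original" implementation, left untouched.
class DefaultSelectorService implements SelectorService {
    @Override
    public String createSelector(String serviceName) {
        return "selector:" + serviceName;
    }
}

class InterceptionSketch {
    // Records the raw arguments the interceptor saw, so the effect is observable.
    static final List<String> intercepted = new ArrayList<>();

    // Wraps the target so custom logic runs before the original method,
    // similar in spirit to an "around" advice in AOP.
    static SelectorService wrap(SelectorService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if ("createSelector".equals(method.getName())) {
                // Custom behavior lives here, outside the original class.
                intercepted.add((String) args[0]);
                args[0] = ((String) args[0]).trim();
            }
            return method.invoke(target, args);
        };
        return (SelectorService) Proxy.newProxyInstance(
                SelectorService.class.getClassLoader(),
                new Class<?>[] {SelectorService.class},
                handler);
    }

    public static void main(String[] args) {
        SelectorService service = wrap(new DefaultSelectorService());
        System.out.println(service.createSelector("  Order-Service "));
    }
}
```

The original class stays byte-for-byte the same, which is what makes this style attractive for upgrades. The cost is that the interceptor still depends on the names and signatures of the methods it wraps, so it can break when those change upstream.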

I just did not expect that, as our changes piled up, upgrading would become harder and harder.

Nacos version alignment

When I first used Soul, the Nacos version it depended on was not the same as the one we used. I then found that the Nacos versions inside Soul were themselves inconsistent, with several versions in use, so I unified them to a single version.

However, this part was not submitted to the community because it was not fully verified. Some of the plug-ins depend on particular Nacos versions, and I am not sure where those are used.

We only tested what we use ourselves to make sure there were no problems. We did not have time to test the Nacos versions that other plug-ins rely on, so we did not submit the change. In any case, the community has probably upgraded the Nacos client to 2.x by now.

Write the documentation: rule configuration and plug-in customization specifications

Because the gateway is used by business teams, it needs a uniform specification covering selector creation and rule format. However, the menu and data permissions in this version are not complete enough, so configuration has not been fully opened up for teams to do themselves. I am still the one who configures projects when they are onboarded.

Integrate plug-ins provided by business teams

I wrote up the plug-in usage specification, and business teams develop custom plug-ins for their own needs, which are then integrated into the gateway.

Plug-in integration is still not very convenient. After a team has developed a plug-in, I need to repackage and redeploy the gateway, and if there is any problem they need me to release it again. I considered supporting OSGi-style dynamic loading of dependencies, but I never found the time. It has been on hold ever since and probably will not be available for a long time.

Migrate the old gateway's filter to a Soul plug-in

The old gateway had a filter for authentication and authorization, which I converted into a Soul plug-in. The old gateway relied on QConf with hard-coded configuration. While I was at it, I changed it to use Nacos for dynamic configuration and load balancing (with some business-specific modifications). It is based on the company's single sign-on, and it is now the unified authentication and authorization method used by most of the company's projects that go through the gateway.

Cluster deployment and domain names

I worked with the operations team to deploy the gateway as a cluster behind nginx, exposing both a public domain name and an intranet domain name.

Cross-domain problem

Soul itself supported cross-domain requests, but at the time there was a minor bug that made the configuration items ineffective. I reported this issue to the community to be fixed.

Logging platform

Collect logs to ELK and process local logs.

Develop a logging plug-in

The plug-in can print the request headers, request body, response headers, response body, and other information. This makes online troubleshooting more convenient in some cases, especially when custom plug-ins add fields to the request headers and you want to see what was actually sent and returned. The plug-in code, tidied up into a general form, has been submitted to the community.

I have been meaning to also log the request duration, but months later I still have not found the time to add it.


By this stage the gateway was basically ready for use, and some projects had started to connect to it.

The following items are later developments and configurations that I added whenever I could find the time.
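The shape of such a plug-in can be sketched as follows. This is a simplified illustration rather than the code submitted to the community: `LoggedExchange` and `GatewayPlugin` are hypothetical types invented for this example, the chain is synchronous, and log lines are collected in a list instead of being written to a real logger.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of one request/response pair passing through the gateway.
class LoggedExchange {
    final Map<String, String> requestHeaders = new LinkedHashMap<>();
    String requestBody = "";
    final Map<String, String> responseHeaders = new LinkedHashMap<>();
    String responseBody = "";
}

// Hypothetical plug-in contract: do some work, then hand over to the rest of the chain.
interface GatewayPlugin {
    void execute(LoggedExchange exchange, Runnable next);
}

class LoggingPluginSketch implements GatewayPlugin {
    // Collected log lines; a real plug-in would write to a logger instead.
    final List<String> lines = new ArrayList<>();

    @Override
    public void execute(LoggedExchange exchange, Runnable next) {
        // Request data is available before the downstream call.
        lines.add("request headers: " + exchange.requestHeaders);
        lines.add("request body: " + exchange.requestBody);
        next.run();
        // Response data only exists after the rest of the chain has run.
        lines.add("response headers: " + exchange.responseHeaders);
        lines.add("response body: " + exchange.responseBody);
    }

    public static void main(String[] args) {
        LoggedExchange exchange = new LoggedExchange();
        exchange.requestHeaders.put("X-User-Id", "42");
        exchange.requestBody = "{\"orderId\":1}";
        LoggingPluginSketch plugin = new LoggingPluginSketch();
        // The lambda plays the role of the downstream service filling in the response.
        plugin.execute(exchange, () -> {
            exchange.responseHeaders.put("Content-Type", "application/json");
            exchange.responseBody = "{\"ok\":true}";
        });
        plugin.lines.forEach(System.out::println);
    }
}
```

Because the plug-in wraps the rest of the chain, it also sees headers that earlier custom plug-ins have added to the request, which is the case that matters most when troubleshooting.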


Smooth migration from the old gateway

The old gateway was based on Spring Cloud Gateway. Many projects used it with different routing rules, all of which had to migrate smoothly to the new gateway. I therefore tweaked Soul's context_path and rewrite plug-ins for our scenarios, so that the Spring Cloud Gateway configuration rules we actually use (and only those) map smoothly onto Soul configuration.
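The kind of path handling involved can be sketched as below. Which rules we migrated is not listed here, so this is only an assumed example: stripping a context path and then applying a regex rewrite, with `stripContextPath`, `rewrite`, and the sample paths all hypothetical rather than taken from the Soul plug-ins.

```java
import java.util.regex.Pattern;

class PathRewriteSketch {
    // Removes a leading context path such as "/order" when the request path starts with it.
    static String stripContextPath(String path, String contextPath) {
        if (path.equals(contextPath)) {
            return "/";
        }
        if (path.startsWith(contextPath + "/")) {
            return path.substring(contextPath.length());
        }
        // Paths that merely share a prefix, like "/orders", are left alone.
        return path;
    }

    // Applies a regex-based rewrite, for example "^/api/(.*)$" -> "/$1".
    static String rewrite(String path, String regex, String replacement) {
        return Pattern.compile(regex).matcher(path).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String stripped = stripContextPath("/order/api/v1/list", "/order");
        System.out.println(stripped);
        System.out.println(rewrite(stripped, "^/api/(.*)$", "/$1"));
    }
}
```

Keeping the two steps separate mirrors the split between a context path and a rewrite rule, so an old rule can be translated by deciding which of the two steps it corresponds to.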

Excluding some registry nodes

A calls C (A->C), and B calls C (B->C). Assume that C registers five instances in the registry. When A calls some interfaces of C, only the first three instances can be called, and when B calls C, only the last three instances can be called.

I do not really understand why anyone needs to call only some of the instances, either.

So I added this feature: when load balancing over the instances in the registry, some nodes can be excluded, as follows:

After building it, I ended up using it more often for a different scenario: some Nacos nodes temporarily could not be taken offline, yet traffic could not reach them, so I excluded them.

I intended to polish this code and submit it to the community, but several months have passed without my doing so. The implementation in our own project is a little rough, although it works, and I do not know whether the community needs this feature, since it is not necessarily a general requirement. So I have not yet made the time to polish it for submission.

I will try to submit it later if I have time.
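A minimal sketch of the idea, not our actual implementation: the instance list would normally come from the registry, while here it is a plain list of hypothetical names, and round-robin is only an assumed load-balancing strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class ExcludingLoadBalancerSketch {
    private final AtomicInteger counter = new AtomicInteger();

    // Returns the instances that remain after removing the excluded ones, keeping their order.
    static List<String> eligible(List<String> instances, Set<String> excluded) {
        List<String> result = new ArrayList<>();
        for (String instance : instances) {
            if (!excluded.contains(instance)) {
                result.add(instance);
            }
        }
        return result;
    }

    // Round-robin over the eligible instances only.
    String choose(List<String> instances, Set<String> excluded) {
        List<String> candidates = eligible(instances, excluded);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no eligible instance");
        }
        int index = Math.floorMod(counter.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }

    public static void main(String[] args) {
        // Service C has five instances; caller A may only use the first three.
        List<String> instancesOfC = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        Set<String> excludedForA = new HashSet<>(Arrays.asList("c4", "c5"));
        ExcludingLoadBalancerSketch balancer = new ExcludingLoadBalancerSketch();
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.choose(instancesOfC, excludedForA));
        }
    }
}
```

Filtering before choosing keeps the exclusion independent of the load-balancing strategy, so the same exclusion list would work with a random or weighted strategy as well.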

Monitoring and alerting

I did not use the monitoring panel provided by the community directly, mainly because some of its indicators did not matter to me. Instead I added several indicators of my own for monitoring and alerting.

I usually focus on the following indicators (I added them later and have not had time to submit them to the community; I will do so once the code is polished and work is less busy):

TPS for a single gateway instance and TPS per service:

Services with abnormal gateway forwarding, and the number of abnormal forwards in the last minute:

YGC frequency in the last hour:

FGC frequency in the last hour:

In terms of performance, I do not know which indicator is best suited for alerting, so for now I only alert on frequent YGC or FGC. The following is an example of previous alarms:

An alarm is also raised when gateway forwarding is abnormal (mainly timeouts or failures of our service authentication; forwarding failures themselves are not yet covered by monitoring and alerting).
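The GC-frequency check can be sketched in-process with the JDK's management beans. This only illustrates the counting logic and is not our monitoring setup: the threshold is an arbitrary assumption, and which bean corresponds to young or full collections depends on the collector the JVM is running with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

class GcAlarmSketch {
    // Cumulative collection count per collector, keyed by the collector's bean name.
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new HashMap<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount(); // -1 when the count is undefined
            counts.put(bean.getName(), Math.max(count, 0L));
        }
        return counts;
    }

    // Collections that happened between two snapshots, per collector.
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : after.entrySet()) {
            long previous = before.getOrDefault(entry.getKey(), 0L);
            result.put(entry.getKey(), entry.getValue() - previous);
        }
        return result;
    }

    // True when the collections within the window exceed the (assumed) threshold.
    static boolean shouldAlarm(long collectionsInWindow, long threshold) {
        return collectionsInWindow > threshold;
    }

    public static void main(String[] args) {
        Map<String, Long> before = snapshot();
        System.gc(); // only a hint to the JVM; it may or may not trigger a collection
        Map<String, Long> inWindow = delta(before, snapshot());
        long hypotheticalThreshold = 10;
        for (Map.Entry<String, Long> entry : inWindow.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue()
                    + " collections, alarm=" + shouldAlarm(entry.getValue(), hypotheticalThreshold));
        }
    }
}
```

In practice the snapshots would be taken on a schedule, for example once per hour to match the indicators above, and the alarm would be sent through whatever alerting channel is already in place.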

Conclusion

There are still many areas related to our business that need optimizing and improving, but for lack of time we have not attended to them yet. We will add more optimizations in the future, and where an implementation turns out well we will do our best to find the time to submit it to the community.