1 Origin of the problem

It is worth looking at what went wrong with our Node service, why it happened, and why we turned to GraphQL to solve these problems. I will start with the service architecture and a brief project background, then use several cases to show concretely how our current problems arose.

1.1 Introduction to Service Architecture



The left box contains the services of our front-end team, and the right box contains the back-end services. This is only a rough structure that omits details such as load balancing, but for our domestic hotel service it is enough to explain our problems.

The Node team sits between upstream and downstream: it is a back end that directly serves the front end, the so-called BFF (Backend for Frontend) layer. I will explain how this architecture came about later.

For now, our main responsibilities are multi-client adaptation, UI adaptation, version control, delivering A/B experiments, log collection, and so on.

Although I have listed many responsibilities above, our core function is really to deliver data to the client. So what problems occur in the process of delivering data? The following cases illustrate how they arise.

1.2 Case 1: Difficulty in data customization

As we all know, our services were originally designed for the PC. Over time, mobile clients have become increasingly varied, including the current APP, mini program, touch site and so on. Because the service was never split per client, it was bound to evolve toward multi-client adaptation. A typical symptom is that different clients have different requirements for the same interface. Take our hotel system for example:



For the hotel list page display:

On the PC and APP we can show a lot of information in the hotel list, so each hotel contains a relatively complete set of hotel model fields. In other scenarios, such as the mini program, each hotel may only need three fields: picture, price and name. Such changing client requirements are very common, yet all clients share the same interface.

So the data returned to the APP would look something like this:

hotelInfo: {
    name: "Hotel",
    price: 232,
    imgUrl: "img.jpg",
    tags: [{ name: "Parent-child family" }],
    score: 4.2,
    rank: "XX Hotel ranked X"
}

All fields of the hotel model are returned directly to the client.

For the mini program, we need to add a parameter that identifies the mini program as the source of the request, and then delete the fields it does not display from the model one by one to achieve field-level control. The pseudocode is shown below:

if (source === "XIAOCHENGXU") {
    hotelInfo.tags = undefined;
    hotelInfo.score = undefined;
    hotelInfo.rank = undefined;
}

Now suppose the mini program is revised: the new version wants to show four fields, but the old version still shows only three. We now have a version control problem.

As the server, we have two options. The first is to ignore the difference between the old and new versions and return four fields to both. This is simple to develop and carries no risk, because we only add fields and never remove them, but the old version receives a redundant field. The alternative is to add code like this:

if (source === "XIAOCHENGXU") {
    hotelInfo.tags = undefined;
    if (version < 120) {
        hotelInfo.score = undefined;
    }
    hotelInfo.rank = undefined;
}

In this way, we can return different custom fields for different versions without transferring redundant data. The downside is obvious: the server code becomes messier, less readable and harder to maintain.

Suppose requirements like this arrive 5-10 times a week and are assigned to different developers, so that fields sometimes change very frequently. Our interface documents are currently maintained by hand in a wiki or YAPI, which makes it hard to ensure that every developer updates them in time. The service and its documentation drift apart in ways that are hard to track.



For client developers, the mismatch between the documents and the data actually returned means repeatedly checking with the server side. This adds a lot of communication overhead and slows down iteration and development as a whole.

From the server's perspective, our usual solution for the same interface under a RESTful architecture is to add conditions according to each client's demands and then filter the data on the server: delete unneeded fields, or return different values for the same field. Because each hotel has a great many fields, this clearly adds a lot of server-side work, and this example covers only one client. If every kind of client has its own field customization requirements, the server accumulates more and more conditions, each of which may decide the fate of only one or two fields. The operations are essentially identical, yet they make server-side development increasingly hard to carry out and maintain.

This practice was adequate in the PC era, when products iterated relatively slowly. In the mobile era, however, requirements iterate faster and there are more kinds of clients, each with its own screen adaptation and field customization requirements, and each possibly spanning many versions. Development efficiency inevitably drops, and operation and maintenance become harder. At the same time, because fields are rarely retired, many redundant fields gradually build up online. When version changes, poor document maintenance or other historical reasons leave these fields undocumented, team members dare not delete them, and adding yet another field always looks like the safer choice. Over time this becomes a vicious circle of more and more fields, and the redundant fields in the response body also hurt transmission performance.

The problems in case 1 can be summarized as follows:

  • Difficulty in data customization
  • Manual Document Maintenance
  • Redundant field transfer

1.3 Case 2: Multi-request Stitching

Hotel Details page on PC:



We can see that the hotel details on the PC mainly consist of four kinds of resources: room types, traffic, details and comments. We currently assemble them from two interfaces. On the APP, the same hotel details page uses a waterfall layout, so in addition to the above we may return two more resources: nearby recommendations and guest show.



Such different resources need to be stitched together on the same page. With RESTful interfaces, we usually have two ways to handle this:

  • Make multiple requests on the client
  • Customize the interface on the server side

In the first way, we add two more requests to the original ones to fetch the guest show and nearby recommendation resources separately, and then process the data on the client.

// Method 1:
/api/detail          { traffic, comment }
/api/detailprice     { room type, details }
/api/clientShow      { guest show }
/api/recommendation  { recommendation }

In this way, the APP can request exactly the data it wants. The responsibilities of each interface stay clear and nothing needs to be modified. However, the client has to construct multiple requests, which bloats the client code.
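
To make the client-side cost of the first way concrete, here is a minimal TypeScript sketch. The `FetchJson` transport, the `loadDetailPage` helper and the exact endpoint paths are assumptions for illustration, not our production code.

```typescript
// Hypothetical transport: resolves an endpoint path to its JSON payload.
type FetchJson = (path: string) => Promise<Record<string, unknown>>;

// Endpoints from Method 1; the paths are illustrative.
const DETAIL_PAGE_ENDPOINTS = [
  "/api/detail",
  "/api/detailprice",
  "/api/clientShow",
  "/api/recommendation",
];

// The client fires every request in parallel and merges the payloads itself.
async function loadDetailPage(fetchJson: FetchJson): Promise<Record<string, unknown>> {
  const parts = await Promise.all(DETAIL_PAGE_ENDPOINTS.map((path) => fetchJson(path)));
  const page: Record<string, unknown> = {};
  for (const part of parts) {
    Object.assign(page, part);
  }
  return page;
}
```

Every new resource on the page means another entry in the endpoint list and more merging logic on the client, which is the bloat described above.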

The second way is to change what one of our interfaces returns depending on the client, and to stitch the different resources together on the server. For the APP we append these two resources, which the server may obtain by sending two more requests to the back end.

// Method 2:
/api/detail        { traffic, comment }
/api/detailprice
if (source === "APP") {
    return { room type, details, guest show, nearby recommendations }
}
return { room type, details }

This approach reduces the number of requests made by the client, but it also strongly couples otherwise unrelated resources, which runs against the purpose of RESTful design and creates hidden risks for later iterations. For example, if we modify something in the nearby recommendations, the detailPrice interface may need extra work just to pass the change through. This approach also makes detailPrice more and more bloated and complex, so the interface becomes harder to maintain.

1.4 Problem Generalization

So let us summarize the main problems encountered in the cases above and try a root cause analysis. In project management, tools such as the fishbone diagram, 5 Whys analysis and the Pareto chart help us analyze problems intuitively when looking for their root causes. If you are interested, you can read up on PMP and ACP; I will not expand on them here. Below I use a fishbone diagram to analyze the root causes of our problems.



As you can see, the fish head on the right is the surface problem we face, and each bone reflects one of its main causes, which may lie in different dimensions.

For example, our current difficulties with operations, low development efficiency and degraded performance can be traced to interface documents, version control, multi-request design and redundant data. We then found that all of these follow from RESTful interface design under constant requirement iteration and numerous client page changes, so they can be regarded as problems of RESTful interface design itself. In other words, to address the root cause we need to find a better interface design paradigm.

2 Scheme Analysis

2.1 Introduction to GraphQL

In 2012, Facebook started working on a technical solution called GraphQL to let the server feed data to mobile clients efficiently. Their background was very similar to ours: the service started on the PC, and as mobile clients became popular and requirements changed more and more, traditional RESTful design became less efficient, so they created this solution to fix the efficiency problem. The difference is that Facebook defined the standard early on and has refined the model over the years, whereas we, as learners, still need to go through a change in thinking.

So how do you understand GraphQL?

GraphQL is a query language for APIs and a runtime for fulfilling those queries; strictly speaking, it is a specification. It is usually served over HTTP.

What exactly is GraphQL? How does it fit into our already somewhat overburdened system? How does it change our development model?

2.2 From RESTful to GraphQL

From the analysis in the previous chapter, we have seen that the RESTful API design style gradually runs into some unavoidable problems under long-term, fast-changing requirements. Let us see how these problems look in GraphQL.



According to the official documentation, GraphQL offers:

Precise data customization: clients decide what data they get, no more and no less. The first two problems on our left seem to be solved.

Automatic document generation, which greatly reduces the time developers spend on maintaining documents and frees them to focus on business development.

Free combination of resources. In RESTful design each endpoint corresponds to one resource, whereas GraphQL exposes a single endpoint, and a query can freely combine different resources in one request. The request design problem from case 2, where resources had to be combined for the same page, no longer exists, and multiple requests are avoided.
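
The idea of precise data customization can be sketched in a few lines of TypeScript. This is only an illustration of field selection, not how a GraphQL runtime is actually implemented; the `FieldSelection` type and the `project` helper are hypothetical names.

```typescript
// A selection is a tree: `true` keeps a leaf field, a nested object selects sub-fields.
type FieldSelection = { [field: string]: true | FieldSelection };

// Returns a copy of `value` that contains only the selected fields.
function project(value: unknown, selection: FieldSelection): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => project(item, selection));
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  const source = value as Record<string, unknown>;
  const picked: Record<string, unknown> = {};
  for (const field of Object.keys(selection)) {
    const sub = selection[field];
    picked[field] = typeof sub === "object" ? project(source[field], sub) : source[field];
  }
  return picked;
}

const hotel = {
  name: "Hotel",
  price: 232,
  imgUrl: "img.jpg",
  tags: [{ name: "Parent-child family" }],
  score: 4.2,
  rank: "XX Hotel ranked X",
};

// The mini program asks for three fields, the APP for a different set, from the same data.
const miniProgramView = project(hotel, { name: true, price: true, imgUrl: true });
const appView = project(hotel, { name: true, tags: { name: true }, score: true });
```

The server holds one model, and each client's selection decides what is returned, so no `source` or `version` branch is needed on the server.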

2.3 Implementation of GraphQL

Let’s take a look at how the GraphQL specification fits into our existing system and solves the problem.

The first step is to define the schema.

To keep things simple, we again take the hotel model of the hotel system as an example. The hotel information object is shown below. The SDL (Schema Definition Language) we use to define the schema is very simple.

hotelInfo: {
    name: "Xx hotel",
    price: 232,
    imgUrl: "img.jpg",
    tags: [{ "name": "Parent-child family" }],
    score: 4.2,
    rank: "XX Hotel ranked X"
}

Here’s how this model looks abstracted to GraphQL’s SDL:

type Hotel {
    name: String!
    imgUrl: String!
    price: Int!
    tags: [Tag]
    score: Float
    rank: String
}
 
 
type Tag {
    name: String!
}

As you can see, defining a schema is a process of consolidating the view model.
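
For readers who think in TypeScript, the SDL above can be mirrored roughly as follows. In GraphQL a field is nullable unless it is marked with `!`, which is why the unmarked fields become `| null` here. The `missingRequiredFields` helper is a hypothetical addition that shows what the non-null markers imply for the data a resolver returns.

```typescript
interface Tag {
  name: string; // name: String!
}

interface Hotel {
  name: string; // String!
  imgUrl: string; // String!
  price: number; // Int!
  tags: (Tag | null)[] | null; // [Tag]
  score: number | null; // Float
  rank: string | null; // String
}

// Hypothetical helper: lists the non-null fields that a raw record fails to provide.
function missingRequiredFields(raw: Partial<Hotel>): string[] {
  const required: (keyof Hotel)[] = ["name", "imgUrl", "price"];
  return required.filter((field) => raw[field] === undefined || raw[field] === null);
}
```

For example, `missingRequiredFields({ name: "Xx hotel", price: 232 })` reports `imgUrl`, the one required field that is absent.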

The second step is to write the resolver.

This step is fairly straightforward. Once we have defined all the fields and types of the view model that serves the business, we naturally need to write the methods that fetch this data. For an existing service this is very simple: in the resolver you only need to state where the data for this model comes from and how to get it. A resolver can be a synchronous or an asynchronous function, and it can compute the value, read from a cache, read from a database, or call a third-party API. In this sense GraphQL does an aggregation job, gathering upstream data, whether from microservices or RESTful services, in the GraphQL layer. Pseudocode for a resolver:

hotelResolver: (id: string) => {
    return db.getHotel(id);
}
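
A slightly fuller sketch of the same resolver is given below, assuming the `(parent, args, context)` argument order used by graphql-js-style servers. The `HotelSource` interface, the `ResolverContext` type and the in-memory data are hypothetical stand-ins for whatever cache, database or upstream API the real service calls.

```typescript
interface Hotel {
  name: string;
  price: number;
  imgUrl: string;
}

// Hypothetical data source; a BFF could back it with a cache, a database or an upstream REST call.
interface HotelSource {
  getHotel(id: string): Promise<Hotel | null>;
}

interface ResolverContext {
  db: HotelSource;
}

// Resolver map: the `hotel` field of the root Query type is resolved by looking up the id.
const resolvers = {
  Query: {
    hotel: (_parent: unknown, args: { id: string }, context: ResolverContext) =>
      context.db.getHotel(args.id),
  },
};

// In-memory stand-in for the real back end, for illustration only.
const hotelsById = new Map<string, Hotel>([
  ["1", { name: "Xx hotel", price: 232, imgUrl: "img.jpg" }],
]);
const memoryDb: HotelSource = {
  getHotel: async (id) => hotelsById.get(id) ?? null,
};
```

Passing the data source through the context keeps the resolver itself free of any knowledge about where the data lives, so the back end can change without touching the schema.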

The third step is to change the client’s request mode.

For the client, that is, the caller, the biggest change is that we no longer passively process whatever an interface returns, and no longer face over-fetching and under-fetching. We just describe exactly what this page needs and then focus on data processing and view construction.

Here is an example of a query on the APP side:

query Hotel($id: String!) {
    hotel(id: $id) {
        name
        imgUrl
        price
        tags {
            name
        }
        score
        rank
    }
}
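
When GraphQL is served over HTTP, a query like this is commonly sent as a JSON body containing the query text and its variables. The sketch below shows how two clients might build their requests. The `/graphql` endpoint mentioned in the comment, the `buildGraphQLPost` helper and the `hotel(id:)` root field are assumptions for illustration.

```typescript
// JSON body commonly used for GraphQL over HTTP: the query text plus its variables.
interface GraphQLRequestBody {
  query: string;
  variables: Record<string, unknown>;
}

// Each client writes its own selection; both would go to the same endpoint.
const APP_HOTEL_QUERY = `
  query Hotel($id: String!) {
    hotel(id: $id) { name imgUrl price tags { name } score rank }
  }`;

const MINI_PROGRAM_HOTEL_QUERY = `
  query Hotel($id: String!) {
    hotel(id: $id) { name imgUrl price }
  }`;

// Builds the options for a POST to a hypothetical /graphql endpoint.
function buildGraphQLPost(query: string, variables: Record<string, unknown>) {
  const body: GraphQLRequestBody = { query, variables };
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const appPost = buildGraphQLPost(APP_HOTEL_QUERY, { id: "42" });
const miniProgramPost = buildGraphQLPost(MINI_PROGRAM_HOTEL_QUERY, { id: "42" });
```

The difference between the APP and the mini program is now a difference between two query strings on the client, rather than a `source` branch on the server.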

2.4 Practical transformation process of GraphQL

In order to fully explain the evolution of our project, I would also like to review the historical evolution of the architecture.



In the beginning, clients called a Java service directly, and the client was mainly the PC. As time went on there were more kinds of clients, and the Java layer took on many view-layer requirement changes in addition to the business itself. Because this layer also had business-level processing of its own, it did more and more work and grew more and more complex. For a period the service often became a bottleneck, with a growing backlog of requirements, and under deadline and compatibility pressure it became easier to introduce bugs and make mistakes. So the front-end group eventually decided to move part of the view-layer work out of that service. Since the Node team was made up of former front-end developers, Node.js was the natural choice for the middle-layer service.



In fact, all the problems described in chapter 1 already existed in this service. We wanted to improve it with GraphQL, but we could not migrate a large amount of code, keep shipping new features and rebuild the service on GraphQL all in one move. So, to apply the technology without affecting external business, our Node team first split the service internally into two layers. Externally it is still a black-box RESTful API. Internally, much of the logic that controls the view layer was turned into GraphQL services, which are called directly by the service's own upper layer.



Because our transformation took a long time, the mini program team added its own GraphQL layer in parallel, which forwarded calls to the interfaces of our Node service. The mini program's internal transformation was completed gradually while ours was in progress, and it has now been using GraphQL online for a while. As you can see, GraphQL itself is a very thin layer, with a typical processing time of less than 20 ms. In a RESTful microservice architecture, it mainly acts as the data aggregation layer of the gateway. Mature businesses can upgrade smoothly, and GraphQL can be applied flexibly at the architectural level; there is no need to insist on transforming a service completely in one step.



Next, our server will expose the GraphQL service externally, at which point the clients can change the way they call it in parallel.



The following compares the traffic of the same interface call before and after the mini program switched to the GraphQL service. When different pages customize the same resource properly, redundant data drops sharply, and the smaller response body also brings a very noticeable improvement in transmission speed.



For the development process.

Traditionally, we do this.



Front-end and back-end developers first define the interface, then mock data, develop separately, self-test, and finally debug and test together. During joint debugging we may have to communicate several times about the development environment, version conventions, parameter changes, responses that do not match the interface document and so on, wasting a lot of time outside of development.

What about GraphQL?



Once attribute names and types are defined in the schema, server-side developers can focus directly on developing domain services. When development is done they can release directly, basically decoupled from the front end.

Front-end developers can use the service directly. There is no need to agree on a version number with back-end developers for the back end to branch on; they just check the latest document, write a new query, and start self-testing.

Both sides spend far less time on pointless joint debugging and on agreeing parameters.

2.5 Some problems seen in practice

While GraphQL has all of the above advantages, we all know there is no silver bullet, so let us look at some common problems in GraphQL development.

  • It increases the difficulty of back-end system design

Traditional RESTful APIs are more concise and clear for system designers: the responsibilities of each endpoint are well defined, which reduces dependence and interference between modules. When designing a GraphQL system, special attention must therefore be paid to the coupling between modules, rather than mixing all modules together, and this makes design harder. For an existing system, you can design the relationships between view models in a purely RESTful style at migration time, so that GraphQL inherits the benefits of RESTful design.

  • Possible back-end performance issues

Performance is one of the biggest concerns on the back end. Will there be additional database queries? How can query performance be optimized? These all pose optimization and design challenges. If, like us, you sit in a Node middle tier, you do not have to deal with this directly: database optimization is left to the services further back, which avoids SQL penetration.

  • Costs of migration

For a large existing system, how can the migration be carried out? Does the language need to change? Does the framework need to change? How can the migration be made smooth? These are the direct costs we face, and achieving the transformation with the lowest cost and risk is the problem we need to focus on.

  • Security issues

Because the client can query very freely, the server no longer directly controls which fields are returned. Instead it needs to restrict the requester in other ways, such as limits on query depth and pagination. Since security works slightly differently from RESTful APIs, designing a safer and more reasonable request scheme is also worth some thought. The scheme we currently use is not covered here.
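
As one illustration of a query depth limit, the sketch below measures how deeply a selection is nested and rejects it above a threshold. This is not the scheme we use in production; the `FieldSelection` representation and both function names are assumptions, and a real server would run such a check on the parsed query document.

```typescript
// A selection is a tree: `true` marks a leaf field, a nested object selects sub-fields.
type FieldSelection = { [field: string]: true | FieldSelection };

// Depth of the deepest field: a flat selection has depth 1, an empty one depth 0.
function selectionDepth(selection: FieldSelection): number {
  let deepest = 0;
  for (const field of Object.keys(selection)) {
    const sub = selection[field];
    const depth = typeof sub === "object" ? 1 + selectionDepth(sub) : 1;
    if (depth > deepest) {
      deepest = depth;
    }
  }
  return deepest;
}

// Throws when the requested selection is nested more deeply than the server allows.
function assertDepthWithin(selection: FieldSelection, maxDepth: number): void {
  const depth = selectionDepth(selection);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
}
```

With a limit of 2, a selection such as `{ hotel: { name: true } }` passes, while `{ hotel: { tags: { name: true } } }` is rejected before any resolver runs.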

In general, it is not necessary to replace RESTful APIs completely with GraphQL. In many cases, GraphQL can be used to call RESTful APIs.
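
A minimal sketch of this layering follows: a GraphQL-style aggregation function sits in front of existing RESTful endpoints and calls only the ones that back the requested fields. The `GetJson` transport, the field names and the endpoint paths are assumptions for illustration.

```typescript
// Hypothetical transport to the existing RESTful services.
type GetJson = (path: string) => Promise<unknown>;

// Which existing endpoint backs each field; names and paths are illustrative.
const FIELD_ENDPOINTS: Record<string, string> = {
  guestShow: "/api/clientShow",
  recommendation: "/api/recommendation",
};

// Resolves only the requested fields, calling one RESTful endpoint per field in parallel.
async function resolveDetailFields(
  fields: string[],
  getJson: GetJson,
): Promise<Record<string, unknown>> {
  const resolved: Record<string, unknown> = {};
  await Promise.all(
    fields.map(async (field) => {
      const path = FIELD_ENDPOINTS[field];
      if (path === undefined) {
        throw new Error(`Unknown field: ${field}`);
      }
      resolved[field] = await getJson(path);
    }),
  );
  return resolved;
}
```

The RESTful services stay unchanged and keep their single responsibilities, while the outer layer decides which of them to call for a given query.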

3 Practical value for the company

Our front-end group's adoption of GraphQL is not just about solving our own team's efficiency problems. We hope this experiment will have a far-reaching impact.

  • Application scenarios of the transformation

First of all, this experience showed us which scenarios GraphQL suits.

We do not see it as a replacement for RESTful APIs. But in our current architecture, where the server provides view-layer services, that is, a BFF layer that must adapt to multiple clients, GraphQL can show its full value.

In addition, for an existing large-scale system, we should not rush into a bold, direct rewrite. Instead, we can adjust the architecture flexibly and move to GraphQL step by step, achieving the transformation with the lowest risk and cost.

Many internal RESTful services do not have to change. We can put GraphQL services in the outer layer to call them and keep the advantages of RESTful service development in the back end.

  • The courage to embrace change

Many of our services are the company's legacy accumulated over the years. Some are still very useful, but others are relatively old at both the technical and the business level, which brings many unavoidable legacy problems.

In my opinion, the company's systems form an organism, and each system needs to keep renewing and restructuring itself. This not only improves the performance of each internal system but, by making every link more efficient, achieves a 1+1>2 effect and a qualitative change in the company's overall capability.

If you have a new system whose pages change frequently and which has a BFF layer to aggregate and process data, feel free to embrace change and choose GraphQL.