1 Introduction

One day, a colleague raised an issue in the product feedback group: a user who logs in and reopens the page two days later has to log in again. While investigating, we mentioned that the login is supposed to expire after 7 days, and the boss asked a question of his own: why 7 days? Hence today’s article. First, let’s take a look at the concept of web session management.

2 The Concept

Before we talk about web session management, we need to talk about the HTTP protocol. HTTP/2 and HTTP/3 are left aside; in this article, HTTP mainly refers to HTTP/1.x, the oldest and most widely used version.

HTTP is a connectionless, stateless protocol.

  • Connectionless means that each connection handles only one request: the connection is closed once the server has processed the request and the client has received the response. Keep-alive was added in later versions of the protocol. It keeps the connection between client and server alive, so subsequent requests can reuse it instead of establishing a new connection.
  • Stateless means that the server keeps no memory of earlier requests at the protocol level and does not know the state of the client. When we send an HTTP request, the server returns data based on that request alone and records nothing once the response has been sent.

In the early days of the Internet, such a protocol fully met the needs of web applications. As web applications grew more complex, however, identifying users, recording their behavior, and keeping user state became hard requirements, even core features. This is where session management comes in. Session management is a mechanism for controlling and maintaining the state of user interaction; it defines a set of measures for managing the interaction state between users and a web application.

Session management has two key points: how to store the session and what to store in it.

3 Storage Mode

3.1 Server Session

In early web applications with session management, session state was usually stored on the server. Common web programming languages such as PHP and Java ship with their own session handling, as an extension or component enabled by default; PHP’s $_SESSION global variable is one example. The client stores the sessionID and passes it back through cookies, hidden form fields, or URL rewriting. The latter two methods are generally used when cookies are disabled.

Advantages of server sessions

  • Good security. The only thing that links the client’s and the server’s session state is the sessionID string, and we generally make it random enough that an attacker cannot enumerate or brute-force it. Impersonation is then mainly possible through attacks such as CSRF or HTTP hijacking. Even in the worst case, when impersonation succeeds, the server can still re-verify login credentials as defense in depth;
  • High degree of control. Sessions are stored centrally on the server, so backend engineers can operate on them, for example to force a user offline.
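
The server-session scheme can be sketched roughly as follows. This is a minimal illustration, not any framework's actual implementation: the in-memory `SESSIONS` table, the cookie name `sessionid`, and all function names are assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical in-memory session table: session ID -> session data.
SESSIONS = {}


def create_session(user_id: int) -> str:
    """Store session data on the server and return an unguessable session ID."""
    session_id = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe text
    SESSIONS[session_id] = {"user_id": user_id}
    return session_id


def set_cookie_header(session_id: str) -> str:
    """Build the Set-Cookie header value that hands the ID to the browser."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["httponly"] = True  # keep the ID away from page scripts
    cookie["sessionid"]["path"] = "/"
    return cookie["sessionid"].OutputString()


def load_session(cookie_header: str) -> Optional[dict]:
    """Look up the session from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    return SESSIONS.get(morsel.value) if morsel else None


def kick_offline(session_id: str) -> None:
    """Server-side control: deleting the entry logs the user out."""
    SESSIONS.pop(session_id, None)
```

Only the random ID travels to the client; everything else stays in the server-side table, which is why deleting one entry is enough to force a user offline.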

Server-side storage brings us back to the expiration question from the introduction. Why should sessions expire at all?

  1. To save server resources. Login state stored on the server takes up a certain amount of resources; without expiration, the login state of tens of millions of users becomes a significant cost, and one that can be avoided entirely;
  2. For security. If a user has been inactive for a long time and someone else uses the account, the risk is greater. Expiration also lets the business logic avoid some security risks.

Once sessions expire, an expiration time has to be chosen. Common values are 1 hour (3,600 seconds), 2 hours (7,200 seconds), 1 day, 7 days (a week), 15 days, and 30 days (a month). The value is mostly empirical, a number that everyone feels is appropriate; there seems to be no objective evaluation of its business impact.
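
As a rough sketch of how such an expiration time might be enforced, the store below gives every session a fixed lifetime counted from login. The class name, the `clock` parameter, and the 7-day default are assumptions chosen for illustration.

```python
import secrets
import time

SEVEN_DAYS = 7 * 24 * 3600  # 604800 seconds


class ExpiringSessionStore:
    """In-memory sessions with a fixed lifetime (TTL) in seconds."""

    def __init__(self, ttl_seconds=SEVEN_DAYS, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock       # injectable so that tests can fake time
        self._sessions = {}      # session ID -> (expires_at, data)

    def create(self, data):
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = (self.clock() + self.ttl_seconds, data)
        return session_id

    def get(self, session_id):
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self.clock() >= expires_at:
            # Expired: free the memory and force a new login.
            del self._sessions[session_id]
            return None
        return data
```

A sliding variant would push `expires_at` forward on every successful `get`, so that only inactive users are logged out; which behavior fits is a business decision rather than a technical one.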

Common storage schemes for sessions on the server are as follows:

  • Single-machine solutions. Storage is either disk or memory; the two differ in read and write speed, and for large-scale Internet applications storing in memory is more common. Single-node storage has an upper limit, namely the capacity of that one node.
  • Distributed solutions. Examples are distributed in-memory stores such as Redis/Memcached; Tomcat also has shared-session solutions. In theory distributed storage has no capacity limit; Redis, for example, can be deployed in a distributed fashion.

3.2 Client Cookie Scheme

To reduce the burden on the server and the complexity of the architecture, we can store the user’s login credentials directly in the client (browser), relying on the fact that the browser sends cookies with every request. When the user logs in successfully, the login credentials are written into a cookie and an expiration time is set on it. Subsequent requests simply check whether the cookie with the login credentials exists and whether the credentials are valid to determine the user’s login status. Rails’ default session store is cookie-based.
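
One common way to make a client-side login cookie trustworthy is to sign it with a key that only the server knows. The sketch below uses an HMAC-SHA256 signature and embeds the expiry in the payload; the key, the `payload.signature` layout, and the function names are assumptions for illustration, not the format of Rails or any other framework.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"change-me"  # assumption: a server-side secret, never sent to clients


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def make_login_cookie(user_id, ttl_seconds, now=None):
    """Return 'payload.signature'; the payload carries its own expiry."""
    now = time.time() if now is None else now
    payload = json.dumps({"uid": user_id, "exp": int(now + ttl_seconds)}).encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_login_cookie(value, now=None):
    """Return the user ID if the cookie is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = value.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # wrong number of parts or invalid base64
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload was tampered with or signed with another key
    claims = json.loads(payload)
    return claims["uid"] if now < claims["exp"] else None
```

The server keeps no per-user state here: any instance that holds `SECRET_KEY` can verify the cookie. Note that the payload is signed, not encrypted, so the client can read it.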

Advantages

  • The server becomes stateless: session management logic is removed from the server entirely, and the server is only responsible for creating and verifying login cookies.
  • Different applications can share login state, and thus sessions, by using the same algorithm or key.

Disadvantages

  • Cookies are limited in size and cannot hold much data. Session data also competes for that space with cookies used by other business scenarios, which can leave too little room.
  • Cookies are sent with every request. When the session holds a lot of redundant data, performance suffers and network bandwidth is wasted.
  • Cross-domain problems may occur with multiple applications. When a browser makes a request, it checks all stored cookies. If a cookie’s declared scope (domain and path) covers the location of the requested resource, the browser attaches the cookie to the HTTP request header and sends it to the server. Cross-domain problems occur when the scope is too narrow or when the scopes of different applications only partly overlap.
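
The scope check described in the last point can be approximated as below. This is a simplified reading of the domain-match and path-match rules in RFC 6265 (it ignores IP addresses, public suffixes, and the Secure/SameSite attributes), and the host names in the usage note are hypothetical.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified domain-match: same host, or a subdomain of cookie_domain."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """Simplified path-match: cookie_path is a prefix ending on a '/' boundary."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def cookie_is_sent(host, path, cookie_domain, cookie_path="/"):
    """True if the browser would attach the cookie to this request."""
    return domain_matches(host, cookie_domain) and path_matches(path, cookie_path)
```

Under these rules a cookie scoped to `example.com` is sent to both `a.example.com` and `b.example.com`, while a cookie scoped to `a.example.com` is not sent to `b.example.com`, which is the "scope too narrow" case.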

3.3 Token scheme

Compared with the previous two schemes, the token scheme became common once front ends and back ends were separated, and the notion of a session is more lightweight in it.

Before we get to the token scheme, let’s talk about two very important security concepts: authentication and authorization. The two are sometimes confused, and are defined as follows:

  • Authentication is the process by which a user, website, or application proves that it is who it claims to be by providing valid credentials.
  • Authorization is the process of verifying what a user can access. During authorization, a user or application is allowed to access specific APIs or modules only after its permission level has been determined. Typically, authorization occurs after the user’s identity has been authenticated.

The difference between authentication and authorization

                | Authentication                                     | Authorization
Role            | Determines whether users are who they claim to be  | Determines what users are permitted to access
Method          | Verifies users with valid credentials              | Verifies access through rules and policies
Timing          | Happens before authorization                       | Executed after authentication succeeds
Implementation  | With ID tokens                                     | With access tokens
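
To make the distinction concrete, here is a small sketch with one function per concept. The user table, the salt, the permission names, and the PBKDF2 parameters are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user table: name -> (salt, password hash, permissions).
USERS = {
    "alice": (b"salt-a", _hash("wonderland", b"salt-a"), {"orders:read"}),
}


def authenticate(username, password):
    """Authentication: is this user who they claim to be?"""
    record = USERS.get(username)
    if record is None:
        return None
    salt, stored_hash, _permissions = record
    if hmac.compare_digest(_hash(password, salt), stored_hash):
        return username
    return None


def authorize(username, permission):
    """Authorization: may this already authenticated user do this?"""
    return permission in USERS[username][2]
```

`authenticate` consumes credentials and yields an identity; `authorize` starts from that identity and consults rules, which is why it runs second.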

Token-based authentication and authorization is one of the most popular techniques. The user enters a username and password once and, in exchange, receives a uniquely generated, cryptographically protected token. This token is then used in place of the login credentials to access protected pages or resources. When we proceed to the next business operation, we use the token to look up the user’s information and the session’s information, usually through an in-process cache or a dedicated microservice whose back-end storage varies with each company’s architecture and services. Generally speaking, for large-scale Internet applications the data is ultimately read from an in-memory store, such as the distributed NoSQL databases mentioned above.

A token is a piece of data generated by the server that contains information uniquely identifying a user, typically generated as a long string of random letters and digits.

For example, it might look like this: bb74324734bcf34748bb08bu2842f3288, or something more complex such as: eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJ1bXMiLCJzdWIiOjY1NDA1NjI5NCwiYXVkIjoiZ2FvZGluZ3giLCJleHAiOjE2NDE5MDgzOTz9.8mZAB55VLSCsdZXBMO-N9OL9UTVYPVVGUfab7dFe6fy. The token by itself is meaningless and useless, but combined with a proper token system it becomes an important link in securing the application.

The token-based flow generally works as follows:

  1. The user requests access with a user name and password (or another login method, such as scanning a QR code with WeChat in China);
  2. The application verifies the login credentials;
  3. The application issues a signed token to the client;
  4. The client stores the token and sends it with each subsequent request (usually in a cookie, similar to a sessionID);
  5. The server validates the token and responds with the data.
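
The five steps can be sketched as below. To keep the example short it makes several simplifications that differ from the steps above: the token is an opaque random string looked up in a server-side table rather than a signed token, it travels in an `Authorization: Bearer` header (the style defined in RFC 6750) rather than a cookie, and the password table is a plaintext stand-in. All names are hypothetical.

```python
import secrets

# Hypothetical stand-ins, for illustration only.
PASSWORDS = {"alice": "wonderland"}  # real systems store salted hashes, not plaintext
TOKENS = {}                          # token -> username


def login(username, password):
    """Steps 1-3: verify the credentials and issue a token."""
    if PASSWORDS.get(username) != password:
        return None
    token = secrets.token_hex(16)    # 32 hexadecimal characters
    TOKENS[token] = username
    return token


def handle_request(headers):
    """Steps 4-5: the client sends the token; the server validates it."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    username = TOKENS.get(token) if scheme == "Bearer" else None
    if username is None:
        return 401, "login required"
    return 200, f"hello, {username}"
```

After `token = login("alice", "wonderland")`, a request carrying `{"Authorization": "Bearer " + token}` is answered with status 200, and any request without a valid token gets 401.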

There are two token-based schemes we commonly use:

  • OAuth 2.0 (RFC 6749 and RFC 6750).
  • JWT (RFC 7519).
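
For a feel of what a JWT is, here is a minimal HS256 signer and verifier built only on the Python standard library. It is a sketch under simplifying assumptions (one algorithm, only the `exp` claim checked, hypothetical function names); a production system would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def jwt_encode(claims: dict, key: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"typ": "JWT", "alg": "HS256"}
    parts = [b64url_encode(json.dumps(obj, separators=(",", ":")).encode())
             for obj in (header, claims)]
    signature = hmac.new(key, ".".join(parts).encode("ascii"), hashlib.sha256).digest()
    return ".".join(parts + [b64url_encode(signature)])


def jwt_decode(token: str, key: bytes, now=None) -> dict:
    """Return the claims, or raise ValueError if the token is invalid or expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signature = b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    # Only parse the contents after the signature has been verified.
    header = json.loads(b64url_decode(header_b64))
    claims = json.loads(b64url_decode(payload_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

The three dot-separated parts are just base64url text: decoding the first segment of the longer example token above, `json.loads(b64url_decode("eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9"))`, yields `{"typ": "jwt", "alg": "HS256"}`.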

A more detailed description of these two schemes can be found in: In-depth OAuth2.0 and JWT

There are two major weak points in the token lifecycle:

  1. Token generation. Issuing a token depends on the user submitting a user name/email and password, and these credentials are at considerable risk of leaking;
  2. Token handling. Every place that handles the session token is a weak point, for example in transit (unencrypted or hijacked links) or when the token is written to logs.

4 Storage Content

A session is essentially a cache, one that is strongly tied to a user, so commonly used user data can be stored in it, such as avatar, username, real name, credits, and so on. The cached data should be treated as read-only: when the underlying data changes, update the database and reload the session from it, rather than relying on the session data.

In addition to this common data, we often write the shopping cart, menus, permissions, and similar things into the session. Beyond user attributes, the session then also carries business attributes, which can be called a business session. A business session holds data that needs to be cached temporarily for a particular business scenario. Most of this data also has to be stored in the database: when a user logs in again after the session expires, the data is retrieved from the database and loaded into the session.
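
A minimal sketch of this read-only-cache rule is shown below; the `DATABASE` dictionary and the plain-dict session are hypothetical stand-ins for a real user table and session object.

```python
# Hypothetical stand-ins for a user table and a session object.
DATABASE = {1: {"username": "alice", "credits": 10}}


def load_profile(session, user_id):
    """Copy commonly used user fields from the database into the session."""
    session["profile"] = dict(DATABASE[user_id])


def add_credits(session, user_id, amount):
    """Write to the database first, then refresh the cached copy."""
    DATABASE[user_id]["credits"] += amount  # the database is the source of truth
    load_profile(session, user_id)          # never compute from the cached value
```

Even if the cached copy has drifted, `add_credits` starts from the database value, so the session is corrected on the next update instead of propagating stale data.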

5 Summary

The root cause of the problem turned out to be a piece of channel-specific business logic that was being triggered at a point where it did not need to be. In addition, the boss finally asked a soul-searching question in the group:

Do we do things based on inertia, or do we do things based on deep thinking?

Self-reflection ~~

This article has been in the works for several weeks, and I did not know how to write it well. I still feel that the writing is not satisfactory. As for writing about the protocol itself, I did not really feel like doing that.

Hello, my name is Pan Jin. I have more than 10 years of experience in R&D management and technical architecture. I have published books, started businesses, and led a team of more than 100 people. I took part in NOI and ACM in my early years, and I have always kept a strong interest in front-end architecture, cross-end architecture, back-end architecture, cloud native, DevOps, and other technologies. I like reading and thinking in daily life, and I welcome you to exchange ideas and learn with me. WeChat official account: Architecture and Distance, blog: www.phppan.com