Gateway, tunnel, and relay

The Web is a powerful tool for publishing content. Over time, people have moved from simply serving static documents over the Web to sharing more complex resources, such as database content and dynamically generated HTML pages. HTTP applications such as Web browsers give users a unified way to access content on the Internet. HTTP has also become a basic building block for application developers, who can wrap traffic from other protocols inside HTTP so that it rides over the Web. All resources on the Web can be reached through HTTP, and other applications and application protocols can use HTTP to get their work done.

The gateway

The evolution of HTTP extensions and interfaces is driven by user needs. When the need arose to publish more complex resources on the Web, it quickly became clear that a single application could not handle every imaginable resource. To solve this problem, developers came up with the gateway: a kind of translator that abstracts a way to reach a resource. Gateways are the glue between resources and applications. An application can ask a gateway (through HTTP or some other defined interface) to handle a request, and the gateway provides a response. A gateway can send queries to a database or generate dynamic content, acting like a doorway: a request goes in, a response comes out. Some gateways automatically convert HTTP traffic to other protocols, so HTTP clients can interact with other applications without needing to understand those protocols.

Client-side and server-side gateways: A Web gateway speaks HTTP on one side and a different protocol on the other. For example, a gateway connecting HTTP clients to an NNTP news server is an HTTP/NNTP gateway. We use the terms server-side gateway and client-side gateway to describe on which side of the gateway the protocol conversion takes place.

  • Server-side gateway: communicates with clients over HTTP and with servers over some other protocol (HTTP/*); a sketch of this pattern follows the list.
  • Client-side gateway: communicates with clients over some other protocol and with servers over HTTP (*/HTTP).
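
As a rough illustration of the server-side (HTTP/*) pattern, the sketch below speaks HTTP with the client and a different protocol (here FTP, via ftplib) with the resource. The FTP host is a hypothetical placeholder, and real gateways need error handling, content typing, and connection management that are omitted here.

```python
# Minimal server-side gateway sketch: HTTP toward the client, FTP toward the
# resource. "ftp.example.com" is a hypothetical host used only for illustration.
import io
from ftplib import FTP
from http.server import BaseHTTPRequestHandler, HTTPServer

class FtpGateway(BaseHTTPRequestHandler):
    def do_GET(self):
        buf = io.BytesIO()
        with FTP("ftp.example.com") as ftp:            # foreign protocol on the server side
            ftp.login()                                # anonymous login
            ftp.retrbinary("RETR " + self.path, buf.write)
        body = buf.getvalue()
        self.send_response(200)                        # HTTP back toward the client
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), FtpGateway).serve_forever()
```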

Protocol gateway

HTTP traffic is directed to a gateway in the same ways traffic is directed to a proxy: most commonly, browsers are explicitly configured to use a gateway, traffic is intercepted transparently, or the gateway is configured as a surrogate (reverse proxy).

  • Server-side Web gateways (HTTP/*): As a request flows toward the origin server, a server-side Web gateway converts the client's HTTP request into another protocol.
  • Server-side security gateways (HTTP/HTTPS): An organization can route all inbound Web requests through a gateway that encrypts them, providing extra privacy and security. Clients can browse Web content using plain HTTP, but the gateway automatically encrypts the users' sessions.
  • Client-side security accelerator gateways (HTTPS/HTTP): These gateways sit in front of Web servers, usually as invisible interception gateways or reverse proxies. They receive secure HTTPS traffic, decrypt it, and send plain HTTP requests on to the Web servers. Such gateways often contain dedicated decryption hardware that decrypts secure traffic far more efficiently than the origin server could, removing load from the origin server. Because they send unencrypted traffic between the gateway and the origin server, take care that the network in between is secure. A rough sketch of this accelerator pattern follows the list.
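
As a very rough sketch of the client-side security accelerator idea (HTTPS/HTTP), the code below accepts one TLS connection, decrypts it, and forwards the plain HTTP bytes to an origin server. The certificate files and the origin address are hypothetical placeholders; a real accelerator would loop over connections, stream data in both directions, and typically run on dedicated hardware.

```python
# HTTPS in from the client, plain HTTP out to the origin server (sketch only).
# "gateway-cert.pem", "gateway-key.pem", and "origin.internal" are placeholders.
import socket
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("gateway-cert.pem", "gateway-key.pem")

listener = socket.create_server(("0.0.0.0", 443))
with ctx.wrap_socket(listener, server_side=True) as tls_listener:
    client, _ = tls_listener.accept()                  # TLS handshake with the client
    request = client.recv(65536)                       # decrypted HTTP request bytes
    origin = socket.create_connection(("origin.internal", 80))
    origin.sendall(request)                            # unencrypted hop to the origin
    client.sendall(origin.recv(65536))                 # relay the origin's response
```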

Resource gateways

The most common form of gateway, the application server, combines the destination server and the gateway into a single server. An application server is a server-side gateway that speaks HTTP with clients and connects to application programs on the server side. The client connects to the application server over HTTP, but instead of sending back a file, the application server passes the request through a gateway application programming interface (API) to an application running on the server. The first popular application gateway API was the Common Gateway Interface (CGI): a standardized set of interfaces a Web server can use to launch a program in response to an HTTP request for a particular URL, collect the program's output, and send it back in an HTTP response. Over the years, commercial Web servers have added more sophisticated interfaces for connecting Web servers to applications, but CGI dates from the early, fairly simple days of Web servers, and the simplicity of its gateway interface has persisted to this day.

When a request arrives for a gateway resource, the server hands the request to a helper application, passing along whatever data the application needs. Usually this is the entire request, or some piece of it such as the query the user wants to run against a database (the query string from the URL). The application returns a response, or response data, to the server, which forwards it to the client. Because the server and the gateway application are separate programs, their responsibilities are cleanly divided. This simple protocol (request in, hand off, response out) is the essence of the oldest and most widely used server extension interface, CGI.

CGI

CGI was the first, and is probably still the most widely used, server extension. It is used across the Web for dynamic HTML, database queries, credit card processing, and similar tasks. CGI applications are server independent, so they can be written in almost any language, and CGI is simple enough that almost every HTTP server supports it. CGI processing is invisible to the user: from the client's point of view it looks like an ordinary request, and the client has no idea that a hand-off between the server and a CGI application took place. The only hint is the presence of "cgi" and perhaps "?" in the URL. CGI provides simple, functional glue between a server and many kinds of resources, handling whatever translation is required. The interface also neatly shields the server from badly behaved extensions that could crash it if they were linked in directly. This separation comes at a performance cost, however: spawning a new process for every CGI request is expensive, limiting the performance of servers that rely on CGI and taxing the server machine's resources. To address this, a variant was developed, aptly named FastCGI. It emulates CGI but runs as a persistent daemon, eliminating the overhead of setting up and tearing down a process for each request.
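
As a concrete, if minimal, illustration of the CGI contract, the script below could be installed under a server's CGI directory (the exact location depends on the server): the server passes request metadata in environment variables such as QUERY_STRING, and whatever the program prints (headers, a blank line, then a body) becomes the HTTP response entity.

```python
#!/usr/bin/env python3
# Minimal CGI program sketch. The gateway interface hands request data to the
# program through environment variables and reads its response from stdout.
import html
import os

query = os.environ.get("QUERY_STRING", "")             # e.g. "item=pen&color=blue"

print("Content-Type: text/html")                       # CGI response headers
print()                                                # blank line ends the headers
print(f"<html><body><p>You asked for: {html.escape(query)}</p></body></html>")
```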

Server extension API

The CGI protocol gives external interpreters a clean way to hook into existing HTTP servers, but what if you want to change the behavior of the server itself, or simply squeeze the most performance out of it? For both needs, server vendors provide server extension APIs, powerful interfaces that let Web developers connect their own modules directly to the HTTP server. Extension APIs allow programmers to graft their own code onto the server, or even replace a component of the server entirely with their own implementation. Most popular servers offer one or more extension APIs. Because these extensions are tied to the architecture of the server itself, most are specific to a single server type. Through these APIs, developers can alter the server's behavior or build high-performance, customized interfaces to particular resources.

Application program interface

As Web applications provide more and more kinds of services, it has become clear that HTTP can serve as a foundation for connecting applications together. One of the trickier issues in wiring applications together is negotiating the protocol interface between the two so they can exchange data, which is usually done on an application-by-application basis. Cooperating applications need to exchange information far richer than HTTP headers can express. The Internet community has developed a set of standards and protocols that let Web applications talk to one another; these are loosely referred to as Web services, although the term can also mean an individual Web application acting as a building block. The idea is not new, but Web services provide a new mechanism for applications to share information, and they are built on standard Web technologies such as HTTP.

The tunnel

A Web tunnel lets users send non-HTTP traffic over an HTTP connection, piggy-backing data from other protocols on top of HTTP. The most common reason to use a Web tunnel is to embed non-HTTP traffic inside HTTP connections so it can pass through firewalls that allow only Web traffic.

Establish an HTTP tunnel with CONNECT

Web tunnels are established using HTTP's CONNECT method. CONNECT is not part of the core HTTP/1.1 specification, but it is a widely implemented extension. The CONNECT method asks the tunnel gateway to open a TCP connection to an arbitrary destination server and port and then to blindly forward subsequent data between the client and that server.

How the CONNECT method establishes a tunnel through the gateway:

  • The client sends a CONNECT request to the tunnel gateway, asking it to open a TCP connection to the destination host and port.
  • The gateway creates the TCP connection.
  • Once the TCP connection is established, the gateway notifies the client with an "HTTP 200 Connection Established" response.
  • At this point the tunnel is in place: all data the client sends over the HTTP tunnel is forwarded directly onto the outbound TCP connection, and all data the server sends is forwarded back to the client through the HTTP tunnel, as sketched below.
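
A bare-bones sketch of the gateway's side of these steps might look like the following; error handling, header parsing, and the forwarding loop itself are omitted, and the host and port are taken straight from the request line.

```python
# Gateway side of the CONNECT handshake (sketch): read the request, open the
# outbound TCP connection, confirm with 200, then hand off to blind forwarding.
import socket

def handle_connect(client_sock: socket.socket) -> socket.socket:
    request = client_sock.recv(4096).decode("iso-8859-1")
    method, target, _version = request.split("\r\n")[0].split()   # "CONNECT host:port HTTP/1.0"
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = target.rpartition(":")
    server_sock = socket.create_connection((host, int(port)))     # step 2: outbound TCP
    client_sock.sendall(b"HTTP/1.0 200 Connection Established\r\n\r\n")  # step 3
    return server_sock                                            # step 4: start forwarding
```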

CONNECT request: The syntax of CONNECT is like that of other HTTP methods, except for the start line, where a hostname followed by a colon and a port number takes the place of the request URI. Both the host and the port must be given, for example: CONNECT home.netscape.com:443 HTTP/1.0. As with other HTTP messages, zero or more HTTP request header fields follow the start line; each line ends with CRLF, and the header list is terminated by a bare CRLF on a line of its own.

CONNECT response: After sending the request, the client waits for a response from the gateway. As with ordinary HTTP messages, a response code of 200 indicates success, and by convention the reason phrase is usually set to "Connection Established" (HTTP/1.0 200 Connection Established). Unlike a normal HTTP response, this response does not need to carry a Content-Type header: the connection now forwards raw bytes rather than carrying messages, so no content type is required.
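
From the client's side, the exchange can be sketched with a raw socket; the proxy host and port below are hypothetical, and a real client would also parse the full headers and handle non-200 responses such as authentication challenges.

```python
# Client side of the CONNECT handshake (sketch). "proxy.example.com:8080" is a
# hypothetical tunnel gateway; after the 200 response the socket carries raw bytes.
import socket

sock = socket.create_connection(("proxy.example.com", 8080))
sock.sendall(
    b"CONNECT home.netscape.com:443 HTTP/1.0\r\n"
    b"User-Agent: example-client/1.0\r\n"
    b"\r\n"
)
reply = sock.recv(4096).decode("iso-8859-1")
status_line = reply.splitlines()[0]        # e.g. "HTTP/1.0 200 Connection Established"
if status_line.split()[1] == "200":
    pass  # tunnel established: send and receive raw (e.g. TLS) bytes on this socket
```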

Data tunneling, timing, and connection management

The data being tunneled is opaque to the gateway, so the gateway can make no assumptions about the ordering or flow of the packets; once the tunnel is established, data may flow in either direction at any time. As a performance optimization, clients are allowed to send tunnel data after sending the CONNECT request but before receiving the response. This gets data to the server sooner, but it means the gateway must handle data that arrives right behind the request. In particular, the gateway cannot assume that a network I/O read will return only header data, and it must make sure any data read along with the headers is sent to the server once the connection is ready. Clients that pipeline data after the request must also be prepared to resend that data if the response turns out to be an authentication challenge or some other non-200, non-fatal status. If either end of the tunnel disconnects at any point, any untransmitted data from that end is sent to the other end, and then the connection to the other end is also closed by the proxy; data still waiting to be transferred to the endpoint that closed the connection is discarded.
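
Once the tunnel is up, the gateway's job reduces to shuttling opaque bytes in both directions until one side closes. A minimal sketch of that loop, using the standard selectors module:

```python
# Blind-forwarding loop sketch: bytes read from either end are written to the
# other end without interpretation, until one side closes its connection.
import selectors
import socket

def pump(client_sock: socket.socket, server_sock: socket.socket) -> None:
    sel = selectors.DefaultSelector()
    sel.register(client_sock, selectors.EVENT_READ, data=server_sock)
    sel.register(server_sock, selectors.EVENT_READ, data=client_sock)
    while True:
        for key, _events in sel.select():
            chunk = key.fileobj.recv(65536)
            if not chunk:                  # one end closed: tear down the tunnel
                return
            key.data.sendall(chunk)        # forward opaque bytes to the other end
```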

SSL tunnel

Web tunnels were originally developed to carry encrypted SSL traffic through firewalls. Many organizations funnel all traffic through packet-filtering routers and proxy servers to improve security, but some protocols, such as encrypted SSL, carry data that a traditional proxy cannot understand or forward. A tunnel carries the SSL traffic over an HTTP connection, letting it pass through the firewall's port 80 HTTP opening. To allow SSL traffic to traverse existing proxy firewalls, a tunneling feature was added to HTTP in which raw, encrypted data is placed inside HTTP messages and sent over normal HTTP channels. More generally, tunnels let non-HTTP traffic pass through port-filtering firewalls. This is useful for legitimate purposes, such as carrying secure SSL traffic, but the feature can be abused to let malicious protocols tunnel over HTTP into an organization.

Comparison between SSL tunnels and HTTP/HTTPS gateways

You can gateway the HTTPS protocol (HTTP over SSL) just as you would any other protocol: the gateway, rather than the client, initiates an SSL session with the remote HTTPS server and then performs the HTTPS transaction on the client's behalf. The response is received and decrypted by the proxy and sent to the client over (insecure) HTTP. With the SSL tunneling mechanism, by contrast, there is no need to implement SSL in the proxy: the SSL session is established between the requesting client and the destination (secure) Web server, and the proxy server in the middle merely tunnels the encrypted data, playing no other part in the secure transaction.

This is the same way gateways handle protocols such as FTP, but the approach has several downsides:

  • The connection between the client and the gateway is plain, insecure HTTP.
  • Although the proxy is an authenticated principal, the client cannot perform SSL client authentication with the remote server.
  • The gateway has to support a full SSL implementation.

Tunnel authentication

Where appropriate, other features of HTTP can be used in conjunction with tunnels. In particular, proxy authentication support can be combined with tunnels to verify that a client is authorized to use them.
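
For example, a gateway that demands proxy authentication can challenge the CONNECT request with a 407 response, and the client can retry with credentials. A sketch with hypothetical Basic credentials:

```python
# CONNECT request carrying proxy credentials (sketch). The username, password,
# and destination host are made-up values for illustration only.
import base64

credentials = base64.b64encode(b"user:password").decode()
request = (
    "CONNECT orders.example.com:443 HTTP/1.0\r\n"
    f"Proxy-Authorization: Basic {credentials}\r\n"
    "\r\n"
)
```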

Tunnel security considerations

A tunnel gateway has no way to verify that the protocol being carried is really the one it was intended to tunnel. To minimize abuse of tunnels, gateways should open tunnels only to well-known ports, such as 443 for HTTPS.
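
A gateway implementing that policy only needs a small check before honoring a CONNECT request, along these lines:

```python
# Port allowlist sketch: refuse to open tunnels to anything but well-known
# tunneled services (here, only HTTPS on port 443).
ALLOWED_TUNNEL_PORTS = {443}

def may_tunnel(target: str) -> bool:
    host, _, port = target.rpartition(":")
    return bool(host) and port.isdigit() and int(port) in ALLOWED_TUNNEL_PORTS

may_tunnel("home.netscape.com:443")   # True
may_tunnel("mail.example.com:25")     # False: don't tunnel SMTP through HTTP
```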

The relay

An HTTP relay is a simple HTTP proxy that does not fully conform to the HTTP specification: it handles just enough of HTTP to establish a connection and then blindly forwards bytes. Because HTTP is complex, it is sometimes useful to implement bare-bones proxy behavior and blind forwarding without any header or method logic. Blind relays are easy to build, and they are sometimes used to provide simple filtering, diagnostics, or content transformation. But this approach carries a real risk of serious interoperability problems, so deploy it with care. There are ways to make a relay a little smarter and remove some of these risks, but any simplified proxy runs the risk of interoperability trouble. If you build a simple HTTP relay for a specific purpose, think hard about how it will be used, and for any large-scale deployment, seriously consider a real, fully HTTP-compliant proxy server instead.

One of the more common (and notorious) problems with simple blind relays is that they can hang keep-alive connections, because they do not handle the Connection header properly; the walkthrough below shows what goes wrong, and a sketch of the fix follows it.

  • The Web client sends a message containing a Connection: keep-alive header to the relay, requesting a keep-alive connection if possible. The client then waits for a response to learn whether its request for a keep-alive channel has been granted.
  • The relay receives the HTTP request, but it does not understand the Connection header, so it passes the message verbatim down the chain to the server. However, the Connection header is a hop-by-hop header: it applies only to a single transport link and must not be forwarded down the chain. Trouble is on the way.
  • The relayed HTTP request arrives at the Web server. When the Web server sees the forwarded Connection: keep-alive header, it mistakenly concludes that the relay (which looks like any other client to the server) wants a keep-alive conversation. That is fine with the Web server: it agrees and sends back a Connection: keep-alive response header. At this point the Web server thinks it is holding a keep-alive conversation with the relay and will follow keep-alive rules, but the relay knows nothing about keep-alive.
  • The relay sends the Web server's response message, including the Connection: keep-alive header, back to the client. The client sees this header and assumes the relay has agreed to keep the connection alive. Now both the client and the server believe they are in a keep-alive conversation, while the relay between them has no idea what keep-alive means.
  • Knowing nothing about persistent connections, the relay forwards whatever data it receives to the client and waits for the origin server to close the connection. But the origin server believes the relay asked it to keep the connection alive, so it never closes it. The relay hangs, waiting for a close that never comes.
  • When the client gets the response message back, it moves straight on to its next request, sending it to the relay over what it believes is a keep-alive connection. Simple relays usually do not expect another request on the same connection, so the browser just spins, making no progress.
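
The usual fix is for the relay either to implement the Connection header correctly or, at minimum, to strip hop-by-hop headers before forwarding a message. A sketch of that stripping step, with headers held in a plain dictionary (a simplifying assumption):

```python
# Strip hop-by-hop headers before forwarding (sketch). Removes the standard
# hop-by-hop set plus any header named by the Connection header itself.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    named = {token.strip().lower()
             for token in headers.get("Connection", "").split(",") if token.strip()}
    return {name: value for name, value in headers.items()
            if name.lower() not in HOP_BY_HOP and name.lower() not in named}
```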