The problem background

After the server received a parameter from the client, base64 decoding failed. Investigation showed that the string contained a + before the client sent it, but by the time PHP received the parameter the + had become a space, which broke the base64 decoding.

The validation test

Request a test endpoint, /internal/test:

curl 'http://127.0.0.1/internal/test?a=abc+def'

Validation 1: Simple output $_GET

public function test() {
   var_dump($_GET);
}

Results:

array(1) {
  ["a"]=>
  string(7) "abc def"
}

Conclusion: the + has already become a space by the time the GET parameter is read from $_GET.
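The same effect can be reproduced without an HTTP request. The following is a minimal sketch, assuming that parse_str() decodes a query string the same way PHP fills $_GET (the manual describes it as parsing a string "as if it were the query string passed via a URL"). The sample bytes are an assumption, chosen only because their base64 form consists entirely of plus signs.

```php
<?php
// Illustrative bytes whose base64 text is made of '+' characters only.
$original = "\xfb\xef\xbe";
$encoded  = base64_encode($original);   // "++++"

// Simulate a client that places the base64 text in the URL unescaped.
$query = 'a=' . $encoded;

// parse_str() applies the query-string decoding rules to $query.
parse_str($query, $params);

var_dump($params['a']);                               // the plus signs arrive as spaces
var_dump(base64_decode($params['a']) === $original);  // bool(false): the payload is lost
```

With a real payload only some characters are plus signs, so the damage is partial, but the decoded bytes still no longer match what the client sent.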

Why + becomes space

To answer that, we first need to understand what URL encoding is.

URL encoding

A case in point

Take a common URL, such as a CSDN search URL (so.csdn.net/so/search/s…).

Here the URL is encoded: each byte of a Chinese character is converted to a % followed by two hexadecimal digits.
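As a small illustration of that byte-by-byte conversion, here is a sketch assuming the text is UTF-8. The character U+4E2D is an arbitrary example and is written as an escape sequence so the snippet does not depend on the file's encoding.

```php
<?php
// "\u{4E2D}" is one Chinese character; in UTF-8 it occupies three bytes.
$char = "\u{4E2D}";

echo strlen($char), "\n";                // 3 (bytes, not characters)
echo strtoupper(bin2hex($char)), "\n";   // E4B8AD
echo rawurlencode($char), "\n";          // %E4%B8%AD
```

Each of the three bytes becomes its own %XX group, which is why one character turns into nine characters in the URL.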

Why is the URL encoded?

The parameter part of a URL is made up of key=value pairs. If characters that have a special function in a URL, such as & = / ?, appear inside a key or a value, the URL becomes ambiguous. For example, suppose the value of parameter q is a&b. When the parameter string q=a&b&f=s appears, does it mean that the value of q is a&b, or that the value of q is a and there is a second parameter b with an empty value?

Therefore, the URL should be encoded so that the encoded characters are no longer ambiguous. The string q=a&b&f=s in the example above is encoded as q=a%26b&f=s.
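The ambiguity can be seen directly in PHP. This is a minimal sketch that uses parse_str() to stand in for the server's query-string parsing and http_build_query() to do the encoding; the parameter names q and f come from the example above.

```php
<?php
// q is meant to carry the literal text "a&b".
$params = ['q' => 'a&b', 'f' => 's'];

// Naive concatenation: the '&' inside the value looks like a separator.
$naive = 'q=' . $params['q'] . '&f=' . $params['f'];
parse_str($naive, $fromNaive);

// http_build_query() percent-encodes each key and value before joining them.
$encoded = http_build_query($params, '', '&');
parse_str($encoded, $fromEncoded);

echo $naive, "\n";            // q=a&b&f=s
echo $encoded, "\n";          // q=a%26b&f=s
var_dump($fromNaive);         // three parameters: q => "a", b => "", f => "s"
var_dump($fromEncoded['q']);  // string(3) "a&b"
```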

How are URLs encoded?

How URLs are encoded is defined by a series of RFC standards:

  1. RFC 1738 proposes encoding unsafe characters in a URL as % followed by two hexadecimal digits. Note that the convention PHP associates with this standard (the same one used by the application/x-www-form-urlencoded format of HTML forms) encodes spaces as +.
  2. RFC 2396, the updated URI specification, covers parameter encoding again. Note that in this standard spaces are encoded as %20.
  3. RFC 3986, the latest update, gives more detailed recommendations on URL encoding and decoding: it specifies which characters must be encoded so that the meaning of the URL does not change, and explains why.
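PHP exposes both conventions through the encoding-type argument of http_build_query(). The following sketch compares them; the parameter names a and b are arbitrary, and the constants PHP_QUERY_RFC1738 and PHP_QUERY_RFC3986 are PHP's own names for the two styles.

```php
<?php
$params = ['a' => 'abc def', 'b' => 'abc+def'];

// Older style: a space becomes '+'.
$rfc1738 = http_build_query($params, '', '&', PHP_QUERY_RFC1738);

// RFC 3986 style: a space becomes '%20'.
$rfc3986 = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $rfc1738, "\n";  // a=abc+def&b=abc%2Bdef
echo $rfc3986, "\n";  // a=abc%20def&b=abc%2Bdef
```

In both styles a literal + in the value is sent as %2B; only the treatment of the space differs.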

Let’s go back to the problem we started with

From the above, we can see why + turns into a space: PHP decodes the query string in the RFC 1738 style when it populates $_GET, and in that style + stands for a space. So when you read $_GET directly, the + has already been decoded as a space.

How to solve this problem

So how do we get PHP to decode according to RFC 3986 instead of RFC 1738?

The easiest fix, of course, is to encode the + correctly in the first place: the client encodes the URL according to RFC 3986 when it requests the interface. The + is then sent as %2B, PHP decodes %2B back to + when it receives the parameter, and the problem is solved.
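A PHP client could apply this fix as in the sketch below. buildTestUrl() is a hypothetical helper written for this example, the sample bytes are chosen only because their base64 text contains plus signs, and parse_str() again stands in for the server filling $_GET.

```php
<?php
// Hypothetical client-side helper: builds the request URL for one parameter.
function buildTestUrl(string $value): string
{
    // rawurlencode() turns '+' into %2B (and '/' and '=' into %2F and %3D).
    return 'http://127.0.0.1/internal/test?a=' . rawurlencode($value);
}

$payload = base64_encode("\xfb\xef\xbe");  // "++++"
$url     = buildTestUrl($payload);
echo $url, "\n";  // http://127.0.0.1/internal/test?a=%2B%2B%2B%2B

// Simulate the server side: decode the query string of that URL.
parse_str(parse_url($url, PHP_URL_QUERY), $received);
var_dump($received['a'] === $payload);  // bool(true)
```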

The verification results

Encode the URL correctly

 curl 'http://127.0.0.1/internal/test?a=abc%2bdef'

You can see the interface output

array(1) {
  ["a"]=>
  string(7) "abc+def"
}


Are there any other pitfalls in PHP?

Besides $_GET, PHP has two commonly used functions for handling URL parameters: urlencode and urldecode. Note that these two functions also encode and decode in the RFC 1738 style, as the official documentation for urlencode states:

This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

A quick test

First, encode the string abc def

 $str = 'abc def';
 echo urlencode($str);

The output

abc+def

Then decode the string a=abc+def

 $str = 'a=abc+def';
 echo urldecode($str);

The output

a=abc def

You can see that spaces are indeed encoded as +, and + is decoded as a space.

How do you solve it?

PHP also provides rawurlencode and rawurldecode, which follow RFC 3986:

rawurlencode: URL-encode according to RFC 3986

Let’s do another experiment and encode the string abc def

 $str = 'abc def';
 echo rawurlencode($str);

The output

abc%20def

You can see that the space is encoded as %20. Next, decode the string a=abc+def

 $str = 'a=abc+def';
 echo rawurldecode($str);

The output

a=abc+def

You can see that the + is still a + after decoding, not a space.
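If the client cannot be changed, one possible server-side workaround is to skip $_GET and decode the raw query string with rawurldecode. The sketch below is only an illustration: parseQueryRfc3986() is a hypothetical helper, it does not handle array syntax such as a[]=1, and it assumes the raw string would come from $_SERVER['QUERY_STRING'] in a real request.

```php
<?php
// Hypothetical helper: splits a raw query string and decodes each part with
// rawurldecode(), so a literal '+' stays a '+'.
function parseQueryRfc3986(string $query): array
{
    $result = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue;  // skip empty segments such as a trailing '&'
        }
        $parts = explode('=', $pair, 2);
        $key   = rawurldecode($parts[0]);
        $value = isset($parts[1]) ? rawurldecode($parts[1]) : '';
        $result[$key] = $value;
    }
    return $result;
}

// In a request handler the input would typically be $_SERVER['QUERY_STRING'].
var_dump(parseQueryRfc3986('a=abc+def&b=x%20y'));
```

The trade-off is that clients which really do send + to mean a space, such as HTML forms, would then be misread, so this only makes sense when every caller is known to use %20 for spaces.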

Conclusion

The most standard and easiest solution to implement is therefore to have the client or front end encode URLs correctly, following RFC 3986, whenever it requests a server interface.