0x_jin · 2013/11/27 18:50

0x00 Contents

0x01 Link composition
0x02 How does a browser parse a URL
0x03 Do links really have to be in this fixed format?
0x04 Is a link really what it looks like?

0x01 Link Composition


Do links really have to stick to the format we normally use?

I wonder how many of you have thought about that. The format we usually type is www.xxxx.com,

sometimes with a protocol name (HTTP, HTTPS), a port and a path, or even an account and password. The parts are as follows:

Part 1: protocol name (ends with a single colon)
Part 2: user information, that is, account and password
Part 3: host name
Part 4: port
Part 5: query (the figure has a mistake here: the query is the content after the ? sign)
Part 6: fragment ID (not sent to the server)
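To make the parts concrete, here is a small sketch that uses the built-in URL class (available in current browsers and Node.js) to print each component of one address. The address itself, including example.com and the credentials, is made up for illustration.

```javascript
// Illustrative address only: example.com, the credentials and the path are made up.
const sample = new URL("http://user:pass@example.com:8080/dir/page.html?id=1#top");

const parts = {
  protocol: sample.protocol, // "http:" (ends with a single colon)
  user: sample.username,     // account
  password: sample.password, // password
  host: sample.hostname,     // host name
  port: sample.port,         // port
  path: sample.pathname,     // path
  query: sample.search,      // starts at the ? sign
  fragment: sample.hash,     // starts at the # sign, never sent to the server
};

console.log(parts);
```

Note that the built-in class follows today's URL standard, so it may not match every browser quirk described later in this article.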

0x02 How does a browser parse a URL


We all know that when we visit a website there is a protocol involved, such as HTTP, FTP or HTTPS.

First, the browser extracts the protocol name from the link. How does it do that?

(The following is copied from The Tangled Web, which covers this in more detail.)

1. Extract the protocol name:

The browser looks for the first : (colon); whatever is to the left of it is the protocol name. If the protocol name obtained this way contains characters that should not be there, the whole thing is treated as a relative URL and that part is not a protocol name.

2. Skip the hierarchical URL marker:

The string // should follow the protocol name. If it is found, it is skipped; if it is missing, the browser simply carries on. That is why http:baidu.com is also accessible. Browsers also accept backslashes in place of forward slashes, \\ instead of //, except Firefox.
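Steps 1 and 2 can be sketched roughly as follows. The function name splitScheme and the exact rule for which characters a protocol name may contain are assumptions made for this example; the backslash variant is left out to keep it short.

```javascript
// Sketch of steps 1 and 2. Returns { scheme, rest }; scheme is null for a relative URL.
function splitScheme(url) {
  const colon = url.indexOf(":");
  const candidate = colon === -1 ? "" : url.slice(0, colon);
  // Assumed rule: a letter followed by letters, digits, "+", "-" or ".".
  if (!/^[A-Za-z][A-Za-z0-9+.\-]*$/.test(candidate)) {
    return { scheme: null, rest: url };
  }
  let rest = url.slice(colon + 1);
  // Step 2: skip the "//" marker when present, carry on when it is missing.
  if (rest.startsWith("//")) {
    rest = rest.slice(2);
  }
  return { scheme: candidate.toLowerCase(), rest };
}

console.log(splitScheme("http://baidu.com")); // scheme "http", rest "baidu.com"
console.log(splitScheme("http:baidu.com"));   // same result without the "//"
```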

3. Obtain the authority section:

Scan the URL from left to right and cut at whichever of these three symbols appears first:

/ (forward slash), ? (question mark), # (hash)

The part cut out of the URL this way is the authority section.

Every browser except Internet Explorer and Safari also accepts ; (semicolon) as a delimiter for the authority section.

(1) Locate the login information, if any:

Once the authority section has been extracted, look for @ in it. If one is found, the part in front of it is the login information. Search the login information for : (colon): what comes before the colon is the account, what comes after it is the password.

(2) Extract the destination address

The rest of the authority section is the destination address. It is split at the first colon into host name and port. IPv6 addresses are enclosed in square brackets, which is a special case.

Putting the above together, let's analyze the following link:

ftp://admin:admin@192.168.1.100:21

I often use links like this to log in to FTP. It logs in over the FTP protocol as user admin with password admin, to host 192.168.1.100 on port 21.
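Step 3 can be sketched like this, using the FTP login just described (user admin, password admin, host 192.168.1.100, port 21). The function name parseAuthority is made up, and splitting at the first @ is an assumption, since the text does not say which @ wins when there are several.

```javascript
// `rest` is what remains after the protocol name and "//" have been removed.
function parseAuthority(rest) {
  // The authority section ends at the first "/", "?" or "#".
  const end = rest.search(/[\/?#]/);
  const authority = end === -1 ? rest : rest.slice(0, end);

  let user = null;
  let password = null;
  let hostport = authority;
  const at = authority.indexOf("@"); // assumption: the first "@" is the separator
  if (at !== -1) {
    const login = authority.slice(0, at);
    hostport = authority.slice(at + 1);
    const colon = login.indexOf(":");
    user = colon === -1 ? login : login.slice(0, colon);
    password = colon === -1 ? null : login.slice(colon + 1);
  }

  // IPv6 literals are bracketed, so only look for the port colon after "]".
  const bracket = hostport.lastIndexOf("]");
  const portColon = hostport.indexOf(":", bracket + 1);
  const host = portColon === -1 ? hostport : hostport.slice(0, portColon);
  const port = portColon === -1 ? null : hostport.slice(portColon + 1);
  return { user, password, host, port };
}

console.log(parseAuthority("admin:admin@192.168.1.100:21"));
```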

4. Determine the path (if present)

If the authority section is immediately followed by a forward slash (or, in some scenarios, by a backslash or semicolon, as mentioned earlier), scan to the next ?, # or the end of the string, whichever comes first. The part cut out is the path. Finally, the path is normalized according to Unix path semantics.

5. Extract the query string (if present)

If the previously parsed part is followed by a question mark, scan to the next # or the end of the string, whichever comes first. The part in between is the query string.

6. Extract the fragment ID

If the previously parsed part is followed by a #, everything from that symbol to the end of the string is the fragment ID. The fragment ID is not sent to the server. It is typically used to jump to the anchor of an a tag, or read through location.hash in JS, and so on.
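Steps 4 to 6 can be sketched as one small function. The name splitTail is made up, and the Unix-style path normalization mentioned in step 4 is deliberately left out.

```javascript
// `tail` is whatever follows the authority section, for example "/a/b?x=1#top".
function splitTail(tail) {
  // Step 4: the path runs up to the first "?" or "#".
  const pathEnd = tail.search(/[?#]/);
  const path = pathEnd === -1 ? tail : tail.slice(0, pathEnd);
  let rest = pathEnd === -1 ? "" : tail.slice(pathEnd);

  let query = null;
  let fragment = null;
  // Step 5: the query runs from "?" up to the next "#".
  if (rest.startsWith("?")) {
    const hash = rest.indexOf("#");
    query = hash === -1 ? rest.slice(1) : rest.slice(1, hash);
    rest = hash === -1 ? "" : rest.slice(hash);
  }
  // Step 6: everything after "#" is the fragment ID.
  if (rest.startsWith("#")) {
    fragment = rest.slice(1);
  }
  return { path, query, fragment };
}

console.log(splitTail("/a/b?x=1#top")); // path "/a/b", query "x=1", fragment "top"
```

A question mark that appears after the # belongs to the fragment, so splitTail("/a#b?c") yields no query at all.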

You may remember this if you went HTTP Basic authentication phishing with friends on Wooyun last year. At the time, many sites checked that the address given when inserting a picture ended with an image suffix such as jpg or gif. The hook, however, does not end in gif. The solution was to append #.jpg to the hook address, and the phishing then worked. The principle is the same: the fragment is never sent to the server.
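The reason the suffix check fails can be shown in a few lines. Both functions are hypothetical stand-ins (the real site-side filter is not shown in this article), and attacker.example is a made-up address.

```javascript
// Naive suffix check of the kind described above (hypothetical site-side filter).
function looksLikeImage(url) {
  return /\.(jpg|jpeg|gif|png)$/i.test(url);
}

// What the browser actually requests: everything before the first "#".
function requestedPart(url) {
  const hash = url.indexOf("#");
  return hash === -1 ? url : url.slice(0, hash);
}

const hook = "http://attacker.example/hook.php#.jpg"; // made-up address
console.log(looksLikeImage(hook)); // true
console.log(requestedPart(hook));  // http://attacker.example/hook.php
```

A filter that wants to judge the suffix should therefore look at the address with the fragment removed, not at the raw string.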

Let’s take a few examples to analyze:

Example 1:

http://xss1.com&action=test@www.baidu.com

To the average user, such a link looks like it goes to xss1.com.

But it actually goes to www.baidu.com. Why? Let's analyze it with what we learned above.

First the protocol name is extracted, then the authority section. Since none of / ? # appears, the browser takes the whole remaining string as the authority section. It then finds the @, and everything in front of the @ is treated as login information rather than as a host name. So xss1.com&action=test becomes the login information, and the only host name is www.baidu.com.

When we open the link, xss1.com&action=test is simply used as the login information for visiting www.baidu.com.

Example 2:

http://xss1.com\@www.baidu.com

First, here is how this link behaves in Chrome:

In Chrome it is plain to see that such a link goes to xss1.com.

Now look at what happens in Firefox:

Firefox asks whether we want to visit www.baidu.com with the account xss1.com\.

Why? Because of the browser differences mentioned earlier.

Every browser except Firefox parses \ (backslash) as a forward slash.

A forward slash marks the end of the authority section, because the authority section is cut at the first / ? or #.

So the authority section ends there, and the part in front of the backslash becomes the host name.

Firefox does not treat \ as a forward slash, so it takes everything before the @ (xss1.com\) as login information and what follows as the host name. That is why the prompt above appears when the link is opened in Firefox.
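The difference comes down to one decision: does a backslash end the authority section or not? A minimal sketch, where the flag backslashEndsAuthority is a hypothetical switch standing in for the two browser behaviors described above:

```javascript
// Cut the authority section; the flag models the browser difference.
function authorityOf(rest, backslashEndsAuthority) {
  const stop = backslashEndsAuthority ? /[\/\\?#]/ : /[\/?#]/;
  const end = rest.search(stop);
  return end === -1 ? rest : rest.slice(0, end);
}

// The host is whatever follows "@" inside the authority section, if there is one.
function hostOf(rest, backslashEndsAuthority) {
  const authority = authorityOf(rest, backslashEndsAuthority);
  const at = authority.indexOf("@");
  return at === -1 ? authority : authority.slice(at + 1);
}

const afterScheme = "xss1.com\\@www.baidu.com"; // the part after "http://"
console.log(hostOf(afterScheme, true));  // xss1.com (backslash treated as "/")
console.log(hostOf(afterScheme, false)); // www.baidu.com (backslash kept as user info)
```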

Example 3:

http://xss1.com;.baidu.com/

This machine has no IE installed, so there is no screenshot for this one.

Microsoft's browser allows ; (semicolon) to appear in the host name and resolves the address successfully, provided, of course, that baidu.com has set up DNS resolution for such a name in advance.

Most other browsers automatically correct the URL to http://xss1.com/;.baidu.com/

The user then ends up on xss1.com (except in Safari, which treats this as a syntax error).

0x03 Do links really have to be in this fixed format?


I wonder how many people have thought about this: can a link really only look like that?

After the introduction above, I believe you will say No!

I remember an earlier article in which an XSS payload had to load a hook while http:// was blacklisted, so the author split http:// into pieces:

var i='http';
var b='://';

That is one way to do it, but is there a better one? Yes: //www.baidu.com loads as well.

A resource written this way is loaded with the same protocol as the page that loads it. If you load a hook like this from an HTTPS page, the hook is fetched over HTTPS by default.
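The built-in URL class resolves such an address the same way when it is given the address of the loading page as a base. In this sketch, page.example and hook.js are made-up names.

```javascript
// A resource written without a protocol name; hook.js is a made-up file name.
const resource = "//www.baidu.com/hook.js";

// The same string resolves differently depending on the page that loads it.
const fromHttps = new URL(resource, "https://page.example/index.html").href;
const fromHttp = new URL(resource, "http://page.example/index.html").href;

console.log(fromHttps); // https://www.baidu.com/hook.js
console.log(fromHttp);  // http://www.baidu.com/hook.js
```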

At this point we have to ask: what else can we write that still opens the page normally? This is where we can fuzz.

See the figure below:

As you can see, we can also insert TAB, newline, /, @, \ and so on. Let's test that by constructing the following links and visiting them:

\\/www.baidu.com
\\@www.baidu.com
\\[email protected]
\\\\\\\www.baidu.com
///////www.baidu.com

These and similar forms all reach Baidu normally; you can try them yourself. It is best to write them inside an a tag, an img or a script, which is closer to the environment we usually run into.
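One way to check such variants outside a browser is to resolve them with the built-in URL class, which follows the current URL standard. Treat this as a stand-in for a modern browser rather than a reproduction of the browsers tested in this article; the base address page.example is made up.

```javascript
// Made-up page that the links would sit on.
const base = "http://page.example/";

// Slash and backslash variants; String.raw keeps the backslashes exactly as typed.
const candidates = [
  String.raw`\\/www.baidu.com`,
  String.raw`\\@www.baidu.com`,
  String.raw`\\\\\\\www.baidu.com`,
  "///////www.baidu.com",
];

// Resolve each candidate against the page and keep only the host name.
const hosts = candidates.map((c) => new URL(c, base).hostname);
console.log(hosts);
```

Each entry of hosts should come out as www.baidu.com, because the standard parser skips any run of leading slashes and backslashes before reading the authority section.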

Since the title of the article promises something sneaky, is this sneaky enough? No, not yet. A link written this way still has tell-tale features:

www, com, net and the like are all still there. If we are going to be sneaky, let's be sneakier, with a string like this:

ⅅʳºℙˢ --> drops, ʷººʸⓊⁿ --> wooyun, ºʳℊ --> org

Put together:

ⅅʳºℙˢ.ʷººʸⓊⁿ.ºʳℊ

This can be visited as well; try it.

So where does this string come from?

We can use http://xsser.me/hf.html to fuzz for it.

Before fuzzing:

The encoding used for domain names: Punycode

Punycode-encoded domain names are recognized by DNS servers.

Take Chinese domain names as an example. Because the core of the operating system is built around English, DNS resolution also exchanges English (ASCII) codes, so DNS servers do not resolve Chinese domain names directly. Every Chinese domain name has to be converted to Punycode, and DNS then resolves the Punycode form. That is how we end up on the website we wanted. Nowadays this conversion is not done by the DNS server; the browser does it when the address is visited.

Sleepy Dragon on Drops also mentioned this in an article:

drops.wooyun.org/papers/146

With all that said, let's start (and explain how this tool is used along the way).

Callback = hostname; Callback = x.protocol;

Then change the hostname value to the host name we want to test (without the protocol name),

such as drops.wooyun.org

Then change the a tag link in exp to the host name with the protocol name (it cannot be visited without it).

With everything set, it looks like this:

The smaller parameters below can be left at their defaults. With all the parameters set, mark the character we want to test by replacing it with {CHR}.

Now that everything is set up, click Fuzzing to fire the first shot. Let's test d first.

A series of numbers appears in the box on the right: these are character codes separated by commas.

We could convert the codes back with some tool, but I like using Chrome because it is convenient.

Copy the numbers, go back to Chrome and open the console (F12).

Type String.fromCharCode with the numbers as arguments and press Enter to get the characters.
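In the console this looks roughly as follows. The list of codes is an assumed example of what the tool might print for the letter d; it is not copied from the tool's real output.

```javascript
// Assumed fuzzer output for "d": comma-separated character codes.
const output = "68,100,8517,8518,8558,8574,9401,9427";

// Turn the codes back into characters, exactly as you would in the console.
const chars = String.fromCharCode(...output.split(",").map(Number));

console.log(chars); // D, d, then six Unicode look-alikes of the letter d
```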

After testing, we have the first character, d, which can be replaced with any of:

Dd ⅅ ⅆ Ⅾ ⅾ Ⓓ ⓓ Dd

I won't walk through the fuzzing of every character here. Here is the final string after fuzzing:

http://ⅅʳºℙˢ.ʷººʸⓊⁿ.ºʳℊ

You can copy it and visit it; it still works.

However, this only works when something parses the string before the request is made.

It does not work with curl.

Why? Simply because curl does not parse the string this way,

whereas a browser parses the encoded value first and then visits the result.

So that is worth knowing too.
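What the browser does here can be approximated with Unicode compatibility normalization. This is only an approximation and an assumption on my part: real browsers apply a full internationalized-domain-name mapping, for which NFKC plus lowercasing is a rough stand-in. The function name foldHost is made up.

```javascript
// Rough stand-in for the browser's host name mapping: NFKC, then lowercase.
function foldHost(host) {
  return host.normalize("NFKC").toLowerCase();
}

// The fancy host name from above, written with escapes so the code points are exact.
const fancy =
  "\u2145\u02B3\u00BA\u2119\u02E2." +           // ⅅʳºℙˢ
  "\u02B7\u00BA\u00BA\u02B8\u24CA\u207F." +     // ʷººʸⓊⁿ
  "\u00BA\u02B3\u210A";                         // ºʳℊ

console.log(fancy === "drops.wooyun.org"); // false: a plain string comparison sees no match
console.log(foldHost(fancy));              // drops.wooyun.org
```

This also shows why a blacklist that compares raw strings misses such a host name, while a check applied after normalization would catch it.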

But where can we use this? Read on.

If you are inserting a hook or something similar and the other side blacklists things like www, com or org, you can bypass the blacklist this way.

That is the idea; take it further yourselves, and share any sneakier ideas in the comments below.

Let's do another example.

First, take a site that Tencent marks with a red X as dangerous.

You can see that when this link is sent, it is flagged as a dangerous site.

Now let's fuzz one of its characters. Why only one character?

(Because the fuzzed characters may make Tencent decide that this is not a link at all, and then the page cannot be opened with a click.)

You can see that with too many such symbols, Tencent no longer recognizes it as a link and generates no hyperlink.

So it is usually best to fuzz only a few characters.

Let's test it.

The original link: http://laohujijiqiao8.com

Again, use http://xsser.me/hf.html to fuzz.

Fuzzing shows that the o in http://laohujijiqiao8.com can be replaced with any of the following characters:

O O º ℴ Ⓞ ⓞ O O

Now let's test it:

http://laohujijiqiaº8.com

Send it and see whether it is still flagged as a dangerous website:

Now there is no warning that this is a dangerous site, and it opens normally. Did we get what we wanted?

Earlier, the same approach could make a site that shows as a blue link display as the official Tencent website. Here is the link:

http:[email protected]#

(PS: without the # it was still shown as a blue link, but with the # added it was displayed as the official Tencent website.)

Because the front of the link is www.qq.com, it was shown as the official Tencent website when sent. This no longer works.

0x04 Is a link really what it looks like?


Someone in the community made a post titled: Baidu URL redirect bypasses Tencent's red X

But do we really need a URL redirect vulnerability in order to redirect?

No. Any website will do, as follows:

http://www.baidu.com@www.qq.com

Enter this address in your browser and you will land on www.qq.com instead of www.baidu.com as you might expect. Why? Look back at the beginning of this article.

What follows http:// can be userinfo, that is, user information such as an account and password.

Take www.baidu.com@www.qq.com. Why do we go to qq.com and not baidu.com? Because the @ makes the browser treat www.baidu.com as user information, and what follows the @ is the host it actually visits.

So sometimes we can fake a redirect vulnerability this way.

In Chrome and Firefox, however, you can even write:

http:www.baidu.com@www.qq.com

A protocol name without // is treated as http://

Friends who have not read The Tangled Web or seen data URIs before may be surprised by this small example, but it really works.

The Tangled Web also mentions that the address in a URL can be written in a different number base. Just convert the IP address to another base and visit that.

decimal --> hexadecimal --> octal, then add the protocol and a leading 0 when visiting:

http://0[octal]. Take 115.239.210.26 as an example. First split it on the dots into the numbers 115, 239, 210 and 26, then convert each from base 10 to base 16.

(The prefix is made of zeros, one or several, just like adding extra zeros in XSS to bypass a filter.)

First convert the four numbers to hexadecimal. The result is 73 ef d2 1a. Then convert the hexadecimal number 73EFD21A to base 8.

Result: 16373751032

Then put the protocol http:// and the 0 prefix in front of the result to get the link:

http://0016373751032

It successfully resolves to our original IP.
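The conversion is easy to reproduce in code. The function names ipToNumber and octalUrl are made up for this sketch; the zeros parameter reflects the remark that one or several leading zeros may be used.

```javascript
// Combine the four decimal parts of an IPv4 address into one 32-bit number.
function ipToNumber(ip) {
  return ip.split(".").reduce((acc, part) => acc * 256 + Number(part), 0);
}

// Build the octal form of the address with the given number of leading zeros.
function octalUrl(ip, zeros = 1) {
  return "http://" + "0".repeat(zeros) + ipToNumber(ip).toString(8);
}

const n = ipToNumber("115.239.210.26");
console.log(n.toString(16));                  // 73efd21a
console.log(n.toString(8));                   // 16373751032
console.log(octalUrl("115.239.210.26", 2));   // http://0016373751032
```

Multiplying by 256 instead of using bit shifts keeps the result positive for addresses whose first number is 128 or higher.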

Combine this with the first example:

http://xss1.com&action=test@www.baidu.com

The www.baidu.com at the end is still too conspicuous. Now put the converted address there instead, remembering to keep the 0 prefix:

http://xss1.com&action=test@0016373751032

Doesn't that look better?

Let's see whether this address can be used to load resources such as images and JS.

You can see that the image loaded successfully, so JS and other resources should load as well.

I'm sure resourceful readers already have ideas, such as using this to bypass certain restrictions.

Go and experiment for yourselves. The world of the Web is boundless!