“This is the 11th day of my participation in the Gwen Challenge in November. See details: The Last Gwen Challenge in 2021”

This is a series of source-code reading articles. I am starting from the framework I know best, Koa, but this time the subject is not Koa itself: the series reads the source of the utility libraries Koa depends on. This first article covers the URL parsing library parseurl.

parseurl is a URL parsing tool whose results match those of url.parse in the Node.js standard library. Why would Koa rely on a third-party library when the standard library already provides URL parsing? Let's read the parseurl source with that question in mind.

Before reading, take a look at how Koa uses parseurl. Koa references it in only four places, all in request: the getters and setters for path and querystring.

get path () {
	return parse(this.req).pathname
},
set path (path) {
	const url = parse(this.req)
	if (url.pathname === path) return

	url.pathname = path
	url.path = null
	this.url = stringify(url)
},
get querystring () {
	if (!this.req) return ''
	return parse(this.req).query || ''
},
set querystring (str) {
	const url = parse(this.req)
	if (url.search === `?${str}`) return

	url.search = str
	url.path = null

	this.url = stringify(url)
},

The parseurl library exports two functions: the default export parseurl, and originalurl (exposed as parseurl.original). Both accept a req parameter of the same type as the req argument of the createServer callback in the http module. Only its url and originalUrl properties matter here, namely:

type Req = {
  url?: string
  originalUrl?: string
}

parseurl parses req.url, while originalurl parses req.originalUrl and falls back to parsing req.url when there is no originalUrl. Otherwise, the two functions behave the same.
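The fallback rule can be illustrated with a minimal sketch. It is not parseurl's own code: selectOriginal is a hypothetical helper introduced here, and the standard url.parse stands in for the library's parser.

```javascript
const { parse } = require('url')

// Hypothetical helper (not part of parseurl): pick the string that an
// "original URL" parser would look at, falling back to req.url.
function selectOriginal (req) {
  return typeof req.originalUrl === 'string' ? req.originalUrl : req.url
}

// A request whose url was rewritten by some upstream layer (assumed data).
const rewritten = { url: '/bar', originalUrl: '/foo/bar' }
// A request that never had originalUrl set.
const plain = { url: '/bar' }

const fromOriginal = parse(selectOriginal(rewritten)).pathname
const fromFallback = parse(selectOriginal(plain)).pathname

console.log(fromOriginal, fromFallback)
```

With these inputs, the first request resolves through originalUrl and the second through url.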

Let's look at a test case: parsing http://localhost:8888/foo/bar with parseurl yields a Url object:

Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'localhost:8888',
  port: '8888',
  hostname: 'localhost',
  hash: null,
  search: null,
  query: null,
  pathname: '/foo/bar',
  path: '/foo/bar',
  href: 'http://localhost:8888/foo/bar',
  _raw: 'http://localhost:8888/foo/bar'
}

Compare the results of url.parse:

Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'localhost:8888',
  port: '8888',
  hostname: 'localhost',
  hash: null,
  search: null,
  query: null,
  pathname: '/foo/bar',
  path: '/foo/bar',
  href: 'http://localhost:8888/foo/bar'
}

The parseurl result has one extra field, _raw, which url.parse does not produce. The main function shows where it comes from:

function parseurl (req) {
  var url = req.url

  if (url === undefined) {
    // URL is undefined
    return undefined
  }

  var parsed = req._parsedUrl

  if (fresh(url, parsed)) {
    // Return cached URL parse
    return parsed
  }

  // Parse the URL
  parsed = fastparse(url)
  parsed._raw = url

  return (req._parsedUrl = parsed)
};

Here fastparse holds the actual parsing logic, and the _raw field is attached to its result. The real purpose of _raw shows on the last line, where the parse result is stored on the original req object as _parsedUrl. On a later call, if req already has a _parsedUrl whose _raw equals the current req.url, that result is returned directly, so it acts as a cache.
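The caching behaviour can be sketched in isolation. This is a simplified illustration rather than the library's code: isFresh and cachedParse are hypothetical names, the freshness check is reduced to comparing _raw, and the standard url.parse replaces fastparse.

```javascript
const { parse } = require('url')

// Hypothetical, simplified freshness check: the cached result is reusable
// only if it was produced from exactly the same url string.
function isFresh (url, parsed) {
  return typeof parsed === 'object' && parsed !== null && parsed._raw === url
}

// Hypothetical stand-in for parseurl(req), using url.parse as the parser.
function cachedParse (req) {
  const url = req.url
  if (url === undefined) return undefined
  if (isFresh(url, req._parsedUrl)) return req._parsedUrl

  const parsed = parse(url)
  parsed._raw = url // remember which string this result came from
  return (req._parsedUrl = parsed)
}

const req = { url: '/foo/bar?a=1' }
const first = cachedParse(req)
const second = cachedParse(req) // same url: served from req._parsedUrl
const sameObject = first === second

req.url = '/baz' // url changed: _raw no longer matches, so it is parsed again
const third = cachedParse(req)
const reparsed = third !== first && third.pathname === '/baz'

console.log(sameObject, reparsed)
```

Because the cache lives on the req object and is keyed by the raw string, reading path and querystring several times during one request costs a single parse, and changing req.url invalidates the cached result without extra bookkeeping.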

The fastparse logic is simple and mostly delegates to url.parse, with one optimization. When the string is a path starting with / and contains no special characters (whitespace or #), the result is assembled directly by splitting at the first ?, without calling url.parse, which saves work for typical request paths.

function fastparse (str) {
  if (typeof str !== 'string' || str.charCodeAt(0) !== 0x2f /* / */) {
    return parse(str)
  }

  var pathname = str
  var query = null
  var search = null

  // This takes the regexp from https://github.com/joyent/node/pull/7878
  // Which is /^(\/[^?#\s]*)(\?[^#\s]*)?$/
  // And unrolls it into a for loop
  for (var i = 1; i < str.length; i++) {
    switch (str.charCodeAt(i)) {
      case 0x3f: /* ?  */
        if (search === null) {
          pathname = str.substring(0, i)
          query = str.substring(i + 1)
          search = str.substring(i)
        }
        break
      case 0x09: /* \t */
      case 0x0a: /* \n */
      case 0x0c: /* \f */
      case 0x0d: /* \r */
      case 0x20: /*   */
      case 0x23: /* # */
      case 0xa0:
      case 0xfeff:
        return parse(str)
    }
  }

  var url = Url !== undefined
    ? new Url()
    : {}

  url.path = str
  url.href = str
  url.pathname = pathname

  if (search !== null) {
    url.query = query
    url.search = search
  }

  return url
}
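For comparison, here is a sketch of the same fast path written with the regular expression quoted in the source comment instead of the unrolled loop. It is an illustration only: regexFastparse and the fast flag are hypothetical, and the regex is slightly stricter than the loop because \s matches more whitespace characters than the loop checks for.

```javascript
const { parse } = require('url')

// The regexp quoted in the fastparse comment: a path starting with "/",
// an optional "?search" part, and no "#" or whitespace anywhere.
const SIMPLE_PATH = /^(\/[^?#\s]*)(\?[^#\s]*)?$/

// Hypothetical regex-based variant; `fast` reports which branch was taken.
function regexFastparse (str) {
  const match = typeof str === 'string' ? SIMPLE_PATH.exec(str) : null

  if (match === null) {
    // Not a simple path: fall back to the full parser.
    return { fast: false, url: parse(str) }
  }

  const search = match[2] === undefined ? null : match[2]
  return {
    fast: true,
    url: {
      path: str,
      href: str,
      pathname: match[1],
      search: search,
      query: search === null ? null : search.slice(1)
    }
  }
}

const simple = regexFastparse('/foo/bar?a=1&b=2') // fast path
const withHash = regexFastparse('/foo#frag') // "#" forces url.parse
const absolute = regexFastparse('http://localhost:8888/foo/bar') // no leading "/"

console.log(simple.fast, withHash.fast, absolute.fast)
```

The loop in the real code produces the same split at the first ? while avoiding the regular expression engine, which is presumably why the library unrolled it.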

The repository includes benchmark scripts you can run yourself. Between the per-request cache and the fast path for plain paths, parseurl avoids most of the work url.parse would otherwise do on every call, which is probably why Koa chose it.

In URL handling, the opposite of parse is format, which converts a URL object back into a string. The stringify used in Koa's path and querystring setters above is simply the url.format method from the Node.js standard library; no third-party library is involved here.
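The parse, modify, format round trip that the two setters perform can be sketched with the standard library alone. The input strings are made up for illustration, and url.parse stands in for parseurl.

```javascript
const { parse, format } = require('url')

// Mimic the path setter: replace pathname, clear the derived path field,
// then turn the object back into a string.
const parsed = parse('/foo/bar?a=1')
parsed.pathname = '/baz'
parsed.path = null
const afterPath = format(parsed) // '/baz?a=1'

// Mimic the querystring setter: search is assigned without a leading "?",
// and url.format adds it when building the string.
const reparsed = parse(afterPath)
reparsed.search = 'b=2'
reparsed.path = null
const afterQuery = format(reparsed) // '/baz?b=2'

console.log(afterPath, afterQuery)
```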

That concludes the reading of the parseurl source. Like many npm packages, it does one very simple thing. The next article will look at other npm packages that Koa depends on.