When you build output files with Webpack, you usually put a hash in the file name. The hash is computed from the content of the file: as long as the content stays the same, the hash stays the same, so the browser cache can save download traffic. But the hashes Webpack provides turn out to be not so reliable…

This article focuses only on making Webpack 1.x produce stable hash values for production releases. If you are not familiar with Webpack, read up on Webpack first.

Everything here is discussed in the context of Webpack 1.x, because some of these issues have already been addressed in Webpack 2. To make the problems easy to describe, the code and configuration shown may be clumsy and are not necessarily engineering best practice, so please go easy on me.

If you would rather skip the article, consider reading the plugin's source: zhenyong/webpack-stable-module-id-and-hash

The goal

Apart from HTML files, every static resource gets a file name containing a hash computed from the file's own content, so that a file whose content has not changed keeps the same name from one build to the next.

Hash provided by Webpack

[hash]

Suppose the file directory looks like this:

/src
  |- pageA.js (entry 1)
  |- pageB.js (entry 2)

With this Webpack configuration:

entry: {
    pageA: './src/pageA.js',
    pageB: './src/pageB.js',
},
output: {
    path: __dirname + '/build',
    // [hash:4] keeps the first four characters of the hash
    filename: '[name].[hash:4].js',
},

Build twice in a row without changing anything, and both builds emit the same file names: the hash value is stable. But wait a minute!

According to Configuration · Webpack/Docs Wiki:

[hash] is replaced by the hash of the compilation.

Paraphrase:

[hash] is a hash computed from a compilation object, and does not change if the compilation object’s information remains unchanged

And according to How to write a plugin:

A compilation object represents a single build of versioned assets. While running Webpack development middleware, a new compilation will be created each time a file change is detected, thus generating a new set of compiled assets. A compilation surfaces information about the present state of module resources, compiled assets, changed files, and watched dependencies.

Paraphrase:

The compilation object represents one build of a set of versioned assets. In development mode (for example, using --watch to detect changes and recompile on the fly), a new compilation is created for each change. It contains the context information the build needs (build configuration, files, file dependencies).

Let's modify pageA.js and build again:

pageA.e6a9.js  1.48 kB  0  [emitted]  pageA
pageB.e6a9.js  1.47 kB  1  [emitted]  pageB

The hash has changed, and every file carries the same hash value, which is consistent with the documentation: if any resource (code) the build depends on changes, the compilation information differs from last time.

So if the source code stays the same, the hash is guaranteed to be stable? No. Let's change the Webpack configuration:

entry: {
    pageA: './src/pageA.js',
    // pageB is no longer a build entry
    // pageB: './src/pageB.js',
},

Build again:

pageA.1f01.js  1.48 kB  0  [emitted]  pageA

The compilation information also includes the build context, so removing an entry or changing a loader will cause hash changes.

The downside of [hash] is that it is not computed from file content. It is "stable" only in the sense that it guarantees the browser fetches fresh static resources after every release.

Are you happy with [hash]? I'm not. So let's look at another placeholder Webpack provides, one that hashes by content.

[chunkhash]

[chunkhash] is replaced by the hash of the chunk.

Paraphrase:

[chunkhash] is calculated based on the contents of chunk. (Chunk can be understood as an output file, which may contain multiple JS modules)

Let’s change the configuration:

entry: {
    pageA: './src/pageA.js',
    pageB: './src/pageB.js',
},
output: {
    path: __dirname + '/build',
    filename: '[name].[chunkhash:4].js',
},

Try a build:

pageA.f308.js  1.48 kB  0  [emitted]  pageA
pageB.53a9.js  1.47 kB  1  [emitted]  pageB

Modify pageA.js and build again:

pageA.16d6.js  1.48 kB  0  [emitted]  pageA
pageB.53a9.js  1.47 kB  1  [emitted]  pageB

Only the hash of pageA changed. So [chunkhash] solves the problem? Wait a minute!
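The difference between the two placeholders can be sketched outside Webpack. This is only a rough mental model using Node's crypto module, not Webpack's actual implementation; `namesWithBuildHash`, `namesWithChunkHash`, and the sample sources are hypothetical names made up for illustration.

```javascript
var crypto = require('crypto');

function md5(content) {
    return crypto.createHash('md5').update(content, 'utf8').digest('hex');
}

// Hypothetical model of [hash]: one digest over all sources in the build.
function namesWithBuildHash(sources) {
    var names = Object.keys(sources).sort();
    var all = names.map(function (n) { return sources[n]; }).join('\n');
    var hash = md5(all).substr(0, 8);
    return names.map(function (n) { return n + '.' + hash + '.js'; });
}

// Hypothetical model of [chunkhash]: one digest per output file.
function namesWithChunkHash(sources) {
    return Object.keys(sources).sort().map(function (n) {
        return n + '.' + md5(sources[n]).substr(0, 8) + '.js';
    });
}

var v1 = { pageA: "var a = 'this is pageA';", pageB: "var b = 'this is pageB';" };
var v2 = { pageA: "var a = 'pageA, edited';", pageB: "var b = 'this is pageB';" };

// With the build-wide hash, editing pageA renames pageB as well.
console.log(namesWithBuildHash(v1), namesWithBuildHash(v2));
// With per-chunk hashes, pageB keeps its name.
console.log(namesWithChunkHash(v1), namesWithChunkHash(v2));
```

In this model only the per-chunk variant leaves pageB's file name alone when pageA is edited, which matches the build output shown above.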

Our current code does not involve CSS, so we need to add a CSS file dependency:

/src
  |- pageA.js
  |- pageA.css

// pageA.js
require('./pageA.css');

Configure a loader for CSS files in Webpack, and extract all styles into a separate output file:

module: {
    loaders: [{
        test: /\.css$/,
        loader: ExtractTextPlugin.extract('style-loader', 'css-loader')
    }],
},
plugins: [
    new ExtractTextPlugin('[name].[contenthash:4].css')
],

Build:

pageA.ab4b.js   1.6 kB    0  [emitted]  pageA
pageA.b9bc.css  36 bytes  0  [emitted]  pageA

If you change the style, the hash of the CSS file will certainly change, but what about the hash of pageA.js?

The answer: it changes too.

pageA.0482.js   1.6 kB    0  [emitted]  pageA
pageA.c61a.css  31 bytes  0  [emitted]  pageA

Remember that Webpack's [chunkhash] is calculated from the contents of the chunk, and Webpack considers the pageA.js chunk to include the CSS file, even though you extract it afterwards. So when you change the CSS, you change the content of this chunk. That is a bad experience, right? How do we keep the CSS from affecting the hash of the JS?

Custom chunkhash

From the source, webpack/Compilation.js:

...
this.applyPlugins("chunk-hash", chunk, chunkHash);
chunk.hash = chunkHash.digest(hashDigest);
...

As this code shows, chunk.hash can be customized by replacing the digest method of chunkHash inside the 'chunk-hash' hook.

See the documentation, How to write a plugin, to learn how to write a plugin that registers a hook:

plugins: [
    ...
    new ContentHashPlugin() // add the plugin (for production releases)
],
};

// the plugin function
function ContentHashPlugin() {}

// Webpack calls the plugin's apply method
ContentHashPlugin.prototype.apply = function(compiler) {
    compiler.plugin('compilation', function(compilation) {
        compilation.plugin('chunk-hash', function(chunk, chunkHash) {
            // replace digest so that it returns a custom hash value
            chunkHash.digest = function() {
                return 'this is the custom hash value';
            };
        });
    });
};

So how do I compute this hash value?

You can concatenate the contents of every module (a single source file) the chunk depends on and compute an MD5 of the result as the hash value. Of course, the modules must be sorted before concatenating so that the order is deterministic:

var crypto = require('crypto');

var md5Cache = {};

function md5(content) {
    if (!md5Cache[content]) {
        md5Cache[content] = crypto.createHash('md5')
            .update(content, 'utf-8').digest('hex');
    }
    return md5Cache[content];
}

function ContentHashPlugin() {}

ContentHashPlugin.prototype.apply = function(compiler) {
    var context = compiler.options.context;

    function getModFilePath(mod) {
        // Get a path like './src/pageA.css'
        // libIdent handles path separators on different platforms
        return mod.libIdent({
            context: context
        });
    }

    // Sort modules by file path
    function compareMod(modA, modB) {
        var modAPath = getModFilePath(modA);
        var modBPath = getModFilePath(modB);
        return modAPath > modBPath ? 1 : modAPath < modBPath ? -1 : 0;
    }

    // Module source code (do not use this in the development phase)
    function getModSrc(mod) {
        return mod._source && mod._source._value || '';
    }

    compiler.plugin("compilation", function(compilation) {
        compilation.plugin("chunk-hash", function(chunk, chunkHash) {
            var source = chunk.modules.sort(compareMod).map(getModSrc).join('');
            chunkHash.digest = function() {
                return md5(source);
            };
        });
    });
};

module.exports = ContentHashPlugin;

Now, if pageA.css is modified, the hash of pageA.js is no longer affected.

In addition, the ExtractTextPlugin extracts the contents of pageA.css and replaces the content of that module (mod._source._value) with:

// removed by extract-text-webpack-plugin

Because every CSS module ends up with this same content, it does not affect the result.
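The effect can be checked in isolation. The following is a minimal sketch, assuming modules are plain objects with hypothetical `path` and `source` fields (real Webpack modules are richer); `contentHash` stands in for the plugin's chunk-hash callback and is not part of any Webpack API.

```javascript
var crypto = require('crypto');

function md5(content) {
    return crypto.createHash('md5').update(content, 'utf8').digest('hex');
}

// Placeholder that extract-text-webpack-plugin leaves in a CSS module.
var EXTRACTED = '// removed by extract-text-webpack-plugin';

// Hypothetical stand-in for the 'chunk-hash' callback:
// sort modules by path, join their sources, hash the result.
function contentHash(modules) {
    var source = modules.slice().sort(function (a, b) {
        return a.path > b.path ? 1 : a.path < b.path ? -1 : 0;
    }).map(function (mod) {
        return mod.source;
    }).join('');
    return md5(source);
}

// The same chunk in two builds, with pageA.css edited in between and the
// modules collected in a different order. The CSS module's source is the
// placeholder both times, so only the JS source contributes.
var build1 = [
    { path: './src/pageA.js', source: "require('./pageA.css');" },
    { path: './src/pageA.css', source: EXTRACTED }
];
var build2 = [
    { path: './src/pageA.css', source: EXTRACTED },
    { path: './src/pageA.js', source: "require('./pageA.css');" }
];

console.log(contentHash(build1) === contentHash(build2)); // true
```

Sorting by path rather than by module ID is what keeps the result independent of the order in which modules were collected.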

The erm0l0v/webpack-md5-hash plugin solves a similar problem, but it sorts modules by module ID, which is theoretically unstable. So next we discuss the pitfall of unstable module IDs.

The module ID pitfall

For simplicity, think of each file as a module. When Webpack processes module dependencies, it assigns an ID to each module. If you look at webpack/Compilation.js, you will see that Webpack assigns incrementing numeric IDs to modules in the order in which they are collected. And that "module order" is definitely not stable over the lifetime of your project. Unstable!
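To see why collection order makes IDs fragile, here is a small sketch of sequential assignment. It is a simplified model, not Webpack's real traversal (whose order differs in detail); `assignIds` and the file names are hypothetical.

```javascript
// Hypothetical model of sequential ID assignment: walk the dependency
// graph from each entry and number modules in the order they are found.
function assignIds(entries, deps) {
    var ids = {};
    var next = 0;
    function visit(file) {
        if (ids.hasOwnProperty(file)) return;
        ids[file] = next++;
        (deps[file] || []).forEach(visit);
    }
    entries.forEach(visit);
    return ids;
}

var before = assignIds(['x.js', 'y.js'], {
    'x.js': ['util.js', 'shared.js'],
    'y.js': ['shared.js']
});
var after = assignIds(['x.js', 'y.js'], {
    'x.js': ['shared.js'], // util.js is no longer required
    'y.js': ['shared.js']
});

// Dropping one dependency of x.js renumbers a module that y.js uses.
console.log(before['shared.js'], after['shared.js']); // 2 1
```

Nothing about y.js or shared.js changed, yet the number that y.js must use to reach shared.js did.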

Module ID is unstable

Our file structure now looks like this:

/src
    |- pageA.js
    |- pageB.js
    |- a.js
    |- b.js
    |- c.js

pageA.js

require('./a.js') // a.js
require('./b.js') // b.js
var a = 'this is pageA';

pageB.js

require('./b.js') // b.js
require('./c.js') // c.js
var b = 'this is pageB';

Update the configuration to extract modules that have been referenced twice:

  output: {
        chunkFilename: "[id].[chunkhash:4].bundle.js",
    ...
plugins: [
    new webpack.optimize.CommonsChunkPlugin({
        name: "commons",
        minChunks: 2,
        chunks: ["pageA", "pageB"],
    }),
    ...

Build. The output includes pageA.1cda.js (262 bytes), pageB.0752.js, and the commons chunk.

Look inside pageB.0752.js, which contains this snippet:

__webpack_require__(2) // b.js
__webpack_require__(3) // c.js
var b = 'this is pageB';

As you can see, the module ID Webpack gave b.js in this build is 2.

Now let's change pageA.js:

// remove the dependency on a.js
// require('./a.js') // a.js
require('./b.js') // b.js
var a = 'this is pageA';

Build again. Only the hash of pageA.js has changed, which makes sense. Now look inside pageB.0752.js:

__webpack_require__(1) // b.js
__webpack_require__(2) // c.js
var b = 'this is pageB';

See that?! In this build, Webpack gave b.js the ID 1.

If you upload this file to a CDN, the upload may be skipped, because the file name and size are exactly the same as before even though the content differs. The unstable module ID is what dug this pit!
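The mismatch can be reproduced with a toy example: the file name is derived from the module sources, while the emitted code embeds the numeric IDs. This is a sketch under that assumption; `render` is a hypothetical stand-in for Webpack's output template.

```javascript
var crypto = require('crypto');

function md5(content) {
    return crypto.createHash('md5').update(content, 'utf8').digest('hex');
}

// Hypothetical renderer: emitted code refers to modules by numeric ID.
function render(ids) {
    return '__webpack_require__(' + ids.b + ')\n' +
           '__webpack_require__(' + ids.c + ')\n' +
           "var b = 'this is pageB';";
}

// The source of pageB.js is identical in both builds...
var source = "require('./b.js')\nrequire('./c.js')\nvar b = 'this is pageB';";
var nameHash = md5(source).substr(0, 4);

// ...but the module IDs shifted between the two builds.
var out1 = render({ b: 2, c: 3 });
var out2 = render({ b: 1, c: 2 });

console.log('pageB.' + nameHash + '.js');  // same name for both builds
console.log(out1.length === out2.length);  // true: same size
console.log(out1 === out2);                // false: different content
```

Same name, same size, different bytes: exactly the combination a name-and-size based upload check cannot detect.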

How do we solve it?

First thought: change how the hash is computed, and hash the content of the built output file instead?

On reflection: no. pageB clearly should not need to be uploaded again this time; that would be wasteful.

The more elegant idea: make the module IDs stable!

Give me stable module IDs

Webpack 1 official solution

The Webpack documentation offers several options:

  • OccurrenceOrderPlugin

    The plugin assigns IDs based on how many times each module is referenced (by entries, by chunks). If your application's file dependencies don't change much, the module IDs stay stable. But who can guarantee that?

  • recordsPath configuration

    Store/Load compiler state from/to a json file. This will result in persistent ids of modules and chunks.

    The ID assigned to each bundled module is recorded under its "file processing path". In the next build, the same module reuses the ID from the record:

    "node_modules/style-loader/index.js!node_modules/css-loader/index.js!src/b.css": 9,

    This requires everyone on the team to commit the records file, which I find a very bad experience.

    In addition, once you rename a file, or add or remove a loader, the recorded path no longer matches, and you are back in the pit!

  • DllPlugin and DllReferencePlugin

    The idea is that, before bundling your own source code, you create a separate build configuration and run it with the DllPlugin to generate a record that maps module file paths to IDs. You then reference this record in your original configuration with the DllReferencePlugin, much like recordsPath. It is more efficient and more stable, but the extra build does not feel elegant to me. As for how much faster it is, speed is not my concern here, and you still have to commit one more record file.

Webpack 2 ideas

  • webpack/HashedModuleIdsPlugin.js at master · webpack/webpack

  • webpack/NamedModulesPlugin.js at master · webpack/webpack

Both plugins derive the module ID from the module's file path (the path itself, or a hash of it) instead of the default numbers used in Webpack 1. Webpack 1, however, does not accept non-numeric module IDs.

Our approach

Map each module's file path to a number through a hash calculation, and use that globally unique number as the ID. Problem solved!

References:

  • The before-module-ids hook exposed in webpack/Compilation.js
  • webpack/HashedModuleIdsPlugin.js

Here is the solution for Webpack 1.x:

...
xx.prototype.apply = function(compiler) {
    // Convert each hex character to its two-digit ASCII code
    function hexToNum(str) {
        str = str.toUpperCase();
        var code = '';
        for (var i = 0; i < str.length; i++) {
            var c = str.charCodeAt(i) + '';
            if ((c + '').length < 2) {
                c = '0' + c;
            }
            code += c;
        }
        return parseInt(code, 10);
    }

    // Maps each hex prefix already taken to the module path that owns it
    var usedIds = {};

    function genModuleId(module) {
        var modulePath = module.libIdent({
            context: compiler.options.context
        });
        var id = md5(modulePath);
        var len = 4;
        // Lengthen the prefix while a different module already owns it
        while (usedIds[id.substr(0, len)] && usedIds[id.substr(0, len)] !== modulePath) {
            len++;
        }
        id = id.substr(0, len);
        usedIds[id] = modulePath;
        return hexToNum(id);
    }

    compiler.plugin("compilation", function(compilation) {
        compilation.plugin("before-module-ids", function(modules) {
            modules.forEach(function(module) {
                if (module.libIdent && module.id === null) {
                    module.id = genModuleId(module);
                }
            });
        });
    });
};
...

The hook is registered the same way as in the content hash plugin. After obtaining the module's file path, MD5 produces a hexadecimal string ([0-9A-F]), and each character of the string is converted to its ASCII code as a decimal integer. Since a hexadecimal string contains only [0-9A-F], every character maps to a two-digit integer (48 to 57 for digits, 65 to 70 for letters), which guarantees that the conversion is unambiguous.

For example:

path = '/node_module/xxx'
md5Hash = md5(path) // => A3E...
num = hexToNum(md5Hash) // => 655169...

One minor drawback: using the module's file path as the hash input is not 100% perfect, because renaming a file makes its module ID "unstable". You could use the contents of the module file as the hash input instead, but for efficiency the trade-off here is to use the path.
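The path-to-number conversion can be tried on its own, outside a plugin. This is a minimal sketch: `idForPath` is a hypothetical helper that takes the first four hex characters of the MD5 and skips the collision handling a real plugin needs.

```javascript
var crypto = require('crypto');

function md5(content) {
    return crypto.createHash('md5').update(content, 'utf8').digest('hex');
}

// Replace each hex character with its two-digit ASCII code.
function hexToNum(str) {
    str = str.toUpperCase();
    var code = '';
    for (var i = 0; i < str.length; i++) {
        code += str.charCodeAt(i); // '0'-'9' => 48-57, 'A'-'F' => 65-70
    }
    return parseInt(code, 10);
}

// Hypothetical helper: a numeric ID from the first four hex characters.
function idForPath(modulePath) {
    return hexToNum(md5(modulePath).substr(0, 4));
}

console.log(hexToNum('A3E')); // 655169
console.log(idForPath('./src/b.js') === idForPath('./src/b.js')); // true
```

Because the ID depends only on the path, it is the same in every build no matter which other modules are added or removed, and it changes only when the file is renamed or moved.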

Conclusion

To make the hashes of the files Webpack 1.x builds for production map one-to-one to file content, I went through a lot of material. The solutions currently discussed on GitHub basically solve the problem, but they are not elegant or complete enough. So I borrowed the ideas from Webpack 2 and added a small trick of my own, which solves the problem fairly elegantly.

The plugin is on GitHub: zhenyong/webpack-stable-module-id-and-hash

References

  • Vendor chunkhash changes when app code changes · Issue #1315 · webpack/webpack
  • Differences between hash and chunkhash in Webpack, and decoupling the hash fingerprints of JS and CSS - Zhoujunpeng - cnblogs
  • Webpack usage optimization | Tencent AlloyTeam Web front-end blog