This article is about 1,500 words (a 10-minute read) plus roughly 500 lines of commented code (about 15 minutes to browse). Corrections for any errors are welcome.

In the previous two installments I shared project-level optimization through a case study (optimizing a large Taobao project, with code examples and annotated screenshots) and walked through the SplitChunksPlugin configuration system. This time we dig into the SplitChunksPlugin source code to find out how webpack uses it to complete packaging optimization. Since the webpack 5 beta is currently being iterated on, I will parse the source of webpack 5.0.0-beta.17 directly, so we can understand the principles and get a taste of some new webpack 5 features at the source level. Reading both earlier articles first is recommended for the details.

Where SplitChunksPlugin's default configuration comes from

In real development you will find that webpack's default packaging sometimes does not match what the official documentation describes. Why? Let's find the answer in the source code. The default configuration of SplitChunksPlugin is applied as follows:

// Both D and F assign default values to object properties
const D = (obj, prop, value) => {
	if (obj[prop] === undefined) {
		obj[prop] = value;
	}
};

const F = (obj, prop, factory) => {
	if (obj[prop] === undefined) {
		obj[prop] = factory();
	}
};

const applyOptimizationDefaults = (
	optimization,
	{ production, development, records }
) => {
	// Other configuration omitted
	const { splitChunks } = optimization;
	if (splitChunks) {
		D(splitChunks, "hidePathInfo", production);
		D(splitChunks, "chunks", "async");
		D(splitChunks, "minChunks", 1);
		// The defaults for these properties differ between production and development mode
		F(splitChunks, "minSize", () => (production ? 30000 : 10000));
		F(splitChunks, "minRemainingSize", () => (development ? 0 : undefined));
		// maxAsyncRequests is infinite in development mode
		F(splitChunks, "maxAsyncRequests", () => (production ? 6 : Infinity));
		F(splitChunks, "maxInitialRequests", () => (production ? 4 : Infinity));
		// webpack 4 used "~" as the name delimiter; webpack 5 switches to "-"
		D(splitChunks, "automaticNameDelimiter", "-");
		const { cacheGroups } = splitChunks;
		F(cacheGroups, "default", () => ({
			idHint: "",
			reuseExistingChunk: true,
			minChunks: 2,
			priority: -20
		}));
		F(cacheGroups, "defaultVendors", () => ({
			idHint: "vendors",
			reuseExistingChunk: true,
			test: NODE_MODULES_REGEXP,
			priority: -10
		}));
	}
};

The default configuration of SplitChunksPlugin is not exactly the same as what the official documentation shows: several values change with the mode, and these details are hidden on the official website, presumably because it only lists the production-mode defaults. Since we usually switch to development mode while developing, be aware of these differences.
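The D/F helpers are easy to verify in isolation. Here is a minimal standalone sketch (not webpack's actual module) showing that a user-supplied value always wins over a default:

```javascript
// Minimal re-implementation of webpack's D/F defaulting helpers (a sketch, not the real module)
const D = (obj, prop, value) => {
  if (obj[prop] === undefined) obj[prop] = value;
};
const F = (obj, prop, factory) => {
  if (obj[prop] === undefined) obj[prop] = factory();
};

// Simulate applying defaults to a user splitChunks config in production mode
const production = true;
const splitChunks = { minSize: 50000 }; // the user explicitly set minSize

D(splitChunks, "chunks", "async");
F(splitChunks, "minSize", () => (production ? 30000 : 10000));
F(splitChunks, "maxAsyncRequests", () => (production ? 6 : Infinity));

console.log(splitChunks.chunks);           // "async" (default applied)
console.log(splitChunks.minSize);          // 50000 (user value kept)
console.log(splitChunks.maxAsyncRequests); // 6 (production default)
```

This is why the defaults you observe depend on the mode: the factory passed to F reads `production`/`development` at the moment the defaults are applied.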

Taking the project from the previous article as an example, let's look at the packaging results under the new version:

SplitChunksPlugin’s three-step strategy

A webpack plugin uses its apply method as the entry point and registers callbacks on the relevant hooks. All of the plugin's logic lives in the SplitChunksPlugin.js file:

apply(compiler) {
	// compiler is the globally unique webpack compiler instance; it holds all configuration for the webpack environment
	compiler.hooks.thisCompilation.tap("SplitChunksPlugin", compilation => {
		// Minor code omitted
		// compilation represents one build of the assets; it gives access to all modules and assets in the current compilation
		// compilation exposes an event-stream mechanism: listeners tap hooks and are fired as callbacks (the observer pattern).
		// The code-splitting logic runs when the optimizeChunks hook fires
		compilation.hooks.optimizeChunks.tap(
			{
				name: "SplitChunksPlugin",
				stage: STAGE_ADVANCED
			},
			chunks => {
				// The three steps of code-splitting optimization go here
			}
		);
	});
}

During the compilation cycle, after generating the chunkGraph (a graph structure containing the dependencies between chunks), the compilation triggers the optimizeChunks hook and passes in the chunks, starting the code-splitting process. All of the optimization happens in the callback of the optimizeChunks hook.
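The hook mechanism used here comes from webpack's tapable library. A toy version (a sketch, not tapable itself) shows the pattern: a plugin taps a named hook, and the compiler calls that hook at the right point in the build:

```javascript
// Toy synchronous hook illustrating the tap/call pattern that webpack's compiler
// and compilation objects use (a sketch; webpack's real hooks come from tapable)
class SyncHook {
  constructor() {
    this.taps = [];
  }
  tap(name, fn) {
    this.taps.push({ name, fn });
  }
  call(...args) {
    for (const { fn } of this.taps) fn(...args);
  }
}

// A minimal "compilation" exposing an optimizeChunks hook
const compilation = { hooks: { optimizeChunks: new SyncHook() } };

// A plugin taps the hook, just like SplitChunksPlugin does
const log = [];
compilation.hooks.optimizeChunks.tap("SplitChunksPlugin", chunks => {
  log.push(`optimizing ${chunks.length} chunks`);
});

// The "compiler" fires the hook during the build
compilation.hooks.optimizeChunks.call(["main", "vendors"]);
console.log(log[0]); // "optimizing 2 chunks"
```

The real SplitChunksPlugin is just a more elaborate version of this: it taps thisCompilation, then taps optimizeChunks on the compilation it receives.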

Preparation stage

This stage preprocesses for the optimization, defining the methods and data structures that the following stages rely on:

const chunkSetsInGraph = new Map();
/**
 * The core of the optimization is extracting common modules, so for each module we
 * generate a key from the set of chunks that contain it. This tells us which chunks
 * each module is repeated in, which is the key to the optimization.
 * The chunk sets are stored in chunkSetsInGraph by key so they can be looked up later.
 */
for (const module of compilation.modules) {
  const chunksKey = getKey(chunkGraph.getModuleChunksIterable(module));
  if (!chunkSetsInGraph.has(chunksKey)) {
    chunkSetsInGraph.set(
      chunksKey,
      new Set(chunkGraph.getModuleChunksIterable(module))
    );
  }
}

const chunkSetsByCount = new Map();
/**
 * We now know which chunks each module is repeated in. Here that information is
 * reorganized by repetition count into chunkSetsByCount.
 * This is done to match the minChunks option: given minChunks (the minimum number of
 * times a module must be repeated), the candidate chunk sets can be found directly,
 * and chunk sets that cannot satisfy minChunks are excluded automatically.
 * Note that one module maps to one chunk set, while one count maps to multiple chunk
 * sets, i.e. multiple modules.
 */
for (const chunksSet of chunkSetsInGraph.values()) {
  // Walk chunkSetsInGraph, take the size of each module's chunk set (the number of
  // repetitions), and map each count to its chunk sets
  const count = chunksSet.size;
  let array = chunkSetsByCount.get(count);
  if (array === undefined) {
    array = [];
    chunkSetsByCount.set(count, array);
  }
  array.push(chunksSet);
}

const combinationsCache = new Map();
// Collect the chunk sets that might satisfy the minChunks condition, for later comparison against minChunks
const getCombinations = key => {
  // Look up this module's chunk set by its key
  const chunksSet = chunkSetsInGraph.get(key);
  var array = [chunksSet];
  if (chunksSet.size > 1) {
    for (const [count, setArray] of chunkSetsByCount) {
      if (count < chunksSet.size) {
        // Each module corresponds to one set; find all subsets in setArray so nothing is missed
        for (const set of setArray) {
          if (isSubset(chunksSet, set)) {
            array.push(set);
          }
        }
      }
    }
  }
  return array;
};

// A Map in which each entry corresponds to one split cache group. The key is generated
// from the name option; the value holds the modules, chunks and cacheGroup info for that key
const chunksInfoMap = new Map();
const addModuleToChunksInfoMap = (
  cacheGroup,
  selectedChunks,
  selectedChunksKey,
  module
) => {
  const name = cacheGroup.getName(module, selectedChunks, cacheGroup.key);
  // Check whether the name conflicts with an existing chunk; webpack 5 no longer
  // allows a cacheGroup name to shadow an entry name
  if (!alreadyValidatedNames.has(name)) {
    alreadyValidatedNames.add(name);
    if (compilation.namedChunks.has(name)) {
      // Error handling omitted
    }
  }
  /**
   * If the cacheGroup has a name, the key is built from the cacheGroup key plus that
   * name; otherwise the selectedChunksKey generated from the cacheGroup and chunks is
   * used. With a name, all modules belonging to the cacheGroup share one key and are
   * merged into a single info entry (one output chunk). Without a name, each module
   * gets a different key and ends up in its own chunk. (Best understood together with
   * the "name" discussion in the previous installment.)
   */
  const key =
    cacheGroup.key + (name ? ` name:${name}` : ` chunks:${selectedChunksKey}`);
  // Add the module to the map
  let info = chunksInfoMap.get(key);
  if (info === undefined) {
    chunksInfoMap.set(
      key,
      (info = {
        modules: new SortableSet(undefined, compareModulesByIdentifier),
        cacheGroup,
        name,
        // Whether minSize / minRemainingSize checks are needed
        validateSize:
          hasNonZeroSizes(cacheGroup.minSize) ||
          hasNonZeroSizes(cacheGroup.minRemainingSize),
        sizes: {},
        chunks: new Set(),
        reuseableChunks: new Set(),
        chunksKeys: new Set()
      })
    );
  }
  info.modules.add(module);
  // Accumulate the size of the group
  if (info.validateSize) {
    for (const type of module.getSourceTypes()) {
      info.sizes[type] = (info.sizes[type] || 0) + module.size(type);
    }
  }
  // Record the chunks in chunksInfoMap for the final packaging step
  if (!info.chunksKeys.has(selectedChunksKey)) {
    info.chunksKeys.add(selectedChunksKey);
    for (const chunk of selectedChunks) {
      info.chunks.add(chunk);
    }
  }
};

Of everything set up in the preparation stage, chunksInfoMap and addModuleToChunksInfoMap are the two most important:

  • chunksInfoMap stores the code-splitting state. Each entry is a cache group that may eventually be split out as an extra chunk; the final splitting results are written into the chunkGraph, which ultimately produces the packaged files we see. Each entry also carries extra information: cacheGroup holds the splitting rules we configured, for later validation, and sizes records the total size of the modules in the group, used to check the minSize condition.
  • addModuleToChunksInfoMap adds new splitting information to chunksInfoMap. On each call it either creates a new cache group entry or, based on the key, adds the module to an existing one, and updates that entry's information.
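The preparation-stage data structures can be simulated with plain objects. A hypothetical sketch (module and chunk names invented for illustration) of how modules are keyed by the set of chunks containing them, then regrouped by repetition count:

```javascript
// Hypothetical module → chunks mapping (names invented for illustration)
const moduleChunks = new Map([
  ["lodash", ["pageA", "pageB", "pageC"]],
  ["axios", ["pageA", "pageB", "pageC"]], // same chunks as lodash → same key
  ["moment", ["pageA", "pageB"]]
]);

// Step 1: key each module's chunk set (mirrors chunkSetsInGraph)
const getKey = chunks => chunks.slice().sort().join("|");
const chunkSetsInGraph = new Map();
for (const [, chunks] of moduleChunks) {
  const key = getKey(chunks);
  if (!chunkSetsInGraph.has(key)) chunkSetsInGraph.set(key, new Set(chunks));
}

// Step 2: group the chunk sets by size (mirrors chunkSetsByCount), so that a
// minChunks value can select candidate sets directly
const chunkSetsByCount = new Map();
for (const set of chunkSetsInGraph.values()) {
  const count = set.size;
  if (!chunkSetsByCount.has(count)) chunkSetsByCount.set(count, []);
  chunkSetsByCount.get(count).push(set);
}

console.log(chunkSetsInGraph.size);          // 2 — lodash and axios share one chunk set
console.log(chunkSetsByCount.get(3).length); // 1 — one set repeated across 3 chunks
console.log(chunkSetsByCount.get(2).length); // 1 — one set repeated across 2 chunks
```

With minChunks = 3, only the first set survives; modules keyed to it (lodash and axios here) are the candidates for extraction.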

Module grouping stage

With the preparation done, every module is iterated, and the ones that qualify are saved into chunksInfoMap via addModuleToChunksInfoMap:

for (const module of compilation.modules) {
  // getCacheGroups returns the cacheGroups a module belongs to; a module may qualify for more than one
  let cacheGroups = this.options.getCacheGroups(module, context);
  if (!Array.isArray(cacheGroups) || cacheGroups.length === 0) {
    continue;
  }

  // Chunks containing the same module share a unique key, used to fetch the chunks to optimize
  const chunksKey = getKey(
    // All chunks containing this module
    chunkGraph.getModuleChunksIterable(module)
  );
  let combs = combinationsCache.get(chunksKey);
  if (combs === undefined) {
    // getCombinations was defined in the preparation phase; it collects the chunk
    // sets that might satisfy minChunks, for comparison below
    combs = getCombinations(chunksKey);
    combinationsCache.set(chunksKey, combs);
  }

  for (const cacheGroupSource of cacheGroups) {
    // Build the full cacheGroup configuration; missing values are inherited from the splitChunks global options
    const cacheGroup = {
      key: cacheGroupSource.key,
      priority: cacheGroupSource.priority || 0,
      // chunksFilter corresponds to the chunks option of the cacheGroup, wrapped as a function
      chunksFilter: cacheGroupSource.chunksFilter || this.options.chunksFilter,
      minSize: mergeSizes(
        cacheGroupSource.minSize,
        cacheGroupSource.enforce ? undefined : this.options.minSize
      ),
      minRemainingSize: mergeSizes(
        cacheGroupSource.minRemainingSize,
        cacheGroupSource.enforce ? undefined : this.options.minRemainingSize
      ),
      minSizeForMaxSize: mergeSizes(
        cacheGroupSource.minSize,
        this.options.minSize
      ),
      maxAsyncSize: mergeSizes(
        cacheGroupSource.maxAsyncSize,
        cacheGroupSource.enforce ? undefined : this.options.maxAsyncSize
      ),
      maxInitialSize: mergeSizes(
        cacheGroupSource.maxInitialSize,
        cacheGroupSource.enforce ? undefined : this.options.maxInitialSize
      ),
      minChunks:
        cacheGroupSource.minChunks !== undefined
          ? cacheGroupSource.minChunks
          : cacheGroupSource.enforce
          ? 1
          : this.options.minChunks,
      maxAsyncRequests:
        cacheGroupSource.maxAsyncRequests !== undefined
          ? cacheGroupSource.maxAsyncRequests
          : cacheGroupSource.enforce
          ? Infinity
          : this.options.maxAsyncRequests,
      maxInitialRequests:
        cacheGroupSource.maxInitialRequests !== undefined
          ? cacheGroupSource.maxInitialRequests
          : cacheGroupSource.enforce
          ? Infinity
          : this.options.maxInitialRequests,
      getName:
        cacheGroupSource.getName !== undefined
          ? cacheGroupSource.getName
          : this.options.getName,
      filename:
        cacheGroupSource.filename !== undefined
          ? cacheGroupSource.filename
          : this.options.filename,
      automaticNameDelimiter:
        cacheGroupSource.automaticNameDelimiter !== undefined
          ? cacheGroupSource.automaticNameDelimiter
          : this.options.automaticNameDelimiter,
      idHint:
        cacheGroupSource.idHint !== undefined
          ? cacheGroupSource.idHint
          : cacheGroupSource.key,
      reuseExistingChunk: cacheGroupSource.reuseExistingChunk
    };
    // Screen the chunk combinations against the minChunks and chunks rules of this cacheGroup
    for (const chunkCombination of combs) {
      // Skip combinations that do not satisfy minChunks
      if (chunkCombination.size < cacheGroup.minChunks) continue;
      // Destructuring assignment: select the chunks that match chunksFilter
      // ("initial" | "async" | "all" — in fact the chunks option)
      const {
        chunks: selectedChunks,
        key: selectedChunksKey
      } = getSelectedChunks(chunkCombination, cacheGroup.chunksFilter);

      // Save the currently qualifying module, chunks and cacheGroup info into chunksInfoMap
      addModuleToChunksInfoMap(
        cacheGroup,
        selectedChunks,
        selectedChunksKey,
        module
      );
    }
  }
}

During the grouping phase, each cacheGroup configuration is assembled and checked against the minChunks and chunks rules; only groups that meet these criteria are created. Only the count-related options are checked in this phase; the remaining options are verified in the next one.
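The inheritance rule in the loop above (cacheGroup value first, then the global splitChunks value, with `enforce` disabling the inherited limits) can be sketched in isolation. This is a simplified stand-in, not webpack's exact code:

```javascript
// Sketch of how a cacheGroup inherits from global splitChunks options, and how
// `enforce` bypasses the inherited limits (simplified; not webpack's exact code)
const globalOptions = { minChunks: 2, maxInitialRequests: 4 };

const resolveCacheGroup = source => ({
  key: source.key,
  minChunks:
    source.minChunks !== undefined
      ? source.minChunks
      : source.enforce
      ? 1 // enforce: ignore the global minChunks
      : globalOptions.minChunks,
  maxInitialRequests:
    source.maxInitialRequests !== undefined
      ? source.maxInitialRequests
      : source.enforce
      ? Infinity // enforce: lift the request limit
      : globalOptions.maxInitialRequests
});

const normal = resolveCacheGroup({ key: "vendors" });
const enforced = resolveCacheGroup({ key: "styles", enforce: true });

console.log(normal.minChunks, normal.maxInitialRequests);     // 2 4
console.log(enforced.minChunks, enforced.maxInitialRequests); // 1 Infinity
```

This is why a cacheGroup with `enforce: true` always produces a chunk regardless of the global size and count thresholds.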

Queue inspection phase

The previous phase produced the cache group entries in chunksInfoMap. In this phase, each entry is checked one by one against the user's cacheGroup configuration; the qualifying groups are turned into chunks and added to the compilation's chunkGraph, until all the splitting results have been written into it. The code is quite long, but it proceeds step by step: rules are validated first, and then the modules of each qualifying cache group are packaged into new chunks:
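Before reading the full source, the selection loop at its heart can be reduced to a sketch: repeatedly pick the best entry from the map, remove it, and process it. Here compareEntries is a hypothetical stand-in that compares cacheGroup priority only (the real function weighs several factors):

```javascript
// Reduced sketch of the queue-inspection selection loop (compareEntries is a
// hypothetical stand-in that compares priority only; the real one weighs more)
const compareEntries = (a, b) => a.priority - b.priority;

const chunksInfoMap = new Map([
  ["default name:common", { priority: -20 }],
  ["defaultVendors name:vendors", { priority: -10 }]
]);

const processed = [];
while (chunksInfoMap.size > 0) {
  let bestEntryKey;
  let bestEntry;
  // Linear scan for the best remaining entry, exactly as the real plugin does
  for (const [key, info] of chunksInfoMap) {
    if (bestEntry === undefined || compareEntries(bestEntry, info) < 0) {
      bestEntry = info;
      bestEntryKey = key;
    }
  }
  chunksInfoMap.delete(bestEntryKey);
  processed.push(bestEntryKey);
}

console.log(processed); // the higher-priority vendors group is processed first
```

Processing the best entry first matters because extracting its modules shrinks the remaining groups, which may then fail minSize and drop out, as the code below shows.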

// First remove cache groups (chunksInfoItem entries) smaller than minSize from chunksInfoMap
for (const pair of chunksInfoMap) {
  const info = pair[1];
  if (info.validateSize && !checkMinSize(info.sizes, info.cacheGroup.minSize)) {
    chunksInfoMap.delete(pair[0]);
  }
}

while (chunksInfoMap.size > 0) {
  // Find the best-matching cache group entry and split/package it first
  let bestEntryKey;
  let bestEntry;
  for (const pair of chunksInfoMap) {
    const key = pair[0];
    const info = pair[1];
    if (bestEntry === undefined || compareEntries(bestEntry, info) < 0) {
      bestEntry = info;
      bestEntryKey = key;
    }
  }
  const item = bestEntry;
  chunksInfoMap.delete(bestEntryKey);

  let chunkName = item.name;
  // The new chunk generated from this cache group
  let newChunk;
  let isExistingChunk = false;
  let isReusedWithAllModules = false;
  // The real code splitting starts here
  if (chunkName) {
    const chunkByName = compilation.namedChunks.get(chunkName);
    // If a chunk with this name already exists, reuse it; all chunks with the same name are eventually merged
    if (chunkByName !== undefined) {
      newChunk = chunkByName;
      item.chunks.delete(newChunk);
      isExistingChunk = true;
    }
  } else if (item.cacheGroup.reuseExistingChunk) {
    // With no name set, check whether an existing chunk can be reused
    outer: for (const chunk of item.chunks) {
      if (chunkGraph.getNumberOfChunkModules(chunk) !== item.modules.size) {
        continue;
      }
      if (chunkGraph.getNumberOfEntryModules(chunk) > 0) {
        continue;
      }
      for (const module of item.modules) {
        if (!chunkGraph.isModuleInChunk(module, chunk)) {
          continue outer;
        }
      }
      if (!newChunk || !newChunk.name) {
        newChunk = chunk;
      } else if (chunk.name && chunk.name.length < newChunk.name.length) {
        newChunk = chunk;
      } else if (
        chunk.name &&
        chunk.name.length === newChunk.name.length &&
        chunk.name < newChunk.name
      ) {
        newChunk = chunk;
      }
    }
    if (newChunk) {
      item.chunks.delete(newChunk);
      chunkName = undefined;
      isExistingChunk = true;
      isReusedWithAllModules = true;
    }
  }

  // If the cache group has no chunks left, skip it. chunksInfoMap.delete(bestEntryKey)
  // above already removed it, so chunkless cache groups drop out of the splitting result
  if (item.chunks.size === 0 && !isExistingChunk) continue;

  const usedChunks = Array.from(item.chunks);
  let validChunks = usedChunks;
  // Check the maxInitialRequests / maxAsyncRequests conditions; unnecessary if both are Infinity
  if (
    Number.isFinite(item.cacheGroup.maxInitialRequests) ||
    Number.isFinite(item.cacheGroup.maxAsyncRequests)
  ) {
    validChunks = validChunks.filter(chunk => {
      // If the chunk is only an initial chunk, check maxInitialRequests alone.
      // If it cannot be initial, check maxAsyncRequests alone.
      // If it can be either, take the minimum of both; this branch is effectively
      // unreachable in the current version, where a chunk is either initial (an
      // entry) or non-initial (lazy loaded).
      const maxRequests = chunk.isOnlyInitial()
        ? item.cacheGroup.maxInitialRequests
        : chunk.canBeInitial()
        ? Math.min(
            item.cacheGroup.maxInitialRequests,
            item.cacheGroup.maxAsyncRequests
          )
        : item.cacheGroup.maxAsyncRequests;
      // Remove chunks that exceed the maximum request count from validChunks
      return !isFinite(maxRequests) || getRequests(chunk) < maxRequests;
    });
  }

  // Remove chunks that no longer contain any module of the cache group
  validChunks = validChunks.filter(chunk => {
    for (const module of item.modules) {
      if (chunkGraph.isModuleInChunk(module, chunk)) return true;
    }
    return false;
  });

  // If some chunks were removed, re-add the reduced cache group to chunksInfoMap
  // and keep iterating to refine the splitting result
  if (validChunks.length < usedChunks.length) {
    if (isExistingChunk) validChunks.push(newChunk);
    if (validChunks.length >= item.cacheGroup.minChunks) {
      for (const module of item.modules) {
        addModuleToChunksInfoMap(
          item.cacheGroup,
          validChunks,
          getKey(validChunks),
          module
        );
      }
    }
    continue;
  }

  // minRemainingSize is new in webpack 5: it guarantees that the part of a chunk left
  // behind after splitting is not smaller than this value, preventing tiny leftover chunks
  if (
    validChunks.length === 1 &&
    hasNonZeroSizes(item.cacheGroup.minRemainingSize)
  ) {
    const chunk = validChunks[0];
    const chunkSizes = { ...chunkGraph.getChunkModulesSizes(chunk) };
    for (const key of Object.keys(item.sizes)) {
      chunkSizes[key] -= item.sizes[key];
    }
    if (!checkMinSize(chunkSizes, item.cacheGroup.minRemainingSize)) {
      continue;
    }
  }

  // Create the new chunk and add it to the compilation's chunkGraph;
  // this new chunk is the extracted common code
  if (!isExistingChunk) {
    newChunk = compilation.addChunk(chunkName);
  }
  // Creating the chunk is not enough: connect it to the chunk groups it was split from
  for (const chunk of usedChunks) {
    // Add graph connections for the split chunk
    chunk.split(newChunk);
  }

  // Output information: whether the new chunk was reused
  newChunk.chunkReason =
    (newChunk.chunkReason ? newChunk.chunkReason + ", " : "") +
    (isReusedWithAllModules ? "reused as split chunk" : "split chunk");
  // Output information: record which cacheGroup the chunk was split from, for debugging
  if (item.cacheGroup.key) {
    newChunk.chunkReason += ` (cache group: ${item.cacheGroup.key})`;
  }
  if (!isReusedWithAllModules) {
    // Add all the modules of the cache group to the new chunk: this packages the cache group into a new chunk
    for (const module of item.modules) {
      // chunkCondition always returns true in the current version
      if (!module.chunkCondition(newChunk, compilation)) continue;
      chunkGraph.connectChunkAndModule(newChunk, module);
      // Remove the modules from the original chunks to shrink their size
      for (const chunk of usedChunks) {
        chunkGraph.disconnectChunkAndModule(chunk, module);
      }
    }
  } else {
    // If all modules of the cache group are reused, just remove them from usedChunks to avoid duplication
    for (const module of item.modules) {
      for (const chunk of usedChunks) {
        chunkGraph.disconnectChunkAndModule(chunk, module);
      }
    }
  }

  // Remove the extracted modules from the other cache groups to avoid duplicated code
  for (const [key, info] of chunksInfoMap) {
    if (isOverlap(info.chunks, item.chunks)) {
      if (info.validateSize) {
        let updated = false;
        for (const module of item.modules) {
          if (info.modules.has(module)) {
            // Delete the module
            info.modules.delete(module);
            // Update the cache group's size
            for (const key of module.getSourceTypes()) {
              info.sizes[key] -= module.size(key);
            }
            updated = true;
          }
        }
        // After removing duplicates, re-check the group's size; drop it if it falls below minSize
        if (updated) {
          if (info.modules.size === 0) {
            chunksInfoMap.delete(key);
            continue;
          }
          if (!checkMinSize(info.sizes, info.cacheGroup.minSize)) {
            chunksInfoMap.delete(key);
          }
        }
      } else {
        for (const module of item.modules) {
          info.modules.delete(module);
        }
        if (info.modules.size === 0) {
          chunksInfoMap.delete(key);
        }
      }
    }
  }
}

// Finally there is a long maxSize validation pass; its mechanism and steps are similar
// to the above and are omitted here. See my annotated source repository on GitHub for
// a close look.

After the filtering in this stage, every cache group in chunksInfoMap that satisfied the configured rules has been packaged into a new chunk and added to the compilation's chunkGraph, completing the code splitting and eventually producing the output files. Don't be put off by the many if/else branches: they simply check, step by step, whether each option is satisfied, excluding a few special cases.

In addition, some methods, such as Module's chunkCondition, always return true in the current version; they are presumably reserved extension points for branch logic that future versions may optimize further.

Key takeaways

The core of SplitChunksPlugin is assigning each module to cache groups according to the rules, forming the map structure chunksInfoMap. Each cache group corresponds to a new chunk that is eventually split out. The cacheGroups we configure in splitChunks are what control the cache groups in chunksInfoMap.
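In practice, that control surface is the cacheGroups option. A typical configuration (illustrative values, not the article's project) in which each cacheGroup below becomes a candidate entry in chunksInfoMap during optimization:

```javascript
// Illustrative webpack configuration (values invented for demonstration):
// each cacheGroup becomes a candidate entry in chunksInfoMap
const config = {
  optimization: {
    splitChunks: {
      chunks: "all",
      minSize: 30000,
      cacheGroups: {
        vendors: {
          test: /[\\/]node_modules[\\/]/,
          name: "vendors", // named → all matching modules merge into one chunk
          priority: -10
        },
        common: {
          minChunks: 2, // extract modules shared by at least 2 chunks
          name: "common",
          priority: -20,
          reuseExistingChunk: true
        }
      }
    }
  }
};

module.exports = config;
```

With this configuration, node_modules code lands in the higher-priority vendors group first, and whatever shared application code remains falls into common.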

Looking back at the whole process, there is no complex algorithmic logic, only traversals that check various conditions at the right time; yet it turns the complex package structure of a large project into predictable results. The logic behind practical tools is often simple and clear, and the same applies to the projects we develop: don't over-design. Start with the most direct logic that does what is needed, which may well be the best current solution. When complex design is truly required, try to concentrate the complexity in data structures and solve the problem declaratively rather than imperatively.

Next time, we will look at chunkGraph, webpack's core data structure, and see how webpack organizes files step by step into the package output.

The source code

  • My annotated webpack source

Other articles

Webpack series 1:

  • Optimizing a large Taobao project: practical tips with code examples and annotated screenshots

Interview Q&A:

  • Koban small front of the big factory surface

CSS details:

  • The interviewer wants to know how much you know about absolute positioning

For those who can’t find their way

  • Back end to front end of the little brother suddenly reap big factory offer, the truth is actually