In the previous article, we manually assembled the basic files required to package an .epub, and sorted out which steps can be automated by a tool and which need additional information.

This time, we start coding our ebook generation tool.

Previous: Write your own ebook with Markdown


Getting Started: Automation

1. Create the project

Create a directory named kepub, run npm init -y, then edit package.json and set "type": "module" to enable ES module mode.

Install ESLint, along with the Airbnb style guide we chose and its related dependencies:

npm i -D eslint eslint-config-airbnb eslint-plugin-jsx-a11y eslint-plugin-react eslint-plugin-import

Then fine-tune the ESLint configuration to suit your needs:

# .eslintrc.yml
parserOptions:
  ecmaVersion: 2021
  sourceType: module
extends:
  - airbnb
rules:
  indent:
    - off
    - 4
  no-console: off
  import/extensions:
    - warn
    - always
    - js: always

We chose marked to render the Markdown articles, and Cheerio to parse the result and collect the images and other resources used in each article. The final zip packaging is handled by adm-zip. It is a pure Node.js implementation that doesn't depend on native binaries, so the project runs directly on Windows, macOS and Linux without platform-specific adaptation.

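npm i -S marked cheerio adm-zip

The later steps also import mkdirp, tiny-async-pool, mime-types and rimraf. The code in this article follows the APIs of mkdirp 1.x, tiny-async-pool 1.x and rimraf 3.x (later major versions changed their exports or call style), so install matching versions:

npm i -S mkdirp@1 tiny-async-pool@1 mime-types rimraf@3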

2. The entry point

In the project's entry file index.js, we adopt the convention that the first argument passed in is the directory of the ebook to process, which contains the corresponding book.json configuration:

// index.js
import fs from 'fs/promises';
import path from 'path';

const work = async (target) => {
    const targetDir = path.resolve(target);
    // The target must exist and be a directory
    try {
        await fs.access(targetDir);
    } catch (ex) {
        console.error(`Target does not exist: ${JSON.stringify(target)}.`);
        process.exit(1);
    }
    if (!(await fs.stat(targetDir)).isDirectory()) {
        console.error(`Target is not a directory: ${JSON.stringify(target)}.`);
        process.exit(1);
    }
    // The target directory must contain a book.json file
    const configPath = path.join(targetDir, 'book.json');
    try {
        await fs.access(configPath);
    } catch (ex) {
        console.error(`Can't find "book.json" in target ${JSON.stringify(target)}.`);
        process.exit(1);
    }
    if (!(await fs.stat(configPath)).isFile()) {
        throw new Error('ConfigError: "book.json" is not a file.');
    }
    const config = JSON.parse(await fs.readFile(configPath, 'utf8'));
    // TODO: More parameter checking

    // TODO: Start processing
};

work(...process.argv.slice(2));

The code above performs basic parameter checks before any processing starts. Depending on your needs, you can add detailed validation of the book.json format, which we won't cover here.
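As an example of such validation, here is a minimal sketch. It assumes the book.json shape used in this article: a meta object with id, title and lang (date and modified can be generated), a pages array whose items have a file, an optional title and optional children, and an optional cover path. The names validateConfig and validatePages, and the rule that page files must end in .md, are assumptions rather than part of the original project. The validator collects every problem into a list instead of throwing, so the entry point can report them all at once.

```javascript
// Hypothetical validator for the book.json shape used in this article.
// Returns an array of human-readable problems (empty when the config looks OK).
const REQUIRED_META_FIELDS = ['id', 'title', 'lang'];

const validatePages = (pages, where, errors) => {
    if (!Array.isArray(pages)) {
        errors.push(`${where} must be an array`);
        return;
    }
    pages.forEach((page, index) => {
        const at = `${where}[${index}]`;
        if (!page || typeof page !== 'object') {
            errors.push(`${at} must be an object`);
            return;
        }
        // Assumption: every page is a Markdown file, since hrefs are derived by replacing .md
        if (typeof page.file !== 'string' || !/\.md$/i.test(page.file)) {
            errors.push(`${at}.file must be a path to a .md file`);
        }
        if (page.title !== undefined && typeof page.title !== 'string') {
            errors.push(`${at}.title must be a string`);
        }
        // Nested pages are validated recursively
        if (page.children !== undefined) {
            validatePages(page.children, `${at}.children`, errors);
        }
    });
};

const validateConfig = (config) => {
    if (!config || typeof config !== 'object') {
        return ['book.json must contain a JSON object'];
    }
    const errors = [];
    const { meta, pages, cover } = config;
    if (!meta || typeof meta !== 'object') {
        errors.push('meta must be an object');
    } else {
        REQUIRED_META_FIELDS.forEach((key) => {
            if (typeof meta[key] !== 'string' || meta[key].trim() === '') {
                errors.push(`meta.${key} must be a non-empty string`);
            }
        });
    }
    validatePages(pages, 'pages', errors);
    if (cover !== undefined && typeof cover !== 'string') {
        errors.push('cover must be a string path');
    }
    return errors;
};
```

In index.js, this could replace the "More parameter checking" TODO: call validateConfig(config), and if the returned list is not empty, print it and exit.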

3. Basic rendering

For the ebook's basic files and metadata, we can implement rendering functions directly with template strings and parameters.

For example, to render the contents of package.opf:

const renderPackageOpf = ({ meta }) => `
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         version="3.0"
         xml:lang="${meta.lang}"
         unique-identifier="pub-identifier">
  <metadata>
    <dc:identifier id="pub-identifier">${meta.id}</dc:identifier>
    <dc:title id="pub-title">${meta.title}</dc:title>
    <dc:language id="pub-language">${meta.lang}</dc:language>
    <dc:date>${meta.date}</dc:date>
    <meta property="dcterms:modified">${meta.modified}</meta>
  </metadata>
  <manifest>
    <!-- TODO -->
  </manifest>
  <spine>
    <!-- TODO -->
  </spine>
</package>
`.trimStart();

If you like, you can also generate the id, date and modified fields automatically, further reducing manual work when creating ebooks.
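Here is a minimal sketch of that idea. Everything in it is an assumption rather than part of the original project: the helper name fillAutoMeta, a random urn:uuid as the fallback identifier, and using the build time for date and modified. The one hard requirement it reflects is that EPUB 3 expects dcterms:modified in the form CCYY-MM-DDThh:mm:ssZ, so the milliseconds produced by toISOString() are stripped.

```javascript
// Hypothetical helper: fill in meta fields that can be generated automatically.
// Assumptions: a missing id becomes a random urn:uuid, a missing date becomes
// the build date, and modified is always refreshed to the build time.

// dcterms:modified must look like CCYY-MM-DDThh:mm:ssZ (no fractional seconds)
const formatModified = (date) => date.toISOString().replace(/\.\d{3}Z$/, 'Z');
// dc:date as YYYY-MM-DD (UTC)
const formatDate = (date) => date.toISOString().slice(0, 10);

// Random version-4-style UUID built with Math.random(): good enough as a
// book identifier in this sketch, but not cryptographically secure
const pseudoUuid = () => 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    // The "y" digit carries the RFC 4122 variant bits (8, 9, a or b)
    return (c === 'x' ? r : 8 + (r % 4)).toString(16);
});

const fillAutoMeta = (meta, now = new Date()) => ({
    ...meta,
    id: meta.id || `urn:uuid:${pseudoUuid()}`,
    date: meta.date || formatDate(now),
    modified: formatModified(now),
});

const meta = fillAutoMeta(
    { title: 'My Book', lang: 'en' },
    new Date(Date.UTC(2021, 0, 2, 3, 4, 5, 678)),
);
console.log(meta.date, meta.modified); // 2021-01-02 2021-01-02T03:04:05Z
```

Reading systems use the identifier to recognize a book across updates. A random id is therefore best generated once and saved back into book.json, not regenerated on every build.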

During processing, just call the rendering function above with the configuration from book.json to get the basic structure of the ebook's package.opf file.

The manifest and spine sections need resource information that is only available after the whole ebook has been rendered, so they are left empty for now.

1) Extract template files

Although the rendering function above works, it has an obvious problem:

The string in the render function is XML, but because it is written inside our code, the IDE treats it as an ordinary string with no syntax highlighting or validation. If we later edit it and introduce a formatting error, such as accidentally deleting a character, we won't catch it while coding.

So we make a small improvement: move the content of the template string above into the file templates/EPUB/package.opf.xml, and implement a render function that:

  • Takes a template name templateName, finds the corresponding file in the templates directory, and reads it as a template string;
  • Takes render arguments args and injects their fields as parameters of a template rendering function fn;
  • Executes fn and returns the final file content.
// scripts/render.js
import fs from 'fs/promises';
import path from 'path';
import { fileURLToPath } from 'url';

// Directory of this module (scripts/), derived from the ES module URL
const dirname = path.dirname(fileURLToPath(import.meta.url));

export const render = async (templateName, args = {}) => {
    const filePath = path.join(dirname, '../templates', templateName);
    try {
        await fs.access(filePath);
    } catch (ex) {
        throw Error(`TemplateError: can't find template ${JSON.stringify(templateName)}\n  at file: ${filePath}`);
    }
    const template = (await fs.readFile(filePath)).toString();
    const argKeys = Object.keys(args);
    const argValues = argKeys.map((key) => args[key]);
    // Build a function whose parameters are the arg names and whose body
    // evaluates the template file content as a template literal
    // eslint-disable-next-line no-new-func
    const fn = new Function(...argKeys, `return \`${template}\`;`);
    return fn(...argValues);
};

Thus, we have implemented a generic template rendering function.
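To see the technique behind render() in isolation, here is a sketch that doesn't touch the file system. The template text becomes the body of a template literal, which is evaluated with new Function(), using the keys of args as parameter names. renderString and escapeXml are hypothetical names. escapeXml is not in the original code, but it is worth considering: a value such as a title containing & would otherwise produce invalid XML.

```javascript
// File-system-free version of the technique used by render():
// the template text becomes the body of a template literal, and the keys
// of `args` become parameter names of a generated function.
const renderString = (template, args = {}) => {
    const argKeys = Object.keys(args);
    const argValues = argKeys.map((key) => args[key]);
    // eslint-disable-next-line no-new-func
    const fn = new Function(...argKeys, `return \`${template}\`;`);
    return fn(...argValues);
};

// Hypothetical helper (not in the original code): escape a value
// before injecting it into an XML/XHTML template
const escapeXml = (value) => String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');

// eslint-disable-next-line no-template-curly-in-string
const xml = renderString('<dc:title>${meta.title}</dc:title>', {
    meta: { title: escapeXml('Tips & Tricks') },
});
console.log(xml); // <dc:title>Tips &amp; Tricks</dc:title>
```

Because the template file's text becomes JavaScript source, a literal backtick or backslash in a template has to be escaped (\` and \\), and every ${...} is evaluated as an expression. Only render templates you trust.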

Besides package.opf, the mimetype and META-INF/container.xml files from the previous article can also be moved into the templates directory and rendered by name during processing.

2) Markdown rendering

Markdown pages have to be converted to HTML first. Given the path filePath of the file to render, we read its contents and convert them with marked to get the page's HTML.

We then create a templates/EPUB/book-page.xhtml template for book pages, and call the render() function implemented in the previous step to produce a standard EPUB page file:

// In scripts/render.js, next to render() (merge these imports with the existing ones)
import fs from 'fs/promises';

import { marked } from 'marked';

export const renderMdPage = async (filePath, args = {}) => {
    try {
        await fs.access(filePath);
    } catch (ex) {
        throw Error(`RenderError: can't find file ${JSON.stringify(filePath)}`);
    }
    const markdown = await fs.readFile(filePath);
    const content = marked.parse(markdown.toString());

    // TODO: Collect headings and images

    const { title = 'Untitled Page' } = args;
    return render('EPUB/book-page.xhtml', {
        title,
        content,
    });
};

Content of templates/EPUB/book-page.xhtml:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xml:lang="en"
      lang="en">
  <head>
    <title>${title}</title>
  </head>
  <body>
    ${content}
  </body>
</html>

4. Running the task

When generating an ebook, we need to process multiple Markdown page files according to the pages field in book.json, while preserving their hierarchy. Each page file may also reference several images. The manifest, spine and navigation table of the whole book can only be generated once all pages and the resources they reference have been collected.

So we need to render and write files as we go, while collecting this information.

We therefore create a Task class in the project, and each job is handled by its own instance. A task has its own temporary directory for intermediate files, and can cache resource information in its instance variables. At the end, it generates the basic files mentioned above, packages everything into a book, and cleans up the temporary directory.

import fs from 'fs/promises';
import path from 'path';
import os from 'os';

import mkdirp from 'mkdirp';

// Assuming this module sits next to scripts/render.js
import { render } from './render.js';

// All tasks keep their intermediate files under <system temp dir>/kepub
const tempDir = path.join(os.tmpdir(), 'kepub');

export default class Task {
    constructor(targetDir, config) {
        this.targetDir = targetDir;
        this.config = config;
        this.state = 'idle';

        // Give each task its own uniquely named working directory
        const stamp = Date.now();
        const taskName = `task_${stamp}_${Math.random()}`;
        const taskDir = path.join(tempDir, taskName);

        this.name = taskName;
        this.saveDir = taskDir;

        this.rendering = [];
        this.pageToc = [];
        this.pageList = [];
    }

    // Write a file relative to the task directory, creating parent directories as needed
    async writeFile(subPath, content) {
        const { saveDir } = this;
        const filePath = path.join(saveDir, subPath);
        const dirPath = path.dirname(filePath);
        await mkdirp(dirPath);
        return fs.writeFile(filePath, content);
    }

    async run() {
        if (this.state !== 'idle') {
            throw new Error(`TaskError: current task state is not "idle", but ${JSON.stringify(this.state)}`);
        }
        this.state = 'running';

        const { meta } = this.config;

        const manifestList = [];
        const spineList = [];
        // TODO: Process the ebook, updating the manifest and navigation

        await Promise.all([
            this.writeFile('mimetype', await render('mimetype')),
            this.writeFile('META-INF/container.xml', await render('META-INF/container.xml')),
            this.writeFile('EPUB/package.opf', await render('EPUB/package.opf.xml', {
                meta,
                manifestList,
                spineList,
            })),
        ]);

        this.state = 'complete';
    }
}

1) Render a single page and record resources

In the Markdown page render function in render.js, we left a TODO for collecting headings and images. Now it's time to fill it in.

We define a title field for each node under pages in book.json, but a page's actual title often changes along with its content. So we try to use the text of the first <h1> heading in the file as the default title, parsing the HTML with Cheerio:

import { load as loadHtml } from 'cheerio';

export const renderMdPage = async (filePath, args = {}) => {
    // ...
    const markdown = await fs.readFile(filePath);
    const html = marked.parse(markdown.toString());

    // Parse the generated HTML and take the text of the first <h1>
    const $ = loadHtml(html);
    const firstH1 = $('h1').first().text();
    // ...
};

Images on the page can be collected with Cheerio as well.

Finally, we return the final title and the list of images used, so that the calling task instance can record them:

export const renderMdPage = async (filePath, args = {}) => {
    // ...
    const markdown = await fs.readFile(filePath);
    const html = marked.parse(markdown.toString());

    const $ = loadHtml(html);
    const firstH1 = $('h1').first().text();
    // Collect the src attribute of every <img> on the page
    const extractSrc = (_, el) => $(el).attr('src');
    const images = $('img').map(extractSrc).get();

    const {
        title = firstH1 || 'Untitled Page',
    } = args;
    const content = await render('EPUB/book-page.xhtml', {
        title,
        // XHTML requires void elements to be closed: <img ...> becomes <img ... />
        content: html.replace(/(<img[^>]+[^/])>/g, '$1 />'),
    });
    return {
        title,    // Final title, used by the task to generate the table of contents
        content,  // Content of the page's *.xhtml file
        images,   // List of image resources referenced by the page
    };
};

Now that the underlying rendering function for a single page is almost complete, we will call it in Task to render all pages of the book.
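Before moving on, note that the replace() in renderMdPage() only closes <img> tags. marked can also emit other void elements such as <br> and <hr>, and XHTML rejects these unless they are self-closed. The following sketch is a little broader. The name selfCloseVoidTags is hypothetical, and the tag list is the set of HTML void elements. Like the original regex, it assumes no attribute value contains a literal >.

```javascript
// HTML void elements, which must be written as <tag ... /> in XHTML
const VOID_TAGS = 'area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr';
// \b stops e.g. "col" from matching "<colgroup>"; the optional "/" lets
// already-closed tags be normalized instead of getting a second slash
const VOID_TAG_RE = new RegExp(`<(${VOID_TAGS})\\b([^>]*?)\\s*/?>`, 'gi');

// Hypothetical helper: self-close every void element in an HTML fragment
const selfCloseVoidTags = (html) => html.replace(VOID_TAG_RE, '<$1$2 />');
```

In renderMdPage(), it could stand in for the inline regex, e.g. content: selfCloseVoidTags(html).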

2) Convert the directory structure and render the whole book

The pages field we defined earlier in book.json is a tree, which makes day-to-day adjustments flexible. The resulting manifest and spine, however, are flat lists (pages in reading order, just like a real book).

So before the task starts, we first flatten this structure. This also makes it easy to add concurrency control later with a library such as tiny-async-pool. The flattened list and a separate toc tree reference the same page objects, so the hierarchy of the original table of contents is preserved for generating the nested navigation document.

// Flatten the page tree into a list, while building a toc tree
// whose items reference the same page objects
const flattenPages = (pages) => {
    const list = [];
    const toc = [];
    pages.forEach((originPage) => {
        const page = { ...originPage };
        list.push(page);
        const tocItem = { page };
        toc.push(tocItem);

        const { children } = page;
        if (children && Array.isArray(children)) {
            delete page.children;
            const { list: subList, toc: subToc } = flattenPages(children);
            tocItem.children = subToc;
            list.push(...subList);
        }
    });
    return {
        list,
        toc,
    };
};

Then, inside Task.run(), we call flattenPages() to process the page structure and record the href link of each page:

class Task {
    // ...
    async run() {
        // ...

        // Process the task parameters
        const {
            meta,
            pages,
            cover,
        } = this.config;
        const {
            list: pageList,
            toc: pageTree,
        } = flattenPages(pages);

        // Derive each page's output path, e.g. chapter/intro.md -> chapter/intro.xhtml
        pageList.forEach((page) => {
            const refPage = page;
            const basePath = page.file.replace(/\.md$/i, '');
            const href = `${basePath}.xhtml`;
            refPage.href = href;
        });

        await this.convertPages(pageList);
        // ...
    }
}

Next, we implement Task.convertPages() to process this page list.

Since many of the I/O operations can run asynchronously, we use tiny-async-pool for concurrency control to reduce the processing time of the whole task:

import asyncPool from 'tiny-async-pool';

import { renderMdPage } from './render.js';

// Maximum number of pages rendered at the same time
const RENDER_CONCUR_RESTRICTION = 10;

class Task {
    // ...
    async convertPages(pageList) {
        const { targetDir } = this;
        const imageList = [];

        // Process a single page
        const convertPage = async (page) => {
            const {
                title: titleOrNot,
                file,
                href,
            } = page;

            const filePath = path.join(targetDir, file);
            const {
                title,
                content,
                images,
            } = await renderMdPage(filePath, {
                title: titleOrNot,
            });

            // Write the final title back to the page record
            if (titleOrNot !== title) {
                const refPage = page;
                refPage.title = title;
            }

            // TODO: Repair relative path
            imageList.push(...images);

            const savePath = `EPUB/${href}`;
            return this.writeFile(savePath, content);
        };

        // Process pages concurrently
        await asyncPool(RENDER_CONCUR_RESTRICTION, pageList, convertPage);

        return {
            imageList,
        };
    }
    // ...
}
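The asyncPool(limit, items, fn) call above follows the tiny-async-pool 1.x style, where the call returns a promise (2.x switched to returning an async iterator). If you'd rather avoid the dependency, the same kind of concurrency limit can be sketched in a few lines. promisePool is a hypothetical name. Unlike the library, this sketch simply starts `limit` workers that pull items from a shared index, and it resolves to the results in input order.

```javascript
// Hypothetical, dependency-free stand-in for asyncPool(limit, items, fn):
// runs iteratorFn over items with at most `limit` calls in flight,
// and resolves to the results in input order.
const promisePool = async (limit, items, iteratorFn) => {
    if (limit < 1) throw new RangeError('limit must be at least 1');
    const results = new Array(items.length);
    let nextIndex = 0;

    // Each worker keeps claiming the next unprocessed index until none remain.
    // No locking is needed: JavaScript runs one callback at a time.
    const worker = async () => {
        while (nextIndex < items.length) {
            const index = nextIndex;
            nextIndex += 1;
            // eslint-disable-next-line no-await-in-loop
            results[index] = await iteratorFn(items[index]);
        }
    };

    const workerCount = Math.min(limit, items.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
};

// Usage: at most 2 simulated writes are in flight at any moment
let running = 0;
let maxRunning = 0;
const fakeWrite = async (n) => {
    running += 1;
    maxRunning = Math.max(maxRunning, running);
    await new Promise((resolve) => { setTimeout(resolve, 10); });
    running -= 1;
    return n * 2;
};

promisePool(2, [1, 2, 3, 4, 5], fakeWrite).then((doubled) => {
    console.log(doubled, maxRunning); // [ 2, 4, 6, 8, 10 ] 2
});
```

If one call rejects, the returned promise rejects while the remaining workers keep draining the queue. That matches the "fail the task" behavior we want here.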

With convertPages(), we can now convert and write every page of the book, and the method returns all the image resources the book uses.

But there are a few problems here:

  • We only collect references to the images without copying them into the task directory, so packaging the task directory would miss the image files;
  • An image path may be relative to its page file, and needs to be converted to a path relative to EPUB/package.opf;
  • An image path may also be a network URL, which doesn't need to be copied;
  • Images referenced more than once are not de-duplicated.

We first convert each image path to a project path relative to EPUB/package.opf, and then de-duplicate the list.

Find the earlier "TODO: Repair relative path" and the imageList.push() after it, and replace them with:

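// Treat absolute file paths and URLs with a scheme (such as https://) as absolute
const isAbsolute = (src) => path.isAbsolute(src) || /^[a-z][a-z\d+.-]*:\/\//i.test(src);

class Task {
    // ...
    async convertPages(pageList) {
        // ...
        const convertPage = async (page) => {
            // ...
            const pageDir = path.dirname(filePath);
            images.forEach((src) => {
                let fixedSrc = src;
                if (!isAbsolute(src)) {
                    // Resolve page-relative paths against the page's directory,
                    // then express them relative to the book's root directory
                    // (with "/" separators, as required for hrefs)
                    const absSrc = path.join(pageDir, src);
                    fixedSrc = path.relative(targetDir, absSrc).split(path.sep).join('/');
                }
                // De-duplicate images referenced more than once
                if (!imageList.includes(fixedSrc)) {
                    imageList.push(fixedSrc);
                }
            });
            // ...
        };
        // ...
    }
    // ...
}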

This gives us image paths relative to the project directory, with absolute paths and network URLs left as-is.
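As an alternative to the path.join()/path.relative() logic above, page-relative image paths can be resolved with WHATWG URL rules against a dummy file:// base. URL resolution always produces "/" separators whatever the operating system, and it also understands "\" in file URLs. This is a swapped-in technique, not the article's method. The names toProjectPath and collectImages and the dummy base file:///project/ are hypothetical. File names containing %, ? or # are not handled.

```javascript
// Dummy base URL standing in for the project root directory
const PROJECT_BASE = 'file:///project/';

// URLs with a scheme (https:, data:, and also Windows drives such as C:)
// and rooted paths ("/..." or "\...") are treated as absolute and kept as-is
const isAbsoluteSrc = (src) => /^([a-z][a-z\d+.-]*:|[\\/])/i.test(src);

// pageFile: page path relative to the project root, e.g. "chapter1/intro.md"
// Returns a project-relative path, the unchanged src, or null if it escapes the root
const toProjectPath = (pageFile, src) => {
    if (isAbsoluteSrc(src)) return src;
    const { pathname } = new URL(src, new URL(pageFile, PROJECT_BASE));
    if (!pathname.startsWith('/project/')) return null;
    // URL pathnames are percent-encoded (e.g. spaces become %20)
    return decodeURIComponent(pathname.slice('/project/'.length));
};

// De-duplicated image list across pages, in first-seen order
const collectImages = (pages) => {
    const seen = new Set();
    pages.forEach(({ file, images }) => {
        images.forEach((src) => {
            const fixed = toProjectPath(file, src);
            if (fixed !== null) seen.add(fixed);
        });
    });
    return [...seen];
};
```

For example, '../images/a.png' referenced from 'chapter1/intro.md' resolves to 'images/a.png'. A Set handles de-duplication without the linear includes() scan.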

3) Transfer image resources

This time we create Task.copyImage() and Task.transportImages() to process the image list.

Task.copyImage() locates the actual file according to the kind of image path passed in, copies it into the task directory where needed, and returns the href to use in package.opf:

  • A network resource is not processed, and the original URL is returned;
  • A path relative to the project is copied to the same relative location in the task directory, and that in-project path is returned;
  • An absolute path is copied under a random temporary name generated within the task directory, and that name is returned as the href.
import { createReadStream, createWriteStream } from 'fs';
import { pipeline } from 'stream';

// Maximum number of images copied at the same time
const COPY_CONCUR_RESTRICTION = 5;

class Task {
    // ...
    async copyImage(src) {
        // 1. Network resources are referenced as-is
        if (/^https?:\/\//.test(src)) return src;

        const { targetDir, saveDir } = this;
        const isAbs = isAbsolute(src);
        const href = !isAbs
            ? src // 2. Relative path: keep the original in-project path
            : this.getTempName().concat(path.extname(src)); // 3. Absolute path: random name within the task

        const srcPath = isAbs ? src : path.join(targetDir, src);
        const savePath = path.join(saveDir, `EPUB/${href}`);

        const dirPath = path.dirname(savePath);
        await mkdirp(dirPath);

        // Stream-copy the file and resolve with its href once done
        return new Promise((rs, rj) => {
            pipeline(
                createReadStream(srcPath),
                createWriteStream(savePath),
                (err) => {
                    if (err) rj(err);
                    else rs(href);
                },
            );
        });
    }

    // Generate a random file name that is unique within this task
    getTempName() {
        const usedName = this.$usedTempName || [];
        this.$usedTempName = usedName;

        const name = [Date.now(), Math.random()]
            .map((n) => n.toString(16))
            .join('_').replace(/\./g, '');
        if (usedName.includes(name)) return this.getTempName();
        usedName.push(name);
        return name;
    }

    async transportImages(imageList) {
        const imageHrefList = [];
        const copyImage = async (image) => {
            const href = await this.copyImage(image);
            imageHrefList.push({
                href,
            });
        };
        // Copy images concurrently
        await asyncPool(COPY_CONCUR_RESTRICTION, imageList, copyImage);

        return {
            imageHrefList,
        };
    }
    // ...
}

If you're interested, you could try using symbolic links to avoid the cost of copying images, or add image compression to reduce the size of the ebook.

Finally, back in Task.run(), we call Task.convertPages() and Task.transportImages() to get the basic page-related entries of the manifest:

class Task {
    // ...
    async run() {
        // ...
        // Convert the pages
        const {
            imageList,
        } = await this.convertPages(pageList);

        // Process the images
        const {
            imageHrefList,
        } = await this.transportImages(imageList);

        const manifestList = [
            // Add the pages to the manifest
            ...pageList.map(({ href }, index) => ({
                id: `page-${index}`,
                href,
            })),
            // Add the images to the manifest
            ...imageHrefList.map(({ href }, index) => ({
                id: `image-${index}`,
                href,
            })),
        ];

        // TODO: Add more resources
    }
    // ...
}

4) Generate the table of contents and cover

With the page and image processing in place, we automatically create two special resources: the table of contents and the cover.

We can recursively build the HTML of the table of contents from the pageTree obtained earlier, render it with the generic render() function, and add it to manifestList:

// Recursively build nested <ol> lists for the navigation document
const parseToc = (toc) => {
    if (!Array.isArray(toc) || toc.length < 1) return '';
    const buffer = [];
    buffer.push('<ol>');
    toc.forEach((item) => {
        const { page, children } = item;
        const { href, title, hidden } = page;
        buffer.push(`<li${hidden ? ' hidden=""' : ''}><a href="${href}">${title}</a>`);
        if (children) {
            buffer.push(parseToc(children));
        }
        buffer.push('</li>');
    });
    buffer.push('</ol>');
    return buffer.join('\n');
};

class Task {
    // ...
    async run() {
        // ...
        const {
            list: pageList,
            toc: pageTree,
        } = flattenPages(pages);
        // ...
        const manifestList = [
            // ...
        ];

        // Generate the table of contents
        await this.writeFile('EPUB/toc.xhtml', await render('EPUB/toc.xhtml', {
            tocHtml: parseToc(pageTree),
        }));
        manifestList.unshift({
            id: 'toc-page',
            href: 'toc.xhtml',
            properties: 'nav',
        });
        // ...
    }
}

Don't forget the special property on the table-of-contents entry: properties="nav".
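Because this property is easy to forget, a quick check on manifestList before rendering package.opf can catch the mistake. The sketch below is a hypothetical helper (checkManifest), not part of the original project. It checks two EPUB 3 rules: exactly one manifest item declares the nav property, and item ids are unique because they are XML IDs. A properties value may hold several space-separated entries, so the helper splits it.

```javascript
// Hypothetical sanity check for manifestList entries shaped like
// { id, href, properties? }. Returns a list of problems (empty if none).
const checkManifest = (manifestList) => {
    const problems = [];
    // properties is a space-separated list, e.g. "nav scripted"
    const hasProperty = (item, name) => (item.properties || '')
        .split(/\s+/)
        .includes(name);

    const navItems = manifestList.filter((item) => hasProperty(item, 'nav'));
    if (navItems.length !== 1) {
        problems.push(`expected exactly one nav item, found ${navItems.length}`);
    }

    const seenIds = new Set();
    manifestList.forEach(({ id }) => {
        if (seenIds.has(id)) problems.push(`duplicate id: ${id}`);
        seenIds.add(id);
    });
    return problems;
};
```

Calling it right before rendering package.opf, and throwing if it returns anything, turns a silently invalid book into an immediate error.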

Content of templates/EPUB/toc.xhtml:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xml:lang="en"
      lang="en">

<head>
    <title>Table of Contents</title>
</head>

<body>
    <nav epub:type="toc"
         id="toc">
        <h1>Table of Contents</h1>
        ${tocHtml}
    </nav>
</body>

</html>

The cover consists of an image resource and a cover page. The image can be copied and added to manifestList directly, while the page is again rendered from a template:

class Task {
    // ...
    async run() {
        // ...
        const {
            meta,
            pages,
            cover,
        } = this.config;
        // ...

        // Handle the cover
        if (cover) {
            // Use the href returned by copyImage(), which differs from the
            // configured path when the cover is given as an absolute path
            const coverHref = await this.copyImage(cover);
            manifestList.push({
                id: 'cover-image',
                href: coverHref,
                properties: 'cover-image',
            });
            await this.writeFile('EPUB/cover.xhtml', await render('EPUB/cover.xhtml', {
                cover: coverHref,
            }));
            manifestList.unshift({
                id: 'cover-page',
                href: 'cover.xhtml',
            });
        }
        // ...
    }
}

Content of the corresponding template templates/EPUB/cover.xhtml:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xml:lang="en"
      lang="en">

<head>
    <title>Cover</title>
    <style type="text/css">
        img {
            max-width: 100%;
        }
    </style>
</head>

<body>
    <figure id="cover-image">
        <img src="${cover}"
             alt="Book Cover" />
    </figure>
</body>

</html>

5) Complete the manifest, package and clean up

After all the processing above, manifestList now holds the basic information for every resource the book needs.

With mime-types, we can look up the media type (MIME type) of each resource:

import mimeTypes from 'mime-types';

class Task {
    // ...
    async run() {
        // ...

        // Determine each resource's media type
        manifestList.forEach((item) => {
            const refItem = item;
            const { href } = item;
            const mediaType = mimeTypes.lookup(href);
            const isPage = mediaType === 'application/xhtml+xml';
            refItem.mediaType = mediaType;
            refItem.isPage = isPage;
        });
        // Pages (XHTML documents) go into the spine, in manifest order
        const spineList = manifestList.filter((item) => item.isPage);

        // ...
    }
}
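mimeTypes.lookup() returns false for extensions it doesn't recognize, and that value would end up as media-type="false" in package.opf. If you prefer to fail loudly, or to drop the dependency, a small hand-written table is enough for the resource types this tool produces. The table contents and the names lookupMediaType and withMediaTypes are assumptions. Extend the table if your books contain other resources, such as fonts.

```javascript
// Hypothetical, dependency-free replacement for mimeTypes.lookup(),
// limited to resource types this tool is likely to produce
const MEDIA_TYPES = {
    xhtml: 'application/xhtml+xml',
    css: 'text/css',
    gif: 'image/gif',
    jpg: 'image/jpeg',
    jpeg: 'image/jpeg',
    png: 'image/png',
    svg: 'image/svg+xml',
    webp: 'image/webp',
};

const lookupMediaType = (href) => {
    // Ignore any query string or fragment, then take the last extension
    const cleanHref = href.split(/[?#]/)[0];
    const match = /\.([^./\\]+)$/.exec(cleanHref);
    const mediaType = match && MEDIA_TYPES[match[1].toLowerCase()];
    if (!mediaType) {
        throw new Error(`Unknown media type for ${JSON.stringify(href)}`);
    }
    return mediaType;
};

// Same post-processing as above, without mutating the input items
const withMediaTypes = (manifestList) => manifestList.map((item) => {
    const mediaType = lookupMediaType(item.href);
    return { ...item, mediaType, isPage: mediaType === 'application/xhtml+xml' };
});
```

The spine can then be derived in the same way as before: withMediaTypes(manifestList).filter((item) => item.isPage).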

Now we can complete the EPUB/package.opf template to render the manifest and the spine.

Updated content of templates/EPUB/package.opf.xml:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         version="3.0"
         xml:lang="${meta.lang}"
         unique-identifier="pub-identifier">
  <metadata>
    <dc:identifier id="pub-identifier">${meta.id}</dc:identifier>
    <dc:title id="pub-title">${meta.title}</dc:title>
    <dc:language id="pub-language">${meta.lang}</dc:language>
    <dc:date>${meta.date}</dc:date>
    <meta property="dcterms:modified">${meta.modified}</meta>
  </metadata>
  <manifest>
    ${manifestList.map((item) => `
    <item id="${item.id}" href="${item.href}" media-type="${item.mediaType}"${item.properties ? ` properties="${item.properties}"` : ''} />`
    ).join('')}
  </manifest>
  <spine>
    ${spineList.map((item) => `
    <itemref idref="${item.id}"${item.id === 'cover-page' ? ' linear="no"' : ''} />`
    ).join('')}
  </spine>
</package>

Finally, in Task.run(), we package the task directory into an .epub file and clean up the task directory afterwards:

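import AdmZip from 'adm-zip';
import rimraf from 'rimraf';

// Hypothetical switch (isDebug is not defined in the original snippet):
// keep the temporary files for inspection when debugging
const isDebug = process.env.KEPUB_DEBUG === '1';

class Task {
    // ...
    async run() {
        // ...

        // Package the task directory as <target directory>.epub
        // Note: the EPUB container format expects "mimetype" to be the first,
        // uncompressed entry of the archive; addLocalFolder() does not
        // guarantee this, so strict validators may report it.
        const savePath = `${this.targetDir}.epub`;
        const zip = new AdmZip();
        zip.addLocalFolder(this.saveDir);
        zip.writeZip(savePath);

        // Clean up the temporary task directory
        if (!isDebug) {
            rimraf.sync(this.saveDir);
        }
        this.state = 'complete';
    }
}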

At this point, we have a tool that converts an existing collection of Markdown articles into an .epub ebook with a simple configuration.

Full demo: github.com/krimeshu/ke…

If you're interested, clone it and run npm test to see the result.

5. Future improvements

Our tool currently just "works", and it may need further adjustment and refinement for more complex situations.

We could also improve the existing process, for example:

  • Invocation as a CLI command;
  • Customizable cover and table-of-contents pages;
  • Custom page styles;
  • Custom fonts;
  • SVG support;
  • Multi-language support;
  • Interactive triggers and scripts.

Of these, the last is especially promising. Since EPUB 3 added HTML5 support, triggers and scripts can greatly enhance interactivity, enabling effects similar to interactive ebooks and text adventure (AVG) games.

For now, few ebook readers other than Apple Books fully support the EPUB 3 standard. As device performance and software support improve, though, the day may come when such effects are possible in ebooks.

Anyone interested is welcome to join in developing this project ~

Project address: github.com/krimeshu/ke…