preface

Most of us have come across torrent seeds at some point, and they play a part in quite a few everyday scenarios. From the point of view of download speed, a seed can greatly improve download efficiency through its "sharing" mechanism. So can we borrow this advantage to speed up the loading of our web pages? Before implementing that, we first need to understand the overall seed distribution scheme and the reasoning behind it.

Background knowledge

Learn about Torrent & BitTorrent

Before we look at torrent transfers, let's take a look at the peer-to-peer (P2P) communication model. In this model, resources are distributed: copies of a resource may exist in different places, on different devices, in a decentralized form, and every peer can provide service to other peers.

BitTorrent

BitTorrent (commonly abbreviated as BT) is a network file-transfer protocol that enables peer-to-peer file sharing. For most people it is practically synonymous with P2P, and it has pushed P2P technology close to maturity. BT has multiple sending points: while you download, you also upload, which is a very effective sharing mechanism that improves the utilization of resources.

So the more people who use your seed, the faster it downloads.

The mechanism looks something like this:

Each user is both a consumer and a provider of resources, creating a win-win situation through cooperation.

Torrent

Generally speaking, to distribute a seed we generate a .torrent file from a file or folder. The seed file contains identification data about the target content: the content is split (virtually) into pieces whose size is an integer power of two, a hash is computed for each piece to verify its integrity and uniqueness, and the hash and index of every piece are written into the .torrent file.
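
To make the piece/hash idea concrete, here is a small Node.js sketch, purely an illustration of the principle rather than how seed generators are actually implemented, that splits a file into fixed-size pieces and computes a SHA-1 hash for each one (16 KiB matches the pieceLength you'll see in the example below):

import { createHash } from 'node:crypto'
import { readFileSync } from 'node:fs'

const PIECE_LENGTH = 16 * 1024 // 16 KiB, a power of two

// Split the content into fixed-size pieces and hash each one;
// conceptually this is what the "pieces" field of a .torrent stores.
function hashPieces (content: Buffer): string[] {
  const hashes: string[] = []
  for (let offset = 0; offset < content.length; offset += PIECE_LENGTH) {
    const piece = content.subarray(offset, offset + PIECE_LENGTH)
    hashes.push(createHash('sha1').update(piece).digest('hex'))
  }
  return hashes
}

console.log(hashPieces(readFileSync('gulpfile.js')))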

Let's take a look at the structure of a .torrent file.

Here's a seed generated from a random file:

{
  "name": "gulpfile.js",
  "announce": [
    "udp://tracker.leechers-paradise.org:6969",
    "udp://tracker.coppersurfer.tk:6969",
    "udp://tracker.opentrackr.org:1337",
    "udp://explodie.org:6969",
    "udp://tracker.empire-js.us:1337",
    "wss://tracker.btorrent.xyz",
    "wss://tracker.openwebtorrent.com"
  ],
  "infoHash": "0620db0051864b7cda0fd61df5779a5da6531aa6",
  "private": false,
  "created": "2021-12-20T15:…",
  "createdBy": "WebTorrent <https://webtorrent.io>",
  "urlList": [],
  "files": [
    {
      "path": "gulpfile.js",
      "name": "gulpfile.js",
      "length": 1359,
      "offset": 0
    }
  ],
  "length": 1359,
  "pieceLength": 16384,
  "lastPieceLength": 1359,
  "pieces": [
    "bbaf8a989233c80eba320df04c414e7399eb8781"
  ]
}

Here's an overview of what some of the fields are used for:

  • name: the name of the seed
  • announce: the service addresses of the trackers
  • infoHash: the unique identifier of the seed
  • files: the list of files the seed contains
  • pieces: the hashes of the seed's pieces

Which brings us to what the tracker is actually for.

As the name implies, a tracker tracks and stores information about seeds. If you make a seed and want others to obtain the information they need to download it, you generally need a centralized tracker to record your seed's information; the tracker then tells downloaders who is providing the seed, and the download can proceed from there.

Typically, when we query the tracker with the unique identifier of a seed, it tells us how many peers currently hold the seed and which addresses are reachable, so already-downloaded data can be reused between endpoints. For example, if a node owns a piece of your target data, you can download that piece directly from it, and it can download from you whichever pieces it needs that you already have.
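
As a rough illustration of what that exchange looks like, the classic HTTP tracker protocol (which predates the WebSocket trackers WebTorrent uses) is just a GET request to the tracker's announce URL carrying the info hash and your peer details, and the tracker replies with a bencoded peer list. A sketch with a made-up tracker URL and peer id:

// Hypothetical HTTP tracker; WebTorrent actually talks to WebSocket trackers.
const announceUrl = 'http://tracker.example.com/announce'

const params = new URLSearchParams({
  info_hash: '0620db0051864b7cda0fd61df5779a5da6531aa6', // in the real protocol this is the URL-encoded raw 20-byte hash, not hex
  peer_id: '-XX0001-123456789012', // made-up peer id
  port: '6881',
  uploaded: '0',
  downloaded: '0',
  left: '1359', // bytes still needed
  event: 'started'
})

// The response is a bencoded dictionary containing, among other things, a peer list.
const res = await fetch(`${announceUrl}?${params}`)
console.log(new Uint8Array(await res.arrayBuffer()))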

WebRTC

From the perspective of how web seeds are transmitted, this is essentially a data channel built on top of a P2P connection; the actual data exchange can only happen after that channel has been established.

Speaking of P2P connections, another technology close to front-end developers naturally comes to mind: WebRTC. It also relies on P2P connections for data transfer at the bottom layer, so it is a convenient way to get a feel for what P2P connections can do.

Generally speaking, establishing a WebRTC connection goes through the following steps (a minimal code sketch follows the list):

  • Create an RTCPeerConnection object
  • The initiator creates an offer (an SDP description)
    • The SDP description covers: what audio and video data are available, the formats of that data, the transport addresses, and so on

  • The initiator calls setLocalDescription with the offer created above
  • The offer is delivered to the target end via the signaling server
  • The target end also creates its own RTCPeerConnection object
  • If a media stream (e.g. a video stream) is to be transmitted, it is added to the connection with addStream/addTrack
  • After receiving the offer, the target end calls setRemoteDescription with it
  • The target end then creates an answer
  • The target end calls setLocalDescription with its own answer
  • The answer is sent back to the initiator
  • The initiator calls setRemoteDescription once it receives the answer
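
Here is a minimal sketch of that flow using two RTCPeerConnection objects in the same page; in a real application the offer, answer and ICE candidates would travel through the signaling server instead of being passed around directly:

const initiator = new RTCPeerConnection()
const target = new RTCPeerConnection()

// In a real app these candidates go through the signaling server.
initiator.onicecandidate = e => e.candidate && target.addIceCandidate(e.candidate)
target.onicecandidate = e => e.candidate && initiator.addIceCandidate(e.candidate)

// A data channel (or an added track) gives the SDP something to describe.
initiator.createDataChannel('demo')

async function connect () {
  // 1. The initiator creates an offer (SDP) and sets it as its local description.
  const offer = await initiator.createOffer()
  await initiator.setLocalDescription(offer)

  // 2. The target receives the offer and sets it as its remote description.
  await target.setRemoteDescription(offer)

  // 3. The target creates an answer and sets it as its own local description.
  const answer = await target.createAnswer()
  await target.setLocalDescription(answer)

  // 4. The initiator receives the answer and sets it as its remote description.
  await initiator.setRemoteDescription(answer)
}

connect()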

Media agreement

In P2P communication we also need to pay attention to protocols, just as we use HTTP, HTTPS, WS and so on elsewhere. The two ends of the connection need to negotiate a set of codecs that both sides support before they can communicate.

WebRTC uses the VP8 codec by default. If the peer you are connecting to cannot decode VP8 and there is no media-negotiation step, then even if the connection succeeds and video data is sent, the other side will not be able to play it.

For example, peer A may support the VP8 and H264 encoding formats, while peer B supports VP9 and H264. To guarantee that both ends can encode and decode correctly, the simplest approach is to take their intersection: H264.

In WebRTC, the Session Description Protocol (SDP) is used to describe this codec information. Both parties to a connection first exchange SDP so that each knows what the other needs and supports. This process is what is known as media negotiation.
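
If you want to see what your own browser would bring to that negotiation, the static capability APIs give a quick look; a small sketch (not part of the demo project):

// Which video codecs this browser can send and receive.
const sendCodecs = RTCRtpSender.getCapabilities('video')?.codecs ?? []
const recvCodecs = RTCRtpReceiver.getCapabilities('video')?.codecs ?? []
console.log(sendCodecs.map(c => c.mimeType)) // e.g. ["video/VP8", "video/H264", ...]
console.log(recvCodecs.map(c => c.mimeType))

// The offered codecs also show up in the SDP itself, in the m=video section.
const pc = new RTCPeerConnection()
pc.addTransceiver('video')
pc.createOffer().then(offer => console.log(offer.sdp))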

NAT

After the two connecting parties have exchanged media information, they need to understand each other's network situation and find a path over which they can actually communicate. Think about how we get onto the Internet in the first place: IPv4 address space is scarce and, in China, has long been exhausted.

IPv4 uses 32-bit (4-byte) addresses, so there are only 4,294,967,296 (2³²) addresses in the address space. Some addresses are reserved for special purposes, such as private networks (about 18 million addresses) and multicast (about 270 million addresses), which further reduces the number of addresses usable on the public Internet. As addresses were assigned to end users, IPv4 address exhaustion set in. Measures such as classful network redesign, classless inter-domain routing (CIDR) and network address translation significantly slowed the rate of exhaustion, but on February 3, 2011, IANA's primary address pool was exhausted when the last five address blocks were assigned to the five Regional Internet Registries.

Against this backdrop, NAT technology stepped onto the stage.

NAT does exactly what its name says: Network Address Translation rewrites the address information in the IP packet header. NAT is usually deployed at an organization's network egress, and by replacing internal IP addresses with the egress IP address it provides public-network reachability and upper-layer protocol connectivity. There are several kinds of NAT:

  • One-to-one NAT
  • One-to-many NAT
  • Classified by NAT port-mapping mode:
    • Full cone NAT
    • Restricted cone NAT
    • Port-restricted cone NAT
    • Symmetric NAT

The finer details of NAT are beyond the scope of this article; please refer to other material if you are interested.

NAT traversal & STUN

There are many solutions for NAT traversal; STUN is just one of them.

The so-called probing technique works by having every entity involved in the communication run a probe to detect whether a NAT gateway sits on the network path, and then applying a different traversal method depending on the NAT model detected.

The STUN server is deployed on the public Internet and receives probe requests from communicating entities. It records the source address and port of each received packet and writes them into the response it sends back. The client compares the IP address and port in the response with the IP address and port it chose locally to determine whether a NAT gateway is present. If one is, the client repeats the probe against another IP address of the server, using the same local address and port, and compares the two responses to work out how the NAT behaves.

In short, a STUN server tells you what your public IP and port are, so that you can pass this information to the target party in order to connect.
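
A quick way to see this in the browser: create a connection configured with a public STUN server and watch the ICE candidates it gathers; the server-reflexive ("srflx") candidate carries your public address. A small sketch, using the same Google STUN address as the demo code later in this article:

const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19305' }]
})

// Creating a data channel is enough to kick off ICE candidate gathering.
pc.createDataChannel('probe')

pc.onicecandidate = event => {
  if (!event.candidate) return
  // "srflx" candidates are the addresses the STUN server saw us coming from,
  // i.e. our public IP and port after NAT.
  if (event.candidate.type === 'srflx') {
    console.log('public address:', event.candidate.address, event.candidate.port)
  }
}

pc.createOffer().then(offer => pc.setLocalDescription(offer))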

There is a lot more to this; if you are interested, see: P2P NAT traversal (hole punching) explained in detail.

MediaStream

Generally speaking, for stream transmission the browser's built-in APIs make it easy to obtain video and audio stream information, and we then use addTrack to add the audio track, video track, desktop (screen) track and so on to the stream being transmitted.

Not only that, the browser also applies some optimizations to our audio stream, such as echo cancellation, noise suppression and gain control.
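
These optimizations can be requested explicitly through constraints, and a desktop track comes from getDisplayMedia. A small sketch (whether each constraint is honoured depends on the browser and device):

// Ask for audio with the browser-level processing mentioned above.
const micStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true
  },
  video: true
})

// A desktop/screen track comes from getDisplayMedia (requires a user gesture).
const screenStream = await navigator.mediaDevices.getDisplayMedia({ video: true })

// Add every track to the connection that will transmit them.
const pc = new RTCPeerConnection()
micStream.getTracks().forEach(track => pc.addTrack(track, micStream))
screenStream.getTracks().forEach(track => pc.addTrack(track, screenStream))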

RTCDataChannel

In addition to transmitting audio and video streams, we can also transmit our own custom data, such as text, files, binary data and so on.

Based on this capability we can do many other things. For example, we can use the P2P channel to send command data and build a remote-control ability over the other party's page (or computer, with an Electron-based solution).
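
A data channel is opened on the same kind of RTCPeerConnection used above; a minimal sketch of pushing a made-up "command" message across it (the message shape is purely illustrative):

// Sender side: open a named channel on its connection.
const senderPc = new RTCPeerConnection()
const channel = senderPc.createDataChannel('commands')
channel.onopen = () => {
  // Strings, ArrayBuffers and Blobs can all be sent.
  channel.send(JSON.stringify({ type: 'scroll', top: 300 })) // hypothetical command
}

// Receiver side: the channel arrives via the datachannel event
// once the offer/answer exchange shown earlier has completed.
const receiverPc = new RTCPeerConnection()
receiverPc.ondatachannel = event => {
  event.channel.onmessage = msg => console.log('remote command:', JSON.parse(msg.data))
}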

A simple WebRTC demo

Corresponding project address: Demo project

Sample style

The video is gray here because there is no camera on the author's desktop machine, but you can see the video status in the browser tab. The left side acts as the caller and transmits its video stream to the other side. A signaling server is used to obtain a unique identity for each connection and exchange the data; this part is still rough.

From the console output we can see that the information exchanged between them matches the diagram above: the SDP and candidate exchange, followed by the transmission of the stream.

Here’s a quick mention of how the browser gets video and audio streams:

const mediaStream = await navigator.mediaDevices
    .enumerateDevices()
    .then(devices => {
      const cam = devices.find(function (device) {
        return device.kind === 'videoinput'
      })
      const mic = devices.find(function (device) {
        return device.kind === 'audioinput'
      })
      const constraints = { video: !!cam, audio: !!mic }
      return navigator.mediaDevices.getUserMedia(constraints)
    })
mediaStream.getTracks().forEach(track => peer.addTrack(track, mediaStream))

When adding audio or video streams, we need to determine whether the current device supports audio/video tracks.

webtorrent

After understanding the basic concepts of seeds and P2P, we can go a step further: how do we distribute our static resources in the form of BT seeds?

For a typical front-end application, we bundle all static resources into an output directory and usually serve them directly, for example as static files hosted by nginx. That means we simply need to give the browser a seed from which it can pull the static resources and download the full data.

Package static resources into seeds

Here I'll take a random project and run the test.

This directory will be used as an example.

Seed static resources

If you want your seeded resource to be accessible, something must keep seeding the target files; otherwise downloaders won't find any source to download from.

Here I also set up a seeding web page:

Just drag in the folder you want to seed; once seeding succeeds it gives you a magnet URI link for downloading the seed.

We can copy the magnet URI on the right and download the resource wherever we like.

Concrete implementation of the seeding logic

// const trackers = ['wss://tracker.btorrent.xyz', 'wss://tracker.openwebtorrent.com']
const trackers = undefined;

const rtcConfig = {
  'iceServers': [{
    'urls': ['stun:stun.l.google.com:19305', 'stun:stun1.l.google.com:19305']
  }]
}

const torrentOpts = {
  announce: trackers
}

const trackerOpts = {
  announce: trackers,
  rtcConfig: rtcConfig
}

const client = new WebTorrent({
  tracker: trackerOpts
})

export const seedFiles = (files) => {
  client.seed(files, torrentOpts, torrent => {
    torrent.on('upload', function (bytes) {
      console.log('just uploaded: ' + bytes)
      console.log('total uploaded: ' + torrent.uploaded);
      console.log('upload speed: ' + torrent.uploadSpeed)
    })

    console.log('client.seed done', {
      magnetURI: torrent.magnetURI,
      ready: torrent.ready,
      paused: torrent.paused,
      done: torrent.done,
    });
  })
}

We rely on the WebTorrent package to provide file-seeding capability. To deal with NAT, we generally also need to configure a STUN service address for NAT traversal, as mentioned above; here the author simply uses Google's public STUN servers.

Note the announce configuration, which specifies the tracker addresses. If you pass nothing when instantiating WebTorrent, it uses some built-in trackers, which is what I did here. You can also set up your own BitTorrent tracker service.
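
If you do want to run your own tracker, the WebTorrent project publishes the bittorrent-tracker package. A rough sketch of starting one (option and event names as documented by that package; double-check against its README, and note that browser peers need the WebSocket server enabled):

import { Server } from 'bittorrent-tracker' // older CommonJS versions: require('bittorrent-tracker').Server

const server = new Server({
  udp: true,   // classic UDP tracker
  http: true,  // classic HTTP tracker
  ws: true     // WebSocket tracker, required for WebRTC peers
})

server.on('error', err => console.error(err.message))
server.on('listening', () => console.log('tracker listening'))

// Peers would then announce to e.g. ws://your-host:8000 (hypothetical address).
server.listen(8000)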

Download & render the seed

With the seed source in place, the next step is how to download and render it?

The WebTorrent package provides an API for downloading:

const client = new WebTorrent();
const torrentInstance = client.add(magnetURI, {
  path: '/'
}, renderTorrent);

renderTorrent is the callback used to process the torrent information once the download is ready.

How do I use the static resources downloaded here for page rendering?

The answer is to harness the power of the PWA.

Once we have the downloaded static resource data, we can first read the entry file of the static resources (typically index.html) and inject its contents directly into the current page with innerHTML. The page then starts parsing the newly inserted DOM; whenever it resolves a reference to an external resource it issues a request, and intercepting those requests is exactly where the Service Worker side of a PWA shines.


const renderTorrent = async (torrentInfo: WebTorrent.Torrent) => {
  logger.info(`Torrent downloaded! torrentInfo: ${torrentInfo}`);
  const files = torrentInfo.files;
  const indexHtmlFile = files.find(file => {
    return file.name === INDEX_HTML_NAME
  });
  let index = files.length;

  if (!indexHtmlFile) {
    logger.error(`can't find index.html`)
  } else {
    logger.log(`Import file: ${indexHtmlFile?.name}`)
    indexHtmlFile?.getBuffer((err, buffer) => {
      if (err) {
        logger.error(err);
        return;
      }
      logger.log(`index.html: ${buffer.toString()}`)
      document.body.innerHTML = buffer.toString()
    })
  }
}

I’m simplifying the logic here

The overall logic is simple: query the entry file from the file list, and then read the data to insert into the page.

In addition to processing the entry file, we also need to prepare for the resource data to be read after the PWA intercepts the request, i.e. find a place to store the data first.

The author prepared two options: one is to use Blob URLs directly, and the other is to use a browser storage capability such as IndexedDB.

See the code:

// ...
const tempCacheObj = {};

while (index-- > 0) {
  const file = files[index];
  if (file.name === INDEX_HTML_NAME) continue;
  logger.info(`current handle file: ${file.name}`);
  try {
    const fileGlobUrl = await promisifySetTorrentResponse(file);
    tempCacheObj[file.path] = fileGlobUrl;
    logger.info(`handle ${file.name} is complete`)
  } catch (error) {
    logger.error(`handle ${file.name} error: ${error}`);
  }
}

// Clear the old cache
await localforage.clear()
// Cache the path -> blob URL map
await localforage.setItem(localForageStorageKey, tempCacheObj);
// ...

This logic also lives in the renderTorrent handler: it iterates over the file list and processes each static resource file.

Look at this promisifySetTorrentResponse logic:

const promisifySetTorrentResponse = async (file: WebTorrent.TorrentFile) => {
  return new Promise((resolve, reject) => {
    file.getBlobURL((e: Error, blobUrl: string) => {
      if (e) {
        logger.error('Obtaining fileGlobUrl exception: ' + (e?.message ?? 'Unknown exception'));
        reject(e);
        return;
      }
      logger.info(`Add ${file.path} to cache.`);
      resolve(blobUrl);
    });
  })
}

The core logic is to obtain the blob URL of each resource file and hand it back up to the caller. At the upper level we then store all the resource blob URLs in IndexedDB; for convenience the localforage package is used to read and write local storage.

The Cache Storage API can also be used as the cache:

const cacheDB = await caches.open(CACHE_NAME);
const res = await self.fetch(blobUrl)
cacheDB.put(reqPath, res.clone());
return res.status === 200

This approach stores the response of the request directly, so the PWA can use it as-is after intercepting a request.
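
For completeness, here is a rough sketch (not the demo project's actual code) of how a Service Worker fetch handler could answer requests straight from that cache:

self.addEventListener('fetch', event => {
  event.respondWith((async () => {
    // Serve from Cache Storage when we have the resource cached...
    const cached = await caches.match(event.request)
    if (cached) return cached
    // ...and fall back to the network otherwise.
    return fetch(event.request)
  })())
})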

Now that the data is ready, is it time to do the render logic?

Register the ServiceWorker for the page

How can you forget ServiceWorker if you want to use PWA’s ability?

Take a look at the basic logic:

self.addEventListener('install', async event => {
  logger.info('installing!')
  await self.skipWaiting();
})

self.addEventListener('activate', async event => {
  await clearCache(CACHE_NAME);
  await self.clients.claim();
  logger.info('activated!')
})

self.addEventListener('fetch', async function (event) {
  const request = event.request;
  const scope = self.registration.scope;
  const url = request.url;
  const fetchPath = url.slice(scope?.length ?? 0);

  console.log('Fetch request for:', fetchPath)
  if (!fetchPath) {
    event.respondWith(self.fetch('/index.html'));
  } else if (fetchPath === 'intercept/status') {
    event.respondWith(new Response(' ', { status: 234, statusText: 'intercepting' }))
  } else {
    // event.respondWith(dbResHandler(fetchPath, event))
    event.respondWith(handleFetch(fetchPath, event))
  }
})

Install is an event that will be triggered when your ServiceWorker is installed. The Activate stage is when your ServiceWorker is activated.

The third is our most critical event, which is the fetch stage of the interception request. At this stage, we can do the logic of the interception request.

The author’s processing of this project is still relatively rough, and there may be a chance to make it into a universal capability later.

I've written three if/else branches here. The first one mainly returns the main document, i.e. the entry file that hosts the rendering capability, rather than the target static resource we are rendering.

This is the following page:

It’s ugly, but it works.

This web page is primarily used to enter the target magnet URI and then pull the seed data into the current page.

The second if/else branch is used to test whether the ServiceWorker intercepts requests properly:

if (fetchPath === 'intercept/status') {
  event.respondWith(new Response(' ', { status: 234, statusText: 'intercepting' }))
}

On the requesting (test) side:

function verifyRouter (cb) {
  var request = new window.XMLHttpRequest()
  request.addEventListener('load', function verifyRouterOnLoad () {
    if (this.status !== 234 || this.statusText !== 'intercepting') {
      cb(new Error('Service Worker not intercepting http requests, perhaps not properly registered?'))
      return
    }
    cb()
  })
  request.open('GET', './intercept/status')
  request.send()
}

The author calls this function after registering the ServiceWorker: it makes an Ajax request to verify that interception works.
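
The registration itself isn't shown in the snippets above; a minimal sketch of what it could look like (the /sw.js path is an assumption):

if ('serviceWorker' in navigator) {
  navigator.serviceWorker
    .register('/sw.js') // assumed path of the worker script shown above
    .then(registration => {
      console.log('ServiceWorker registered with scope:', registration.scope)
      // Once the worker has claimed the page, verify that interception works.
      verifyRouter(err => {
        if (err) console.error(err)
        else console.log('interception OK')
      })
    })
    .catch(err => console.error('ServiceWorker registration failed:', err))
}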

Finally, the last else branch deals with loading all other resources. So what does the handleFetch function do?

async function handleFetch (fetchUrl: string, event): Promise<Response> {
  const filepath2BlobUrlObj = await localforage.getItem(localForageStorageKey);
  const currentFileBlob: string = filepath2BlobUrlObj[`${staticPrefix}/${fetchUrl}`]

  if (!currentFileBlob) {
    console.log(event.request)
    return fetch(event.request)
  }
  const res = await fetch(currentFileBlob);
  return res;
}

The logic:

  1. Get the path-to-blobUrl map of all resources downloaded from the seed out of the local cache
  2. Call fetch against the blob URL
  3. Respond to the browser directly with the fetched response

The effect

After looking at the code for so long, let’s see what it looks like:

Looking at the console on the right, when someone requests the seeded file, the seeder transfers the data the target connection needs.

Then look at the render side:

You can see that our page has been rendered completely, and I have printed some brief file data on the right console.

conclusion

Overall, this new way of playing is very interesting and worth exploring further. I'm even tempted to write a webpack plugin that, once a build finishes, automatically turns our page output into a seed and generates a seed link, so that our page can be shared and used by third parties directly, without any server maintenance cost.

There are a lot of interesting possibilities in this approach: not only rendering pages, but also multi-party file-sharing transfers and multiple users downloading from each other online in real time, and very fast. Let's embrace sharing 😂.