1. UI adjustments: clear, readable page navigation

Page navigation has been improved with clear jump indicators:

↗ read the full article
↘ next article
↖ back to the home page

UFQinews uses arrows pointing in different directions, which makes navigation as straightforward as finding your way in the physical world. In addition, on different operating system platforms such as iOS, Android and Windows, the arrows are rendered with each platform's own UI style, so they blend in better with the native look.


2. New sharing function for easy news sharing

To make it easier for users to share pages, especially to social networks, UFQinews has added a ShareThis button.

Users can easily use ShareThis at the bottom of the page to share the current page.

The ShareThis button provides direct links to dozens of social networking sites.

3. Adjusted caching strategy for faster page loads, plus link prefetch/preload

Our technical team has always aimed to load and render pages as quickly as possible within limited computing and bandwidth resources. The main recent efforts of UFQinews in this regard include:

1) Improved caching of resource files. Images, style sheets (CSS) and scripts are now downloaded from the server only on first use and then stored in the browser's local cache. While the cached copy is still valid, later page loads read these files from the local cache instead of downloading them from the server every time, which greatly shortens page load times. To enable this, the following HTTP headers are added to the HTTP response for resource files.

HTTP/1.1 200 OK

Date: Sat, 14 Sep 2019 01:36:17 GMT

Last-Modified: Mon, 30 Jul 2018 08:59:19 GMT

ETag: "e2a-57233ad3e67c0"

Cache-Control: max-age=604800

Expires: Sat, 21 Sep 2019 01:36:17 GMT
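As a rough illustration of how the Cache-Control and Expires values above relate to each other, here is a minimal Python sketch. The function name cache_headers and the ONE_WEEK constant are assumptions made for this example and are not part of the UFQinews code; the sketch only shows that an Expires date is the response Date plus max-age seconds.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_WEEK = 7 * 24 * 60 * 60  # 604800 seconds, the max-age value shown above


def cache_headers(now: datetime, max_age: int = ONE_WEEK) -> dict:
    """Build Date, Cache-Control and Expires headers for a static resource.

    `now` must be a timezone-aware UTC datetime.
    """
    return {
        "Date": format_datetime(now, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
        # Expires is simply the response date shifted by max-age seconds.
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }


if __name__ == "__main__":
    response_time = datetime(2019, 9, 14, 1, 36, 17, tzinfo=timezone.utc)
    for name, value in cache_headers(response_time).items():
        print(f"{name}: {value}")
```

When both headers are present, browsers that understand Cache-Control give max-age priority over Expires, so keeping the two consistent mainly helps older clients.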

2) Preloading with the link preload/prefetch instruction. In browsers that support it, UFQINEWS uses the link preload/prefetch instruction to automatically load the page or resource the user is likely to view next. For example, the list page automatically loads the next list page, and the news detail page automatically loads the next item, so that when the user moves on, the page opens almost instantly.
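The prefetch hint itself is a single link element in the page head. The following Python sketch shows how a page template might emit one for the next item; the helper name prefetch_link and the example page id 1235 are hypothetical and chosen only for illustration.

```python
from html import escape


def prefetch_link(url: str) -> str:
    """Return a <link rel="prefetch"> tag asking the browser to fetch `url` early."""
    # escape() turns & into &amp; so the URL is safe inside an HTML attribute.
    return f'<link rel="prefetch" href="{escape(url, quote=True)}">'


if __name__ == "__main__":
    # Hypothetical next news item; the id is made up for this example.
    print(prefetch_link("./?mod=rdr&pgid=1235"))
```

The sketch uses rel="prefetch" because it targets a likely future navigation. A rel="preload" hint is meant for resources needed by the current page and additionally requires an as attribute naming the resource type.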

4. Refactored crawler engine to save energy during crawling and collection

On the production side, UFQinews has a complex news crawler system, which is mainly responsible for fetching the day's most popular local news items from major news sites. The fetched page content is then analyzed with natural language processing and classified into the corresponding categories for later reading, retrieval and use.

This page capture and analysis system is a distributed computing system. In its earliest design, the content flow relied on NFS as the storage service for page files.

(UFQINEWS page crawling system)

The distributed system of UFQINEWS supports deploying this crawler system in a one-master, multiple-slave structure. It can start N nodes at the same time, and each node can start M processes. This gives it a powerful crawling capacity, processing N*M news pages in parallel.

Like all distributed systems, UFQINEWS needs a mechanism or queue to ensure data consistency. UFQINEWS relied on NFS network file sharing: content and page queues were written to NFS files and read back sequentially, which kept page reads consistent.

When M processes run in parallel on a single node, both NFS access and content reads and writes to the disk require file IO. At one point we noticed that the Windows Task Manager reported “Very high” under “Power Usage” and “Power Usage Trend” for the VMware virtual machine. Further analysis showed that the cause was the UFQINEWS production crawler system.

(Analysis of power consumption of UFQINEWS production system)

We consider this level of power consumption unacceptable: it is neither green nor environmentally friendly. The underlying cause is easy to understand: frequent file IO reads and writes inevitably mean high-frequency calls into the device driver and actual disk operations.

When we reviewed the design earlier, we already felt that relying on NFS was neither a long-term solution nor the best choice, but we only saw it as something to optimize eventually. Discovering that UFQINEWS consumed this much energy was the last straw. The file IO of UFQINEWS and its queue had to be optimized and improved.

In the original design, we wanted to build a set of network services for storing data while UFQINEWS was running, similar in design and operation to AED Server. In this refactoring of UFQINEWS, we found that the crux of the problem was having some service take over the file IO. That is, whether it is the page queue or news page content, whatever used to be written to files would instead be stored in memory or in some intermediate service, and could be deleted once used.

AED Server suits this scenario. Storing page content or page queues in AED Server would meet the requirements, and it is also clear that these requirements resemble those of a regular “caching” service.

Following this line of thought, we considered using Memcached or Redis to take over the actual file IO, which might solve the problem. We then used this approach to optimize and improve the crawler system of UFQINEWS.

Wherever the system needed to read or write files, we switched to the Memcached service. For page content, direct access by key is enough. For page queues, the contents are kept as an array that is updated dynamically: newly generated items are appended and processed items are discarded.

After a few weeks of intermittent modifications and adjustments, we gradually replaced all file IO for in-process content in the UFQinews crawler system with Memcached reads and writes, and the runtime power consumption of UFQinews gradually dropped from “Very high”.
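The array-based page queue described above might look roughly like the following Python sketch. It is not the UFQINEWS implementation: the class names InMemoryCache and PageQueue, the key names and the example URLs are all assumptions, and InMemoryCache is only a stand-in that mimics the get/set style of a Memcached client so the example runs without a server.

```python
import hashlib
import json


class InMemoryCache:
    """Stand-in for a Memcached client (assumption); mimics only get and set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def content_key(url: str) -> str:
    """Hash the URL: Memcached keys must be short and contain no whitespace."""
    return "content:" + hashlib.sha1(url.encode("utf-8")).hexdigest()


class PageQueue:
    """Page queue kept as one JSON array under a single cache key."""

    def __init__(self, cache, key="page_queue"):
        self.cache = cache
        self.key = key

    def _load(self):
        raw = self.cache.get(self.key)
        return json.loads(raw) if raw else []

    def push(self, url):
        """Append a newly generated item to the array."""
        items = self._load()
        items.append(url)
        self.cache.set(self.key, json.dumps(items))

    def pop(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        items = self._load()
        if not items:
            return None
        url = items.pop(0)
        self.cache.set(self.key, json.dumps(items))
        return url


if __name__ == "__main__":
    cache = InMemoryCache()
    queue = PageQueue(cache)
    queue.push("https://example.com/news/1")
    queue.push("https://example.com/news/2")
    # Page content: direct access by key is enough.
    cache.set(content_key("https://example.com/news/1"), "<html>...</html>")
    print(queue.pop())
```

Note that the read-modify-write in push and pop is not atomic. With several crawler processes sharing one queue, a real deployment would need a compare-and-set operation or a lock around these updates.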

Moderate → Very High → High → Moderate → Very Low

Although we have no quantitative data, the five-level power consumption rating in Windows presumably has some basis, so this was a real optimization with a visible effect.

As software programmers, we talk all the time about algorithm improvement and program optimization, but it’s rare to see such an immediate and dramatic result.

5. New static content links: optimized SEO, friendlier to search engine indexing

We further adjusted the page link addresses and added the following two kinds of short links to the main page:

1) The list page, list.123456.html, points to the original ./?Pnskwordid=123456

2) The content details page, page.1234.html, points to the original ./?mod=rdr&pgid=1234

With this improvement, the pseudo-static pages are easier for search engines to crawl and analyze.
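The mapping from pseudo-static names to the original dynamic addresses can be sketched in Python as below. This is only an illustration of the two rules listed above, not the actual UFQinews rewrite configuration; the names REWRITE_RULES and resolve_static_link are hypothetical, and the query parameter spelling follows the text as printed.

```python
import re

# Pseudo-static file name pattern -> original dynamic address.
REWRITE_RULES = [
    (re.compile(r"^list\.(\d+)\.html$"), r"./?Pnskwordid=\1"),
    (re.compile(r"^page\.(\d+)\.html$"), r"./?mod=rdr&pgid=\1"),
]


def resolve_static_link(path: str):
    """Map a pseudo-static file name to the dynamic URL it points to."""
    for pattern, target in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            # expand() substitutes the captured numeric id for \1.
            return match.expand(target)
    return None  # not a pseudo-static link


if __name__ == "__main__":
    print(resolve_static_link("list.123456.html"))
    print(resolve_static_link("page.1234.html"))
```

In production this kind of mapping would more likely live in the web server's rewrite rules than in application code; the sketch just makes the two rules concrete.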

