The author | Alex, zhao

Technology review | zhao

Nasir Ahmed

Acoustic shadow legend

# 003 #

Some time ago, LiveVideoStack published a highly read article titled “A Brief History of Video Compression: 1920 to 2020.” The article documents one milestone after another in the history of video compression, and one of the most notable and important inventions is DCT. Without DCT, a series of compression standards such as H.26X and JPEG would be out of the question.

What is DCT?

As modern people rely more and more on computers, they need to transfer more and more kinds of data, such as the photos and videos we often share with others. How to reduce the amount of data and increase storage space without losing the main information, so as to improve transmission efficiency and reduce transmission costs?

Enter data compression. Data compression is divided into lossless compression and lossy compression. Lossless compression means that the exponential data can be recovered 100% during decompression, while lossy compression (commonly used for sound, image, and video compression) will abandon part of the data during decompression, achieving a relatively high compression ratio and decreasing image quality. But obviously, lossy compression can greatly compress file data, save disk space, and improve transmission efficiency.

One of the core of lossy compression is DCT.

DCT is called Discrete Cosine Transform. DCT is a kind of Fourier transform, which is often used for lossy data compression of signals and images (including pictures and videos).

DCT divides the image into small chunks made up of different frequencies and quantifies them. In the quantization process, the high frequency components are discarded and the remaining low frequency components are saved for the subsequent image reconstruction.

A brief introduction to the whole image compression process:

DCT 8*8 image blocks

  • Decompose the image into 8*8 image blocks

  • Convert the RGB system representing pixels to a YUV system

  • Then, from left to right and from top to bottom, DCT transformation is performed on each image block, discarding the high frequency component and retaining the low frequency component

  • The remaining image blocks are quantized and compressed, and the image composed by the compressed data greatly reduces the storage space

  • DCT inverse transformation (IDCT) is performed on each image block during decompression, and then a complete image is reconstructed

Due to the discarding of certain frequency images, the final image resolution will be different.

Before and after DCT compression:

Photo By Sid Shanker

As you can see, the compressed image is a little blurry than the original, but the main features of the image are still recognizable.

In essence, DCT requires a set of N correlated (similar) data points, and after the transformation, returns N de-correlated (dissimilar) data points (coefficients), characterized by the energy being compressed into only M coefficients, where M<N.

DCT is often described in the technical literature as having the properties of de-correlation and energy concentration, which may be a little difficult to understand at first glance. Among them, DCT compresses the energy of the matrix into the first element, known as the DC coefficient. The remaining coefficients are called AC coefficients.

This means that the upper left corner of the output two-dimensional DCT is called the DC coefficient. It is the most important output of DCT and contains a lot of information about the original image. The remaining coefficients are called AC coefficients. If you use DCT to transform the image, the AC coefficient contains more details of the image. Meanwhile, if these DCT coefficients are applied to the reverse 2D-DCT, the original coefficients will be obtained. DCT does not compress the data itself and provides a good basis for subsequent operations such as quantization.

Who invented DCT?

The first person to propose DCT was Nasir Ahmed.

Nasir Ahmed

Nasir was born in 1940 in Bangalore, India, where he completed his undergraduate studies in electrical engineering. He then came to the United States to study. At the University of New Mexico, He earned his M.S. and Ph.D. degrees in electrical and computer engineering.

Nasir worked for Honeywell from 1966 to 1968 before beginning his teaching career at Kansas State University. In 1984, he became a professor of electrical and computer engineering at the University of New Mexico, where he remained on the faculty until his retirement in 2001. He is now a professor emeritus at the University of New Mexico.

During his tenure, Nasir was also a consultant at Sandia National Laboratories (1976-1990), a honeywell laboratory focused on technological innovation in collaboration with universities and companies.

DCT was the most important achievement of Nasir’s life.

In the mid-1970s, Nasir led a team of researchers at Kansas State University who developed DCT technology.

DCT is the most widely used data compression and conversion technology in the world and is the basis for most digital media standards (image, video, and audio).

How was DCT created?

In the 1960s and 1970s, the research on digital orthogonal transform and its application in image data compression emerged one after another. Many transformations claim to have better performance than others, but these comparisons are all based on qualitative comparisons that look at a set of “standard” images for data compression using transform coding techniques.

During the same period, significant progress was made in quantitative comparison. Variance criterion and Rate Distortion criterion have been developed and widely used to evaluate the performance of image data compression. In addition, KLT (Karhunen-Loeve Transform, K-L transform) becomes the optimal transform for comparison purposes.

It was against this technical background that Nasir began to tackle the DCT problem.

Nasir found that KLT was indeed an optimal transformation based on the mean square error criterion and first-order Markov model, but there was no effective algorithm to calculate it. Therefore, how to calculate the best approximate value of KLT effectively became his research focus. He then came up with a method worth studying — Chebyshev interpolation. In 1972, he turned the idea into a proposal and submitted it to the NATIONAL Science Foundation (NSF) for funding. In his proposal, Nasir proposed using Chebyshev polynomials to study “cosine transformations” — later known as DCT:

Much to his disappointment, the NSF did not fund the proposal, with one of its reviewers saying it was “too simple.”

Nasir did not give up, however, and sought out his doctoral student T. Natarajan and his friend K.R.Rao, who spent the summer of 1973 working on the problem. Eventually, their research yielded results that were too good for Nasir to believe. Nasir was later due to attend a mathematics conference in New Orleans with Harry Andrews, so Nasir decided to consult him there.

Harry Andrews suggested Nasir use distortion criteria to check the performance of this “cosine transform” and sent him a computer program to help him calculate the results.

In the end, the results again showed that the DCT transform performed better than all the others and was very close to KLT in performance. Harry Andrews then suggested that Nasir publish the results. Nasir followed his advice and sent the paper in the form of a letter to IEEE Computer Transactions (IEEE’s newsletter), which was published in January 1974.

As Nasir later recalled, no one could have imagined that DCT would become such a sensation in the future.

The importance of DCT was so mismatched by its discovery that Gilbert Strang wrote in a 1999 paper: “The discrete problem is so common and almost inevitable that it is surprising that the industry did not discover DCT until 1974 by Nasir Ahmed et al.” There is also some recent research evidence that while it is an indisputable fact that DCT was developed by Ahmed et al., John von Neumann also did some pioneering work on DCT around 1941, though von Neumann himself may not have been aware of its importance.

Introduction of DCT implementation

DCT has eight forms. What we usually call DCT actually refers to DCT-II, and its corresponding inverse transformation is DCT-III. The original definition of DCT-II and DCT-III was very simple:

Among them:

X: X is the DCT output.

X: x is the DCT input.

K: k is the index of the calculated output data, ranging from 0 to N−1

N: the number of N transformation elements.

S: s is the scaling function, except s(0)=0.5 and s(y)=1

The simplest implementation of the original N-point DCT-II transformation algorithm might look like this:

void dct_ii(int N, const double x[], double X[]) { for (int k = 0; k < N; ++k) { double sum = 0.; double s = (k == 0) ? SQRT (0.5) : (1); for (int n = 0; n < N; + + n) {sum + = s * x [n] * cos (PI * * k/n) (n + 0.5); } X[k] = sum * SQRT (2.0 / N); }}Copy the code

Subsequently, some fast algorithms of DCT have been developed successively, among which LLM DCT is probably the most noticeable one (LLM comes from three authors of corresponding algorithm: Loeffler, Ligtenberg, and Moschytz, His papers “Practical Fast 1-D DCT Algorithms with 11 Multiplications”) and AAN DCT(AAN is also named after three algorithm authors: Arai, Agui and Nakajima, whose corresponding paper is “A Fast DCT-SQ Scheme for images”).

The algorithm flow of LLM DCT is as follows:

It introduces a very classic butterfly shape:

It should be noted that in the algorithm paper of LLM DCT, there is an obvious error in the label description of the butterfly graph. O1 appears twice in its description, and obviously the first one should be O0.

The AAN DCT is computed as follows, using five multiplications (plus eight post-multiplications for scaling, which are not considered in the article as they can be moved to the later quantization matrix and amortized).

Take H.264 standard for example, it actually puts DCT transform and subsequent quantization together to reduce the complexity of DCT transform calculation, so sometimes when you look at THE DCT transform coefficient of H.264, you can hardly imagine that it is actually a DCT transform at first glance. Since the era of H.264, DCT transformation began to use integer transformation, to avoid the similar problem of incomplete matching of coding and decoding caused by different DCT and IDCT implementation accuracy in MPEG2 era.

The application of DCT

The de-correlation and energy compression properties of DCT make it extremely attractive in image and video compression. Karhunen-loeve transform (KLT) is usually called ideal transform, which has better de-duplication property, but is computationally difficult to solve. DCT, on the other hand, is easy to program, allowing it to quickly dominate the image and video compression space. Now common image, video compression, such as JPEG, H.26x, MPEG, etc., are using DCT.

In the following form is DCT applications in image and video compression (form from: en.wikipedia.org/wiki/Discre…

image

Image compression standard

Year published

Common application

JPEG

1992

The most widely used image compression standards and digital image formats

JPEG   XR

2009

Developed by Microsoft, it supports both lossless and lossy compression and is the preferred image format for XPS

WebP

2010

Developed by Google to support lossy compression of digital images

HEIF

2013

HEVC based image compression format, support animation, in terms of compression, more efficient than GIF

BPG

2014

Based on HEVC compression

JPEG   XL

2020

Royalty-free, supports lossless and lossy compression, raster graphics file formats

video

Video coding standard

Year published

Common application

H.261

1988

The first formal video compression standard, often used in past video conferencing and video telephony products

MJPEG

1992

QuickTime, video editing, non-linear editing, digital cameras

MPEG-1

1993

CD, Internet video

Mpeg-2 (h. 262)

1995

Broadcast, digital TELEVISION, HDTV, cable, satellite, high-speed Internet, DVD storage and processing of digital images

DV

1995

Video cameras, digital tapes

H.263 (MPEG-4 Part II)

1996

Video teleconferencing on PSTN, H.320, and ISDN

AVC /   H.264 / MPEG-4

2003

The most common HD video recording/compression/distribution format for Internet video, YouTube, Blu-ray disc, HD TV broadcasting, Web browsers, streaming TV, mobile devices, consumer devices, Netflix, video calling, Facetime, etc

Theora

2004

Web video, Web browser

VC-1

2006

Windows Media, Blu-ray DISC

Apple   ProRes

2007

Professional Video Production

WebM   Video

2010

Multimedia open source format developed by Google for use with HTML5

HEVC /   H.265

2013

Next generation coding standard following H.264/ MPEG-4 AVC, significantly improved compression capability

Daala

2013

Developed by the Xiph.Org Foundation, the goal is to surpass H.265 and VP9 in performance

H.266/VVC

2020

The newly released compression standard is mainly for 4K and 8K video services

Nasir status

In February, episode 8 of season 5 of This Is Us was interspersed with “The Story of Mr. And Mrs. Ahmed.” This story is based on the reality of what happened between Nasir and his wife Esther. The two were university of New Mexico alums who met and fell in love at a university reunion for international students. They married and have known each other ever since. In 2018, Nasir and Esther also published a limited edition book about their life story – Parallel Lives In Curved Space. Last year, the couple celebrated their 56th wedding anniversary.

 

Nasir and Esther are having a video conversation with the cast of Lives of Our Lives

Why weave such a story into the lives of Us episode?

It turned out that the director wanted to pay tribute to Nasir Ahmed, the inventor of DCT technology, through this story. Without Nasir, the Pearson family in the show would not have been able to keep in touch and console each other through video conversations during the COVID-19 pandemic.

The same is true of us in real life.

References:

Y. Arai, T. Agui, and M. Nakajima, “A Fast DCT-SQ Scheme for Images,” IEICE Transactions, Vol. E71, pp. 1095 — 1097, Nov. 1988. 1, 8, 9, 10, 11

2. C. Loeffler, A. Ligtenberg, and G. S. Moschytz, “A Practical fast 1-D DCT algorithms with 11 Multiplications,” in International Conference on Acoustics, Speech, And Signal Processing, Vol. 2, pp. 988 — 991, May 1989. 1, 4, 9, 10

3. N. Ahmed and K. R. Rao, Orthogonal Transforms for Digital Signal Processing. New York,NY: Springer-Verlag, 1975. 1, 10

4. Ahmed, Nasir; Natarajan, T.; Rao, K. R. (January 1974), “Discrete Cosine Transform” (PDF), IEEE Transactions on Computers, C-23 (1): 90-93, doi: 10.1109 / T – c. 1974.223784

5. Ahmed, Nasir (January 1991). “How I Came Up With the Discrete Cosine Transform”. Digital Signal Processing. 1 (1): 4, 5, doi: 10.1016/1051-2004 (91) 90086 – Z.


Scan the QR code to learn more about the conference