The author | Krishna Rao Vijayanagar

Translation | Alex

Technology review | zhao

DCT Easy Tech #001#

The discrete cosine transform (DCT) is one of the most basic tools of modern image and video compression. It converts image data into the frequency domain in order to reveal the information contained in the pixels. This article explains the DCT in an easy-to-understand way.

Start with a simple exercise

Before we dive into the “esoteric” mathematics of transforms in signal processing, let’s first get a sense of the “why”. Why do we need a transformation? What does a transformation do?

Disclaimer: please don’t mind the image aesthetics. After all, I am a video engineer, not a Photoshop expert 🙂

Take a look at the picture below. Imagine you are looking at three spheres through a window. If I asked you which sphere was the largest, you would immediately tell me, right?

The answer is fairly simple. The one on the left looks smallest, and the one on the right looks largest, right?

Are you sure?

Now, let’s take a bird’s-eye view of the three spheres from above (say, using a drone). Do you still think the sphere on the right is the largest? Not sure? Let’s take a look.

In this picture, you can see:

  • Three spheres can be seen from above.

  • The thick blue line is the window seen from above (and therefore looks 2D).

Viewed from above, the three spheres turn out to be the same size. They are simply at different distances from the window. Looking out the window:

  • The leftmost sphere is so far away from the window that it looks smallest.

  • The sphere on the right is near the window, so it appears to be the largest.

Now, do you want to change your answer?

The way you look at the data changes

Let’s take a minute to think about what we just did. Why did our answer change?

We took a piece of data and, by changing our physical position (looking at it from the front and then from above), formed two views, or perspectives, of the data.

Combining these two perspectives helps us understand the data better and forces us to reexamine our assumptions about the information presented.

OK, now let’s put that aside and look at another example, where once again you learn more about the data simply by changing your perspective.

Stars and constellations

When you look up at the sky and find a constellation, ask yourself: Are all the stars in the constellation in the same plane? Will they be very far apart from each other?

Here’s a fascinating video (go to LiveVideoStack) that shows how the stars in a constellation are “connected”. You’ll see that the stars are far from each other, but they seem to be on the same plane and take on a particular shape because we see them from earth at a great distance.

So, what is the transformation?

A transformation is a mathematical function that maps input data from one domain (perspective) to another in order to:

  • Reveal hidden features of the data

  • Understand the data better

  • Highlight or weaken certain data features

I have spent a lot of time talking about “transformations” because, for many people, they are a common obstacle to learning DCT or any other mathematical transform (Fourier, Z, Laplace, etc.).

With an understanding of “transformations” in signal processing, let’s learn about the famous DCT.

Explain DCT to a 5-year-old

After all the math and tech lingo, let’s try explaining DCT to a 5-year-old (though it’s hard).

Imagine you’re playing the guessing game “I Spy With My Little Eye” with a small child. Here’s how it works: one person silently picks an object in the room, and the rest of the group tries to guess what it is by asking up to, say, 20 questions.

Now, suppose I am thinking of a picture of a boat on the wall, and ask the child to ask me questions and guess what I am thinking.

To better explain DCT, please allow me to modify the rules and provide 20 clues 🙂

Which clue do you think is the best one? For example: the item hangs on the wall directly opposite the door, below the doorbell. Such a detailed clue will certainly help the child guess the answer, right?

The next best clue could be something like this: the object is a square or a box. We might also say that it shows the sea and a boat.

If you choose your clues carefully, you will find that you don’t need all 20 to help the child find the answer; five to eight are usually enough. Some clues lead the child to the answer faster than others, which tells us that those clues carry more information.

Let’s pause for a second and look at what we just did:

  • Take some data as input: the location of the painting

  • Convert the input data into 20 clues (the output)

  • Arrange the clues in order of importance (how much information each clue contains)

  • Find that only a few clues are needed to determine the painting’s location, while the rest merely add detail

DCT actually does something similar:

  • Get the data in one form

  • Convert it to another form so that the output data is sorted in descending order of importance

  • At this point, a close approximation of the original data can be reconstructed from only the first few output data points
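To make the clue analogy concrete, here is a minimal MATLAB/Octave sketch (not the author's code). It assumes the orthonormal DCT-II, builds the transform matrix by hand so that no toolbox is required, and uses a made-up row of 16 smoothly rising pixel values. The names x, C, coeffs, M and xApprox are illustrative.

```matlab
% Hypothetical input: 16 smoothly rising pixel values (illustrative data).
N = 16;
x = linspace(100, 180, N)';

% Orthonormal DCT-II matrix: row k+1 is a sampled cosine of frequency index k.
k = (0:N-1)';
n = 0:N-1;
C = sqrt(2/N) * cos(pi * k * (2*n + 1) / (2*N));
C(1,:) = C(1,:) / sqrt(2);

coeffs = C * x;                 % the "clues": N coefficients for N samples

% Keep only the first M coefficients and set the rest to zero.
M = 4;
kept = zeros(N, 1);
kept(1:M) = coeffs(1:M);

xApprox = C' * kept;            % inverse transform (C is orthogonal, so inv(C) = C')
relError = norm(x - xApprox) / norm(x);
fprintf('Kept %d of %d coefficients, relative error %.4f\n', M, N, relError);
```

For smooth data like this, a handful of coefficients already gives a small relative error; raising M towards N drives the error to zero.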

I hope you understand what DCT is.

This is the end of the children’s play time 🙂

The following sections require some high school math and programming knowledge (preferably MATLAB or Octave).

Introducing DCT

Discrete cosine transform or DCT is widely used for image and video compression. Wikipedia explains:

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.

Is everybody still there? You don’t need to worry about the math behind DCT just yet.

In simple terms, DCT takes a set of N correlated data points and returns N de-correlated data points (coefficients). In doing so, it packs most of the energy into just a few coefficients M, where M is much smaller than N (M << N).

In case you haven’t already figured it out, the DCT does two very important things when it converts input data into another domain:

1. Remove correlation (the redundancy between similar or related data points)

2. Compress energy/information into a few output data points

To summarize, DCT will:

  • Take N data points as input

  • Return N data points as output

  • Ensure that most of the input information is concentrated in a few of the N output data points

This is the energy compression (or information compression) property of DCT, usually called energy compaction in the literature.
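The summary above can be checked numerically. The following is a small sketch under stated assumptions: the eight pixel values are invented for illustration, the DCT is the orthonormal DCT-II built by hand, and "energy" means the sum of squared values.

```matlab
% Hypothetical input: 8 neighboring pixel values that are strongly correlated.
x = [100 102 104 107 110 112 115 118]';
N = numel(x);

% Orthonormal DCT-II matrix.
k = (0:N-1)';
n = 0:N-1;
C = sqrt(2/N) * cos(pi * k * (2*n + 1) / (2*N));
C(1,:) = C(1,:) / sqrt(2);

coeffs = C * x;                 % N inputs -> N outputs

% An orthonormal transform preserves the total energy (sum of squares).
energyIn  = sum(x.^2);
energyOut = sum(coeffs.^2);

sampleShare = x.^2 / energyIn;                 % energy share of each input sample
coeffShare  = cumsum(coeffs.^2) / energyOut;   % cumulative share of the first m coefficients

fprintf('Largest share held by a single input sample: %.3f\n', max(sampleShare));
fprintf('Share held by the first coefficient:         %.5f\n', coeffShare(1));
fprintf('Share held by the first two coefficients:    %.5f\n', coeffShare(2));
```

In the input, the energy is spread almost evenly over all eight samples; after the transform, nearly all of it sits in the first coefficient or two.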

Let’s use an example (using MATLAB) to help deepen your understanding of DCT.

As shown below, this is an 8×8 matrix filled with the number 255. If each number needs 8 bits, you need a total of 8 × 64 = 512 bits to store the entire matrix, right?

Now, let’s apply 8×8 2D-DCT to the entire matrix and get 8×8 DCT coefficients. 2D-DCT is the two-dimensional form of DCT for two-dimensional data, such as grayscale images.

The output of the 2D-DCT operation on the 8×8 matrix is as follows:

It looks completely different, right?

If you look closely, you will see that only the first element of the coefficient matrix (index [0, 0], or (1,1) in MATLAB’s 1-based indexing) is non-zero; all the other elements are 0. Storing one non-zero value instead of 64 greatly reduces the amount of storage required for the matrix.

This is due to the de-correlation and energy compression properties of DCT. It is often described in technical literature (and it can be a little hard to understand) as follows:

DCT compresses all the energy in the matrix into the first element, the DC coefficient, while the remaining coefficients are called AC coefficients.

This means:

  • The upper left corner of the output two-dimensional DCT is called the DC coefficient. It is the most important output of DCT and contains a lot of information about the original image.

  • The remaining coefficients are called AC coefficients. When you transform an image with DCT, the AC coefficients carry the finer details of the image.

Now, if you apply the inverse 2D-DCT to these DCT coefficients, you get the original pixel values back.
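For readers without a toolbox 2D-DCT function at hand, the round trip can be sketched in plain MATLAB/Octave. This is an illustration under an assumption, namely the orthonormal DCT-II, for which the 2D transform of a square block is C * block * C'. The matrix C and the names dcValue and acPeak are introduced here for illustration only.

```matlab
% 8x8 block in which every pixel is 255, as in the example above.
N = 8;
inputPixels = 255 * ones(N);

% Orthonormal DCT-II matrix.
k = (0:N-1)';
n = 0:N-1;
C = sqrt(2/N) * cos(pi * k * (2*n + 1) / (2*N));
C(1,:) = C(1,:) / sqrt(2);

% Forward 2D DCT: transform the columns, then the rows.
dctCoeffs = C * inputPixels * C';

dcValue = dctCoeffs(1,1);                 % analytically 255 * 8 = 2040
acPeak  = max(abs(dctCoeffs(2:end)));     % largest AC magnitude (zero up to rounding)

% Inverse 2D DCT recovers the pixel values.
reconstructedPixels = C' * dctCoeffs * C;

fprintf('DC coefficient: %.1f\n', dcValue);
fprintf('Largest AC magnitude: %.2e\n', acPeak);
fprintf('Largest reconstruction error: %.2e\n', max(abs(reconstructedPixels(:) - 255)));
```

With this normalization the DC coefficient equals the block mean multiplied by the block width, which is why a flat block of 255s gives 2040.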

If you want to try it, you can repeat the above experiment using the MATLAB command below.

    inputPixels = ones(8,8) * 255;            % 8x8 block, every pixel 255
    dctCoeffs = dct2(inputPixels);            % forward 2D DCT
    reconstructedPixels = idct2(dctCoeffs);   % inverse 2D DCT recovers the pixels

Application of DCT in image and video compression

The de-correlation and energy compression characteristics of DCT make it very well suited to image and video compression. The Karhunen-Loève transform (KLT) is often called the ideal transform because it de-correlates the data optimally, but its basis has to be derived from the statistics of the data, which makes it computationally expensive. DCT, on the other hand, uses a fixed basis and is much easier to implement, which is why it is so widely used in image and video compression.
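The gap between the two transforms can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original article. It models neighboring pixels as a first-order autoregressive process with correlation rho = 0.95 (a common textbook model; the value is chosen arbitrarily), takes the KLT variances to be the eigenvalues of the covariance matrix, and compares how much variance each transform packs into its M strongest coefficients.

```matlab
N = 8;
rho = 0.95;                                   % assumed correlation between neighbors

% Covariance matrix of the assumed model: R(r,c) = rho^|r-c|.
[colIdx, rowIdx] = meshgrid(1:N);
R = rho .^ abs(rowIdx - colIdx);

% Orthonormal DCT-II matrix.
k = (0:N-1)';
n = 0:N-1;
C = sqrt(2/N) * cos(pi * k * (2*n + 1) / (2*N));
C(1,:) = C(1,:) / sqrt(2);

% Variance of each transform coefficient, largest first.
dctVar = sort(diag(C * R * C'), 'descend');   % DCT: diagonal of the transformed covariance
kltVar = sort(eig(R), 'descend');             % KLT: eigenvalues of the covariance

M = 2;                                        % number of coefficients kept
dctShare = sum(dctVar(1:M)) / sum(dctVar);
kltShare = sum(kltVar(1:M)) / sum(kltVar);

fprintf('Variance in the %d strongest coefficients: DCT %.4f, KLT %.4f\n', M, dctShare, kltShare);
```

The KLT share can never be smaller than the DCT share, but under this model the two come out close, without the DCT needing any knowledge of the data.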

Here is a simple MATLAB script that demonstrates the power of 2D-DCT when applied to image compression (and video compression as well).

    % read an image (MATLAB provides a few sample images)
    RGB = imread('autumn.tif');
    % convert to grayscale
    I = rgb2gray(RGB);
    % compute the 2D DCT
    J = dct2(I);
    % threshold T: 50 in Experiment 1, 10 in Experiment 2
    T = 50;
    % discard certain coefficients (set to zero)
    J(abs(J) < T) = 0;
    % recover the pixels using the inverse 2D DCT
    K = idct2(J);
    % display the original and reconstructed image
    figure
    imshowpair(I,K,'montage')
    title('Original Grayscale Image (Left) and Reconstructed Image (Right)');

The code is simple:

  • Load an RGB image and convert it to grayscale

  • Calculate 2D-DCT and store it in J

  • Set all coefficients whose amplitude is less than the threshold T to 0

  • Calculate the inverse 2D-DCT to recover the pixels (reconstruct the image)

  • Compare the original image with the reconstructed image

Let’s do two experiments:

Experiment 1: Let’s set the threshold to 50, so that all AC coefficients with amplitude less than 50 become 0, and then reconstruct the image with the inverse DCT. Remember, in this example we don’t touch the DC coefficient (which is much larger than 50).

The image on the right below looks blurry and does not contain all the features of the original image. But the striking fact is that we set most of the coefficients to zero, retaining only 3.45% of the 71,070 coefficients in total. So you see, with just 3.45% of the coefficients, we were able to reconstruct an image that is passable: blurry but recognizable.

Experiment 2: Now, let’s change the threshold to 10 and set all coefficients with amplitude less than 10 to 0. This time, the number of non-zero coefficients, that is, the ones we kept, accounted for 23.45% of the total. This is the reconstructed image (2D-DCT inverse) — doesn’t it look amazing?

Why do these two things happen?

  • In both experiments we keep the DC coefficient rather than throwing it away.

  • In the first experiment, we dropped a lot of AC coefficients, affecting the finer details of the image, so it looked blurry.

  • In the second experiment, we retained more of the AC coefficients while preserving the finer details of the image.

Now, if you lower the threshold further, you can get a higher quality image.
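The effect of the threshold can be explored without any sample image or toolbox. The sketch below is a stand-in for the experiments above, not a reproduction of them. It assumes a synthetic 32×32 grayscale image (a smooth pattern plus one sharp edge), the orthonormal DCT-II built by hand, and RMS error as a rough quality measure. All names and the choice of thresholds are illustrative.

```matlab
% Hypothetical test image: smooth pattern plus a sharp vertical edge.
N = 32;
[col, row] = meshgrid(0:N-1);
I = 128 + 60 * sin(2*pi*col/N) .* cos(pi*row/N);
I(:, 20:end) = I(:, 20:end) + 30;

% Orthonormal DCT-II matrix.
k = (0:N-1)';
n = 0:N-1;
C = sqrt(2/N) * cos(pi * k * (2*n + 1) / (2*N));
C(1,:) = C(1,:) / sqrt(2);

J = C * I * C';                               % forward 2D DCT

thresholds  = [50 10 1];
keptPercent = zeros(size(thresholds));
rmsError    = zeros(size(thresholds));
for t = 1:numel(thresholds)
    Jt = J;
    Jt(abs(Jt) < thresholds(t)) = 0;          % discard small coefficients
    K = C' * Jt * C;                          % inverse 2D DCT
    keptPercent(t) = 100 * nnz(Jt) / numel(Jt);
    rmsError(t) = sqrt(mean((I(:) - K(:)).^2));
    fprintf('T = %2d: kept %6.2f%% of coefficients, RMS error %.3f\n', ...
            thresholds(t), keptPercent(t), rmsError(t));
end
```

Because the transform is orthonormal, the energy of the error equals the energy of the discarded coefficients, so the RMS error always stays below the threshold T and can only shrink as T is lowered.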

Finally, these experiments show that DCT can significantly compress data and still recover it (in a lossy manner), even when more than three quarters of the coefficients are discarded.

Key takeaways from the experiments

These two experiments demonstrate the power of DCT and its two properties of de-correlation and energy compression. They show that even if you discard most of the DCT coefficients, you can still reconstruct an image of some quality.

Conclusion

I hope you have an intuitive understanding of DCT and the role its features play in image and video compression. I didn’t delve into the deep mathematical details of it in this article, but if you want to dig deeper, you can consult various literature sources on the Internet.

Special note: This article is from OTTVerse and has been translated and published with permission from Krishna Rao Vijayanagar. Thank you.

Original link:

Ottverse.com/discrete-co…

