Discrete Cosine Transform (DCT)

What is DCT?

The Discrete Cosine Transform (DCT) is a mathematical technique used to transform a signal from the spatial domain to the frequency domain. It’s closely related to the Discrete Fourier Transform (DFT) but uses only real numbers and has better energy compaction properties, making it particularly well-suited for image and audio compression.

Key Idea: DCT expresses a finite sequence of data points as a sum of cosine functions oscillating at different frequencies.

DCT in Image Compression

DCT is a fundamental part of many image compression standards, including JPEG. Here’s how it’s used:

1. Blocking

The original image is divided into small blocks, typically 8x8 pixels.

2. Transformation

DCT is applied to each block independently. This converts the spatial information (pixel intensities) within the block into frequency information (coefficients representing the contribution of different cosine frequencies).

3. Quantization

The DCT coefficients are quantized. This involves dividing each coefficient by a corresponding value from a quantization table and rounding to the nearest integer. This step introduces lossiness but significantly reduces the amount of data needed to represent the image. Higher frequency coefficients (which are less perceptually important) are often quantized more heavily, resulting in many zeros.

4. Entropy Coding

The quantized coefficients are then further compressed using entropy coding techniques like Huffman coding or arithmetic coding. These methods exploit the statistical redundancy in the data (e.g., the prevalence of zeros) to achieve further compression.

5. Reconstruction (Inverse DCT)

To reconstruct the image, the process is reversed:

Entropy Decoding: The compressed data is decoded to recover the quantized coefficients.
Dequantization: The quantized coefficients are multiplied by the corresponding values from the quantization table.
Inverse DCT (IDCT): The IDCT is applied to each block to transform the frequency coefficients back into the spatial domain (pixel intensities).

DCT Equations

1-D DCT (Type-II, most common)

X_{k} = α (k) n = 0 \sum N - 1 x_{n} cos [\frac{π}{N} (n + \frac{1}{2}) k], k = 0, 1, \dots, N - 1

where:

$x_{n}$ is the input signal (e.g., pixel values in a row or column of a block).
$X_{k}$ is the $k^{t h}$ DCT coefficient.
$N$ is the length of the signal.
$α (k) = ⎩ ⎨ ⎧ \frac{1}{N} \frac{2}{N} if k = 0 if k \neq = 0$

1-D Inverse DCT (IDCT)

x_{n} = k = 0 \sum N - 1 α (k) X_{k} cos [\frac{π}{N} (n + \frac{1}{2}) k], n = 0, 1, \dots, N - 1

2-D DCT (for images)

X_{k_{1}, k_{2}} = α (k_{1}) α (k_{2}) n_{1} = 0 \sum N_{1} - 1 n_{2} = 0 \sum N_{2} - 1 x_{n_{1}, n_{2}} cos [\frac{π}{N _{1}} (n_{1} + \frac{1}{2}) k_{1}] cos [\frac{π}{N _{2}} (n_{2} + \frac{1}{2}) k_{2}]

where:

$x_{n_{1}, n_{2}}$ is the pixel value at position $(n_{1}, n_{2})$ in the block.
$X_{k_{1}, k_{2}}$ is the DCT coefficient at frequency indices $(k_{1}, k_{2})$ .
$N_{1}$ and $N_{2}$ are the dimensions of the block (e.g., 8x8).

2-D Inverse DCT (IDCT)

x_{n_{1}, n_{2}} = k_{1} = 0 \sum N_{1} - 1 k_{2} = 0 \sum N_{2} - 1 α (k_{1}) α (k_{2}) X_{k_{1}, k_{2}} cos [\frac{π}{N _{1}} (n_{1} + \frac{1}{2}) k_{1}] cos [\frac{π}{N _{2}} (n_{2} + \frac{1}{2}) k_{2}]

Basis Images

DCT basis images are the visual representations of the cosine functions used in the DCT. Each basis image corresponds to a specific frequency pair $(k_{1}, k_{2})$ .

Low-frequency basis images: Represent gradual changes in intensity across the block.
High-frequency basis images: Represent rapid changes in intensity (fine details and edges).

When DCT is applied to an image block, it essentially decomposes the block into a weighted sum of these basis images.

Benefits of DCT over FFT

While both DCT and FFT are used for frequency domain transformations, DCT has several advantages in the context of image compression:

Energy Compaction: DCT tends to concentrate the energy (information) of a typical image into a few low-frequency coefficients. This property is crucial for effective compression because it allows us to discard many high-frequency coefficients with minimal perceptual loss. FFT does not have the same level of energy compaction for real-world images.
Real-Valued Output: DCT uses only cosine functions and produces real-valued coefficients. FFT involves complex exponentials and results in complex-valued coefficients, which requires more storage.
Reduced Blocking Artifacts: Because of its energy compaction and the way it handles boundaries, DCT tends to produce fewer blocking artifacts (visible block boundaries in the reconstructed image) compared to a naive application of FFT in image compression.
Computational Efficiency Although there exist fast algorithms for both DCT and FFT, DCT might be slightly faster in certain implementations, especially for the common 8x8 block size.

2ndSem

Explorer