Wavelets
- Fourier transform has its basis functions in sinusoids.
- Wavelets based on small waves of varying frequency and limited duration.
- Account for frequency and location of the frequency.
- In addition to frequency, wavelets capture temporal information.
- Bound in both frequency and time domains.
- Localized wave and decays to zero instead of oscillating forever.
- Form the basis of an approach to signal processing and analysis known as multiresolution theory.
- Concerned with the representation and analysis of images at different resolutions.
- Features that may not be prominent at one level can be easily detected at another level.
- Comparison with Fourier transform:
- Fourier transform used to analyze signals by converting signals into a continuous series of sine and cosine functions, each with a constant frequency and amplitude, and of infinite duration.
- Real world signals (images) have a finite duration and exhibit abrupt changes in frequency.
- Wavelets are based on a mother wavelet, denoted by .
-
Wavelet transform converts a signal into a series of wavelets.
-
Wavelet transform basis functions are obtained by scaling and shifting the mother wavelet:
where is the translation to determine the location of wavelet and is scaling to govern its frequency.
-
- Fourier transform used to analyze signals by converting signals into a continuous series of sine and cosine functions, each with a constant frequency and amplitude, and of infinite duration.
Background
- Objects in images are connected regions of similar texture and intensity levels.
- Use high resolution to look at small objects; coarse resolution to look at large objects.
- If you have both large and small objects, use different resolutions to look at them.
- Images are 2D arrays of intensity values with locally varying statistics
Image pyramids
-
Structure to represent images at more than one resolution.
-
Collection of decreasing resolution images arranged in the shape of a pyramid.
-
Figure 7.2a
-
Highest resolution image at the pyramid base.
-
As you move up the pyramid, both size and resolution decrease.
-
Base level of size .
-
General level of size , .
-
Pyramid may get truncated at level , .
-
Number of pixels in a pyramid with levels () is:
-
-
Figure 7.2b
- Building image pyramids.
- Level approximation output provides the images needed to build an approximation pyramid.
- Level prediction residual output is used to build a complementary prediction residual pyramid.
- Contain only one reduced-resolution approximation of the input image at the top level.
- All other levels contain prediction residuals where level prediction residual is the difference between level approximation and an estimate of the level approximation based on the level approximation.
- Both approximation and prediction residual pyramids are computed in an iterative fashion.
- Start by placing the original image in level of the approximation pyramid.
- Three step procedure:
- Compute a reduced-resolution approximation of level input image; done by filtering and downsampling the filtered result by a factor of 2; place the resulting approximation at level of approximation pyramid.
- Create an estimate of level input image from the reduced resolution approximation generated in step 1; done by upsampling and filtering the generated approximation; resulting prediction image will have the same dimensions as the level input image.
- Compute the difference between the prediction image of step 2 and input to step 1; place the result in level of prediction residual pyramid.
- After iterations, the level approximation output is placed in the prediction residual pyramid at level .
- Variety of approximation and interpolation filters
- Neighborhood averaging producing mean pyramids.
- Lowpass Gaussian filtering producing Gaussian pyramids.
- No filtering producing subsampling pyramids
- Interpolation filter can be based on nearest neighbor, bilinear, and bicubic.
- Building image pyramids.
-
Upsampling:
-
Doubles the spatial dimensions of approximation images.
-
Given an integer and 1D sequence of samples , upsampled sequence is given by
-
Insert a 0 after every sample in the sequence.
-
-
Downsampling:
-
Halves the spatial dimensions of the prediction images.
-
Given by
-
Discard every other sample.
-
Haar Wavelet
-
Oldest and simplest orthonormal wavelets.
-
Expressed in matrix form as
- is an image matrix, .
- is an Haar transformation, and contains the basis function for the wavelet.
- Basis function defined over continuous closed interval for where .
- is resulting transform.
- Transform is required because is not symmetric.
- generated by defining the integer where or for , and for .
-
Haar basis functions are
-
-
Haar scaling function is denoted by and Haar wavelet function is denoted by .
-
Haar scaling function (averaging or lowpass filter) at level 0 (in the original signal) is given by
-
Translation by is denoted by
-
Figure below shows both and
(Diagram showing phi and phi_j. phi is a box from 0 to 1. phi_j is a box from j to j+1.)
-
Coefficients of the signal indexed by are given by
-
An approximate reconstruction of from is given by
Discrete Wavelet Transform
-
CWT is redundant as the transform is calculated by continuously shifting a continuously scalable function over a signal and calculating the corelation between the two
-
The discrete form is normally a [piecewise] continuous function obtained by sampling the time-scale space at discrete intervals.
-
The process of transforming a continous signal into a series of wavelet coefficients is known as wavelet series decomposition.
-
Scaling function can be expressed in wavelets from to
-
Adding a wavelet spectrum to the scaling function yields a new scaling function, with a spectrum twice as wide as the first.
- Addition allows us to express the first scaling function in terms of the second
- The formal expression of this phenomenon leads to multiresolution formulation or two-scale relation as
- This equation states that the scaling function (average) at a given scale can be expressed in terms of translated scaling functions at the next smaller scale, where the smaller scale implies more detail.
-
Similarly, the wavelets (detail) can also be expressed in terms of translated scaling functions at the next smaller scale as
-
The functions and are known as scaling filter and wavelet filter, respectively.
- These filters allow us to implement the discrete wavelet transform (DWT) as an iterated digital filter bank.
Subsampling property
- Gives a step size of 2 in the variable for scaling and wavelet filters.
- Every iteration of filter banks reduces the number of samples by half so that in the
- In the last case, we are left with only one sample.
Implementation of Haar Wavelets
-
Any wavelet implemented by the iteration of filters with rescaling.
-
Set of filters form the filter bank.
-
Let be an integer.
-
Averaging and detail filters implemented using two filtering matrices and given by
\frac{1}{2} & \frac{1}{2} & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \frac{1}{2} & \frac{1}{2} \end{bmatrix} \frac{1}{2} & -\frac{1}{2} & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \frac{1}{2} & -\frac{1}{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \frac{1}{2} & -\frac{1}{2} \end{bmatrix}$$ -
Let and denote the transpose of and , respectively.
-
Let denote a identity matrix.
-
Then, the following facts about and are true:
-
For simplicity, consider the original signal to be sampled as a vector of length .
-
The filtering process includes downsampling () and decomposes into two vectors (for block average) and (for detail) given by
-
and can be combined to reconstruct the original signal
- A lossy compression can be achieved by discarding the detail vector , or setting it to be zero.
-
-
Haar filter is applied to an image by the application of and filters in a tensorial way.
-
Let be a picture image represented as an matrix of pixels.
-
Applying the filter to , we get a new image as
-
is an matrix such that
-
-
Application of H and G filters results into four matrices given by.
- is called the fully averaged picture.
- and are called partially averaged and partially differenced pictures.
- is called the fully differenced picture.
-
The four components can be used to reconstruct the original image as.
- Above equation is known as a synthesis filter bank.
- The matrix is orthogonal as its inverse is the transpose, or
-
Matrices in synthesis bank are also known as orthogonal filter bank.
-
Note that
- The synthesis bank is the inverse of the analysis bank.
- Analysis bank contains the steps for filtering and downsampling.
- Synthesis bank reverses the order and performs upsampling and filtering.
-
Analysis of a picture (for two levels) is shown below
(Diagram of a wavelet decomposition of an image, showing two levels. The first level splits the original image into four quadrants: top-left is the averaged image, top-right and bottom-left are partially averaged/differenced, and bottom-right is fully differenced. The top-left quadrant is then further decomposed in the same way.)
- Figure below shows an original texture, and its compression and reconstruction
(Diagram of wavelet compression/reconstruction. A satellite image is decomposed into four quadrants. The averaged quadrant is then further decomposed. The reconstruction uses only the final averaged image, with the detail images set to zero.) - Top left shows the original pixel texture. - Application of Haar wavelet results into four pixel components which are combined into a pixel image shown on top right. - Top left quarter of this image shows the fully averaged part. - Top right quarter contains the partially averaged part. - Bottom left quarter contains the partially differenced part. - Bottom right quarter contains the fully differenced component. - Haar wavelet is applied to the fully averaged part again and the assembled components are shown in the bottom left picture. - This picture is then used for reconstruction of the texture and the reconstructed texture is shown in the bottom right picture.
- Lossy compression is achieved by discarding the differenced pictures (setting the matrices to zero) and retaining only during the reconstruction phase.
- The process can be carried through several processing steps, thus removing a large amount of detail information.
Other wavelets
- Haar wavelet transform, as described above, may not be able to take good advantage of the continuity of pixel values within images.
- Other wavelets may perform better at this, and achieve higher compression of textures, specially if the textures are smooth images.
JPEG 2000 Standard1
-
Based on wavelets to achieve compression.
-
Scalable in nature.
- Can be decoded in a number of ways.
- By truncating the codestream at any point, we can get the image representation at a lower resolution.
- Encoders and decoders are computationally demanding.
- Standard JPEG produces ringing artifacts at lower resolutions, specially near image edges.
- It also produces blocking artifacts due to its blocks.
- Standard JPEG produces ringing artifacts at lower resolutions, specially near image edges.
-
Comparison with standard JPEG
- Much better scalability and editability.
- In standard JPEG, you have to reduce the resolution of the input image before encoding if you want to go below a certain bit limit.
- Comes for free in JPEG 2000 because it does so automatically through multiresolution decomposition.
- Superior Compression
- Nearly imperceptible artifacts at higher bit rates.
- At lower bit rates ( bits/pixel for grayscale images), JPEG 2000 has less visible artifacts than standard JPEG and almost no blocking.
- Compression gains are due to DWT and more sophisticated entropy coding.
- Multiresolution representation
- Use of DWT allows for decomposition of image at different resolutions.
- Allows for other purposes (such as presentation) in addition to compression.
- Progressive transmission by pixel and resolution accuracy.
- Efficient code stream organization
- Progressive by pixel accuracy (SNR scalability) and image resolution.
- Quality can be gradually improved by downloading more data bits.
- Designed with web applications in mind.
- Choice of lossless or lossy compression in a single compression architecture.
- Random code-stream access and processing.
- Access to different regions of interest at varying degrees of granularity.
- Possible to store different parts of same picture using different quality.
- Error resilience
- Robust to bit errors from noisy communication channels.
- Flexible file format, specially for color-space information and metadata.
- High dynamic range support
- Support any bit depth, including 16-bit and 32-bit floating point images.
- Side channel spatial information for transparency and alpha planes.
- Much better scalability and editability.
-
Color components transformation
- Images are transformed from RGB to another color space to handle the components separately.
- Two Possible choices.
- Irreversible color transform
- Uses YCBCR color space.
- Irreversible because it has to implemented using floating point or fixpoint and causes round-off errors.
- Reversible Color Transform
-
Uses a modified YUV color space that does not introduce quantization errors.
-
Fully reversible.
-
Transformation given by Forward:
Reverse:
- Chrominance components can be down-scaled in resolution.
- Downsampling effectively handled by separating images into scales and dropping the finest wavelet scale.
-
- Irreversible color transform
- Divides an image into two-dimensional array of samples, known as components.
- As an example, a color image may consist of several components representing base colors red, green, and blue.
-
Tiling
- Image and its components are decomposed into rectangular tiles, which form the basic unit of original or reconstructed image.
- All the components (for example different color components) that form a tile are kept together so that each tile can be independently extracted/decoded/reconstructed.
- Tiles can be any size, but all the tiles in the image are the same size.
- Possible to have different sized tiles on right and bottom border.
- Decoder needs less memory to decode the image.
- You can also opt for partial decoding by decoding only a subset of tiles.
- Quality of the image may decrease due to lower peak SNR.
- Using many tiles may lead to blocking artifacts.
-
Wavelet transform
- Tiles are analyzed using wavelets to create multiple decomposition levels.
- Yields a number of coefficients to describe the horizontal and vertical spatial frequency characteristics of the original tiles, within a local area.
- Different decomposition levels are related by powers of 2.
- Wavelet transformation to arbitrary depth.
- Two different wavelet transforms used.
- Irreversible: CDF 9/7 wavelet transform
- CDF - Cohen Daubechies Feauveau
- Introduces quantization noise that depends on the precision of the decoder.
- Reversible: a rounded version of the biorthogonal CDF 5/3 wavelet
- Uses only integer coefficients; no rounding and hence, no quantization noise.
- Used in lossless coding
- Irreversible: CDF 9/7 wavelet transform
- Wavelet transform implemented by the lifting scheme or convolution.
- Tiles are analyzed using wavelets to create multiple decomposition levels.
-
Quantization
- Transformed coefficients are scalar-quantized to reduce the number of bits used in representation
- Information content of a large number of small-magnitude coefficients reduced by quantization, giving code-blocks.
- Leads to a loss of quality.
- Code blocks are sets of integers that are encoded bit-by-bit.
- Greater quantization step leads to greater compression and loss in quality.
- Quantization step of 1 implies no quantization; used in lossless compression.
-
Coding
- At this point, we have a collection of sub-bands representing several approximation scales.
- Each sub-band a set of coefficients
- Real numbers representing aspects of image associated with certain frequency range as well as a spatial area of the image.
- Precincts
- Quantized sub-bands split into precincts, regular regions in the wavelet domain.
- Selected so that coefficients in a precinct across sub-band form approximate spatial block in the reconstructed image.
- Code blocks
- Precincts split into code blocks.
- Code blocks located in a single sub-band
- All code blocks have the same size, except at the end of the image.
- Additional compression is achieved by entropy coding of bit-planes of the coefficients in code-blocks to reduce the number of bits required to represent quantized coefficients.
- At this point, we have a collection of sub-bands representing several approximation scales.
1From Wikipedia