Wavelets

  • Fourier transform has its basis functions in sinusoids.
  • Wavelets based on small waves of varying frequency and limited duration.
    • Account for frequency and location of the frequency.
  • In addition to frequency, wavelets capture temporal information.
    • Bound in both frequency and time domains.
    • Localized wave and decays to zero instead of oscillating forever.
  • Form the basis of an approach to signal processing and analysis known as multiresolution theory.
    • Concerned with the representation and analysis of images at different resolutions.
    • Features that may not be prominent at one level can be easily detected at another level.
  • Comparison with Fourier transform:
    • Fourier transform used to analyze signals by converting signals into a continuous series of sine and cosine functions, each with a constant frequency and amplitude, and of infinite duration.
      • Real world signals (images) have a finite duration and exhibit abrupt changes in frequency.
      • Wavelets are based on a mother wavelet, denoted by .
        • Wavelet transform converts a signal into a series of wavelets.

        • Wavelet transform basis functions are obtained by scaling and shifting the mother wavelet:

          where is the translation to determine the location of wavelet and is scaling to govern its frequency.

Background

  • Objects in images are connected regions of similar texture and intensity levels.
  • Use high resolution to look at small objects; coarse resolution to look at large objects.
  • If you have both large and small objects, use different resolutions to look at them.
  • Images are 2D arrays of intensity values with locally varying statistics

Image pyramids

  • Structure to represent images at more than one resolution.

  • Collection of decreasing resolution images arranged in the shape of a pyramid.

  • Figure 7.2a

    • Highest resolution image at the pyramid base.

    • As you move up the pyramid, both size and resolution decrease.

    • Base level of size .

    • General level of size , .

    • Pyramid may get truncated at level , .

    • Number of pixels in a pyramid with levels () is:

  • Figure 7.2b

    • Building image pyramids.
      • Level approximation output provides the images needed to build an approximation pyramid.
      • Level prediction residual output is used to build a complementary prediction residual pyramid.
        • Contain only one reduced-resolution approximation of the input image at the top level.
        • All other levels contain prediction residuals where level prediction residual is the difference between level approximation and an estimate of the level approximation based on the level approximation.
    • Both approximation and prediction residual pyramids are computed in an iterative fashion.
    • Start by placing the original image in level of the approximation pyramid.
    • Three step procedure:
      1. Compute a reduced-resolution approximation of level input image; done by filtering and downsampling the filtered result by a factor of 2; place the resulting approximation at level of approximation pyramid.
      2. Create an estimate of level input image from the reduced resolution approximation generated in step 1; done by upsampling and filtering the generated approximation; resulting prediction image will have the same dimensions as the level input image.
      3. Compute the difference between the prediction image of step 2 and input to step 1; place the result in level of prediction residual pyramid.
    • After iterations, the level approximation output is placed in the prediction residual pyramid at level .
    • Variety of approximation and interpolation filters
      • Neighborhood averaging producing mean pyramids.
      • Lowpass Gaussian filtering producing Gaussian pyramids.
      • No filtering producing subsampling pyramids
      • Interpolation filter can be based on nearest neighbor, bilinear, and bicubic.
  • Upsampling:

    • Doubles the spatial dimensions of approximation images.

    • Given an integer and 1D sequence of samples , upsampled sequence is given by

    • Insert a 0 after every sample in the sequence.

  • Downsampling:

    • Halves the spatial dimensions of the prediction images.

    • Given by

    • Discard every other sample.

Haar Wavelet

  • Oldest and simplest orthonormal wavelets.

  • Expressed in matrix form as

    • is an image matrix, .
    • is an Haar transformation, and contains the basis function for the wavelet.
      • Basis function defined over continuous closed interval for where .
    • is resulting transform.
    • Transform is required because is not symmetric.
    • generated by defining the integer where or for , and for .
      • Haar basis functions are

  • Haar scaling function is denoted by and Haar wavelet function is denoted by .

  • Haar scaling function (averaging or lowpass filter) at level 0 (in the original signal) is given by

  • Translation by is denoted by

  • Figure below shows both and

    (Diagram showing phi and phi_j. phi is a box from 0 to 1. phi_j is a box from j to j+1.)

  • Coefficients of the signal indexed by are given by

  • An approximate reconstruction of from is given by

Discrete Wavelet Transform

  • CWT is redundant as the transform is calculated by continuously shifting a continuously scalable function over a signal and calculating the corelation between the two

  • The discrete form is normally a [piecewise] continuous function obtained by sampling the time-scale space at discrete intervals.

  • The process of transforming a continous signal into a series of wavelet coefficients is known as wavelet series decomposition.

  • Scaling function can be expressed in wavelets from to

  • Adding a wavelet spectrum to the scaling function yields a new scaling function, with a spectrum twice as wide as the first.

    • Addition allows us to express the first scaling function in terms of the second
    • The formal expression of this phenomenon leads to multiresolution formulation or two-scale relation as

    • This equation states that the scaling function (average) at a given scale can be expressed in terms of translated scaling functions at the next smaller scale, where the smaller scale implies more detail.
  • Similarly, the wavelets (detail) can also be expressed in terms of translated scaling functions at the next smaller scale as

  • The functions and are known as scaling filter and wavelet filter, respectively.

    • These filters allow us to implement the discrete wavelet transform (DWT) as an iterated digital filter bank.

Subsampling property

  • Gives a step size of 2 in the variable for scaling and wavelet filters.
  • Every iteration of filter banks reduces the number of samples by half so that in the
    • In the last case, we are left with only one sample.

Implementation of Haar Wavelets

  • Any wavelet implemented by the iteration of filters with rescaling.

    • Set of filters form the filter bank.

    • Let be an integer.

    • Averaging and detail filters implemented using two filtering matrices and given by

      \frac{1}{2} & \frac{1}{2} & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \frac{1}{2} & \frac{1}{2} \end{bmatrix} \frac{1}{2} & -\frac{1}{2} & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \frac{1}{2} & -\frac{1}{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \frac{1}{2} & -\frac{1}{2} \end{bmatrix}$$
    • Let and denote the transpose of and , respectively.

    • Let denote a identity matrix.

    • Then, the following facts about and are true:

    • For simplicity, consider the original signal to be sampled as a vector of length .

    • The filtering process includes downsampling () and decomposes into two vectors (for block average) and (for detail) given by

    • and can be combined to reconstruct the original signal

      • A lossy compression can be achieved by discarding the detail vector , or setting it to be zero.
  • Haar filter is applied to an image by the application of and filters in a tensorial way.

    • Let be a picture image represented as an matrix of pixels.

    • Applying the filter to , we get a new image as

    • is an matrix such that

  • Application of H and G filters results into four matrices given by.

    • is called the fully averaged picture.
    • and are called partially averaged and partially differenced pictures.
    • is called the fully differenced picture.
  • The four components can be used to reconstruct the original image as.

    • Above equation is known as a synthesis filter bank.
    • The matrix is orthogonal as its inverse is the transpose, or

    • Matrices in synthesis bank are also known as orthogonal filter bank.

    • Note that

      • The synthesis bank is the inverse of the analysis bank.
      • Analysis bank contains the steps for filtering and downsampling.
      • Synthesis bank reverses the order and performs upsampling and filtering.
  • Analysis of a picture (for two levels) is shown below

(Diagram of a wavelet decomposition of an image, showing two levels. The first level splits the original image into four quadrants: top-left is the averaged image, top-right and bottom-left are partially averaged/differenced, and bottom-right is fully differenced. The top-left quadrant is then further decomposed in the same way.)

  • Figure below shows an original texture, and its compression and reconstruction

(Diagram of wavelet compression/reconstruction. A satellite image is decomposed into four quadrants. The averaged quadrant is then further decomposed. The reconstruction uses only the final averaged image, with the detail images set to zero.) - Top left shows the original pixel texture. - Application of Haar wavelet results into four pixel components which are combined into a pixel image shown on top right. - Top left quarter of this image shows the fully averaged part. - Top right quarter contains the partially averaged part. - Bottom left quarter contains the partially differenced part. - Bottom right quarter contains the fully differenced component. - Haar wavelet is applied to the fully averaged part again and the assembled components are shown in the bottom left picture. - This picture is then used for reconstruction of the texture and the reconstructed texture is shown in the bottom right picture.

  • Lossy compression is achieved by discarding the differenced pictures (setting the matrices to zero) and retaining only during the reconstruction phase.
    • The process can be carried through several processing steps, thus removing a large amount of detail information.

Other wavelets

  • Haar wavelet transform, as described above, may not be able to take good advantage of the continuity of pixel values within images.
  • Other wavelets may perform better at this, and achieve higher compression of textures, specially if the textures are smooth images.

JPEG 2000 Standard1

  • Based on wavelets to achieve compression.

  • Scalable in nature.

    • Can be decoded in a number of ways.
    • By truncating the codestream at any point, we can get the image representation at a lower resolution.
    • Encoders and decoders are computationally demanding.
      • Standard JPEG produces ringing artifacts at lower resolutions, specially near image edges.
        • It also produces blocking artifacts due to its blocks.
  • Comparison with standard JPEG

    • Much better scalability and editability.
      • In standard JPEG, you have to reduce the resolution of the input image before encoding if you want to go below a certain bit limit.
      • Comes for free in JPEG 2000 because it does so automatically through multiresolution decomposition.
    • Superior Compression
      • Nearly imperceptible artifacts at higher bit rates.
      • At lower bit rates ( bits/pixel for grayscale images), JPEG 2000 has less visible artifacts than standard JPEG and almost no blocking.
      • Compression gains are due to DWT and more sophisticated entropy coding.
    • Multiresolution representation
      • Use of DWT allows for decomposition of image at different resolutions.
      • Allows for other purposes (such as presentation) in addition to compression.
    • Progressive transmission by pixel and resolution accuracy.
      • Efficient code stream organization
      • Progressive by pixel accuracy (SNR scalability) and image resolution.
      • Quality can be gradually improved by downloading more data bits.
      • Designed with web applications in mind.
    • Choice of lossless or lossy compression in a single compression architecture.
    • Random code-stream access and processing.
      • Access to different regions of interest at varying degrees of granularity.
      • Possible to store different parts of same picture using different quality.
    • Error resilience
      • Robust to bit errors from noisy communication channels.
    • Flexible file format, specially for color-space information and metadata.
    • High dynamic range support
      • Support any bit depth, including 16-bit and 32-bit floating point images.
    • Side channel spatial information for transparency and alpha planes.
  • Color components transformation

    • Images are transformed from RGB to another color space to handle the components separately.
    • Two Possible choices.
      1. Irreversible color transform
        • Uses YCBCR color space.
        • Irreversible because it has to implemented using floating point or fixpoint and causes round-off errors.
      2. Reversible Color Transform
        • Uses a modified YUV color space that does not introduce quantization errors.

        • Fully reversible.

        • Transformation given by Forward:

          Reverse:

          • Chrominance components can be down-scaled in resolution.
          • Downsampling effectively handled by separating images into scales and dropping the finest wavelet scale.
    • Divides an image into two-dimensional array of samples, known as components.
      • As an example, a color image may consist of several components representing base colors red, green, and blue.
  • Tiling

    • Image and its components are decomposed into rectangular tiles, which form the basic unit of original or reconstructed image.
    • All the components (for example different color components) that form a tile are kept together so that each tile can be independently extracted/decoded/reconstructed.
    • Tiles can be any size, but all the tiles in the image are the same size.
      • Possible to have different sized tiles on right and bottom border.
      • Decoder needs less memory to decode the image.
      • You can also opt for partial decoding by decoding only a subset of tiles.
    • Quality of the image may decrease due to lower peak SNR.
    • Using many tiles may lead to blocking artifacts.
  • Wavelet transform

    • Tiles are analyzed using wavelets to create multiple decomposition levels.
      • Yields a number of coefficients to describe the horizontal and vertical spatial frequency characteristics of the original tiles, within a local area.
      • Different decomposition levels are related by powers of 2.
    • Wavelet transformation to arbitrary depth.
    • Two different wavelet transforms used.
      1. Irreversible: CDF 9/7 wavelet transform
        • CDF - Cohen Daubechies Feauveau
        • Introduces quantization noise that depends on the precision of the decoder.
      2. Reversible: a rounded version of the biorthogonal CDF 5/3 wavelet
        • Uses only integer coefficients; no rounding and hence, no quantization noise.
        • Used in lossless coding
    • Wavelet transform implemented by the lifting scheme or convolution.
  • Quantization

    • Transformed coefficients are scalar-quantized to reduce the number of bits used in representation
    • Information content of a large number of small-magnitude coefficients reduced by quantization, giving code-blocks.
    • Leads to a loss of quality.
    • Code blocks are sets of integers that are encoded bit-by-bit.
    • Greater quantization step leads to greater compression and loss in quality.
    • Quantization step of 1 implies no quantization; used in lossless compression.
  • Coding

    • At this point, we have a collection of sub-bands representing several approximation scales.
      • Each sub-band a set of coefficients
      • Real numbers representing aspects of image associated with certain frequency range as well as a spatial area of the image.
    • Precincts
      • Quantized sub-bands split into precincts, regular regions in the wavelet domain.
      • Selected so that coefficients in a precinct across sub-band form approximate spatial block in the reconstructed image.
    • Code blocks
      • Precincts split into code blocks.
      • Code blocks located in a single sub-band
      • All code blocks have the same size, except at the end of the image.
    • Additional compression is achieved by entropy coding of bit-planes of the coefficients in code-blocks to reduce the number of bits required to represent quantized coefficients.

1From Wikipedia