Compression Scheme Implementation

In order to compress a given audio file, we first had to implement:
  • a psychoacoustic model
  • a set of filter banks
  • a quantization scheme
  • a signal compression scheme
With these components in place, we can encode a .wav file in the WAVS compression format and estimate the size of the resulting file. Because the encoder retains the frequency components of the signal, we can then reconstruct the signal and listen to the result.

Psychoacoustic Model Implementation

In order to compress data by a large factor, an algorithm must be lossy, i.e., it must throw out some of the information. In the case of an audio signal, one would assume that throwing out portions would result in a noticeable degradation in sound quality. When done blindly, this is true. However, using what is known as the psychoacoustic model helps to minimize the audible effects of lossy compression.

Implementation of the psychoacoustic model in MATLAB was performed with the following steps:

  1. (normalize.m) Normalize the power spectrum of the signal to a 0 dB maximum. This is done with the following equation:
    X[n] = s[n]/(N*2^(b-1))
    where s[n] is the input sample, N is the FFT length, and b is the number of bits per sample.
  2. Break the signal into frames of 512 samples (about 12 ms).
  3. Window each frame using a Hanning window with 1/16 overlap, so each frame contributes 10.9 ms of new data.
  4. (psd.m) Calculate the power spectral density using a length-512 FFT. A power normalization term of 90.302 dB is necessary for proper computation.
  5. (find_tones.m) Find the tone maskers. Once a tone is found at index k, combine the power at [k] with the power one index before [k-1] and one index after [k+1] to create a tone masker approximation, since the true tone may lie between the frequency samples.
  6. (noise_maskers.m) Find noise maskers and their locations within each critical band.
  7. (check_maskers.m) If a masker is below the absolute threshold of hearing, it may be discarded. If two maskers are within a critical bandwidth of each other, the weaker of the two may be thrown out as well.
  8. (mask_threshold.m) Calculate the masking threshold of each masker.
  9. (global_threshold.m) Sum the masking thresholds to get the overall masking threshold for all frequencies in this signal frame.
With this threshold, one can now move to quantization and bit allocation.
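
Steps 1 through 3 can be sketched in a few lines. The sketch below is in Python rather than the project's MATLAB, and it is not the code in normalize.m. The function names, the periodic form of the Hann window, and the choice to drop a trailing partial frame are all assumptions made for illustration. The 44.1 kHz sample rate is inferred from the 10.9 ms figure in step 3.

```python
import math

FRAME_LEN = 512                 # samples per frame (step 2)
HOP = FRAME_LEN * 15 // 16      # 480 new samples per frame with 1/16 overlap (step 3)


def normalize(samples, fft_len=FRAME_LEN, bits=16):
    """Step 1: x[n] = s[n] / (N * 2^(b-1)) for integer PCM samples."""
    scale = fft_len * 2 ** (bits - 1)
    return [s / scale for s in samples]


def hann_window(length):
    """Periodic Hann window (assumed form): w[n] = 0.5 * (1 - cos(2*pi*n/length))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / length))
            for n in range(length)]


def windowed_frames(x, frame_len=FRAME_LEN, hop=HOP):
    """Steps 2-3: split x into overlapping frames and window each one.

    Trailing samples that do not fill a whole frame are dropped (assumption).
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        frames.append([v * w for v, w in zip(frame, window)])
    return frames


# One second of a 1 kHz tone as 16-bit integer samples at an assumed 44.1 kHz.
tone = [round(32767 * math.sin(2 * math.pi * 1000 * n / 44100))
        for n in range(44100)]
frames = windowed_frames(normalize(tone))
```

Each frame advances by 480 samples, which at 44.1 kHz is the 10.9 ms of new data quoted in step 3.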

The following are three different test cases that go through this process step-by-step:

Filter Bank Implementation

Filter banks divide a signal into subbands so that each subband can be coded differently. A bank is an array of band-pass filters that together span the audio spectrum. The coder uses the masking threshold from the psychoacoustic model to quantize each subband signal, producing a compressed bit-stream representation of the signal.

Implementation of the filter banks was performed in MATLAB as follows:
  1. (encode.m) Encodes a given input signal. Windows the signal into 512-sample frames, finds the global masking threshold for each frame, and then sends the frame to the filter banks.
  2. (findThreshold.m) Uses the psychoacoustic model to build a masking threshold for the windowed signal.
  3. (filterBanks8.m) Uses 32 analysis filters to break up the signal. Then down-samples and performs the coding. Then up-samples and re-synthesizes the signal.
  4. (performEncoding.m) Uses the psychoacoustic model to determine the number of bits to use to quantize a signal over a given frequency range. Then quantizes the signal.
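
How step 4 turns the masking threshold into a bit count is not spelled out here. One common rule, shown below purely as an assumption and not as the logic of performEncoding.m, uses the fact that each added bit lowers quantization noise by roughly 6.02 dB: allocate just enough bits to push the noise below the mask. The function name, the 16-bit cap, and the dB inputs are hypothetical.

```python
import math

DB_PER_BIT = 6.02   # each extra bit lowers quantization noise by about 6.02 dB
MAX_BITS = 16       # assumption: never spend more bits than the 16-bit source


def bits_for_band(signal_db, mask_db, max_bits=MAX_BITS):
    """Bits needed to keep quantization noise under the mask (hypothetical rule)."""
    smr = signal_db - mask_db        # signal-to-mask ratio in dB
    if smr <= 0:
        return 0                     # band lies under the mask; spend no bits
    return min(max_bits, math.ceil(smr / DB_PER_BIT))


# A band 30 dB above its masking threshold needs about 5 bits under this rule.
example_bits = bits_for_band(60.0, 30.0)
```

Under this rule a band whose signal sits below the masking threshold receives no bits at all, which is where most of the savings in a perceptual coder come from.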

The following is an example of running a signal through the filter banks and an analysis of the error produced.

Quantization Implementation

Our group developed two different quantization methods for performing the audio compression. The first method, called full range quantization, uses a pre-defined range that includes all possible input values. Because this method gave a noticeable degradation in sound quality, we developed a second method. This dynamic method, called narrow range quantization, determines the range of quantization and the delta from the current set of input data. The inputs to be quantized lie in [-1, 1] and are stored with 16 bits (65,536 distinct values in [-1, 1]).

Full Range Quantization

The full range quantization method (q.m, iq.m) quantizes over the full range of allowable input values, [-1, 1], regardless of the range of the current input. This method achieves a better compression ratio because it does not need to store the extra information required by the narrow range method. However, it is less accurate. It takes the same inputs as the narrow range method. With 3 bits, these are the results:

Level number:          1     2     3     4     5     6     7     8
Quantization value:   -1   -.75  -.5   -.25    0    .25   .5    .75

Input:  -.4  -.22  .14  .4
Output:   3    4    6    7

To reconstruct the input, an inverse-quantization method is used:

Input:   3    4    6    7
Output: -0.5 -0.25 0.25 0.50  (compare with: -.4 -.22 .14 .4)

TOTAL ERROR: .34
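
The 3-bit example above can be reproduced with a short sketch. This is Python, not the project's q.m and iq.m, and the function names are made up for illustration. Rounding to the nearest level is an assumption, chosen because it matches the numbers in the example.

```python
import math


def full_range_quantize(values, bits, lo=-1.0, hi=1.0):
    """Map each value to a 1-based level number on a fixed grid over [lo, hi]."""
    levels = 2 ** bits
    delta = (hi - lo) / levels
    out = []
    for v in values:
        idx = math.floor((v - lo) / delta + 0.5) + 1   # nearest level, 1-based
        out.append(max(1, min(levels, idx)))           # clamp to valid levels
    return out


def full_range_dequantize(level_numbers, bits, lo=-1.0, hi=1.0):
    """Inverse step: level k maps back to lo + (k - 1) * delta."""
    delta = (hi - lo) / 2 ** bits
    return [lo + (k - 1) * delta for k in level_numbers]


inputs = [-0.4, -0.22, 0.14, 0.4]
codes = full_range_quantize(inputs, 3)
restored = full_range_dequantize(codes, 3)
total_error = sum(abs(a - b) for a, b in zip(inputs, restored))
```

Because the grid is fixed, the decoder needs only the level numbers and the bit count, which is why this method carries no side information per frame.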

Narrow Range Quantization

The narrow range quantization method (quantization.m, invquantization.m) allows for greater accuracy at the expense of having a poorer compression ratio. This quantization scheme uses the current set of input to determine the range of values to quantize over as well as the delta. For example, if the current input only has a range of [-.4, .4], then we quantize over this range instead of the full range of [-1, 1]. The quantization values will be much closer to the true values of the input. However, this method requires storing two extra numbers for each frame of data: the delta used and the lowest quantized value.

This narrow range quantization requires two inputs: the input values, and the number of bits used for the quantization. The number of distinct numbers that can be stored is equal to 2^bits. For example, 4 bits allows the storing of 16 different numbers between the maximum and the minimum value of the current input.

The quantization function returns 2 vectors. One of them contains the quantization levels of the current input. Each number will be an integer between 1 and the maximum number of levels. The second vector contains the information needed for reconstruction. It contains the delta value (the numerical difference between two adjacent quantization values) and the baseline value (the lowest value in the input). With the lowest input value and the delta value, we are able to reconstruct the original input. For example, if the baseline value is -.4 and delta is .1, then this is how the method works for 3-bit quantization:

Level number:         1     2    3    4   5   6   7   8
Quantization value:  -.4  -.3  -.2  -.1   0  .1  .2  .3

Input:  -.4  -.22  .14  .4
Output:   1    3     6   8
             Baseline: -.4
             Delta:     .1

To reconstruct the input, an inverse-quantization method is used:

Input:    1    3     6    8        (method needs Baseline and Delta)
Output: -0.4  -0.2   0.1  0.3      (compare with: -.4 -.22 .14 .4)

TOTAL ERROR: .16
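
The narrow range example can be sketched the same way. Again this is Python with hypothetical function names, not the project's quantization.m and invquantization.m. Setting delta to (max - min) / 2^bits, rounding to the nearest level, and clamping the maximum input to the top level are assumptions that reproduce the numbers above.

```python
import math


def narrow_range_quantize(values, bits):
    """Return (level numbers, (delta, baseline)) for one frame of input."""
    levels = 2 ** bits
    base = min(values)                       # baseline: lowest input value
    delta = (max(values) - base) / levels    # spacing between adjacent levels
    if delta == 0:                           # constant frame: a single level
        return [1] * len(values), (0.0, base)
    out = []
    for v in values:
        idx = math.floor((v - base) / delta + 0.5) + 1   # nearest level, 1-based
        out.append(max(1, min(levels, idx)))             # maximum clamps to top
    return out, (delta, base)


def narrow_range_dequantize(level_numbers, side_info):
    """Inverse step: needs the stored delta and baseline for the frame."""
    delta, base = side_info
    return [base + (k - 1) * delta for k in level_numbers]


inputs = [-0.4, -0.22, 0.14, 0.4]
codes, side_info = narrow_range_quantize(inputs, 3)
restored = narrow_range_dequantize(codes, side_info)
total_error = sum(abs(a - b) for a, b in zip(inputs, restored))
```

The pair returned alongside the level numbers is the per-frame side information that makes this method more accurate but costlier to store than full range quantization.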


File Format and Comparison

In order to determine compression ratios for our compression schemes we first have to determine the number of bytes that each file takes.
  • Original .wav file:
    We read the number of bytes of the file from the operating system.

  • Estimated original file size:
    Since we used .wav files at a 16-bit quantization per sample, we determined the file size to be:

    Total Bytes = (16 bits per sample) * (Number of samples) / (8 bits per byte)

  • 16-bit Compression:
    In this case we quantized using 16 bits per sample, with an added overhead of 4 bits per sample to record the quantization level of the filter in the filter bank. (Note: the resulting file will more than likely be larger than the .wav file.)

    Total Bytes = (16 quantization bits per sample + 4 overhead bits per sample) *
                            (Number of Samples) / (8 bits per byte)


  • 8-bit Compression:
    In this case we quantized using 8 bits per sample, with an added overhead of 4 bits per sample to record the quantization level of the filter in the filter bank:

    Total Bytes = (8 quantization bits per sample + 4 overhead bits per sample) *
                            (Number of Samples) / (8 bits per byte)


  • Full Range Compression:
    In this case we quantized using a variable number of bits, b(i), for the samples in each filter i. This scheme also required 4 bits of overhead per filter to encode the bit level at which that filter's samples were quantized. Since there are 32 filters per bank, we first determine the number of bits in each frame of 512 samples and then sum over all frames:

    Total Bytes = (Sum over all frames of
                            (Sum from i=1 to 32 of ((b(i) bits per sample) *
                            (32 samples per filter) + 4 bits overhead per filter)))
                            / (8 bits per byte)


  • Narrow Range Compression:
    In this case we quantized using a variable number of bits, b(i), for the samples in each filter i. This scheme also required 30 bits of overhead per filter to encode the bit level at which that filter's samples were quantized, the lowest value, and the delta, as discussed above. Since there are 32 filters per bank, we first determine the number of bits in each frame of 512 samples and then sum over all frames:

    Total Bytes = (Sum over all frames of
                            (Sum from i=1 to 32 of ((b(i) bits per sample) *
                            (32 samples per filter) + 30 bits overhead per filter)))
                            / (8 bits per byte)


  • Mp3:
    We read the number of bytes of the file from the operating system.
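
The size estimates above are simple enough to express directly. The sketch below is in Python with hypothetical function names. It follows the formulas as written, converts the variable-rate bit totals to bytes, and has not been checked against the measured file sizes. The bit allocations in the example are invented.

```python
def pcm_bytes(num_samples, bits_per_sample=16):
    """Estimated original size: bits per sample * samples / 8 bits per byte."""
    return bits_per_sample * num_samples / 8


def variable_rate_bytes(frames, overhead_bits, samples_per_filter=32):
    """Estimated size for the full range and narrow range schemes.

    frames is a list with one entry per frame; each entry lists b(i), the
    bits per sample chosen for each of the 32 filters in that frame.
    """
    total_bits = 0
    for band_bits in frames:
        for b in band_bits:
            total_bits += b * samples_per_filter + overhead_bits
    return total_bits / 8


# Made-up allocation: one frame, 4 bits per sample in every one of 32 filters.
one_frame = [[4] * 32]
full_range_size = variable_rate_bytes(one_frame, overhead_bits=4)
narrow_range_size = variable_rate_bytes(one_frame, overhead_bits=30)
```

For the same bit allocation, the narrow range estimate is always larger by 26 bits per filter per frame, which is the cost of storing the baseline and delta.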

From this data we can then determine a compression ratio:

Percent Compression = (Original .wav File Size) / (Compressed File Size) * 100 - 100

(In this ratio comparison, 100 percent compression implies the compressed file is one-half the size of the original file.)
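
As a quick check of the formula, the sketch below (Python, with a hypothetical function name) applies it to the "Pure sine" row of the results table in the Results section, using the 24044-byte .wav file as the original.

```python
def percent_compression(original_bytes, compressed_bytes):
    """Percent Compression = original / compressed * 100 - 100."""
    return original_bytes / compressed_bytes * 100 - 100


WAV_BYTES = 24044   # "Pure sine" original .wav size from the results table

full_range_pct = percent_compression(WAV_BYTES, 14854)
narrow_range_pct = percent_compression(WAV_BYTES, 18168)
mp3_pct = percent_compression(WAV_BYTES, 5015)
```

These work out to roughly 62, 32, and 379 percent, which fall inside the approximate ranges quoted for the full range, narrow range, and mp3 schemes.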

Results

Along with the compression ratios, we also have to consider the sound quality that each compression scheme delivers. By listening to the output of each scheme, we judged its sound quality against the original signal with 16-bit quantization. The list below gives the sound quality rating and the approximate compression range we obtained for each scheme:
  • Original .wav file:
    Baseline

  • 16-bit Compression:
    Compression: 0 percent
    Sound Quality: As good as the original.

  • 8-bit Compression:
    Compression: 70 - 95 percent
    Sound Quality: Not very good. Noticeable loss of quality from the original.

  • Full Range Compression:
    Compression: 60 - 75 percent
    Sound Quality: Not very good. Noticeable loss of quality from the original signal.

  • Narrow Range Compression:
    Compression: 30 - 50 percent
    Sound Quality: As good as the original.

  • Mp3:
    Compression: 375 - 450 percent
    Sound Quality: As good as the original.


Along with the narrow and full range quantization schemes, which dynamically determine the number of bits used to quantize each filter in the filter bank, we also tested straight 8-bit and 16-bit compression. In these two cases every filter is quantized at the same level, either 8 or 16 bits, but the data still uses our file format. The table below gives the results of running our compression schemes on various test signals. These tests took anywhere from 2 or 3 minutes to an hour to run in MATLAB. We did not run the 16-bit compression test on the larger files because MATLAB crashed every time we tried; those entries are marked X.

File sizes in bytes for each compression scheme (X = test not run):

Test signal       Full Range  Narrow Range   8-bit  16-bit  Original (est.)  Original .wav    Mp3
Pure sine              14854         18168   14000   27600            24000          24044   5015
2 separate sines       14374         17656   14000   27600            24000          24044   5015
2 near sines           14830         18142   14000   27600            24000          24044   5015
Chime                 261726        311542  237584       X           460000         460044  84427
Percussion             72330         89316   70000       X           120002         120046  37198
Modern                252692        301260  246576       X           449232         449276  82755

The following are five different test cases where we go through the analysis of the input and output signals in the time and frequency domain and look at the error between the signals in the frequency domain:


Observations

From the data we collected, we found that we have a long way to go in order to compete with mp3. In general, we found that sound quality decreased as the compression ratio increased. However, there are some further avenues of research that might allow us to realize better compression in the future without the loss in sound quality.

[Alex Chen]   [Nader Shehad]   [Aamir Virani]   [Erik Welsh]