Introduction to digital audio

Introduction

This web page provides a short introduction to digital signal processing and the conversion of signals from the digital to the analog domain, or vice versa. It consists of an assembled collection of publications, mainly obtained from the internet, which are edited and re-grouped.

Some parts of this web page are quite mathematical. It is not necessary to read these mathematical parts in order to understand the remaining parts of this website.

Digital signals versus analog signals

When a signal is recorded without taking into account the non-ideal characteristics of the recording medium (limited bandwidth, noise), there is no means of distinguishing the noise and distortion caused by the medium from the original signal. This is the case with traditional analog recording. In the digital domain, noise and distortion can be separated (to a certain extent) from the signal, because digital recording uses a limited and well-defined set of allowable waveforms.

The conversion of an analog signal to a digital signal requires two distinct processes, sampling and quantisation.

These processes will be explained in more detail in the remainder of this chapter.

Sampling

Sampling is the process which looks at the signal at regular periods of time (denoted by ΔT, or just T), and only remembers the signal value at these specific moments (see e.g. Figure 2.1).

The important Nyquist sampling theorem states how quickly samples must be taken to ensure that the original signal can be reconstructed exactly.

The next part is a bit technical, but it proves that `cutting the original signal into pieces' does not mean that information is lost, provided the sampling frequency (fs = 1/T) is at least twice the highest frequency component fc in the signal to be sampled. You can simply memorize this result and skip the mathematical part if you prefer.

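Let f ( t ) be an analog signal and let F ( ω ) be the Fourier transform of that signal, with F ( ω ) = 0 for |ω| > ωc (hence, f ( t ) is a band limited signal).

The Fourier transform of f ( t ) is given by:

\[ F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \]

The inverse Fourier transform is given by:

\[ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \int_{-\omega_c}^{\omega_c} F(\omega)\, e^{j\omega t}\, d\omega \]

According to Fourier, the periodic repetition of F ( ω ) with period 2ωc can be written as follows:

\[ F(\omega) = \sum_{n=-\infty}^{\infty} c_n\, e^{j n \omega T} \]

with

\[ c_n = \frac{1}{2\omega_c} \int_{-\omega_c}^{\omega_c} F(\omega)\, e^{-j n \omega T}\, d\omega \]

and with T = π/ωc. Because F ( ω ) = 0 for |ω| > ωc, comparison with the inverse Fourier transform at t = -nT shows that this can be written as

\[ c_n = \frac{\pi}{\omega_c}\, f(-nT) = T\, f(-nT) \]

After substitution of n = -n, it can be derived that

\[ F(\omega) = T \sum_{n=-\infty}^{\infty} f(nT)\, e^{-j n \omega T} \]

with |ω| < |ωc|. Substitution of this result in the inverse Fourier transform results in

\[ f(t) = \frac{T}{2\pi} \sum_{n=-\infty}^{\infty} f(nT) \int_{-\omega_c}^{\omega_c} e^{j\omega (t - nT)}\, d\omega \]

which after integration gives us

\[ f(t) = \sum_{n=-\infty}^{\infty} f(nT)\, \frac{\sin\bigl(\omega_c (t - nT)\bigr)}{\omega_c (t - nT)} \]

The last equation says that a band limited signal can be reconstructed completely from its values f ( nT ) (i.e. sample values) with T = π/ωc, or with sample frequency ωs = 2π/T = 2ωc. Hence the sample frequency should be at least twice the bandwidth of the signal to be sampled. (end of mathematical proof).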
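The reconstruction result can be tried numerically. The following is only a minimal sketch in Python: the function name `sinc_reconstruct`, the 8 kHz sample rate and the 1 kHz test tone are assumptions chosen for illustration, and the infinite sum is truncated to the available samples, so the result is approximate.

```python
import math

def sinc_reconstruct(samples, T, t):
    """Evaluate sum_n f(nT) * sin(wc (t - nT)) / (wc (t - nT)) with wc = pi / T.

    samples[n] is taken to be f(nT) for n = 0 .. len(samples) - 1. Samples
    outside that range are assumed to be zero, which truncates the sum.
    """
    wc = math.pi / T
    total = 0.0
    for n, fn in enumerate(samples):
        x = wc * (t - n * T)
        total += fn * (1.0 if x == 0.0 else math.sin(x) / x)
    return total

# Hypothetical example: a 1 kHz sine sampled at 8 kHz.
fs = 8000.0
T = 1.0 / fs
samples = [math.sin(2 * math.pi * 1000.0 * n * T) for n in range(400)]

t_mid = 200.5 * T                      # halfway between two sample instants
approx = sinc_reconstruct(samples, T, t_mid)
exact = math.sin(2 * math.pi * 1000.0 * t_mid)
print(approx, exact)                   # close, but not identical (truncated sum)
```

At the sample instants themselves the sum returns the stored sample value; between samples the truncated sum approaches the original signal as more neighbouring samples are included.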

The sampling theorem implies that the analog signal to be sampled must be band limited to half the sampling frequency. This can be achieved by passing it through an ideal low-pass filter. This is often forgotten by people who claim that a 20kHz square wave cannot be sampled correctly by a digital audio system. Note that a 20kHz square wave is a summation of the odd harmonics of a 20kHz sine wave, i.e. a 20kHz sine wave (1st harmonic or fundamental), a 60kHz sine wave (3rd harmonic), a 100kHz sine wave (5th harmonic) and so on. The higher-order harmonics are filtered away before sampling, and hence only a 20kHz sine wave is sampled.

The spectrum of the sampled signal is the same as the spectrum of the continuous signal except that copies (known as aliases) of the original now appear centred on all integer multiples of the sample rate. As an example, if a signal of 20 kHz bandwidth is sampled at 50 kHz then alias spectra appear from 30 - 70 kHz, 80 - 120 kHz, and so on. It is because the alias spectra must not overlap that a sample rate greater than 2B is required, where B is the bandwidth of the signal (see Figure 2.2).

If the original analog signal to be sampled is not band limited, this will lead to an effect known as aliasing. During aliasing, signal components above half the sampling frequency (fs/2) are imaged back across fs/2 by the amount by which they exceed it. For a sampling rate of 44.1 kHz, a signal of 25 kHz exceeds fs/2 = 22.05 kHz by 2.95 kHz and will therefore be aliased to 22.05 - 2.95 = 19.1 kHz.
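The folding of a frequency across half the sampling frequency can be written as a small helper. This is a sketch only; the function name `alias_frequency` is hypothetical and the input is assumed to be a single pure tone.

```python
def alias_frequency(f, fs):
    """Return the base-band frequency (0 .. fs/2) at which a tone f appears after sampling at fs."""
    f_mod = f % fs                      # sampling cannot distinguish f from f + k*fs
    return fs - f_mod if f_mod > fs / 2 else f_mod

print(alias_frequency(25000, 44100))    # a 25 kHz tone sampled at 44.1 kHz
print(alias_frequency(1000, 44100))     # a tone below fs/2 is left where it is
```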

In digital audio we are concerned with the base-band, that is to say the signal components which extend from 0 to B. Therefore, sampling at the standard digital audio rate of 44.1 kHz strictly requires the input signal to be band-limited to the range 0 Hz to 22.05 kHz.

The need for an analog filter after DA-conversion (also called a reconstruction filter) may not be intuitively obvious. It is visualized in Figure 2.3. The DA-conversion process recovers the sampled sequence as a staircase-shaped signal (by using a sample-and-hold strategy on the sampled values), as shown in the first part of Figure 2.3. This signal contains higher-order frequencies (the aliases) when compared to the original signal. Hence, the DA-conversion process has now created frequencies that did not previously exist. Because the input signal was band limited, a low-pass filter can be used to strip the higher-order harmonics from the staircase signal (i.e. remove the aliases from the spectrum), and we are left with the original signal we started with.

One could think of just leaving the side band information in the spectrum, because we cannot hear it. High-frequency signals may, however, cause intermodulation distortion inside an audio power amplifier or distortion components in loudspeakers.

Oversampling

Sampling or reconstructing an analogue signal at 44.1 kHz requires that the input signal is band-limited to half the sample rate, in this case 22.05 kHz, else the aliased spectra will overlap and information will be lost. For a practical implementation this may require an analogue filter of order 8 or 10 to be inserted to provide an audio bandwidth of 20 kHz and approximately 80 dB attenuation above about 24 kHz, as required for high-fidelity sound reproduction. It is possible to design such a filter, but it would require a number of closely-toleranced components. Inherent problems of such a filter design are phase non-linearities at high frequencies and high-frequency group delay (a change in phase shift with respect to frequency), both resulting in audible distortions.

A possible solution is to sample the input at a higher frequency, thereby relaxing the constraints on the analogue input signal spectrum. Low-pass filtering and decimating (reducing the sampling rate) can then be applied in the digital domain, which is easier to accomplish correctly.

Oversampling consists of two parts, upsampling and digital filtering (see also Figure 2.4). During upsampling a signal x(n) at sampling frequency fs is changed to a signal y(n) with increased sample frequency k.fs by inserting k-1 zero-valued samples after each sample of the signal (in Figure 2.4, k=4). The insertion of k-1 zeros spreads the energy of each input sample over k output samples, effectively attenuating each sample by a factor k. It is therefore necessary to compensate for this by multiplying each sample of y(n) by k. After upsampling, digital filtering is accomplished by interpolating the samples of y(n), resulting in a signal z(n). The spectrum of z(n) is the same as the spectrum of the original continuous signal, but repeats around multiples of k.fs.
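The two steps can be sketched as follows. This is an illustration under stated assumptions, not a production interpolator: the function names are hypothetical, and the 7-tap triangular (linear-interpolation) filter is chosen only because it is easy to check by hand; a real oversampling filter would use a much longer low-pass FIR.

```python
def upsample_zero_stuff(x, k):
    """Insert k-1 zeros after every sample and scale by k to compensate the attenuation."""
    y = []
    for sample in x:
        y.append(k * sample)
        y.extend([0.0] * (k - 1))
    return y

def fir_filter(y, h):
    """Direct-form FIR filtering: z(n) = sum_m h(m) * y(n - m), with y(n) = 0 for n < 0."""
    return [sum(h[m] * y[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(y))]

# Assumed example: k = 4 with a linear interpolator (unity gain at dc, delay of 3 samples).
k = 4
h = [c / 16.0 for c in (1, 2, 3, 4, 3, 2, 1)]
x = [1.0, 2.0, 3.0]
z = fir_filter(upsample_zero_stuff(x, k), h)
print(z)
```

With this filter the original samples reappear at output positions 3, 7, 11 (delayed by three samples), and the positions in between are filled with linearly interpolated values.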

The impulse response of an ideal low-pass filter is a sinc function, which stretches from -∞ to +∞ in time. To make the filter causal and implementable, the ends of the response must be neglected. This results in a filter with a finite number of coefficients (see Figure 2.5). Truncating the impulse response to a finite number of points results in a ripple in the frequency response. Rather than simply truncating the impulse response in time, it is better to make a smooth transition. This can be accomplished by multiplying the coefficients in the filter by a window function which peaks in the centre of the impulse (see [Watk94] for details). The coefficients should be quantized finely enough to preserve a good filter performance. The advantages of digital FIR filters over analogue filters are that they are not subject to component tolerance or drift and are straightforward to make phase-linear.

Because oversampling results in a spectrum in which side bands occur only around multiples of k.fs, with k > 1, the analogue filter that is needed is much easier to implement. For k = 8 the transition region extends from 20kHz to 332.8kHz (8 x 44.1kHz - 20kHz), and hence the filter requires a much less steep slope than in the original situation without oversampling.
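A windowed, truncated sinc filter of this kind might be computed as in the sketch below. The function name, the choice of a Hann window and the normalisation to unity gain at dc are assumptions made for illustration; the source does not prescribe a particular window.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Low-pass FIR coefficients; cutoff is a fraction of the sample rate (0 < cutoff < 0.5)."""
    centre = (num_taps - 1) / 2.0
    h = []
    for n in range(num_taps):
        x = n - centre
        # Truncated ideal (sinc) impulse response.
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hann window: peaks in the centre and falls smoothly to zero at both ends.
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    gain = sum(h)
    return [c / gain for c in h]        # normalise to unity gain at dc

coefficients = windowed_sinc_lowpass(31, 0.125)
```

The coefficients are symmetric about the centre tap, which is what makes such a filter phase-linear.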

Jitter

Jitter is basically defined as time instability during sampling or reconstructing analog signals, and hence it occurs in both analog-to-digital and digital-to-analog conversion. Reading digital samples is controlled by the pulses of an oscillator (also called a clock). If the clock pulses are not equidistant, the actual sampling/reconstruction time will vary from sample to sample. A timing error on the clock causes an erroneous sample level to be captured in case of AD conversion (see the next section about quantisation), or the correct value to be converted at the wrong moment in case of DA conversion, inducing noise and distortion.

Quantisation

To store a digital signal, each sample must be represented as a series of bits, so the continuously varying voltage amplitude of the analog signal must be assigned a discrete value. This process of assignment is known as quantisation.

In a 16-bit audio format, we can represent a sinusoidally varying audio voltage by 2^16 = 65,536 discrete levels. The number of bits available to the quantiser is therefore a limiting performance factor in the overall digital audio system.

Quantisation of a signal is nothing more complex than rounding it to the nearest whole number. If the input to a quantiser is between p and p+0.5, then we round down to p; if the input lies between p +0.5 and p+1 then we round up to p+1. In general, the output step size (or quantum) need not be unity, but can be of arbitrary size q; in this case p is not an integer, but a whole multiple of q.
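The rounding rule can be expressed in a single line. The sketch below uses a hypothetical function name; note that an input lying exactly halfway between two levels is rounded up here, which is one possible convention.

```python
import math

def quantise(x, q=1.0):
    """Round x to the nearest whole multiple of the step size q (ties round up)."""
    return q * math.floor(x / q + 0.5)

print(quantise(2.3), quantise(2.7))     # unit step size
print(quantise(0.26, q=0.1))            # arbitrary step size q
```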

Consider the following simple system, where x ( n ) is a discrete-time input signal, Q is a quantiser whose output is y ( n ).

If x ( n ) is random and uncorrelated then at any given sample instant it is equally likely to lie anywhere in the interval (p - q/2) to (p + q/2). In other words, the quantisation error e ( n ) = y ( n ) - x ( n ) is random and uncorrelated, and has a rectangular distribution of peak value q/2. Elementary statistics show that the expected (or average) quantisation error is zero, and that the quantisation noise power is q^2/12.

A measurement of the quantisation error in a digitizing system can be made, and it is expressed as the signal-to-error (S/E) ratio. This ratio is approximately 6.n dB, where n is the number of bits in the data word. Hence, the theoretical S/E ratio for a 16-bit system is about 96 dB.
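The rule of about 6 dB per bit follows from the noise power of q^2/12. The sketch below assumes a full-scale sine wave as the reference signal (amplitude 2^(n-1) steps), which is an assumption not spelled out in the text; under that assumption the ratio comes out slightly above 6.n dB.

```python
import math

def snr_db_full_scale_sine(bits, q=1.0):
    """Ratio of full-scale sine power to the q^2/12 quantisation noise power, in dB."""
    amplitude = q * 2 ** (bits - 1)     # peak amplitude of a full-scale sine
    signal_power = amplitude ** 2 / 2   # mean square value of a sine wave
    noise_power = q ** 2 / 12           # rectangular error distribution of width q
    return 10 * math.log10(signal_power / noise_power)

for n in (8, 16, 20):
    print(n, round(snr_db_full_scale_sine(n), 2))
```

Each extra bit adds about 6.02 dB; the constant offset of about 1.76 dB comes from the sine-wave reference and is usually dropped in the rule of thumb.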

If x ( n ) exhibits some correlation (as does music for example) then this simple analysis is no longer valid. The errors become correlated to the signal and show up as distortion rather than as broad-band hiss. See the next section about dither for more details about this topic.

Consider a 16-bit ADC clocked at 44.1 kHz. Its quantisation noise is approximately 96 dB below a full-scale sinusoid, and is spread evenly from dc to 22.05 kHz. If the ADC is clocked faster then the total quantisation noise power remains unaffected, but it is spread over a wider bandwidth. For example, if the converter speed is doubled, the quantisation noise power is spread from dc to 44.1 kHz. The desired signal is, of course, still in the band from dc to 22.05 kHz, and the quantisation noise power in this band is halved.

A digital low-pass filter with a cut-off of 22.05 kHz cuts out half of the quantisation noise, increasing the SNR by 3 dB, but leaving the audio-band signal unaffected. This filter is generally the same one as the digital anti-alias filter mentioned above. The process is extendible, and for each doubling of the sampling rate, the audio-band quantisation noise is lowered by 3 dB. For example, the quantisation noise for a 4-times oversampling converter will be 6 dB lower (after filtering) than for the same converter operating without oversampling.

By the same principle, if we ran a 15 bit ADC at 4 x 44.1 kHz we would get the same audio-band performance as with a 16-bit device sampling at 44.1 kHz, since the increase of quantisation noise due to the poorer resolution is balanced by the SNR improvement brought about by oversampling. Again the process is extendible, and for each factor of four by which we increase the sample rate we can drop one bit of resolution off the converter. So if we oversample by a factor of 4^15 then theoretically we can drop 15 of the 16 bits, and use a one-bit converter. Alas, this implies a sample rate of about 50.10^12 Hz (50 THz), which is well into the infra-red. Noise-shaping can be used to reduce the required sample rate to such a level that the use of very low resolution converters is practical (see the corresponding paragraph).
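The arithmetic behind these figures can be checked with a few lines. The function names are hypothetical, and the sketch assumes white quantisation noise and no noise shaping, as in the reasoning above.

```python
import math

def oversampling_gain(k):
    """Return (SNR improvement in dB, equivalent bits) for k-times oversampling."""
    snr_gain_db = 10 * math.log10(k)    # 3 dB per doubling of the sample rate
    bits = 0.5 * math.log2(k)           # one bit per factor of four
    return snr_gain_db, bits

def rate_for_one_bit(bits=16, fs=44100.0):
    """Sample rate needed to drop bits - 1 bits by oversampling alone."""
    return fs * 4 ** (bits - 1)

print(oversampling_gain(4))
print(rate_for_one_bit())               # roughly 4.7e13 Hz for 16 bits at 44.1 kHz
```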

Oversampling gives similar benefits in the digital to analogue conversion process. For each factor of four by which the sample stream is oversampled, one bit may be dropped from each data word without degrading the audio-band performance. Again, to reduce the word-length to one bit in this simplistic manner requires the same impractical sample rate as above, but once again high-quality audio performance is achieved at practical sample rates with noise-shaping.

Dither

Quantisation error manifests itself as noise at high signal levels. However, quantisation errors become quite significant when a low-level signal approaches the level of the LSB: the quantizing error then actually becomes the signal, and is therefore an audible component of the output. Fortunately, in practical systems this adverse effect can be effectively eliminated through the use of dither.

Dither is the process of adding low-level analog noise d ( n ) to a signal x ( n ), to randomize the quantizer's small-signal behaviour. The quantisation error becomes decorrelated from the signal. This is because the value of the quantiser input relative to its output steps no longer depends solely upon the input x, but also upon an uncorrelated random process d.

The distribution of d is critical; it must effectively decorrelate the quantisation error from the input signal x, while adding a minimum of noise power to the output signal y.

A zero-mean rectangular distribution of peak amplitude q/2 is effective at decorrelating the expected error (i.e. the first moment of the error) and this adds excess noise power of approximately q^2/12 to the output. This in conjunction with the noise contributed by the quantiser itself makes the total output noise power approximately q^2/6.

However, this still is not adequate for high-quality audio as the output noise power (i.e. the second moment of the error) is still correlated with the signal. Adding a second uniform random variable removes this correlation. Now the total output noise power is q^2/4, which is three times the original q^2/12. However, we achieve more acceptable audio performance as neither the expected error nor the error power depends upon the signal x.

We can go on adding more random numbers in this fashion, and the effect of each is to decorrelate the next statistical moment of the error signal from the input. However, each also adds more noise power to the output signal, and two is found in practice to be the most satisfactory solution for audio work. The sum of these two uniform random variables has a zero-mean triangular distribution with peak value of q, and is therefore commonly referred to as "LSB TPDF dither" (Least Significant Bit Triangular Probability Density Function).
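The effect of TPDF dither on the first two moments of the error can be estimated with a small Monte Carlo experiment. This is a sketch with hypothetical function names; the trial count and seed are arbitrary, and the printed figures are statistical estimates rather than exact values.

```python
import math
import random

def quantise(x, q=1.0):
    """Round x to the nearest whole multiple of q."""
    return q * math.floor(x / q + 0.5)

def tpdf_dither(q, rng):
    """Sum of two independent uniform variables on (-q/2, q/2): triangular pdf, peak value q."""
    return rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)

def error_statistics(x, q=1.0, trials=100_000, seed=1):
    """Estimate the mean and the power of the total error for a constant input x."""
    rng = random.Random(seed)
    errors = [quantise(x + tpdf_dither(q, rng), q) - x for _ in range(trials)]
    mean = sum(errors) / trials
    power = sum(e * e for e in errors) / trials
    return mean, power

for x in (0.0, 0.25, 0.5):
    print(x, error_statistics(x))       # mean near 0 and power near q^2/4 in each case
```

For every input value the estimated mean error stays near zero and the estimated error power near q^2/4, which illustrates that neither moment depends on the signal.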

The concept of dither might seem counterintuitive at first, but it is really quite simple. Dither relies on a special behaviour of the human ear: the ear can detect a signal even when it is masked by broadband noise. In some cases, the ear can easily detect a midrange signal buried as much as 10 to 12 dB below the level of broadband noise. This explains why noise shaped dither can get better than 18 bits resolution out of 16 bit storage. See Lipshitz and Vanderkooy, "Resolution Below the Least Significant Bit in Digital Audio Systems with Dither" (JAES, vol. 32, no 3, 1987 Dec) for more details.

Links about dither:

The secrets of dither: http://www.digido.com/ditheressay.html
What is dither?: http://www.mtsu.edu/~dsmitche/rim420/reading/rim420_Dither.html
Publications about dither: http://audiolab.uwaterloo.ca/~rob/pub.html

Noise Shaping

This section originates from a publication of Christopher Hicks, called The application of dithering and noise-shaping to digital audio.

The goal of noise shaping is to improve the subjective performance of the conversion process by moving noise further out of the audio band.

Quantisation (after the addition of white-noise dither) ideally results in a white power spectrum - that is, the noise floor has constant noise power spectral density (NPSD). This is a direct result of rounding to the nearest value when performing the quantisation, with a random input signal.

However, if we base our decision of whether to round up or down not upon which is the nearer value but upon some other criterion, then we can make the output quantisation noise spectrum have almost any form we desire, but still have (roughly) the same total power. We are not going to hear noise above about 20 kHz, so we force as much of the quantisation noise as possible into the band above 20 kHz.

The sample rate of a 256 times oversampling converter is about 11.3 MHz (compared with 50 THz calculated for the hypothetical 1-bit converter above), which gives a Nyquist frequency of about 5.6 MHz, so we can put as much noise as we want into the band from 20 kHz to 5.6 MHz. By keeping the quantisation NPSD low enough in the audio band we are able to achieve an audio-band SNR of 90 dB or more; the NPSD above 20 kHz will be very much higher, but since there is no desired signal at those frequencies it really does not matter.

This is accomplished in practice by placing the quantiser in a feedback loop with a digital filter, such that the filtered quantisation error is subtracted from a subsequent input sample.

Consider the addition of a feedback loop to the dithered quantiser to give the following system.

Here, x is the 'noiseless' input sample, d is the TPDF random dither process and y is the quantized output sample. Block Q represents the quantiser as before, and h ( m ) is a discrete-time filter which, as we will see, affects the spectrum of the error in the output y .

The equations governing the behaviour of this system are:

u ( n ) = x ( n ) - ( y ( n ) - u ( n ) ) * h ( m )

y( n ) = Q( u ( n ) + d ( n ) )

in which `*' represents the convolution operator.

As they stand, these equations are difficult to analyse because of the non-linearity introduced by the quantisation function. However, if the random dither sample d ( n ) is drawn from a suitable distribution then the combined effect of adding d ( n ) and then quantizing is statistically equivalent to the addition of a different random variable e ( n ). We may therefore redraw the block diagram, and rewrite the system equations thus:

u ( n ) = x ( n ) - ( y ( n ) - u ( n ) ) * h ( m )

y ( n ) = u ( n ) + e ( n )

Now the system contains no non-linearities, and it therefore becomes useful to take Z-transforms, thereby converting the convolution operator to its equivalent multiplication.

U ( z ) = X ( z ) - ( Y ( z ) - U ( z )).H( z )

Y ( z ) = U ( z ) + E ( z )

and rearranging to eliminate U gives

Y ( z ) = X ( z ) + E ( z ).(1 - H ( z ))

Replacing z by ω ' = e^(j ω T) (i.e. calculating spectra by evaluating the z-transform on the unit circle) gives

Y ( ω ') = X ( ω ') + E ( ω ').(1 - H ( ω '))

where ω is the frequency in radians per second, and T is the sample period.

So now we see that the spectrum of the input signal is unchanged at the output, but it has added to it a random process E whose spectrum has been modified by the function (1 - H ( ω ')), where H ( ω ') is the frequency response of the feedback filter in the noise shaper.

With regard to the statistics of E, it is sensible to use dither with a triangular distribution and peak value of q (the quantiser step-size), for the same reasons as in the non-noise-shaped case. It was shown that this results in a white quantisation error with zero mean, and of approximate power q^2/4.

We have seen that the filter H affects the noise spectrum, and now the strategy is to design a filter such that the noise is moved to frequency bands where it does not matter.

A first glance at the expression for Y ( z ) above suggests that H ( z ) = 1 would be perfect, eliminating all the noise, since then (1 - H ( z )) = 0. Alas this cannot be, as it would result in a non-causal system; for causality, the filter must have a group delay of at least one sample period at all frequencies, which is equivalent to saying that we can have only negative powers of z in the filter transform.

So the simplest filter we can use is a single sample delay, whose transfer function is 1/ z ; it is interesting to calculate the resulting noise spectrum for this case, since to implement this filter requires very little computation - one addition, and one subtraction per sample.
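The feedback loop with a single sample delay can be sketched as below. The function name is hypothetical, TPDF dither of peak value q is assumed as in the dither section, and the state before the first sample is assumed to be zero.

```python
import math
import random

def noise_shape_first_order(x, q=1.0, seed=1):
    """Dithered quantiser in a feedback loop with H(z) = 1/z (single sample delay)."""
    rng = random.Random(seed)
    prev_error = 0.0                    # y(n-1) - u(n-1), assumed zero at the start
    y = []
    for sample in x:
        u = sample - prev_error         # subtract the delayed error from the input
        d = rng.uniform(-q / 2, q / 2) + rng.uniform(-q / 2, q / 2)   # TPDF dither
        out = q * math.floor((u + d) / q + 0.5)                        # quantiser Q
        prev_error = out - u            # error to be fed back on the next sample
        y.append(out)
    return y

x = [0.3] * 1000                        # constant input well below one step
y = noise_shape_first_order(x)
print(sum(y) / len(y))                  # long-term average stays close to 0.3
```

Because each output error is cancelled by the next one, the accumulated difference between output and input stays bounded, which is another way of saying that the noise gain at dc is zero.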

We saw above that the output noise is given by

N ( z ) = E ( z ).(1 - H ( z ))

and that replacing z by ω ' gives the equivalent spectral expression

N ( ω ') = E ( ω ').(1 - H ( ω '))

Additionally we saw that E ( ω ') is constant (since E is a white process). Now we do the same again, with H ( z ) = 1/ z .

N ( z ) = E ( z ).(1 - 1/ z )

N ( ω ') = E ( ω ') . (1 - 1/ ω ')

The noise power gain is therefore given by | 1 - 1/ ω ' |^2, and plugging in some numbers we can calculate the noise gain at a few spot frequencies (assuming 48kHz sampling for convenience):

Noise gain

f               ω          ω '          | 1 - 1/ ω ' |^2
0               0          1            0 (-∞ dB)
1/8T (6kHz)     π/4T       e^(jπ/4)     0.59 (-2.3 dB)
1/6T (8kHz)     π/3T       e^(jπ/3)     1 (0 dB)
1/4T (12kHz)    π/2T       e^(jπ/2)     2 (+3 dB)
1/2T (24kHz)    π/T        -1           4 (+6 dB)

So immediately we have more noise at high frequency than at low frequency and, therefore, we have succeeded in our aims so far.
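The spot values can be recomputed directly from the noise transfer function. In this sketch the function name and the way the filter is passed in (a tuple of coefficients h(1), h(2), ...) are assumptions made for illustration.

```python
import cmath
import math

def noise_power_gain(f, fs, h=(1.0,)):
    """|1 - H(w')|^2 with H(z) = sum_k h[k-1] * z**-k, evaluated at w' = exp(j*2*pi*f/fs)."""
    w = cmath.exp(2j * math.pi * f / fs)
    H = sum(c * w ** -(k + 1) for k, c in enumerate(h))
    return abs(1 - H) ** 2

for f in (0, 6000, 8000, 12000, 24000):
    print(f, round(noise_power_gain(f, 48000), 3))
```

The default `h=(1.0,)` corresponds to the single sample delay H(z) = 1/z; for this filter the gain equals 2 - 2cos(ωT).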

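One more thing has to concern us before going any further; we should derive an expression for the total noise power gain of the noise-shaper, so that we can predict the total noise power at the system output. We do this by integrating the noise power gain over frequency thus:

\[ G = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \bigl| 1 - H(\omega') \bigr|^2 \, d\omega \]

where ω ' = e^(j ω T) as before. If we have no noise shaper (i.e. H ( ω ')=0) then the integral is trivial and evaluates to unity. This therefore should be our target for non-zero H ( ω '). It can be easily shown that, for H ( z ) = 1/ z ,

\[ G = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \bigl| 1 - e^{-j\omega T} \bigr|^2 \, d\omega = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \bigl( 2 - 2\cos \omega T \bigr) \, d\omega = 2 \]

so the simple noise-shaping filter we designed above increases the total noise power by 3dB.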
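The total noise power gain can also be estimated numerically by averaging the noise power gain over one period of the normalised frequency ωT. The function name and the number of integration steps in this sketch are arbitrary choices.

```python
import cmath
import math

def total_noise_power_gain(h, steps=4096):
    """Average |1 - H(w')|^2 over wT in [-pi, pi), with H(z) = sum_k h[k-1] * z**-k."""
    total = 0.0
    for i in range(steps):
        theta = -math.pi + 2 * math.pi * i / steps
        w = cmath.exp(1j * theta)
        H = sum(c * w ** -(k + 1) for k, c in enumerate(h))
        total += abs(1 - H) ** 2
    return total / steps

gain = total_noise_power_gain([1.0])    # single sample delay, H(z) = 1/z
print(gain, 10 * math.log10(gain))      # about 2, i.e. about 3 dB
```

An empty coefficient list corresponds to having no noise shaper and gives a gain of unity.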

Noise shaping can be applied within the A/D conversion process to get the high-resolution low frequency performance required for digital audio from a low-resolution high-frequency converter.

Employing a high oversampling factor (typically 64 to 512 times for audio) effectively means that we want the least quantisation noise near dc as that is where the desired signal is found. The simple noise-shaper designed above with noise transfer function (1 - 1/ z ) has this property, but is generally not effective enough.

Instead, we could design the filter such that the noise transfer function becomes (1 - 1/ z )^K where K is an integer equal to the order of the noise shaper required. Expanding with the binomial theorem enables the filter itself, H ( z ), to be calculated thus:

\[ \left( 1 - z^{-1} \right)^K = \sum_{k=0}^{K} \binom{K}{k} (-1)^k z^{-k} = 1 - \sum_{k=1}^{K} (-1)^{k+1} \binom{K}{k} z^{-k} \]

Now this is in the form (1 - H ( z )) and we can implement a suitable filter H ( z ) trivially, as an FIR filter with coefficients given by:

\[ h(k) = (-1)^{k+1} \binom{K}{k} = (-1)^{k+1} \frac{K!}{k!\,(K-k)!}, \quad \text{for } k = 1, 2, \ldots, K \]
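The coefficients that follow from the binomial expansion of (1 - 1/z)^K can be generated as in this sketch. The function name is hypothetical; `math.comb` is the binomial coefficient from the Python standard library (Python 3.8 or later).

```python
import math

def noise_shaper_coefficients(K):
    """h(k) = (-1)**(k+1) * C(K, k) for k = 1..K, so that 1 - H(z) = (1 - 1/z)**K."""
    return [(-1) ** (k + 1) * math.comb(K, k) for k in range(1, K + 1)]

for order in (1, 2, 3):
    print(order, noise_shaper_coefficients(order))
```

For K = 1 this reduces to the single sample delay discussed earlier.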

Noise shaping is also used in a very similar manner in mastering processes such as Deutsche Grammophon's 4D and Sony's Super Bit Mapping. If a 16-bit CD master is to be prepared from a master of higher resolution (for example 20 bits) then we can use noise-shaping to preserve a higher dynamic range where the ear is most sensitive, by forcing the quantisation noise associated with the word-length reduction into frequency bands where the ear is relatively insensitive.

Experimental evidence shows the ear to be most sensitive at around 3kHz, and to have a second, smaller sensitivity peak around 12kHz. If we arrange for the noise gain (1 - H ( ω ')) to have corresponding dips at these frequencies then there will appear to be less background hiss despite a small overall noise power gain due to the noise-shaper.

Typically one would use an FIR filter of order ten to fifteen for this type of audio work. The design of such a filter, whose shape cannot be described conveniently in mathematical terms, is, in general, impossible to solve analytically. Usually such a problem has to be tackled by a computer using an iterative numerical method. This is not a problem as it only has to be done once, at the design stage; after that, a list of a dozen numbers is stored for use in the noise-shaper.

It is important to note that once the word-length has been reduced in this way, subsequent operations in the digital domain will completely undo all the benefits of the noise-shaping. This is due to the inherent quantisation that exists in even a simple operation, such as application of gain in the digital domain, or the application of digital lossy compression methods. For this reason, all digital processing should be done to as high a resolution as is possible; the very last step in the mastering process is then the production of a sixteen bit master, using noise-shaping if desired.

The shaped noise floor is found to be subjectively quieter, though the total quantisation noise power is actually slightly higher.

Practical methods of Digital-to-Analog Conversion

The following text is derived from an article posted on the internet, written by Max Hauser.

There are several methods to convert a digital sample into an analog signal, some of which are explained below.

Multibit feedback noise shaping

The modulator properly predistorts (noise-shapes) the oversampled digital signal sent to a lower-resolution multibit DAC so that when properly analog-postfiltered its output will yield the full 16-bit resolution stored on the CD. This is the oldest scheme common in consumer products, widely popularized by the NV Philips SAA 7030 / TDA 1540 chip set (1983) with a 14-bit internal DAC and 4:1 oversampling yielding 16-bit final resolution.

Under ideal circumstances, a 16-bit converter would exactly convert all 16 bits of the sample data word in a linear fashion. Inaccuracy in the most significant bit (MSB) of the data word can result in an error of half the signal's amplitude. Multibit converters are also plagued by gain error, slew-rate distortion, and zero-crossing distortion. All of these error and distortion types introduce severe harmonic distortion and group delay, thereby perturbing signal stability, imaging, and staging.

One-bit feedback noise shaping

Called "Bitstream" by Philips and "delta-sigma" or pulse-density modulation (PDM) by the research community.

The PDM converter is a true 1-bit technology, i.e. the internal DAC has only one bit of resolution. The ratio of positive to negative pulses (the pulse density) is related to the original 16-bit data word. This signal representation may not seem immediately obvious. A simple model using a room and light bulb helps illustrate what is happening. If a light is on, then the room is brightly lit; if the light switch is off, the room is dark. But if the switch is cycled rapidly on and off, an intermediate intensity can be created.

The 16-bit net D-to-A resolution is accomplished by the oversampling, noise-shaping and postfiltering process. The sample data from the decoder chip is first passed to a low-pass non-recursive 4-times oversampling FIR interpolation filter. This type of filter yields higher quality because it is phase-linear. First-order noise shaping is performed by the accumulator of the multiplier in the filter. The second filtering stage consists of a 32-times oversampling linear interpolator and a 2-times oversampling sample and hold circuit. At this stage, a 352 kHz digital dither signal at -20 dB is added to the sample signal. This reduces nonlinearities induced by quantisation noise. At this point, the total oversampling is 256-times and the data word has increased to 17-bits. The data is then fed at a frequency of 11.2896 MHz into the second order noise shaper. The noise shaper reduces the 17-bit data to a 1-bit stream by using Sigma-Delta modulation. In this process quantisation noise is redistributed away from the audio frequency by as much as 2 orders of magnitude. The bitstream is then converted to an analog form by a switched capacitor network. Because there are only two voltage references in the PDM converter, there is no level matching requirement for improved accuracy. Therefore the linearity errors associated with it are eliminated.

This approach requires a higher oversampling factor, such as 128 or 256, other things being equal.
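The idea that the density of one-bit pulses carries the signal value can be illustrated with a very small modulator. The sketch below is a first-order sigma-delta loop with a hypothetical function name; it is simpler than the second-order noise shaper described above and is meant only to show the principle.

```python
def sigma_delta_1bit(x):
    """First-order sigma-delta modulation of samples in [-1, 1] into a stream of +1/-1 pulses."""
    integrator = 0.0
    bits = []
    for sample in x:
        # Accumulate the difference between the input and the previous output pulse.
        integrator += sample - (bits[-1] if bits else 0.0)
        bits.append(1.0 if integrator >= 0 else -1.0)
    return bits

bits = sigma_delta_1bit([0.25] * 4000)  # constant input, heavily oversampled
print(sum(bits) / len(bits))            # average of the pulses approaches 0.25
```

Averaging (low-pass filtering) the pulse stream recovers the input level, in the same way that a rapidly switched light produces an intermediate brightness.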

Feed-forward or multi-stage noise shaping

Abbreviated "MASH" by Nippon Telegraph and Telephone.

The method is based on pulse-width modulation (PWM). In PWM the width of the signal pulse represents the unique data word, thus it is critical that the PWM steps have exact width and minimum jitter to maximize accuracy and linearity of the output.

A MASH converter is made of a 4-times oversampling digital filter, producing 18-bit data from a 16-bit input sample. It is followed by a series of small first- and second-order one-bit feedback noise shapers in parallel, each of which operates on a quantisation-error (residue) output from the previous stage. The noise shapers convert the 18-bit data into an 11-step quantized format for the PWM after 8-times oversampling. The output from the noise shapers is then fed into a PWM converter. The PWM system is operated at 768 times the original sampling frequency (33.868 MHz). Finally, the output is low-pass filtered.

"MASH" data converters are definitely not "one-bit" data converters in a meaningful sense, although they are commonly made up of one-bit subsections and this sometimes causes confusion. In practice the MASH converter can be considered a "3.5-bit" converter.

Comparison of different conversion methods

Each of these competing modulator topologies has technical strengths and weaknesses that are very involved and do not lend themselves to summary. The signal fidelity in each of them can be excellent but depends on different sets of circuit elements. It is all a matter of "second-order" electrical effects; if the components are all perfect (as they invariably are assumed to be, in popular explanations of this subject matter), then all the techniques work equally well. Audible differences are much more likely due to other design choices inside the DA converter, like the quality of analog-digital ground isolation, or the choice of output-filter op amps etc.

Measurements of THD and linearity error for various 16-, 18-, 20-, and 1-bit converters yield interesting results. PWM and PDM converters show < ± 1 dB linearity error for input signals from -100 to -80 dB and are virtually linear thereafter. Some of the most expensive players on the market with 18- and 20-bit converters using 4-, 8-, 16-, and even 32-times oversampling yield up to ± 4 dB linearity error for signals as high as -75 dB. In the THD tests performed with a -60 dB 1 kHz sine wave test signal, the expensive multi-bit players showed harmonics up to the 13th at levels greater than -110 dB. Only the PDM converter was able to hold all non-fundamental harmonics under -110 dB.



Copyright © 2001, Marc Heijligers and the DAC group - All rights reserved.