Digital Music: How It Works

As part of our series on The Source of Good Music, we have already touched on several areas of "digital music". When we compared different storage formats for music, two concepts kept coming up in assessing the quality of the data: sample rate and word length. But what exactly do they mean? And what happens when analog signals are translated into the digital world? Even at the risk of stepping into a bottomless pit, we will try to provide some basic answers, focusing on the widespread PCM method (Pulse Code Modulation).

The A/D Conversion

Our investigation starts from the following scenario: imagine a song on a freshly pressed record (yum). We want to digitize this piece of music in CD quality, that is, with a sampling rate of 44.1 kHz (44,100 Hz) and a word length of 16 bits. Before we put this plan into action, a quick note on terminology: the sampling rate is also called the sampling frequency or clock rate, and the word length is also called bit depth or sampling depth. We're going to need all of these terms…

At this stage, the musical information of our analog piece is encoded in the groove of the record, and we speak of a time-continuous and value-continuous signal. However finely we divide the playing time of the record, whether into milliseconds, microseconds or nanoseconds, there is always a section of the groove that represents the sound of exactly that musical moment.

The same principle applies to sound waves traveling through the air, or to the voltage that makes a loudspeaker diaphragm vibrate: these, too, are time- and value-continuous signals. Interruptions in the groove, by contrast, cause unpleasant noises and dropouts that leave the music information incomplete. Nor would anyone expect a sound wave to have a piece missing. Yet a break of exactly this kind is what occurs in analog-to-digital conversion.

So let us transfer our (of course undamaged) record into the digital dimension. Here is what happens during A/D conversion: a fixed clock is defined in advance, at which the analog signal from our record is measured again and again. We decide that this happens exactly 44,100 times per second, so our sampling frequency is 44.1 kHz. At each of these points in time, the measurement is completed by recording the momentary amplitude of the analog signal. This process is called 'quantization'. To carry it out, we define a range of values adapted to the signal (more on that later) that contains a limited number of distinct states. The states must be distinct because they are coded in binary (0 or 1), which in turn means that the number of possible states is determined by the number of bits (binary digits). The number of possible states is calculated with the general formula:

Number of possible values/states = 2^n (n is the number of bits)

We choose a word length of 16 bits and, using the formula, get 2^16 = 65,536 different states onto which the dynamics of our song are mapped. The somewhat artificial-sounding term "word length" comes from English computing usage, where a data unit of 16 bits is often referred to as a "word".
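To make the formula concrete, here is a small Python sketch that evaluates 2^n for a few common word lengths. The function name `quantization_states` is our own, chosen purely for illustration.

```python
def quantization_states(bits: int) -> int:
    """Number of distinct values a PCM word of `bits` bits can hold: 2^n."""
    return 2 ** bits


# Print the number of available states for some typical word lengths.
for n in (1, 8, 16, 24):
    print(f"{n:2d} bit -> {quantization_states(n):,} states")
```

For 16 bits this prints 65,536 states; for 24 bits, 16,777,216.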

At each sampling step, the actual spectral and dynamic events are transferred approximately onto our grid of clock and bit depth. As a result, we obtain a digital representation of our original signal that is no longer time- and value-continuous, but time- and value-discrete.
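The two steps described above, sampling at a fixed clock and rounding each measurement to one of the available states, can be sketched in a few lines of Python. This is a simplified model under our own assumptions: the "analog" signal is a mathematical function of time with values between -1.0 and 1.0, and the hypothetical helper `sample_and_quantize` simply rounds to signed 16-bit integers. Real converters are considerably more involved.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (CD quality)
BIT_DEPTH = 16        # bits per sample (CD quality)


def sample_and_quantize(signal, duration_s, sample_rate=SAMPLE_RATE, bits=BIT_DEPTH):
    """Sample a continuous signal (a function of time returning -1.0..1.0)
    at fixed intervals and round each value to the nearest integer code.
    Simplified model, not how a real A/D converter is built."""
    max_code = 2 ** (bits - 1) - 1          # 32767 for 16 bit
    n_samples = int(duration_s * sample_rate)
    codes = []
    for k in range(n_samples):
        t = k / sample_rate                 # time discretization: fixed clock
        x = signal(t)                       # "analog" value at that instant
        code = round(x * max_code)          # value discretization: quantization
        codes.append(max(-max_code - 1, min(max_code, code)))  # stay in range
    return codes


def tone(t):
    """A 1 kHz test tone at half of full scale (assumed stand-in for music)."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


codes = sample_and_quantize(tone, duration_s=0.001)
print(len(codes), codes[:5])
```

One millisecond of audio at 44.1 kHz yields 44 integer samples; everything the signal did between those instants, and every detail finer than one code step, is gone.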

In theory, an infinitely high sampling rate and an infinite word length would produce a quasi-perfect image. In practice, however, technological limitations introduce errors and inaccuracies during analog-to-digital conversion, both on the time axis and in quantization. Some of these phenomena are discussed below. For now, let us simply note that the digital signal consists only of excerpts of the analog signal and represents the original only approximately, while the rest is lost forever.

Sampling rate and cut-off frequency

In view of that last sentence, a few questions almost ask themselves: Can we listen to the digital signal at all, and why go to all this effort if the result is a flawed and incomplete signal? The second question can be answered somewhat bluntly: without digital audio, instead of an iPod you would be taking your whole record collection on the bus. Storage efficiency is the key term here. And while our records are likely to show the ravages of time in 100 years, our digital recording will still sound as fresh as on the first day. More seriously: digital storage is a whole topic in itself and will not be pursued further here.

So instead we turn to the quality of our digital signal. As already noted, in reality it is anything but a perfect image. And yet, with limited means, it can evidently create the impression that we are dealing with the same signal. This exploits the fact that human perception is subject to certain limitations and can be fooled wonderfully. With a sufficiently high sampling rate and a correspondingly large word length, we do not hear a mutilated fragment but our song from the analog record, in our case at least in CD quality.

The clock rate of 44.1 kHz and the sampling depth of 16 bits, which established themselves as a quasi-standard with the introduction of the CD, were not chosen at random. Seen against the technology of the time, they are a compromise between quality and a manageable amount of data produced during digitization. Again, we are mainly interested in the quality aspect: the so-called Nyquist-Shannon sampling theorem (also known as the WKS sampling theorem) states that, for a (theoretically) exact digital image and a complete reconstruction of our (theoretically) perfect digital signal, the sampling rate must be more than twice the highest frequency occurring in the signal. So we divide our 44.1 kHz by two and obtain an upper cutoff frequency of 22.05 kHz. Given the sampling rate we selected, this is the highest frequency in our song that may be present during A/D conversion. This raises the question: what happens to frequencies above this limit, which may well occur in reality, and possibly on our record too?
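A quick calculation illustrates both sides of the compromise: the cutoff frequency and the amount of data. The helper names are hypothetical, and the data rate assumes uncompressed two-channel (stereo) PCM, as on an audio CD.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Upper cutoff frequency for a given sampling rate: half the sampling rate."""
    return sample_rate_hz / 2


def pcm_bit_rate(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits * channels


print(nyquist_frequency(44_100))    # 22050.0
print(nyquist_frequency(88_200))    # 44100.0
print(pcm_bit_rate(44_100, 16, 2))  # 1411200 (stereo CD audio)
```

Doubling the sampling rate doubles the cutoff frequency, but it also doubles the data rate, which is exactly the trade-off the CD format had to weigh.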

Errors and filters

The answer is simple: before conversion, they are removed by filters as far as possible. The problem is that the sampling clock is too slow for information above the cutoff frequency. Such faster frequencies still produce sample values, but these no longer correspond to the musical material that was actually present. The consequences are irreparable aliasing errors: the out-of-range components reappear as spurious frequencies below the cutoff, audible as distortion and noise that degrade our digital version. Ideally, an anti-aliasing filter would cut off cleanly at the cutoff frequency so that no aliasing occurs at all. Unfortunately, this is not possible in reality: the attenuation of high frequencies already begins well below the cutoff frequency of 22.05 kHz, extends over a wider frequency range, and still cannot always prevent every aliasing error. The buffer between the upper limit of human hearing (at best 20 kHz, and only in young listeners) and the cutoff frequency for CD-quality digitization is therefore no accident: it is meant to keep the anti-aliasing filter from impairing the audible frequency spectrum wherever possible.
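Why frequencies above the cutoff are so harmful can be shown with a little arithmetic. After sampling, a tone above half the sampling rate cannot be told apart from a tone "folded back" below it. The sketch below, using a hypothetical helper `alias_frequency`, assumes pure sine tones and no anti-aliasing filter at all.

```python
def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone of f_hz appears after sampling at
    sample_rate_hz, assuming no anti-aliasing filter."""
    n = round(f_hz / sample_rate_hz)   # nearest multiple of the sampling rate
    return abs(f_hz - n * sample_rate_hz)


for f in (10_000, 21_000, 25_000, 30_000):
    print(f, "Hz ->", alias_frequency(f, 44_100), "Hz")
```

At 44.1 kHz, tones below the cutoff keep their frequency, while a 25 kHz tone reappears at 19.1 kHz and a 30 kHz tone at 14.1 kHz, squarely in the audible range and unrelated to the music.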

The reason a higher resolution was not chosen when digital formats, specifically the CD, were introduced lies, as already mentioned, in the storage limits of the time and in what converter architectures could feasibly achieve. Today, much more precise converters are feasible and much larger amounts of data can be stored. The problem cannot be eliminated any more than before, but it can be pushed well out of the audible range. For example, doubling the sampling rate to 88.2 kHz results in an upper cutoff frequency of 44.1 kHz. Under these circumstances, the demands on the anti-aliasing filter are of course considerably lower. If such high frequencies occur in our signal at all, the filter now acts at most in a range that is relevant only psychoacoustically. The threshold of human hearing is no longer affected by the anti-aliasing filter.

The problem of aliasing and the use of anti-aliasing filters is a prime example of the sources of error that arise in these conversion processes and that call for appropriate countermeasures. For the sake of completeness, we want to look at a few more critical points without going too far into depth. First, there is the jitter effect, which is also related to the time dimension. For correct conversion, each sample must be assigned to a clearly identifiable quantization state at exactly the right instant. If our clock fluctuates slightly, it is suddenly no longer clear which signal information belongs to which point in time. Spurious frequency components are the unsightly result. To keep the clock as steady as possible, converters are equipped with so-called anti-jitter circuits.
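How much a wobbly clock matters can be estimated with a small simulation. The sketch below is a simplified model under our own assumptions: a full-scale sine, Gaussian random timing errors with a given RMS value, no quantization, and no anti-jitter circuitry. It compares the simulated signal-to-error ratio with the commonly cited approximation SNR ≈ -20·log10(2π·f·σt). The function names are hypothetical.

```python
import math
import random


def jitter_snr_db(freq_hz, jitter_rms_s, sample_rate=44_100, n=20_000, seed=1):
    """Simulate sampling a full-scale sine with random clock jitter and return
    the signal-to-error ratio in dB. Assumes Gaussian jitter (a modeling choice)."""
    rng = random.Random(seed)
    signal_power = 0.0
    error_power = 0.0
    for k in range(n):
        t = k / sample_rate                      # ideal sampling instant
        dt = rng.gauss(0.0, jitter_rms_s)        # random clock deviation
        ideal = math.sin(2 * math.pi * freq_hz * t)
        actual = math.sin(2 * math.pi * freq_hz * (t + dt))
        signal_power += ideal ** 2
        error_power += (actual - ideal) ** 2     # error caused only by timing
    return 10 * math.log10(signal_power / error_power)


def jitter_snr_theory_db(freq_hz, jitter_rms_s):
    """Textbook approximation for a full-scale sine: -20*log10(2*pi*f*sigma_t)."""
    return -20 * math.log10(2 * math.pi * freq_hz * jitter_rms_s)


print(round(jitter_snr_db(10_000, 1e-9), 1), round(jitter_snr_theory_db(10_000, 1e-9), 1))
```

With 1 ns of RMS jitter and a 10 kHz tone, both values come out at roughly 84 dB. Higher frequencies or larger timing errors lower that figure, which is why clock stability matters most for treble content.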

Another problem has less to do with timing than with quantization. As noted above, at each sampling instant the amplitude, and thus the information about the dynamics of our signal, is written into our bit grid only approximately. The inevitable rounding errors manifest themselves as quantization noise, whose acoustic impression resembles white noise. While it usually goes unnoticed at high signal levels, quantization noise can audibly muddy the sound in quiet passages. The common approach is first to add a small amount of random noise before rounding, a process called dithering, which decouples the rounding error from the music so that it becomes a steady, benign noise instead of signal-dependent distortion. So-called noise-shaping algorithms then shift this noise toward higher frequencies, where the human ear is less sensitive.
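The effect of dither on very quiet signals can be demonstrated with a deliberately extreme example: a constant level of just 0.3 quantization steps. The sketch assumes triangular (TPDF) dither, one common choice, and leaves noise shaping out entirely. `quantize` and `tpdf_dither` are illustrative helpers, not a library API.

```python
import random


def quantize(x: float) -> int:
    """Round a value (in units of one quantization step) to the nearest code."""
    return round(x)


def tpdf_dither(rng: random.Random) -> float:
    """Triangular-PDF dither: sum of two independent uniform values in [-0.5, 0.5)."""
    return (rng.random() - 0.5) + (rng.random() - 0.5)


rng = random.Random(42)
level = 0.3    # a very quiet signal: only 0.3 of one quantization step
n = 100_000

plain = [quantize(level) for _ in range(n)]
dithered = [quantize(level + tpdf_dither(rng)) for _ in range(n)]

print(sum(plain) / n)     # 0.0
print(sum(dithered) / n)  # approximately 0.3
```

Without dither the signal simply vanishes, because every sample rounds to 0. With dither, individual samples jump between neighboring codes, but on average they reproduce the original 0.3 steps: the rounding error has been traded for a steady low-level noise.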


Finally, let us dwell a moment longer on quantization. Besides quantization noise, the fact that the sampling depth allows only a limited number of states also means that the total dynamic range our digital signal can represent is limited. In theory, it amounts to approximately 6 dB per bit, so our 16 bits correspond to a dynamic range, or signal-to-noise ratio, of about 96 dB. The lower end is determined by the noise floor, which consists in part of the quantization noise. If the level of the music signal falls below the noise floor, it is completely masked by the noise. The upper limit is determined by the maximum level that can be represented without distortion. The dynamic range that is actually usable can be reduced further, for example by a high noise level.
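The "6 dB per bit" rule follows from the ratio between full scale and a single quantization step, expressed in decibels. The sketch below computes it alongside the widely used textbook figure for a full-scale sine wave, 6.02·n + 1.76 dB. The function names are our own.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step: 20*log10(2^n)."""
    return 20 * math.log10(2 ** bits)


def sine_sqnr_db(bits: int) -> float:
    """Textbook signal-to-quantization-noise ratio of a full-scale sine."""
    return 6.02 * bits + 1.76


for n in (16, 24):
    print(n, round(dynamic_range_db(n), 1), round(sine_sqnr_db(n), 1))
```

For 16 bits this gives about 96.3 dB (98.1 dB by the sine-wave formula); for 24 bits, about 144.5 dB.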

At the upper limit, it is important that the analog signal never exceeds it during conversion. Otherwise the converter is overdriven, which is also known as distortion or clipping. While analog overdrive, for example in vacuum tube circuits or tape machines, can contribute a characteristic timbre when used in moderation, digital clipping unfortunately sounds extremely harsh right away and should be avoided in any case. Besides careful level setting of the signal to be converted, or the use of a limiter that dynamically restricts the signal before conversion, there is of course the option of providing more bits, so that more headroom can be left below full scale. Recalling the formula, we find that a resolution of 24 bits already offers 16,777,216 states and a theoretical dynamic range of about 144 dB. More bits also reduce the rounding errors and thus the quantization noise. Combined with the doubled sampling rate of 88.2 kHz, this yields a digital signal that is clearly superior to our file at 44.1 kHz and 16 bits.
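What digital clipping does to a waveform is easy to show: any value beyond the largest code is simply cut off. The sketch below assumes samples scaled to the range -1.0 to 1.0 and a hypothetical `to_int16` conversion that hard-clips at the 16-bit limits; a real converter's overload behavior may differ.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit code


def to_int16(x: float) -> int:
    """Convert a sample in [-1.0, 1.0] to a 16-bit code, hard-clipping
    anything beyond full scale (hypothetical helper for illustration)."""
    code = round(x * FULL_SCALE)
    return max(-FULL_SCALE - 1, min(FULL_SCALE, code))


# A sine driven 50 % beyond full scale: its peaks are cut flat.
samples = [to_int16(1.5 * math.sin(2 * math.pi * k / 100)) for k in range(100)]
clipped = sum(1 for s in samples if s in (FULL_SCALE, -FULL_SCALE - 1))
print(f"{clipped} of {len(samples)} samples clipped")
```

The tops and bottoms of the overdriven sine come out as flat plateaus, and those sharp corners are what add the harsh distortion products.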


We have reached the end of our trip into the world of analog-to-digital conversion. Although we have covered a number of important aspects, one could go much wider and deeper into the matter. For a basic overview, however, we will confine ourselves to what has been said so far. As already indicated, PCM is not the only way to digitize music. There is, for example, a method called Direct Stream Digital (DSD), whose advantages and disadvantages have been debated intensely, especially in the recent past. We will devote a separate article to that topic and will certainly pick up on some of the content of this one there.
