Introduction to Audio/Sound Properties
Audio is ... weird. From one perspective, we can view it in relation to humans: sound is what we hear, the information that gets to our brain, etc. From another perspective, we can view it as a pressure wave that travels through some medium such as water or air (movement of atoms, molecules, etc.). From yet another perspective, we can view sound as an electric signal that travels down a speaker wire or a telephone line.
We normally hear sound when it enters our ears and hits (and vibrates) our diaphragm (ear drum).
As far as this sound wave is concerned, it can travel through virtually anything. Anything that has molecules that vibrate can usually transfer sound. This includes things like air, water, etc. Usually the denser the material, the better it is at sound propagation (which is why sound travels so much farther in water). In the old days folks would bang on the rail in case of some emergency, which supposedly could've been heard many miles away.
The pressure wave that sound creates is a longitudinal wave: the particles of the medium oscillate back and forth along the same direction in which the wave travels. Contrast that with a transverse wave (an example may be an electromagnetic wave or, roughly, an ocean wave), where the undulations are perpendicular to the direction of the wave.
The speed of sound depends on what it is traveling through and on the temperature. In air, at sea level, at 20 degrees Celsius (68 Fahrenheit), the speed of sound is about 343.8 meters per second (or 1128 feet per second).
The human ear can hear sound from about 20Hz to about 22kHz, depending on the person's age and health. This is the range of audible frequencies. Some animals (dogs, bats, etc.) can hear higher frequencies (ultrasound).
Knowing the speed of sound and the frequency, we can calculate sound wavelengths: at 22kHz, the wavelength is around 1.56cm, while at 20Hz, the wavelength is 17.19 meters.
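The wavelength figures above can be reproduced with a small sketch. It assumes a speed of sound of 343.8 m/s (the value that matches 1128 feet per second); the function name `wavelength_m` is made up for illustration.

```python
# Assumed speed of sound in air at 20 degrees Celsius (matches 1128 ft/s).
SPEED_OF_SOUND_M_S = 343.8


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Return the wavelength in meters: speed divided by frequency."""
    return speed_m_s / frequency_hz


for f in (20, 22_000):
    print(f"{f} Hz -> {wavelength_m(f):.4f} m")
# 20 Hz -> 17.1900 m
# 22000 Hz -> 0.0156 m
```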
The sound wave amplitude is perceived as `loudness'; the higher the wave the louder the sound. Note that just by plain common sense, our ears do not perceive wavelengths (our head is not 17 something meters long for that).
It is impractical (or rather inconvenient) to measure sound directly. The trouble is that the scale is very large. From the tiniest sound that we can hear to the loudest possible sound there are more than ten orders of magnitude. For example, let's say we measure the quietest whisper as `1'; then the loudest possible sound would be `10^11' times that.
So we generally avoid all that, and use a logarithmic scale. Basically we only care about the exponent. If the sound is 10^5, then we just use the 5. Often this is itself multiplied by 10 (or 20) to get something like `50' dB, or 50 decibels.
The decibel (dB) unit is defined as the base-10 logarithm of a ratio of two powers, which is then multiplied by 10. (If the factor of 10 is not used, the unit is the `bel', but nobody uses that.)
The decibel system can be used to measure a ratio of pretty much anything. For sound, we have to use acoustical power. The numerator is the power we're interested in, and the denominator is the faintest audible sound (which is usually considered to be 10^-6 microwatt, or 10^-12 watt).
If a stereo generates 1 watt of acoustical power, then we can calculate the dB like so:
10 * log10(10^6 / 10^-6) = 10 * log10(10^12) = 10 * 12 = 120 dB.
10^6 is 1 watt (we're doing the calculation in microwatts).
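The same calculation can be sketched in Python, working in watts rather than microwatts. The reference power of 10^-12 watt comes from the text; the function name `power_to_db` is just an illustrative choice.

```python
import math

# Faintest audible sound, as given in the text: 10^-12 watt.
REFERENCE_POWER_W = 1e-12


def power_to_db(power_w, reference_w=REFERENCE_POWER_W):
    """Return 10 * log10(power / reference), i.e. the level in decibels."""
    return 10 * math.log10(power_w / reference_w)


print(f"{power_to_db(1.0):.1f} dB")    # 1 watt of acoustical power -> 120.0 dB
print(f"{power_to_db(1e-12):.1f} dB")  # the reference itself -> 0.0 dB
```

Note that every factor of 10 in power adds 10 dB, which is exactly the `only care about the exponent' idea.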
Amm... 120 dB, as you can guess from the above discussion, is just about at the limit of what we'd consider the loudest possible sound.
According to the datasheet for my Sony MDR-J10 headphones, they have a frequency response of 20-20,000 Hz (pretty much all a human ear can hear), with a sensitivity of 104 dB (on headphone datasheets this figure is usually the sound pressure level produced for a given input, typically 1 milliwatt, rather than a signal-to-noise ratio). Another, slightly more expensive set of headphones, the Sennheiser MX400, is rated at 18-20,000 Hz and around 119 dB. You'd be surprised how badly some of the cheaper headphones perform.
There are many other sound-related units. One is dB SPL (sound pressure level), which is a more practical unit for measuring sound.
There is a unit that weights frequencies roughly the way the human ear does (less sensitive to low and to very high frequencies): that's dBA (ANSI standard). There are also dBB and dBC. In the electrical engineering domain, there are also dBm, dBm0, and dBrn.
We will not cover these in this class.
Just like we can represent an image (picture) by a series of pixels, we can do the same thing to sound; more accurately, to the sound wave.
The sound is recorded into a microphone, which has a vibrating diaphragm (much like our ear), which generates an electric current (electric pulses) which exactly corresponds to the sound wave.
The sound is then sampled at discrete intervals in time, and these samples are stored. This is called digitizing the sound, and is performed by an analog-to-digital converter (ADC) device.
After this step, the sound is just a series of numbers representing the acoustic wave. We can edit them, manipulate them, just as we can edit and manipulate images on a computer.
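As a rough sketch of what `a series of numbers' means, the following takes readings of a pure sine tone at evenly spaced times. A real ADC measures a voltage instead of evaluating a formula; the 440 Hz tone, the 8000 Hz rate, and the name `sample_sine` are assumptions made for illustration.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return readings of a sine wave taken every 1/sample_rate_hz seconds."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


# One millisecond of a 440 Hz tone sampled 8000 times per second.
samples = sample_sine(frequency_hz=440, sample_rate_hz=8000, duration_s=0.001)
print(len(samples))  # 8
print([round(s, 3) for s in samples])
```

Once the sound is in this form, editing it is ordinary list manipulation: scaling every value changes the volume, reversing the list plays it backwards, and so on.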
The primary purpose, of course, is usually to play the sound back. At this point, the stream of sound `samples' is converted back to an analog electrical signal via a digital-to-analog converter (DAC). This signal is then fed into a speaker, where the current vibrates the speaker diaphragm, which vibrates the air particles, which is what we hear (the air particles in turn vibrate your eardrums, etc.).
Notice the mirror nature of the recording and playback processes. It turns out that the hardware that does the recording and playback is conceptually very similar. For example, your speaker can also double as a microphone. The primary job of a speaker is: given the current, make the air vibrate. The primary job of a microphone is: given the vibrating air, turn that into current.
Sampling and Sampling Rate
Sampling appears to be trivial. The sound is a continuous signal, and you just have to take readings of that signal at discrete intervals. That's it.
It turns out that if the sampling rate is too low, then you might be losing some of that signal data (example in class). A rash solution is to sample the sound at a much higher rate, which may needlessly waste space. The trick is finding the balance between the minimum and what's overkill.
If we sample the sound at a bit over the Nyquist rate, then we don't lose data, and we don't waste space by oversampling. The concept is actually very simple: we sample the sound at twice the highest frequency we want to capture.
In the case of the human ear, the audible frequencies range from 16-20Hz to around 20-22kHz. Thus, if we want to record everything the human ear can possibly hear, we need to be sampling at a bit over 44kHz, or something like 44100Hz, or 44.1kHz, which is what audio compact discs (CDs) use.
By sampling at this rate, we effectively remove higher frequencies. For human listeners that's not a problem (since we cannot hear anything beyond 22kHz anyway).
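A small sketch of what goes wrong above half the sampling rate, assuming pure sine tones and a hypothetical 8000 Hz sampling rate: a 5000 Hz tone produces exactly the same readings (up to sign) as a 3000 Hz tone, so after sampling the two cannot be told apart. Real recorders typically filter out the too-high frequencies before sampling for this reason.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, n_samples):
    """Return n_samples readings of a unit sine wave."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


RATE_HZ = 8000                            # assumed rate; half of it is 4000 Hz
high = sample_sine(5000, RATE_HZ, 16)     # above half the sampling rate
low = sample_sine(3000, RATE_HZ, 16)      # below half the sampling rate

# The 5000 Hz readings are the 3000 Hz readings with the sign flipped.
indistinguishable = all(
    math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low)
)
print(indistinguishable)  # True
```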
Some applications sample sound at lower rates. For example, the telephone samples at 8kHz, which effectively means that any frequency higher than 4kHz is wiped out. That is precisely why on the phone we sometimes cannot tell the difference between certain letters (like "f" and "s"), why we need to spell our names (s-as-in-Sam, and f-as-in-Frank), and why the listener still manages to make mistakes when writing them down.
Another issue we have to care about when sampling sound is how big to make each sample. Basically, how many distinct values can a sample of the wave take? Most sound is encoded at 8 or 16bit. Some (very rare) high end sound cards have 32 bit samples. The DV standard (Digital Video) allows for 8 or 12bit sound. Audio CD samples are 16bit.
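To make the sample size concrete, here is a sketch that maps a reading in the range -1.0 to 1.0 onto a signed integer of a given bit depth. The signed mapping and the name `quantize` are assumptions for illustration; actual formats differ in the details (some store 8 bit samples as unsigned, for example).

```python
def quantize(value, bits):
    """Map a value in [-1.0, 1.0] to a signed integer code of the given size."""
    max_code = 2 ** (bits - 1) - 1          # 127 for 8 bit, 32767 for 16 bit
    clipped = max(-1.0, min(1.0, value))    # out-of-range input is clipped
    return int(round(clipped * max_code))


print(2 ** 8, 2 ** 16)        # distinct values available: 256 65536
print(quantize(0.25, 8))      # 32
print(quantize(0.25, 16))     # 8192
```

With 8 bits the same reading lands on one of only 256 steps, so small differences in the wave are lost; 16 bits gives 65536 steps.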
Pulse Code Modulation (PCM)
Pulse code modulation is probably the most convenient way of storing sound samples. Basically this just means that we have an array of these sample values. That's it.
If our sample size is 8 bit, then this array is made of bytes. If sample size is 16 bit, then the array is of words.
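A minimal sketch of such arrays using Python's standard `array` module; the sample values themselves are made up.

```python
from array import array

# 8 bit samples: one unsigned byte each (type code "B").
samples_8bit = array("B", [128, 160, 192, 160, 128])

# 16 bit samples: one signed 2-byte word each (type code "h").
samples_16bit = array("h", [0, 8192, 16384, 8192, 0])

print(samples_8bit.itemsize, len(samples_8bit.tobytes()))    # 1 5
print(samples_16bit.itemsize, len(samples_16bit.tobytes()))  # 2 10
```

The raw bytes from `tobytes()` are the PCM data: nothing but the sample values, one after another.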
For example, stereo sound with frequencies up to 22kHz and 16bit samples needs to be sampled at 44.1kHz, or 44100 times per second. Each sample will be 16 bits, and we'll have 2 channels (one left and one right, for stereo). Thus, we'll have 44100 * 16 * 2, or 1,411,200 bits/s, or (divide that by 8 to get bytes) 176,400 bytes/s.
So if we have 3 minutes of sound encoded via the above scheme, it will occupy 31,752,000 bytes or (divided by 1,048,576 to get MB) around 30MB. Sounds about right for ripped songs from CD to WAV files, heh?
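The arithmetic above can be checked with a short sketch. The function name `pcm_bytes_per_second` is an illustrative choice, and the result ignores any file header a format like WAV adds.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Return the raw PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


rate = pcm_bytes_per_second(44100, 16, 2)
size = rate * 3 * 60                      # three minutes of sound

print(rate)                               # 176400
print(size)                               # 31752000
print(f"{size / 1_048_576:.2f} MB")       # 30.28 MB
```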