PA8W Amateur Radio
Basic information on speech
When working SSB, all the information we try to transfer to the other side is contained in the spoken word.
The frequency spectrum:
The graph above shows how the energy in our natural speech is distributed over the frequency spectrum. You can see that most of the power is needed for the lower frequencies, up to 700Hz. This part of the spectrum is the most energy-consuming for your transmitter and will easily overdrive the power stage, as you can hear in the audio samples on the left.
In fact, if the above curve is fed unmodified to your transmitter, your transmitter's limiter will be activated mainly by this lower region, pulling down the more important higher frequencies as well!
Or, when your limiter is not adjusted properly, the lower frequencies will clip quite happily, polluting your audio with artificial harmonics.
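To make the "artificial harmonics" visible in numbers, here is a minimal sketch in plain Python. It hard-clips a single 200Hz tone and measures what appears at the harmonic frequencies. The sample rate, the tone frequency and the clip level are illustrative assumptions, not values from a real transmitter, and a real limiter behaves more gently than this hard clipper.

```python
import math

SAMPLE_RATE = 8000   # Hz, assumed for this sketch
F0 = 200             # Hz, assumed low-frequency speech component
N = 8000             # one second, so every harmonic fits a whole number of cycles
CLIP_LEVEL = 0.5     # assumed clipping threshold (half of the tone's peak)

def tone_amplitude(samples, freq_hz, sample_rate):
    """Amplitude of the component at freq_hz, measured with a single-bin DFT."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

# A clean tone and the same tone after symmetric hard clipping.
clean = [math.sin(2 * math.pi * F0 * i / SAMPLE_RATE) for i in range(N)]
clipped = [max(-CLIP_LEVEL, min(CLIP_LEVEL, s)) for s in clean]

# Measure the fundamental and a few harmonics in both signals.
clean_amps = {h: tone_amplitude(clean, h * F0, SAMPLE_RATE) for h in (1, 2, 3, 5)}
clipped_amps = {h: tone_amplitude(clipped, h * F0, SAMPLE_RATE) for h in (1, 2, 3, 5)}

for h in (1, 2, 3, 5):
    print(f"{h * F0:5d} Hz  clean {clean_amps[h]:.3f}  clipped {clipped_amps[h]:.3f}")
```

The clean tone only shows energy at 200Hz. The clipped version also shows energy at 600Hz and 1000Hz, right in the region that matters for readability.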
If we investigate which frequencies are the most important ones for intelligibility, we will find that higher frequencies show increasing importance!
The fundamental frequency of a man's voice normally lies between 60 and 120Hz. A woman's voice is pitched about one octave higher, at 120 to 240Hz.
The fundamental frequency and its first few harmonics carry hardly any intelligibility at all. That's up to about 250Hz for a man's voice.
The part above 1000Hz contains most of the speech intelligibility, but there the natural energy level drops by around 10dB, and drops even more towards the 2800Hz mark. The 1000Hz-2800Hz region is of the utmost importance for the readability of speech in an SSB chain.
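For a feeling of what a 10dB drop means, the decibel figure can be converted into plain ratios. This is just the standard dB arithmetic, with helper names made up for this sketch:

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a power ratio."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a voltage (amplitude) ratio."""
    return 10 ** (db / 20)

drop_db = -10  # the approximate drop above 1000Hz mentioned in the text
power_ratio = db_to_power_ratio(drop_db)
amplitude_ratio = db_to_amplitude_ratio(drop_db)

print(f"{drop_db} dB -> power x{power_ratio:.3f}, amplitude x{amplitude_ratio:.3f}")
```

So the region that carries most of the intelligibility naturally arrives with only about one tenth of the power, or roughly a third of the voltage, of the low region.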
So, the low region could be considered useless?
Yes and no; cut off everything below 700Hz and your speech will be perfectly readable.
But, the lower region does add some flavour to the cake.
Without the lower region, speech sounds metallic, crispy, not natural, and really unpleasant, like these samples:
When we slightly reduce the low region, however, speech will keep enough body to sound natural and pleasant, and will be very easy to copy, even over longer periods of time. The way to go is a reduced low region (approx. -10dB) and a lifted high end (approx. +6dB), for two good reasons:
1. It will waste less transmitter energy and reduce the risk of overload.
2. It will emphasize speech intelligibility.
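As a rough sketch, such a correction can be written down as a target gain curve. Only the -10dB and +6dB figures come from the text above. The corner frequencies (300, 1000 and 2800Hz) and the straight-line interpolation on a logarithmic frequency axis are assumptions for illustration, and your own voice and microphone will probably want different values.

```python
import math

# Assumed breakpoints as (frequency in Hz, gain in dB). Only the -10 dB and
# +6 dB figures are taken from the text; the frequencies are illustrative.
EQ_POINTS = [(300.0, -10.0), (1000.0, 0.0), (2800.0, 6.0)]

def eq_gain_db(freq_hz):
    """Target gain in dB, interpolated linearly over log-frequency."""
    if freq_hz <= EQ_POINTS[0][0]:
        return EQ_POINTS[0][1]       # flat shelf below the first breakpoint
    if freq_hz >= EQ_POINTS[-1][0]:
        return EQ_POINTS[-1][1]      # flat shelf above the last breakpoint
    for (f_lo, g_lo), (f_hi, g_hi) in zip(EQ_POINTS, EQ_POINTS[1:]):
        if f_lo <= freq_hz <= f_hi:
            t = math.log(freq_hz / f_lo) / math.log(f_hi / f_lo)
            return g_lo + t * (g_hi - g_lo)

def amplitude_factor(gain_db):
    """Voltage multiplier that corresponds to a gain in dB."""
    return 10 ** (gain_db / 20)

for f in (200, 300, 700, 1000, 2000, 2800):
    g = eq_gain_db(f)
    print(f"{f:5d} Hz  {g:+6.1f} dB  x{amplitude_factor(g):.2f}")
```

A table like this is only a starting point for setting an equalizer. The final judgement is still made by looking at the transmitted spectrum and by listening.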
Overall, your intelligibility and your dynamics are served best if your audio contains no dominant frequencies and no notches. In other words, in average speech, all frequencies from 300Hz to 2800Hz should show about equal energy density. That is not very easy to judge with the untrained ear, but it can easily be checked with a spectrogram (see the other items in the audio menu).
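The same check can be done numerically by comparing the energy in a few bands between 300Hz and 2800Hz. The sketch below uses a plain DFT from the Python standard library and a synthetic three-tone signal as a stand-in for recorded speech. The band edges, the tone levels and the function name are assumptions chosen for the example.

```python
import cmath
import math

def band_energies_db(samples, sample_rate, band_edges_hz):
    """Energy per band in dB, relative to the strongest band (plain DFT)."""
    n = len(samples)
    energies = [0.0] * (len(band_edges_hz) - 1)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        for b in range(len(energies)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                # DFT bin k, computed directly; slow but fine for short signals.
                x = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(samples))
                energies[b] += abs(x) ** 2
                break
    levels = [10 * math.log10(e) if e > 0 else float("-inf") for e in energies]
    peak = max(levels)
    return [lvl - peak for lvl in levels]

SAMPLE_RATE = 8000                   # Hz, assumed
N = 400                              # 50 ms of signal, 20 Hz bin spacing
BAND_EDGES = [300, 800, 1800, 2800]  # Hz, assumed three-band split

# Synthetic "bass-heavy voice": (frequency in Hz, amplitude) per tone.
TONES = [(400, 1.0), (1200, 0.3), (2200, 0.1)]
signal = [sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f, a in TONES)
          for i in range(N)]

relative_db = band_energies_db(signal, SAMPLE_RATE, BAND_EDGES)
for lo, hi, lvl in zip(BAND_EDGES, BAND_EDGES[1:], relative_db):
    print(f"{lo:4d}-{hi:4d} Hz: {lvl:6.1f} dB")
```

For well-balanced audio the three figures would end up close together. In this deliberately bass-heavy example the upper bands sit far below the lowest one, which is the situation the equalization is meant to correct.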
Note that "energy uniformly and evenly distributed over the entire SSB bandwidth" will in most cases mean something entirely different from a flat frequency response of the audio chain. An honest, flat frequency response is what you need to reproduce HiFi music, so that everything from the bass guitar up to the triangle finds its place in the spectrum without any colouration.
(Achieving that at very high sound levels is what I do for a living...)
Hams, on the contrary, try to communicate via a lousy environment, so that's a totally different game.
What frequency response is needed for a uniform energy distribution depends very much on your voice, your microphone technique, the mike itself, etc.
So there's no off-the-shelf solution, but that's where it really gets interesting, isn't it?