Fun with Formants: Vocal Sounds with Crystal

In this tutorial we'll look at making vocal sounds with Crystal, specifically vowel sounds: the goal is to make Crystal go "ooh" and "ah". One note before we get started: Crystal excels at synthetic sounds that don't occur in nature, so we are aiming for human-like sounds rather than truly human ones. If you want realistic human vocals, you're probably better off with a sampler, although realistic human voices are notoriously difficult to achieve even with one. In this tutorial, we're aiming for synthetic sounds with a vowel-like character.

First, a bit of background on that musical instrument in your mouth. The recognizable vowel sounds of the human voice are due to formants: resonances created by the cavities of your vocal tract. Each formant is a resonant peak centered on a particular frequency, so the set of formants for a given vowel acts like a set of narrow band pass filters. When you make sounds, you retune this filter set by altering the size and shape of open spaces such as the nasal cavity, mouth, and pharynx. With your vocal cords as the sound source, passing that sound through different filter settings produces an interesting variety of sounds. The filter set need not be complex: 3 filters, one for each of the first three formants, usually suffice to create a sound we can recognize as an "ooh" or an "ah".
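
To make "a formant is a narrow band pass filter" concrete, here is a minimal Python sketch of one formant modeled as a resonant band pass filter. It assumes the widely used Audio EQ Cookbook (RBJ) band pass biquad with 0 dB peak gain and a 44.1 kHz sample rate; that is an illustrative choice, not necessarily the filter design Crystal uses. It evaluates the filter's gain at its center frequency and an octave either side.

```python
import cmath
import math


def bandpass_coeffs(fc, q, fs=44100.0):
    """RBJ cookbook band pass (constant 0 dB peak gain), normalized so a0 == 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def gain_at(b, a, f, fs=44100.0):
    """Magnitude of the filter's frequency response at frequency f (Hz)."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


# One formant: the first formant of male spoken "ah" (730 Hz) with a fairly high Q.
b, a = bandpass_coeffs(730.0, q=10.0)
for f in (365.0, 730.0, 1460.0):
    print(f"{f:7.1f} Hz -> gain {gain_at(b, a, f):.3f}")
```

With this design the gain is exactly 1 (0 dB) at the center frequency and drops off steeply away from it; a vowel is simply three of these running side by side.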

Crystal is well suited to this kind of application because it has a bank of band pass filters. Where, you ask, is this filter bank? It's the set of 4 delays: each delay has its own band pass filter. By simply routing audio through these delays (and setting the delay times to zero so that we get no echoes), we get a bank of 4 filters.
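
Here is a rough Python model of that routing. It assumes that a delay set to zero time passes audio straight into its band pass filter, so each of the 4 delay slots behaves as a plain filter running in parallel with the others; delay feedback is ignored. The `Biquad` class and `filter_bank` function are hypothetical illustrations (RBJ cookbook band pass, Direct Form I), not Crystal's internals.

```python
import math


class Biquad:
    """RBJ cookbook band pass filter (0 dB peak gain), processed in Direct Form I."""

    def __init__(self, fc, q, fs=44100.0):
        w0 = 2.0 * math.pi * fc / fs
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        self.b0, self.b1, self.b2 = alpha / a0, 0.0, -alpha / a0
        self.a1, self.a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x):
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


def filter_bank(signal, slots, dry=0.0, fs=44100.0):
    """Send `signal` through up to 4 parallel band pass 'delay slots' and sum them.

    `slots` is a list of (center_hz, q, level) tuples; `dry` is the level of the
    unfiltered signal (0.0 corresponds to turning the dry output off).
    """
    filters = [(Biquad(fc, q, fs), level) for fc, q, level in slots[:4]]
    out = []
    for x in signal:
        y = dry * x
        for flt, level in filters:
            y += level * flt.process(x)  # every filter sees the same input sample
        out.append(y)
    return out


def tone(freq, seconds=1.0, fs=44100.0):
    return [math.sin(2.0 * math.pi * freq * n / fs) for n in range(int(seconds * fs))]


def rms(xs):
    return math.sqrt(sum(v * v for v in xs) / len(xs))


bank = [(730.0, 10.0, 1.0)]  # one slot in use: 730 Hz, Q = 10, full level
for freq in (730.0, 3000.0):
    x = tone(freq)
    y = filter_bank(x, bank)
    # Compare steady-state levels over the second half of the 1 s test tone.
    print(freq, rms(y[22050:]) / rms(x[22050:]))
```

A tone at the filter's center comes through at essentially full level, while a tone far from it is strongly attenuated, which is the behavior the vowel patches rely on.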

To get started, pick a simple sawtooth oscillator, set the filter bank to the appropriate frequencies for the desired vowel, and route the audio from the voice through the filter bank. What are the appropriate frequencies? There are many places on the web where you can find tables of formant frequencies for various vowel sounds. Here's one you might find useful (from "The Talk Box and Formant Filtering" by Hans Mikelson):

Vowel               "ee"  "i"   "e"   "ae"  "ah"  "aw"  "u^"  "oo"  "u"   "er"
Male spoken     F1  270   390   530   660   730   570   440   300   640   490
                F2  2290  1990  1840  1720  1090  840   1020  870   1190  1350
                F3  3010  2550  2480  2410  2440  2410  2240  2240  2390  1690
Male sung       F1  300   375   530   620   700   610   400   350   500   400
                F2  1950  1810  1500  1490  1200  1000  720   640   1200  1150
                F3  2750  2500  2500  2250  2600  2600  2500  2550  2675  2500
Female spoken   F1  310   430   610   860   850   590   470   370   760   500
                F2  2790  2480  2330  2050  1220  920   1160  950   1400  1640
                F3  3310  3070  2990  2850  2810  2710  2680  2670  2780  1960
Female sung     F1  400   475   550   600   700   625   425   400   550   450
                F2  2250  2100  1750  1650  1300  1240  900   800   1300  1350
                F3  3300  3450  3250  3000  3250  3250  3375  3250  3250  3050
Child spoken    F1  370   530   690   1010  1030  680   560   430   850   560
                F2  3200  2730  2610  2320  1370  1060  1410  1170  1590  1820
                F3  3730  3600  3570  3320  3170  3180  3310  3260  3360  2160
Amplitudes (dB) F1  -4    -3    -2    -1    -1    0     -1    -3    -1    -5
                F2  -24   -23   -17   -12   -5    -7    -12   -19   -10   -15
                F3  -28   -27   -24   -22   -28   -34   -34   -43   -27   -20

Frequencies are in Hz; F1, F2 and F3 are the first three formants of each vowel.

Download the bank file of Crystal patches that demonstrate this technique (it is available in separate Mac and Windows versions).
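
If you want to experiment outside Crystal too, the male sung formants from the table can be captured as data, along with a naive sawtooth oscillator as the source. This Python sketch copies only that part of the table; `MALE_SUNG` and `delay_filter_settings` are hypothetical names, and the sawtooth is deliberately naive (it is not band-limited, so it will alias at high pitches).

```python
# Male sung formants (Hz) from the table: vowel -> (F1, F2, F3).
MALE_SUNG = {
    "ee": (300, 1950, 2750),
    "i": (375, 1810, 2500),
    "e": (530, 1500, 2500),
    "ae": (620, 1490, 2250),
    "ah": (700, 1200, 2600),
    "aw": (610, 1000, 2600),
    "u^": (400, 720, 2500),
    "oo": (350, 640, 2550),
    "u": (500, 1200, 2675),
    "er": (400, 1150, 2500),
}


def delay_filter_settings(vowel, table=MALE_SUNG):
    """Map a vowel's three formants onto delay filters 1-3 (hypothetical helper)."""
    f1, f2, f3 = table[vowel]
    return {
        "Delay 1 Filter Freq": f1,
        "Delay 2 Filter Freq": f2,
        "Delay 3 Filter Freq": f3,
    }


def naive_saw(f0, n_samples, fs=44100.0):
    """Non-band-limited sawtooth ramping from -1 toward +1 once per period."""
    return [2.0 * ((n * f0 / fs) % 1.0) - 1.0 for n in range(n_samples)]


print(delay_filter_settings("ah"))
source = naive_saw(220.0, 44100)  # one second of A3 to feed into the filters
```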

Look at the filter frequencies in the table: they are relatively low, and the first formants in particular are fairly close to the fundamental frequencies around middle C (about 262 Hz). Because these are very narrow band pass filters, a note only comes through strongly when one of its harmonics falls inside each filter's narrow passband, so the effective range is not very wide. In other words, as you try out these patches, you'll have to hunt around on your keyboard to find a range where they sound good. The range may only be a few notes... not unlike the human voice (well, mine at least).
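
The narrow range follows from simple arithmetic: a sawtooth at fundamental f0 has energy only at the harmonics n·f0, so a narrow band pass filter only sounds when some harmonic lands inside its passband. The Python sketch below counts, for a given pitch, how many of the male sung "ah" formants have a harmonic within ±fc/(2Q) of their center. It treats that passband as a hard cutoff (a simplification, since real filters roll off gradually) and assumes the standard definition Q = fc/bandwidth; how Crystal's Q control maps onto that is an assumption.

```python
AH_MALE_SUNG = (700.0, 1200.0, 2600.0)  # F1-F3 from the table, Hz


def midi_to_hz(note):
    """Equal-tempered pitch with A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def nearest_harmonic(f0, fc):
    """Return (harmonic number, distance in Hz) of the harmonic of f0 closest to fc."""
    n = max(1, round(fc / f0))
    return n, abs(n * f0 - fc)


def formants_hit(f0, formants=AH_MALE_SUNG, q=10.0):
    """Count formant filters with a harmonic of f0 within +/- fc/(2q) of center."""
    return sum(1 for fc in formants if nearest_harmonic(f0, fc)[1] <= fc / (2.0 * q))


for note in range(48, 73):  # C3 up to C5
    f0 = midi_to_hz(note)
    print(note, round(f0, 1), formants_hit(f0))
```

Because the passband width fc/Q grows with frequency, the higher formants are easier to "hit" than the first one, which is why the low F1 values near the fundamental do most of the work in restricting the playable range.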

A few things to note about how these patches were created. First, you want the filters to be very narrow band pass filters; you can make them especially narrow by turning up the Q value and by increasing the feedback (be careful about turning feedback parameters all the way up). Second, once you have the filters configured, route the voice to the filters and turn off the voice's dry output. Third, adjust the relative volumes of the filter outputs to taste.
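
Two bits of arithmetic help when dialing these in. A band pass filter's Q is conventionally its center frequency divided by its bandwidth, so raising Q narrows the filter; and the table's amplitude rows, given in dB, convert to linear output levels with 10^(dB/20). The Python sketch below applies both to the male spoken "ah" column. Treating Crystal's Q and level controls as following these standard definitions is an assumption.

```python
def bandwidth_hz(fc, q):
    """Bandwidth of a band pass filter with center fc and quality factor q (Q = fc / BW)."""
    return fc / q


def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


ah_formants = (730.0, 1090.0, 2440.0)  # male spoken "ah", Hz
ah_levels_db = (-1.0, -5.0, -28.0)  # "ah" amplitude rows, dB

for fc, db in zip(ah_formants, ah_levels_db):
    print(f"{fc:6.0f} Hz: bandwidth at Q=10 is {bandwidth_hz(fc, 10.0):5.1f} Hz, "
          f"relative level {db_to_gain(db):.3f}")
```

The third formant's -28 dB works out to a linear level of only about 0.04, so it should sit well below the other two in the mix.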

The "oh" through "er" patches each demonstrate a single, static vowel: a sawtooth wave through a bank of 3 filters, with each filter frequency taken from the table above. That's nice, but Crystal is built for moving, responsive, interactive sounds, so let's make it go from "ooh" to "ah".

What we want to do is make the filter frequencies go from the values for "oo" to the values for "ah". This is a job for modulation, so go to the modulation matrix and set it up to modulate, or change, the filter frequencies. There are a number of ways to do this with Crystal, but the "ooF-awF MW" patch does it like this: use 3 rows of the modulation matrix to control delay filters 1, 2, and 3. The low value for each modulation corresponds to the frequencies for oo, and the high value corresponds to ah.

To do this, simply choose the modulation wheel as the "Source" for the first three rows of the modulation matrix. Set the targets for those three rows to "Delay 1 Filter Freq", "Delay 2 Filter Freq", and "Delay 3 Filter Freq". The mod wheel will now control the filter frequency values for those three delays.

Next, set the "Low" value for each mod matrix row to the corresponding frequency for male sung u^ (400, 720, 2500), and set the high values to male sung ah (700, 1200, 2600). Now, when the mod wheel is all the way down, the filters produce male sung u^, and when it is all the way up, they produce male sung ah.
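
Conceptually, each of the three modulation rows maps the mod wheel position onto the range between its low and high values. The Python sketch below assumes a linear mapping from the wheel's MIDI range (controller 1, values 0-127) onto those frequencies; Crystal's actual modulation curve may differ, and `formants_for_wheel` is a hypothetical helper.

```python
# Low = male sung u^, high = male sung ah (Hz), one entry per mod matrix row.
ROWS = [
    ("Delay 1 Filter Freq", 400.0, 700.0),
    ("Delay 2 Filter Freq", 720.0, 1200.0),
    ("Delay 3 Filter Freq", 2500.0, 2600.0),
]


def formants_for_wheel(cc_value, rows=ROWS):
    """Map a mod wheel value (0-127) linearly onto each row's target frequency."""
    x = min(max(cc_value, 0), 127) / 127.0  # 0.0 = wheel down, 1.0 = wheel up
    return {target: low + x * (high - low) for target, low, high in rows}


for cc in (0, 64, 127):
    print(cc, formants_for_wheel(cc))
```

A linear sweep moves each filter by equal steps in Hz; an exponential (log-frequency) mapping would move them by equal musical intervals instead, which is worth keeping in mind if the middle of the wheel's travel sounds uneven.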

Go ahead and try the "oo-ah MW" patch. Hold down a note in the range on the keyboard where it sounds good, and move the mod wheel up and down. The sound will go from oo to ah!

Next, instead of using the mod wheel to modulate the filter frequencies, let's use a modulation envelope. That's what the "oo-ah ME" patch does. It starts out with oo, goes to ah, and returns to oo when you release the key.
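
A mod envelope does the same job as the wheel, just on its own schedule. The Python sketch below models a simple attack/hold/release shape and feeds it into the same u^-to-ah mapping, where 0 gives the u^ ("oo") settings and 1 gives ah. The attack and release times are made up for illustration, not taken from the patch.

```python
def mod_env(t, attack=0.3, release=0.5, note_off=None):
    """Linear attack to 1.0, hold while the key is down, linear release back to 0.0.

    t and note_off are in seconds since note-on; note_off=None means the key is held.
    """
    def held_level(time):
        return min(time / attack, 1.0) if attack > 0 else 1.0

    if note_off is None or t < note_off:
        return held_level(t)
    start = held_level(note_off)  # release from wherever the attack had reached
    progress = (t - note_off) / release if release > 0 else 1.0
    return max(start * (1.0 - progress), 0.0)


LOW = (400.0, 720.0, 2500.0)  # male sung u^
HIGH = (700.0, 1200.0, 2600.0)  # male sung ah


def formants_at(t, note_off=None):
    """Filter frequencies at time t, interpolated by the envelope level."""
    e = mod_env(t, note_off=note_off)
    return tuple(lo + e * (hi - lo) for lo, hi in zip(LOW, HIGH))


print(formants_at(0.0))                # key just pressed: u^ settings
print(formants_at(1.0))                # key held past the attack: ah settings
print(formants_at(2.0, note_off=1.0))  # well after release: back to u^
```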

Next, let's add a bit of chorus by using pulse width modulation. In other words, let an LFO modulate the pulse width of voice 1. That's what the "oo-ah ME PWM" patch does.
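
Pulse width modulation means the pulse's duty cycle (the fraction of each cycle spent high) is swept by a slow LFO, so the harmonic content keeps shifting and gives the chorus-like movement. Here is a minimal Python sketch; the 0.8 Hz rate and ±0.3 depth are arbitrary values chosen for illustration rather than settings from the patch, and the oscillator is naive (not band-limited).

```python
import math


def pwm_pulse(f0, n_samples, fs=44100.0, lfo_hz=0.8, depth=0.3, center=0.5):
    """Naive pulse oscillator whose width is swept by a sine LFO.

    Width = center + depth * sin(2*pi*lfo_hz*t), clamped to stay inside (0, 1).
    """
    out = []
    for n in range(n_samples):
        t = n / fs
        width = center + depth * math.sin(2.0 * math.pi * lfo_hz * t)
        width = min(max(width, 0.01), 0.99)
        phase = (f0 * t) % 1.0  # position within the current oscillator cycle
        out.append(1.0 if phase < width else -1.0)
    return out


samples = pwm_pulse(220.0, 44100)
print(sum(1 for s in samples if s > 0) / len(samples))  # fraction of time spent high
```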

Finally, let's add a second voice a major 3rd above the original for two-voice harmony. Listen to the "oo - ah ME PWM 2V" patch to hear this.
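
For reference, a major third is 4 semitones, so in equal temperament the harmonizing voice runs at 2^(4/12), roughly 1.26 times the original frequency (a just major third would be 5/4 = 1.25). A small Python helper, assuming the second voice is tuned up in equal-tempered semitones:

```python
def transpose(freq_hz, semitones):
    """Equal-tempered transposition by a number of semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)


root = 220.0  # A3
third = transpose(root, 4)  # major third above: about 277.18 Hz (C#4)
print(f"{root} Hz + major third -> {third:.2f} Hz")
```

Keep in mind that both voices pass through the same fixed formant filters, so the upper voice's harmonics land at different places relative to the passbands and it may sound noticeably brighter or darker than the original.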

Experiment with different oscillators as the sound source. Try crossfading two voices based on note-on velocity (see the VelXFade preset for a velocity crossfade example). Instead of the mod wheel or mod envelope, try using an LFO to modulate between oo and ah. Try different amplitude envelopes for the 2 voices. Try... well, you get the idea :-).