Crystal VST Instrument

Fun with Formants: Vocal Sounds with Crystal

In this tutorial we'll look at making vocal sounds with Crystal. Specifically, the sounds will be vowel sounds, so the goal is to make Crystal go "ooh" and "ah". One note before we get started: the aim here is human-like sounds, not realistic human voices. Crystal is useful for making synthetic sounds that don't occur in nature. If you want truly human vocal sounds, you're probably better off with a sampler, although even with a sampler, realistic human voices are notoriously difficult to achieve. In this tutorial, however, we're aiming for synthetic sounds with a vowel-like character.

First, a bit of background on that musical instrument in your mouth. The recognizable vowel sounds of the human voice are due to formants created by various cavities in your head. For our purposes, a formant is a set of narrow band pass filters. You create this set of filters in your head when making sounds by altering the size and shape of empty volumes such as the nasal cavity, mouth, and pharynx. By using your vocal cords as a sound source, and passing that sound through one of these formants, you get an interesting variety of sounds. The formants need not be complex: a set of 3 filters usually suffices to create a sound that we can recognize as an "ooh" or an "ah".

Crystal is well-suited for this kind of application since it has a bank of band pass filters. Where, you say, is this filter bank? It is the set of 4 delays. Each delay has a band pass filter. By simply routing audio through these delays (and setting the delay times to zero, so that we get no echoes), we get a bank of 4 filters.
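If you'd like to see what a bank of narrow band pass filters does to a sawtooth outside of Crystal, here is a minimal Python sketch. It is not Crystal's implementation: the filter is a generic two-pole band pass (the widely used "cookbook" biquad form), and the sample rate, the Q of 20, the names BandPass and formant_bank, and the example center frequencies of 700, 1200, and 2600 Hz are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz


class BandPass:
    """Generic two-pole band-pass filter with unity gain at its center frequency."""

    def __init__(self, center_hz, q, sample_rate=SAMPLE_RATE):
        w0 = 2.0 * math.pi * center_hz / sample_rate
        alpha = math.sin(w0) / (2.0 * q)  # higher Q gives a narrower band
        a0 = 1.0 + alpha
        self.b0 = alpha / a0
        self.b2 = -alpha / a0
        self.a1 = -2.0 * math.cos(w0) / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x):
        """Filter one input sample and return one output sample."""
        y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


def sawtooth(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Naive (not band-limited) sawtooth in the range -1..1."""
    return [2.0 * ((i * freq_hz / sample_rate) % 1.0) - 1.0 for i in range(n_samples)]


def formant_bank(signal, centers_hz, gains, q=20.0):
    """Run the signal through parallel band-pass filters and sum only the filtered outputs."""
    filters = [BandPass(f, q) for f in centers_hz]
    return [sum(g * flt.process(x) for flt, g in zip(filters, gains)) for x in signal]


def peak(signal):
    """Largest absolute sample value."""
    return max(abs(s) for s in signal)


# Half a second of a 110 Hz sawtooth through three example filters.
vowel = formant_bank(sawtooth(110.0, SAMPLE_RATE // 2), [700.0, 1200.0, 2600.0], [1.0, 1.0, 1.0])
```

The list `vowel` holds raw samples; writing them to a WAV file (for example with Python's `wave` module) lets you listen to the result. Note that only the filtered signal is summed, which mirrors the idea of hearing the filters rather than the raw oscillator.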

To get started, just pick a simple sawtooth oscillator, set the filter bank to the appropriate frequencies for the desired formant, and route the audio from the voice through the filter bank. What are the appropriate frequencies for various formants? There are many places on the web where you can find tables of formant frequencies for various vowel sounds. Here's a table that you might find useful (from "The Talk Box and Formant Filtering" by Hans Mikelson):

Each cell lists the first three formant frequencies in Hz (F1 F2 F3). The last column lists the amplitudes of the three formants in dB.

Vowel   Male spoken     Male sung       Female spoken   Female sung     Child spoken    Amplitudes (dB)
"ee"    270 2290 3010   300 1950 2750   310 2790 3310   400 2250 3300   370 3200 3730   -4 -24 -28
"i"     390 1990 2550   375 1810 2500   430 2480 3070   475 2100 3450   530 2730 3600   -3 -23 -27
"e"     530 1840 2480   530 1500 2500   610 2330 2990   550 1750 3250   690 2610 3570   -2 -17 -24
"ae"    660 1720 2410   620 1490 2250   860 2050 2850   600 1650 3000   1010 2320 3320  -1 -12 -22
"ah"    730 1090 2440   700 1200 2600   850 1220 2810   700 1300 3250   1030 1370 3170  -1 -5 -28
"aw"    570 840 2410    610 1000 2600   590 920 2710    625 1240 3250   680 1060 3180   0 -7 -34
"u^"    440 1020 2240   400 720 2500    470 1160 2680   425 900 3375    560 1410 3310   -1 -12 -34
"oo"    300 870 2240    350 640 2550    370 950 2670    400 800 3250    430 1170 3260   -3 -19 -43
"u"     640 1190 2390   500 1200 2675   760 1400 2780   550 1300 3250   850 1590 3360   -1 -10 -27
"er"    490 1350 1690   400 1150 2500   500 1640 1960   450 1350 3050   560 1820 2160   -5 -15 -20

Download the following bank file to get Crystal patches which demonstrate this technique:
Mac Download
Windows Download

If you look at the filter frequencies in the table, notice that these are relatively low frequencies and are fairly close to the fundamental frequencies around middle C. Since these filters are very narrow band pass filters, that means that the effective range will not be very wide. In other words, as you try out these patches, you'll have to hunt around on your keyboard to find a range where they sound good. The range may only be a few notes...not unlike the human voice (well, mine at least).
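To see why the usable range is so small, it helps to check where the harmonics of a note land relative to the filter frequencies. The sketch below is only an illustration: it assumes standard equal-tempered tuning (A = 440 Hz, middle C as MIDI note 60), uses the male sung "ah" values from the table, and the function names are made up for this example.

```python
def midi_to_hz(note):
    """Equal-tempered frequency for a MIDI note number (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def nearest_harmonics(note, formants_hz):
    """For each formant, return (harmonic number, harmonic Hz, offset from formant in Hz)."""
    f0 = midi_to_hz(note)
    result = []
    for formant in formants_hz:
        n = max(1, round(formant / f0))  # closest whole-number multiple of the fundamental
        result.append((n, n * f0, n * f0 - formant))
    return result


MALE_SUNG_AH = (700.0, 1200.0, 2600.0)  # Hz, from the table

# One octave below middle C, middle C, and one octave above.
for note in (48, 60, 72):
    print(note, round(midi_to_hz(note), 1), nearest_harmonics(note, MALE_SUNG_AH))
```

A sawtooth only has energy at whole-number multiples of its fundamental. On low notes the harmonics are closely spaced, so one always lands near each filter. As the note rises, the harmonics spread out, and a very narrow filter may end up sitting between two of them with little to pass.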

A few things to note about how these patches were created: First, you want the filters to be very narrow band pass filters. You can make them especially narrow by turning up the Q value and by increasing the feedback (be careful about turning feedback parameters all the way up). Second, once you have the filters configured, route the voice to the filters and turn off the dry output of the voice. Third, adjust the relative volumes of the filter outputs to taste.
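If you want a starting point for those relative volumes, the amplitude values in the table can be converted from decibels to linear gain factors. This is just the standard dB-to-amplitude formula; how Crystal's level controls are scaled is not something this sketch knows, so treat the numbers as a rough guide and let your ears decide.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


AH_AMPLITUDES_DB = (-1.0, -5.0, -28.0)  # table amplitudes for "ah", in dB

gains = [db_to_gain(db) for db in AH_AMPLITUDES_DB]
for db, gain in zip(AH_AMPLITUDES_DB, gains):
    print(db, "dB ->", round(gain, 3))
```

For "ah", the third filter ends up far quieter than the first two, which matches the general idea that the lowest formants carry most of the vowel's character.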

The "oh" through "er" patches demonstrate a single formant, that is, a sawtooth wave through a bank of 3 filters, with each filter frequency taken from the above table. That's nice, but Crystal is built for moving, responsive, interactive sounds, so let's make it go from ooh to ah.

What we want to do is make the filter frequencies go from the values for "oo" to the values for "ah". This is a job for modulation, so go to the modulation matrix and set it up to modulate, or change, the filter frequencies. There are a number of different ways to do this with Crystal, but the "ooF-awF MW" patch does it like this: use 3 rows of the modulation matrix to control delay filters 1, 2, and 3. The low value for each modulation will correspond to the frequencies for oo and the high value will correspond to ah.

To do this, simply choose modulation wheel as the "Source" for the first three rows of the modulation matrix. Set the targets for those three rows to "Delay 1 Filter Freq", "Delay 2 Filter Freq", and "Delay 3 Filter Freq". The mod wheel will now control the filter frequency values for those three delays.

Now set the "Low" value for each mod matrix row to the corresponding frequency for male sung u^ (400, 720, 2500), which serves as our "oo", and the high value to the corresponding frequency for male sung ah (700, 1200, 2600). When the mod wheel is all the way down, the formant will be male sung u^, and when it is all the way up, the formant will be male sung ah.
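In other words, each mod matrix row sweeps one filter frequency between two endpoints as the wheel moves. Here is a small sketch of that idea, assuming a straight linear sweep over MIDI wheel values 0 to 127. The actual response curve inside Crystal may well be different, and wheel_to_formants is a made-up name for illustration.

```python
OO_HZ = (400.0, 720.0, 2500.0)   # male sung u^, the low values
AH_HZ = (700.0, 1200.0, 2600.0)  # male sung ah, the high values


def wheel_to_formants(wheel, low=OO_HZ, high=AH_HZ):
    """Map a mod wheel value (0..127) to three filter frequencies.

    Assumes a linear sweep between low and high; Crystal's real curve may differ.
    """
    t = min(max(wheel, 0), 127) / 127.0  # clamp, then normalize to 0..1
    return tuple(lo + t * (hi - lo) for lo, hi in zip(low, high))


for wheel in (0, 64, 127):
    print(wheel, [round(f) for f in wheel_to_formants(wheel)])
```

Notice that the three filters move by very different amounts: the second one travels 480 Hz while the third moves only 100 Hz. That is why each filter gets its own row in the matrix rather than sharing one setting.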

Go ahead and try the "oo-ah MW" patch. Hold down a note in the range on the keyboard where it sounds good, and move the mod wheel up and down. The sound will go from oo to ah!

Next, instead of using the mod wheel to modulate the filter frequencies, let's use a modulation envelope. That's what the "oo-ah ME" patch does. It starts out with oo, goes to ah, and returns to oo when you release the key.

Next, let's add a bit of chorus by using pulse width modulation. In other words, let an LFO modulate the pulse width of voice 1. That's what the "oo-ah ME PWM" patch does.
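For the curious, pulse width modulation can be sketched in a few lines: a pulse wave is high for some fraction of each cycle, and an LFO slowly changes that fraction. The code below is a naive illustration, not Crystal's oscillator. The sample rate, LFO rate, center width, and depth are arbitrary assumptions.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def pwm_pulse(freq_hz, lfo_hz, n_samples, center=0.5, depth=0.3, sample_rate=SAMPLE_RATE):
    """Naive pulse wave whose duty cycle is swept by a sine LFO.

    The duty cycle moves between center - depth and center + depth.
    """
    out = []
    phase = 0.0  # position within the current cycle, 0..1
    for i in range(n_samples):
        width = center + depth * math.sin(2.0 * math.pi * lfo_hz * i / sample_rate)
        out.append(1.0 if phase < width else -1.0)
        phase = (phase + freq_hz / sample_rate) % 1.0
    return out


# One second of a 110 Hz pulse with the width swept by a 0.5 Hz LFO.
samples = pwm_pulse(110.0, 0.5, SAMPLE_RATE)
```

As the width changes, the balance of harmonics in the pulse wave shifts, so the energy reaching each formant filter keeps moving. That constant motion is what gives the chorus-like effect.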

Finally, let's add a second voice harmonized a major 3rd above the original for a two-voice harmony. Listen to the "oo - ah ME PWM 2V" patch to hear this.
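A major 3rd is 4 semitones, so in equal temperament the second voice runs at about 1.26 times the frequency of the first. A quick sketch of the arithmetic, assuming equal-tempered tuning and an arbitrary example root of 220 Hz:

```python
def transpose(freq_hz, semitones):
    """Shift a frequency by a number of equal-tempered semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)


MAJOR_THIRD = 4  # semitones

root = 220.0  # example root frequency in Hz
third = transpose(root, MAJOR_THIRD)
print(round(third, 2), "Hz, ratio", round(third / root, 4))
```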

Experiment with different oscillators as the sound source. Try crossfading two voices based on note-on velocity (see the VelXFade preset for a velocity crossfade example). Instead of the mod wheel or mod envelope to modulate between oo and ah, try using an LFO. Try different amplitude envelopes for the 2 voices. Try...well, you get the idea :-).