Formant paths tracking using Linear Prediction based methods

This paper focuses on formants as basic parameters for vowel recognition. Two different formant-finding algorithms based on the LP algorithm are used: spectral peak picking and root extraction. Each of them yields very good path estimates. The methods are compared in graphical form in our application 'WaveBlaster'.


Introduction
A sound is nothing but a complicated wave varying in time, which our ears and brain convert into a sound perception. A human speech sound wave can be divided into homogeneous chunks called phonemes, which are the indivisible components of an utterance (like letters in writing). Each phoneme has its own characteristic wave, which is why we can distinguish one phoneme from another. If we could implement a method which recognizes phonemes, then we could, for instance, recognize speech; we only have to find a good method for wave decomposition.
The most popular method is to find all frequencies that the phoneme consists of. To do so, we divide the wave into small frames (for instance 46 ms) and compute a spectrum of each frame (we use FFT or LP for this purpose). Unfortunately, such a spectrum contains too much data (frequencies), so we have to select only the most significant information from it. One such piece of significant information is the set of 'formants': local extrema of the spectrum along the frequency axis.
In Fig. 1  are irrelevant. It is very hard to calculate formants from the Fourier analysis, due to their variations. It is far easier to obtain them from the Linear Prediction analysis, because the spectrogram is smoother.

Computational procedure
In the LP analysis we use the Levinson-Durbin algorithm, which calculates the prediction coefficients $\alpha_i$ and the gain coefficient $G$ from the samples $x(t)$ of each input signal window. The model is then described by the transfer function (1):

$$H(z) = \frac{G}{1 - \sum_{i=1}^{p} \alpha_i z^{-i}} \qquad (1)$$

where $\alpha_i$ are the prediction coefficients, $G$ is the gain parameter and $p$ is the prediction order. We obtain a spectrum (frequency characteristic) by evaluating (1) at arguments of the form $z = e^{j\omega}$ with $\omega = 2\pi f / F$, which gives:

$$H\left(e^{j 2\pi f / F}\right) = \frac{G}{1 - \sum_{i=1}^{p} \alpha_i e^{-j 2\pi i f / F}}$$

where $j$ is the imaginary unit, $F$ is the sampling frequency and $f$ is the frequency of interest. A graph created from the spectra of all windows is called a spectrogram (Fig. 2 and 3).

Spectral peak picking
The most obvious and simple method for formant finding is the spectral peak picking algorithm. Every local extremum in the spectrogram is treated as a formant, so the accuracy of the spectrogram computation is very important. Several parameters affect the algorithm run: window width and window overlap, time window, pre-emphasis and, most important, prediction order.
The window width was set to 46 ms with 25% overlap; we decided that the frequency precision of such a window is sufficient. We also used a pre-emphasis filter:

$$x(n) = s(n) - \alpha\, s(n-1)$$

where $s$ is the input signal, $x$ is the output signal and $\alpha$ is the pre-emphasis coefficient. We used the factor $\alpha = 15/16$ to eliminate the F0 formant (it corresponds to the fundamental frequency, which is not desired in our case).
Choosing a proper prediction order is quite difficult: the bigger it is, the more detailed the spectrum we get. If it is too small, we obtain too few formants, because the spectrum is rather flat with few extrema; if it is too large, the spectrum is too detailed and we find formants in areas where none are supposed to be. Therefore the prediction order should be proportional to the sampling frequency of the input signal. We chose it with a formula that relates $p$, the prediction order, to $FS$, the sampling rate, and a constant value $C$.
In Fig. 4 (on the right) we can see four formants on the selected spectrogram of the vowel 'a'.
In the spectrogram (Fig. 4, on the left) the second formant F2 disappears from time to time; the reason is a prediction order that is too small. If we increase the prediction order, we get more and more detailed spectrograms, sometimes too detailed.
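The peak picking procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the WaveBlaster implementation: all function names are hypothetical, the autocorrelation method is assumed as the input to Levinson-Durbin, no time window is applied, and the transfer function is assumed to have the form $H(z) = G / (1 - \sum_i \alpha_i z^{-i})$.

```python
import cmath
import math


def split_frames(x, fs, width_s=0.046, overlap=0.25):
    """Cut signal x into frames of width_s seconds with fractional overlap."""
    size = int(round(width_s * fs))
    hop = max(1, int(round(size * (1.0 - overlap))))
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]


def pre_emphasis(s, alpha=15 / 16):
    """x[n] = s[n] - alpha * s[n-1]; the first sample is copied unchanged."""
    return [s[0]] + [s[n] - alpha * s[n - 1] for n in range(1, len(s))]


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (assumed LP input)."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (prediction coefficients alpha_1..alpha_p, gain G)."""
    a = [0.0] * (order + 1)          # a[i] holds alpha_i; a[0] is unused
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], math.sqrt(err)


def lp_spectrum_db(alpha, gain, fs, n_bins=256):
    """|H| in dB at n_bins frequencies from 0 Hz up to (not including) fs/2."""
    freqs, mags = [], []
    for b in range(n_bins):
        f = b * fs / (2.0 * n_bins)
        w = 2.0 * math.pi * f / fs
        denom = 1.0 - sum(a * cmath.exp(-1j * w * (i + 1))
                          for i, a in enumerate(alpha))
        freqs.append(f)
        mags.append(20.0 * math.log10(gain / abs(denom)))
    return freqs, mags


def pick_peaks(freqs, mags):
    """Frequencies of the strict local maxima of the spectrum."""
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]
```

For one frame, the calls would be chained as `pre_emphasis`, `autocorrelation`, `levinson_durbin`, `lp_spectrum_db` and finally `pick_peaks`; repeating this for every frame from `split_frames` gives the formant candidates over time.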

Root extraction algorithm
Another algorithm taken into consideration is based on extracting the poles of the transfer function H(z). To calculate the poles (i.e. the roots of the denominator of H(z) in equation (1)) we have to find the complex roots $c_k$ of the polynomial:

$$A(z) = 1 - \sum_{i=1}^{p} \alpha_i z^{-i} = \prod_{k=1}^{p} \left(1 - c_k z^{-1}\right)$$

Fig. 5. Spectrogram of the same utterance "papa" (sampling rate 22 kHz), but with the prediction order equal to 26 (on the left) and 30 (on the right). More detailed formant tracks can be seen.
Downloaded from the journal Annales AI-Informatica, http://ai.annales.umcs.pl. Date: 15/07/2023 16:28:44. UMCS

Therefore we solve the equation:

$$A(z) = 0 \qquad (5)$$

We then have to choose a numerical algorithm to solve equation (5): Laguerre's method, Muller's method or the eigenvalue method. We have chosen Laguerre's method. All these methods find the first root $c_1$ and then divide the polynomial $A(z)$ by the monomial $(1 - c_1 z^{-1})$, obtaining a polynomial $A_1(z)$ of lower degree. We then repeat the procedure on $A_1(z)$ until the degree reaches zero. Every polynomial division and root finding step introduces round-off errors, which accumulate due to the recursive nature of the procedure. As a result, after finding all roots, we have to 'polish' them; for this purpose we use the same Laguerre method, with the unpolished roots as starting points of the computations.
As formant candidates we take only those roots (poles) which meet the condition $\mathrm{imag}(c_k) > 0$, that is, only complex roots, one from each conjugate root pair $(1 - c_k z^{-1})(1 - c_k^{*} z^{-1})$. After computing the magnitude and phase of each root (by converting $c_k = a + ib$ into $c_k = r e^{j\varphi}$) we obtain the formant frequency $F$ and the 3 dB formant bandwidth $B$ (in Hz) using the following equations:

$$F = \frac{f_s}{2\pi}\,\varphi, \qquad B = -\frac{f_s}{\pi}\,\ln r$$

where $f_s$ is the sampling frequency of the input signal, $\varphi$ is the root phase and $r$ is the root magnitude.
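The root extraction steps above (Laguerre iteration, deflation, polishing, conversion of poles to frequency and bandwidth) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the denominator is assumed to be $A(z) = 1 - \sum_i \alpha_i z^{-i}$, and the polynomial is multiplied by $z^p$ so that the deflation divides by $(z - c_1)$, which has the same roots as dividing $A(z)$ by $(1 - c_1 z^{-1})$.

```python
import cmath
import math


def laguerre(coeffs, x, max_iter=100, tol=1e-12):
    """Refine one root of a polynomial (coefficients highest power first)."""
    n = len(coeffs) - 1
    for _ in range(max_iter):
        p, dp, half_d2p = coeffs[0], 0.0, 0.0
        for c in coeffs[1:]:             # Horner scheme for p, p' and p''/2
            half_d2p = half_d2p * x + dp
            dp = dp * x + p
            p = p * x + c
        if abs(p) < tol:
            return x
        g = dp / p
        h = g * g - 2.0 * half_d2p / p
        root = cmath.sqrt((n - 1) * (n * h - g * g))
        denom = g + root if abs(g + root) > abs(g - root) else g - root
        if denom == 0:
            x += 0.1 + 0.1j              # flat spot: nudge and retry
            continue
        step = n / denom
        x -= step
        if abs(step) < tol:
            return x
    return x


def deflate(coeffs, root):
    """Divide the polynomial by (z - root) using synthetic division."""
    out = [coeffs[0]]
    for c in coeffs[1:-1]:
        out.append(c + out[-1] * root)
    return out


def find_roots(coeffs):
    """All roots: Laguerre plus deflation, then polishing on the original."""
    original = [complex(c) for c in coeffs]
    work, roots = original, []
    while len(work) > 1:
        r = laguerre(work, 0j)
        roots.append(r)
        work = deflate(work, r)
    # Polish: restart Laguerre on the undeflated polynomial from each root.
    return [laguerre(original, r) for r in roots]


def formants_from_lp(alpha, fs):
    """(frequency, bandwidth) in Hz for each pole with imag(c_k) > 0."""
    coeffs = [1.0] + [-a for a in alpha]     # z^p * A(z), assumed sign
    found = []
    for c in find_roots(coeffs):
        if c.imag > 0:
            freq = fs / (2.0 * math.pi) * cmath.phase(c)
            bandwidth = -fs / math.pi * math.log(abs(c))
            found.append((freq, bandwidth))
    return sorted(found)
```

Starting every Laguerre run from zero is a common convention, chosen here for simplicity; the source does not state which starting point was used for the unpolished roots.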

The application
We created a module in our application 'WaveBlaster', for the graphical representation of formant paths. In the application we can choose which method we want to use to track formants. We can also pick both of them -then on the graph we will see a path in three

Comparison
In our opinion, there are no major differences between the described algorithms, partly because both methods are based on the same LP transfer function model. Root extraction is slightly more accurate than peak picking because of its more theoretical approach, and it is a little less sensitive to the prediction order. Moreover, the prediction order can be a little smaller than in peak picking to obtain comparable results.
Fig. 6 and 7 show an example comparison of these two methods.
Fig. 7. Spectrograms of the utterance "papa", p = 22. On the left the peak picking formant tracks, on the right the root extraction formant tracks.

Summary
As can be seen (Fig. 6), both methods give satisfactory results. To make full use of their advantages in further analysis, we have to combine the formants into paths. Then, by filtering those paths with a low-pass filter, we obtain compact and smooth lines, which are more useful in computations. From there, it is only one step to recognizing voiced phonemes such as vowels. As one vowel determines one syllable, we can count the number of syllables in an utterance.
Using this approach, as the next step, we will attempt to write a program for determining the utterance rate, measured in syllables per minute. Such a measure is very helpful, for instance, in estimating the frequency of disorder occurrence in an utterance, as the utterance length is as important as the speech rate in this issue.
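The low-pass filtering of formant paths mentioned in the summary could look like the sketch below. The paper does not say which filter is meant, so a centered moving average (a very simple FIR low-pass) is assumed here; the function name and the `width` parameter are hypothetical, and the path is taken to be a list of formant frequencies in Hz, one per frame.

```python
def smooth_path(path, width=5):
    """Centered moving average over a formant path (shorter at the edges)."""
    half = width // 2
    out = []
    for i in range(len(path)):
        lo = max(0, i - half)
        hi = min(len(path), i + half + 1)
        out.append(sum(path[lo:hi]) / (hi - lo))
    return out
```

A wider window removes more frame-to-frame jitter but also blurs fast formant transitions, so the width would have to be tuned against the frame rate.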