Kohonen networks application in speech analysis algorithms

This article presents an application of the Kohonen network to speech analysis. The authors have modified the traditional Kohonen network learning process in three respects: weight initialization, neuron reduction and neuron sorting. The results are presented using the authors' program, “WaveBlaster”.


Introduction
The Kohonen network (or "self-organizing map", SOM for short) was developed by Teuvo Kohonen. The basic idea behind the Kohonen network is to set up a structure of interconnected processing units ("neurons") that compete for the signal. Although the structure of the map can be quite arbitrary, we use rectangular maps.
In Fig. 1:
• $K = SizeX \cdot SizeY$ is the number of neurons in the Kohonen network,
• $n$ is the dimension of the input vectors,
• each input vector $X$ has $n$ elements, $X = \{x_1, x_2, \ldots, x_n\}$,
• each neuron $y_j$ has exactly $n$ connections, one to each element $x_i$ of $X$,
• each element $x_i$ is connected to all $K$ neurons, so there are $K \times n$ connections in total. Every connection is represented by its weight $w_{ij}$, $i = 1 \ldots n$, $j = 1 \ldots K$.
As we can see, each neuron $j$ of the map is defined by a weight vector $W_j = [w_{1j}, w_{2j}, \ldots, w_{nj}]$, whose elements are adjusted during training.
The basic training algorithm is quite simple:

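(1) Select an input vector $X$ from the training set. We can take consecutive vectors or pick them at random.
(2) Find the neuron closest to the given input vector, i.e. the winning neuron $c$ for which the distance between $W_j$ and $X$ is minimal. Any metric can be used, but the usual choice is the Euclidean metric, where the distance between the input vector and the $j$-th neuron is defined as:

$$d_j = \sqrt{\sum_{i=1}^{n} \left(x_i - w_{ij}\right)^2} \qquad (1)$$

(3) Adjust the weight vectors of the winning neuron and the neurons around it so that they move towards the training vector. In the standard form of the rule:

$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\, h_{cj}(t)\, \left(x_i - w_{ij}(t)\right)$$

Here $\eta(t)$ is the learning rate. $h_{cj}(t)$ is a neighbourhood function that equals 1 for the winner $c$ and decreases with the grid distance between neurons $c$ and $j$.
(4) Repeat from step (1) for a fixed number of iterations.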
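Steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions that the article does not state:
• a Gaussian neighbourhood on the rectangular grid,
• a learning rate and radius that both decay linearly,
• uniform random initial weights.
The function names and default parameters are likewise our own.

```python
import math
import random


def euclidean(x, w):
    """Euclidean distance between input vector x and weight vector w (equation 1)."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def find_winner(weights, x):
    """Step 2: index j of the neuron whose weight vector W_j is closest to x."""
    return min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))


def train_som(data, size_x, size_y, epochs=30, lr0=0.5, seed=0):
    """Train a rectangular size_x * size_y map on a list of n-dimensional vectors.

    Neuron j sits at grid position (j % size_x, j // size_x). The Gaussian
    neighbourhood and the linear decay schedules are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(data[0])
    k = size_x * size_y
    weights = [[rng.uniform(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    sigma0 = max(size_x, size_y) / 2.0
    total_steps = epochs * len(data)
    t = 0
    for _ in range(epochs):  # step 4: fixed number of repetitions
        for x in rng.sample(data, len(data)):  # step 1: inputs in random order
            c = find_winner(weights, x)  # step 2: winning neuron
            frac = t / total_steps
            lr = lr0 * (1.0 - frac)  # learning rate eta(t)
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius
            cx, cy = c % size_x, c // size_x
            for j in range(k):  # step 3: pull the winner and its neighbours towards x
                jx, jy = j % size_x, j // size_x
                grid_d2 = (jx - cx) ** 2 + (jy - cy) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # h_cj(t)
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
            t += 1
    return weights
```

Each weight update is a convex combination of the old weight and the input, because `lr * h` never exceeds 0.5. The weights therefore stay within the range spanned by the initial weights and the data.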
As a result of such learning, neurons located physically next to each other in a Kohonen map correspond to classes of input vectors that are likewise close to each other (Fig. 2). We call such regions of similar neurons maps.

Speech analysis
The most important task in our research is influence recognition. This process can be divided into several steps.
First, we need to convert the input speech signal into a set of parameters. For this purpose we usually use the following algorithms: Fourier Transform, FFT with octave filters, Linear Prediction and Continuous Wavelet Transform. Most of these transforms produce 2D results, which are usually not suitable for the next stage. We often pass the transform results to neural networks such as the perceptron, and these require 1D data. This is where we use the Kohonen network. We pass the 2D result of the transform into the Kohonen network and keep only the winning neuron for each input vector. The input data is thus reduced to a 1D contour of winning neurons. We then cut this contour into frames, for instance 800 ms long, and pass them to other algorithms (such as a perceptron).
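The reduction from a 2D transform result to a 1D contour, and the cutting of that contour into frames, could look like the sketch below. It treats each input vector as one row of the transform result. The time step between consecutive vectors (`hop_ms`) is a hypothetical parameter that the article does not give. Dropping an incomplete trailing frame is also our assumption.

```python
import math


def winner_contour(weights, vectors):
    """Replace every input vector of the 2D transform result by the index of its
    winning neuron (Euclidean distance), giving a 1D contour."""
    def dist(x, w):
        return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

    return [min(range(len(weights)), key=lambda j: dist(x, weights[j])) for x in vectors]


def cut_into_frames(contour, hop_ms, frame_ms=800):
    """Cut the contour into consecutive, non-overlapping frames of frame_ms milliseconds.

    hop_ms (time between consecutive input vectors) is a hypothetical parameter;
    an incomplete trailing frame is dropped (also an assumption).
    """
    per_frame = max(1, int(frame_ms // hop_ms))
    return [contour[i:i + per_frame]
            for i in range(0, len(contour) - per_frame + 1, per_frame)]
```

With `hop_ms=10`, an 800 ms frame holds 80 winner indices. Each frame is then a fixed-length 1D vector that can be fed to a perceptron.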

Kohonen learning process: our approach
We made the following modifications to the Kohonen network learning process:
(1) Initialization.
First we initialize the neurons in the standard way, with random values from the range [0, 1] or [-1, 1] drawn from a normal (Gaussian) distribution. We then sort them diagonally by their energy. Neurons whose weights have less energy go to the top-left corner, and those with more energy go to the bottom-right corner.
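The diagonal energy sort could be sketched as follows. The article does not define a neuron's "energy" or the exact diagonal fill order, so both are assumptions here:
• energy is taken as the sum of squared weights,
• grid cells are filled along anti-diagonals (x + y increasing), starting at the top-left corner.
The sketch draws weights uniformly for simplicity. `rng.gauss` could be substituted to follow the normal distribution mentioned above.

```python
import random


def energy(w):
    """Assumed energy of a weight vector: the sum of squared weights."""
    return sum(v * v for v in w)


def diagonal_order(size_x, size_y):
    """Grid cells (x, y) ordered along anti-diagonals, from top-left (0, 0)
    to bottom-right (size_x - 1, size_y - 1)."""
    cells = [(x, y) for y in range(size_y) for x in range(size_x)]
    return sorted(cells, key=lambda c: (c[0] + c[1], c[1]))


def init_sorted(size_x, size_y, n, low=0.0, high=1.0, seed=0):
    """Random initialization followed by a diagonal sort by energy.

    Returns a list in which the neuron at grid cell (x, y) is weights[y * size_x + x].
    """
    rng = random.Random(seed)
    neurons = [[rng.uniform(low, high) for _ in range(n)] for _ in range(size_x * size_y)]
    neurons.sort(key=energy)  # ascending energy: low-energy neurons first
    weights = [None] * (size_x * size_y)
    for (x, y), w in zip(diagonal_order(size_x, size_y), neurons):
        weights[y * size_x + x] = w
    return weights
```

The bottom-right corner is the only cell on the last anti-diagonal, so it always receives the neuron with the highest energy.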
It often happens that the winning neurons of similar input vectors are spread all over the network, whereas we would like them to form one consistent map. We observed during our research that this initialization considerably reduces this unwanted effect.
(2) Neuron reduction (post-learning).
This step is applied after the network has been trained with the standard algorithm described in the Introduction. Its purpose is to reduce each map (a region of similar neurons) to a single neuron. We proceed as follows:
• Find the two closest neurons, $k_A$ and $k_B$ (the distance between their weight vectors is measured with the Euclidean metric, equation 1).
• If the distance is less than a threshold (a parameter of the algorithm), set the weights of one of the two neurons to zero.
• Repeat steps 1 and 2 as long as there is a pair of neurons closer than the threshold.
The result of the reduction procedure is presented in Figs. 4, 7 and 8. Such a result is much clearer, and therefore more useful, than the raw, unmodified result (see Fig. 6).
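The reduction loop can be sketched as below. The article says only that "one of the neurons" is zeroed, so the following choices are our assumptions:
• the sketch zeroes `k_B`, the neuron with the higher index,
• zeroed neurons are excluded from later comparisons, since otherwise they would lie at distance 0 from each other and the loop would never terminate.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two weight vectors (equation 1)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def reduce_neurons(weights, threshold):
    """Post-learning reduction: while the two closest active neurons are nearer
    than `threshold`, zero the weights of one of them.

    Returns (new_weights, zeroed_indices); the input list is not modified.
    """
    weights = [list(w) for w in weights]
    zeroed = set()
    while True:
        active = [j for j in range(len(weights)) if j not in zeroed]
        best = None  # (distance, k_a, k_b) of the closest active pair
        for pos, k_a in enumerate(active):
            for k_b in active[pos + 1:]:
                d = euclidean(weights[k_a], weights[k_b])
                if best is None or d < best[0]:
                    best = (d, k_a, k_b)
        if best is None or best[0] >= threshold:
            return weights, zeroed
        _, _, k_b = best
        weights[k_b] = [0.0] * len(weights[k_b])  # zero one neuron of the pair
        zeroed.add(k_b)
```

Each pass scans all active pairs, so the cost is roughly O(K^3) in the worst case. This is acceptable for map sizes in the tens or low hundreds of neurons.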
Sometimes, depending on what we want to recognize, we sort the network's neurons at the end of the whole process. In some cases this increases the recognition ratio. The sorting procedure can be the same as in step (1), 'Initialization', but we also use another method: instead of sorting neurons by energy, we sort them by similarity:
• In the top-left corner we place the neuron with the least energy, $k_1$.
• Then we find the neuron $k_2$ closest to $k_1$ (according to equation 1).
• Then we find the neuron $k_3$ closest to $k_2$ among the remaining neurons, and so on.
Example results are presented in Figs. 5 to 8.
Downloaded from the journal Annales AI-Informatica, http://ai.annales.umcs.pl, date: 28/08/2022 02:44:39, UMCS
All the research and conclusions presented in this article were made possible by our program, 'WaveBlaster', and its extensive set of parameters (Fig. 9). Because this post-learning procedure is, as far as we know, unique, we had to create our own tool for it.
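The similarity-based sorting from the bullet list above amounts to a greedy nearest-neighbour chain, sketched below. The article does not state in which order the chained neurons fill the grid after the top-left corner. Here we assume row-by-row (raster) order, so position `p` in the returned list is grid cell `(p % size_x, p // size_x)`. As in the initialization sketch, energy is assumed to be the sum of squared weights.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two weight vectors (equation 1)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def energy(w):
    """Assumed energy of a weight vector: the sum of squared weights."""
    return sum(v * v for v in w)


def sort_by_similarity(weights):
    """Start from the least-energy neuron (k_1), then repeatedly append the
    remaining neuron closest to the previously placed one (k_2, k_3, ...).

    The returned list is in grid fill order (row by row, an assumption).
    """
    remaining = list(range(len(weights)))
    current = min(remaining, key=lambda j: energy(weights[j]))
    order = [current]
    remaining.remove(current)
    while remaining:
        prev = current
        current = min(remaining, key=lambda j: euclidean(weights[prev], weights[j]))
        order.append(current)
        remaining.remove(current)
    return [weights[j] for j in order]
```

Because the chain is greedy, neighbouring positions hold similar neurons. Two neurons that are close in weight space may still end up far apart if the chain reaches one of them late.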

Summary
The Kohonen network is a very powerful tool that can be applied to various problems, including speech analysis. We find it very useful in algorithms for recognizing disorders in speech, especially for reducing the dimension of the data passed to perceptrons (from 2D to 1D). We further improved our results by applying the modifications described in this article to the Kohonen network learning procedure. Although these modifications were made to increase the influence recognition ratio, we believe they are general enough to be applied to other problems as well.