Implementation of perceptual measure Picture Quality Scale with neural network to evaluate distortions in compressed images

Perceptual measures are often used to assess distortions in image compression. In this article different images were evaluated using the Picture Quality Scale (PQS) measure with a neural network. The local distortions in the compressed image are calculated on the basis of the original and compressed images. Then five factors {F_1, F_2, F_3, F_4, F_5}, which represent these distortions, are computed, and the correlations among them are evaluated with the covariance matrix. The new values are fed to the input of the neural network, which calculates a single PQS value. During the learning process of the neural network, the best PQS value, which reflects the largest degree of particular distortions in the compressed images, is obtained. The images are divided into three groups: faces, landscapes and shapes. The learning process is controlled by the subjective Mean Opinion Score (MOS) measure with 15 observers.


Introduction
Nowadays, not only numerical differences between the original and compressed images, such as MSE or PSNR, are computed, but the visual differences between those images are also taken into account. One of the perceptual measures is the Picture Quality Scale (PQS). It is used to assess the visual distortions in compressed images on the basis of the Mean Opinion Score (MOS) measure. Using the PQS measure together with a neural network enables easy assessment of different distortions without observers. In this article the assessment of images compressed with the Daubechies wavelets of degrees 4, 6, 8, 14 and 20 is presented. The article is divided into three parts. Firstly, the PQS measure is described. Secondly, the scheme of the neural network is presented. Finally, the research, the learning process of the neural network and the obtained results are shown.

Picture Quality Scale Measure
The Picture Quality Scale (PQS) [1,2] is an objective measure used to evaluate perceptual distortions in a compressed image with reference to the original one. The PQS scheme is presented in Fig. 1. It involves features of the Human Visual System and the subjective Mean Opinion Score (MOS) measure [1,2]. In the PQS measure five different local distortion maps f_1, f_2, f_3, f_4, f_5 of the compressed image are calculated. They are used to obtain the distortion vector F. This vector is needed to compute the covariance matrix, from which both the eigenvalues and the eigenvectors are calculated. The distortion parameters Z = {Z_1, Z_2, Z_3, Z_4, Z_5} are acquired from the eigenvectors and the distortion vector F. To determine the PQS value two images are necessary: the original one x_0 and the compressed one x_r. The matrix f_1 (1) [1,2] represents the difference between the original and reconstructed images convolved with the frequency weighting W_TV [1].
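The chain from the factor vector F through the covariance matrix and its eigen-decomposition to the parameters Z can be sketched as follows. This is a minimal illustration, assuming that one row of five factors is available per image and that Z is obtained by projecting the mean-centred factors onto the eigenvectors of the covariance matrix (a standard principal-component step). The function name `decorrelate_factors`, the centring and the random demonstration data are assumptions made for this sketch, not details taken from [1,2].

```python
import numpy as np

def decorrelate_factors(F):
    """Project factor vectors onto the eigenvectors of their covariance matrix.

    F has shape (n_images, 5): one row {F_1, ..., F_5} per image (assumed layout).
    Returns the decorrelated parameters Z, the eigenvalues and the eigenvectors.
    """
    F = np.asarray(F, dtype=float)
    centered = F - F.mean(axis=0)            # subtract the mean of each factor
    cov = np.cov(centered, rowvar=False)     # 5 x 5 covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # symmetric matrix, ascending order
    order = np.argsort(eigenvalues)[::-1]    # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    Z = centered @ eigenvectors              # decorrelated parameters Z_1..Z_5
    return Z, eigenvalues, eigenvectors

# Hypothetical demonstration data: 60 images with random factor values.
rng = np.random.default_rng(0)
F_demo = rng.random((60, 5))
Z_demo, eigvals_demo, eigvecs_demo = decorrelate_factors(F_demo)
```

After the projection, the covariance matrix of the columns of `Z_demo` is diagonal, which is the sense in which the correlations among the factors are removed before the values reach the network.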

The matrix f_2 (2) [1,2] reflects the distortions between the original and the compressed images, which are adapted by the Weber-Fechner law [1] and the spatial domain counterpart of the frequency response S_a(ω) [1]. Additionally, this distortion discards the values that are below the threshold T by means of an indicator function I_T [1,2].
The matrix f_3 (4) [1,2] illustrates the differences in linear elements of images, especially the end-of-block disturbances [1,2]. This matrix is made of two components, namely the errors in vertical and horizontal block discontinuities. The equations for the vertical matrix are analogous to those for the horizontal matrix (4) and (5) [1,2]. The matrix e_w represents the convolution between the matrix e (3) and the modelled contrast sensitivity function S_a [1,2].
Correlated distortions can often be seen in compressed images. They can be present in textures, image features and blocking effects [1,2]. The local distortion map f_4 (6) [1,2] is evaluated with the use of local spatial correlations.
The parameter W represents the number of elements in a window of five by five items. Additionally, |l| ≤ 2 and |k| ≤ 2 must be fulfilled [1]. The last distortion taken into account in the PQS measure occurs in the vicinity of high contrast image transitions [1,2]. Two Human Visual System features are used here: visual masking and enhanced visibility of misalignments. The f_5 distortion is defined in [1,2]. I_M is an indicator function that selects only the items close to high intensity transitions. S_h and S_v stand for the horizontal and vertical masking factors.
The distortion factors are based on the local distortion maps (from f_1 to f_5).
The first factor is the sum of all items of the matrix f_1 divided by the sum of squared items of the original image (8) [1,2].
The second factor is computed as the sum of all elements of the matrix f_2 divided by the sum of squared values of the compressed image (9) [1,2].
The third factor is the square of two elements: the block error discontinuities at vertical F_3v and horizontal F_3h block boundaries (10) [1,2].
The parameters N_h and N_v stand for the number of items pointed out by an indicator function. The fourth factor is the sum of the items of the matrix f_4 divided by the size of the image, where N represents the number of columns and M the number of rows (11) [1,2].
The last factor is the sum of all elements of the matrix f_5 divided by the number of points whose values after the Kirsch operator are greater than or equal to 400 (12) [1,2].
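The verbal definitions of the first, second, fourth and fifth factors can be sketched in NumPy as shown below. This is only an illustration under stated assumptions: the local distortion maps are taken as already computed arrays, the function names `kirsch_response` and `distortion_factors` are hypothetical, the Kirsch response is taken as the maximum over the eight compass kernels on the valid 3 x 3 region, and the operator is applied to a test image chosen for this example. The third factor is left out because its exact formula is not recoverable from the text.

```python
import numpy as np

# Border cells of a 3x3 kernel in clockwise order, and the Kirsch base weights.
_BORDER = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
_BASE = [5, 5, 5, -3, -3, -3, -3, -3]

def kirsch_response(image):
    """Maximum response over the eight Kirsch compass kernels (valid region only)."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    best = np.full((rows - 2, cols - 2), -np.inf)
    for shift in range(8):
        weights = np.roll(_BASE, shift)      # rotate the weights around the border
        response = np.zeros((rows - 2, cols - 2))
        for (dy, dx), w in zip(_BORDER, weights):
            response += w * image[dy:dy + rows - 2, dx:dx + cols - 2]
        best = np.maximum(best, response)
    return best

def distortion_factors(f1, f2, f4, f5, x0, xr, n_kirsch):
    """Return (F_1, F_2, F_4, F_5) from precomputed local distortion maps."""
    f1, f2, f4, f5 = (np.asarray(a, dtype=float) for a in (f1, f2, f4, f5))
    x0 = np.asarray(x0, dtype=float)
    xr = np.asarray(xr, dtype=float)
    rows, cols = f4.shape                    # M rows, N columns
    F1 = f1.sum() / np.sum(x0 ** 2)          # normalised by the original image energy
    F2 = f2.sum() / np.sum(xr ** 2)          # normalised by the compressed image energy
    F4 = f4.sum() / (rows * cols)            # normalised by the image size
    F5 = f5.sum() / n_kirsch                 # normalised by the count of strong edges
    return F1, F2, F4, F5

# Hypothetical example: a vertical step edge and constant distortion maps.
edge = np.zeros((6, 6))
edge[:, 3:] = 255.0
n_kirsch_demo = int(np.count_nonzero(kirsch_response(edge) >= 400))
ones = np.ones((4, 4))
factors_demo = distortion_factors(ones, ones, 2 * ones, ones, 2 * ones, ones, 8)
```

The threshold of 400 comes from the text; everything else in the example data is arbitrary and only serves to make the sketch executable.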
Downloaded from the journal Annales AI-Informatica (http://ai.annales.umcs.pl).
In (13), the averaged parameter f_i represents the mean value of the matrix f_i. On the basis of the covariance matrix the eigenvalues and eigenvectors are computed [1]. The vector Z is calculated by multiplying the covariance matrix by the factor values. The PQS value should be similar to the Mean Opinion Score (MOS) measure [1,2]. This measure is based on the assessments of visual distortions in images by a group of observers (14), where M represents the number of observers and A(i, k) stands for the assessment of the k-th image by the i-th observer. The observers are obliged to adjust the value to the distortions that they notice. Possible assessment values are shown in

Neural network
To make the implementation of the Picture Quality Scale without observers easier, a feed-forward neural network [4] was adapted. The network consists of three layers: an input layer, a hidden layer and an output layer. The first layer is made of five neurons, each of which has five inputs and one output. The second layer is made of nine neurons, each of which has five inputs and one output. The last layer has only one neuron with nine inputs and one output, which is the output of the whole network. This output is the PQS value. The scheme of the network is presented in Fig. 2. The training function used in this network to update the weight and bias values is the Levenberg-Marquardt algorithm [5]. The neurons in the first and second layers have the hyperbolic tangent transfer function and the final neuron has a linear one, so that the output of the neural network does not have to be scaled.
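The forward pass of the 5-9-1 architecture described above can be sketched in a few lines of NumPy. The layer sizes and transfer functions follow the text; the function names `init_network` and `forward`, the random initialisation and the zero biases are assumptions made for illustration. Training with the Levenberg-Marquardt algorithm is not shown here.

```python
import numpy as np

def init_network(rng):
    """Random weights for the 5-9-1 network (initialisation scheme is an assumption)."""
    return {
        "W1": rng.normal(scale=0.5, size=(5, 5)), "b1": np.zeros(5),  # 5 neurons, 5 inputs
        "W2": rng.normal(scale=0.5, size=(9, 5)), "b2": np.zeros(9),  # 9 neurons, 5 inputs
        "W3": rng.normal(scale=0.5, size=(1, 9)), "b3": np.zeros(1),  # 1 neuron, 9 inputs
    }

def forward(params, Z):
    """Map Z vectors of shape (n, 5) to one PQS estimate per row."""
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    h1 = np.tanh(Z @ params["W1"].T + params["b1"])   # first layer, hyperbolic tangent
    h2 = np.tanh(h1 @ params["W2"].T + params["b2"])  # second layer, hyperbolic tangent
    out = h2 @ params["W3"].T + params["b3"]          # linear output neuron
    return out[:, 0]

rng = np.random.default_rng(0)
network = init_network(rng)
pqs_demo = forward(network, rng.normal(size=(4, 5)))  # four hypothetical Z vectors
n_parameters = sum(p.size for p in network.values())
```

Because the output neuron is linear, the returned values are not confined to the range of the hyperbolic tangent, which is why no rescaling of the output to the MOS range is needed.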

The research
The Z vectors were computed for square colour images of size 512 by 512 pixels. The evaluated images were compressed with the Daubechies wavelet transform of different degrees: four, six, eight, fourteen and twenty [6,7]. EZW (Embedded Zerotree Wavelet) [8,9] was used to compress the images. The compression rates ranged from 2 to 20. The images were converted from the RGB colour space to luminance and chrominance spaces. For each space the vector Z is calculated, and the average of these vectors is taken as the output Z vector. These Z vectors are the input of the neural network, which calculates the PQS value. The images are assembled into three groups, i.e. faces, scenery and shapes. There were sixty images with different compression rates from 2 to 20. Each pair of original and compressed images was evaluated by fifteen observers so that the Mean Opinion Score measure (14) could be calculated. For the evaluation of visual distortions in images a five-level scale was used (Table 1). The acquired data were doubled, but the sequence of the input data was changed so that the neural network would learn rather than memorize the test data. These data were used as the target of the neural network in the learning process. The training graph is presented in Fig. 3.
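The data preparation described above can be sketched as follows. The sketch assumes that the MOS of an image is the arithmetic mean of the observers' ratings, that there are three colour-space components per image, and that "doubling" means concatenating the data with itself and then reordering the input-target pairs together. The function names and the random demonstration data are hypothetical.

```python
import numpy as np

def mean_opinion_score(ratings):
    """Average the ratings A(i, k) over observers i; ratings has shape (M, n_images)."""
    return np.asarray(ratings, dtype=float).mean(axis=0)

def average_channel_vectors(z_per_channel):
    """Average the Z vectors over the colour-space axis: (n_images, C, 5) -> (n_images, 5)."""
    return np.asarray(z_per_channel, dtype=float).mean(axis=1)

def double_and_shuffle(inputs, targets, rng):
    """Duplicate the data set and change the order of the input-target pairs."""
    inputs = np.concatenate([inputs, inputs])
    targets = np.concatenate([targets, targets])
    order = rng.permutation(len(targets))    # same permutation for inputs and targets
    return inputs[order], targets[order]

# Hypothetical data: 15 observers, 60 images, ratings on a five-level scale (1..5).
rng = np.random.default_rng(0)
ratings_demo = rng.integers(1, 6, size=(15, 60))
mos_demo = mean_opinion_score(ratings_demo)
z_demo = average_channel_vectors(rng.normal(size=(60, 3, 5)))
train_inputs, train_targets = double_and_shuffle(z_demo, mos_demo, rng)
```

Applying one permutation to both arrays keeps every Z vector paired with its own MOS target after the reordering.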
As an example from the learning set, the distortions of the compressed image bardowl.bmp (Fig. 6) at a compression rate of 16.6, relative to the original image, were rated 1.888889 by the MOS measure, while the neural network gave the value 1.9937. After the network had been trained, a new set of data, different from the learning data, was put into the input of the neural network. The results obtained from the new set of data were also satisfactory. They are presented in Fig. 8.
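The comparison between the network output and the MOS value can be expressed as an absolute difference per image together with the largest difference over a set. The short sketch below assumes this simple criterion; the function name `compare_pqs_to_mos` is hypothetical, and the only data used are the two values quoted above for bardowl.bmp.

```python
import numpy as np

def compare_pqs_to_mos(pqs, mos):
    """Return the per-image absolute differences and their maximum."""
    pqs = np.asarray(pqs, dtype=float)
    mos = np.asarray(mos, dtype=float)
    abs_diff = np.abs(pqs - mos)
    return abs_diff, abs_diff.max()

# Values quoted in the text for bardowl.bmp at a compression rate of 16.6.
abs_diff_demo, worst_demo = compare_pqs_to_mos([1.9937], [1.888889])
```

For this single image the difference is about 0.105 on the five-level scale.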

Conclusions
The use of a neural network enables the computation of the PQS value without observers, so that the implementation of the Picture Quality Scale is much easier. The neural network presented in this article generates mostly correct values for different distortions of images compressed with the Daubechies wavelet transforms. The MOS measure is subjective: it depends on the observers, their moods and conditions. That is why the maximum difference between the PQS and MOS values, which equals 1.3211, is satisfactory. The aim of matching the visual distortions to the MOS value by the neural network has been achieved.