Application of the central weighted structural similarity index for the estimation of the face recognition accuracy

In this paper a novel method for estimating the face recognition accuracy, based on the modified Structural Similarity index, is presented. The Structural Similarity (SSIM) index is typically applied to full-reference objective image quality assessment. The growing popularity of this metric is due not only to its relatively low computational complexity but also to its sensitivity to three common types of distortions: the loss of contrast, luminance distortions and the loss of correlation. Since the output of the SSIM metric is a quality map with nearly the same resolution as the input images, any two-dimensional central weighting function can be used to control the level of importance of each image region. The approach proposed in this article uses the Central Weighted SSIM index to predict the face recognition accuracy for images contaminated by several common types of distortions, e.g. salt and pepper noise, lossy compression, filtration etc. The described


Introduction
Rapid progress in the field of digital image quality assessment at the beginning of the 21st century radically changed the methods used for image and video quality estimation. Instead of the well-known traditional metrics based on the Mean Squared Error and similar measures, some modern approaches have been proposed. Most of them are also full-reference metrics, but their main advantage is a much better correlation with the human perception of typical image distortions such as noise, blur and lossy compression artifacts. Probably the most popular metric of that kind is the Structural Similarity index analyzed in this paper.
The full-reference approach to digital image quality assessment is currently the only universal one, since the blind assessment methods proposed so far are specialized and their usefulness is therefore strongly limited. The knowledge of the original image is the price to be paid for that universality.
In this paper a modification of the Structural Similarity index is proposed as a method for estimating the face recognition accuracy, assuming centrally located faces in previously cropped images, which is a typical situation in most face recognition systems.
Reliable image quality information can be used in many image recognition and classification systems in order to forecast the recognition accuracy in the presence of typical distortions caused, e.g., by low bandwidth transmission or lossy compression.
In this paper the usefulness of the modified (Central Weighted) Structural Similarity metric for forecasting the face recognition accuracy in the presence of typical distortions is analyzed. The two-dimensional Principal Component Analysis algorithm is used as the face recognition method. All the experiments are based on a standard benchmark database of faces: AT&T Research Lab (formerly known as Olivetti Research Lab) [1]. Sample images used in the experiments are presented in Fig. 1. There are ten different images of each of 40 distinct subjects. For most subjects, the images were taken in one session, varying the lighting, facial expressions and facial details. All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with a slight tolerance for some side movement). The size of each image is 92 × 112 pixels, with 256 gray levels per pixel.
The problem of automatic image recognition involves gathering, processing and storing large amounts of data. Reducing very high-dimensional feature spaces speeds up the computations. Principal Component Analysis (PCA, also known as the Karhunen-Loeve Transform, KLT) is commonly used to reduce their dimensionality, which considerably helps the recognition. Classical PCA is applicable only to images with rather low spatial resolution [4]. Moreover, using PCA for tasks such as image recognition or image retrieval can be challenging because it treats the data as one-dimensional when, in fact, they are two-dimensional. That is why almost all presented algorithms involve some sort of dimensionality pre-reduction, in many cases discarding the spatial relations between pixels.
One possible solution to this problem is to use a two-dimensional transformation based on PCA. The first algorithm from this group was presented in [3], where a novel, two-dimensional version of PCA for the face recognition task was developed. An extension of this method (called PCArc, for row and column representations) was presented in [4].
The recognition algorithm discussed in this paper is composed of three basic elements and is presented in Fig. 2. First, the two-dimensional Fast Fourier Transform (FFT) is performed on each image, giving a matrix containing spectral components selected from the lower part of the amplitude spectrum [5]. Then this matrix is reduced by means of PCArc, producing the final matrix of output features [6]. The last stage is the classification performed using the Euclidean distance calculation.
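The first stage can be illustrated with a minimal NumPy sketch. It assumes that "the lower part of the amplitude spectrum" means a small block of the lowest non-negative frequencies; the function name `low_frequency_spectrum` and the 16 × 16 block size are hypothetical choices made only for illustration and are not taken from [5].

```python
import numpy as np

def low_frequency_spectrum(image, rows=16, cols=16):
    """Return a low-frequency block of the 2-D FFT amplitude spectrum.

    rows and cols are hypothetical sizes chosen for this sketch.
    """
    spectrum = np.abs(np.fft.fft2(np.asarray(image, dtype=np.float64)))
    # In the unshifted FFT layout the zero frequency sits at index (0, 0),
    # so the top-left corner holds the lowest non-negative frequencies.
    return spectrum[:rows, :cols]

# Synthetic 112 x 92 array standing in for one face image.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(112, 92))
features = low_frequency_spectrum(face)
print(features.shape)
```

The resulting matrix is the input that would then be passed to the PCArc reduction.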
The detailed recognition algorithm is presented below. Let us assume that the whole dataset contains K classes of images, each image having M × N elements. Each class represents a single subject. In the first step of the learning stage, for all images in the dataset we calculate the global mean image $\bar{X}$ and the mean image of each class $\bar{X}^{(k)}$ (used for distance calculation in the recognition stage). Then we subtract the global mean image from each single image. In the next step we calculate two covariance matrices, R and C, for the row and column representations of the images, respectively. For both R and C we calculate the eigenvalue matrices $\{\Lambda^{(R)}, \Lambda^{(C)}\}$ (diagonal matrices of eigenvalues of M × M and N × N elements, respectively) and the eigenvector matrices $\{V^{(R)}, V^{(C)}\}$ (orthogonal matrices of M × M and N × N elements, respectively, whose columns contain eigenvectors) [7]. From the diagonals of $\Lambda^{(R)}$ and $\Lambda^{(C)}$ we select the largest elements and memorize their positions. From $V^{(R)T}$ we select the r rows corresponding to the r largest eigenvalues and from $V^{(C)}$ we select the c columns corresponding to the c largest eigenvalues. Then we build two new matrices: $F^{(R)}$ of r × M elements and $F^{(C)}$ of N × c elements, which are used as the transformation matrices for PCArc. The transformation multiplies the image (after mean removal) by $F^{(R)}$ from the left and by $F^{(C)}$ from the right [6,8], giving a feature matrix Y of r × c elements for each image X.
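The learning and projection steps described above can be sketched as follows. This is a hedged illustration only: it assumes unnormalised covariance matrices $R = \sum \hat{X}\hat{X}^T$ and $C = \sum \hat{X}^T\hat{X}$ for the mean-subtracted images $\hat{X}$, and the projection $Y = F^{(R)} \hat{X} F^{(C)}$, which is the form consistent with the matrix sizes given in the text. The function names and the synthetic data are hypothetical. The two distance rules of the recognition stage (nearest element and nearest class centre) are included for completeness.

```python
import numpy as np

def fit_pcarc(images, r, c):
    """Learn PCArc matrices from a stack of images of shape (L, M, N)."""
    images = np.asarray(images, dtype=np.float64)
    mean_image = images.mean(axis=0)
    centered = images - mean_image
    # Assumed covariance forms; scaling does not change the eigenvectors.
    cov_r = np.einsum("lij,lkj->ik", centered, centered)  # M x M
    cov_c = np.einsum("lji,ljk->ik", centered, centered)  # N x N
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, vec_r = np.linalg.eigh(cov_r)
    _, vec_c = np.linalg.eigh(cov_c)
    f_r = vec_r[:, ::-1][:, :r].T  # r x M, rows are leading eigenvectors
    f_c = vec_c[:, ::-1][:, :c]    # N x c, columns are leading eigenvectors
    return mean_image, f_r, f_c

def transform_pcarc(image, mean_image, f_r, f_c):
    """Project one M x N image onto an r x c feature matrix."""
    return f_r @ (np.asarray(image, dtype=np.float64) - mean_image) @ f_c

def nearest_element(query, features, labels):
    """Label of the closest training feature matrix (Euclidean distance)."""
    dist = np.linalg.norm(features - query, axis=(1, 2))
    return labels[np.argmin(dist)]

def nearest_center(query, features, labels):
    """Label of the closest class centre.

    The projection is linear, so the mean of the projected images of a class
    equals the projection of the class mean image.
    """
    classes = np.unique(labels)
    centers = np.stack([features[labels == k].mean(axis=0) for k in classes])
    dist = np.linalg.norm(centers - query, axis=(1, 2))
    return classes[np.argmin(dist)]

# Synthetic data: 3 classes, 4 images each, 8 x 6 elements per image.
rng = np.random.default_rng(0)
bases = rng.normal(size=(3, 8, 6))
labels = np.repeat(np.arange(3), 4)
train = bases[labels] + 0.05 * rng.normal(size=(12, 8, 6))

mean_image, f_r, f_c = fit_pcarc(train, r=3, c=2)
feats = np.stack([transform_pcarc(x, mean_image, f_r, f_c) for x in train])
query = transform_pcarc(bases[1] + 0.05 * rng.normal(size=(8, 6)),
                        mean_image, f_r, f_c)
print(nearest_element(query, feats, labels), nearest_center(query, feats, labels))
```

In the pipeline of this paper the inputs would be the FFT spectral matrices rather than raw pixel arrays.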
The classification employs the feature space created for all face images in the database. A query face is projected into it and the Euclidean distance is calculated. Two alternative distances are taken into consideration: the first one is the distance to all elements, whereas the second one is the distance only to the centers of all classes $\bar{X}^{(k)}$. The latter approach is easier to perform and requires far fewer computations; however, its efficiency is lower in the case of irregular clusters. One of the recognition experiments is presented in Fig. 3. It was performed in the "FaReS-Modeller" environment [9]. As can be seen from the figure, it gives the 5 nearest images from the database for a query image ("source image" in the picture), sorted in order of decreasing similarity.

Image quality assessment using the Structural Similarity index
Taking into account the poor correlation of some classical image quality metrics, such as the Mean Squared Error (MSE), Mean Absolute Error (MAE), Signal to Noise Ratio (SNR) etc. [10], with subjective evaluations, there is a need for a modern metric that is much more useful in that respect. Regardless of the correlation with the subjective quality scores, a good correlation with the pattern (e.g. face) recognition accuracy can be expected, assuming the presence of some typical image distortions.
Downloaded from the journal Annales AI-Informatica http://ai.annales.umcs.pl Date: 30/07/2023 00:38:05 UMCS
A great majority of the image quality metrics used in computer vision and image analysis applications can be classified as full-reference metrics. This can be treated as a disadvantage but, on the other hand, the currently available blind (no-reference) metrics [11] are not universal, so the number of distortion types influencing their values is limited. For instance, there are many metrics sensitive to JPEG compression artifacts [12,13] and Gaussian blur [14], but their applicability to other types of distortions (e.g. various types of noise) is strongly limited and may lead to unpredictable and even wrong results. A similar situation takes place in the field of reduced-reference image quality assessment methods, which utilize partial information related to the original image. Another approach, based on the modelling of the Human Visual System, is computationally demanding, so the complexity of such algorithms similarly limits their applicability. For relatively low-performance and embedded systems with a strongly limited amount of memory, the statistical quality estimation approach [15] can also be used.
The usefulness of some universal image quality assessment methods, including the full-reference ones, which require the knowledge of the original image without any distortions, is evident, especially in such areas as the development of some new image filtration algorithms, lossy compression methods or video transmission techniques.
The most popular modern approach to image quality assessment is based on the Universal Image Quality Index [16], further extended into the Structural Similarity index [17], defined as:

$$\mathrm{SSIM} = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (1)$$

where x and y denote the original and distorted images, respectively. The stability constants $C_1$ and $C_2$ are introduced in order to prevent division by zero for large flat and dark regions of the image, but they should not introduce any significant changes of the results for the rest of the image. The authors of paper [17] suggest using the following values: $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$. The other elements in formula (1) are the local mean values $\mu_x$, $\mu_y$, the variances $\sigma_x^2$, $\sigma_y^2$ and the covariance $\sigma_{xy}$, calculated within the sliding window. Applying this method with the 11 × 11 pixel Gaussian sliding window (N = M = 11), the quality map of the image can be obtained with nearly the same resolution as the analyzed image (reduced by N − 1 rows and M − 1 columns). The overall image quality metric is obtained as the mean value of the whole quality map. One of the most relevant properties of the SSIM index is its sensitivity to three common types of distortions introduced by many image processing algorithms: the loss of correlation (structural distortions), luminance distortion and the loss of contrast.
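The computation of the quality map can be sketched in NumPy as shown below. The sketch assumes Gaussian-weighted local statistics with a standard deviation of 1.5 pixels (the value commonly used with the 11 × 11 window of [17]; this paper does not state it), and the function names are hypothetical. The output map has N − 1 fewer rows and M − 1 fewer columns than the input, as described above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=11, sigma=1.5):
    """Normalised 2-D Gaussian window; sigma = 1.5 is an assumption."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def ssim_map(x, y, size=11, sigma=1.5,
             c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM quality map of two equally sized gray-level images."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = gaussian_window(size, sigma)

    def local_mean(a):
        # Weighted mean over every size x size window ("valid" positions only).
        windows = sliding_window_view(a, (size, size))
        return np.einsum("ijkl,kl->ij", windows, w)

    mu_x, mu_y = local_mean(x), local_mean(y)
    var_x = local_mean(x * x) - mu_x ** 2
    var_y = local_mean(y * y) - mu_y ** 2
    cov_xy = local_mean(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

# Synthetic 112 x 92 image and a noisy copy of it.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(112, 92)).astype(np.float64)
noisy = np.clip(reference + rng.normal(0.0, 20.0, reference.shape), 0, 255)

same = ssim_map(reference, reference)
noisy_map = ssim_map(reference, noisy)
print(same.shape, float(noisy_map.mean()))
```

The mean of `noisy_map` corresponds to the overall (unweighted) SSIM value of the distorted image.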

Proposed method of image quality estimation
Since face recognition methods are sensitive to many typical distortions, including the presence of noise, blurring, lossy compression etc., the image quality clearly influences the recognition accuracy. The general relation is straightforward: the worse the image quality, the weaker the recognition. Nevertheless, a more accurate estimation of the face recognition accuracy on the basis of the type and amount of distortions present in the input images is not an easy task. For that purpose the use of the previously described SSIM index is proposed, together with a modification that applies central weighting by means of the 2-D Gaussian function (Fig. 4). The idea of the Central Weighted SSIM is based on the assumption that the most important elements of the input images are located centrally, so the quality of the central parts of the image should be the most crucial. In typical face recognition systems the input images are usually cropped in order to eliminate any redundant elements except the central part of the face.
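A minimal sketch of the central weighting is given below. It assumes that the weighted score is the sum of the quality map multiplied by a normalised 2-D Gaussian centred on the map; the width parameter `sigma_frac` (standard deviation as a fraction of each map dimension) and the function names are hypothetical, since the exact parameters of the weighting function are not given in this section.

```python
import numpy as np

def central_weights(shape, sigma_frac=0.25):
    """Normalised 2-D Gaussian weights centred on a map of the given shape."""
    h, w = shape
    yy = np.arange(h) - (h - 1) / 2.0
    xx = np.arange(w) - (w - 1) / 2.0
    gy = np.exp(-(yy ** 2) / (2.0 * (sigma_frac * h) ** 2))
    gx = np.exp(-(xx ** 2) / (2.0 * (sigma_frac * w) ** 2))
    weights = np.outer(gy, gx)
    return weights / weights.sum()

def central_weighted_score(quality_map, sigma_frac=0.25):
    """Central weighted mean of a quality map (e.g. an SSIM map)."""
    q = np.asarray(quality_map, dtype=np.float64)
    return float(np.sum(q * central_weights(q.shape, sigma_frac)))

# Two synthetic quality maps with an equally sized degraded region,
# placed once in the centre and once in a corner.
center_bad = np.ones((102, 82))
center_bad[46:56, 36:46] = 0.2
corner_bad = np.ones((102, 82))
corner_bad[0:10, 0:10] = 0.2

print(center_bad.mean(), corner_bad.mean())
print(central_weighted_score(center_bad), central_weighted_score(corner_bad))
```

The plain mean cannot distinguish the two maps, whereas the central weighted score penalises the degradation in the centre more strongly, which is the behaviour the proposed modification relies on.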
In our experiments the linear correlation coefficients have been calculated between the recognition accuracy of the PCArc method (using either the nearest element or the centers of classes as the classification criterion) and the values of the SSIM index as well as of the proposed Central Weighted SSIM index.
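The linear (Pearson) correlation coefficient used in this comparison can be computed as in the sketch below. The arrays are synthetic placeholders constructed to be exactly linear; they are not results from the paper, and the function name is hypothetical.

```python
import numpy as np

def linear_correlation(metric_values, accuracies):
    """Pearson linear correlation coefficient between two sequences."""
    return float(np.corrcoef(metric_values, accuracies)[0, 1])

# Placeholder values only: one quality score per distortion level and a
# recognition accuracy constructed as an exact linear function of it.
quality = np.array([0.95, 0.80, 0.65, 0.40])
accuracy = 50.0 + 40.0 * quality

print(linear_correlation(quality, accuracy))
```

In the experiments one such coefficient would be obtained for each combination of quality metric (SSIM or Central Weighted SSIM) and classification criterion.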
The recognized images have been distorted using several typical methods, such as contamination by salt and pepper noise, lossy JPEG compression, low-pass filtration, median filtering, contrast reduction and intensity change. An exemplary facial portrait with various distortions is presented in Fig. 5.
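Some of the listed distortions can be reproduced with plain NumPy, as in the sketch below. The parameter values and function names are hypothetical, the exact settings used in the experiments are not given in this section, and JPEG compression is omitted because it requires an image codec.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def salt_and_pepper(image, density, rng):
    """Set about density/2 of the pixels to 0 and density/2 to 255."""
    out = np.array(image, dtype=np.uint8, copy=True)
    u = rng.random(out.shape)
    out[u < density / 2] = 0
    out[u > 1 - density / 2] = 255
    return out

def reduce_contrast(image, factor):
    """Scale deviations from the mean gray level by factor (0..1)."""
    img = np.asarray(image, dtype=np.float64)
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0, 255).round().astype(np.uint8)

def change_intensity(image, offset):
    """Add a constant offset to every pixel, clipping to 0..255."""
    shifted = np.asarray(image, dtype=np.int32) + offset
    return np.clip(shifted, 0, 255).astype(np.uint8)

def median_filter(image, size=3):
    """Median filter with edge replication at the borders."""
    pad = size // 2
    padded = np.pad(np.asarray(image), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1)).astype(np.uint8)

# Synthetic 112 x 92 array standing in for a face image.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(112, 92), dtype=np.uint8)

noisy = salt_and_pepper(face, density=0.05, rng=rng)
flat = reduce_contrast(face, factor=0.5)
brighter = change_intensity(face, offset=40)
smoothed = median_filter(noisy, size=3)
```

Each distorted image keeps the size of the original, so it can be compared with the reference by a full-reference metric such as SSIM.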
For each of the 40 classes from the AT&T database, 5 images have been used for training and the remaining 5 images have been used for tests, also in their distorted versions. The average recognition accuracy for the images without any distortions is 95.0% when the distance to the nearest element is used and 90.5% when the distance to the centers of classes is used. The average recognition results for different distortions are presented in Table 1.

Conclusions
Summarizing the experiments performed, it can be noticed that the proposed method does not worsen the results in any case. The increase of the correlation coefficients is especially evident for the set of filtered (by low-pass and median filters) and JPEG compressed images. Considering the relatively low computational cost of the additional weighting, the proposed method can be successfully applied in many face recognition systems.
In future work, a further extension of the proposed approach to color images, as well as to some other weighting functions, is planned, also in the case of other pattern recognition methods.