Hash function generation by means of Gene Expression Programming

Cryptographic hash functions are fundamental primitives in modern cryptography and have many security applications (data integrity checking, cryptographic protocols, digital signatures, pseudo-random number generators, etc.). While novel hash functions are being designed (for instance in the framework of the SHA-3 contest organized by the National Institute of Standards and Technology (NIST)), cryptanalysts have developed a set of statistical metrics (propagation criterion, frequency analysis, etc.) able to assess the quality of new proposals. Rules for designing "good" hash functions are also now known and are followed by every reasonable proposal of a new hash scheme. This article investigates how to build on this experience and on those metrics to automatically generate compression functions by means of Evolutionary Algorithms (EAs). Such functions are at the heart of iterative hash constructions, so it is crucial that they have good properties. The idea of using nature-inspired heuristics to design such cryptographic primitives is not new: this approach has been successfully applied in several previous works, typically using the Genetic Programming (GP) heuristic [1]. Here, we exploit a hybrid meta-heuristic for the evolutionary process called Gene Expression Programming (GEP) [2], which appears computationally far more efficient than the GP paradigm used in the previous papers. In this context, the GEPHashSearch framework is presented. As it is still a work in progress, this article focuses on the design aspects of this framework (definition of the individuals, fitness objectives, etc.) rather than on complete implementation details and validation results. Note that we propose to tackle the generation of compression functions as a multi-objective optimization problem in order to identify the Pareto front, i.e. the set of non-dominated functions over


Introduction
Cryptographic hash functions are fundamental primitives in modern cryptography and are of crucial importance for our digital life. In particular, they are used in many security applications (data integrity checking, cryptographic protocols, digital signatures, pseudo-random number generators, etc.). Formally speaking, a hash function $H : \{0,1\}^* \rightarrow \{0,1\}^n$ maps a binary string of arbitrary length to a binary string of some fixed length n (often called the footprint) with at least the compression and ease-of-computation properties [3]. If hash functions satisfy additional requirements such as preimage resistance, second preimage resistance and, most importantly, collision resistance, they are a very powerful tool in the design of techniques to protect the authenticity of information. Recent advances in the cryptanalysis of hash functions have led to successful attacks against the major cryptographic hash functions in use, including the well-known MD5 [4] and SHA-1 [5] schemes, which are therefore no longer considered secure. In response, the National Institute of Standards and Technology (NIST) recommended moving to the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384 and SHA-512) and launched the SHA-3 contest to find new schemes. At the time of writing, this contest has entered its third and final round, which retains five finalists for the next SHA-3 standard. While novel hash functions are being designed (especially in the framework of the SHA-3 contest), cryptanalysts have developed a set of statistical metrics (propagation criterion, frequency analysis, etc.) able to assess the quality of new proposals.
This article investigates how to build on this experience and on those metrics to automatically generate compression functions by means of Evolutionary Algorithms (EAs). Such functions are at the heart of iterative hash constructions, so it is crucial that they have good properties. The idea of using nature-inspired heuristics to design such cryptographic primitives is not new: this approach has been successfully applied in several previous works, typically using the GP heuristic [1]. Here, we exploit a hybrid meta-heuristic for the evolutionary process called GEP [2], which appears computationally far more efficient than the GP paradigm used in the previous papers. In this context, the GEPHashSearch framework is presented. This paper is organized as follows: section 2 presents the background of this work (cryptographic hash functions, EAs) and reviews related work. Section 3 provides a brief overview of the Gene Expression Programming (GEP) heuristic, while section 4 holds the main contribution of this paper: it details GEPHashSearch, a GEP-based framework to build compression functions with reasonably good properties. Since this is still a work in progress, section 5 remains limited; nevertheless, the first experimental results presented there are promising. Finally, section 6 concludes the paper and outlines future directions.

Cryptographic hash functions
Cryptographic hash functions are fundamental primitives in modern cryptography and have many security applications (data integrity checking, cryptographic protocols, digital signatures, pseudo-random number generators, etc.). Formally speaking, a hash function $H : \{0,1\}^* \rightarrow \{0,1\}^n$ maps a binary string of arbitrary length to a binary string of some fixed length n (often called the footprint or fingerprint) with at least the compression and ease-of-computation properties [3]. To be of cryptographic use, hash functions must satisfy additional properties, such as preimage resistance, second preimage resistance and, most importantly, collision resistance. One speaks of a collision between x and x′ when $x \neq x'$ and $H(x) = H(x')$. Since the input of a hash function can be of any size (in particular larger than n bits), collisions are unavoidable. If $y = H(x)$, then x is called a preimage of y, and the above-mentioned properties can be defined as follows²:
• Preimage resistance: given y, one cannot find, in reasonable time, some x such that $y = H(x)$. Given y, at most $2^n$ evaluations of H are required to find such an x by exhaustive search.
• Second preimage resistance: given x, one cannot find, in reasonable time, some $x' \neq x$ such that $H(x) = H(x')$. As above, given x, at most $2^n$ evaluations are required to find such an x′.
• Collision resistance: one cannot find, in reasonable time, any x and x′ with $x \neq x'$ such that $H(x) = H(x')$. Note that both inputs can be chosen freely; as a consequence of the birthday paradox, about $2^{n/2}$ evaluations suffice to find a colliding pair $(x, x')$.
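The gap between the $2^n$ and $2^{n/2}$ bounds can be observed experimentally. The following Python sketch is a toy that assumes the first n_bits of SHA-256 stand in for an n-bit hash function (an illustrative choice, not part of GEPHashSearch). It hashes distinct inputs until two digests coincide, which for n_bits = 24 typically takes on the order of $2^{12}$ evaluations rather than $2^{24}$.

```python
import hashlib


def truncated_hash(data: bytes, n_bits: int) -> int:
    """Toy n-bit hash: the first n_bits of SHA-256(data)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - n_bits)


def birthday_collision(n_bits: int):
    """Hash distinct inputs until two digests collide.

    Returns (x, x_prime, evaluations). By the pigeonhole principle the loop
    stops after at most 2**n_bits + 1 evaluations; the birthday paradox makes
    it stop after a number of evaluations on the order of 2**(n_bits / 2).
    """
    seen = {}
    evaluations = 0
    while True:
        x = evaluations.to_bytes(8, "big")  # distinct inputs 0, 1, 2, ...
        y = truncated_hash(x, n_bits)
        evaluations += 1
        if y in seen:
            return seen[y], x, evaluations
        seen[y] = x


x, x_prime, count = birthday_collision(24)
print(f"collision after {count} evaluations: {x.hex()} vs {x_prime.hex()}")
```

A dictionary keyed by digest keeps each lookup constant-time, so the cost is dominated by the hash evaluations themselves, which is exactly the quantity the birthday bound counts.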
The following additional properties are often desired:
• Non-correlation: input and output bits should not be correlated. Related to this, an avalanche effect similar to that of good block ciphers is required: modifying a single input bit should change, on average, half of the output bits. This rules out hash functions for which preimage resistance fails to imply second preimage resistance simply because the function effectively ignores a subset of its input bits.
• Near-collision resistance: it should be hard to find any two inputs (x, x′) such that H(x) and H(x′) differ in only a small number of bits.
• Partial-preimage resistance, or local one-wayness: it should be as difficult to recover any substring of the input as to recover the entire input. Moreover, even if part of the input is known, it should be difficult to find the remainder (e.g., if t input bits remain unknown, finding them should take on average $2^{t-1}$ hash operations).
² In these definitions, "reasonable" means that there exists no attack operating faster than exhaustive (brute-force) search over all possible inputs.
In practice, most hash functions accepting arbitrary-length inputs are built iteratively from a fixed-length compression function or a block cipher. For instance, SHA-1 [5], MD5 [4] and many other widely deployed hash functions are constructed by applying some variant of the Merkle-Damgård construction to an underlying compression function (see Figure 2). The general model for iterated hash functions is the following. The hash input x of arbitrary finite length is first split into fixed-size chunks $x_1, x_2, \ldots, x_L \in \{0,1\}^b$, which gives the expanded message $(x_1, \ldots, x_L)$. An iterated hash H iterates the underlying compression function h as follows: $h_0 = IV$, $h_i = h(h_{i-1}, x_i)$ for $1 \le i \le L$, and $H(x) = g(h_L)$. Here $h_{i-1}$ serves as the n-bit chaining variable between stage i−1 and stage i, and $h_0$ is a predefined starting value or initializing value (IV). An optional output transformation g is used in a final step to map the n-bit chaining variable to an n-bit result $g(h_L)$; g is often the identity mapping, $g(h_L) = h_L$. The last one or two chunks of the expanded message are padded, and the last chunk $x_L$ may contain additional information, such as the length $|x|$ of the non-expanded message x.
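The iteration $h_0 = IV$, $h_i = h(h_{i-1}, x_i)$, $H(x) = g(h_L)$ fits in a few lines of Python. This is a minimal illustration under stated assumptions: the chunk size (b = 64 bits), the chaining size (n = 32 bits), the length-padding rule and the toy compression function built from truncated SHA-256 are all hypothetical choices, and g is taken to be the identity.

```python
import hashlib

B_BYTES = 8  # b = 64-bit message chunks (toy value)
N_BYTES = 4  # n = 32-bit chaining variable (toy value)


def toy_compress(h_prev: bytes, x_i: bytes) -> bytes:
    """Hypothetical compression function h: (n-bit state, b-bit chunk) -> n bits."""
    return hashlib.sha256(h_prev + x_i).digest()[:N_BYTES]


def pad(msg: bytes) -> bytes:
    """Append 0x80, then zeros, then the 64-bit bit length |x| (length padding)."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % B_BYTES)
    return padded + (8 * len(msg)).to_bytes(8, "big")


def md_hash(msg: bytes, iv: bytes = bytes(N_BYTES), compress=toy_compress) -> bytes:
    """Iterated hash: h_0 = IV, h_i = h(h_{i-1}, x_i), H(x) = g(h_L) with g = identity."""
    data = pad(msg)
    h = iv
    for start in range(0, len(data), B_BYTES):  # chunks x_1, ..., x_L
        h = compress(h, data[start:start + B_BYTES])
    return h


print(md_hash(b"abc").hex())
```

The compression function is passed as a parameter on purpose: any candidate evolved by a framework such as GEPHashSearch could, in principle, be plugged into the same iteration.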
Incremental hashing. Alternative constructions make use of incremental hashing. The idea behind the method presented in [6] is that if one changes a message whose hash value has already been computed, one only needs to recompute the contribution of the changed part to obtain the new hash value. This is done as follows: each chunk $x_i$ of the message $x = x_1 x_2 \ldots x_L$ is prefixed by its block index $\langle i \rangle$. Then the hash value $y_i = h(\langle i \rangle \| x_i)$ (where ∥ denotes concatenation) is computed for each chunk, and the outputs are combined via a special operation: $y = y_1 \odot y_2 \odot \cdots \odot y_L$. The function h can be any standard hash function, provided it is collision-resistant. With this technique, the computation of the hash value can be parallelized, and after a modification only the parts $x_i$ that changed need to be rehashed. In all cases, a good choice for the combining operation ⊙ is crucial. A natural first thought is to use the bitwise XOR, but it was shown to be insecure [6]. Multiplication in the group $\mathbb{Z}_p^*$ (multiplication modulo p) or modular addition is generally seen as a good choice for the combining operation; the corresponding incremental hashing schemes are called MuHASH and AdHASH, respectively. For security reasons, |p| should be at least 512 or 1024 bits, making the final hash value of the same length. If a smaller length is required, the output y can be hashed with a standard collision-resistant hash function (for instance, applying SHA-1 to MuHASH leads to a 160-bit fingerprint). The security of MuHASH relies on the discrete logarithm problem in the underlying group. As we will see in the sequel, the EA-based heuristic presented in this paper derives from the incremental hashing scheme in the way linking functions are used to connect the genes of the GEPHashSearch individuals.
Pobrane z czasopisma Annales AI-Informatica http://ai.annales.umcs.pl Data: 13/07/2023 18:19:24 U M C S
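The multiplicative combination of MuHASH, and the reason it enables cheap updates, can be illustrated with a small Python sketch (Python 3.8 or later, for the modular inverse `pow(x, -1, p)`). It assumes SHA-256 as the underlying function h, with an ad hoc mapping into $\mathbb{Z}_p^*$ that is not a random oracle onto the group, and the Mersenne prime $p = 2^{521} - 1$. Both are illustrative choices, not a vetted instantiation.

```python
import hashlib

P = 2**521 - 1  # a Mersenne prime: |p| = 521 bits


def hash_to_group(i: int, chunk: bytes) -> int:
    """y_i = h(<i> || x_i), mapped to a nonzero element of Z_p^* (toy mapping)."""
    digest = hashlib.sha256(i.to_bytes(8, "big") + chunk).digest()
    return int.from_bytes(digest, "big") + 1  # in [1, 2^256], hence nonzero mod P


def muhash(chunks) -> int:
    """y = y_1 * y_2 * ... * y_L mod p, with blocks indexed from 1."""
    y = 1
    for i, chunk in enumerate(chunks, start=1):
        y = y * hash_to_group(i, chunk) % P
    return y


def update(y: int, i: int, old_chunk: bytes, new_chunk: bytes) -> int:
    """Replace chunk i with two evaluations of h instead of rehashing everything."""
    return y * pow(hash_to_group(i, old_chunk), -1, P) * hash_to_group(i, new_chunk) % P


msg = [b"chunk-1", b"chunk-2", b"chunk-3"]
y = muhash(msg)
print(update(y, 2, b"chunk-2", b"CHANGED") == muhash([b"chunk-1", b"CHANGED", b"chunk-3"]))
```

Because the block index enters each $y_i$, reordering chunks changes the result even though multiplication is commutative; without the index, the scheme would hash a multiset rather than a sequence.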

Evolutionary Algorithms (EA)
EAs form a class of problem-solving techniques based on the Darwinian theory of evolution [7], which evolve a population $X_t$ of solutions. Members of the population are feasible solutions called individuals. Each iteration of an EA involves a competitive selection that weeds out poor solutions through the evaluation of a fitness value indicating the quality of each individual as a solution to the problem. At each generation, the evolutionary process applies a set of stochastic operators to the individuals, typically recombination (or crossover) and mutation. Running even a simple EA requires substantial computational resources on non-trivial problems; in particular, evaluating the population is often the costliest operation. There exist many useful models of EAs; pseudo-code of a general execution scheme is provided in Algorithm 1.
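To make the general scheme concrete, the following Python sketch implements a minimal generational EA on bit strings. It is an illustration, not a reproduction of Algorithm 1: truncation selection, one-point crossover, bit-flip mutation, elitism and all parameter values are assumptions chosen for brevity, and the toy fitness (number of ones, "OneMax") merely stands in for a real objective.

```python
import random


def evolve(fitness, n_bits=32, pop_size=20, generations=50,
           p_cx=0.5, p_mut=0.02, seed=1):
    """Minimal generational EA on bit strings; returns the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)  # evaluation: often the costliest step
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]                # truncation selection
        offspring = [ranked[0][:]]                       # elitism: keep a copy of the best
        while len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < p_cx:                      # one-point crossover
                cut = rng.randrange(1, n_bits)
                a = a[:cut] + b[cut:]
            offspring.append([bit ^ (rng.random() < p_mut) for bit in a])  # bit-flip mutation
        pop = offspring
    return history


print(evolve(sum))  # OneMax: fitness = number of ones
```

Keeping a copy of the best individual guarantees that the best fitness never decreases from one generation to the next, which is the same elitist idea that motivates the choice of NSGA-II later in the paper.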

Hash function generation by means of EAs
The idea of using nature-inspired heuristics for the design of cryptographic primitives, and in particular hash functions, is not new: this approach has been successfully applied in several previous works, as reviewed in [8]. It probably started to attract the attention of researchers in the 1990s with the Ph.D. thesis of Clark [9], where different heuristic techniques (genetic algorithms, simulated annealing and tabu search) were compared for breaking classical cryptosystems. In particular, simulated annealing was proposed for the cryptanalysis of a certain class of stream ciphers. As mentioned in [8], Millan, Clark and Dawson then proposed a model for generating Boolean functions with excellent cryptographic properties, thus starting a very … [10] opens the research area of generating hash functions by means of evolutionary techniques (in that paper, a Cellular Automaton (CA)), as initiated by the seminal work of Damgård in [11]. For instance, in [12] an evolutionary technique was applied to the design of a digital circuit which computes a simple hashing function: based on an FPGA architecture, the circuit was synthesized automatically through simulated evolution. More recently, in [13], the authors used Genetic Algorithms (GAs) to construct universal hash functions that efficiently hash a given set of keys. The hash functions generated in this way produce fewer collisions than functions selected at random from a family of universal hash functions. The proposed algorithm could be used in scenarios where the input distribution of keys changes frequently and the hash function must be modified often, rehashing the values to reduce collisions. Finally, the Genetic Programming (GP) heuristic was successfully applied in [14] to the automated design of cryptographic block ciphers and hash functions.
In this article, we extend the work proposed in [14] by exploiting a hybrid meta-heuristic for the evolutionary process called GEP [2], which appears computationally far more efficient than the GP paradigm used in the previous papers. The next section details this heuristic. Moreover, while the work presented in [14] focuses on a single objective (the avalanche effect captured by the propagation criterion $PC_k(t)$ presented in §4.1), we add a set of complementary objectives to direct the traversal of the search space toward solutions which optimize not only the propagation criterion but also the randomness, the complexity and the efficiency of the evolved compression functions.

Gene Expression Programming (GEP) heuristic
Among the different classes of EAs, John Koza proposed in [1] to apply GAs in so-called Genetic Programming (GP), where individuals represent a function or a program. Gene Expression Programming (GEP) was proposed by Cândida Ferreira in [2] as an extension of GP. As an EA, GEP uses populations of individuals, selects individuals according to their fitness, and introduces genetic variation using one or more genetic operators. The fundamental difference between these three classes of EAs (GA, GP and GEP) lies in the nature of the individuals:
• in GAs, the individuals are symbolic strings of fixed length (chromosomes);
• in GP, the individuals are nonlinear entities of different sizes and shapes, called parse trees, that represent a program (see Fig. 3);
• in GEP, the individuals are also nonlinear entities of different sizes and shapes (expression trees), but these complex entities are encoded as simple strings of fixed length (chromosomes).
This avoids the possible divergence in the size of the parse trees that can be observed in GP and that represents the main weakness of this approach: a large amount of computational resources can be spent editing huge illegal structures. On the contrary, the chromosomes in GEP are simple entities: linear, compact, relatively small and easy to manipulate genetically (replicate, mutate, recombine, transpose, etc.). In addition, any modification made in the genome always results in syntactically correct expression trees or programs. Hence, GEP has been chosen as the base heuristic for finding good candidates for hash functions (or, more precisely, compression functions).
Overview. In GEP, the genome or chromosome consists of a linear, symbolic string of fixed length composed of one or more genes. This string contains two kinds of symbols: functions (typically elements of the set $\Delta_f$) and terminals (elements of the set $\Delta_{term}$). Each gene encodes an expression combining functions and terminals, which can be represented as a tree called an Expression Tree (ET). The string reflects the traversal of the tree from left to right and from top to bottom (i.e., a level-order traversal), which is neither the prefix nor the postfix traversal sometimes found in GP implementations. For instance, let us assume that … In fact, the genes in GEP are composed of a head (containing both functions and terminals) followed by a tail containing only terminals. Let h (resp. t) be the size of the head (resp. the tail); GEP then encodes each gene as a string of length h + t. More precisely, let $a_{max}$ be the maximum number of arguments taken by the functions in $\Delta_f$; then $t = h(a_{max} - 1) + 1$. This choice guarantees that, even if the head contains only functions of maximal arity, the tail provides enough terminals to close every branch of the tree. Note that in the previous example $a_{max} = 2$, meaning that $t = h + 1$ and each gene is encoded as a string of length $2h + 1$. For instance, let us take $h = 7$. Then the complete gene considered previously could be written in GEP as depicted in Fig. 4.
In this case, the expression tree finishes at position 7, whereas the gene ends at position 14. If a mutation at position 4 changes 'b' into '+', the gene presented in Fig. 5 is obtained (the corresponding expression tree is also provided). The tree now finishes at position 9. So, despite its fixed length, each gene has the potential to code for expression trees of different sizes and shapes.
Crossover operates in a similar way. Finally, GEP chromosomes are usually composed of more than one gene of equal length. For each problem or run, the number of genes, as well as the length of the head, is chosen. Each gene then codes for a sub-ET … A more detailed description of GEP is beyond the scope of this paper (see [15]).
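The level-order decoding described above, including the effect of a point mutation on the length of the expressed tree, can be sketched in Python. The function set {+, -, *}, the terminals a to d and the genes below are hypothetical (they are not the genes of Figs. 4 and 5); positions are counted from 0. Sub-ETs are joined with a single linking function, in the spirit of the linking functions mentioned for incremental hashing; in GEPHashSearch the symbols would instead come from Table 1.

```python
ARITY = {"+": 2, "-": 2, "*": 2}  # hypothetical function set, a_max = 2


def gene_length(h: int, a_max: int = 2) -> int:
    """Head length h plus tail length t = h * (a_max - 1) + 1."""
    return h + h * (a_max - 1) + 1


def orf_length(gene: str) -> int:
    """Number of leading symbols actually expressed in the ET."""
    needed, i = 1, 0  # one open slot: the root
    while needed:
        needed += ARITY.get(gene[i], 0) - 1  # fill one slot, open `arity` new ones
        i += 1
    return i


def decode(gene: str):
    """Build the ET level by level (left to right, top to bottom)."""
    nodes = [(sym, []) for sym in gene[:orf_length(gene)]]
    next_child = 1
    for sym, children in nodes:
        for _ in range(ARITY.get(sym, 0)):
            children.append(nodes[next_child])
            next_child += 1
    return nodes[0]


def infix(node) -> str:
    sym, children = node
    if not children:
        return sym
    return f"({infix(children[0])}{sym}{infix(children[1])})"


def express(chromosome: str, gene_len: int, link: str = "+") -> str:
    """Decode each gene into a sub-ET and connect the sub-ETs with a linking function."""
    genes = [chromosome[i:i + gene_len] for i in range(0, len(chromosome), gene_len)]
    return link.join(infix(decode(g)) for g in genes)


gene = "+*-abcdabcdabcd"    # h = 7, t = 8: the ET uses positions 0..6
mutant = "+*-+bcdabcdabcd"  # position 3: 'a' -> '+'; the ET now uses positions 0..8
print(infix(decode(gene)), orf_length(gene))
print(infix(decode(mutant)), orf_length(mutant))
print(express(gene + "-abcdabcdabcdab", gene_length(7)))
```

Because the tail only ever holds terminals and is sized by $t = h(a_{max} - 1) + 1$, the `while` loop in `orf_length` always terminates inside the gene, which is why any mutation yields a syntactically valid tree.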
4 GEPHashSearch: a GEP-based framework to find "good" compression functions
This article presents GEPHashSearch, a framework designed to evolve a population of individuals that are candidates for a compression function used in an iterative or incremental hash scheme. The objective is to design basic blocks of sufficiently good quality to expect the global scheme to remain a strong candidate. Note that many modern hash functions, such as the current NIST finalists of the SHA-3 competition, have a preprocessing stage; this prevents attacks that exploit a greater freedom in the choice of the input (as opposed to being forced to append the length of the message). Some hash functions additionally apply an expansion function before hashing. GEPHashSearch does neither, as it focuses solely on the compression functions.
Basic building operators. As we saw in §2.1, contemporary hash schemes consist of a set of binary operators combined together to form a more complex function applied in the different stages of the hash function evaluation. In GEPHashSearch, the basic building operators under consideration are described in Table 1. They form the set of functions $\Delta_f$ used by the GEP individuals in their head. Our choices were guided by the operators generally used in the known hash schemes. Table 2 presents an overview of the basic operators used in MD5, SHA-1 and the finalists of the current SHA-3 hash competition. In the framework of GEPHashSearch, we focus on the cheapest operators (in terms of gate equivalents and of cycles required to compute them).
Table 2 (excerpt; the remaining rows were lost in extraction):
Keccak [19]: ⊕, &, |, ≫k
Skein [20]: ⊕, addmod $2^{64}$, ≫k
GEP Individuals. In GEPHashSearch, each GEP individual $Ind_i$ of the population to evolve is a potential candidate for a compression function $compress_i$ used in the iterative construction of a hash function H.
This function is assumed to take two input parameters: a message chunk M of b bits and a state block S of n bits (corresponding to the fingerprint generated in the previous stage). This setup is illustrated in Fig. 6. The associated ET corresponds to the expression of $compress_i$ as a function composed of the basic building operators of Table 1 that form the set $\Delta_f$. Remember that the objective of our work is to generate "good" candidates for compression functions, in the hope that they will lead to good hash functions. In this context, the considered individuals are constrained by the following parameters: … defined as the number of distinct terminals that are actually expressed in the individual (i.e., that belong to its ET) divided by the total number of terminals, i.e. $|\Delta_{term}| = N + B$. In particular, it is crucial that $T_D(Ind) = 1$ for the final individual chosen at the end of a GEPHashSearch process, meaning that none of the input blocks of the compression function (whether from the fingerprint or from the message chunk) is ignored. Now that the GEPHashSearch individuals are defined, we can detail the four fitness functions used to evaluate them.
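One possible way to represent an individual, evaluate it as $compress_i(M, S)$ and compute the ratio $T_D(Ind)$ is sketched below in Python. It assumes, hypothetically, 32-bit words, a state block S split into N words named S0, ..., S(N-1) and a message chunk M split into B words named M0, ..., M(B-1), which together form $\Delta_{term}$. ETs are nested tuples (operator, child, ...), and the operator set is only inspired by the Table 2 excerpt, with ≫k read as a right rotation by 7 bits.

```python
MASK = 0xFFFFFFFF  # 32-bit words (assumption)

# Hypothetical function set Delta_f, inspired by the operators of Table 2.
OPS = {
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "add": lambda a, b: (a + b) & MASK,                # addmod 2^32
    "rotr7": lambda a: ((a >> 7) | (a << 25)) & MASK,  # right rotation by 7 bits
}


def terminals(node) -> set:
    """Names of the terminals expressed in an ET (nested tuples)."""
    if isinstance(node, str):
        return {node}
    _, *children = node
    return set().union(*(terminals(child) for child in children))


def terminal_diversity(et, n_state_words: int, n_msg_words: int) -> float:
    """T_D(Ind): distinct expressed terminals divided by |Delta_term| = N + B."""
    delta_term = {f"S{i}" for i in range(n_state_words)}
    delta_term |= {f"M{j}" for j in range(n_msg_words)}
    return len(terminals(et) & delta_term) / len(delta_term)


def evaluate(node, S, M) -> int:
    """compress_i(M, S): evaluate the ET on state words S and message words M."""
    if isinstance(node, str):
        words = S if node[0] == "S" else M
        return words[int(node[1:])]
    op, *children = node
    return OPS[op](*(evaluate(child, S, M) for child in children))


et = ("xor", ("add", "S0", "M0"), ("rotr7", "S1"))
print(terminal_diversity(et, 2, 2))  # M1 is never used, so T_D < 1
print(evaluate(et, S=[1, 2], M=[3, 4]))
```

With N = B = 2, this individual ignores the message word M1, so it would be rejected as a final candidate under the $T_D(Ind) = 1$ requirement even if its other fitness values were good.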

Fitness functions used in GEPHashSearch
Propagation criterion $PC_k(t)$. The propagation criterion PC(t) was proposed in [21] as a statistical test to check the effect of some bit flips on the output. Given a function f to evaluate, some random input x is generated and the result $y = f(x)$ is computed. Then the input is slightly changed into x′ by flipping up to t random bits. In other words, denoting by $d_H(\cdot,\cdot)$ the Hamming distance, $d_H(x, x') \le t$. The function is then evaluated again to obtain $y' = f(x')$. For a "good" function, each output bit is expected to change with probability 1/2, so the Hamming distance $d_H(y, y')$ is expected to follow the binomial law $\mathcal{B}(n, 1/2)$, where n is the length of y (or y′). A χ² goodness-of-fit test is used to compare the observed frequencies of the k sample measures $\{d_H(y_i, y'_i)\}_{0 \le i < k}$ to the corresponding expected frequencies of the hypothesized distribution $\mathcal{B}(n, 1/2)$. If the successive values $d_H(y_i, y'_i)$ follow a binomial distribution, this results in a low χ² value, which allows one to conclude that the tested function has good propagation properties. This χ² value is the measured fitness of the tested function and is referred to as $PC_k(t)$.
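The computation of $PC_k(t)$ can be sketched as follows in Python; k = 1000 and t = 4 mirror the $PC_{1000}(4)$ setting used in the experiments. For simplicity, the sketch assumes a function f mapping n-bit integers to n-bit integers (whereas the compression functions take (M, S) as input), draws the number of flipped bits uniformly in 1..t, and computes Pearson's χ² statistic over all n + 1 possible distances without merging low-count bins, which a careful implementation would normally do. The SHA-256-based function is only a stand-in for a well-behaved candidate.

```python
import hashlib
import random
from math import comb


def hamming(a: int, b: int) -> int:
    """Hamming distance d_H between two integers seen as bit strings."""
    return bin(a ^ b).count("1")


def pc_chi2(f, n: int, k: int, t: int, seed: int = 0) -> float:
    """Pearson chi-square statistic of k distances d_H(f(x), f(x')) against B(n, 1/2).

    f maps n-bit integers to n-bit integers; each x' differs from x in 1..t bits.
    """
    rng = random.Random(seed)
    observed = [0] * (n + 1)
    for _ in range(k):
        x = rng.getrandbits(n)
        x_prime = x
        for pos in rng.sample(range(n), rng.randint(1, t)):  # distinct bit positions
            x_prime ^= 1 << pos
        observed[hamming(f(x), f(x_prime))] += 1
    chi2 = 0.0
    for d in range(n + 1):
        expected = k * comb(n, d) / 2**n  # k * P[B(n, 1/2) = d]
        chi2 += (observed[d] - expected) ** 2 / expected
    return chi2


def identity(x: int) -> int:
    return x  # worst case: the output barely moves


def sha_based(x: int) -> int:
    # Stand-in for a well-behaved 32-bit function (assumption, for comparison only).
    return int.from_bytes(hashlib.sha256(x.to_bytes(4, "big")).digest()[:4], "big")


print(pc_chi2(identity, n=32, k=1000, t=4))   # huge: distances never exceed 4
print(pc_chi2(sha_based, n=32, k=1000, t=4))  # much smaller
```

Since lower is better, the evolutionary process minimizes this statistic; the identity function, which propagates no change beyond the flipped bits, illustrates the worst end of the scale.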
Randomness criterion RC. Apart from good propagation properties, GEPHashSearch also considers the randomness of the fingerprints generated by the evaluated individuals. To this end, GEPHashSearch makes use of the NIST Statistical Test Suite [22]. This statistical package consists of 15 tests $T_1, \ldots, T_{15}$ that were developed to test the randomness of (arbitrarily long) binary sequences produced by hardware- or software-based cryptographic random or pseudo-random number generators. Each test $T_i$ takes a sufficiently long binary sequence s as input and computes a specific P-value which, when compared to the selected level of confidence α (for instance, α = 0.01, i.e. 1%), allows one to conclude whether the sequence is non-random (P-value < α) or, on the contrary, can be considered random (P-value ≥ α). In GEPHashSearch, these tests are used to compute, for each individual $Ind_i$ at a given generation t, the randomness criterion RC returned by the function described in Algorithm 2.

Algorithm 2. Randomness criterion evaluation RC for a given GEP individual.
Require: Ind, the GEP individual to test against the randomness criterion
Require: l, the length of the sequence to pass to the NIST tests
Require: ∆x = {x_1, ..., x_k}, a set of random message chunks such that ∀i, |x_i| = b bits and k × n ≥ l
Require: IV, a fixed state block of n bits, used for the evaluation of all compression functions
Require: α, the level of confidence to apply to each NIST test
Ensure: 0 ≤ RC ≤ 15
function RC(Ind, l, ∆x, IV, α)
    RC ← 0; s ← ""              ▷ At worst, RC = 0. Initialize s as an empty string.
    while |s| < l bits do       ▷ Generate the sequence s of at least l bits.
        y_i ← Ind(IV, x_i)      ▷ Fingerprint of the message chunk x_i.
        s ← s ∥ y_i             ▷ Append y_i (of n bits) to s.
    end while
    for i ← 1..15 do            ▷ Test the randomness of s against each NIST test.
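Algorithm 2 can be sketched in Python as follows. Only $T_1$, the Frequency (Monobit) test of NIST SP 800-22, is implemented here; the other tests would be supplied as additional callables returning a P-value. The toy individuals and parameter values are hypothetical, the loop uses an explicit chunk index (left implicit in the pseudo-code), and each test whose P-value reaches α is assumed to add one to RC, consistent with $0 \le RC \le 15$.

```python
import hashlib
import math


def monobit_p_value(bits: str) -> float:
    """T1, the Frequency (Monobit) test of NIST SP 800-22: P-value of a '0'/'1' string."""
    s = sum(1 if c == "1" else -1 for c in bits)
    s_obs = abs(s) / math.sqrt(len(bits))
    return math.erfc(s_obs / math.sqrt(2))


def rc(ind, l, chunks, iv, alpha, tests, n):
    """Number of tests in `tests` passed by the sequence built from Ind(IV, x_i)."""
    if len(chunks) * n < l:
        raise ValueError("Algorithm 2 requires k * n >= l")
    s, i = "", 0
    while len(s) < l:  # concatenate n-bit fingerprints until |s| >= l
        s += format(ind(iv, chunks[i]), f"0{n}b")
        i += 1
    return sum(1 for test in tests if test(s) >= alpha)  # P-value >= alpha: passed


def toy_individual(iv: int, x: bytes) -> int:
    # Hypothetical 32-bit "compression function" built from SHA-256.
    return int.from_bytes(hashlib.sha256(iv.to_bytes(4, "big") + x).digest()[:4], "big")


chunks = [i.to_bytes(8, "big") for i in range(1000)]  # Delta_x with k = 1000, b = 64
print(rc(toy_individual, 32_000, chunks, 0, 0.01, [monobit_p_value], n=32))
print(rc(lambda iv, x: iv, 32_000, chunks, 0, 0.01, [monobit_p_value], n=32))
```

The second individual ignores the message chunk and always returns IV = 0, so the resulting all-zero sequence fails the monobit test and scores RC = 0.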
To allow a fair comparison of all GEP individuals on this criterion, the set of random message chunks $\Delta_x$ used to build the bit string evaluated by the NIST tests is constructed at the beginning of each generation and is kept the same for all RC evaluations. Also, as these tests focus on the randomness of the output fingerprints generated by a given individual, we fix the value of the input state block for every evaluation of the compression function. This fixed value, denoted IV, is defined at the beginning of a GEPHashSearch execution and is kept constant until the end of the run. A brief overview of the considered tests is now provided (see [22] for more details):
T1: Frequency (Monobit) Test: determines whether the numbers of ones and zeros in the tested sequence are approximately the same, as would be expected for a truly random sequence.
T2: Frequency Test within a Block: determines the proportion of ones within each block $y_i$ (of n bits), which should be approximately n/2, as would be expected under the assumption of randomness.
T3: Runs Test: analyzes the total number of runs in the sequence s, where a run is an uninterrupted sequence of identical bits, to determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence.
T4: Longest Run of Ones within a Block: as T3, but focusing on the longest run of ones within each block $y_i$.
T5: Binary Matrix Rank Test: examines the rank of disjoint sub-matrices of the entire sequence, to check for linear dependence among fixed-length substrings of the original sequence.
T6: Discrete Fourier Transform (Spectral) Test: analyzes the peak heights in the DFT of the sequence in order to detect periodic features (i.e., repetitive patterns that are near each other) that would indicate a deviation from the assumption of randomness.
T7: Non-overlapping Template Matching Test: detects the production of too many occurrences of a given non-periodic (aperiodic) pattern.
T8: Overlapping Template Matching Test: complements T7 by counting the occurrences of prespecified target strings, allowing overlapping matches.
T9: Maurer's "Universal Statistical" Test: detects whether or not the sequence can be significantly compressed without loss of information; a significantly compressible sequence is considered non-random.
T10: Linear Complexity Test: examines the length of the shortest Linear Feedback Shift Register (LFSR) generating the sequence, to determine whether the sequence is complex enough to be considered random (random sequences are characterized by longer LFSRs).
T11: Serial Test: determines whether the number of occurrences of each of the $2^m$ overlapping m-bit patterns is approximately the same as would be expected for a random sequence. Random sequences are uniform: every m-bit pattern has the same chance of appearing as every other m-bit pattern.
In Algorithm 2, the minimum length l of the tested sequence must be selected carefully, in accordance with the NIST recommendations for each test. We adapted these recommendations to the GEPHashSearch context, leading to the values presented in Table 3. Based on this analysis, GEPHashSearch uses $l = 10^6$ bits for the computation of RC in Algorithm 2.
Complexity $W_1$. This metric measures the complexity of the compression function as the accumulated number of cycles required to execute the operators that compose the evaluated individual, i.e. $W_1(Ind) = \sum_{op \in Ind,\ op \in \Delta_f} W_1(op)$. This assumes that the cost (in number of cycles) of every operator of the set $\Delta_f$ is known. Examples of such evaluations can be found in various speed benchmarks of hash functions; for instance, per-operator cycle estimates (e.g., for addmod $2^{32}$) are given for BLAKE [16]. The notation $W_1$ is used because this criterion reflects the sequential work required to execute the compression function.
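Computing $W_1$ amounts to summing per-operator costs over every operator occurrence in the ET. In the Python sketch below, the ET is a nested tuple (operator, child, ...), and the cycle costs are placeholders rather than measured values.

```python
# Placeholder cycle costs per operator (hypothetical, not measured).
CYCLES = {"xor": 1, "and": 1, "or": 1, "rotr": 1, "add": 2}


def w1(node, costs=CYCLES) -> int:
    """Accumulated cost of all operator occurrences in the ET (sequential work)."""
    if isinstance(node, str):  # terminals cost nothing
        return 0
    op, *children = node
    return costs[op] + sum(w1(child, costs) for child in children)


et = ("xor", ("add", "S0", "M0"), ("xor", "S1", "M1"))
print(w1(et))
```

Counting every occurrence, rather than every distinct operator, matches the notion of sequential work: independent subtrees that a processor could execute in parallel are still charged in full.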
Efficiency E. This metric gives a raw estimate of the efficiency of the tested individual: its ET is automatically translated into C code, which is compiled and executed on a reference platform to determine the average time required by the associated compression function to compute the fingerprint of random message chunks. In practice, GEPHashSearch benchmarks the time t required to consecutively generate the fingerprints of the first e elements of the set $\Delta_x$ (the one used for the evaluation
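The first step of the E measurement, translating an ET into C source code, might look like the Python sketch below. The ET representation (nested tuples), the terminal naming scheme (S0, M1, ...), the 32-bit word type and the C function signature are assumptions; compiling the generated source and timing it over the first e chunks of $\Delta_x$ is not shown.

```python
C_OPS = {"xor": "^", "and": "&", "or": "|", "add": "+"}  # binary operators only


def to_c_expr(node) -> str:
    """Translate an ET (nested tuples) into a C expression over uint32_t words."""
    if isinstance(node, str):  # terminal "S0" -> S[0], "M12" -> M[12]
        return f"{node[0]}[{node[1:]}]"
    op, left, right = node
    return f"({to_c_expr(left)} {C_OPS[op]} {to_c_expr(right)})"


def to_c_function(et, name: str = "compress") -> str:
    """Emit a self-contained C function returning one 32-bit output word."""
    return (
        "#include <stdint.h>\n\n"
        f"uint32_t {name}(const uint32_t *M, const uint32_t *S) {{\n"
        f"    return (uint32_t){to_c_expr(et)};\n"
        "}\n"
    )


print(to_c_function(("xor", "S0", ("add", "M0", "S1"))))
```

The explicit `(uint32_t)` cast keeps the addition modulo $2^{32}$ even on platforms where integer promotion widens the operands.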

Multi objective optimization in GEPHashSearch
Although the first implementation of the GEPHashSearch framework targets the optimization of each fitness objective $PC_k(t)$, RC, $W_1$ and E independently, the real goal is to explore and measure the trade-offs that can be made among these four objectives. In practice, we plan to build the set of optimal solutions (widely known as the Pareto-optimal solutions) using the NSGA-II algorithm [23], which works explicitly with the notion of dominance. The individuals of the Pareto front are, by definition, not dominated by any other individual, and NSGA-II is an elitist algorithm that favors these non-dominated individuals during selection. Note that evaluating over multiple criteria takes more time for each additional test, as does checking for dominance against other individuals. Classical multi-objective EAs that use non-dominated sorting and sharing have been criticized for their lack of elitism and their computational complexity, $O(|Pop|^3)$ in our case (more precisely $O(4 \times |Pop|^3)$, where |Pop| is the population size, since there are 4 objectives to optimize). NSGA-II is faster, as its non-dominated sorting approach has a computational complexity of $O(4 \times |Pop|^2)$ in the GEPHashSearch context, which explains why it received our preference.
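The fast non-dominated sorting step at the core of NSGA-II [23] can be sketched in Python as follows. This covers only the sorting into successive fronts (crowding distance, selection and variation are omitted). It assumes all objectives are minimized, so RC, which should be maximized, is negated in the hypothetical objective vectors (PC, -RC, W1, E).

```python
def dominates(a, b) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Return the successive Pareto fronts as lists of indices (O(M * |Pop|^2))."""
    dominated = [[] for _ in points]  # dominated[p]: indices dominated by p
    counts = [0] * len(points)        # counts[p]: number of points dominating p
    fronts = [[]]
    for p, a in enumerate(points):
        for q, b in enumerate(points):
            if dominates(a, b):
                dominated[p].append(q)
            elif dominates(b, a):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        next_front = []
        for p in fronts[-1]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
    return fronts[:-1]


# Hypothetical objective vectors (PC, -RC, W1, E): RC is negated so that all are minimized.
population = [(40.0, -14, 12, 3.1), (55.0, -15, 10, 2.9), (60.0, -13, 14, 3.5)]
print(fast_non_dominated_sort(population))
```

The double loop over the population performs $|Pop|^2$ dominance checks, each costing one comparison per objective, which is where the $O(4 \times |Pop|^2)$ figure for four objectives comes from.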

Experiments
As mentioned before, we present here a work in progress, and we are now in the process of validating the design choices proposed in this article. The implementation of GEPHashSearch relies on two fundamental components: (1) LibGEP (version 0.4.1), a C++ library partly developed by one of the authors, which provides a convenient interface to the GEP heuristic; (2) ParadisEO (version 1.3), a portable C++ middleware for the manipulation of EA heuristics. Although this implementation was not finished at the time of writing, we can present here the first experimental results, obtained in a mono-objective context by running GEPHashSearch on the resources of the computing cluster of the University of Luxembourg (see http://hpc.uni.lu). The parameters of the evolutionary process executions are as follows:
• Probability of two-point crossover: 0.5
Figure 7 depicts the mono-objective optimization of the propagation criterion $PC_{1000}(4)$. More precisely, it illustrates the evolution of the average fitness of the best individual over 50 executions, i.e. the Mean Best Fitness (MBF). We can see that GEPHashSearch converges quickly toward good solutions: the χ² statistic observed at the end (42.02559) is below the threshold value (46) for the selected level of confidence. This is encouraging, and we are now finalizing the implementation of the other criteria, namely RC, $W_1$ and E, to allow concrete runs of NSGA-II within GEPHashSearch.

Conclusion
Cryptographic hash functions are fundamental primitives in modern cryptography and have many security applications. In this paper, the GEPHashSearch framework was proposed. Its objective is to build compression functions with reasonably "good" properties by means of the Gene Expression Programming (GEP) heuristic, an efficient alternative to classical Genetic Programming (GP). As it is still a work in progress, this article focuses on the design aspects of this framework rather than on complete implementation details and validation results. In particular, the complete description of the encoding of the GEP individuals, together with four fitness objectives, has been detailed. While the way GEP individuals are encoded reflects the organization of the best-known hash functions (MD5, SHA-1, but also the five finalists of the SHA-3 contest organized by the NIST were studied), the defined objectives try to capture the expected properties of the underlying compression functions: a decent propagation of small modifications of the input message (captured by the propagation criterion $PC_k(t)$), reasonably good randomness (attested by the RC fitness value, which reflects how many of the 15 tests of the NIST Statistical Test Suite are passed) and good performance (as measured by the complexity $W_1$ and the hashing efficiency E). The first experimental results are quite promising. We expect that the multi-objective optimization of this problem over these four criteria will lead to fruitful contributions to the cryptographic community.