Transforming Source Code to Mathematical Relations for Performance Evaluation

Assessing software quality attributes (such as performance, reliability, and security) from source code is of the utmost importance. The performance of a software system can be improved by parallel and distributed execution, whose aim is to speed up the program by providing the maximum possible concurrency among the distributed segments. It is well known that distributing a program does not always speed up its execution; in some cases, distribution can even increase the running time. Therefore, before distributing a source code, it should be determined whether its distribution can achieve the maximum possible concurrency. Existing methods and tools cannot answer this question from the source code alone. In this paper, we propose a mathematical relation for object-oriented programs that statically analyzes the program by examining the synchronous and asynchronous calls inside the source code. We then model the invocations of the software's methods by a Discrete Time Markov Chain (DTMC). Using the properties of the DTMC and the proposed mathematical relation, we determine whether or not the source code can be distributed on homogeneous processors. The experimental results show that we can specify whether a program is distributable before deploying it on a distributed system.


Introduction
The need for high-speed computation in large-scale scientific applications for analyzing complex scientific problems is so great that common computers cannot satisfy it. Therefore, using distributed systems and the processing power of numerous processors or cores to reach the desired speed is nowadays an established practice [1]. Yet creating a large-scale distributed program is always more difficult than creating a non-distributed program with the same functionality, as building a distributed system can turn into a tedious and error-prone task.
Since computational programs perform many computations, their execution requires considerable time. Therefore, if a program cannot be distributed, much time is wasted. The dominant cost in a distributed program is the invocation, or communication, time of its methods; these calls consume most of the execution time. When a program is distributed and two of its classes are placed on two different machines, the invocations between those classes turn into remote calls. As reference [2] specifies, in some cases program distribution can have negative effects on the running time. When there are many calls between two methods, network traffic increases and, as a result, the efficiency of the distributed program becomes lower than that of the initial sequential program. Since constructing a distributed program from source code is complex and time-consuming, it is better to predict whether the source code is distributable before distributing the program across machines. None of the existing methods and tools can achieve this goal from the source code.

The Problem and the Claim

The overall problem addressed in this paper is to specify whether a source code has the potential for parallelization on homogeneous processors; i.e., whether, if distributed, it achieves the maximum concurrency compared to the sequential mode. We claim that this problem can be solved by the following tasks (each described in Section III):
(1) Model the software's method invocations by a Markov chain, in which nodes represent methods, edges represent calls between methods, and edge weights give the number of calls between the methods.
(2) Determine the maximum potential of distributability of each method.
(3) Determine the expected performance of the source code from the obtained Markov chain.
(4) Compute the speedup. Speedup is defined as the execution time of a sequential program divided by the execution time of a parallel program that computes the same result. In particular, Speedup = T_s / T_p, where T_s is the sequential time and T_p is the expected performance.
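The speedup computation in step (4) is straightforward; a minimal sketch, with hypothetical timing values:

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup = T_s / T_p; a value greater than 1 means distribution pays off."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_sequential / t_parallel

# A program whose sequential run takes 190 s and whose expected
# distributed performance is 100 s has speedup 1.9 > 1,
# so distributing it is predicted to be worthwhile.
print(speedup(190.0, 100.0))
```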

The Paper Outline
The rest of the paper is organized as follows. Section II discusses a literature review of related research. In Section III, we propose a mathematical relation for time estimation by which the potential for distribution of the source code can be specified. The case study is discussed in Section IV. Finally, Section V deals with conclusions and future work.

Related Work and Background
Complicated computational applications cannot be executed in an acceptable time on a single computation machine, so they should be divided into small tasks, which can then be executed on distributed or multiprocessor systems. Nowadays, most distributed and multiprocessor tools use scheduling methods for distribution. The aim of scheduling is to execute a program on several processors such that the execution time of the whole program is minimal, considering the time of the tasks and the communication time between the processors [3]. Scheduling methods can be divided into two groups: those which can guarantee quality of service, and those which cannot; the former are preferred. CONDOR [4], SGE [5], PBS [6] and LSF [7] are among the most popular and widely used scheduling systems. These systems do not guarantee service quality and perform scheduling only at the job level, not at the application level. Unlike the above systems, there are some which observe service quality in scheduling; such systems take Job Characteristics, Planning in Scheduling, Rescheduling and Scheduling Optimization into account. AppleS [8], GrADS [9] and Nimrod/G [10] are among the most famous systems of this kind. However, none of the aforementioned schedulers can predict whether a given program has the potential to be parallelized, or whether speedup can be achieved by parallelization. A tool called DAGC has also been presented to find the optimal distribution architecture [11]. DAGC uses a clustering method to find this architecture, together with a mathematical relation to measure the quality of the obtained clusters. The main problem with the mathematical relation used in DAGC and similar tools is that, as described above, it cannot determine whether a program is capable of being parallelized. In previous work [12], we proposed an analytical model for determining the distributability of a specific method. However, that method cannot determine the overall distributability of a program, and it does not consider the effectiveness of each method in the distribution. In this research, we determine the overall distributability of a program using a DTMC, considering the effectiveness of each method.

Overview of Discrete Time Markov Chains
In this section, we discuss Discrete Time Markov Chains (DTMCs), which we use to model the source code's invocations [13]. A DTMC is described by its states and the transition probabilities between them, collected in the one-step transition probability matrix. The one-step transition probability is the probability that the process, when in state i at time n, will next transition to state j at time n + 1:

P_{i,j} = Pr{ X_{n+1} = j | X_n = i }.    (1)

Note that all the elements in a row of P add up to 1 and each P_{i,j} lies in the range [0, 1]. For our purpose, we use an absorbing DTMC: a DTMC is called absorbing if at least one state has no outgoing transition. Each DTMC with several final states can be converted into an absorbing DTMC by adding a new final state and drawing a transition to it from every final state of the original DTMC. We can partition the transition probability matrix of an absorbing DTMC as:

P = [ Q  C
      0  I ].    (2)

If the DTMC has n states, of which m are absorbing, Q is an (n − m) × (n − m) sub-stochastic matrix (with at least one row sum < 1) describing the transitions among the transient states, I is an m × m identity matrix, 0 is an m × (n − m) matrix of zeros, and C is an (n − m) × m matrix describing the transitions from the transient states to the absorbing states. The (i, j)-th entry of Q^k is the probability of arriving at state s_j after exactly k steps, starting from state s_i. Since the process is eventually absorbed, Q^k tends to the zero matrix as k grows, so the series Σ_{k≥0} Q^k converges and the inverse matrix (I − Q)^{-1} exists. This inverse is called the fundamental matrix F:

F = (I − Q)^{-1} = Σ_{k=0}^{∞} Q^k.    (3)

Let X_{i,j} represent the number of visits to state j, starting from state i, before the process is absorbed. It can be shown that the expected number of such visits, E[X_{i,j}], is given by the (i, j)-th entry of the fundamental matrix F [14, 15]:

E[X_{i,j}] = m_{i,j},    (4)

where m_{i,j} is the (i, j)-th entry of F. The variance of the number of visits can also be computed from the fundamental matrix. Let σ²_{i,j} denote the variance of the number of visits to state j starting from state i. Define F_D = [md_{i,j}] such that:

md_{i,j} = m_{i,j} if i = j, and 0 otherwise.    (5)

In other words, F_D is a diagonal matrix with the same diagonal entries as F. If we define

F_2 = F (2 F_D − I),    (6)

then:

Var[X_{i,j}] = σ²_{i,j} = [F_2]_{i,j} − m²_{i,j}.    (7)
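The fundamental-matrix quantities above are easy to compute numerically. A small sketch, using a hypothetical three-transient-state Q (NumPy assumed available):

```python
import numpy as np

# Transient-to-transient block Q of an absorbing DTMC with three
# transient states (hypothetical numbers; row sums are < 1 because
# the missing mass goes to the absorbing state).
Q = np.array([[0.0, 0.6, 0.3],
              [0.0, 0.0, 0.5],
              [0.2, 0.0, 0.0]])

I = np.eye(Q.shape[0])
F = np.linalg.inv(I - Q)            # fundamental matrix: F[i, j] = m_ij
F_D = np.diag(np.diag(F))           # diagonal part of F, relation (5)
Var = F @ (2 * F_D - I) - F ** 2    # relation (7), element-wise square of F

m_1i = F[0]      # expected visits to each transient state, starting from state 1
var_1i = Var[0]  # corresponding variances
print(m_1i)
```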

Predicting Performance Of A Source Code
In this section we describe our approach for modeling a software system whose method invocations are represented by an absorbing DTMC, such that the DTMC states represent the software's methods and the transitions between states represent the transfer of control from one method to another. We assume that the system consists of n methods and has a single initial state, denoted by 1, and a single absorbing (exit) state, denoted by n. Consider Fig. 1: the numbers on the edges indicate the probability of moving from one method to another. In this paper, the probability of going from method x to method y is computed as the number of calls from x to y divided by the total number of outgoing calls of x (i.e., its fan-out). The method invocations of the source code are thus given by the one-step transition probability matrix P.
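The construction of P from static call counts can be sketched as follows (the call-count table is hypothetical):

```python
# calls[x][y] = number of call sites from method x to method y
# (hypothetical counts for a four-method program; method 3 is the exit).
calls = {
    0: {1: 3, 2: 1},   # method 0 calls method 1 three times, method 2 once
    1: {2: 2},
    2: {3: 4},
    3: {},             # exit method: no outgoing calls -> absorbing state
}

n = 4
P = [[0.0] * n for _ in range(n)]
for x, outgoing in calls.items():
    fan_out = sum(outgoing.values())
    if fan_out == 0:
        P[x][x] = 1.0          # absorbing state keeps a self-loop
    else:
        for y, c in outgoing.items():
            P[x][y] = c / fan_out   # P[x][y] = calls(x, y) / fan-out(x)

print(P[0])  # -> [0.0, 0.75, 0.25, 0.0]
```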
Let PD_i denote the potential of distributability of method i, indicated by node i in the DTMC. During a single execution, the performance of the software, denoted by the random variable P, is given by:

P = Π_{i=1}^{n−1} PD_i^{X_{1,i}},    (9)

where X_{1,i} denotes the number of visits to the transient state i starting from state 1. Therefore, the expected performance of the software system is:

E[P] = Π_{i=1}^{n−1} E[PD_i^{X_{1,i}}].    (10)

Thus, to obtain the expected performance of the source code, we need E[PD_i^{X_{1,i}}], the expected potential of distributability of method i for a single run of the software. Using a Taylor series expansion around E[X_{1,i}], the term E[PD_i^{X_{1,i}}] in relation (10) can be approximated as:

E[PD_i^{X_{1,i}}] ≈ PD_i^{E[X_{1,i}]} + (1/2) (ln PD_i)² PD_i^{E[X_{1,i}]} Var[X_{1,i}].    (11)

Letting E[X_{1,i}] = m_{1,i} and Var[X_{1,i}] = σ²_{1,i}, relation (11) may be written as:

E[PD_i^{X_{1,i}}] ≈ PD_i^{m_{1,i}} (1 + (ln PD_i)² σ²_{1,i} / 2),    (12)

where m_{1,i} is the expected number of visits to state i and σ²_{1,i} is the variance of the number of visits to state i; both can be obtained from the DTMC analysis. Relation (10) can thus be written as:

E[P] ≈ Π_{i=1}^{n−1} PD_i^{m_{1,i}} (1 + (ln PD_i)² σ²_{1,i} / 2).    (13)
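The expected-performance relation is simple to evaluate once the visit statistics are known; a minimal sketch, where the PD, m and σ² values are hypothetical rather than taken from the paper:

```python
import math

def expected_performance(pd, m, var):
    """E[P] ~= prod_i PD_i**m_1i * (1 + (ln PD_i)**2 * var_1i / 2)."""
    result = 1.0
    for pd_i, m_i, v_i in zip(pd, m, var):
        result *= pd_i ** m_i * (1.0 + (math.log(pd_i) ** 2) * v_i / 2.0)
    return result

# With zero variances the relation degenerates to a plain product of
# PD_i raised to the expected visit counts.
print(expected_performance([0.9, 0.8], [2, 1], [0.0, 0.0]))
```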

Potential of Distributability of Method i
In this section, we determine the Potential of Distributability (PD) of each method in order to determine the overall performance P of a program. To achieve this aim, we determine PD_i, which measures the values of the different distributions for method i. Invocations (calls) between methods are of two types: asynchronous and sequential. If, by distributing a program, two of its methods are placed on two different machines, the calls between those methods turn into asynchronous calls; in a sequential call, the two methods are placed on the same machine. Taking the communication time into account, our method considers both the asynchronous and the sequential mode for each call, to determine which mode (sequential or parallel) can reach the maximum speedup.
To estimate the speedup, the execution time of all instructions should be estimated. The execution time of all instructions, except the nested calls, can be computed by existing methods [16, 17]. Existing methods cannot easily be applied to the execution time of nested calls, because the execution time of a caller method depends on whether the calls inside it are carried out in a sequential or an asynchronous manner. For example, consider Listing 1. At time t_1, the current (caller) method continues to work without stopping until it reaches the point where the results of a callee method are used. We call these points synchronization points [18], denoted by S. Thus a method continues to work after calling a method at a remote location (another distributed segment) and waits for the call's response only when it requires that response. As Listing 1 shows, the level of concurrency in executing the caller and callee methods depends on the time interval between the call point and the use point of the call's results; the problem is estimating this interval. There may be other calls between the call point and the use point, and these calls may themselves execute synchronously or asynchronously. In Listing 1, considering methods m, R and P, if all of them are executed sequentially (synchronously), the caller simply waits for each callee, so the estimated execution time of a method is the sum of the times of its non-call statements and the full estimated times of its callees. This relation can be written recursively and expanded for nested calls of any depth. Generally, for sequential calls, the estimated execution time is:

T_seq(m) = Σ_k t_k + Σ_{j ∈ callees(m)} T_seq(j),

where the t_k are the times of the non-call statements of m. Now we calculate the estimated execution time when the calls are executed in parallel (asynchronously). Consider Listing 1 again: if methods m, R and P are executed asynchronously, the caller only waits at a synchronization point if the callee, including its overheads, has not yet finished. Here C_t is the communication time and I_init is the preparation time for performing a remote call. Generally, the estimated time for the parallel (asynchronous) case is:

T_par(m) = Σ_k t_k + Σ_{j ∈ callees(m)} max(0, I_init + T_par(j) + C_t − t_j),

where t_j is the estimated time between the call point of callee j and its synchronization point.
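The two recursive time estimates can be sketched directly. In this sketch each method is a pair of its local (non-call) time and its calls; the gap is the estimated time between the call point and the use point of the result, and all timing constants are hypothetical:

```python
C_T, I_INIT = 1.0, 0.5   # communication and remote-call setup overheads (assumed)

def t_seq(method):
    """Sequential estimate: the caller blocks for every callee."""
    local, calls = method
    return local + sum(t_seq(callee) for _, callee in calls)

def t_par(method):
    """Asynchronous estimate: the caller waits past the sync point only
    if the callee (plus overheads) has not finished within the gap."""
    local, calls = method
    wait = sum(max(0.0, I_INIT + t_par(callee) + C_T - gap)
               for gap, callee in calls)
    return local + wait

leaf = (10.0, [])                # a callee with 10 s of local work, no calls
caller = (5.0, [(4.0, leaf)])    # 5 s local work; result used 4 s after the call
print(t_seq(caller), t_par(caller))  # -> 15.0 12.5
```

Here the asynchronous estimate is smaller because 4 s of the callee's 11.5 s (work plus overheads) overlap with the caller's own work.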

Determining the Potential of Distribution
Considering the sequential and asynchronous estimated-time relations of the previous section, the general mathematical form of the PD relation is:

PD_m = Σ_k t_k + Σ_i [ a_i T_i + (1 − a_i) max(0, I_init + T_i + C_t − t_i) ],    (23)

where the first sum is over the non-call statements of m, the second is over its calls, and T_i is the estimated execution time of the i-th callee. Depending on whether a call is synchronous or asynchronous, the value of a_i is 1 or 0, respectively. The goal is to determine the a_i so as to minimize PD_m. In relation (23), C_t is the communication time and t_i is the estimated time between the call point of invocation I_i and its synchronization point S_i (the use point).
For example, to obtain PD for Listing 1, we combine the estimated times for the asynchronous and sequential executions as in relation (24). In relation (24), the aim is to determine a_1 and a_2 so as to minimize PD_m, PD_R and PD_P. Likewise, the aim of the PD relations in (25) is to determine a_1, a_2, a_3, a_4 and a_5 so as to minimize PD_(A.m), PD_(B.m), PD_(C.n), PD_(D.p) and PD_(F.g). We use Dantzig's simplex algorithm [20], a popular algorithm for linear programming, to determine the binary values of the a_i (a_i = 1 for a synchronous call and a_i = 0 for an asynchronous call). After determining PD for methods m, n, p and g, we build the DTMC for the program of Listing 2, compute the potential of distributability for each method, and then determine the expected performance. The sequential execution time of the program is calculated as well. Finally, the speedup is calculated by dividing the sequential time by the expected performance. For the relations in (25), the communication overhead is taken as 1 second, and T_1, T_2, T_3, T_4 and T_5 (the execution times of the non-call statements) are taken as 40, 35, 45, 50 and 20 seconds. Table 1 shows the expected distributed potential, the sequential execution time, and the speedup for Listing 2. Since the speedup is greater than one, the program is capable of parallel execution; i.e., the parallel execution of the program is faster than its sequential execution.
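The paper solves the 0/1 selection of the a_i with the simplex method; for a handful of calls the same assignment can be found by plain enumeration, as in this sketch (all timing numbers are hypothetical, not the values used for Listing 2):

```python
from itertools import product

# Per-call data for a hypothetical method with three nested calls:
# T[i] = estimated callee execution time, gap[i] = time between the
# call point and the use point of the result.
T = [12.0, 6.0, 9.0]
gap = [5.0, 7.0, 2.0]
C_T, I_INIT = 1.0, 0.5   # assumed communication / setup overheads
local = 40.0             # execution time of the non-call statements

def pd(assign):
    """Relation-(23)-style cost: a_i = 1 -> synchronous (wait the full
    callee time), a_i = 0 -> asynchronous (wait only past the gap)."""
    cost = local
    for a, t_callee, g in zip(assign, T, gap):
        if a:
            cost += t_callee
        else:
            cost += max(0.0, I_INIT + t_callee + C_T - g)
    return cost

# Enumerate all 2^3 assignments and pick the one minimizing PD.
best = min(product([0, 1], repeat=3), key=pd)
print(best, pd(best))  # -> (0, 0, 0) 57.5
```

With these numbers every call benefits from being asynchronous, so the minimizing assignment makes all three calls remote.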

Evaluation Result
In this section, we evaluate the performance of the proposed method. We want to determine whether, when the speedup predicted by our method is greater than one, the actual execution indeed speeds up. To achieve this goal, we use the jDistributor tool [2]. jDistributor is a tool for the automatic distribution of sequential programs on homogeneous distributed systems using the JavaSymphony middleware [19]. The algorithm used in jDistributor is a hierarchical clustering method whose goal is to find an appropriate clustering for distribution. We use the well-known travelling salesman problem (TSP) to evaluate the proposed method. We compute PD_seq and PD_asyn from the source code, then predict, from the PD relation, the estimated parallel and sequential execution times for different numbers of graph nodes, and calculate the speedup from them. Afterwards, we distribute the TSP program on a network of three computers using jDistributor and measure the actual parallel and sequential execution times. The results are shown in Table 2.

Conclusion
In this paper, we introduced a new approach to specify whether a source code is distributable, before actually distributing it. To achieve this goal, a mathematical relation was proposed that, by considering asynchronous and sequential calls, measures the values of different distributions of the same program code. We then modeled the software's method invocations by a Discrete Time Markov Chain (DTMC). Using the properties of the DTMC and the proposed mathematical relation, we can determine whether or not the source code can be distributed on homogeneous processors.

Figure 1. Modelling method invocations for a sample program with DTMC

Table 1. Distributed execution times, sequential execution times and speedup for Listing 2