Parallelizing a new algorithm for the set partition problem

– In this paper we propose a new approach to organizing parallel computing to ﬁnd a sequence of all solutions to a problem. We split the sequence into subsequences and then execute concurrently the processes to ﬁnd these subsequences. We propose a new simple algorithm for the set partition problem and apply the above technique for this algorithm.


Introduction
When solving a computer science problem we have to construct a proper algorithm with the deterministic input and output. The algorithm is programmed. The input is put into a computer. The computer performs the corresponding program to hold all the solutions to the problem after a period of time.
The scheme for computing problem solutions is as presenting in the Fig. 1. The practical meaning of a problem and an algorithm is better if the time period of computing is shorter. One of good methods to reduce computing time is to organize parallel computing where the computation environment allows. There are some methods of organizing parallel computing to find quickly all solutions of a problem, for example, constructing a sequential algorithm first and then transforming it into the concurrent one [6], splitting data into separate blocks and then computing concurrently using the blocks [7]. Such parallel computing is an illustration of the top-down design.
For many problems, we can get to know quite well a number of all desirable solutions and their arrangement in a sequence. The set partition problem [2,3,5] is a typical example. This problem is broadly used in the graph theory [1], in the concurrent systems [4]. So we can split the sequence of all desirable solutions into subsequences and use a common program (algorithm) in the parallel computing environment to find these subsequences concurrently. Therefore, the amount of time required for finding all the solutions will be drastically decreased by the number of subsequences. This computing organization is combination of the bottom-up as well as divide and conquer designs.

Parallel computing of problem solutions by partitioning
To perform the above parallel computing we split the sequence of all desirable solutions of a problem into subsequences. The number of subsequences depends on the number of calculating processors. Let us split the sequence of all solutions of the problem into m subsequences (m ≥ 2). The scheme of the parallel computing organization to find all solutions of a problem is illustrated in the Fig. 2. In order to make the parallel computing organization realistic and optimal, the subsequences of solutions should satisfy the two following criteria: (1) It is easy to determine the input and the termination condition for the computing process of each subsequence.
(2) The smaller is the difference between the subsequences length the better.
Of course, input 1 is the input of the problem and the last termination condition is that of the algorithm.
The first criterion ensures that the partition of solutions is realistic. In many cases, we can use (a part of) the input of the computing process for the next subsequence as the termination condition of the computing process for the previous subsequence. The second criterion implements the balance of computing to the processors. Then the parallel computing processes become optimal.
If the problem has many algorithms, we should choose the algorithm with the least number of variables. Just then, the input and the termination condition of computing processes of subsequences become simple.

Application to the set partition problem
We apply the above presented technique to the set partition problem. For simplicity of presentation, we split the sequence of all desirable solutions into two subsequences. If we want to split the sequence into many subsequences then we apply this technique to each subsequence.

Set partition problem
Let X be a given set.

Definition 1.
A partition of the set X is a family A 1 , A 2 , . . . , A k of subsets of X, satisfying the following properties: Problem: Find all partitions of a given set X. The set partition problem is broadly used in the graph theory [1], in the concurrent systems [4]s. The number of all partitions of an n element set is denoted by the Bell number B n , calculated by the following recursive formula [3,5]: The number of all solutions of the problem grows up as quickly as the factorial function does (see Table 1).
Let identify the set X = {1, 2, 3, . . . , n}. Let π = {A 1 , A 2 , . . . , A k }. Each subset A i is called a block of the partition π. To ensure the uniqueness of representation, blocks in a partition are sorted in the ascending order from the smallest element in the block. In a partition,  1, 2, 3, . . .) has the index i and element 1 always belongs to the block A 1 . Each element j ∈ X, belonging to a block A i has also the index i. It means, every element of X can be represented by the index of a block that includes the element. Of course, the index of element j is not greater than j. Each partition can be represented by a sequence of n indices. The sequence can be considered as a word with the length of n on the alphabet X. So we can sort these words in the ascending order. Then: • This partition consists of n blocks, each block has only one element. This is an unique partition that has a block with the index n. We use an integer array AI [1..n] to represent a partition, where AI[i] stores the index of the block that includes element i. Element 1 always belongs to the first block, element 2 may belong to the first or second block. If element 2 belongs to the first block then element 3 may belong to the first or second block only. If element 2 belongs to the second block then element 3 may belong to the first, second or third block. Hence, the element i may only belong to the blocks: 1, 2, 3, . . . , max(AI [1], AI [2], . . . , AI[i − 1]) + 1.
It means, for every partition: [1], AI [2], . . . , This is an invariant for all partitions of the set X. We use it to find the partitions.

A new algorithm for partition generation
It is easy to determine a partition from its index array representation. So, instead of finding all partitions of the set X we find all index arrays AI [1..n], each of them can represent a partition of X. These index arrays will be sorted in the ascending order.
The first index array is 1 1 1 . . . 1 and the last index array is 1 2 3 . . . n. So the termination condition of the algorithm is AI[n] = n .
.n] be an index array representing a partition of X and let AI'[1.
.n] denote the index array next to AI in the ascending order.
To find the index array AI' we use an integer array Max Based on the above properties of the index arrays, we construct the following algorithm for finding all partitions of a set.

Algorithm 1. (Generation of set partitions)
Input: A positive integer n. Output: A sequence of an n element of set partitions, whose index representations are sorted by ascending. Computation: The algorithm complexity: The algorithm finds an index array and prints the corresponding partition with the complexity of O(n).
Therefore, the total complexity of the algorithm is O(B n , n). It approximates O(nn!). Algorithm 3.2 is much simpler and better than pointer-based algorithm 1.19 presented in [3].

Parallel generation of partitions
To paralellize the above presented sequential algorithm, we split the sequence of desirable all partitions of the set X into two subsequences. The pivot is chosen as a partition represented by the index array: The chosen pivot and the last index array of the first subsequence are illustrated in the Fig.  3.
We have to determine the termination condition for the first computing process and the input of the second one.
The termination condition for the first computing process in instruction 13 is replaced by: The input of the second computing process in instruction 2 will be:  The experimental results indicate that the pivot is nearly in the middle of the sequence of all index arrays. So the above splitting is appropriate.

Conclusions
In this paper we propose a new approach to organize parallel computing for finding all solutions of a problem, whose sequential algorithm takes too long to find them. The parallel computing organization presented above is combination of the bottom-up as well as divide and conquer designs. We also propose a new efficient and simple algorithm for the set partition problem and paralellizing the algorithm.
In future, we will apply this technique to other problems, namely time-series processing, time-series matching, scheduling problem and system control.

Acknowledgement
This work was supported by the Asia Research Center, Vietnam National University, Hanoi.