Modelling role hierarchy structure using the Formal Concept Analysis

We demonstrate how one can use formal concept analysis (FCA) to obtain the role hierarchy for role based access control from an existing access control matrix. We also discuss how FCA can be used to assess the quality of a security system and to find users with excess permissions.


Introduction
Formal concept analysis emerged from the paper by Wille [12]. Mathematically, it is a part of lattice theory. FCA has found many applications as a method of nonstatistical data analysis and visualisation in areas ranging from software engineering and data mining to psychology and sociology (see e.g. [10]). This work is inspired by [7], which demonstrates how to use concept lattices to discover the modular structure of legacy code and assess its quality.
Role based access control (RBAC) [4] can be thought of as a modularisation of permissions. Instead of directly assigning to the user access rights to various objects in the system (e.g., files and folders in an operating system, or tables or even parts of table rows in a database system), one does it through roles, which can be thought of as subsets of permissions. Roles clearly simplify the security administration task. One can design a role hierarchy using background knowledge about the structure of the organisation and the actual roles in the organisation which are performed by users.
In this paper we consider another scenario. Many organisations do not use RBAC in their database systems or file access systems, but assign permissions directly. When a system contains a small number of users and objects this is reasonable, but when the system grows its administration can become unmanageable. We present a method, based on formal concept analysis, which facilitates discovering roles from the existing assignments of permissions to users, i.e., we "modularise" an existing permission system (cf. [7]). The assumption here is that most of the permissions are given correctly. On the other hand, our method assists in finding those few users who are given excess access rights.
The formal concept analysis was used in several papers in connection with roles and security (e.g., [3,8,9]). In [9] the authors used FCA to discover the hierarchy of security labels. In [3,8] the authors considered a formal context of permissions to roles relation to find the full role hierarchy. (This context is different from that used here.)

Lattices and order
(See e.g. [2] for further information.) Recall that a partial order on a set P is a binary relation "≤" ⊆ P × P satisfying, for all p, q, r ∈ P: p ≤ p (reflexivity); p ≤ q and q ≤ p implies p = q (antisymmetry); p ≤ q and q ≤ r implies p ≤ r (transitivity).
We call a pair (P, ≤), where P is a set and ≤ is a particular partial order on P, a partially ordered set (poset). When no ambiguity is possible, we will often abuse the notation by abbreviating (P, ≤) to P. We will also write p < q if p ≤ q and p ≠ q. As an example of a partially ordered set take (2^X, ⊆), where X is any set. If (P, ≤) is a poset and A ⊆ P then (A, "≤" ∩ (A × A)) is a poset as well (with the induced order). We say that p ∈ P covers q ∈ P, which we denote by q ≺ p, if q < p and q ≤ r ≤ p implies r = q or r = p, for any r ∈ P. Observe that the order of a finite poset is the reflexive transitive closure of its covering relation, so a finite poset is uniquely determined by its covering relation.
Finite posets are visualised using Hasse diagrams. We draw a poset (P, ≤) as a directed graph with P as the set of vertices and with the covering relation ≺ as the set of edges. However, instead of decorating edges with arrows to show direction, we draw vertex p strictly above vertex q if q ≺ p. Fig. 1 is an example of a Hasse diagram of a five-element poset.
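The relationship between an order and its covering relation can be checked on a small case. The following is a minimal sketch for the poset (2^X, ⊆) with X = {a, b, c}; the function names are illustrative and not part of the original text.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def covering_relation(elements, leq):
    """Pairs (q, p) with q ≺ p: q < p and no element strictly between them."""
    def lt(a, b):
        return a != b and leq(a, b)
    return {
        (q, p)
        for q in elements
        for p in elements
        if lt(q, p) and not any(lt(q, r) and lt(r, p) for r in elements)
    }

def reflexive_transitive_closure(elements, pairs):
    """Smallest reflexive and transitive relation containing the given pairs."""
    closure = set(pairs) | {(x, x) for x in elements}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

P = powerset({"a", "b", "c"})
leq = lambda s, t: s <= t                      # inclusion order
covers = covering_relation(P, leq)             # edges of the Hasse diagram
order = {(s, t) for s in P for t in P if leq(s, t)}
recovered = reflexive_transitive_closure(P, covers)
```

Here `covers` contains exactly the pairs of subsets differing by one element, and `recovered` coincides with the full inclusion order.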

An element ⊤ ∈ P (resp. ⊥ ∈ P) is called the top or the largest (resp. the bottom or the smallest) element of a poset P if p ≤ ⊤ (resp. ⊥ ≤ p) for all p ∈ P. For a subset A ⊆ P we define:
• The set of maximal elements max_(P,≤) A = {p ∈ A | p ≤ q implies q = p, for all q ∈ A}, and dually the set of minimal elements min_(P,≤) A.
• The supremum (also called the join) ⋁_(P,≤) A ∈ P, defined by the conditions: (1) p ≤ ⋁_(P,≤) A for all p ∈ A. (2) For any p ∈ {q ∈ P | r ≤ q, ∀r ∈ A} we have ⋁_(P,≤) A ≤ p.
• The infimum (also called the meet) ⋀_(P,≤) A ∈ P, defined by the conditions: (1) ⋀_(P,≤) A ≤ p for all p ∈ A. (2) For any p ∈ {q ∈ P | q ≤ r, ∀r ∈ A} we have p ≤ ⋀_(P,≤) A.
Top and bottom elements, as well as any join or meet, are clearly unique if they exist. We will write p ∨ q instead of ⋁{p, q} and p ∧ q instead of ⋀{p, q}. For brevity, when no ambiguity is possible, we will write max, min, ⋁, ⋀ instead of max_(P,≤), min_(P,≤), ⋁_(P,≤), ⋀_(P,≤). Note that ⋀P = ⋁∅ is the bottom element and ⋁P = ⋀∅ is the top element. A poset (P, ≤) is called a lattice if for any finite and non-empty subset A of P the supremum ⋁A and the infimum ⋀A exist. It is called a complete lattice if ⋁A and ⋀A exist for any subset A. Note that any finite lattice is complete. In particular, any finite lattice has top and bottom elements. In this paper we are concerned with finite lattices only.
Let X be a set. A map C : 2^X → 2^X is called a closure operator on X if
A ⊆ C(A), A ⊆ B ⇒ C(A) ⊆ C(B), C(C(A)) = C(A) (1)
for any A, B ⊆ X. We define the set of closed subsets X_C = {A ⊆ X | C(A) = A}.
X_C with the inclusion order is a complete lattice. Explicitly, for any Ψ ⊆ X_C we have ⋀Ψ = ⋂Ψ and ⋁Ψ = C(⋃Ψ), where the intersection of the empty family is taken to be X.
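The lattice of closed subsets can be illustrated with a toy closure operator. In this sketch the set X and the family `FAMILY` are hypothetical; the closure C(A) is taken to be the intersection of all members of the family containing A, the meet is the intersection, and the join is the closure of the union.

```python
from functools import reduce
from itertools import combinations

X = frozenset({1, 2, 3, 4})
# Hypothetical intersection-closed family of subsets of X (it contains X itself).
FAMILY = [frozenset({1}), frozenset({1, 2}), frozenset({1, 3}), X]

def C(a):
    """Closure of a: intersection of all members of FAMILY that contain a."""
    a = frozenset(a)
    return reduce(frozenset.intersection, (f for f in FAMILY if a <= f), X)

def subsets(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def is_closure_operator():
    """Check that C is extensive, monotone and idempotent on all subsets of X."""
    subs = subsets(X)
    return all(
        a <= C(a) and C(C(a)) == C(a) and all(C(a) <= C(b) for b in subs if a <= b)
        for a in subs
    )

# The closed subsets X_C = {A | C(A) = A}.
closed_sets = {a for a in subsets(X) if C(a) == a}

def meet(psi):
    """Infimum in X_C: plain intersection (the empty family gives X)."""
    return reduce(frozenset.intersection, psi, X)

def join(psi):
    """Supremum in X_C: closure of the union."""
    return C(frozenset().union(*psi))
```

For the two closed sets {1, 2} and {1, 3} the union {1, 2, 3} is not closed, so the join is its closure X, while the meet is {1}.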

Formal concept analysis
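(See e.g. [1,2,6,10] for further information.) Philosophically speaking, a concept consists of the extent, i.e., the set of "things" representing the concept, and the intent, i.e., the set of attributes or properties shared by all objects from the extent and distinguishing them from representatives of other concepts. Because each set of things can be an extent of a concept (trivially, the elements of this set have the distinguishing property of belonging to this set), to make concepts useful we limit ourselves to a specific context:
Definition 4. A formal context is a triple (G, M, R), where G is a set of objects, M is a set of attributes, and R ⊆ G × M is a binary relation. We read gRm as "object g has attribute m".
Formal contexts are usually written as tables (see e.g. Fig. 7) with the rows indexed by objects and the columns indexed by attributes. One puts a cross (x) in the (g, m)'th entry if gRm, otherwise one leaves an empty space.
A binary relation R ⊆ G × M gives rise to a pair (called a Galois connection) of inclusion order reversing maps
(·)' : 2^G → 2^M, A' = {m ∈ M | gRm for all g ∈ A}, (·)' : 2^M → 2^G, B' = {g ∈ G | gRm for all m ∈ B}. (3)
A formal concept of the context (G, M, R) is a pair (A, B), with A ⊆ G and B ⊆ M, such that A' = B and B' = A. The set A is called the extent and the set B the intent of the concept. We denote the set of all concepts by B(G, M, R), and we denote by
ext : B(G, M, R) → 2^G, ext(A, B) = A, int : B(G, M, R) → 2^M, int(A, B) = B (4)
the maps assigning to a concept its extent and intent, respectively. One can prove that these maps are bijections onto the sets of those subsets of G and M, respectively, which are closed with respect to (·)''.
B(G, M, R) has a natural partial order relation of generality: (A_1, B_1) ≤ (A_2, B_2) if and only if A_1 ⊆ A_2 (equivalently, B_2 ⊆ B_1). With this order B(G, M, R) is a complete lattice, called the concept lattice of the context (G, M, R).
We also define the maps γ : G → B(G, M, R), γ(g) = ({g}'', {g}') and μ : M → B(G, M, R), μ(m) = ({m}', {m}''). One can prove that these maps have the following properties: γ(g) ≤ μ(m) if and only if gRm; every concept is the join of the object concepts below it and the meet of the attribute concepts above it. The elements of γ(G) (resp. μ(M)) are called object (resp. attribute) concepts. In words, γ(g) is the smallest concept containing g in its extent and μ(m) is the largest concept with m in its intent. When drawing the Hasse diagrams of concept lattices, we use γ and μ to label the concepts (this is called reduced labelling):
• We write g below the concept γ(g).
• We write m above the concept μ(m).
• We leave any concept which is neither an object nor an attribute concept unlabelled.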
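The constructions of this section can be tried out on a very small context. The sketch below uses a hypothetical context with three objects and three attributes and enumerates concepts by brute force over all subsets of objects, which is adequate only for toy examples; it is not the algorithm used by tools such as Concept Explorer.

```python
from itertools import combinations

# Hypothetical formal context (G, M, R).
G = [1, 2, 3]
M = ["a", "b", "c"]
R = {(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "b")}

def intent_of(objects):
    """A' : attributes shared by all the given objects."""
    return frozenset(m for m in M if all((g, m) in R for g in objects))

def extent_of(attributes):
    """B' : objects having all the given attributes."""
    return frozenset(g for g in G if all((g, m) in R for m in attributes))

def concepts():
    """All pairs (A, B) with A' = B and B' = A."""
    found = set()
    for r in range(len(G) + 1):
        for objects in combinations(G, r):
            b = intent_of(objects)
            found.add((extent_of(b), b))
    return found

def gamma(g):
    """Object concept ({g}'', {g}')."""
    b = intent_of([g])
    return (extent_of(b), b)

def mu(m):
    """Attribute concept ({m}', {m}'')."""
    a = extent_of([m])
    return (a, intent_of(a))

def leq(x, y):
    """Generality order: comparison of extents."""
    return x[0] <= y[0]
```

In this context the concept lattice has four elements, and the relation R can be read back from the labelling maps, since γ(g) ≤ μ(m) holds exactly when gRm.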
There are many computer programs which generate concept lattices and their visualisations, in the form of Hasse diagrams, from formal contexts. The lattices in this paper (e.g., Fig. 8) were created using Concept Explorer [14]. The size of a node is proportional to the number of new objects belonging to the concept's extent (i.e., for each concept x ∈ B(G, M, R), the number of objects g ∈ G such that x = γ(g)). The blue (resp. black) halfdiscs indicate attribute (resp. object) concepts. The extent of a concept x can be recovered from the reduced labelling by collecting all object labels on all ascending paths from bottom to x. Similarly, one recovers the intent of x by collecting all attribute labels on all descending paths from top to x. Note that while the formal context (G, M, R) can be recovered from the concept lattice and the labelling maps γ and μ (using the property γ(g) ≤ μ(m) if and only if gRm), the lattice structure alone is not sufficient for that. In fact, we have the following result:

Access control matrix
Let O denote the set of all objects in our system accessible to the users. In the case of an operating system, it may be the set of all files, folders and volumes, while in the case of a database system we may take for O the set of all tables, views and stored subprograms.
With each object o ∈ O we associate the set of potential permissions P_o. For instance, if o is a file then P_o = {READ, WRITE, EXECUTE, DELETE}. Note that different kinds of objects from O may have different potential permissions. A folder will have a different potential permission set from that of an ordinary file.
Objects of the system are accessed by the programs executed on behalf of the users. The access rights of a program to various objects in the system are identical with those of the user. Note that we do not distinguish between users and subjects. In the literature the subject (usually associated with a particular execution of some program) can have a (sometimes proper) subset of rights of the user understood as a physical person. Here a physical person can correspond to many users (logins). We denote the set of all users by U . We describe the access rights of users to objects by means of the access control matrix.

Definition 6. An access control matrix is a map
A : U × O → 2^(⨆_{o∈O} P_o),
where by ⨆ we denote a disjoint union of sets, such that A(u, o) ⊆ P_o for all u ∈ U and o ∈ O. We read p ∈ A(u, o) as "user u has permission p for object o".
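An access control matrix in the sense of Def. 6 can be represented as a sparse dictionary. In the sketch below the object names, the user names and the folder permissions are hypothetical; only the file permissions READ, WRITE, EXECUTE, DELETE come from the text.

```python
# Potential permissions P_o for each object o (folder permissions are assumed).
POTENTIAL = {
    "report.txt": {"READ", "WRITE", "EXECUTE", "DELETE"},
    "archive/": {"LIST", "CREATE"},
}

# Sparse access control matrix: missing entries stand for the empty set.
MATRIX = {
    ("alice", "report.txt"): {"READ", "WRITE"},
    ("bob", "report.txt"): {"READ"},
    ("bob", "archive/"): {"LIST"},
}

def rights(matrix, user, obj):
    """A(u, o): the permissions of the user for the object."""
    return matrix.get((user, obj), set())

def is_access_control_matrix(matrix, potential):
    """Check the condition A(u, o) ⊆ P_o for every stored entry."""
    return all(
        obj in potential and granted <= potential[obj]
        for (_, obj), granted in matrix.items()
    )
```

An entry granting EXECUTE on the folder would violate the condition of the definition, because EXECUTE is not among the potential permissions assumed for folders here.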
In what follows we will assume that users are not objects, and neither is the access control matrix, i.e., O ∩ U = ∅ and A ∉ O. This means that our access control matrices will describe mandatory access control (as opposed to discretionary access control), where all normal users are created by, and all permissions are assigned by, security officers (who are not counted as users). The users do not own objects and so they cannot pass ownership of "their" objects, nor assign privileges to other users. While rare in the context of operating systems, this is rather typical in the case of corporate databases.
… customers depending on time and need. Instead of assigning permissions to the user directly, the user is assigned entities called roles (which would ideally correspond to the particular real world functions performed by the user, like teller or loan officer), carrying with them collections of permissions. A user has the permission p for the object o whenever one of his roles carries permission p for object o.
RBAC clearly simplifies security administration. There are (usually) much fewer roles than objects in the system, and so it is much easier and less error prone to assign a few roles to a new user than it is to construct a row in the access control matrix. Also the role hierarchy should be fully determined by the structure of the organisation, which changes much more slowly than the employee list.
Here we use a modified version of RBAC1 (e.g. [11, Def. 2]) without sessions. If a person has roles which do not need to (or must not) be executed at the same time (like teller and loan officer in a bank), we will assume that such a person has more than one login, i.e., to the computer system such a person will appear as several users. In this way we do not need to introduce sessions, which simplifies the analysis. Another simplification is that we define the role hierarchy to be a family of chosen subsets of permissions with the inclusion order, instead of introducing roles as abstract entities with some order and an association with permissions determined by many-to-many relations.
(2) The set of roles R ⊆ 2^P with the inclusion order.
(3) The user to roles assignment ur : U → 2^R satisfying (in order to avoid redundancy) the condition that for all users u ∈ U the elements of ur(u) are incomparable, i.e., r_1, r_2 ∈ ur(u), r_1 ≠ r_2 ⇒ r_1 ⊈ r_2.
The user u ∈ U has permissions ⋃ur(u).
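The simplified model, with roles as sets of permissions, can be sketched as follows. The roles, logins and (object, access right) pairs are hypothetical and serve only to illustrate the incomparability condition and the computation of a user's permissions as the union of the assigned roles.

```python
def is_antichain(roles):
    """True if no role in the collection is a proper subset of another one."""
    return not any(r < s for r in roles for s in roles)

def user_permissions(ur, user):
    """Permissions of a user: the union of all roles assigned to the user."""
    return frozenset().union(*ur[user])

def rights_on(ur, user, obj):
    """Access rights of the user to one object, read off the (object, right) pairs."""
    return {right for (o, right) in user_permissions(ur, user) if o == obj}

# Hypothetical roles given directly as sets of (object, access right) pairs.
CLERK = frozenset({("accounts", "READ")})
TELLER = frozenset({("accounts", "READ"), ("accounts", "UPDATE")})
LOAN_OFFICER = frozenset({("accounts", "READ"), ("loans", "READ"), ("loans", "UPDATE")})

# One person acting as teller and as loan officer appears as two logins.
UR = {
    "ann_teller": {TELLER},
    "ann_loans": {LOAN_OFFICER},
    "carl": {CLERK},
    "dave": {TELLER, LOAN_OFFICER},
}
```

Assigning both CLERK and TELLER to the same login would be redundant, since CLERK ⊆ TELLER, and `is_antichain` rejects such an assignment.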
Clearly the access control matrix can be reconstructed from RBAC as
A(u, o) = pr_2((⋃ur(u)) ∩ ({o} × P_o)),
where pr_2 is the projection on the second component of an ordered pair.
Suppose that we are given the access control matrix A of some existing system (for instance a database system), where U is the set of users, O is the set of objects and P_o is the set of potential permissions for each object o ∈ O (see Def. 6). We want to use formal concept analysis as a support in generating the role hierarchy in order to simplify the administration of our system. We might also want to judge the correctness and quality of our access control system by finding users who appear to have "too many duties" or to "know too much". Our task is analogous to the problem of modularising old code and judging its quality (cf. [7]). We do not design the role hierarchy from knowledge of the structure of an organisation, or of the functionalities which must be implemented by the system, but rather from the existing usage of objects and logins. Because of this, the approach will work best for large systems, under the assumption that most of the permissions have been assigned correctly. Large systems are also those which benefit most from introducing roles.

Role discovery
First we need to define a formal context (Def. 4) for the concept analysis. Let A be an access control matrix, let
P = ⨆_{o∈O} P_o = {(o, q) | o ∈ O, q ∈ P_o}
be the set of permissions, and let I_A ⊆ U × P be a binary relation defined by:
u I_A (o, q) if and only if q ∈ A(u, o).
We call (U, P, I_A) a security context associated with the access control matrix A. The concept lattice B(U, P, I_A) of this context will be called the lattice of candidate roles for the security context (U, P, I_A) or simply a security concept lattice.
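Passing from an access control matrix to the security context is a simple flattening step. The sketch below assumes that a permission is a pair (object, access right) and that the matrix is stored as a dictionary keyed by (user, object); all names and data are hypothetical.

```python
# Hypothetical potential permissions and access control matrix.
POTENTIAL = {
    "salaries": {"READ", "UPDATE"},
    "grades": {"READ", "UPDATE"},
}
MATRIX = {
    ("joe", "salaries"): {"READ"},
    ("joe", "grades"): {"READ"},
    ("eve", "grades"): {"READ", "UPDATE"},
}

def permission_set(potential):
    """The set of permissions: all pairs (o, q) with q a potential permission of o."""
    return {(obj, q) for obj, qs in potential.items() for q in qs}

def security_relation(matrix):
    """The relation I_A: user u is related to (o, q) whenever q is in A(u, o)."""
    return {(user, (obj, q)) for (user, obj), granted in matrix.items() for q in granted}

P = permission_set(POTENTIAL)
I_A = security_relation(MATRIX)
```

The users together with `P` and `I_A` form an ordinary formal context, so any concept lattice tool can be applied to it directly.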
Henceforth, for simplicity we shall ignore the internal structure of permissions (each permission is a pair consisting of an object and an access right to this object), which is immaterial for the results presented in the remaining part of the paper. In particular, we will denote permissions by single letters p ∈ P instead of writing explicitly p = (o, q).
We interpret the non-empty intents of the concepts of B(U, P, I_A) as possible roles. This is justified by the fact that the intents of concepts of B(U, P, I_A) are the subsets B ⊆ P closed with respect to (·)'' (see eq. (3)), i.e., if p ∈ P is such that every user who has all permissions from B also has permission p, then p ∈ B. It is clear that creating a role which is not closed in this sense would be a bad design decision leading to unnecessary redundancy. A hidden assumption here is, of course, that the set of users is large enough to ensure that any implication of the type "all users who have permissions from B also have permission p", inferred from the access control matrix, reflects a real world rule as opposed to being a chance artefact of a small data sample.
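The closedness condition on candidate roles can be checked mechanically. The following sketch uses a hypothetical three-user context and computes, for a set of permissions B, every permission held by all users who hold B.

```python
# Hypothetical security context: each user with the set of permissions held.
USERS = {
    "u1": frozenset({"read", "write"}),
    "u2": frozenset({"read", "write", "delete"}),
    "u3": frozenset({"read"}),
}
ALL = frozenset().union(*USERS.values())

def closure(perms):
    """B'': permissions common to all users who hold every permission in B."""
    perms = frozenset(perms)
    holders = [held for held in USERS.values() if perms <= held]
    return frozenset.intersection(ALL, *holders)

def is_closed(perms):
    """True if the set of permissions is a candidate role (an intent)."""
    return closure(perms) == frozenset(perms)

def implied(perms):
    """Permissions not in B which every holder of B has anyway."""
    return closure(perms) - frozenset(perms)
```

In this toy data every user with `write` also has `read`, so {write} is not a sensible role while {read, write} is. With only three users such an implication may well be a chance artefact, which is the caveat made in the text.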
In general, it is not recommended to interpret the whole lattice B(U, P, I_A) as a role hierarchy. The system designer has to choose a subposet of B(U, P, I_A). Unfortunately, we do not believe that such a choice can be fully automated. On the other hand, for realistic systems, the set of candidate roles should be vastly smaller than the powerset 2^P, and hence using the concept lattice is really helpful.
Downloaded from the journal Annales AI-Informatica, http://ai.annales.umcs.pl, date: 05/04/2022 21:34:44, UMCS.
Definition 9. We will call any subset R ⊆ {int(x) | x ∈ B(U, P, I_A)} with the inclusion order a role hierarchy inferred from A. This role hierarchy defines the role based access control model (Def. 7), where the user to role assignment is given by
ur(u) = max {r ∈ R | r ⊆ {u}'}.
The role hierarchy R will be called complete if ⋃ur(u) = {u}' for all u ∈ U.
Note that R with the opposite inclusion order is a subposet of B(U, P, I_A). Note also that if the role hierarchy is not complete (according to the above definition), then not all permissions of users are obtained through roles. Only a complete role hierarchy corresponds to the pure role based access model as described in Section 2.4.
The examples below will demonstrate some of the choices which can be made when choosing roles from candidate ones.
Suppose that we have three users U1, U2, U3 and three permissions A, B, C. The context and the corresponding concept lattice are shown in Fig. 2. First note that if the role hierarchy R inferred from the above concept lattice is to be complete, it must include the roles {A, C} and {B, C}. The role candidates {A, B, C} and {C} are then optional. We will assume in what follows that R is complete unless explicitly stated otherwise. Note that by Def. 9, even if {C} ∈ R, no user will have {C} assigned directly (i.e., not through inheritance) as one of the roles. This situation is possible for roles r equal to one of the intents of non-object concepts. Strictly speaking, in general we have that no user is assigned directly the role r ∈ R (i.e., there exists no u ∈ U such that r ∈ ur(u)) if and only if for any u ∈ r' there exists r_1 ∈ R such that r ⊊ r_1 ⊆ {u}'.
In other words, we have the choice between creating the compound role through inheritance and assigning this single compound role to the user, or assigning the components as separate roles without extending the role hierarchy. Both choices are reasonable as the user may have many roles. Note that we have this choice for the user u ∈ U (while requiring completeness) precisely when the object concept of the user u is not an attribute concept, i.e., when γ(u) ∉ μ(P), or, equivalently, when there does not exist a permission p ∈ P such that {u}' = {p}''. In the general situation, we can formalize our observation as follows (the immediate proof is left to the reader):
Proposition 2. Suppose that (U, P, I_A) is a security context. Any associated complete role hierarchy R ⊆ int(B(U, P, I_A)) necessarily contains int(γ(U) ∩ μ(P)), i.e., the intents of all concepts which are both object and attribute concepts.
In particular, consider the example in Fig. 3: the user U3 is necessarily assigned the role {A, B, C}, which inherits from the roles {A} and {B} and, in addition, contains a new permission C.
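The three-user example can be replayed in code. The context table of Fig. 2 is not reproduced in the text, so the sketch below assumes the assignment U1: {A, C}, U2: {B, C}, U3: {A, B, C}, which is consistent with the roles named in the discussion. The user to role assignment is taken to consist of the maximal roles contained in the user's permission set.

```python
from itertools import combinations

# Assumed context for the three-user example (not taken verbatim from Fig. 2).
USERS = {"U1": frozenset("AC"), "U2": frozenset("BC"), "U3": frozenset("ABC")}
PERMS = frozenset("ABC")

def closure(perms):
    """B'': permissions shared by all users who hold every permission in B."""
    perms = frozenset(perms)
    held = [ps for ps in USERS.values() if perms <= ps]
    return frozenset.intersection(PERMS, *held)

def candidate_roles():
    """Non-empty intents of the security concept lattice."""
    subsets = (c for r in range(len(PERMS) + 1) for c in combinations(sorted(PERMS), r))
    return {closure(s) for s in subsets} - {frozenset()}

def ur(roles, user):
    """Maximal roles contained in the permission set of the user."""
    allowed = [r for r in roles if r <= USERS[user]]
    return {r for r in allowed if not any(r < s for s in allowed)}

def is_complete(roles):
    """Every user obtains all of his permissions through the assigned roles."""
    return all(frozenset().union(*ur(roles, u)) == USERS[u] for u in USERS)

def forced_roles():
    """Intents of concepts which are both object and attribute concepts."""
    object_intents = set(USERS.values())
    attribute_intents = {closure(p) for p in PERMS}
    return object_intents & attribute_intents
```

Under this assumption the forced roles are {A, C} and {B, C}, they already form a complete hierarchy, and adding {A, B, C} merely replaces the two roles of U3 by a single compound role.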
The decision whether to include in the role hierarchy the intents of non-object concepts, or of object concepts which are not attribute concepts, is largely left to the discretion of the system designer, as it is hard to formulate strict rules. The designer's decision might be influenced by background knowledge about the intended structure of the organisation for which he works, or by the cardinalities of extents and intents relative to the average number of users and permissions per concept. Suppose, for example, that A, B, C, U1, U2, U3 in Fig. 2 are really subsets of permissions and users instead of being single elements. If U3 contains a large number of users, it is simpler and less error prone to give each of them the single role {A, B, C} instead of assigning the two separate roles. Also note that the concept lattice helps us to infer the structure of real world facts from the data, and the concepts with large extents have a good chance of corresponding to real roles in the organisation instead of being random and transitory phenomena. On the other hand, if U3 contains only a small number of users, it is possible that the fact that they "do two things" does not reflect any inherent organisational rule (as in the case of a loan officer also doing the teller job because there are currently not enough customers asking for loans to fill the full eight hours). Then it is better to give each user in U3 the two separate roles instead of artificially extending the role hierarchy.
There is a special choice of the role hierarchy which seems to be the most natural one:
R = {{p}'' | p ∈ P} = int(μ(P)), (8)
i.e., the choice of all the closures of single permissions, or equivalently, all the intents of attribute concepts. Note that the set of permissions of the user u is equal to ⋃{int(μ(p)) | γ(u) ≤ μ(p)}, hence hierarchy (8) is complete. An advantage of (8) is that one can compute the common set of permissions for a subset of users on the level of roles. Namely, let us denote for any user u
κ_R(u) = {r ∈ R | r ⊆ {u}'}.
Then, for choice (8) of the role hierarchy, the common set of permissions for a subset of users V ⊆ U can be written as ⋃(⋂κ_R(V)), where κ_R(V) = {κ_R(u) | u ∈ V}.
The other natural choice of the role hierarchy,
R = {{u}' | u ∈ U} = int(γ(U)), (9)
causes each user to have a single role. This choice amounts to sanctioning the consistent use of the feature of many database and operating systems which allows one to create a new user with the same access rights as those of an existing user. Hence the role hierarchy (8) (unlike (9)) indeed reflects the structure of the access control matrix on the level of roles. Note, however, that the example in Fig. 4 is an extreme one, in the sense that no concept is at the same time an attribute and an object concept (i.e., γ(U) ∩ μ(P) = ∅).
On the other hand, it is well known that if the concept lattice is meet free, i.e., the meet of any two incomparable elements equals the bottom element,
x ∧ y = ⊥ for all incomparable x, y,
then each concept different from top and bottom is necessarily an attribute concept. Indeed, if a lattice element (V, Q) ≠ ⊤ is not a meet of two or more elements, it is covered by a unique element, say (V_1, Q_1) (i.e., (V, Q) ≺ (V_1, Q_1)). It follows that any element of Q \ Q_1 (which is non-empty because the map int (4) is an order reversing isomorphism) must appear first in the concept (V, Q) when going from the top. Hence (V, Q) is an attribute concept. The role hierarchy based on the object concepts might be much smaller and simpler in the case of a meet free concept lattice.
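The difference between the attribute-based and the object-based hierarchy can be seen on a toy context. The sketch below assumes the three-user context U1: {A, C}, U2: {B, C}, U3: {A, B, C}, and it assumes that the set of roles of a user, written `kappa` here, consists of all roles contained in the user's permission set (directly assigned or inherited). Both are assumptions made for illustration.

```python
from itertools import combinations

# Assumed toy security context.
USERS = {"U1": frozenset("AC"), "U2": frozenset("BC"), "U3": frozenset("ABC")}
PERMS = frozenset("ABC")

def common_permissions(users):
    """Permissions shared by all the given users, read from the matrix."""
    return frozenset.intersection(PERMS, *(USERS[u] for u in users))

def closure(perms):
    """B'': common permissions of all users holding every permission in B."""
    holders = [u for u, held in USERS.items() if frozenset(perms) <= held]
    return common_permissions(holders)

# Attribute-based hierarchy: closures of single permissions.
ATTRIBUTE_ROLES = {closure(p) for p in PERMS}
# Object-based hierarchy: one role per distinct user permission set.
OBJECT_ROLES = set(USERS.values())

def kappa(roles, user):
    """All roles of the user, direct or inherited (assumed definition)."""
    return {r for r in roles if r <= USERS[user]}

def common_via_roles(roles, users):
    """Union of the roles shared by all users of a non-empty group."""
    shared = set.intersection(*(kappa(roles, u) for u in users))
    return frozenset().union(*shared)

def agrees_everywhere(roles):
    """Does the role-level computation match the matrix for every group?"""
    groups = (g for r in range(1, len(USERS) + 1) for g in combinations(USERS, r))
    return all(common_via_roles(roles, g) == common_permissions(g) for g in groups)
```

With the attribute-based roles the permission C shared by U1 and U2 is found on the level of roles, whereas with one role per user the two roles {A, C} and {B, C} have no role in common and the shared permission is lost.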
We conjecture, however, that in practical applications, large security concept lattices will have at least as many non-bottom meets as non-top joins, i.e., if they are asymmetric at all with respect to the number of joins and meets, they will be closer to join free lattices (defined dually to meet free lattices) than to meet free ones. Hence the role hierarchy based on the attribute concepts will tend to be simpler. This is because in a meet free security concept lattice, the users appearing first near the bottom of the lattice (that is, the users with more permissions) simply extend the sets of permissions of less privileged users, whereas one of the basic reasons for a user to have more permissions than another is that he needs to combine several sources of data. Hence such a user should appear first in the extent of a concept which is the meet of several other concepts. Consequently, we expect many non-bottom meets. On the other hand, if the common set of permissions of two users u_1, u_2 ∈ U, belonging to incomparable concepts, contains non-public permissions, i.e., if γ(u_1) ≰ γ(u_2) and γ(u_2) ≰ γ(u_1) and γ(u_1) ∨ γ(u_2) ≠ ⊤, …
Note that each user u which is below some other user u' (i.e., γ(u) < γ(u')) first appears in a concept which is a meet of other concepts. It follows that (assuming that each user is given only as many access rights as necessary) each of those users integrates data from many sources. For instance, the user Joe integrates data from Stud Styp and HR Zatrud.

Finding incorrect roles
The security concept lattice can be used not only to discover roles but also to judge the quality of the security system, e.g., to find users with more access rights than necessary. There are two cases to consider. In the first one, the user simply has more permissions than he needs. In the second, the user has just as many access rights as he needs but he does not need them all at the same time. For example, the user might need to access files A and B, but none of the programs executed on his behalf combines the data from the two files. In the first case, one corrects the access control matrix by deleting the excess permissions. In the second case, one divides the user into several new logins.
A good indication of the quality of the security system is that the extent of the bottom element of the security concept lattice is empty, i.e., there are no users who can do everything, and that the concept lattice decomposes into many connected components after removing the top and the bottom elements (i.e., it has a horizontal decomposition). In other words, our security system consists of many independent and isolated subsystems.
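Both quality indicators can be computed directly for small contexts. The sketch below enumerates concept intents by brute force (exponential in the number of users, so suitable for toy data only) and finds the connected components of the lattice with top and bottom removed; the two example contexts are hypothetical.

```python
from itertools import combinations

def intents(context):
    """All concept intents: intersections of user permission sets, plus the full set."""
    all_perms = frozenset().union(*context.values())
    found = {all_perms}
    for r in range(1, len(context) + 1):
        for users in combinations(context, r):
            found.add(frozenset.intersection(*(context[u] for u in users)))
    return found

def horizontal_components(context):
    """Connected components of the concept lattice without top and bottom."""
    nodes = intents(context)
    top = frozenset.intersection(*nodes)    # smallest intent: public permissions
    bottom = frozenset().union(*nodes)      # largest intent: all permissions
    inner = [n for n in nodes if n != top and n != bottom]
    components, seen = [], set()
    for start in inner:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x in component:
                continue
            component.add(x)
            # Comparable concepts are joined by a path in the Hasse diagram.
            stack.extend(y for y in inner if y not in component and (x < y or y < x))
        seen |= component
        components.append(component)
    return components

def bottom_extent_is_empty(context):
    """True if no user holds every permission occurring in the context."""
    all_perms = frozenset().union(*context.values())
    return all(perms != all_perms for perms in context.values())

# Two isolated subsystems, and the same system with one bridging user.
ISOLATED = {"u1": frozenset("a"), "u2": frozenset("ab"),
            "u3": frozenset("c"), "u4": frozenset("cd")}
BRIDGED = dict(ISOLATED, u5=frozenset("ac"))
```

The isolated context yields two components, while the single additional user holding permissions from both subsystems merges them into one.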
Note that a non-empty intent of the top element is not a problem: it is simply interpreted as the set of public access rights. If the top element is an object concept then the users u ∈ U such that γ(u) = ⊤ are public users.
In practice, the demand for a horizontal decomposition is too strong. One needs to consider the broad (block) structure of the security concept lattice. There exists a strict mathematical definition of blocks and algorithms for recognizing such a broad structure (see e.g. [6]) but the details are beyond the scope of this paper.
Consider for example Fig. 6. The two blocks on the right (the green and the pink ones) do not form separate horizontal components of the lattice because they are connected by the supremum μ(A), the infimum γ(U1), and internal bridges through μ(B) and γ(U2).
The supremum μ(A) and the infimum γ(U1) are probably legitimate: A is the common permission (or set of permissions) for the two blocks, and the user U1 may be integrating data from the two blocks. On the other hand, the permission B and the user U2 look suspicious because, if not for them, both blocks would separate neatly. In short, if, after removal of a few users or permissions, our lattice splits into clean blocks which "connect" only as a whole, i.e., through suprema and infima of sets of whole blocks, then those few users and permissions are obvious candidates for checking.
If the connections between blocks are illegitimate or unnecessary, we should correct the access control matrix. The situation is simplest when, say, the user U2 does not really need the permissions of the user U4, or the users below μ(C) do not need the permission B. Then we can simply delete the unnecessary access rights from the users. If the user U2 needs the access rights of the users U3 and U4, but not at the same time, we can split this user into two logins: one with the access rights of U3, the other with the access rights of U4. Such splitting removes the bridging concept from the lattice. It is most difficult to deal with bridging supremum-type concepts like μ(B). In principle, it might require splitting all users below μ(C) or μ(D).
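The search for bridging users and the effect of splitting a login can be sketched on a toy context. The heuristic below, which flags every user whose removal increases the number of horizontal components, is an illustrative simplification and not the block recognition algorithm referred to in [6]; the context and the login naming scheme are hypothetical.

```python
from itertools import combinations

def intents(context):
    """Concept intents by brute force (small contexts only)."""
    found = {frozenset().union(*context.values())}
    for r in range(1, len(context) + 1):
        for users in combinations(context, r):
            found.add(frozenset.intersection(*(context[u] for u in users)))
    return found

def count_components(context):
    """Number of connected components after removing top and bottom."""
    nodes = intents(context)
    top, bottom = frozenset.intersection(*nodes), frozenset().union(*nodes)
    inner = [n for n in nodes if n != top and n != bottom]
    seen, count = set(), 0
    for start in inner:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            stack.extend(y for y in inner if y not in seen and (x < y or y < x))
    return count

def bridging_users(context):
    """Users whose removal makes the lattice fall into more components."""
    base = count_components(context)
    return sorted(
        u for u in context
        if count_components({v: p for v, p in context.items() if v != u}) > base
    )

def split_user(context, user, parts):
    """Replace one user by several logins, one per given set of permissions."""
    result = {v: p for v, p in context.items() if v != user}
    for i, part in enumerate(parts, start=1):
        result[f"{user}_{i}"] = frozenset(part)
    return result

# Hypothetical context: two subsystems bridged by the single user u5.
CONTEXT = {"u1": frozenset("a"), "u2": frozenset("ab"), "u3": frozenset("c"),
           "u4": frozenset("cd"), "u5": frozenset("ac")}
SPLIT = split_user(CONTEXT, "u5", [{"a"}, {"c"}])
```

Here only `u5` is flagged, and replacing it by the two logins `u5_1` and `u5_2` restores the decomposition into two components. Whether such a split is appropriate still has to be decided by checking that the user never needs both sets of rights at the same time.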
Let us consider the security context in Fig. 7 and the corresponding concept lattice in Fig. 8. It splits into two horizontal components. One notices that all object concepts except one are covered by one or two other concepts. Only γ(P06) is covered by three concepts. Splitting the user P06 (Fig. 9) yields the concept lattice in Fig. 10, which decomposes into four components. Also notice that γ(P06) was not a meet of the three subblocks of the left block in Fig. 8.

Conclusions
We have presented a method of role discovery from an existing permission system using formal concept analysis, where the possible roles are chosen from the intents of concepts of the concept lattice associated with the access control matrix. The method also allows one to judge the quality of the security system (whether it splits into well defined and isolated blocks) and to find suspicious logins which might have excess rights. The examples given were rather small, both with respect to the number of permissions and the number of users. On the other hand, this method works best for very large ACLs, where inferences of the form "all users who have permission A also have permission B" are less likely to be the result of chance and more likely to genuinely reflect the rules of the organisation. We plan in the future to perform a case study of an existing large database system.
In this paper we have not considered the consequences of internal structure of permissions.
We conjecture that for a real system, where there are only a few types of objects, and so the set of permissions P is almost a Cartesian product (like P = {ALL FILES} × {READ, WRITE, EXECUTE}), the concept lattice of role candidates may be susceptible to decomposition (e.g., [5]), especially the tensorial decomposition [13].