Our approach



Given the goal, our approach has two main steps. First, we find the association rules that imply HIV. Second, we use these association rules to generate a valid generalization rule set that minimizes the information loss.

There has been a lot of work on mining association rules. One of the most popular state-of-the-art algorithms is ``Frequent-set'', together with its variations and extensions [ATA93] [AS94] [BMUT97] [AM98] [SA95] [SVA97]. Two of these, [SA95] and [SVA97], are most closely related to our problem. [SA95] introduces several algorithms, extended from ``Frequent-set'', for mining generalized association rules. [SVA97] finds association rules that satisfy given constraints. Our problem, however, has some special properties that we can exploit to develop faster algorithms. First, we only need to look for association rules with a given consequent; in our example it is ``HIV''. Second, one interesting observation is the following: suppose $A \Rightarrow \mathrm{HIV}$ and $A B \Rightarrow \mathrm{HIV}$ both hold, and in the generalized database $A$ is generalized to $P$ so that $A \Rightarrow \mathrm{HIV}$ no longer holds; then $A B \Rightarrow \mathrm{HIV}$ no longer holds either. This shows that once we find an association rule $A \Rightarrow \mathrm{HIV}$, we no longer need to search for association rules like $A B \Rightarrow \mathrm{HIV}$. We formalize this in the following definition and theorem.

Simplified Association Rule Set. Let $S = \{A_i \Rightarrow \mathrm{HIV}\}$ be a set of association rules, where each $A_i$ is a literal or an item set. If $A_i$ is a subset of $A_j$, we say that $A_j$ includes $A_i$. We eliminate every rule $A_j \Rightarrow \mathrm{HIV}$ whose antecedent $A_j$ includes the antecedent $A_i$ of another rule in $S$, since such a rule is redundant once $A_i \Rightarrow \mathrm{HIV}$ is known. This yields a subset $S_1$ of $S$, which we call a simplified association rule set of $S$. The maximum simplified association rule set $S_m$ is the subset of $S$ that remains when no further elimination is possible, that is, the subset in which no rule includes another one. It is easy to prove that for any $S$ such an $S_m$ exists and is unique.
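The elimination step can be illustrated with a minimal Python sketch. It assumes that the rules kept are those with minimal antecedents, as the observation that $A \Rightarrow \mathrm{HIV}$ makes $A B \Rightarrow \mathrm{HIV}$ redundant suggests. The function name `most_simplified` and the representation of each rule by its antecedent item set are illustrative choices, not part of the original work.

```python
def most_simplified(antecedents):
    """Keep only the antecedents that do not include another antecedent.

    Each rule A_i => HIV is represented by its antecedent A_i alone,
    since the consequent is fixed (an assumption of this sketch).
    """
    sets = {frozenset(a) for a in antecedents}
    # `b < a` is the proper-subset test: a includes b and differs from it.
    return {a for a in sets if not any(b < a for b in sets)}


# Hypothetical rule set S: antecedents over the items a, b, c.
S = [{"a"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
S_m = most_simplified(S)
```

Here `{"a", "b"}` and `{"a", "b", "c"}` are dropped because both include `{"a"}`, so `S_m` contains only `{"a"}` and `{"b", "c"}`. The result does not depend on the order of elimination, which is consistent with the uniqueness claim.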

Theorem 1. Let $S_m$ be the most simplified association rule set of $S$. If $R$ is a valid generalization rule set under which the rules in $S_m$ no longer hold, then the rules in $S$ no longer hold under $R$ either. Readers interested in the details of the proof may contact the authors for the proof appendix.
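One practical consequence is that the search for rules can skip any candidate antecedent that includes an antecedent already found. The following Python sketch shows this pruning with a brute-force, level-wise enumeration; it is not the ``Frequent-set'' algorithm itself. The function name `rules_implying`, the thresholds `min_support` and `min_confidence`, and the bound `max_size` are assumptions introduced for illustration.

```python
from itertools import combinations


def rules_implying(transactions, target, min_support, min_confidence, max_size=3):
    """Return antecedents A, smallest first, such that A => target holds.

    A candidate that includes an already accepted antecedent is skipped,
    so the result contains no rule that includes another one.
    """
    items = sorted({i for t in transactions for i in t} - {target})
    found = []  # accepted antecedents, as frozensets
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            cand = frozenset(combo)
            # Pruning: once A => target is found, A B => target is not searched.
            if any(a <= cand for a in found):
                continue
            covering = [t for t in transactions if cand <= t]
            if not covering:
                continue
            hits = sum(1 for t in covering if target in t)
            support = hits / len(transactions)
            confidence = hits / len(covering)
            if support >= min_support and confidence >= min_confidence:
                found.append(cand)
    return found


# Hypothetical toy database; item names are placeholders.
db = [
    {"a", "b", "HIV"},
    {"a", "c", "HIV"},
    {"b", "c", "HIV"},
    {"b"},
    {"c"},
]
rules = rules_implying(db, "HIV", min_support=0.2, min_confidence=1.0)
```

On this toy database the search accepts `{"a"}` at size one, then skips `{"a", "b"}`, `{"a", "c"}` and `{"a", "b", "c"}` without counting them, and accepts `{"b", "c"}`.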


Adrian Perrig
Wed Sep 15 14:27:56 PDT 1999