[Download postscript version]
next up previous
Next: Mining the most simplified Up: Our approach and algorithm Previous: Our approach and algorithm

Our approach

Given the goal, our approach has two main steps. First, we need to find the association rules which imply HIV. Second, we use these association rules to generate the valid generalization rule set with minimize the information loss.

There has been a lot of work done in mining association rules. One of the most popular and state-of-the-art algorithm is the ``Frequent-set'' and its variations and extensions [ATA93] [AS94] [BMUT97] [AM98] [SA95] [SVA97]. Two of them [SA95][SVA97] are most related to our problem. [SA95] introduces several algorithms extended from ``Frequent-set'' mining generalized association rules. [SVA97] can find association rules satisfying given constraints. But our problem has some special properties that we can take advantage of and develop faster algorithms. First, we only need to look for association rules with given consequence, in our example it is ``HIV''. Second, one interesting observation is that if A ⇒HIV and A B ⇒HIV, and in the generalized database, A P, so that A ⇒HIV doesn't hold any more that A B ⇒HIV won't hold any more either. This shows that once we find an association rule A ⇒HIV, we no longer need to search association rules like A B ⇒HIV. We formalize this as the following theorem.

Simplified Association Rule Set S is a set of association rules with the form {Ai ⇒HIV}, where Ai is a literal or item set. If Ai is a subset of Aj, we say Aj includes Ai. We eliminate such Ai which is included by another association rule Aj in S. Thus we can get a subset S1 of S. S1 is a simplified association rule set of S. The maximum simplified association rule set Sm is the maximum subset of S in which there is no association rule included by another one. We can easily prove that for any S, such a Sm exists and is unique.

Theorem 1. Sm is the most simplified association rule set of S. If R is a valid generalized rule set that makes Sm doesn't hold any more, then R will make S not to hold any more either. If interested in the proof detail, please contact for proof appendix.


next up previous
Next: Mining the most simplified Up: Our approach and algorithm Previous: Our approach and algorithm

Adrian Perrig
Wed Sep 15 14:27:56 PDT 1999