Generalizing a Database. A database DB has a set of attributes $A = \{a_1, \dots, a_n\}$ with a taxonomy $T$ over the attributes. $T$ can be modeled as a DAG, which defines a pre-order $\preceq$ on its nodes: $x \preceq y$ iff $y$ is an ancestor of $x$ in the DAG. For each attribute $a_i$, we define a generalization rule $a_i \to g_i$, where $a_i \preceq g_i$ in $T$ and $\to$ means generalize-to. A tuple $t = (t_1, \dots, t_n) \to t' = (t'_1, \dots, t'_n)$ iff $t_i \to t'_i$ for every $i$, according to the generalization rules. Hence, given a set of generalization rules, the database DB $\to$ PDB iff each tuple in DB $\to$ the corresponding tuple in PDB.
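The definition above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the toy taxonomy `PARENTS`, the node names, the helper names, and the choice to treat each tuple component as a taxonomy node that a rule maps to itself or to one of its ancestors.

```python
# Hypothetical taxonomy: each node maps to its direct parents.
# A DAG node may have several parents; the root has none.
PARENTS = {
    "flu": ["respiratory"],
    "pneumonia": ["respiratory"],
    "respiratory": ["disease"],
    "HIV": ["infectious"],
    "infectious": ["disease"],
    "disease": [],
}

def ancestors(node, parents=PARENTS):
    """Return every strict ancestor of `node` in the DAG."""
    seen = set()
    stack = list(parents[node])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(parents[cur])
    return seen

def precedes(x, y, parents=PARENTS):
    """The pre-order: true when y is x itself or an ancestor of x."""
    return x == y or y in ancestors(x, parents)

def generalize_tuple(t, rules, parents=PARENTS):
    """Apply the rules component-wise; values without a rule are kept."""
    out = []
    for value in t:
        target = rules.get(value, value)
        if not precedes(value, target, parents):
            raise ValueError(f"{target!r} is not an ancestor of {value!r}")
        out.append(target)
    return tuple(out)

def generalize_db(db, rules, parents=PARENTS):
    """DB generalizes to PDB when every tuple generalizes to its counterpart."""
    return [generalize_tuple(t, rules, parents) for t in db]

db = [("flu", "HIV"), ("pneumonia", "flu")]
rules = {"flu": "respiratory", "HIV": "disease"}
pdb = generalize_db(db, rules)
```

A rule whose target is not an ancestor of its source is rejected with a `ValueError`, which mirrors the requirement that every rule respect the pre-order of the taxonomy.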
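Information Loss. Given a database DB with a set of attributes $A = \{a_1, \dots, a_n\}$ and a set of generalization rules $G = \{a_i \to g_i\}$ such that DB $\to$ PDB, the information loss of PDB from DB is $IL(\mathrm{DB}, \mathrm{PDB}) = \sum_{i=1}^{n} \big( R(a_i) - R(g_i) \big)$, where $R$ is the rank function of the taxonomy with $R(\text{root}) = 0$. This formula can easily be extended to a weighted sum with one weight per attribute.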
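As a rough illustration of the loss measure, the sketch below assumes that the rank of a node is its shortest distance from the root (so the root has rank 0) and that the loss of one rule is the rank of its source minus the rank of its target, summed over all rules. Both choices, along with the toy taxonomy and the function names, are assumptions rather than the definition used in this work.

```python
# Hypothetical taxonomy: node -> direct parents; "disease" is the root.
PARENTS = {
    "flu": ["respiratory"],
    "respiratory": ["disease"],
    "HIV": ["infectious"],
    "infectious": ["disease"],
    "disease": [],
}

def rank(node, parents=PARENTS):
    """Assumed rank: shortest distance to the root, so rank(root) == 0."""
    if not parents[node]:
        return 0
    return 1 + min(rank(p, parents) for p in parents[node])

def information_loss(rules, weights=None, parents=PARENTS):
    """Sum of rank drops over all rules; optional per-source weights.

    A missing weight defaults to 1, which gives the unweighted sum.
    """
    weights = weights or {}
    return sum(
        weights.get(src, 1) * (rank(src, parents) - rank(dst, parents))
        for src, dst in rules.items()
    )

rules = {"flu": "respiratory", "HIV": "disease"}
plain = information_loss(rules)                      # (2 - 1) + (2 - 0) = 3
weighted = information_loss(rules, weights={"HIV": 2})  # 1 + 2 * 2 = 5
```

Generalizing a value all the way to the root costs its full rank, while leaving it unchanged costs nothing, so the measure grows with how far each rule climbs the taxonomy.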
Valid Generalization Rule Set. Given a database DB, let $S$ be the complete set of association rules of the form $X \Rightarrow \text{HIV}$ that meet the given support and confidence thresholds. A generalization rule set is valid iff no association rule in $S$ still holds in the generalized database. It is easy to prove that this also implies that no association rule in the generalized database implies HIV. Hence the generalized database will not reveal HIV.
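A validity check along these lines might look like the following sketch. It assumes that each tuple is treated as a set of items, that support is the fraction of tuples containing an itemset, and that confidence is the support of the antecedent together with the sensitive item divided by the support of the antecedent. The sample data, the thresholds, the `max_len` cap on antecedent size, and all function names are hypothetical, and the sketch does not check the rules against a taxonomy.

```python
from itertools import combinations

def support(db, items):
    """Fraction of tuples that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in db) / len(db)

def holds(db, antecedent, target, min_sup, min_conf):
    """True when `antecedent => target` meets both thresholds in `db`."""
    sup_x = support(db, antecedent)
    sup_xy = support(db, set(antecedent) | {target})
    return sup_x > 0 and sup_xy >= min_sup and sup_xy / sup_x >= min_conf

def sensitive_rules(db, target, min_sup, min_conf, max_len=2):
    """Antecedents X (up to `max_len` items) with X => target holding in db."""
    items = sorted({v for t in db for v in t} - {target})
    return [
        x
        for k in range(1, max_len + 1)
        for x in combinations(items, k)
        if holds(db, x, target, min_sup, min_conf)
    ]

def is_valid(db, rules, target, min_sup, min_conf, max_len=2):
    """Valid when no rule found in db still holds in the generalized db."""
    s = sensitive_rules(db, target, min_sup, min_conf, max_len)
    pdb = [tuple(rules.get(v, v) for v in t) for t in db]
    return not any(holds(pdb, x, target, min_sup, min_conf) for x in s)

db = [
    ("needle_use", "HIV"),
    ("needle_use", "HIV"),
    ("flu", "cough"),
    ("needle_use", "flu"),
]
found = sensitive_rules(db, "HIV", min_sup=0.4, min_conf=0.6)
unchanged_ok = is_valid(db, {}, "HIV", 0.4, 0.6)
generalized_ok = is_valid(db, {"HIV": "infectious"}, "HIV", 0.4, 0.6)
```

In this toy data the only sensitive rule is `needle_use => HIV`, so leaving the database unchanged is invalid, while generalizing `HIV` to a hypothetical parent `infectious` removes the rule.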
Problem Statement. Given a database DB, our goal is to find a valid generalization rule set such that the generalized database PDB obeys the policy and has minimum information loss.
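To make the optimization problem concrete, the sketch below solves a tiny instance by exhaustive search. Exhaustive search is used only for illustration and is not presented as the method of this work; it is exponential in the number of items. The taxonomy, the data, the thresholds, the rank-difference loss, and every function name are assumptions.

```python
from itertools import combinations, product

# Hypothetical taxonomy: node -> direct parents; "any" is the root.
PARENTS = {
    "HIV": ["infectious"], "flu": ["infectious"], "infectious": ["any"],
    "needle_use": ["risk_factor"], "risk_factor": ["any"], "any": [],
}

def ancestors(node):
    """All strict ancestors of `node`."""
    found = set()
    for p in PARENTS[node]:
        found |= {p} | ancestors(p)
    return found

def rank(node):
    """Assumed rank: shortest distance to the root (root has rank 0)."""
    return 0 if not PARENTS[node] else 1 + min(rank(p) for p in PARENTS[node])

def holds(db, antecedent, target, min_sup, min_conf):
    """True when `antecedent => target` meets both thresholds in `db`."""
    x, xy = set(antecedent), set(antecedent) | {target}
    sup_x = sum(x <= set(t) for t in db) / len(db)
    sup_xy = sum(xy <= set(t) for t in db) / len(db)
    return sup_x > 0 and sup_xy >= min_sup and sup_xy / sup_x >= min_conf

def best_rule_set(db, target, min_sup, min_conf):
    """Try every rule set; return (loss, rules) for a cheapest valid one."""
    items = sorted({v for t in db for v in t})
    pool = [i for i in items if i != target]
    sensitive = [
        x
        for k in range(1, len(pool) + 1)
        for x in combinations(pool, k)
        if holds(db, x, target, min_sup, min_conf)
    ]
    best = None
    # Each item may stay as it is or move to any of its ancestors.
    for choice in product(*[[i] + sorted(ancestors(i)) for i in items]):
        rules = dict(zip(items, choice))
        pdb = [tuple(rules[v] for v in t) for t in db]
        if any(holds(pdb, x, target, min_sup, min_conf) for x in sensitive):
            continue  # some sensitive rule survives: not a valid rule set
        loss = sum(rank(src) - rank(dst) for src, dst in rules.items())
        if best is None or loss < best[0]:
            best = (loss, rules)
    return best

DB = [("needle_use", "HIV"), ("needle_use", "HIV"), ("needle_use", "flu"), ("flu",)]
loss, rules = best_rule_set(DB, "HIV", min_sup=0.4, min_conf=0.6)
```

Several rule sets can share the minimum loss (here, generalizing either `needle_use` or `HIV` by one level), and the sketch simply keeps the first one it encounters.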