Next: Acknowledgements Up: Policy enforcement in releasing Previous: Future Work

Conclusion

We investigated the problem of mining generalized association rules in a medical database in order to protect sensitive information when the database is released. We developed a faster algorithm for mining generalized association rules, based on Cumulate, IBM's frequency counting algorithm. Our algorithm also automatically prunes redundant rules, which gives better results. First, by using a bit template for each node in the taxonomy tree, we do not need to extend the tuples in the database to take into account the ancestors of each item in the tuple. Second, by extracting the candidates with very high support and confidence at each pass, we greatly reduce the size of the candidate sets generated in the following pass. Finally, we proposed some promising further improvements to the algorithm. We also proposed a fast greedy algorithm for generalizing databases with minimum information loss.
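The bit-template idea can be illustrated with a minimal sketch. It is not the paper's implementation: the taxonomy, the item names, and the helper functions build_templates and support_count are hypothetical, and the sketch assumes that a node's template is a bitmask holding its own bit together with the bits of all its ancestors. Under that assumption, a tuple supports a generalized itemset when the OR of its items' templates covers every bit of the itemset, so the stored tuples never have to be extended with ancestors.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def build_templates(
    parent: Dict[str, Optional[str]]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign one bit per taxonomy node and build each node's template.

    A template (assumed layout) has the node's own bit set plus the bits
    of every ancestor up to the root.
    """
    own_bit = {node: 1 << i for i, node in enumerate(parent)}
    templates: Dict[str, int] = {}

    def template(node: str) -> int:
        if node not in templates:
            p = parent[node]
            inherited = template(p) if p is not None else 0
            templates[node] = own_bit[node] | inherited
        return templates[node]

    for node in parent:
        template(node)
    return own_bit, templates


def support_count(
    transactions: Iterable[List[str]],
    itemset: Iterable[str],
    own_bit: Dict[str, int],
    templates: Dict[str, int],
) -> int:
    """Count the tuples that contain every item of itemset, where an item
    may be a leaf or an interior (generalized) node of the taxonomy."""
    wanted = 0
    for item in itemset:
        wanted |= own_bit[item]

    count = 0
    for tx in transactions:
        covered = 0
        for item in tx:
            # One OR per item brings in the item and all of its ancestors.
            covered |= templates[item]
        if (covered & wanted) == wanted:
            count += 1
    return count


# Hypothetical taxonomy: child -> parent (None marks a root).
PARENT = {
    "disease": None,
    "infection": "disease",
    "flu": "infection",
    "hepatitis": "infection",
    "drug": None,
    "antiviral": "drug",
    "oseltamivir": "antiviral",
}

# Hypothetical tuples containing leaf items only.
TRANSACTIONS = [["flu", "oseltamivir"], ["hepatitis"], ["flu"]]

OWN_BIT, TEMPLATES = build_templates(PARENT)
```

In this toy data, support_count(TRANSACTIONS, ["infection"], OWN_BIT, TEMPLATES) counts all three tuples, although none of them stores "infection" explicitly. The second technique, extracting candidates with very high support and confidence at each pass, is not shown here.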

To test our algorithm, we developed our own data generation engine, which generates databases according to the medical database model. Experiments on the synthesized datasets show that our algorithm outperforms the existing ones.


Adrian Perrig
Wed Sep 15 14:27:56 PDT 1999