Authors: Risi Thonangi and Vikram Pudi
Source: Algorithmic Learning Theory, 16th International Conference, ALT 2005, Singapore, October 2005, Proceedings (Sanjay Jain, Hans Ulrich Simon and Etsuji Tomita, Eds.), Lecture Notes in Artificial Intelligence 3734, pp. 122-134, Springer, 2005.
Abstract. Recent studies in classification have proposed ways of exploiting the association rule mining paradigm. These studies have performed extensive experiments to show their techniques to be both efficient and accurate. However, existing studies in this paradigm either do not provide any theoretical justification for their approaches or assume independence between some parameters. In this work, we propose a new classifier based on association rule mining. Our classifier rests on the maximum entropy principle for its statistical basis and does not assume any independence not inferred from the given dataset. We use the classical generalized iterative scaling (GIS) algorithm to create our classification model. We show that GIS fails in some cases when itemsets are used as features and provide modifications to rectify this problem. We show that this modified GIS runs much faster than the original GIS. We also describe techniques to make GIS tractable for large feature spaces: we provide a new technique to divide a feature space into independent clusters, each of which can be handled separately. Our experimental results show that our classifier is generally more accurate than existing classification methods.
©Copyright 2005, Springer