Building fast decision trees from large training sets Article uri icon

abstract

  • Decision trees are commonly used in supervised classification. Currently, supervised classification problems with large training sets are very common, however many supervised classifiers cannot handle this amount of data. There are some decision tree induction algorithms that are capable to process large training sets, however almost all of them have memory restrictions because they need to keep in main memory the whole training set, or a big amount of it. Moreover, algorithms that do not have memory restrictions have to choose a subset of the training set, needing extra time for this selection; or they require to specify the values for some parameters that could be very difficult to determine by the user. In this paper, we present a new fast heuristic for building decision trees from large training sets, which overcomes some of the restrictions of the state of the art algorithms, using all the instances of the training set without storing all of them in main memory. Experimental results show that our algorithm is faster than the most recent algorithms for building decision trees from large training sets. © 2012 - IOS Press and the authors. All rights reserved.

publication date

  • 2012-01-01