Thursday, November 4, 2010

METHODOLOGY FOR SUPERVISED MODELING

Most supervised data mining methods apply the following methodology for building and evaluating a model. First, the algorithm is provided with a training set of data, which includes the preclassified values of the target variable in addition to the predictor variables. For example, if we are interested in classifying income bracket based on age, gender, and occupation, our classification algorithm would need a large pool of records containing information that is as complete as possible about every field, including the target field, income bracket. In other words, the records in the training set need to be preclassified. A provisional data mining model is then constructed using the training samples provided in the training data set.
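As a minimal sketch of this first step, the Python below builds a provisional model from a handful of preclassified records. The records, the field names, and the "most common income bracket per occupation" rule are all assumptions made for illustration; a real data mining algorithm would use every predictor and far more data.

```python
from collections import Counter, defaultdict

# Hypothetical preclassified training records: predictor fields plus the
# target field (income_bracket). The values are invented for illustration.
training_set = [
    {"age": 45, "gender": "F", "occupation": "engineer", "income_bracket": "high"},
    {"age": 23, "gender": "M", "occupation": "clerk", "income_bracket": "low"},
    {"age": 38, "gender": "M", "occupation": "engineer", "income_bracket": "high"},
    {"age": 51, "gender": "F", "occupation": "clerk", "income_bracket": "low"},
    {"age": 29, "gender": "F", "occupation": "engineer", "income_bracket": "low"},
]


def build_provisional_model(records, predictor="occupation", target="income_bracket"):
    """Learn the most common target value for each value of one predictor."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[predictor]][record[target]] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fall back to the overall majority class for predictor values never seen.
    default = Counter(r[target] for r in records).most_common(1)[0][0]
    return rules, default


def classify(model, record, predictor="occupation"):
    """Apply the learned rules to a record whose target value is unknown."""
    rules, default = model
    return rules.get(record[predictor], default)


model = build_provisional_model(training_set)
print(classify(model, {"age": 40, "gender": "F", "occupation": "engineer"}))
```

The model can only be built because every training record carries its income bracket; without the preclassified target there would be nothing to count.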

However, the training set is necessarily incomplete; that is, it does not include the new or future data that the data modelers are really interested in classifying. Therefore, the algorithm needs to guard against memorizing the training set and blindly applying all patterns found in the training set to the future data. For example, it may happen that all customers named David in a training set are in the high-income bracket. We would presumably not want our final model, which is to be applied to new data, to include the pattern "If the customer's first name is David, the customer has a high income." Such a pattern is a spurious artifact of the training set and needs to be verified before deployment.
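The David example can be made concrete with a short sketch. The two small data sets below are invented, and `rule_accuracy` is a hypothetical helper, not part of any library; the point is only that a rule can look perfect on the training set and still fail on records it has never seen.

```python
def rule_accuracy(records, first_name="David", predicted="high"):
    """Share of customers with the given first name whose income matches the rule."""
    matches = [r for r in records if r["first_name"] == first_name]
    if not matches:
        return None  # the rule never fires on these records
    correct = sum(1 for r in matches if r["income_bracket"] == predicted)
    return correct / len(matches)


# Hypothetical training set: every David happens to be in the high bracket.
training_set = [
    {"first_name": "David", "income_bracket": "high"},
    {"first_name": "David", "income_bracket": "high"},
    {"first_name": "Maria", "income_bracket": "low"},
    {"first_name": "David", "income_bracket": "high"},
    {"first_name": "Chen", "income_bracket": "high"},
]

# Hypothetical new data: the pattern does not carry over.
new_data = [
    {"first_name": "David", "income_bracket": "low"},
    {"first_name": "David", "income_bracket": "high"},
    {"first_name": "David", "income_bracket": "low"},
    {"first_name": "David", "income_bracket": "low"},
    {"first_name": "Maria", "income_bracket": "high"},
]

training_accuracy = rule_accuracy(training_set)
new_accuracy = rule_accuracy(new_data)
print(training_accuracy, new_accuracy)
```

With these invented records the rule is right for every David in the training set but for only one David in four in the new data, which is what a spurious pattern looks like in practice.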

Therefore, the next step in supervised data mining methodology is to examine how the provisional data mining model performs on a test set of data. In the test set, a holdout data set, the values of the target variable are hidden temporarily from the provisional model, which then performs classification according to the patterns and structure it learned from the training set. The efficacy of the classifications is then evaluated by comparing them against the true values of the target variable. The provisional data mining model is then adjusted to minimize the error rate on the test set.
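The evaluation and adjustment step might be sketched as follows. Everything here is an assumption for illustration: the test records are invented, and a simple age-threshold rule stands in for the provisional model, with the threshold as the one setting being adjusted.

```python
def error_rate(model, test_set, target="income_bracket"):
    """Fraction of test records the model misclassifies."""
    errors = 0
    for record in test_set:
        # Hide the target value from the model before it classifies the record.
        hidden = {k: v for k, v in record.items() if k != target}
        if model(hidden) != record[target]:
            errors += 1
    return errors / len(test_set)


def make_age_model(threshold):
    """Hypothetical stand-in model: classify as high income at or above an age."""
    return lambda record: "high" if record["age"] >= threshold else "low"


# Hypothetical holdout records with their true income brackets.
test_set = [
    {"age": 52, "income_bracket": "high"},
    {"age": 41, "income_bracket": "high"},
    {"age": 33, "income_bracket": "low"},
    {"age": 27, "income_bracket": "low"},
]

# Adjust the model by trying several thresholds and keeping the one
# with the lowest error rate on the test set.
candidates = [30, 40, 50]
rates = {t: error_rate(make_age_model(t), test_set) for t in candidates}
best_threshold = min(rates, key=rates.get)
print(rates, best_threshold)
```

The true income bracket is used only after the model has made its prediction, which mirrors the way the target values are hidden temporarily and then compared against the classifications.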
