Surrogate Strategy in Handling Missing Data
In machine learning and data mining, the surrogate strategy is a method for handling missing data in predictive modeling. It is best known from decision tree algorithms such as CART (Classification and Regression Trees), where it deals with observations that are missing a value for one or more input variables.
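When an observation is missing the value of the variable used for a node's primary split, CART falls back on a surrogate split: a split on a different variable that sends observations to the same side as the primary split as often as possible. Surrogates are ranked by how closely their partition agrees with the primary split's partition, measured on observations where both variables are present, rather than by simple correlation between the two variables. The best available surrogate then decides which branch the observation follows, both while the tree is being built and when it is used for prediction.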
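The search for a surrogate can be sketched in a few lines of Python. This is a simplified illustration, not CART's exact procedure: the function name find_surrogate, the tuple layout of the result, and the toy data are all assumptions made for this example. It scores each candidate split by the fraction of observations it sends to the same side as the primary split, and accepts it only if it beats the "send everything to the larger side" baseline.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing value (assumed convention)


def find_surrogate(
    rows: List[Row], primary_idx: int, primary_threshold: float
) -> Optional[Tuple[int, float, bool, float]]:
    """Return (feature index, threshold, flipped, agreement) or None.

    Hypothetical helper: 'flipped' means the surrogate's "<= threshold"
    side corresponds to the primary split's right branch.
    """
    # Agreement is only measurable where the primary variable is observed.
    observed = [r for r in rows if r[primary_idx] is not None]
    if not observed:
        return None
    target = [r[primary_idx] <= primary_threshold for r in observed]

    # Baseline: always follow the more populated branch.
    n_left = sum(target)
    baseline = max(n_left, len(target) - n_left) / len(target)

    best = None
    for j in range(len(rows[0])):
        if j == primary_idx:
            continue
        pairs = [(r[j], t) for r, t in zip(observed, target) if r[j] is not None]
        values = sorted({v for v, _ in pairs})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            agree = sum((v <= thr) == t for v, t in pairs)
            for flipped, count in ((False, agree), (True, len(pairs) - agree)):
                score = count / len(pairs)
                if best is None or score > best[3]:
                    best = (j, thr, flipped, score)

    # A surrogate is only useful if it beats the majority baseline.
    if best is not None and best[3] > baseline:
        return best
    return None


# Toy data: feature 0 is the primary split variable, feature 1 tracks it
# closely, feature 2 is unrelated. The last two rows are missing feature 0.
rows = [
    [1.0, 10.0, 5.0],
    [2.0, 12.0, 1.0],
    [3.0, 11.0, 4.0],
    [6.0, 20.0, 2.0],
    [7.0, 22.0, 5.0],
    [8.0, 21.0, 1.0],
    [None, 9.0, 3.0],
    [None, 23.0, 3.0],
]
surrogate = find_surrogate(rows, primary_idx=0, primary_threshold=4.5)
print(surrogate)
```

On this toy data the sketch picks feature 1, since a threshold between its two value clusters reproduces the primary partition on every row where feature 0 is observed.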
With the surrogate strategy, missing values do not need to be imputed. Instead, the tree finds substitute variables that can stand in for the missing one when deciding which branch an observation follows. This removes a separate imputation step from the model building process and lets the same tree handle incomplete observations at prediction time.
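As a rough sketch of how a node can route an observation without imputing anything, the snippet below tries the primary split first, then each surrogate in order, and finally falls back to a default direction. The goes_left function, the (feature index, threshold, flipped) tuple layout, and the example thresholds are assumptions made for illustration, not part of any particular library.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical split representation: (feature index, threshold, flipped).
# 'flipped' means values <= threshold go right instead of left.
Split = Tuple[int, float, bool]


def goes_left(
    row: Sequence[Optional[float]], splits: List[Split], majority_left: bool
) -> bool:
    """Route one observation at a node; None marks a missing value."""
    # splits[0] is the primary split, the rest are surrogates in rank order.
    for feature, threshold, flipped in splits:
        value = row[feature]
        if value is not None:
            return (value <= threshold) != flipped
    # Every relevant variable is missing: follow the majority direction.
    return majority_left


splits = [(0, 4.5, False), (1, 16.0, False)]
print(goes_left([3.0, None, 2.0], splits, majority_left=True))   # True: primary split used
print(goes_left([None, 23.0, 2.0], splits, majority_left=True))  # False: surrogate used
print(goes_left([None, None, 2.0], splits, majority_left=True))  # True: majority fallback
```

No value is ever filled in for the missing variable; the observation is simply routed by whichever split can be evaluated first.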
Author: Hồ Đức Duy. © Reproductions must always retain the copyright notice.