I am fitting a random forest and I have split my data into a training set and a test set using the following code:
train <- sample(1:nrow(Boston), nrow(Boston)/2)
EDIT: here, train is just a vector of row indices, so the test set is:
testB <- Boston[-train,]; head(testB); nrow(testB)
The name of the response variable is medv and it is the fourteenth column.
I also have the following code for my random forest (actually I am bagging here, because mtry is set to 13, the total number of predictors in my data set):
bag.boston1 <- randomForest(medv~., data=Boston, subset=train, mtry=13, importance=TRUE, ytest=testB$medv, xtest= )
Is my argument for the ytest= option correct? I assume so, since it is merely the response variable in the test data set.
Also, what argument should I use for the xtest= option?
One idea I had was to drop the response variable from my test data set, creating a data frame that contains only the predictors, and then pass that data frame as the xtest argument:
x <- testB
x[14] <- NULL # because the 14th column is the response variable
bag.boston1 <- randomForest(medv~., data=Boston, subset=train, mtry=13,
importance=TRUE, ytest=testB$medv, xtest=x)
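For reference, here is a minimal self-contained sketch of what I am trying to achieve. It assumes the x/y (non-formula) interface of randomForest, which is where the package documentation lists xtest and ytest, and it assumes Boston comes from the MASS package. The names x_train, y_train, x_test, y_test and bag_fit, as well as the seed, are my own choices for illustration only.

```r
library(MASS)          # assumed source of the Boston data set
library(randomForest)

set.seed(1)  # arbitrary seed, only so the split is reproducible
train <- sample(1:nrow(Boston), nrow(Boston) / 2)

# Split predictors and response by hand instead of using a formula
is_pred <- names(Boston) != "medv"
x_train <- Boston[train, is_pred]
y_train <- Boston$medv[train]
x_test  <- Boston[-train, is_pred]
y_test  <- Boston$medv[-train]

# mtry = 13 uses every predictor at each split, i.e. bagging
bag_fit <- randomForest(x = x_train, y = y_train,
                        xtest = x_test, ytest = y_test,
                        mtry = 13, importance = TRUE)

# When a test set is supplied, the fit carries a `test` component
head(bag_fit$test$predicted)   # predictions for the test rows
tail(bag_fit$test$mse, 1)      # test MSE using all trees
```

If I read the documentation correctly, xtest alone would give test-set predictions in bag_fit$test$predicted, and adding ytest additionally gives the test MSE and R-squared as trees are added.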
What do xtest= and ytest= do in the randomForest algorithm in R?