Friday, 20 February 2015

What do xtest= and ytest= do in the randomForest algorithm in R?






I am fitting a random forest and I have split my data into a training set and a test set using the following code:


train <- sample(1:nrow(Boston), nrow(Boston)/2)

EDIT: here, train is just a vector of row indices, so the test set is:


testB <- Boston[-train,]; head(testB); length(testB)

The name of the response variable is medv, and it is the fourteenth column.
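A reproducible version of the same split, as a minimal sketch: it assumes the Boston data frame comes from the MASS package (506 rows, 14 columns) and adds a hypothetical set.seed(1) call that is not in the original code. It also shows that length() on a data frame counts columns, so nrow() is the call that gives the size of the test set.

```r
library(MASS)  # assumed source of the Boston data frame

set.seed(1)  # hypothetical seed, only so the split is reproducible
train <- sample(1:nrow(Boston), nrow(Boston) / 2)  # 253 row indices
testB <- Boston[-train, ]                           # the remaining 253 rows

nrow(testB)    # number of test observations
length(testB)  # for a data frame this is the number of columns, not rows
```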


I also have the following code for my random forest (actually I am bagging here, because mtry = 13 equals the total number of predictors in my data set):


bag.boston1 <- randomForest(medv~., data=Boston, subset=train, mtry=13, importance=TRUE, ytest=testB$medv, xtest= )


Is my argument for the ytest= option correct? I assume so, since it is merely the response variable in the test data set.


Also, what argument should I use for the xtest= option?


One idea I had was to eliminate the response variable from my test data set, thus creating a data frame containing only the predictors in the test data set, and then pass that data frame x as the xtest argument:



x <- testB
x[14] <- NULL  # drop the 14th column, which is the response medv

bag.boston1 <- randomForest(medv~., data=Boston, subset=train, mtry=13,
                            importance=TRUE, ytest=testB$medv, xtest=x)
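For reference, the randomForest documentation describes xtest as a data frame or matrix of test-set predictors (with the same columns as the training predictors) and ytest as the matching test-set response. When they are supplied, the forest is evaluated on the test set while it is grown, and the results are stored in the test component of the fitted object (for regression: predicted, mse and rsq). The idea above appears to be what these arguments expect. The sketch below assumes MASS::Boston and a hypothetical seed, and selects the predictors by name rather than by position; x.test and y.test are illustrative names, not part of the package.

```r
library(randomForest)
library(MASS)  # assumed source of the Boston data

set.seed(1)  # hypothetical seed, for reproducibility only
train <- sample(1:nrow(Boston), nrow(Boston) / 2)
testB <- Boston[-train, ]

# Test-set predictors: every column except the response, selected by name
x.test <- testB[, setdiff(names(testB), "medv")]
y.test <- testB$medv  # test-set response, same row order as x.test

bag.boston1 <- randomForest(medv ~ ., data = Boston, subset = train,
                            mtry = 13, importance = TRUE,
                            xtest = x.test, ytest = y.test)

# Test-set results live in the $test component (predicted, mse, rsq)
str(bag.boston1$test)

# Test MSE after each tree; the last element is the MSE of the full forest
tail(bag.boston1$test$mse, 1)

# Test MSE recomputed by hand from the stored test-set predictions,
# for comparison with the last element of test$mse
mean((bag.boston1$test$predicted - y.test)^2)
```

One consequence worth knowing: when xtest is given, keep.forest defaults to FALSE, so the fitted object cannot be passed to predict() later unless keep.forest = TRUE is also set in the call.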








