Sunday, February 12, 2017

R - RandomForest with two Outcome Variables

I'm fairly new to the randomForest package in R.

I'm trying to fit a model with 2 outcome variables and 7 predictor variables, but it fails, either because of the length of the combined outcome or because randomForest cannot be fit with 2 outcome variables in this way.

Let's assume this is my model:

m1<-randomForest(cbind(y1,y2)~a+b+c+d+e+f+g, data, mtry=7, importance=TRUE)

When I run this model, I receive this error:

Error in randomForest.default(m, y, ...) : 
  length of response must be the same as predictors

I did some troubleshooting and found that cbind() on the two outcome variables gives an object whose length() is double the original (it is a 209 x 2 matrix, and length() counts all 418 cells), which possibly results in the above error. As an example,

length(cbind(y1,y2))
> 418
sapply(data, length)
>  a   b   c   d   e   f   g   y1   y2
 209 209 209 209 209 209 209  209  209
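
To make the length mismatch concrete, here is a minimal base-R sketch. The vectors below are made-up stand-ins for y1 and y2 (the real data is not shown in this post), so only the shapes matter, not the values.

```r
# Hypothetical stand-ins for the two outcome columns (209 observations each)
set.seed(1)
y1 <- sample(1:5, 209, replace = TRUE)
y2 <- sample(1:5, 209, replace = TRUE)

both <- cbind(y1, y2)

dim(both)     # rows and columns of the combined matrix
nrow(both)    # number of observations, same as each predictor
length(both)  # total number of cells, i.e. rows * columns
```

If the response is checked with length() rather than nrow(), a two-column matrix would look twice as long as the predictors, which would be consistent with the error message above.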

I tried to work around this by running randomForest on each outcome variable separately and then applying combine() to the two regression models, but ran into this issue:

m2<-randomForest(y1~a+b+c+d+e+f+g, data, mtry=7, importance=TRUE)
m3<-randomForest(y2~a+b+c+d+e+f+g, data, mtry=7, importance=TRUE)
combine(m2,m3)

Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values.  Are you sure you want to do regression?
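
The warning is about how many distinct values the response takes. A small base-R sketch of checking that, using a made-up vector in place of y1 (an assumption, since the real data is not shown):

```r
# Hypothetical outcome with a handful of distinct values
y <- c(1, 2, 3, 4, 5, 3, 2, 1, 5, 4)

range(y)                     # smallest and largest value only
n_distinct <- length(unique(y))
n_distinct                   # number of distinct values
table(y)                     # how often each value occurs
```

Note that range() reports only the minimum and maximum, so length(unique()) or table() is the more direct check for what the warning describes.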

I checked range() for y1 and y2, and both come out as 5. So I decided to apply as.factor() to both outcomes and run the models as classification models instead, but then ran into a new issue:

m4<-randomForest(as.factor(y1)~a+b+c+d+e+f+g, data, mtry=7, importance=TRUE)
m5<-randomForest(as.factor(y2)~a+b+c+d+e+f+g, data, mtry=7, importance=TRUE)
combine(m4,m5)

Error in rf$votes + ifelse(is.na(rflist[[i]]$votes), 0, rflist[[i]]$votes) : 
  non-conformable arrays

My guess is that I can't combine() classification models.
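
One possible explanation, sketched under an assumption the post does not confirm: if as.factor(y1) and as.factor(y2) have different numbers of levels, the two models' vote matrices would have the same number of rows but different numbers of columns, and adding them fails in base R with the same message. The matrices below are hypothetical placeholders, not real randomForest output.

```r
# Hypothetical vote matrices: same 209 rows, different numbers of classes
votes_a <- matrix(0, nrow = 209, ncol = 5)
votes_b <- matrix(0, nrow = 209, ncol = 4)

# Capture the error message instead of stopping the script
result <- tryCatch(
  votes_a + votes_b,
  error = function(e) conditionMessage(e)
)
result
```

As far as I understand, combine() is meant for merging ensembles of trees grown for the same response, so two models with different outcomes may not be a supported case either way.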

I hope my question about running a multivariate random forest model makes sense. Let me know if you have further questions. I can also go back and make adjustments.

asked 44 secs ago
