I have a data set with both continuous and categorical variables. Ultimately I want to build a logistic regression model to estimate the probability of a dichotomous response variable.
Is it acceptable, or even a good idea, to fit a log-linear model to the categorical variables to test their interactions, and then use the interactions it indicates as predictors in the logistic model?
Example in R:
Columns in df: CategoricalA, CategoricalB, CategoricalC, CategoricalD, CategoricalE, ContinuousA, ContinuousB, ResponseA
library(MASS)
# Isolate the categorical variables in a new data frame
catdf <- df[, c("CategoricalA", "CategoricalB", "CategoricalC", "CategoricalD", "CategoricalE")]
# Create the cross table
crosstable <- table(catdf)
# Build the saturated log-linear model
model <- loglm(formula = ~ CategoricalA * CategoricalB * CategoricalC * CategoricalD * CategoricalE, data = crosstable)
# Use step() with backward elimination to simplify the model
automodel <- step(object = model, direction = "backward")
Then build a logistic regression using the interactions retained in automodel, together with the values of ContinuousA and ContinuousB, to predict ResponseA (which is binary).
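To make that last step concrete, here is a minimal sketch of what the logistic model might look like. It assumes, purely for illustration, that the backward selection kept only the CategoricalA:CategoricalB interaction. The data frame is simulated stand-in data, not my real df, and the names logit_model and pred are hypothetical.

```r
# Simulated stand-in for df (hypothetical data, for illustration only)
set.seed(1)
n <- 500
df <- data.frame(
  CategoricalA = factor(sample(c("a1", "a2"), n, replace = TRUE)),
  CategoricalB = factor(sample(c("b1", "b2"), n, replace = TRUE)),
  ContinuousA  = rnorm(n),
  ContinuousB  = rnorm(n)
)

# Binary response generated from an arbitrary, made-up model
eta <- -0.5 + 0.8 * df$ContinuousA +
  0.6 * (df$CategoricalA == "a2" & df$CategoricalB == "b2")
df$ResponseA <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression: continuous predictors plus the interaction
# that is ASSUMED here to have been retained by the log-linear step
logit_model <- glm(
  ResponseA ~ ContinuousA + ContinuousB + CategoricalA * CategoricalB,
  family = binomial,
  data = df
)
summary(logit_model)

# Predicted probabilities that ResponseA equals 1
pred <- predict(logit_model, type = "response")
```

In this sketch the interaction term is entered directly in the glm() formula, so nothing from automodel is passed in as data; only the choice of which interactions to include would come from the log-linear step.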
My hunch is that this is not OK, but I can't find a definitive answer one way or the other.
Can/Should I use the output of a log-linear model as the predictors in a logistic regression model?