Vote count:
0
I have a data set with both continuous and categorical variables. In the end I want to build a logistic regression model to calculate the probability of a response dichotomous variable.
Is it acceptable, or even a good idea, to apply a log linear model to the categorical variables in the model to test their interactions, and then use the indicated interactions as predictors in the logistic model?
Example in R:
Columns in df: CategoricalA, CategoricalB, CategoricalC, CategoricalD, CategoricalE, ContinuousA, ContinuousB, ResponseA
library(MASS)
#Isolate categorical variables in new data frame
catdf <- df[,c("CategoricalA","CategoricalB","CategoricalC", "CategoricalD", "CategoricalE")]
#Create cross table
crosstable <- table(catdf)
#build log-lin model
model <- loglm(formula = ~ CategoricalA * CategoricalB * CategoricalC * CategoricalD * CategoricalE, data = crosstable)
#Use step() to build better model
automodel <- step(object = model, direction = "backward")
Then build a logistic regresion using the output of automodeland the values of ContinuousA and ContinuousB in order to predict ResponseA (which is binary).
My hunch is that this is not ok, but I cant find the answer definitively one way or the other.
Can/Should I use the output of a log-linear model as the predictors in a logistic regression model?
Aucun commentaire:
Enregistrer un commentaire