I have a dataset with 300 variables and over 300K observations. Some columns have many null values (up to 90% for some variables). I want to eventually run a clustering algorithm on the dataset, but I need to reduce the number of dimensions first, and I plan to use SVD or PCA. Will the null values prevent me from getting proper results when running SVD or PCA? If so, any suggestions on what I should do? Should I omit the affected data or impute it?
Also, the ranges of the variables vary significantly. Should I normalize the data by transforming each value into the number of standard deviations it lies from its column's mean?
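To make the normalization I have in mind concrete, here is a minimal sketch in plain Python. The function names (`missing_fraction`, `standardize`) and the sample `income` column are made up for illustration and are not from my actual data. It assumes nulls are represented as `None` and that each column has at least one observed value. It only shows the per-column z-score computed from the non-null entries; it does not settle whether the nulls should be omitted or imputed.

```python
import math

def missing_fraction(column):
    """Share of entries in the column that are null (None)."""
    return sum(v is None for v in column) / len(column)

def standardize(column):
    """Z-score the observed values of one column; nulls stay None.

    Assumes the column contains at least one non-null value.
    """
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    # Population standard deviation of the observed values only.
    std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    if std == 0:
        # Constant column: no spread to scale by, so map observed values to 0.
        return [None if v is None else 0.0 for v in column]
    return [None if v is None else (v - mean) / std for v in column]

# Hypothetical column with 2 of 5 values missing.
income = [30000.0, None, 50000.0, 70000.0, None]
print(missing_fraction(income))
print(standardize(income))
```

After this step every column is on the same scale (mean 0, standard deviation 1 over its observed values), but the nulls are still there and would still have to be handled before SVD or PCA.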
Thanks, Eric
asked 1 min ago
Can you run singular value decomposition or PCA on a dataset with lots of null values?