Saturday, February 7, 2015

Can you run singular value decomposition (SVD) or PCA on a dataset with lots of null values?






I have a dataset with 300 variables and over 300K observations. Some columns have many null values (up to 90% for some variables). I eventually want to run a clustering algorithm on the dataset, but I need to reduce the number of dimensions first, and I plan to use SVD or PCA. Will the null values prevent me from getting proper results from SVD or PCA? If so, what should I do: omit the data or impute it?
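To make the omit-or-impute choice concrete, here is a minimal sketch of one possible preprocessing step, not a recommendation: drop the variables that are mostly null and fill the remaining gaps with the column mean, so that SVD or PCA receives a complete matrix. The helper names, the toy data, the use of None for null, and the 0.5 cutoff are all assumptions made up for illustration. Keep in mind that mean imputation shrinks a column's variance, so a column that is 90% null would become nearly constant after filling.

```python
from statistics import mean


def missing_fraction(column):
    """Fraction of entries in a column that are None (treated here as null)."""
    return sum(v is None for v in column) / len(column)


def drop_and_impute(columns, max_missing=0.5):
    """Drop columns that are too sparse, then mean-impute the rest.

    `columns` maps a column name to a list of floats or None.
    `max_missing` is a hypothetical cutoff, not a recommended value.
    """
    cleaned = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if not observed or missing_fraction(values) > max_missing:
            continue  # omit the variable: too sparse to impute credibly
        fill = mean(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Toy data standing in for the real 300-column table (assumption).
data = {
    "a": [1.0, None, 3.0, 5.0],      # 25% null: kept, gap filled with the mean
    "b": [None, None, None, 2.0],    # 75% null: dropped
    "c": [10.0, 20.0, 30.0, 40.0],   # complete: unchanged
}
cleaned = drop_and_impute(data, max_missing=0.5)
```

On a table of this size a numeric library would be the practical choice; the plain-Python version only shows the logic.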


Also, the ranges of the variables vary significantly. Should I normalize the data by converting each value to the number of standard deviations from its column mean (a z-score)?
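For reference, the transformation described here is the z-score, z = (x - mean) / standard deviation, computed per column. A small sketch follows, assuming complete columns and using the population standard deviation; the function name and the sample columns are invented for illustration. It shows that two variables on very different scales end up on the same scale after the transform.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - column mean) / column standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - mu) / sigma for v in values]


# Two hypothetical columns whose ranges differ by a factor of 1000.
income = [20000.0, 40000.0, 60000.0]
age = [20.0, 40.0, 60.0]

z_income = standardize(income)
z_age = standardize(age)
```

After standardizing, each column has mean 0 and standard deviation 1, so no single variable dominates the decomposition purely because of its units.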


Thanks, Eric



asked 1 min ago

ekim






