3D grphique: Can you run singular value decomposition or PCA on a dataset with lots of null values?

Saturday, February 7, 2015

Vote count: 0

I have a dataset with 300 variables and over 300K observations. Some columns have many null values (up to 90% for some variables). I eventually want to run a clustering algorithm on the dataset, but I need to reduce the number of dimensions first, and I plan to use SVD or PCA. Will the null values prevent me from getting proper results when running SVD or PCA? If so, any suggestions on what I should do: omit the data or impute it?
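To make the "omit or impute" options concrete, here is a minimal sketch of one possible pipeline, not a recommendation: drop columns whose missing fraction exceeds a threshold, fill the remaining gaps with the column mean, then project onto the leading components with SVD. The function name `reduce_with_imputation` and the parameters `max_missing` and `n_components` are hypothetical names chosen for illustration, and mean imputation is only one of several possible strategies.

```python
import numpy as np

def reduce_with_imputation(X, max_missing=0.5, n_components=2):
    """Drop sparse columns, mean-impute the rest, then project via SVD.

    max_missing and n_components are illustrative values (assumptions),
    not tuned recommendations. max_missing should stay below 1.0 so that
    every kept column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)

    # Fraction of NaN entries in each column.
    missing_frac = np.isnan(X).mean(axis=0)
    keep = missing_frac <= max_missing
    X_kept = X[:, keep]

    # Replace each remaining NaN with the mean of its column's observed values.
    col_means = np.nanmean(X_kept, axis=0)
    X_filled = np.where(np.isnan(X_kept), col_means, X_kept)

    # PCA via SVD of the centered matrix; rows of Vt are the principal axes.
    X_centered = X_filled - X_filled.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    scores = X_centered @ Vt[:k].T
    return scores, keep
```

The returned `keep` mask records which original columns survived the threshold, so the reduced scores can be traced back to the variables that produced them. Note that mean imputation shrinks the variance of heavily imputed columns, which affects the components that PCA finds.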

Also, the ranges of the variables vary significantly. Should I normalize the data by converting each value to the number of standard deviations from its column's mean (a z-score)?
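The transformation described here is the z-score, z = (x - mean) / std, computed per column. A minimal sketch, assuming missing values are stored as NaN and should be ignored when computing each column's mean and standard deviation (the function name `standardize_columns` is hypothetical):

```python
import numpy as np

def standardize_columns(X):
    """Convert each column to z-scores, ignoring NaN entries.

    NaN entries stay NaN in the output, so any imputation can still
    be done afterwards on the standardized scale.
    """
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    # Guard against division by zero for constant columns (an assumption:
    # such columns are simply mapped to zeros).
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std
```

After this step every column has mean 0 and standard deviation 1 over its observed values, so variables measured on large scales no longer dominate the leading components.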

Thanks, Eric

asked 1 min ago

ekim

38
