Friday, 2 January 2015

Attempting to replace character values in a data frame with numeric values, error "invalid factor level, NA generated"


I’m trying to do some data preprocessing and want to convert the “classe” factor values {A,B,C,D,E} to {1,2,3,4,5}.


The “classe” column is of type “factor”. I have provided all the steps below:



#get the data
training <- read.table("http://ift.tt/1wy64kN",header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
training_df <- data.frame(training,stringsAsFactors=FALSE)

#split into training & test sets (createDataPartition comes from the caret package)
library(caret)
inTrain <- createDataPartition(y=training$classe, p=0.75, list=FALSE)
training_data <- training[inTrain,]
testing_data <- training[-inTrain,]

#subset based on columns of interest, based on previous studies
training_data_subset <- subset(training_data, select=c("avg_roll_belt","var_roll_belt","var_total_accel_belt","amplitude_roll_belt","max_roll_belt","var_roll_belt",
"var_accel_arm","magnet_arm_x","magnet_arm_y","magnet_arm_z","accel_dumbbell_y","accel_dumbbell_z","magnet_dumbbell_x","gyros_dumbbell_x",
"gyros_dumbbell_y","gyros_dumbbell_z","pitch_forearm","gyros_forearm_x","gyros_forearm_y","classe"))

#see which columns are factors; the training_data_subset$classe feature is a factor
sapply(training_data_subset, class)

#sapply output

avg_roll_belt var_roll_belt var_total_accel_belt amplitude_roll_belt max_roll_belt
"numeric" "numeric" "numeric" "numeric" "numeric"
var_roll_belt.1 var_accel_arm magnet_arm_x magnet_arm_y magnet_arm_z
"numeric" "numeric" "integer" "integer" "integer"
accel_dumbbell_y accel_dumbbell_z magnet_dumbbell_x gyros_dumbbell_x gyros_dumbbell_y
"integer" "integer" "integer" "numeric" "numeric"
gyros_dumbbell_z pitch_forearm gyros_forearm_x gyros_forearm_y classe
"numeric" "numeric" "numeric" "numeric" "factor"
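As a small aside on the inspection step above, the factor columns and their levels can also be listed directly. This is only a sketch on a made-up data frame: `toy` and its values are hypothetical and stand in for `training_data_subset`.

```r
# Hypothetical toy data; column names mirror the post, values are invented
toy <- data.frame(avg_roll_belt = c(1.5, 2.5, 3.5),
                  classe = factor(c("A", "C", "B")))

# Names of the columns that are stored as factors
factor_cols <- names(toy)[sapply(toy, is.factor)]
print(factor_cols)

# The set of values a factor column is allowed to hold
print(levels(toy$classe))
```

Knowing the levels matters here because a factor only accepts values that are already among its levels.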


I created a function that attempts to replace A=1,B=2,C=3,D=4,E=5, see below:



factorsToNumeric <- function(data)
{
  data_numeric <- data
  #loop through the data frame and replace the values row by row
  for(i in 1:nrow(data_numeric))
  {
    if ((data_numeric[i,]$classe == "A") || (data_numeric[i,]$classe == "a"))
    {data_numeric[i,]$classe <- "1"}
    else if ((data_numeric[i,]$classe == "B") || (data_numeric[i,]$classe == "b"))
    {data_numeric[i,]$classe <- "2"}
    else if ((data_numeric[i,]$classe == "C") || (data_numeric[i,]$classe == "c"))
    {data_numeric[i,]$classe <- "3"}
    else if ((data_numeric[i,]$classe == "D") || (data_numeric[i,]$classe == "d"))
    {data_numeric[i,]$classe <- "4"}
    else if ((data_numeric[i,]$classe == "E") || (data_numeric[i,]$classe == "e"))
    {data_numeric[i,]$classe <- "5"}
    else
    {
      #do nothing
    }
  }
  #now that A=1,B=2,C=3,D=4,E=5, convert "classe" to numeric
  #attempt 1 to coerce the column to numeric
  data_numeric[,c("classe")] <- as.numeric(as.character(unlist(data_numeric[,c("classe")])))
  #attempt 2 to coerce the column to numeric
  #transform(data_numeric, classe= as.numeric(classe))
  return (data_numeric)
}


However, I get this error:



training_data_subset_numeric <- factorsToNumeric(training_data_subset)


Error:


Warning messages:
1: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
2: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
3: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
4: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
5: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
6: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
7: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
8: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
9: In [<-.factor(*tmp*, iseq, value = "1") : invalid factor level, NA generated
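The warning can be reproduced on a tiny factor, which may help isolate what is going on. In this sketch the vector `classe` is a hypothetical stand-in for the real column: assigning a string that is not one of the existing levels stores NA and raises the same "invalid factor level" warning.

```r
# Hypothetical stand-in for the "classe" column
classe <- factor(c("A", "B", "C", "D", "E"))

# "1" is not among the levels A..E, so R stores NA and emits
# the "invalid factor level, NA generated" warning
classe[1] <- "1"

first_is_na <- is.na(classe[1])
print(first_is_na)

# The levels themselves are unchanged by the failed assignment
print(levels(classe))
```

Once the elements are NA, a later `as.numeric(as.character(...))` has nothing left to convert, so the column becomes numeric but stays NA.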


Further inspection shows that the class of the "classe" column has been converted to "numeric":



sapply(training_data_subset_numeric, class)

avg_roll_belt var_roll_belt var_total_accel_belt amplitude_roll_belt max_roll_belt
"numeric" "numeric" "numeric" "numeric" "numeric"
var_roll_belt.1 var_accel_arm magnet_arm_x magnet_arm_y magnet_arm_z
"numeric" "numeric" "integer" "integer" "integer"
accel_dumbbell_y accel_dumbbell_z magnet_dumbbell_x gyros_dumbbell_x gyros_dumbbell_y
"integer" "integer" "integer" "numeric" "numeric"
gyros_dumbbell_z pitch_forearm gyros_forearm_x gyros_forearm_y classe
"numeric" "numeric" "numeric" "numeric" "numeric"


However, the head function confirms the problem reported above: all the values A,B,C,D,E have been replaced with NA instead of the numbers 1 to 5.
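For reference, one possible way to get the intended A=1 ... E=5 mapping without a row loop is to convert the factor to character and look each value up in a named vector. This is a sketch under assumptions, not a tested fix for the real data set: `df`, `lookup` and `classe_num` are hypothetical names, and the column is assumed to contain only the letters A to E (in either case).

```r
# Hypothetical toy data standing in for training_data_subset
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5),
                 classe = factor(c("B", "A", "E", "C", "D")))

# Named vector acting as a lookup table from letter to number
lookup <- c(A = 1, B = 2, C = 3, D = 4, E = 5)

# Convert the factor to character first, then index the lookup by name
df$classe_num <- unname(lookup[toupper(as.character(df$classe))])
print(df$classe_num)

# as.integer() on a factor returns the level codes; these match the
# lookup here only because the levels sort alphabetically as A..E
codes <- as.integer(df$classe)
print(codes)
```

The explicit lookup keeps the mapping visible and does not depend on the order of the factor levels, which is why it is shown alongside the shorter `as.integer()` form.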


Please help.


Thanks









