Vote count:
0
Sometimes I need to count the number of non-NA elements in one or another column in my data.table. What is the best data.table-tailored way to do so?
For concreteness, let's work with this:
dt<-data.table(id=(1:100)[sample(10,size=1e6,replace=T)],var=c(1,0,NA)[sample(3,size=1e6,replace=T)],key="id")
The first thing that comes to my mind works like this:
dt[!is.na(var),N:=.N,by=id]
But this has the unfortunate shortcoming that N does not get assigned to any row where var is missing, i.e. dt[is.na(var),N]=NA.
So I work around this by appending:
dt[!is.na(var),N:=.N,by=id][,N:=max(N,na.rm=T),by=id]
However, I'm not sure this is the best approach; another option I thought of and one suggested by the analog to this question for data.frames would be:
dt[,N:=length(var[!is.na(var)]),by=id]
and
dt[,N:=sum(!is.na(var)),by=id]
Comparing computation time of these (average over 100 trials), the last seems to be the fastest:
time 1 | time 2 | time 3
.075 | .065 | .043
Does anyone know a speedier way for data.table?
asked 35 secs ago
Efficiently counting non-NA elements in data.table
Aucun commentaire:
Enregistrer un commentaire