3D grphique: Efficiently counting non-NA elements in data.table

jeudi 16 avril 2015

Efficiently counting non-NA elements in data.table

Vote count:

0

Sometimes I need to count the number of non-NA elements in one or another column in my data.table. What is the best data.table-tailored way to do so?

For concreteness, let's work with this:


dt<-data.table(id=(1:100)[sample(10,size=1e6,replace=T)],var=c(1,0,NA)[sample(3,size=1e6,replace=T)],key="id")

The first thing that comes to my mind works like this:


dt[!is.na(var),N:=.N,by=id]

But this has the unfortunate shortcoming that N does not get assigned to any row where var is missing, i.e. dt[is.na(var),N]=NA.

So I work around this by appending:


dt[!is.na(var),N:=.N,by=id][,N:=max(N,na.rm=T),by=id]

However, I'm not sure this is the best approach; another option I thought of and one suggested by the analog to this question for data.frames would be:


dt[,N:=length(var[!is.na(var)]),by=id]

and


dt[,N:=sum(!is.na(var)),by=id]

Comparing computation time of these (average over 100 trials), the last seems to be the fastest:


time 1 | time 2 | time 3
.075   | .065   | .043

Does anyone know a speedier way for data.table?

asked 35 secs ago

MichaelChirico

446

Efficiently counting non-NA elements in data.table

3D grphique

jeudi 16 avril 2015

Efficiently counting non-NA elements in data.table

Vote count:

0

Aucun commentaire:

Enregistrer un commentaire

jeudi 16 avril 2015

Efficiently counting non-NA elements in data.table

Vote count: 0

Aucun commentaire:

Enregistrer un commentaire

Vote count:

0