mercredi 8 février 2017

Expanding mean grouped by multiple columns in pandas

Vote count: 1

I have a dataframe that I'd like to calculate expanding mean over one column (quiz_score), but need to group by two different columns (userid and week). The data looks like this:

data = {"userid": ['1','1','1','1','1','1','1','1', '2','2','2','2','2','2','2','2'],\
"week": [1,1,2,2,3,3,4,4, 1,2,2,3,3,4,4,5],\ 
"quiz_score": [12, 14, 14, 15, 9, 15, 11, 14, 15, 14, 15, 13, 15, 10, 14, 14]}

>>> df = pd.DataFrame(data, columns = ['userid', 'week', 'quiz_score'])
>>> df
   userid  week  quiz_score
0       1     1          12
1       1     1          14
2       1     2          14
3       1     2          15
4       1     3           9
5       1     3          15
6       1     4          11
7       1     4          14
8       2     1          15
9       2     2          14
10      2     2          15
11      2     3          13
12      2     3          15
13      2     4          10
14      2     4          14
15      2     5          14

I need to calculate expanding means by userid over each week--that is, for each user each week, I need their average quiz score over the preceding weeks. I know that a solution will involve using shift() and pd.expanding_mean() or .expanding().mean() in some form, but I've been unable to get the grouping and shift-ing correct -- even when I try without shifting, the results aren't grouped properly and seem to be just expanding mean across the rows as if there were no grouping at all:

df.groupby(['userid', 'week']).apply(pd.expanding_mean).reset_index()

To be clear, the correct result would look like this:

   userid  week  expanding_mean_quiz_score
0       1     1          NA
1       1     2          13
2       1     3          13.75
3       1     4          13.166666
4       1     5          13
5       1     6          13
6       2     1          NA
7       2     2          15
8       2     3          14.666666
9       2     4          14.4
10      2     5          13.714
11      2     6          13.75

Note that the expanding_mean_quiz_score for each user/week is the mean of the scores for that user across all previous weeks.

Thanks for your help, I've never used expanding_mean() and am stumped here.

asked 45 secs ago

Let's block ads! (Why?)



Expanding mean grouped by multiple columns in pandas

Aucun commentaire:

Enregistrer un commentaire