Vote count: 1
I have a dataframe that I'd like to calculate expanding mean over one column (quiz_score), but need to group by two different columns (userid and week). The data looks like this:
data = {"userid": ['1','1','1','1','1','1','1','1', '2','2','2','2','2','2','2','2'],\
"week": [1,1,2,2,3,3,4,4, 1,2,2,3,3,4,4,5],\
"quiz_score": [12, 14, 14, 15, 9, 15, 11, 14, 15, 14, 15, 13, 15, 10, 14, 14]}
>>> df = pd.DataFrame(data, columns = ['userid', 'week', 'quiz_score'])
>>> df
userid week quiz_score
0 1 1 12
1 1 1 14
2 1 2 14
3 1 2 15
4 1 3 9
5 1 3 15
6 1 4 11
7 1 4 14
8 2 1 15
9 2 2 14
10 2 2 15
11 2 3 13
12 2 3 15
13 2 4 10
14 2 4 14
15 2 5 14
I need to calculate expanding means by userid over each week--that is, for each user each week, I need their average quiz score over the preceding weeks. I know that a solution will involve using shift() and pd.expanding_mean() or .expanding().mean() in some form, but I've been unable to get the grouping and shift-ing correct -- even when I try without shifting, the results aren't grouped properly and seem to be just expanding mean across the rows as if there were no grouping at all:
df.groupby(['userid', 'week']).apply(pd.expanding_mean).reset_index()
To be clear, the correct result would look like this:
userid week expanding_mean_quiz_score
0 1 1 NA
1 1 2 13
2 1 3 13.75
3 1 4 13.166666
4 1 5 13
5 1 6 13
6 2 1 NA
7 2 2 15
8 2 3 14.666666
9 2 4 14.4
10 2 5 13.714
11 2 6 13.75
Note that the expanding_mean_quiz_score for each user/week is the mean of the scores for that user across all previous weeks.
Thanks for your help, I've never used expanding_mean() and am stumped here.
Expanding mean grouped by multiple columns in pandas
Aucun commentaire:
Enregistrer un commentaire