Monday, February 13, 2017

pandas GroupBy aggregate only one column

I have a DataFrame of the following form:

>>> import numpy as np
>>> import pandas as pd
>>> sales = pd.DataFrame({'seller_id': list('AAAABBBB'),
...                       'buyer_id': list('CCDECDEF'),
...                       'amount': np.random.randint(10, 20, size=(8,))})
>>> sales = sales[['seller_id', 'buyer_id', 'amount']]  # fix the column order
>>> sales
  seller_id buyer_id  amount
0         A        C      18
1         A        C      15
2         A        D      11
3         A        E      12
4         B        C      16
5         B        D      18
6         B        E      16
7         B        F      19

What I would like to do is calculate, for each seller, the share of its total sales amount that goes to its largest buyer. I have code that does this, but it keeps resetting the index and grouping again, which is wasteful. There has to be a better way: I would like to aggregate away one grouping key at a time while keeping the others grouped. Here's my current code:
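Written out, with $a_{s,b}$ (notation introduced here, not from the original) denoting the summed amount seller $s$ received from buyer $b$ over all purchases:

$$a_{s,b} = \sum_{i\,:\,\mathrm{seller}_i = s,\ \mathrm{buyer}_i = b} \mathrm{amount}_i, \qquad \mathrm{share}_s = \frac{\max_b a_{s,b}}{\sum_b a_{s,b}}$$

For seller A in the table above, $a_{A,C} = 18 + 15 = 33$, $a_{A,D} = 11$ and $a_{A,E} = 12$, so $\mathrm{share}_A = 33/56 \approx 0.589$. The per-buyer sums have to be formed before taking the maximum, because A's purchases from C span two rows.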

>>> gr2 = sales.groupby(['buyer_id','seller_id'])
>>> seller_buyer_level = gr2['amount'].sum() # sum over different purchases
>>> seller_buyer_level_reset = seller_buyer_level.reset_index('buyer_id') # move buyer_id back to a column
>>> gr3 = seller_buyer_level_reset.groupby(seller_buyer_level_reset.index) # regroup by seller
>>> result = gr3['amount'].max() / gr3['amount'].sum() # largest buyer / seller total

>>> result
seller_id
A    0.589286
B    0.275362
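A minimal sketch of one way to avoid the `reset_index` step, assuming a pandas version where `Series.groupby` accepts an index level name through `level=` (long supported): group by both keys, then regroup the resulting Series on its `seller_id` index level. The amounts are hard-coded to the values in the table above instead of being drawn with `np.random.randint`, so the output can be checked against the result shown.

```python
import pandas as pd

# Same frame as in the question, with the amounts from the table above
# hard-coded so the result is reproducible.
sales = pd.DataFrame({
    'seller_id': list('AAAABBBB'),
    'buyer_id': list('CCDECDEF'),
    'amount': [18, 15, 11, 12, 16, 18, 16, 19],
})

# Step 1: total amount per (seller, buyer) pair. The result is a Series
# with a two-level MultiIndex (seller_id, buyer_id).
per_buyer = sales.groupby(['seller_id', 'buyer_id'])['amount'].sum()

# Step 2: regroup on the seller_id index level directly, so no reset_index
# is needed; buyer_id is the level being aggregated away.
by_seller = per_buyer.groupby(level='seller_id')
result = by_seller.max() / by_seller.sum()
print(result)
```

Because `groupby(level=...)` works directly on the MultiIndex produced by the first aggregation, the buyer level is aggregated away without moving it back into a column, and the division aligns the two Series on `seller_id`.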

I simplified a bit. In reality I also have a time-period column, so I want to do this at the seller and time-period level; that's why in gr3 I group by the index, which is a MultiIndex in the real data (in this example it is a single index). Instead of reducing and regrouping, I expected there to be a way to aggregate over just one level of the grouping while leaving the others grouped, but I couldn't find one in the documentation or online. Any ideas?
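For the real case with a time period, the same regrouping idea extends by passing several level names to `level=`. The sketch below assumes a column named `period` and invents its values purely for illustration. It also shows a `pivot_table` variant (buyers become columns, then a row-wise max divided by a row-wise sum) as a cross-check; that is a different technique from the regrouping, not something from the original question.

```python
import pandas as pd

# Hypothetical example: the question's sales plus an assumed 'period' column
# (the column name and its values are made up for illustration).
sales = pd.DataFrame({
    'seller_id': list('AAAABBBB'),
    'buyer_id':  list('CCDECDEF'),
    'period':    [1, 1, 2, 2, 1, 1, 2, 2],
    'amount':    [18, 15, 11, 12, 16, 18, 16, 19],
})

keys = ['seller_id', 'period']  # the levels that stay grouped

# Sum repeat purchases for each (seller, period, buyer) combination.
per_buyer = sales.groupby(keys + ['buyer_id'])['amount'].sum()

# Aggregate away only the buyer_id level; seller_id and period remain
# as the MultiIndex of the result.
by_key = per_buyer.groupby(level=keys)
result = by_key.max() / by_key.sum()
print(result)

# Cross-check with a pivot: one row per (seller, period), one column per
# buyer. Missing combinations become NaN, which max() and sum() skip.
wide = sales.pivot_table(index=keys, columns='buyer_id',
                         values='amount', aggfunc='sum')
alt = wide.max(axis=1) / wide.sum(axis=1)
print(alt)
```

The regrouping version only materialises the (seller, period, buyer) combinations that occur, while the pivot builds a dense table with one column per buyer, which may be wasteful when there are many buyers.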
