Pandas DataFrame: get the first row of each group


lottogame 2020. 8. 13. 07:38

I have a pandas DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                'value'  : ["first","second","second","first",
                            "second","first","third","fourth",
                            "fifth","second","fifth","first",
                            "first","second","third","fourth","fifth"]})

I want to group this by ["id", "value"] and get the first row of each group.

        id   value
0        1   first
1        1  second
2        1  second
3        2   first
4        2  second
5        3   first
6        3   third
7        3  fourth
8        3   fifth
9        4  second
10       4   fifth
11       5   first
12       6   first
13       6  second
14       6   third
15       7  fourth
16       7   fifth

Expected outcome:

    id   value
     1   first
     2   first
     3   first
     4  second
     5   first
     6   first
     7  fourth

I tried the following, which only gives the first row of the DataFrame. Any help with this would be appreciated.

In [25]: for index, row in df.iterrows():
   ....:     df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])

>>> df.groupby('id').first()
     value
id        
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth

If you need id as a column:

>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
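As a small variation on the two calls above, and assuming the same df as in the question, passing as_index=False to groupby should keep id as a regular column without a separate reset_index() call. This is a sketch for illustration; the names firsts and same are not from the original answer.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

# as_index=False keeps the grouping key as an ordinary column.
firsts = df.groupby('id', as_index=False).first()

# The two-step form from the answer, for comparison.
same = df.groupby('id').first().reset_index()

print(firsts)
```

Both forms produce one row per id with a fresh 0..n-1 index, so the choice is mostly a matter of taste.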

To get the first n records of each group, you can use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
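One detail worth seeing in isolation: head() acts as a filter on the original frame, so without reset_index the selected rows keep their original row labels. A minimal sketch with n=1, assuming the same df as in the question (first_rows is a name introduced here):

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

# head(1) returns the first row of every group, in original order,
# and keeps the original index labels of those rows.
first_rows = df.groupby('id').head(1)

print(first_rows)
```

Keeping the original labels is handy when the selected rows need to be matched back to the source frame, for example with df.loc[first_rows.index].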

This will give you the second row of each group (zero-indexed, so nth(0) selects the first row, which matches first() when that row contains no NaN):

df.groupby('id').nth(1)

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
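To make the behaviour of nth(1) concrete, here is a short sketch on the question's data. Groups that have fewer than two rows (id 5 here) simply contribute nothing. The index layout of the nth() result has changed between pandas releases, so the sketch only inspects the value column; second_values is a name introduced for illustration.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

# nth(1) picks the second row (position 1) of each group.
# id 5 has a single row, so it is absent from the result.
second_values = df.groupby('id').nth(1)['value'].tolist()

print(second_values)
```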

If you need to get the first row, I'd suggest using .nth(0) rather than .first().

The difference between them is how they handle NaNs: .nth(0) returns the first row of each group regardless of the values in that row, while .first() returns the first non-NaN value in each column.

For example, if your dataset is:

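import numpy as np
import pandas as pd

# np.nan is used here; the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4],
            'value'  : ["first","second","third", np.nan,
                        "second","first","second","third",
                        "fourth","first","second"]})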

>>> df.groupby('id').nth(0)
    value
id        
1    first
2    NaN
3    first
4    first

And

>>> df.groupby('id').first()
    value
id        
1    first
2    second
3    first
4    first
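The contrast can also be checked programmatically. The sketch below rebuilds the NaN dataset and compares the value column from nth(0), first(), and head(1); head(1) is included as an assumption-free alternative that, like nth(0), returns whole rows as they are. Only the value column is compared because the index layout of nth() differs across pandas versions. The variable names are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4],
                   'value': ["first","second","third", np.nan,
                             "second","first","second","third",
                             "fourth","first","second"]})

grouped = df.groupby('id')

# nth(0) and head(1) take the first row as-is, NaN included.
nth_values = grouped.nth(0)['value']
head_values = grouped.head(1)['value']

# first() skips NaN and takes the first non-null value per column.
first_values = grouped.first()['value']

print(nth_values.isna().tolist())
print(first_values.tolist())
```

With several columns, first() can therefore combine values from different rows of a group, which is usually not what "the first row" means.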

Maybe this is what you want:

import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'],   ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)

                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31

df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)

Out[29]:
                pop
state1 county3   65
       county4   42
       county2   15
state2 county1   78
       county2   67
       county3   55

If you only need the first row from each group, you can use drop_duplicates. Note that its default is keep='first'.

df.drop_duplicates('id')
Out[1027]: 
    id   value
0    1   first
3    2   first
5    3   first
9    4  second
11   5   first
12   6   first
15   7  fourth
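Since keep is a parameter, the same call can return the last row of each group instead. A minimal sketch on the question's data, with groupby('id').tail(1) shown alongside as a cross-check (last_rows and via_tail are names introduced here):

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

# keep='last' retains the last occurrence of each id instead of the first.
last_rows = df.drop_duplicates('id', keep='last')

# tail(1) per group selects the same rows with the same original labels.
via_tail = df.groupby('id').tail(1)

print(last_rows)
```

Like head() and tail(), drop_duplicates returns whole rows with their original index labels, so NaN values in other columns are kept as they are.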

Reference URL: https://stackoverflow.com/questions/20067636/pandas-dataframe-get-first-row-of-each-group
