작성일자 : 2023-10-31
Ver 0.1.1
33 ~ 100 문제 풀이 : https://youtu.be/00rctVVSSoA?si=Du3KDMiQeHq7Gp-F
6. Pivot¶
국가별 5세이하 사망비율 통계 : https://www.kaggle.com/utkarshxy/who-worldhealth-statistics-2020-complete
Dataurl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/under5MortalityRate.csv'
import pandas as pd
df9 = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/under5MortalityRate.csv')
df9.head()
Location | Period | Indicator | Dim1 | First Tooltip | |
---|---|---|---|---|---|
0 | Afghanistan | 2019 | Under-five mortality rate (probability of dyin... | Both sexes | 60.27 [47.44-74.62] |
1 | Afghanistan | 2019 | Under-five mortality rate (probability of dyin... | Male | 63.83 [50.04-79.16] |
2 | Afghanistan | 2019 | Under-five mortality rate (probability of dyin... | Female | 56.57 [44.54-70.18] |
3 | Afghanistan | 2018 | Under-five mortality rate (probability of dyin... | Both sexes | 62.54 [50.64-75.41] |
4 | Afghanistan | 2018 | Under-five mortality rate (probability of dyin... | Male | 66.08 [53.41-79.76] |
Q83. Indicator을 삭제하고 First Tooltip 컬럼에서 신뢰구간에 해당하는 표현을 지워라
df9.drop('Indicator',axis=1,inplace=True)
df9['First Tooltip'] = df9['First Tooltip'].map(lambda x: float(x.split("[")[0]))
Ans = df9
Ans.head()
Location | Period | Dim1 | First Tooltip | |
---|---|---|---|---|
0 | Afghanistan | 2019 | Both sexes | 60.27 |
1 | Afghanistan | 2019 | Male | 63.83 |
2 | Afghanistan | 2019 | Female | 56.57 |
3 | Afghanistan | 2018 | Both sexes | 62.54 |
4 | Afghanistan | 2018 | Male | 66.08 |
Q84. 년도가 2015년 이상, Dim1이 Both sexes인 케이스만 추출하라
target = df9.loc[(df9['Period'] >= 2015) & (df9['Dim1'] == 'Both sexes')]
target.head()
Location | Period | Dim1 | First Tooltip | |
---|---|---|---|---|
0 | Afghanistan | 2019 | Both sexes | 60.27 |
3 | Afghanistan | 2018 | Both sexes | 62.54 |
6 | Afghanistan | 2017 | Both sexes | 64.94 |
9 | Afghanistan | 2016 | Both sexes | 67.57 |
12 | Afghanistan | 2015 | Both sexes | 70.44 |
Q85. 84번 문제에서 추출한 데이터로 아래와 같이 나라에 따른 년도별 사망률을 데이터 프레임화 하라
target.pivot( index = 'Location', columns = 'Period', values= 'First Tooltip')
Period | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|
Location | |||||
Afghanistan | 70.44 | 67.57 | 64.94 | 62.54 | 60.27 |
Albania | 9.57 | 9.42 | 9.42 | 9.53 | 9.68 |
Algeria | 25.18 | 24.79 | 24.32 | 23.81 | 23.26 |
Andorra | 3.53 | 3.37 | 3.22 | 3.09 | 2.97 |
Angola | 88.20 | 84.21 | 80.62 | 77.67 | 74.69 |
... | ... | ... | ... | ... | ... |
Syrian Arab Republic | 23.18 | 23.27 | 22.97 | 22.11 | 21.53 |
Tajikistan | 37.75 | 36.82 | 35.81 | 34.80 | 33.78 |
Thailand | 10.80 | 10.32 | 9.86 | 9.42 | 9.01 |
The former Yugoslav Republic of Macedonia | 12.97 | 11.97 | 9.94 | 7.83 | 6.12 |
Timor-Leste | 50.76 | 49.01 | 47.27 | 45.62 | 44.22 |
172 rows × 5 columns
Q86. Dim1에 따른 년도별 사망비율의 평균을 구하라
df9.pivot_table( index = 'Dim1', columns = 'Period', values= 'First Tooltip', aggfunc= 'mean')
Period | 1950 | 1951 | 1952 | 1953 | 1954 | 1955 | 1956 | 1957 | 1958 | 1959 | ... | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Dim1 | |||||||||||||||||||||
Both sexes | 147.700816 | 155.537544 | 157.811094 | 156.147206 | 154.539857 | 155.797179 | 159.241163 | 156.170114 | 150.813222 | 150.574000 | ... | 37.718488 | 35.573663 | 34.290988 | 33.099360 | 32.053314 | 31.012093 | 29.956337 | 29.030465 | 28.083837 | 27.191744 |
Female | 140.909796 | 149.210175 | 151.516094 | 150.250882 | 148.688286 | 149.843205 | 153.048721 | 149.988295 | 144.719667 | 144.451474 | ... | 34.953023 | 32.877616 | 31.654070 | 30.521337 | 29.524302 | 28.544360 | 27.542035 | 26.675291 | 25.782616 | 24.945349 |
Male | 154.151224 | 161.538246 | 163.760781 | 161.742059 | 160.081000 | 161.456923 | 165.089535 | 162.015000 | 156.573556 | 156.375053 | ... | 40.340174 | 38.140291 | 36.793081 | 35.543663 | 34.446105 | 33.354302 | 32.242616 | 31.273198 | 30.283023 | 29.350349 |
3 rows × 70 columns
올림픽 메달리스트 정보 데이터: https://www.kaggle.com/the-guardian/olympic-games
dataUrl = https://raw.githubusercontent.com/Datamanim/pandas/main/winter.csv
Q87. 데이터에서 한국 KOR 데이터만 추출하라
df10 = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/winter.csv')
df10.head()
Year | City | Sport | Discipline | Athlete | Country | Gender | Event | Medal | |
---|---|---|---|---|---|---|---|---|---|
0 | 1924 | Chamonix | Biathlon | Biathlon | BERTHET, G. | FRA | Men | Military Patrol | Bronze |
1 | 1924 | Chamonix | Biathlon | Biathlon | MANDRILLON, C. | FRA | Men | Military Patrol | Bronze |
2 | 1924 | Chamonix | Biathlon | Biathlon | MANDRILLON, Maurice | FRA | Men | Military Patrol | Bronze |
3 | 1924 | Chamonix | Biathlon | Biathlon | VANDELLE, André | FRA | Men | Military Patrol | Bronze |
4 | 1924 | Chamonix | Biathlon | Biathlon | AUFDENBLATTEN, Adolf | SUI | Men | Military Patrol | Gold |
Ans = df10.loc[(df10['Country'] == 'KOR')]
Ans
Year | City | Sport | Discipline | Athlete | Country | Gender | Event | Medal | |
---|---|---|---|---|---|---|---|---|---|
2652 | 1992 | Albertville | Skating | Short Track Speed Skating | LEE, Jun-Ho | KOR | Men | 1000M | Bronze |
2653 | 1992 | Albertville | Skating | Short Track Speed Skating | KIM, Ki-Hoon | KOR | Men | 1000M | Gold |
2671 | 1992 | Albertville | Skating | Short Track Speed Skating | KIM, Ki-Hoon | KOR | Men | 5000M Relay | Gold |
2672 | 1992 | Albertville | Skating | Short Track Speed Skating | LEE, Jun-Ho | KOR | Men | 5000M Relay | Gold |
2673 | 1992 | Albertville | Skating | Short Track Speed Skating | MO, Ji-Soo | KOR | Men | 5000M Relay | Gold |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5528 | 2014 | Sochi | Skating | Short Track Speed Skating | PARK, Seung-Hi | KOR | Women | 500M | Bronze |
5544 | 2014 | Sochi | Skating | Speed skating | JOO, Hyong Jun | KOR | Men | Team Pursuit | Silver |
5545 | 2014 | Sochi | Skating | Speed skating | KIM, Cheol Min | KOR | Men | Team Pursuit | Silver |
5546 | 2014 | Sochi | Skating | Speed skating | LEE, Seung Hoon | KOR | Men | Team Pursuit | Silver |
5565 | 2014 | Sochi | Skating | Speed skating | LEE, Sang Hwa | KOR | Women | 500M | Gold |
87 rows × 9 columns
Q88. 한국 올림픽 메달리스트 데이터에서 년도에 따른 medal 갯수를 데이터프레임화 하라
Ans.pivot_table(index = 'Year', columns= 'Medal', aggfunc= 'size' ).fillna(0)
Medal | Bronze | Gold | Silver |
---|---|---|---|
Year | |||
1992 | 1.0 | 5.0 | 1.0 |
1994 | 1.0 | 8.0 | 1.0 |
1998 | 2.0 | 6.0 | 4.0 |
2002 | 0.0 | 5.0 | 2.0 |
2006 | 2.0 | 14.0 | 3.0 |
2010 | 2.0 | 6.0 | 10.0 |
2014 | 2.0 | 7.0 | 5.0 |
Ans.pivot_table(index = 'Medal', columns= 'Year', aggfunc= 'size' ).fillna(0)
Year | 1992 | 1994 | 1998 | 2002 | 2006 | 2010 | 2014 |
---|---|---|---|---|---|---|---|
Medal | |||||||
Bronze | 1.0 | 1.0 | 2.0 | 0.0 | 2.0 | 2.0 | 2.0 |
Gold | 5.0 | 8.0 | 6.0 | 5.0 | 14.0 | 6.0 | 7.0 |
Silver | 1.0 | 1.0 | 4.0 | 2.0 | 3.0 | 10.0 | 5.0 |
Q89. 전체 데이터에서 sport종류에 따른 성별수를 구하여라
df10.pivot_table(index = 'Sport', columns = 'Gender', aggfunc = 'size')
Gender | Men | Women |
---|---|---|
Sport | ||
Biathlon | 270 | 150 |
Bobsleigh | 416 | 36 |
Curling | 97 | 75 |
Ice Hockey | 1231 | 305 |
Luge | 135 | 45 |
Skating | 665 | 564 |
Skiing | 1130 | 651 |
Q90. 전체 데이터에서 Discipline종류에 따른 따른 Medal수를 구하여라
df10.pivot_table(index = 'Discipline', columns = 'Medal', aggfunc = 'size')
Medal | Bronze | Gold | Silver |
---|---|---|---|
Discipline | |||
Alpine Skiing | 141 | 143 | 144 |
Biathlon | 139 | 140 | 141 |
Bobsleigh | 147 | 134 | 141 |
Cross Country Skiing | 263 | 264 | 262 |
Curling | 56 | 58 | 58 |
Figure skating | 118 | 122 | 119 |
Freestyle Skiing | 34 | 34 | 34 |
Ice Hockey | 512 | 510 | 514 |
Luge | 60 | 62 | 58 |
Nordic Combined | 55 | 55 | 55 |
Short Track Speed Skating | 96 | 97 | 97 |
Skeleton | 10 | 10 | 10 |
Ski Jumping | 68 | 69 | 70 |
Snowboard | 30 | 30 | 30 |
Speed skating | 190 | 193 | 197 |