작성일자 : 2023-10-30
Ver 0.1.1
33 ~ 100 문제 풀이 : https://youtu.be/00rctVVSSoA?si=Du3KDMiQeHq7Gp-F
4. Apply, Map¶
카드이용데이터 : https://www.kaggle.com/sakshigoyal7/credit-card-customers
DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/BankChurnersUp.csv'
import pandas as pd
Q56. 데이터를 로드하고 데이터 행과 열의 갯수를 출력하라
df6 = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/BankChurnersUp.csv', index_col = 0)
df6.head()
CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | 5 | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 |
1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | 6 | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 |
2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | 4 | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 |
3 | 769911858 | Existing Customer | 40 | F | 4 | High School | Unknown | Less than $40K | Blue | 34 | 3 | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 |
4 | 709106358 | Existing Customer | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | 5 | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 |
Q57. Income_Category의 카테고리를 map 함수를 이용하여 다음과 같이 변경하여 newIncome 컬럼에 매핑하라
Unknown : N
Less than $40K : a
$40K - $60K : b
$60K - $80K : c
$80K - $120K : d
$120K +’ : e
df6['Income_Category'].unique()
array(['$60K - $80K', 'Less than $40K', '$80K - $120K', '$40K - $60K', '$120K +', 'Unknown'], dtype=object)
dic = {
'$60K - $80K' : 'c',
'Less than $40K' : 'a',
'$80K - $120K' :'d',
'$40K - $60K' : 'b',
'$120K +' : 'e',
'Unknown' : 'N'
}
dic
{'$60K - $80K': 'c', 'Less than $40K': 'a', '$80K - $120K': 'd', '$40K - $60K': 'b', '$120K +': 'e', 'Unknown': 'N'}
#map 함수 : mapping 한다고 생각하면 됨
df6.loc[ : , 'new_income'] = df6['Income_Category'].map(dic)
df6.head()
CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | new_income | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | 5 | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | c |
1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | 6 | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | a |
2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | 4 | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | d |
3 | 769911858 | Existing Customer | 40 | F | 4 | High School | Unknown | Less than $40K | Blue | 34 | 3 | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 | a |
4 | 709106358 | Existing Customer | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | 5 | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 | c |
Q58. Income_Category의 카테고리를 apply 함수를 이용하여 다음과 같이 변경하여 newIncome 컬럼에 매핑하라
Unknown : N
Less than $40K : a
$40K - $60K : b
$60K - $80K : c
$80K - $120K : d
$120K +’ : e
def income_mapper(x) :
return dic[x]
df6['Income_Category'].apply(income_mapper)
0 c 1 a 2 d 3 a 4 c .. 10122 b 10123 b 10124 a 10125 b 10126 a Name: Income_Category, Length: 10127, dtype: object
df6.loc[ : , 'newIncome'] = df6['Income_Category'].map(dic)
df6.head()
CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | new_income | newIncome | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | 5 | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | c | c |
1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | 6 | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | a | a |
2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | 4 | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | d | d |
3 | 769911858 | Existing Customer | 40 | F | 4 | High School | Unknown | Less than $40K | Blue | 34 | 3 | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 | a | a |
4 | 709106358 | Existing Customer | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | 5 | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 | c | c |
Q59. Customer_Age의 값을 이용하여 나이 구간을 AgeState 컬럼으로 정의하라.
(0~9 : 0 , 10~19 :10 , 20~29 :20 … 각 구간의 빈도수를 출력하라
df6['Customer_Age'] // 10
0 4 1 4 2 5 3 4 4 4 .. 10122 5 10123 4 10124 4 10125 3 10126 4 Name: Customer_Age, Length: 10127, dtype: int64
df6.loc[ : ,'AgeState'] = df6['Customer_Age'] // 10
df6['AgeState'].value_counts().sort_index()
AgeState 2 195 3 1841 4 4561 5 2998 6 530 7 2 Name: count, dtype: int64
Q60. Education_Level의 값중 Graduate단어가 포함되는 값은 1 그렇지 않은 경우에는 0으로 변경하여 newEduLevel 컬럼을 정의하고 빈도수를 출력하라
df6['Education_Level'].map(lambda x : 1 if 'Graduate' in x else 0)
0 0 1 1 2 1 3 0 4 0 .. 10122 1 10123 0 10124 0 10125 1 10126 1 Name: Education_Level, Length: 10127, dtype: int64
df6.loc[ : , 'newEduLevel'] = df6['Education_Level'].map(lambda x : 1 if 'Graduate' in x else 0)
df6['newEduLevel'].value_counts()
newEduLevel 0 6483 1 3644 Name: count, dtype: int64
def g(x) :
if 'Graduate' in x :
return 1
else :
return 0
df6['Education_Level'].apply(g)
0 0 1 1 2 1 3 0 4 0 .. 10122 1 10123 0 10124 0 10125 1 10126 1 Name: Education_Level, Length: 10127, dtype: int64
Q61. Credit_Limit 컬럼값이 4500 이상인 경우 1 그외의 경우에는 모두 0으로 하는 newLimit 정의하라. newLimit 각 값들의 빈도수를 출력하라
df6.loc[ : , 'newlLimit'] = df6['Credit_Limit'].map(lambda x : 1 if x >= 4500 else 0)
df6['newlLimit'].value_counts()
newlLimit 1 5096 0 5031 Name: count, dtype: int64
Q62. Marital_Status 컬럼값이 Married 이고 Card_Category 컬럼의 값이 Platinum인 경우 1 그외의 경우에는 모두 0으로 하는 newState 컬럼을 정의하라. newState의 각 값들의 빈도수를 출력하라
def Ft(x) :
print(x)
print('\n\n')
return
df6[['Marital_Status', 'Card_Category']].head(2).apply(Ft, axis= 1)
Marital_Status Married Card_Category Blue Name: 0, dtype: object Marital_Status Single Card_Category Blue Name: 1, dtype: object
0 None 1 None dtype: object
def Ft(x) :
if x['Marital_Status'] == 'Married' and x['Card_Category'] == 'Platinum' :
return 1
else :
return 0
df6['newState'] = df6[['Marital_Status', 'Card_Category']].apply(Ft, axis= 1)
df6.head()
CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | ... | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | new_income | newIncome | AgeState | newEduLevel | newlLimit | newState | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | ... | 777 | 11914.0 | 1.335 | 1144 | c | c | 4 | 0 | 1 | 0 |
1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | ... | 864 | 7392.0 | 1.541 | 1291 | a | a | 4 | 1 | 1 | 0 |
2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | ... | 0 | 3418.0 | 2.594 | 1887 | d | d | 5 | 1 | 0 | 0 |
3 | 769911858 | Existing Customer | 40 | F | 4 | High School | Unknown | Less than $40K | Blue | 34 | ... | 2517 | 796.0 | 1.405 | 1171 | a | a | 4 | 0 | 0 | 0 |
4 | 709106358 | Existing Customer | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | ... | 0 | 4716.0 | 2.175 | 816 | c | c | 4 | 0 | 1 | 0 |
5 rows × 24 columns
df6.loc[(df6['Marital_Status'] == 'Married') & (df6['Card_Category'] == 'Platinum')]
CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | ... | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | new_income | newIncome | AgeState | newEduLevel | newlLimit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3800 | 708654933 | Attrited Customer | 51 | F | 2 | Doctorate | Married | Less than $40K | Platinum | 42 | ... | 15987.0 | 193 | 15794.0 | 0.435 | 2021 | a | a | 5 | 0 | 1 |
8863 | 714077583 | Attrited Customer | 43 | M | 3 | Post-Graduate | Married | $40K - $60K | Platinum | 31 | ... | 23981.0 | 593 | 23388.0 | 0.987 | 4758 | b | b | 4 | 1 | 1 |
9071 | 711364233 | Existing Customer | 41 | M | 4 | Unknown | Married | $60K - $80K | Platinum | 26 | ... | 34516.0 | 1559 | 32957.0 | 0.759 | 8888 | c | c | 4 | 0 | 1 |
9113 | 770848308 | Existing Customer | 45 | M | 2 | Graduate | Married | $60K - $80K | Platinum | 31 | ... | 34516.0 | 1308 | 33208.0 | 0.746 | 8773 | c | c | 4 | 1 | 1 |
9306 | 715017858 | Existing Customer | 44 | M | 2 | Uneducated | Married | $120K + | Platinum | 36 | ... | 34516.0 | 1421 | 33095.0 | 0.744 | 14465 | e | e | 4 | 0 | 1 |
9319 | 771626283 | Existing Customer | 54 | M | 2 | Post-Graduate | Married | $60K - $80K | Platinum | 42 | ... | 34516.0 | 1996 | 32520.0 | 0.988 | 15033 | c | c | 5 | 1 | 1 |
9618 | 794494308 | Existing Customer | 42 | M | 3 | Uneducated | Married | $120K + | Platinum | 23 | ... | 34516.0 | 2070 | 32446.0 | 0.880 | 13781 | e | e | 4 | 0 | 1 |
7 rows × 23 columns
Q63. Gender 컬럼값 M인 경우 male F인 경우 female로 값을 변경하여 Gender 컬럼에 새롭게 정의하라. 각 value의 빈도를 출력하라
df6['Gender']
0 M 1 F 2 M 3 F 4 M .. 10122 M 10123 M 10124 F 10125 M 10126 F Name: Gender, Length: 10127, dtype: object
df6['Gender'].map(lambda x : 'male' if x == 'M' else 'Female').value_counts()
Gender Female 5358 male 4769 Name: count, dtype: int64