작성일자 : 2023-10-30

Ver 0.1.1

Datamanim_유형1_4_Apply,Map

33 ~ 100 문제 풀이 : https://youtu.be/00rctVVSSoA?si=Du3KDMiQeHq7Gp-F

4. Apply, Map ¶

카드이용데이터 : https://www.kaggle.com/sakshigoyal7/credit-card-customers
DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/BankChurnersUp.csv'

In [100]:

import pandas as pd

Q56. 데이터를 로드하고 데이터 행과 열의 갯수를 출력하라

In [107]:

df6 = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/BankChurnersUp.csv', index_col = 0)
df6.head()

Out[107]:

	CLIENTNUM	Attrition_Flag	Customer_Age	Gender	Dependent_count	Education_Level	Marital_Status	Income_Category	Card_Category	Months_on_book	Total_Relationship_Count	Months_Inactive_12_mon	Contacts_Count_12_mon	Credit_Limit	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt
0	768805383	Existing Customer	45	M	3	High School	Married	$60K - $80K	Blue	39	5	1	3	12691.0	777	11914.0	1.335	1144
1	818770008	Existing Customer	49	F	5	Graduate	Single	Less than $40K	Blue	44	6	1	2	8256.0	864	7392.0	1.541	1291
2	713982108	Existing Customer	51	M	3	Graduate	Married	$80K - $120K	Blue	36	4	1	0	3418.0	0	3418.0	2.594	1887
3	769911858	Existing Customer	40	F	4	High School	Unknown	Less than $40K	Blue	34	3	4	1	3313.0	2517	796.0	1.405	1171
4	709106358	Existing Customer	40	M	3	Uneducated	Married	$60K - $80K	Blue	21	5	1	0	4716.0	0	4716.0	2.175	816

Q57. Income_Category의 카테고리를 map 함수를 이용하여 다음과 같이 변경하여 newIncome 컬럼에 매핑하라
Unknown : N
Less than $40K : a $40K - $60K : b $60K - $80K : c $80K - $120K : d $120K +’ : e

In [108]:

df6['Income_Category'].unique()

Out[108]:

array(['$60K - $80K', 'Less than $40K', '$80K - $120K', '$40K - $60K',
       '$120K +', 'Unknown'], dtype=object)

In [115]:

dic = {
        '$60K - $80K' : 'c', 
        'Less than $40K' : 'a',
        '$80K - $120K' :'d', 
        '$40K - $60K' : 'b',
        '$120K +' : 'e',
        'Unknown' : 'N'
      }
dic

Out[115]:

{'$60K - $80K': 'c',
 'Less than $40K': 'a',
 '$80K - $120K': 'd',
 '$40K - $60K': 'b',
 '$120K +': 'e',
 'Unknown': 'N'}

In [117]:

#map 함수 : mapping 한다고 생각하면 됨
df6.loc[ : , 'new_income'] = df6['Income_Category'].map(dic)
df6.head()

Out[117]:

	CLIENTNUM	Attrition_Flag	Customer_Age	Gender	Dependent_count	Education_Level	Marital_Status	Income_Category	Card_Category	Months_on_book	Total_Relationship_Count	Months_Inactive_12_mon	Contacts_Count_12_mon	Credit_Limit	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt	new_income
0	768805383	Existing Customer	45	M	3	High School	Married	$60K - $80K	Blue	39	5	1	3	12691.0	777	11914.0	1.335	1144	c
1	818770008	Existing Customer	49	F	5	Graduate	Single	Less than $40K	Blue	44	6	1	2	8256.0	864	7392.0	1.541	1291	a
2	713982108	Existing Customer	51	M	3	Graduate	Married	$80K - $120K	Blue	36	4	1	0	3418.0	0	3418.0	2.594	1887	d
3	769911858	Existing Customer	40	F	4	High School	Unknown	Less than $40K	Blue	34	3	4	1	3313.0	2517	796.0	1.405	1171	a
4	709106358	Existing Customer	40	M	3	Uneducated	Married	$60K - $80K	Blue	21	5	1	0	4716.0	0	4716.0	2.175	816	c

Q58. Income_Category의 카테고리를 apply 함수를 이용하여 다음과 같이 변경하여 newIncome 컬럼에 매핑하라
Unknown : N
Less than $40K : a $40K - $60K : b $60K - $80K : c $80K - $120K : d $120K +’ : e

In [118]:

def income_mapper(x) :
    return dic[x]

df6['Income_Category'].apply(income_mapper)

Out[118]:

0        c
1        a
2        d
3        a
4        c
        ..
10122    b
10123    b
10124    a
10125    b
10126    a
Name: Income_Category, Length: 10127, dtype: object

In [119]:

df6.loc[ : , 'newIncome'] = df6['Income_Category'].map(dic)
df6.head()

Out[119]:

	CLIENTNUM	Attrition_Flag	Customer_Age	Gender	Dependent_count	Education_Level	Marital_Status	Income_Category	Card_Category	Months_on_book	Total_Relationship_Count	Months_Inactive_12_mon	Contacts_Count_12_mon	Credit_Limit	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt	new_income	newIncome
0	768805383	Existing Customer	45	M	3	High School	Married	$60K - $80K	Blue	39	5	1	3	12691.0	777	11914.0	1.335	1144	c	c
1	818770008	Existing Customer	49	F	5	Graduate	Single	Less than $40K	Blue	44	6	1	2	8256.0	864	7392.0	1.541	1291	a	a
2	713982108	Existing Customer	51	M	3	Graduate	Married	$80K - $120K	Blue	36	4	1	0	3418.0	0	3418.0	2.594	1887	d	d
3	769911858	Existing Customer	40	F	4	High School	Unknown	Less than $40K	Blue	34	3	4	1	3313.0	2517	796.0	1.405	1171	a	a
4	709106358	Existing Customer	40	M	3	Uneducated	Married	$60K - $80K	Blue	21	5	1	0	4716.0	0	4716.0	2.175	816	c	c

Q59. Customer_Age의 값을 이용하여 나이 구간을 AgeState 컬럼으로 정의하라.
(0~9 : 0 , 10~19 :10 , 20~29 :20 … 각 구간의 빈도수를 출력하라

In [121]:

df6['Customer_Age'] // 10 

Out[121]:

0        4
1        4
2        5
3        4
4        4
        ..
10122    5
10123    4
10124    4
10125    3
10126    4
Name: Customer_Age, Length: 10127, dtype: int64

In [122]:

df6.loc[ : ,'AgeState'] = df6['Customer_Age'] // 10 
df6['AgeState'].value_counts().sort_index()

Out[122]:

AgeState
2     195
3    1841
4    4561
5    2998
6     530
7       2
Name: count, dtype: int64

Q60. Education_Level의 값중 Graduate단어가 포함되는 값은 1 그렇지 않은 경우에는 0으로 변경하여 newEduLevel 컬럼을 정의하고 빈도수를 출력하라

In [124]:

df6['Education_Level'].map(lambda x : 1 if 'Graduate' in x else 0)

Out[124]:

0        0
1        1
2        1
3        0
4        0
        ..
10122    1
10123    0
10124    0
10125    1
10126    1
Name: Education_Level, Length: 10127, dtype: int64

In [128]:

df6.loc[ : , 'newEduLevel'] = df6['Education_Level'].map(lambda x : 1 if 'Graduate' in x else 0)
df6['newEduLevel'].value_counts()

Out[128]:

newEduLevel
0    6483
1    3644
Name: count, dtype: int64

In [132]:

def g(x) :
    if 'Graduate' in x :
        return 1
    else :
        return 0
    
df6['Education_Level'].apply(g)

Out[132]:

0        0
1        1
2        1
3        0
4        0
        ..
10122    1
10123    0
10124    0
10125    1
10126    1
Name: Education_Level, Length: 10127, dtype: int64

Q61. Credit_Limit 컬럼값이 4500 이상인 경우 1 그외의 경우에는 모두 0으로 하는 newLimit 정의하라. newLimit 각 값들의 빈도수를 출력하라

In [133]:

df6.loc[ : , 'newlLimit'] = df6['Credit_Limit'].map(lambda x : 1 if x >= 4500 else 0)
df6['newlLimit'].value_counts()

Out[133]:

newlLimit
1    5096
0    5031
Name: count, dtype: int64

Q62. Marital_Status 컬럼값이 Married 이고 Card_Category 컬럼의 값이 Platinum인 경우 1 그외의 경우에는 모두 0으로 하는 newState 컬럼을 정의하라. newState의 각 값들의 빈도수를 출력하라

In [139]:

def Ft(x) :
    print(x)
    print('\n\n')
    return 

df6[['Marital_Status', 'Card_Category']].head(2).apply(Ft, axis= 1)

Marital_Status    Married
Card_Category        Blue
Name: 0, dtype: object



Marital_Status    Single
Card_Category       Blue
Name: 1, dtype: object

Out[139]:

0    None
1    None
dtype: object

In [146]:

def Ft(x) :
    if x['Marital_Status'] == 'Married' and x['Card_Category'] == 'Platinum' :
        return 1
    else :
        return 0

df6['newState'] = df6[['Marital_Status', 'Card_Category']].apply(Ft, axis= 1)
df6.head()

Out[146]:

	CLIENTNUM	Attrition_Flag	Customer_Age	Gender	Dependent_count	Education_Level	Marital_Status	Income_Category	Card_Category	Months_on_book	...	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt	new_income	newIncome	AgeState	newEduLevel	newlLimit
0	768805383	Existing Customer	45	M	3	High School	Married	$60K - $80K	Blue	39	...	777	11914.0	1.335	1144	c	c	4	0	1
1	818770008	Existing Customer	49	F	5	Graduate	Single	Less than $40K	Blue	44	...	864	7392.0	1.541	1291	a	a	4	1	1
2	713982108	Existing Customer	51	M	3	Graduate	Married	$80K - $120K	Blue	36	...	0	3418.0	2.594	1887	d	d	5	1	0
3	769911858	Existing Customer	40	F	4	High School	Unknown	Less than $40K	Blue	34	...	2517	796.0	1.405	1171	a	a	4	0	0
4	709106358	Existing Customer	40	M	3	Uneducated	Married	$60K - $80K	Blue	21	...	0	4716.0	2.175	816	c	c	4	0	1

5 rows × 24 columns

In [144]:

df6.loc[(df6['Marital_Status'] == 'Married') & (df6['Card_Category'] == 'Platinum')]

Out[144]:

	CLIENTNUM	Attrition_Flag	Customer_Age	Gender	Dependent_count	Education_Level	Marital_Status	Income_Category	Card_Category	Months_on_book	...	Credit_Limit	Total_Revolving_Bal	Avg_Open_To_Buy	Total_Amt_Chng_Q4_Q1	Total_Trans_Amt	new_income	newIncome	AgeState	newEduLevel	newlLimit
3800	708654933	Attrited Customer	51	F	2	Doctorate	Married	Less than $40K	Platinum	42	...	15987.0	193	15794.0	0.435	2021	a	a	5	0	1
8863	714077583	Attrited Customer	43	M	3	Post-Graduate	Married	$40K - $60K	Platinum	31	...	23981.0	593	23388.0	0.987	4758	b	b	4	1	1
9071	711364233	Existing Customer	41	M	4	Unknown	Married	$60K - $80K	Platinum	26	...	34516.0	1559	32957.0	0.759	8888	c	c	4	0	1
9113	770848308	Existing Customer	45	M	2	Graduate	Married	$60K - $80K	Platinum	31	...	34516.0	1308	33208.0	0.746	8773	c	c	4	1	1
9306	715017858	Existing Customer	44	M	2	Uneducated	Married	$120K +	Platinum	36	...	34516.0	1421	33095.0	0.744	14465	e	e	4	0	1
9319	771626283	Existing Customer	54	M	2	Post-Graduate	Married	$60K - $80K	Platinum	42	...	34516.0	1996	32520.0	0.988	15033	c	c	5	1	1
9618	794494308	Existing Customer	42	M	3	Uneducated	Married	$120K +	Platinum	23	...	34516.0	2070	32446.0	0.880	13781	e	e	4	0	1

7 rows × 23 columns

Q63. Gender 컬럼값 M인 경우 male F인 경우 female로 값을 변경하여 Gender 컬럼에 새롭게 정의하라. 각 value의 빈도를 출력하라

In [154]:

df6['Gender']

Out[154]:

0        M
1        F
2        M
3        F
4        M
        ..
10122    M
10123    M
10124    F
10125    M
10126    F
Name: Gender, Length: 10127, dtype: object

In [155]:

df6['Gender'].map(lambda x : 'male' if x == 'M'  else 'Female').value_counts()

Out[155]:

Gender
Female    5358
male      4769
Name: count, dtype: int64

저작자표시

일	월	화	수	목	금	토
		1	1	1	1	1
1	1	1	1	1	1	1
1	1	1	1	1	1	1
1	1	1	1	1	1	1
1	1	1	1

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

[빅분기] 유형1_Apply,Map

4. Apply, Map ¶

4. Apply, Map ¶

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

4. Apply, Map¶

4. Apply, Map¶

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

4. Apply, Map ¶

4. Apply, Map ¶