Date written: 2023-10-30
Ver 0.1.1
Solutions to problems 1 ~ 32: https://youtu.be/PxTIbZJ3xrA?si=VnVsvEjvH9sRM_mS
Solutions to problems 33 ~ 100: https://youtu.be/00rctVVSSoA?si=Du3KDMiQeHq7Gp-F
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
2. Filtering & Sorting
Q20. Load the data.
URL : https://raw.githubusercontent.com/Datamanim/pandas/main/chipo.csv
df3 = pd.read_csv('https://raw.githubusercontent.com/Datamanim/pandas/main/chipo.csv')
df3.head()
| | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
1 | 1 | 1 | Izze | [Clementine] | $3.39 |
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
type(df3)
pandas.core.frame.DataFrame
Q21. Extract the rows where the quantity column equals 3 and print the first 5 rows.
df3.loc[(df3['quantity'] == 3)].head()
| | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|
409 | 178 | 3 | Chicken Bowl | [[Fresh Tomato Salsa (Mild), Tomatillo-Green C... | $32.94 |
445 | 193 | 3 | Bowl | [Braised Carnitas, Pinto Beans, [Sour Cream, C... | $22.20 |
689 | 284 | 3 | Canned Soft Drink | [Diet Coke] | $3.75 |
818 | 338 | 3 | Bottled Water | NaN | $3.27 |
850 | 350 | 3 | Canned Soft Drink | [Sprite] | $3.75 |
Q22. Extract the rows where the quantity column equals 3, reset the index so that it starts from 0, and print the first 5 rows.
df3.loc[(df3['quantity'] == 3)].reset_index(drop = True).head()
| | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|
0 | 178 | 3 | Chicken Bowl | [[Fresh Tomato Salsa (Mild), Tomatillo-Green C... | $32.94 |
1 | 193 | 3 | Bowl | [Braised Carnitas, Pinto Beans, [Sour Cream, C... | $22.20 |
2 | 284 | 3 | Canned Soft Drink | [Diet Coke] | $3.75 |
3 | 338 | 3 | Bottled Water | NaN | $3.27 |
4 | 350 | 3 | Canned Soft Drink | [Sprite] | $3.75 |
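The effect of `drop = True` can be seen on a small example. This is a minimal sketch that assumes a hypothetical toy DataFrame (`toy`) in place of the chipo data.

```python
import pandas as pd

# Hypothetical toy data standing in for the chipo orders
toy = pd.DataFrame({"quantity": [1, 3, 2, 3], "item_name": ["A", "B", "C", "D"]})

filtered = toy.loc[toy["quantity"] == 3]     # keeps the original labels 1 and 3
kept = filtered.reset_index()                # old labels move into an 'index' column
dropped = filtered.reset_index(drop=True)    # old labels are discarded

print(list(filtered.index))
print(list(kept.columns))
print(list(dropped.index))
```

Without `drop=True`, the previous index is preserved as an extra column, which is usually not wanted after a simple filter.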
Q23. Define a new DataFrame made up of the two columns quantity and item_price.
df4 = df3[['quantity', 'item_price']].copy()
df4
| | quantity | item_price |
---|---|---|
0 | 1 | $2.39 |
1 | 1 | $3.39 |
2 | 1 | $3.39 |
3 | 1 | $2.39 |
4 | 2 | $16.98 |
... | ... | ... |
4617 | 1 | $11.75 |
4618 | 1 | $11.75 |
4619 | 1 | $11.25 |
4620 | 1 | $8.75 |
4621 | 1 | $8.75 |
4622 rows × 2 columns
Q24. Remove the dollar sign from the item_price column, convert it to float, and store the result in a new_price column.
df3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   order_id            4622 non-null   int64
 1   quantity            4622 non-null   int64
 2   item_name           4622 non-null   object
 3   choice_description  3376 non-null   object
 4   item_price          4622 non-null   object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
df3['item_price'].replace('$','') # no change: Series.replace matches whole cell values, and no cell is exactly '$'
0        $2.39
1        $3.39
2        $3.39
3        $2.39
4       $16.98
         ...
4617    $11.75
4618    $11.75
4619    $11.25
4620     $8.75
4621     $8.75
Name: item_price, Length: 4622, dtype: object
# regex=False treats '$' as a literal character rather than the regex end-of-string anchor
df3['new_price'] = df3['item_price'].str.replace('$', '', regex=False).astype(float)
df3
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 | 2.39 |
1 | 1 | 1 | Izze | [Clementine] | $3.39 | 3.39 |
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 | 2.39 |
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 | 16.98 |
... | ... | ... | ... | ... | ... | ... |
4617 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | $11.75 | 11.75 |
4618 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | $11.75 | 11.75 |
4619 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $11.25 | 11.25 |
4620 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | $8.75 | 8.75 |
4621 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $8.75 | 8.75 |
4622 rows × 6 columns
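The difference between `Series.replace` and `Series.str.replace` can be checked on two values. This is a minimal sketch that assumes a hypothetical sample Series (`prices`), not the full item_price column.

```python
import pandas as pd

prices = pd.Series(["$2.39", "$16.98"])  # hypothetical sample values

# Series.replace compares against the whole cell value, so nothing matches '$'
whole_value = prices.replace("$", "")

# Series.str.replace works on substrings; regex=False keeps '$' literal
substring = prices.str.replace("$", "", regex=False)

print(whole_value.tolist())
print(substring.astype(float).tolist())
```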
Q25. Extract the rows where new_price is 5 or less and count them.
len(df3.loc[(df3['new_price'] <= 5)])
1652
df3.loc[(df3['new_price'] <= 5)].shape[0]
1652
Q26. Extract the rows whose item_name is Chicken Salad Bowl and reset the index.
df3.loc[(df3['item_name'] == 'Chicken Salad Bowl')].reset_index(drop = True)
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
0 | 20 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $8.75 | 8.75 |
1 | 60 | 2 | Chicken Salad Bowl | [Tomatillo Green Chili Salsa, [Sour Cream, Che... | $22.50 | 22.50 |
2 | 94 | 2 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $22.50 | 22.50 |
3 | 111 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | $8.75 | 8.75 |
4 | 137 | 2 | Chicken Salad Bowl | [Fresh Tomato Salsa, Fajita Vegetables] | $17.50 | 17.50 |
... | ... | ... | ... | ... | ... | ... |
105 | 1813 | 2 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $17.50 | 17.50 |
106 | 1822 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Black Beans, Cheese, Gua... | $11.25 | 11.25 |
107 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $11.25 | 11.25 |
108 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | $8.75 | 8.75 |
109 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $8.75 | 8.75 |
110 rows × 6 columns
Q27. Extract the rows where new_price is 9 or less and item_name is Chicken Salad Bowl.
df3.loc[(df3['new_price'] <= 9) & (df3['item_name'] == 'Chicken Salad Bowl')].head()
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
44 | 20 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $8.75 | 8.75 |
256 | 111 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | $8.75 | 8.75 |
526 | 220 | 1 | Chicken Salad Bowl | [Roasted Chili Corn Salsa, [Black Beans, Sour ... | $8.75 | 8.75 |
528 | 221 | 1 | Chicken Salad Bowl | [Tomatillo Green Chili Salsa, [Fajita Vegetabl... | $8.75 | 8.75 |
529 | 221 | 1 | Chicken Salad Bowl | [Tomatillo Green Chili Salsa, [Fajita Vegetabl... | $8.75 | 8.75 |
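Each condition needs its own parentheses because `&` binds more tightly than comparison operators. The sketch below assumes a hypothetical toy DataFrame (`toy`). It names the two masks and compares the result with `DataFrame.query`, which is one alternative way to write the same filter.

```python
import pandas as pd

# Hypothetical toy data standing in for the chipo orders
toy = pd.DataFrame({
    "item_name": ["Chicken Salad Bowl", "Chicken Salad Bowl", "Steak Burrito"],
    "new_price": [8.75, 11.25, 8.99],
})

cheap = toy["new_price"] <= 9                        # first condition
is_csb = toy["item_name"] == "Chicken Salad Bowl"    # second condition
result = toy.loc[cheap & is_csb]                     # element-wise AND of both masks

# The same filter written as a query string
same = toy.query("new_price <= 9 and item_name == 'Chicken Salad Bowl'")

print(result)
```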
Q28. Sort df in ascending order of new_price and reset the index.
df3.sort_values(by = 'new_price', ascending = True ).reset_index(drop = True).head()
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
0 | 471 | 1 | Bottled Water | NaN | $1.09 | 1.09 |
1 | 338 | 1 | Canned Soda | [Coca Cola] | $1.09 | 1.09 |
2 | 1575 | 1 | Canned Soda | [Dr. Pepper] | $1.09 | 1.09 |
3 | 47 | 1 | Canned Soda | [Dr. Pepper] | $1.09 | 1.09 |
4 | 1014 | 1 | Canned Soda | [Coca Cola] | $1.09 | 1.09 |
Q29. Print the rows of df whose item_name contains Chips.
df3.loc[(df3['item_name'].str.contains('Chips'))]
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 | 2.39 |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 | 2.39 |
6 | 3 | 1 | Side of Chips | NaN | $1.69 | 1.69 |
10 | 5 | 1 | Chips and Guacamole | NaN | $4.45 | 4.45 |
14 | 7 | 1 | Chips and Guacamole | NaN | $4.45 | 4.45 |
... | ... | ... | ... | ... | ... | ... |
4596 | 1826 | 1 | Chips and Guacamole | NaN | $4.45 | 4.45 |
4600 | 1827 | 1 | Chips and Guacamole | NaN | $4.45 | 4.45 |
4605 | 1828 | 1 | Chips and Guacamole | NaN | $4.45 | 4.45 |
4613 | 1831 | 1 | Chips | NaN | $2.15 | 2.15 |
4616 | 1832 | 1 | Chips and Guacamole | NaN | $4.45 | 4.45 |
1084 rows × 6 columns
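`str.contains` treats its pattern as a regular expression by default and returns a missing value for missing input. The sketch below assumes a hypothetical sample Series (`names`) that includes one missing value. It uses `regex=False` and `na=False` so the result is a plain Boolean mask.

```python
import pandas as pd

# Hypothetical sample names, including one missing value
names = pd.Series(["Chips and Guacamole", "Side of Chips", "Izze", None])

# regex=False: literal substring match; na=False: missing values count as "no match"
mask = names.str.contains("Chips", regex=False, na=False)

print(mask.tolist())
```

The item_name column in this dataset has no missing values, so the original call works as written. The extra arguments matter for columns such as choice_description before its NaN values are filled.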
Q30. Print a DataFrame containing only the columns of df at even positions (0-based positions 0, 2, 4, ...).
# df.loc[:, start:stop:step]
df3.loc[ : , : :2]
| | order_id | item_name | item_price |
---|---|---|---|
0 | 1 | Chips and Fresh Tomato Salsa | $2.39 |
1 | 1 | Izze | $3.39 |
2 | 1 | Nantucket Nectar | $3.39 |
3 | 1 | Chips and Tomatillo-Green Chili Salsa | $2.39 |
4 | 2 | Chicken Bowl | $16.98 |
... | ... | ... | ... |
4617 | 1833 | Steak Burrito | $11.75 |
4618 | 1833 | Steak Burrito | $11.75 |
4619 | 1834 | Chicken Salad Bowl | $11.25 |
4620 | 1834 | Chicken Salad Bowl | $8.75 |
4621 | 1834 | Chicken Salad Bowl | $8.75 |
4622 rows × 3 columns
# df.iloc[:, start:stop:step]
df3.iloc[ : , : :2]
| | order_id | item_name | item_price |
---|---|---|---|
0 | 1 | Chips and Fresh Tomato Salsa | $2.39 |
1 | 1 | Izze | $3.39 |
2 | 1 | Nantucket Nectar | $3.39 |
3 | 1 | Chips and Tomatillo-Green Chili Salsa | $2.39 |
4 | 2 | Chicken Bowl | $16.98 |
... | ... | ... | ... |
4617 | 1833 | Steak Burrito | $11.75 |
4618 | 1833 | Steak Burrito | $11.75 |
4619 | 1834 | Chicken Salad Bowl | $11.25 |
4620 | 1834 | Chicken Salad Bowl | $8.75 |
4621 | 1834 | Chicken Salad Bowl | $8.75 |
4622 rows × 3 columns
# To extract the columns at odd positions (1, 3, 5, ...) instead
df3.iloc[ : , 1: :2]
| | quantity | choice_description | new_price |
---|---|---|---|
0 | 1 | NaN | 2.39 |
1 | 1 | [Clementine] | 3.39 |
2 | 1 | [Apple] | 3.39 |
3 | 1 | NaN | 2.39 |
4 | 2 | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 |
... | ... | ... | ... |
4617 | 1 | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | 11.75 |
4618 | 1 | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | 11.75 |
4619 | 1 | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | 11.25 |
4620 | 1 | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | 8.75 |
4621 | 1 | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | 8.75 |
4622 rows × 3 columns
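The stepped slices can be tried on a one-row frame. This is a minimal sketch that assumes a hypothetical toy DataFrame (`toy`) with five columns. It also shows that a label slice with `loc` includes both endpoints, unlike a positional slice with `iloc`.

```python
import pandas as pd

# Hypothetical one-row frame with five columns
toy = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "d", "e"])

even_pos = toy.iloc[:, ::2]      # positions 0, 2, 4
odd_pos = toy.iloc[:, 1::2]      # positions 1, 3
by_label = toy.loc[:, "b":"d"]   # label slice: both 'b' and 'd' are included

print(list(even_pos.columns))
print(list(odd_pos.columns))
print(list(by_label.columns))
```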
Q31. Sort df in descending order of new_price and reset the index.
df3.sort_values(by = 'new_price', ascending = False).reset_index(drop = True).head()
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
0 | 1443 | 15 | Chips and Fresh Tomato Salsa | NaN | $44.25 | 44.25 |
1 | 1398 | 3 | Carnitas Bowl | [Roasted Chili Corn Salsa, [Fajita Vegetables,... | $35.25 | 35.25 |
2 | 511 | 4 | Chicken Burrito | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | $35.00 | 35.00 |
3 | 1443 | 4 | Chicken Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Chees... | $35.00 | 35.00 |
4 | 1443 | 3 | Veggie Burrito | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | $33.75 | 33.75 |
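Sorting and resetting the index can also be done in one call. The sketch below assumes a hypothetical toy DataFrame (`toy`) and compares the chained form with the `ignore_index=True` argument of `sort_values`.

```python
import pandas as pd

toy = pd.DataFrame({"new_price": [2.39, 44.25, 8.75]})  # hypothetical prices

# Chained form, as in the solution above
chained = toy.sort_values(by="new_price", ascending=False).reset_index(drop=True)

# Single call: ignore_index=True relabels the result 0..n-1
single = toy.sort_values(by="new_price", ascending=False, ignore_index=True)

print(single["new_price"].tolist())
```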
Q32. Select the rows of df whose item_name is Steak Salad or Bowl.
df3.loc[(df3['item_name'] == 'Steak Salad') | (df3['item_name'] == 'Bowl')]
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
445 | 193 | 3 | Bowl | [Braised Carnitas, Pinto Beans, [Sour Cream, C... | $22.20 | 22.20 |
664 | 276 | 1 | Steak Salad | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $8.99 | 8.99 |
673 | 279 | 1 | Bowl | [Adobo-Marinated and Grilled Steak, [Sour Crea... | $7.40 | 7.40 |
752 | 311 | 1 | Steak Salad | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $8.99 | 8.99 |
893 | 369 | 1 | Steak Salad | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $8.99 | 8.99 |
3502 | 1406 | 1 | Steak Salad | [[Lettuce, Fajita Veggies]] | $8.69 | 8.69 |
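When several exact values are allowed, `isin` is a compact alternative to chaining `|`. This is a minimal sketch that assumes a hypothetical toy DataFrame (`toy`).

```python
import pandas as pd

# Hypothetical item names; 'Steak Salad Bowl' must not match either target
toy = pd.DataFrame({"item_name": ["Bowl", "Steak Salad", "Steak Salad Bowl", "Izze"]})

with_or = toy.loc[(toy["item_name"] == "Steak Salad") | (toy["item_name"] == "Bowl")]
with_isin = toy.loc[toy["item_name"].isin(["Steak Salad", "Bowl"])]

print(with_isin)
```

Both forms test for exact equality, so `Steak Salad Bowl` is excluded in either case.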
Q33. Build a DataFrame from the rows of df whose item_name is Steak Salad or Bowl, then drop the rows duplicated on item_name, keeping only the first occurrence.
df3.loc[(df3['item_name'] == 'Steak Salad') | (df3['item_name'] == 'Bowl')].drop_duplicates( subset = 'item_name')
| | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|
445 | 193 | 3 | Bowl | [Braised Carnitas, Pinto Beans, [Sour Cream, C... | $22.20 |
664 | 276 | 1 | Steak Salad | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $8.99 |
Q34. Build a DataFrame from the rows of df whose item_name is Steak Salad or Bowl, then drop the rows duplicated on item_name, keeping only the last occurrence.
df3.loc[(df3['item_name'] == 'Steak Salad') | (df3['item_name'] == 'Bowl')].drop_duplicates( subset = 'item_name', keep = 'last')
| | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|
673 | 279 | 1 | Bowl | [Adobo-Marinated and Grilled Steak, [Sour Crea... | $7.40 |
3502 | 1406 | 1 | Steak Salad | [[Lettuce, Fajita Veggies]] | $8.69 |
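The `keep` argument of `drop_duplicates` accepts `'first'` (the default), `'last'`, or `False`. The sketch below assumes a hypothetical toy DataFrame (`toy`) and shows which rows survive in each case.

```python
import pandas as pd

# Hypothetical rows: each item_name appears twice
toy = pd.DataFrame({
    "item_name": ["Bowl", "Steak Salad", "Bowl", "Steak Salad"],
    "new_price": [22.20, 8.99, 7.40, 8.69],
})

first = toy.drop_duplicates(subset="item_name")               # keeps rows 0 and 1
last = toy.drop_duplicates(subset="item_name", keep="last")   # keeps rows 2 and 3
none = toy.drop_duplicates(subset="item_name", keep=False)    # drops every duplicated row

print(list(first.index), list(last.index), len(none))
```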
Q35. Select the rows of df whose new_price is greater than or equal to the mean of new_price.
con1 = df3['new_price'] >= df3['new_price'].mean()
df3[con1].head()
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 | 16.98 |
5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 | 10.98 |
7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 | 11.75 |
8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 | 9.25 |
9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 | 9.25 |
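The comparison against the mean is easy to verify by hand on a few values. This is a minimal sketch that assumes a hypothetical toy DataFrame (`toy`) whose mean is 4.0.

```python
import pandas as pd

toy = pd.DataFrame({"new_price": [1.0, 2.0, 3.0, 10.0]})  # hypothetical prices

threshold = toy["new_price"].mean()        # (1 + 2 + 3 + 10) / 4 = 4.0
above = toy[toy["new_price"] >= threshold] # only the 10.0 row passes

print(threshold)
print(list(above.index))
```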
Q36. In df, change the item_name value Izze to Fizzy Lizzy.
df3['item_name'].unique()
array(['Chips and Fresh Tomato Salsa', 'Izze', 'Nantucket Nectar', 'Chips and Tomatillo-Green Chili Salsa', 'Chicken Bowl', 'Side of Chips', 'Steak Burrito', 'Steak Soft Tacos', 'Chips and Guacamole', 'Chicken Crispy Tacos', 'Chicken Soft Tacos', 'Chicken Burrito', 'Canned Soda', 'Barbacoa Burrito', 'Carnitas Burrito', 'Carnitas Bowl', 'Bottled Water', 'Chips and Tomatillo Green Chili Salsa', 'Barbacoa Bowl', 'Chips', 'Chicken Salad Bowl', 'Steak Bowl', 'Barbacoa Soft Tacos', 'Veggie Burrito', 'Veggie Bowl', 'Steak Crispy Tacos', 'Chips and Tomatillo Red Chili Salsa', 'Barbacoa Crispy Tacos', 'Veggie Salad Bowl', 'Chips and Roasted Chili-Corn Salsa', 'Chips and Roasted Chili Corn Salsa', 'Carnitas Soft Tacos', 'Chicken Salad', 'Canned Soft Drink', 'Steak Salad Bowl', '6 Pack Soft Drink', 'Chips and Tomatillo-Red Chili Salsa', 'Bowl', 'Burrito', 'Crispy Tacos', 'Carnitas Crispy Tacos', 'Steak Salad', 'Chips and Mild Fresh Tomato Salsa', 'Veggie Soft Tacos', 'Carnitas Salad Bowl', 'Barbacoa Salad Bowl', 'Salad', 'Veggie Crispy Tacos', 'Veggie Salad', 'Carnitas Salad'], dtype=object)
# Modify the values through loc
df3.loc[df3['item_name'] == 'Izze', 'item_name'] = 'Fizzy Lizzy'
df3
| | order_id | quantity | item_name | choice_description | item_price | new_price |
---|---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 | 2.39 |
1 | 1 | 1 | Fizzy Lizzy | [Clementine] | $3.39 | 3.39 |
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 | 2.39 |
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 | 16.98 |
... | ... | ... | ... | ... | ... | ... |
4617 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | $11.75 | 11.75 |
4618 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | $11.75 | 11.75 |
4619 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $11.25 | 11.25 |
4620 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | $8.75 | 8.75 |
4621 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $8.75 | 8.75 |
4622 rows × 6 columns
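Because the whole cell value is being swapped, `Series.replace` gives the same result as the `loc` assignment. The sketch below assumes a hypothetical toy DataFrame (`toy`) and compares the two approaches on copies.

```python
import pandas as pd

toy = pd.DataFrame({"item_name": ["Izze", "Chips", "Izze"]})  # hypothetical names

# Approach 1: Boolean mask plus column label in loc
via_loc = toy.copy()
via_loc.loc[via_loc["item_name"] == "Izze", "item_name"] = "Fizzy Lizzy"

# Approach 2: whole-value replacement on the column
via_replace = toy.copy()
via_replace["item_name"] = via_replace["item_name"].replace("Izze", "Fizzy Lizzy")

print(via_loc["item_name"].tolist())
```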
Q37. Count the rows of df whose choice_description is NaN.
df3['choice_description'].isnull().sum()
1246
Q38. In df, replace the NaN values of choice_description with NoData (using loc).
# Method 1 (using loc)
df3.loc[df3['choice_description'].isnull(), 'choice_description'] = 'NoData'
print(df3['choice_description'].isnull().sum())
# Sum the Boolean mask to count matches; len() of the mask would return the total row count (4622)
print((df3['choice_description'] == 'NoData').sum())
0
1246
# Method 2 (using fillna)
df3.loc[ :,'choice_description'] = df3['choice_description'].fillna('NoData')
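When checking a replacement like this, note that `len` of a Boolean mask returns the number of rows, while `sum` returns the number of `True` values. This is a minimal sketch that assumes a hypothetical sample Series (`desc`).

```python
import pandas as pd

desc = pd.Series(["NoData", "[Apple]", "NoData"])  # hypothetical values

mask = desc == "NoData"

total_rows = len(mask)        # length of the mask: every row, matching or not
matches = int(mask.sum())     # True counts as 1, so this is the number of matches

print(total_rows, matches)
```

String comparison is also case-sensitive, so `'Nodata'` would match nothing here.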
Q39. Select the rows of df whose choice_description contains Black.
print(len(df3.loc[df3['choice_description'].str.contains('Black')]))
df3.loc[df3['choice_description'].str.contains('Black')].head()
1353
| | order_id | quantity | item_name | choice_description | item_price | new_price | NoData |
---|---|---|---|---|---|---|---|
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 | 16.98 | NaN |
7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 | 11.75 | NaN |
9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 | 9.25 | NaN |
11 | 6 | 1 | Chicken Crispy Tacos | [Roasted Chili Corn Salsa, [Fajita Vegetables,... | $8.75 | 8.75 | NaN |
12 | 6 | 1 | Chicken Soft Tacos | [Roasted Chili Corn Salsa, [Rice, Black Beans,... | $8.75 | 8.75 | NaN |
Q40. Print the number of rows of df whose choice_description does not contain Vegetables.
df3['choice_description'].str.contains('Vegetables')
0       False
1       False
2       False
3       False
4       False
        ...
4617    False
4618    False
4619     True
4620     True
4621     True
Name: choice_description, Length: 4622, dtype: bool
# Invert the Boolean result with ~
~df3['choice_description'].str.contains('Vegetables')
0        True
1        True
2        True
3        True
4        True
        ...
4617     True
4618     True
4619    False
4620    False
4621    False
Name: choice_description, Length: 4622, dtype: bool
print(df3.loc[~df3['choice_description'].str.contains('Vegetables')].shape[0])
df3.loc[~df3['choice_description'].str.contains('Vegetables')]
3900
| | order_id | quantity | item_name | choice_description | item_price | new_price | NoData |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NoData | $2.39 | 2.39 | True |
1 | 1 | 1 | Fizzy Lizzy | [Clementine] | $3.39 | 3.39 | NaN |
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NoData | $2.39 | 2.39 | True |
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 | 16.98 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
4614 | 1831 | 1 | Bottled Water | NoData | $1.50 | 1.50 | True |
4615 | 1832 | 1 | Chicken Soft Tacos | [Fresh Tomato Salsa, [Rice, Cheese, Sour Cream]] | $8.75 | 8.75 | NaN |
4616 | 1832 | 1 | Chips and Guacamole | NoData | $4.45 | 4.45 | True |
4617 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | $11.75 | 11.75 | NaN |
4618 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | $11.75 | 11.75 | NaN |
3900 rows × 7 columns
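The `~` operator only works cleanly on a pure Boolean mask, which is why the NaN values were filled before this step. The sketch below assumes a hypothetical sample Series (`desc`) that still contains a missing value, and uses `na=False` to get a Boolean mask anyway.

```python
import pandas as pd

# Hypothetical descriptions, one of them missing
desc = pd.Series(["[Fajita Vegetables, Rice]", "[Rice]", None])

has_veg = desc.str.contains("Vegetables", na=False)  # missing value counts as False
no_veg = ~has_veg                                    # element-wise NOT

print(no_veg.tolist())
print(int(no_veg.sum()))
```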
Q41. Extract all rows of df whose item_name starts with N.
df3['item_name'].str[0] # first character of each row's string
0       C
1       F
2       N
3       C
4       C
       ..
4617    S
4618    S
4619    C
4620    C
4621    C
Name: item_name, Length: 4622, dtype: object
print(df3.loc[df3['item_name'].str[0] == 'N'].shape)
df3.loc[df3['item_name'].str[0] == 'N']
(27, 7)
| | order_id | quantity | item_name | choice_description | item_price | new_price | NoData |
---|---|---|---|---|---|---|---|
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
22 | 11 | 1 | Nantucket Nectar | [Pomegranate Cherry] | $3.39 | 3.39 | NaN |
105 | 46 | 1 | Nantucket Nectar | [Pineapple Orange Banana] | $3.39 | 3.39 | NaN |
173 | 77 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
205 | 91 | 1 | Nantucket Nectar | [Peach Orange] | $3.39 | 3.39 | NaN |
436 | 189 | 1 | Nantucket Nectar | [Pomegranate Cherry] | $3.39 | 3.39 | NaN |
601 | 247 | 2 | Nantucket Nectar | [Pineapple Orange Banana] | $6.78 | 6.78 | NaN |
925 | 381 | 1 | Nantucket Nectar | [Pomegranate Cherry] | $3.39 | 3.39 | NaN |
1356 | 553 | 1 | Nantucket Nectar | [Pomegranate Cherry] | $3.39 | 3.39 | NaN |
1585 | 641 | 1 | Nantucket Nectar | [Peach Orange] | $3.39 | 3.39 | NaN |
1626 | 656 | 1 | Nantucket Nectar | [Pineapple Orange Banana] | $3.39 | 3.39 | NaN |
1706 | 690 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
2162 | 872 | 1 | Nantucket Nectar | [Pineapple Orange Banana] | $3.39 | 3.39 | NaN |
2379 | 947 | 2 | Nantucket Nectar | [Peach Orange] | $6.78 | 6.78 | NaN |
2381 | 947 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
2430 | 965 | 1 | Nantucket Nectar | [Pomegranate Cherry] | $3.39 | 3.39 | NaN |
2653 | 1053 | 1 | Nantucket Nectar | [Pineapple Orange Banana] | $3.39 | 3.39 | NaN |
2818 | 1118 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
2838 | 1128 | 1 | Nantucket Nectar | [Peach Orange] | $3.39 | 3.39 | NaN |
2853 | 1133 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
2949 | 1172 | 1 | Nantucket Nectar | [Peach Orange] | $3.39 | 3.39 | NaN |
3318 | 1330 | 1 | Nantucket Nectar | [Peach Orange] | $3.39 | 3.39 | NaN |
3368 | 1351 | 1 | Nantucket Nectar | [Pineapple Orange Banana] | $3.39 | 3.39 | NaN |
3570 | 1433 | 1 | Nantucket Nectar | [Pineapple Orange Banana] | $3.39 | 3.39 | NaN |
3845 | 1541 | 1 | Nantucket Nectar | [Peach Orange] | $3.39 | 3.39 | NaN |
4019 | 1609 | 1 | Nantucket Nectar | [Pineapple Orange Banana] | $3.39 | 3.39 | NaN |
4078 | 1632 | 1 | Nantucket Nectar | [Peach Orange] | $3.39 | 3.39 | NaN |
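Taking the first character with `.str[0]` works, and `str.startswith` expresses the same test directly. It also handles prefixes longer than one character. This is a minimal sketch that assumes a hypothetical sample Series (`names`).

```python
import pandas as pd

names = pd.Series(["Nantucket Nectar", "Chips", "Izze"])  # hypothetical names

by_first_char = names.str[0] == "N"       # compare the first character
by_prefix = names.str.startswith("N")     # prefix test

print(by_prefix.tolist())
```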
Q42. Select the rows of df whose item_name is 15 or more characters long (spaces included).
a = 'abc d e f'
len(a)
9
df3.loc[df3['item_name'].str.len() >= 15]
| | order_id | quantity | item_name | choice_description | item_price | new_price | NoData |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NoData | $2.39 | 2.39 | True |
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NoData | $2.39 | 2.39 | True |
8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 | 9.25 | NaN |
10 | 5 | 1 | Chips and Guacamole | NoData | $4.45 | 4.45 | True |
... | ... | ... | ... | ... | ... | ... | ... |
4615 | 1832 | 1 | Chicken Soft Tacos | [Fresh Tomato Salsa, [Rice, Cheese, Sour Cream]] | $8.75 | 8.75 | NaN |
4616 | 1832 | 1 | Chips and Guacamole | NoData | $4.45 | 4.45 | True |
4619 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $11.25 | 11.25 | NaN |
4620 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Lettu... | $8.75 | 8.75 | NaN |
4621 | 1834 | 1 | Chicken Salad Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Pinto... | $8.75 | 8.75 | NaN |
2373 rows × 7 columns
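`str.len()` counts characters, spaces included, and not words. The sketch below assumes a hypothetical sample Series (`names`). It puts the character count next to a word count obtained by splitting on whitespace.

```python
import pandas as pd

names = pd.Series(["Chicken Salad Bowl", "Chips", "Nantucket Nectar"])  # hypothetical names

n_chars = names.str.len()                 # characters, including spaces
n_words = names.str.split().str.len()     # whitespace-separated words

print(n_chars.tolist())
print(n_words.tolist())
```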
Q43. Build the DataFrame of rows of df whose new_price appears in lst, and print the number of rows.
lst =[1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]
lst =[1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]
lst
[1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]
df3.loc[df3['new_price'].isin(lst)]
| | order_id | quantity | item_name | choice_description | item_price | new_price | NoData |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NoData | $2.39 | 2.39 | True |
1 | 1 | 1 | Fizzy Lizzy | [Clementine] | $3.39 | 3.39 | NaN |
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 | 3.39 | NaN |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NoData | $2.39 | 2.39 | True |
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 | 16.98 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
4610 | 1830 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | $11.75 | 11.75 | NaN |
4612 | 1831 | 1 | Carnitas Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | $9.25 | 9.25 | NaN |
4616 | 1832 | 1 | Chips and Guacamole | NoData | $4.45 | 4.45 | True |
4617 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | $11.75 | 11.75 | NaN |
4618 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | $11.75 | 11.75 | NaN |
1393 rows × 7 columns
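The row count can be printed explicitly with `len`. This is a minimal sketch that assumes a hypothetical toy DataFrame (`toy`) and reuses `lst` from the question.

```python
import pandas as pd

toy = pd.DataFrame({"new_price": [2.39, 8.75, 16.98, 1.09]})  # hypothetical prices
lst = [1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]

matched = toy.loc[toy["new_price"].isin(lst)]  # exact membership test per row
count = len(matched)                           # number of matching rows

print(count)
```

`isin` compares floats exactly. That works here because both sides come from the same decimal literals, but values produced by arithmetic may need a tolerance-based comparison instead.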